Sunday, December 14, 2025

Robust methods for missing data in electronic health records-based studies

Award Number: R01DK128150
ORGANIZATION: NATIONAL INSTITUTE OF DIABETES & DIGESTIVE & KIDNEY DISEASES
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 04/12/2021
PERIOD OF PERFORMANCE END DATE: 03/31/2025

PROJECT SUMMARY: Electronic health record (EHR) data represent a huge opportunity for cost-efficient clinical and public health research, especially when a randomized trial or a prospective observational study is not feasible or ethical. EHR systems, however, are typically developed to support clinical and/or billing activities. As such, substantial care is needed when using EHR data to address a particular scientific question. Here, an important potential threat to validity is missing data. Moreover, since EHR data are not collected for any particular research question, measurements that are critical to answering the question will often be unavailable in the records of some patients. This, in turn, requires researchers to contend with the potential for selection bias and compromised generalizability. Towards addressing issues of missing data in an EHR, researchers could, in principle, appeal to a vast statistical literature and use standard methods such as multiple imputation (MI), inverse-probability weighting (IPW) or doubly-robust (DR) estimation. These methods, however, have generally been developed outside of the EHR context. As such, they typically fail to acknowledge the complexity of the EHR data, in particular the many decisions made by patients and health care providers that give rise to "complete data" in the EHR, known as the data provenance. Because of the disconnect between this complexity and the settings for which most missing data methods are developed, the application of standard missing data methods to EHR-based studies will often fail to resolve selection bias, and generalizability will remain compromised. Unfortunately, in contrast to confounding bias, very little attention has been paid to developing methods for missing data that are specifically tailored to the complexity of EHR-based studies.
We will begin to address this gap by developing, implementing and evaluating a suite of novel, innovative statistical tools including:
Aim 1: A unified framework for robust causal inference in unmatched and matched EHR-based cohort studies with missing confounder data;
Aim 2: A formal, robust framework for causal inference in emulated target trials based on EHR data;
Aim 3: A novel blended analysis framework for missing data in EHR-based studies that combines MI and IPW in an innovative and unique way;
Aim 4: A novel double-sampling strategy for when the EHR data are suspected to be missing-not-at-random.
The proposed aims are motivated by challenges the investigative team has faced in a series of EHR-based studies of long-term outcomes among patients who have undergone bariatric surgery. Throughout this research, we will use data from one of these studies, the DURABLE study, which has rich demographic and longitudinal clinical information from three Kaiser Permanente health systems on ≈45,000 patients who underwent bariatric surgery between 1997 and 2015, as well as on ≈1,636,000 non-surgical enrollees during that time period.
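
As a rough illustration of the standard inverse-probability weighting (IPW) idea named in the summary, the sketch below compares a complete-case mean with an IPW mean on a tiny invented dataset. It is not the project's method and uses no DURABLE data. The function names, the two strata and the outcome values are all hypothetical, and the sketch assumes the chance of an outcome being recorded depends only on an observed stratum.

```python
from collections import defaultdict


def complete_case_mean(records):
    """Mean of the outcome among records where it was recorded."""
    observed = [y for _, y in records if y is not None]
    return sum(observed) / len(observed)


def ipw_mean(records):
    """Weighted (Hajek-style) mean of the outcome.

    Each record is (stratum, outcome), with outcome None when missing.
    The probability of being observed is estimated as the observed
    fraction within each stratum (an assumption of this sketch).
    """
    total = defaultdict(int)
    seen = defaultdict(int)
    for stratum, y in records:
        total[stratum] += 1
        if y is not None:
            seen[stratum] += 1

    numerator = 0.0
    denominator = 0.0
    for stratum, y in records:
        if y is None:
            continue  # missing outcomes contribute nothing directly
        p_observed = seen[stratum] / total[stratum]
        weight = 1.0 / p_observed  # observed records stand in for missing ones
        numerator += weight * y
        denominator += weight
    return numerator / denominator


# Hypothetical toy data: stratum 0 is fully observed, while stratum 1
# has only one of its four outcomes recorded.
records = [(0, 10.0)] * 4 + [(1, 20.0)] + [(1, None)] * 3

print(complete_case_mean(records))
print(ipw_mean(records))
```

In this toy setup the complete-case mean is pulled towards the fully observed stratum, whereas the weighted mean up-weights the single observed record from the sparsely observed stratum. The summary's point is that in real EHR data the observation process is far more complex than a single stratum variable, so a simple weighting model like this one will often be inadequate.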


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity County	Legal Entity Country	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2024 ( Subtotal = $496,299 )
2024	2024	HARVARD COLLEGE PRESIDENT & FELLOWS OF	677 HUNTINGTON AVE	BOSTON	MA	02115	SUFFOLK	USA	Diabetes, Digestive, and Kidney Diseases Extramural Research	000	4	5/3/2024	NON-COMPETING CONTINUATION	$496,299
														Subtotal = $496,299

Issue Date FY: 2023 ( Subtotal = $517,976 )
2023	2023	HARVARD COLLEGE PRESIDENT & FELLOWS OF	677 HUNTINGTON AVE	BOSTON	MA	02115	SUFFOLK	USA	Diabetes, Digestive, and Kidney Diseases Extramural Research	000	3	3/20/2023	NON-COMPETING CONTINUATION	$517,976
														Subtotal = $517,976

Issue Date FY: 2022 ( Subtotal = $508,825 )
2022	2022	PRESIDENT AND FELLOWS OF HARVARD COLLEGE	677 HUNTINGTON AVE	BOSTON	MA	02115	SUFFOLK	USA	Diabetes, Digestive, and Kidney Diseases Extramural Research	000	2	3/9/2022	NON-COMPETING CONTINUATION	$457,940
2022	2022	PRESIDENT AND FELLOWS OF HARVARD COLLEGE	677 HUNTINGTON AVE	BOSTON	MA	02115	SUFFOLK	USA	Diabetes, Digestive, and Kidney Diseases Extramural Research	001	2	3/28/2022	NON-COMPETING CONTINUATION	$50,885
														Subtotal = $508,825

Issue Date FY: 2021 ( Subtotal = $566,781 )
2021	2021	PRESIDENT AND FELLOWS OF HARVARD COLLEGE	677 HUNTINGTON AVE	BOSTON	MA	02115	SUFFOLK	USA	Diabetes, Digestive, and Kidney Diseases Extramural Research	000	1	4/9/2021	NEW	$566,781
														Subtotal = $566,781

Grand Total All Awards = $2,089,881
