Wednesday, December 24, 2025

Development and implementation of statistical machine learning methods to shorten rare disease odysseys

Award Number: K99LM014429
ORGANIZATION: NATIONAL LIBRARY OF MEDICINE
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: TRAINING/TRAINEESHIP
PERIOD OF PERFORMANCE START DATE: 01/01/2024
PERIOD OF PERFORMANCE END DATE: 07/31/2025

Development and implementation of statistical machine learning methods to shorten rare disease odysseys - Patients with rare diseases (RDs) face tremendous physical, psychosocial, and economic suffering in their protracted journeys toward diagnosis and therapy. These journeys, known as diagnostic and therapeutic odysseys, are riddled with diagnostic delays and difficulties finding effective treatment strategies. The Undiagnosed Diseases Network (UDN) at the NIH was established to diagnose individuals who are living with the often dire consequences of an RD. Despite the UDN’s comprehensive diagnostic approach, 70% of patients remain undiagnosed, highlighting the need for novel diagnostic strategies. The diagnostic approach at the UDN currently relies on manual extraction of RD phenotypes from clinical notes in electronic health records (EHR), which is laborious and time-consuming. A promising alternative is to leverage natural language processing (NLP) models, which can automatically extract fine-grained RD phenotypes from clinical notes, to support timely diagnosis at the UDN. Existing general NLP models, however, are not suitable for supporting diagnosis at the UDN. Furthermore, NLP models have limited impact on diagnosis due to scarce infrastructure for delivering them to the clinic, highlighting the need to bridge the implementation gap between NLP and practice. Even after diagnosis, patients often undergo therapeutic odysseys. Despite advancements in gene therapy, evidence shows that genetics alone do not account for the wide diversity in RD phenotypes. Exposures also play a critical role, but less is known about how their causal effects vary across individuals. This knowledge gap underscores the need to elucidate the complex phenome-genome-exposome interplay on an individual-level basis, which is crucial in informing personalized disease management strategies.
The overall objective of this proposal is to develop and implement advanced statistical machine learning (ML) methods aimed at shortening RD odysseys. During the K99 phase, I will develop a novel NLP system to extract RD phenotypes from clinical notes (Aim 1) and implement it using REDCap at the Vanderbilt UDN (Aim 2). During the R00 phase, I will leverage phenomic, genomic, and exposomic data from All of Us and build a causal inference framework that uses modern statistical ML techniques to estimate personalized causal effects of exposures on RD phenotypes (Aim 3). The expected outcomes are a novel, open-source NLP system for RDs, an implementation framework using REDCap to support timely diagnosis at the Vanderbilt UDN, and an advanced, reproducible causal inference framework to elucidate the complex phenome-genome-exposome interplay underlying RDs on an individual-level basis. During the K99 phase, the PI will be mentored by experts in NLP, REDCap, EHR phenotyping, and RDs at Vanderbilt, and develop competencies in those areas. This proposal will yield results for subsequent studies on data-driven approaches aimed at shortening RD odysseys. This award will provide the necessary training to supplement the PI’s expertise in statistical ML and causal inference and help her transition into an independent career in biomedical data science.


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = $90,612 )
2025	2025	VANDERBILT UNIVERSITY MEDICAL CENTER	1161 21ST AVE S STE D3300 MCN	NASHVILLE	TN	37232	DAVIDSON	USA	Medical Library Assistance	000	2	12/9/2024	NON-COMPETING CONTINUATION	$90,612
2025	2025	VANDERBILT UNIVERSITY MEDICAL CENTER	1161 21ST AVE S STE D3300 MCN	NASHVILLE	TN	37232	DAVIDSON	USA	Medical Library Assistance	001	2	7/16/2025	NON-COMPETING CONTINUATION	$0
														Subtotal = $90,612

Issue Date FY: 2024 ( Subtotal = $90,612 )
2024	2024	VANDERBILT UNIVERSITY MEDICAL CENTER	1161 21ST AVE S STE D3300 MCN	NASHVILLE	TN	37232	DAVIDSON	USA	Medical Library Assistance	000	1	12/29/2023	NEW	$90,612
														Subtotal = $90,612

Grand Total All Awards = $181,224
