Tuesday, October 21, 2025

Developing large language models for drug safety and effectiveness causal analysis

Award Number: R01LM014667
ORGANIZATION: NATIONAL LIBRARY OF MEDICINE
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 08/01/2025
PERIOD OF PERFORMANCE END DATE: 07/31/2029

Project Summary: Because randomized controlled trials often severely underrepresent frail and complex patients, it is pivotal to inform physicians’ treatment choices with drug safety and effectiveness studies based on real-world data. Electronic health records (EHR) contain rich clinical information and are among the most commonly used real-world data for causal effect estimation for pharmacotherapies. However, much of the essential data is embedded in free-text clinical notes and reports (unstructured EHR), and traditional natural language processing (NLP) approaches require a labor-intensive process of knowledge acquisition and training dataset creation for each phenotype. This makes them unscalable for the large number of outcome phenotypes, risk stratification factors, and potential confounders (often >200) that need to be created for a typical pharmacoepidemiologic study. In contrast, developing Large Language Models (LLMs) is a more scalable approach because LLMs can be used to predict phenotypes not defined during the training stage. Yet, existing LLMs were not tailored for determining essential phenotypes for causal effect estimation of pharmacotherapies. Our objective is to build an LLM-based causal analytical platform for drug safety and effectiveness using two large multi-center EHR systems linked with Centers for Medicare & Medicaid Services (CMS) utilization, clinical assessment, and pharmacy dispensing data covering >1.3 million lives from 2000 to 2024. Our central working hypothesis is that our novel LLMs have robust performance in determining a wide variety of clinical phenotypes, including those not originally targeted during the training stage, and that they can be used to reduce missing data for pharmacoepidemiology causal analysis.
In Aim 1, we will train novel LLMs for phenotypes commonly used in drug safety and effectiveness causal analysis, building on existing general-purpose LLMs. The reference standard of the target phenotypes will be provided by large-scale annotation based on structured data in the linked external clinical data. The targeted phenotypes include cognitive function, mental and functional status, pain levels, mood symptoms, adherence to chronic medications, and healthcare utilization outside the study EHR. In Aim 2, we will assess the generalizability of the novel LLMs to predict eight new categories of phenotypes (not already targeted in Aim 1) in an independent dataset. We will further optimize the LLMs based on their performance in the validation dataset. In Aim 3, we will determine the impact of LLM-derived features on causal effect estimation, in terms of bias and variance reduction, in three categories of highly relevant empirical drug safety and effectiveness studies. This LLM-based causal analytical platform can be used to generate a wide range of high-validity clinical features that enable causal effect estimation with adequate patient outcome phenotyping, confounding adjustment, and treatment effect heterogeneity evaluation, which is required for high-quality evidence for individualized prescribing.


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity County	Legal Entity Country	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = $402,750 )
2025	2025	BRIGHAM & WOMENS HOSPITAL INC	75 FRANCIS ST	BOSTON	MA	02115	SUFFOLK	USA	Medical Library Assistance	000	1	7/31/2025	NEW	$402,750

Grand Total All Awards = $402,750
