Tuesday, December 2, 2025

Leveraging Large Language Models to Automate and Improve Accuracy of Medical Registry Curation

Award Number: R21AR084242
ORGANIZATION: NATIONAL INSTITUTE OF ARTHRITIS & MUSCULOSKELETAL & SKIN DISEASES
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 09/15/2025
PERIOD OF PERFORMANCE END DATE: 08/31/2027


ABSTRACT

Registries are by far the most important source of data for orthopedics, particularly joint replacement. Their importance surpasses even that of randomized clinical trials (RCTs) because of the long duration of postoperative surveillance, the substantial costs of the procedures and implants under investigation, and the heterogeneity across institutions, regions, and countries. Unfortunately, building a large-scale, comprehensive registry is difficult. On one hand, comprehensive datasets usually come from expensive small-cohort projects such as the Osteoarthritis Initiative (OAI). This model cannot be implemented universally, because smaller institutions may lack the resources to employ the required personnel or build the extensive initial infrastructure. On the other hand, nationwide registries such as Medicare databases are comparatively sparse and of limited use. In orthopedics, large-scale registries such as the American Joint Replacement Registry (AJRR) struggle with data contribution: participation is currently voluntary, and the fragmented US healthcare system limits the quality of contributed data.

Registry construction therefore requires balancing comprehensive depth against participation and completeness. If data points are too onerous to abstract, participation will be low; if completeness is prioritized, interesting data points are difficult to include. To address both problems and take the next step in national-scale registry construction, we will develop automatic methods of data abstraction. The potential time and cost savings of an entirely automated abstraction pipeline are immense, and such a pipeline would also allow registries to scale easily to the records produced by the millions of arthroplasties performed nationwide.

Our central hypothesis is that large language models (LLMs), with proper fine-tuning, grounding, and prompting, can achieve trustworthy orthopedic-specific performance, enabling them to interpret clinical notes for data extraction and complex synthesis tasks. Successful completion of this aim will yield fine-tuned LLMs capable of (1) efficiently and accurately extracting critical data for automated orthopedic registry curation and (2) interpreting clinical notes for patient-specific phenotyping. These advances are expected to lower barriers to clinical registry construction and increase the depth of registry data, encouraging cross-institutional collaboration on significant health issues. In addition, increased registry participation will facilitate the integration of pragmatic and nested RCTs within registries, enabling prospective data collection and the generation of high-level evidence to refine surgical techniques and implant design, ultimately improving patient outcomes.


Issue Date FY: 2025 (Subtotal = $377,273)

Issue Date FY: 2025
Funding FY: 2025
Legal Entity Name: MAYO CLINIC
Legal Entity Address: 200 1ST ST SW
Legal Entity City: ROCHESTER
Legal Entity State: MN
Legal Entity Zip Code: 55905
Legal Entity County: OLMSTED
Legal Entity Country: USA
Assistance Listing: Arthritis, Musculoskeletal and Skin Diseases Research
Award Code: 000
Budget Year: 1
Action Date: 9/11/2025
Action Type: NEW
Action Amount: $377,273

Subtotal = $377,273

Grand Total All Awards = $377,273
