NeoGx-III: Interpretable Machine Learning with Integrated Analysis of Maternal & Infant Electronic Medical Records for Unbiased Prediction of Need for Genome Sequencing in Level III NICUs

ABSTRACT

Prolonged diagnostic delays, or “diagnostic odysseys,” in neonatal intensive care units (NICUs) place a significant burden on patients and families and pose challenges for clinicians, particularly when genome sequencing (GS) is delayed or omitted. Up to 20% of critically ill neonates may have a genetic disease, yet many diagnoses are made only after extended uncertainty, leading to worse outcomes, longer hospital stays, and higher healthcare costs. These issues are especially pronounced in underserved populations, such as racial and ethnic minorities, who face barriers to GS due to healthcare disparities, further compounding diagnostic delays and worsening outcomes. Our long-term goal is to eliminate health disparities in genetic testing, ensuring that no child with a genetic disease, regardless of racial, ethnic, or socioeconomic background, experiences a prolonged diagnostic odyssey. The overall objective of this application is to develop a machine learning (ML)-based approach that reduces health disparities by objectively identifying neonates from underserved populations who require genomic testing, using documented clinical data to mitigate the provider- and system-driven biases that often contribute to unequal access to genetic services. Our central hypothesis is that the combined analysis of maternal and infant health records will enable efficient identification of neonates in Level III NICUs who are likely to benefit from early GS, facilitating faster and more targeted diagnosis of genetic diseases.
To test this hypothesis, our specific aim is to develop and evaluate an interpretable ML model that leverages both structured and unstructured data from neonatal and maternal electronic health records (EHRs) to systematically identify neonates most likely to benefit from early-life GS. The ML model will integrate data from clinical notes (encoded as Human Phenotype Ontology terms) and structured data elements such as ICD codes (mapped to PheCodes), laboratory results, clinical characteristics (e.g., gestational age, birth weight), neonatal critical care management (e.g., intubation, medications), and relevant maternal factors (e.g., maternal age, parity, prenatal care). Developed within a privacy-preserving environment, the model will be designed to integrate seamlessly into existing clinical workflows and EHR systems to provide clinicians with real-time decision support. By developing an ML model that integrates maternal and infant health data, this project introduces an innovative, data-driven approach to identifying at-risk neonates while minimizing human bias. The rationale is that early detection of genetic diseases, prompted by predictive analytics, will enable timely interventions, reduce health disparities, and improve outcomes in all populations, not just those with ready access to Level IV NICUs. This project aligns with funding opportunity PAR-21-255 and helps the NHGRI advance its mission by addressing critical gaps in neonatal genomic medicine and reducing diagnostic disparities. Our team’s unique expertise in neonatal genomics, ML, and clinical decision support positions us to implement this transformative approach successfully, ultimately improving health outcomes and reducing healthcare costs for vulnerable neonates.