PROJECT SUMMARY
Given the rapid evolution of genomic knowledge, the need for genomic reinterpretation has been increasing.
However, there is not yet a standard approach for determining to whom, when, and how reinterpretation should be
provided to ensure accuracy, cost-effectiveness and fairness. Unequal access to genomic tests and genetic specialists
has widened health disparities, which could be further exacerbated by limited ancestry-specific genetic data. Our
overarching goal is to design a scalable and sustainable informatics framework to support continuous genomic
reanalysis for symptomatic patients with non-diagnostic exome or genome sequencing in diverse populations.
Extending our prior published work on Doc2HPO, Criteria2Query, Phen2Gene, PhenCards, Phenominal, and
phenotype-disease knowledge graphs, we will first develop a natural language processing (NLP) pipeline to
create a multimodal phenome from clinical notes using the latest Phenopacket schema. By tracking changes
in longitudinal electronic health record (EHR) phenotypes and analyzing them in the context of new evidence for
variants, we will identify individuals who can benefit most from genomic reanalysis. Then we will use these
evolving clinical phenotypes to trigger automatic
variant reinterpretation using an ancestry-aware and age-sensitive knowledge graph (PhenoKG). Unlike typical
phenotype-based gene prioritization tools such as Phen2Gene, this knowledge graph will be built by
extending our previous efforts and extracting phenotype-genotype relations from the EHR as well as the
literature. It will enable the querying, extraction and inference of ancestry-aware and
age-sensitive phenotype-genotype relationships. By leveraging a multi-layer random-walk integrative network
approach, we will incorporate this heterogeneous knowledge graph into a phenotype-driven gene and variant
prioritization algorithm for continuous genomic reanalysis across diverse populations. With these methodological
developments, we will implement a routine reanalysis informatics pipeline at two academic institutions, Columbia
University Irving Medical Center (CUIMC) and Children’s Hospital of Philadelphia (CHOP). We will evaluate the
improvements in diagnostic yield across a diverse set of clinical exome/genome sequencing data over a 3-year
period. We will also assess how our approach to fair phenotyping and continuous variant reinterpretation can reduce
genomic health disparities for underserved and underrepresented populations. Ultimately, these methods will
enable informatics-driven, efficient, scalable and fair genomic diagnostics for genomic medicine through
continuous genomic variant reinterpretation.
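
To make the random-walk idea concrete, the sketch below runs a random walk with restart over a toy two-layer graph (a phenotype-gene layer and a gene-gene layer) and ranks genes by their steady-state visiting probability when the walk restarts from a patient's phenotypes. It is an illustration only and not the proposed algorithm: the summary does not specify the multi-layer formulation, and the node names, edge weights, restart probability and function name are all hypothetical assumptions.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for an undirected weighted graph.

    edges: iterable of (node_a, node_b, positive_weight) triples.
    seeds: nodes the walker jumps back to with probability `restart`.
    """
    # Build a symmetric weighted adjacency map.
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    nodes = sorted(adj)

    seed_set = set(seeds)
    missing = seed_set - set(nodes)
    if not seed_set or missing:
        raise ValueError(f"seeds must be non-empty and present in the graph: {missing}")

    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, jump back to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise move to a neighbor in proportion to edge weight.
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / total
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy graph: two phenotype nodes and three gene nodes.
toy_edges = [
    ("pheno:seizure", "gene:G1", 1.0),    # phenotype-gene layer
    ("pheno:hypotonia", "gene:G1", 1.0),
    ("pheno:hypotonia", "gene:G2", 1.0),
    ("gene:G2", "gene:G3", 0.5),          # gene-gene layer
]
scores = random_walk_with_restart(toy_edges, seeds=["pheno:seizure", "pheno:hypotonia"])
ranked_genes = sorted(
    (n for n in scores if n.startswith("gene:")), key=scores.get, reverse=True
)
```

In this toy setting, a gene linked to both seed phenotypes ranks above a gene linked to one, which in turn ranks above a gene reachable only through another gene. In principle, ancestry- or age-specific evidence could enter such a scheme through the edge weights, though how PhenoKG would encode that is not stated here.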