A major goal of the Biomedical Data Translator Program is to facilitate disease classification
based on molecular and cellular abnormalities. While many experimental approaches exist to
interrogate molecular or cellular processes, few can discern which among a host of potential
abnormalities are relevant to disease in the human system. Genetic variants associated with
disease are unique in providing molecular alterations causally related to human disease risk.
Genetic associations fall into two types, rare and complex. Rare disease associations can (usually) be
clearly linked to a gene and are well represented by catalogs such as ClinVar, OMIM, and
Monarch. Complex disease associations are harder to interpret because they (a) are statistical
rather than qualitative and (b) usually lie in noncoding genomic regions that cannot be
immediately translated to molecular or cellular abnormalities. Many complementary resources
that aid the biological translation of complex disease associations have recently emerged. They
fall broadly into “functional genomic” datasets (e.g., from epigenomic profiling or chromatin
capture) and predictive bioinformatic methods (e.g., methods that integrate various genetic and
functional genomic datasets to predict disease-susceptibility genes or pathways). These
resources require expertise to curate and interpret, and there is as yet no knowledge source
that integrates them to interpret complex disease associations. Furthermore, techniques for
harmonizing heterogeneous functional genomic datasets with one another are not yet
established; most predictive bioinformatic methods specify complex data-processing pipelines
that have not yet been scaled to run across many diseases; and there are few if any “gold
standards” for evaluating the molecular or cellular abnormalities identified by these resources.
The goal of our proposed project is to address these gaps within a complex
disease genetics Knowledge Provider for Translator. We are experts in complex disease
genetics and maintain the Knowledge Portal Network (KPN), a collection of open-source web
portals and Smart APIs that make integrated genetic and genomic datasets publicly accessible
for >180 complex diseases. We have built the KPN by developing a protocol for working with
disease experts to aggregate and curate high-confidence genetic datasets, building
computational pipelines to harmonize these data and apply predictive bioinformatic methods
to them, and extracting relationships mined from these data into a Neo4j graph database.
We propose to use the KPN as a foundation to implement a Translator Knowledge Provider of
high-confidence complex disease associations and predicted disease-relevant molecular and
cellular abnormalities. We will implement this Knowledge Provider by (a) expanding the data
sources, data types, and bioinformatic methods integrated within the KPN; (b) developing new
computational algorithms to improve the ability of genetic data to identify molecular and cellular
abnormalities underlying complex disease; (c) maintaining REST services provisioning
Translator with these resources; and (d) developing methodologies for evaluating the accuracy
and internal consistency of these data, further curating them, and defining use cases for them
within Translator. In so doing, we will enable Translator users to address questions such as:
• What genes are causally linked to complex disease [X], and with what confidence?
• What is the increase in risk for complex disease [X] when gene [Y] is perturbed?
• What pathways are enriched for associations with complex disease [X]?
• What tissues mediate the pathogenesis of complex disease [X]?
• What other diseases are genetically correlated with complex disease [X]?
We participated in the Translator feasibility study and contributed important insights to the
project vision including (a) a unifying architectural model of Translator (based on interviews with
each Translator team) that OTA-19-009 closely follows; (b) the concept of Translator as a tool to
augment (rather than replace) human reasoning; and (c) the idea of a “Turing test” to evaluate
Translator capabilities. Our combined expertise in human genetics and hypothesis-driven science
and in computer science and computational biology positions us well to collaborate with NIH staff
and other awardees to help guide Translator data integration in a scientifically rigorous manner.