Biomedical Data Translator Technical Feasibility Assessment and Architecture Design

Our leadership spans the translational spectrum - clinical (Chute, Robinson, Koeller, Hamosh),
biological (Haendel, Hoatlin, Doheny), and computational (Mungall, Su, Liu, McWeeney, Overby),
with expertise in a wide variety of data sources, types, and models as well as data integration
strategies, standards, and algorithms. We are invested in open science and reproducibility, and we
lead efforts to develop open software, data standards, and crowdsourced curation platforms.
Our vision is to demonstrate connectivity between rare and common diseases via
genes, pathways, and pathophysiology. We will include disease-phenotype associations,
enriched with temporal information and decomposed into biological units. Innovative integration
of mechanism and function will allow creation of candidate mechanistic graphs for each rare
disease. We will use graph matching and probabilistic techniques to support basic research
hypothesis testing as well as clinical inquiry (diagnosis, prognosis, and treatment selection).
Finally, our team is deeply committed to enabling the collective use of all public biomedical data
by making it interoperable and openly accessible for all users, in all contexts.
Semantics matter to integration. The figure highlights the landscape of existing data resources,
each containing a portion of the data with specific, relevant meaning (A). Aggregation alone
often results in loss of meaning (B). Semantic and probabilistic integration approaches provide
more advanced query-answering capabilities (C). We have firsthand experience overcoming
the challenges found in large-scale integration projects in general but, more importantly, we are
very familiar with the data sources and types that this proposal aims to integrate. For example, knockdown
of TP53 in zebrafish is used to reduce apoptosis; naive use of such data might misattribute
the observed phenotypic effects among the targeted genes. Other issues arise in knowing when and how to integrate
data where the associations between entities are not equivalent, such as when one source annotates
a disease to a gene and another to a variant.
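
The gene-versus-variant mismatch described above can be illustrated with a minimal sketch. It assumes a simple record type and a variant-to-gene lookup table; the `Association` class, the `lift_to_gene` helper, and all identifiers (`GENE:A`, `VARIANT:1`, `DISEASE:X`) are hypothetical and do not describe our existing infrastructure. The point is that a variant-level annotation can be brought to gene level for comparison while still recording that it was inferred, so the two assertions are not treated as equivalent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Association:
    """A disease annotation attached to either a gene or a variant (hypothetical model)."""
    subject: str                          # identifier of the annotated entity
    subject_type: str                     # "gene" or "variant"
    disease: str
    source: str
    inferred_from: Optional[str] = None   # original subject if the record was lifted


def lift_to_gene(assoc: Association,
                 variant_to_gene: Dict[str, str]) -> Optional[Association]:
    """Return a gene-level view of an association, keeping provenance.

    Gene-level records pass through unchanged. Variant-level records are
    mapped to their gene and flagged via `inferred_from`. Variants with no
    known gene yield None rather than a guessed mapping.
    """
    if assoc.subject_type == "gene":
        return assoc
    gene = variant_to_gene.get(assoc.subject)
    if gene is None:
        return None
    return Association(gene, "gene", assoc.disease, assoc.source,
                       inferred_from=assoc.subject)


# Hypothetical example data: two sources annotate the same disease
# at different levels of granularity.
variant_to_gene = {"VARIANT:1": "GENE:A"}
source_a = Association("GENE:A", "gene", "DISEASE:X", "source_a")
source_b = Association("VARIANT:1", "variant", "DISEASE:X", "source_b")

lifted = lift_to_gene(source_b, variant_to_gene)
```

After lifting, both records share the subject `GENE:A` and can be compared, but `lifted.inferred_from` still points at the original variant, so downstream queries can weight direct and inferred associations differently.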
Our existing infrastructure has successfully integrated and leveraged multimodal data for
rare disease diagnosis. Here we extend these systems with new data types and new methodologies
that generalize across diseases and contexts. The TransMed Knowledge Graph will have
an intelligent, adaptive scaffolding for managing and linking the phenomenological worldview of
clinical elements with the mechanistic emphases of basic science. Connections between biological
entities and events will be represented either directly, or through chaining, enabling the use
of powerful algorithms for query and inference. Elements in the graph will be stratified either by
classical, rigid taxonomies (disease nosologies, tissue and cell types) or by dynamic groups
based on shared mechanisms of molecular pathophysiology. External data can be compared
using different criteria. For instance, two patients (one rare disease, one common disease) may
be distant in classical nosology, but neighbors in pathway space - suggesting a treatment. The
graph will be seeded from open data sources containing diverse data types, supplemented by
knowledge from the literature and clinical data. TransMed will also be readily connected to other
data stores using a high-quality identifier strategy, methods that predict the probability of equivalency
from associated metadata, and algorithms that match graphs based on similar members.
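
The contrast between nosological distance and pathway-space proximity can be made concrete with a small sketch. It assumes each patient is summarized by a root-to-leaf disease lineage and a set of implicated pathways, and it uses Jaccard overlap as a stand-in similarity measure; the proposal does not commit to this measure, and all names and identifiers below are hypothetical.

```python
from typing import Iterable, Sequence


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity of two sets of members (0.0 when both are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def lineage_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of the longer root-to-leaf lineage shared as a common prefix."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    longest = max(len(a), len(b))
    return shared / longest if longest else 0.0


# Hypothetical patients: one with a rare disease, one with a common disease.
rare_lineage = ["disease", "genetic disease", "rare syndrome"]
common_lineage = ["disease", "metabolic disease", "common disorder"]

rare_pathways = {"PATHWAY:1", "PATHWAY:2", "PATHWAY:3"}
common_pathways = {"PATHWAY:2", "PATHWAY:3", "PATHWAY:4"}

nosology_score = lineage_similarity(rare_lineage, common_lineage)
pathway_score = jaccard(rare_pathways, common_pathways)
```

In this toy example the two patients share only the root of the nosology but half of their pathways, so they rank as neighbors under the pathway criterion and as distant under the classical one. The same set-overlap idea is one simple way to match graphs based on similar members.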
Summary. Our familiarity with the data, combined with our technical experience and connection to
real-world use cases, positions us well to be both relevant and successful in realizing our vision.