Unravelling genetic basis of comorbidity using EHR-linked biobank data - Rapid progress in translational bioinformatics and clinical informatics for precision medicine has
yielded many computing and informatics methodologies aimed at improving prediction, diagnosis, and
treatment strategies in clinical practice. In particular, high-dimensional and large-scale
biomedical data sets, ranging from clinical data to ’omics data, provide an unprecedented
opportunity for translating the newly found knowledge from biomedical big data analytics to
support clinical decisions. The complexity and scale of these big data sets hold great
promise, yet present substantial challenges. One important concern for clinicians is
comorbidity, a well-documented phenomenon in medicine in which one or more additional medical conditions
co-exist and potentially interact with one another, thereby influencing the primary clinical
condition. Several studies show variability in the number of comorbid conditions that can
exist at one time, and patterns of disease presentation differ from one chronic condition to
another. Thus, there is a clear need to improve care for individuals with multiple
comorbidities, but doing so requires a much more detailed understanding of the trends of disease
associations than we currently possess. Previous studies have primarily focused on a
handful of specific comorbidities; investigating the underlying causes of broad disease
comorbidity across the human diseasome has been challenging. Fortunately, in the past
decade, comprehensive collections of disease diagnosis data have become available, primarily
in the form of data from electronic health records (EHRs). Retrospectively, we can use a patient’s
health history to identify comorbidities and apply a data-driven approach to studying disease
comorbidity patterns that considers all possible disease combinations. In particular, developing
computational methods and models for large-scale data that integrate newly defined comorbidity patterns with
genomics holds great potential for uncovering molecular mechanisms of disease. Our primary aim is to
elucidate the underlying genetic and non-genetic factors that influence disease comorbidity.
We will apply two orthogonal approaches to identify comorbidities: 1) deriving them from disease
co-occurrence using EHR data alone, and 2) deriving them from pleiotropic genetic associations using the
EHR-linked biobank dataset. Network-based approaches have the potential to uncover unexpected
relationships between diseases. One of the most significant advantages of our proposal is the
linking of a single-source EHR to genomic data; this provides the opportunity to
revisit individual-level genotype and phenotype data for the design of more targeted
studies and to ask more specific questions. Additionally, our results can be used to develop
a novel comorbidity risk score that combines both clinical data and genetic effects, which might
constitute a new tool for clinical prevention and monitoring. These goals are very much in keeping
with today’s climate of precision medicine, where treatment and prevention are ideally designed to
consider an individual patient’s variability in genetics, lifestyle, and environmental exposures.
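
As an illustration of the first approach (deriving comorbidities from disease co-occurrence in EHR data alone), the following minimal sketch computes two pairwise association measures that are commonly used for this purpose: relative risk and the phi correlation. It is not the project's actual pipeline. The function name `comorbidity_pairs`, the input layout (a mapping from patient ID to a set of diagnosis codes), and the example labels are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def comorbidity_pairs(patient_dx):
    """Score every co-occurring disease pair in a cohort.

    patient_dx: dict mapping a patient ID to an iterable of diagnosis codes
    (hypothetical layout; real EHR extracts would need preprocessing).
    Returns a dict keyed by the alphabetically sorted code pair.
    """
    n = len(patient_dx)
    prevalence = Counter()    # number of patients with each disease
    cooccurrence = Counter()  # number of patients with both diseases

    for codes in patient_dx.values():
        unique = sorted(set(codes))  # count each code once per patient
        prevalence.update(unique)
        cooccurrence.update(combinations(unique, 2))

    results = {}
    for (a, b), c_ab in cooccurrence.items():
        p_a, p_b = prevalence[a], prevalence[b]
        # Relative risk: observed co-occurrence over that expected
        # if the two diseases were independent.
        rr = (c_ab * n) / (p_a * p_b)
        # Phi: Pearson correlation between two binary disease indicators.
        denom = sqrt(p_a * p_b * (n - p_a) * (n - p_b))
        phi = (c_ab * n - p_a * p_b) / denom if denom else float("nan")
        results[(a, b)] = {"count": c_ab, "relative_risk": rr, "phi": phi}
    return results
```

A relative risk above 1 or a positive phi suggests that a pair co-occurs more often than expected by chance. In practice such scores would also need significance testing and adjustment for confounders such as age and sex, which this sketch omits.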