Statistical methods for modeling multi-omic data - Project Summary
Novel analytic paradigms that allow fully integrated interrogation of the effects of regulatory elements, protein-coding
genes, demographic characteristics and environmental factors on evoked and dynamic traits are essential for
providing new insight into the mechanistic underpinnings of genetic associations. In this proposal, we aim
to develop, evaluate and apply sound statistical methods for leveraging and integrating the vast amount of
publicly available multi-omic data resources to improve understanding of the mechanistic relationships among
genes and regulatory elements associated with complex traits. Activation of innate immunity is a fundamental
pathophysiological process in cardiometabolic disease, e.g., atherosclerosis and type 2 diabetes, as well as in
complex inflammatory disorders, e.g., response to sepsis and trauma. Understanding the genetic underpinnings
of evoked inflammatory biomarkers is therefore clinically relevant to the development of
novel prognostic markers and therapeutic targets in complex diseases. Advancing knowledge of the molecular
and physiological underpinnings of complex diseases will deepen insight into disease etiology, while providing
opportunity to develop targeted interventions and lessen disease morbidity and mortality.
The Specific Aims are to: (1) Develop a novel statistical framework for inferential transcriptome association
analysis using reference data for rigorous interrogation of regulatory and gene-level underpinnings of dynamic
response(s) to stimulus. We will develop novel and precise estimation and hypothesis testing strategies to inves-
tigate and characterize the mechanistic foundations of genomic class-level associations with biological response
to inflammatory stress. (2) Extend the methodology of Aim 1 to incorporate repeatedly measured transcriptome
data, multiple expression patterns, and high linkage disequilibrium within genomic classes. We will advance
the solid conceptual framework of Aim 1 to develop strategies for evaluating time-varying transcriptome
profiles as well as data on multiple cell and tissue types and several genes and regulatory elements. (3) Apply and
evaluate the methods of Aims 1 and 2 by leveraging multiple sources of layered -omics data, including
cell- and tissue-specific expression. In addition to fully vetting the proposed methods and comparing them to existing
alternative strategies using extensive simulation studies, we will further elucidate the mechanisms
of gene and regulatory element control of induced response using multiple publicly available reference
transcriptome data resources and repeatedly measured biomarker data arising from the GENE study.
In addition to supporting rigorous and novel statistical research at the forefront of precision medicine, this
NIH Academic Research Enhancement Award (AREA) Program (R15) application aims to meet the specific
NIH AREA objectives by offering a unique opportunity to introduce underrepresented undergraduate
students in STEM to biomedical big data science and engage them in it, while strengthening the research environment at Mount
Holyoke College (MHC), the world's longest-standing institution of higher education for women. This application
launches from an extensive, decade-long and highly productive transdisciplinary collaboration. Building
on a strong research and mentoring record, the proposed work advances novel statistical research addressing
pressing challenges in precision medicine, while providing an important and unique opportunity to engage young
women in cutting-edge biomedical big data research.