SUMMARY
The recent availability of large-scale whole genome datasets has revealed the startling scale of rare genetic
variation present in human populations. There is increasing evidence that rare genetic variants can have
profound effects on multiple complex disease phenotypes; however, the systematic characterization of these
variants is limited by current cohort sizes and approaches to interpretation. One powerful emerging approach
to rare variant interpretation is the integration of functional data, which provides an in vivo assay of rare
variant-driven molecular dysregulation and is made possible by large-scale integration of genomic, functional,
and phenotypic resources. In this proposal, we outline computational and statistical approaches to systematically annotate and
isolate – on a genome-wide scale – rare variants linked with extreme effects on multiple molecular phenotypes
(rare molecular outlier variants) and, through integrating biobank-scale phenotypic data, their downstream
effects on diverse complex disease risk. We recently applied this approach in GTEx and TOPMed to show that
outlier gene expression provides a powerful framework for identifying rare variants with large phenotypic
effects in genes with known impact on complex diseases.
Specifically, in this proposal we will combine large-scale genomic and diverse multi-omics data to
develop and extend novel computational and statistical methods that provide the first systematic characterization
of personalized complex disease risk contributed by rare genetic variants. Our methods are readily applicable
to research in any complex disease area, including anthropometric, neurological, and cancer research. Our
efforts will increase our understanding of how rare variants interact with disease risk predictions
derived from polygenic risk scores, which are currently limited to relatively small-effect common variant GWAS
hits, and show how rare molecular outlier variants provide a framework for systematically characterizing both cis- and
trans-regulatory disease networks impacting core disease genes, as theorized in the omnigenic model.
Furthermore, we present preliminary results suggesting that rare molecular outlier variants substantially
increase power for uncovering large-effect rare variants relative to genome annotation methods (namely,
protein-truncating variants, which are limited to coding regions), and outline an approach for quantifying
these effects in disease prediction and drug targeting applications.
Overall, these activities will increase our understanding of complex disease genetics. Genetic-only
methods such as GWAS would require cohort sizes in the millions, which illustrates the importance
of functional genomic data to our approach. We have a strong track record of releasing software and pipelines
that implement our prior methods, and will make any new work rapidly available on public repositories. Our efforts
will address the urgent need for methods to interpret the rapidly growing number of rare variants discovered
from whole genome data.