PROJECT SUMMARY
The overarching goal of this proposal is to develop a paradigm for comprehensive whole genome sequencing
(WGS) studies of complex genetic diseases and traits by leveraging new approaches for variant analysis and
trait association, and to use these methods to improve disease and trait mapping studies, with coronary
artery disease as a prototypical example. We will first build a new WGS variant analysis pipeline based on
methods that interrogate a collection of fully assembled reference genomes from diverse populations (a
“pangenome”), rather than a single reference genome (e.g., GRCh38). By employing variation-aware
approaches to read alignment and genotyping, these next-generation methods promise to greatly improve
variant detection performance and to alleviate the ancestry bias that plagues the current generation of
methods. Our pipeline will be designed to detect all forms of variation, including single nucleotide variants,
small indels, structural variation, and tandem repeat length variation, and will be constructed based on
systematic benchmarking of candidate methods from our labs and the broader community. We will then use
these methods in combination with ancestry-aware trait association approaches to study the complex genetic
basis of coronary artery disease and cardiometabolic risk factors in a set of ~55,000 deeply sequenced human
genomes from diverse ancestry groups, generated by the Centers for Common Disease Genomics (CCDG).
We will assess the contribution of all variant types to coronary artery disease and complex coronary disease
risk factors and quantify the improvements in gene mapping studies of common disease that are possible
using new pangenomic analysis methods. We will leverage continental admixture, local ancestry patterns, and
a novel tree-based haplotype association method to fine-map novel and known coronary artery disease and
risk factor loci, and to identify loci where local ancestry modulates risk of coronary disease and risk factors.
Finally, we will extend this work to a much larger set of individuals and traits by applying our methods to WGS
data from public biobanks, allowing us to more broadly assess the role of complex genome variation in human
diseases and complex traits. Taken together, this work will yield valuable new methods and data resources for
the community, will help pave the way for improved human disease gene mapping studies based on
pangenomic approaches, and has the potential to yield new insight into the multi-ethnic genetic basis
of coronary artery disease.