A paradigm for comprehensive genetic association studies of complex disease using pangenomic methods and local ancestry inference

PROJECT SUMMARY

The overarching goal of this proposal is to develop a paradigm for comprehensive whole genome sequencing (WGS) studies of complex genetic diseases and traits by leveraging new approaches for variant analysis and trait association, and to apply these methods to improve disease and trait mapping studies, using coronary artery disease as a prototypical example. We will first build a new WGS variant analysis pipeline based on methods that interrogate a collection of fully assembled reference genomes from diverse populations (a “pangenome”), rather than a single reference genome (e.g., GRCh38). By employing variation-aware approaches to read alignment and genotyping, these next-generation methods promise to greatly improve variant detection performance and to alleviate the ancestry bias that plagues the current generation of methods. Our pipeline will be designed to detect all forms of variation, including single nucleotide variants, small indels, structural variants, and tandem repeat length variation, and will be constructed based on systematic benchmarking of candidate methods from our labs and the broader community. We will then use these methods in combination with ancestry-aware trait association approaches to study the complex genetic basis of coronary artery disease and cardiometabolic risk factors in a set of ~55,000 deeply sequenced human genomes from diverse ancestry groups, generated by the Centers for Common Disease Genomics (CCDG). We will assess the contribution of all variant types to coronary artery disease and its cardiometabolic risk factors, and quantify the improvements in common disease gene mapping made possible by new pangenomic analysis methods.
We will leverage continental admixture, local ancestry patterns, and a novel tree-based haplotype association method to fine-map novel and known loci for coronary artery disease and its risk factors, and to identify loci where local ancestry modulates the risk of coronary disease and its risk factors. Finally, we will extend this work to a much larger set of individuals and traits by applying our methods to WGS data from public biobanks, allowing us to assess more broadly the role of complex genome variation in human diseases and complex traits. Taken together, this work will yield valuable new methods and data resources for the community, will help pave the way for pangenomic approaches in the next generation of human disease gene mapping studies, and has the potential to yield new insight into the multi-ethnic genetic basis of coronary artery disease.