Characterizing the evolutionary architecture of complex disease within and across diverse populations - PROJECT SUMMARY
Over the past decade, genome-wide association studies (GWASs) of thousands of complex traits and
diseases have identified thousands of reproducibly associated genetic variants. GWASs have helped
characterize the complexity of common genetic architectures and shed light on the role of genetics in disease
risk. A large body of work has demonstrated that the heritability of complex traits is highly enriched in functional regions
of the genome, which indicates that risk is mediated through perturbed regulation of relevant susceptibility
genes. In parallel, multiple recent studies have found that disease risk is shaped by natural selection,
which keeps the frequencies of deleterious alleles low in the population. We refer to these functional mechanisms,
together with their interplay with natural selection, as the evolutionary
architecture of a trait. Current frameworks to infer the evolutionary architecture of common complex diseases are only
applicable to relatively homogeneous populations, such as individuals of European ancestry. Yet several recent studies
have demonstrated that integrating multi-ethnic GWAS data substantially improves statistical power to identify
causal factors underlying complex traits and diseases, owing to increased heterogeneity in allele frequencies across populations.
However, current approaches to inferring evolutionary architecture are unable to appropriately model heterogeneity across
populations in allele frequencies and linkage disequilibrium. Moreover, the resolution of these
methods is currently limited to complex diseases and traits, whose inferred architectures, while
informative, do not describe the regulatory network mechanisms that mediate risk. Methods capable of analyzing
many molecular phenotypes simultaneously have the potential to identify shared architectures and pinpoint core
genes relevant to disease risk. Lastly, several studies have shown that integrating functional information with
GWAS substantially improves polygenic risk prediction. Together, these issues and opportunities highlight the
need for new computational approaches that can scale to multiple populations and large-scale molecular
phenotype catalogues while accounting for underlying heterogeneity and shared signals. Here, we propose novel
approaches to integrate GWAS data from multiple geographically diverse populations and phenotypes to
characterize population-specific and shared evolutionary architectures. Importantly, our approaches run
directly on GWAS summary statistics, which enables immediate large-scale analysis without access to individual-level data.
We will apply these approaches to large-scale multi-ethnic GWAS data. Together, our work will systematically characterize
evolutionary architectures across complex diseases, molecular phenotypes, and populations in a robust, open,
and reproducible manner.