Project Summary
Most chronic diseases are polygenic, with hundreds to thousands of causal variants, and we are starting to predict
disease susceptibility with risk scores derived from genome-wide association studies. However, 77% of training
data for these risk scores come from European-ancestry populations and thus do not include genetic variants
uniquely or more predominantly found in non-European populations, which limits both discovery and precision
medicine potential. Methods that better identify causal variants and implicated biological mechanisms across
populations are essential for equitable precision medicine implementation and can only be accomplished by
studying the genetic architectures of complex traits in diverse populations. Since this project began, we have
characterized the genetic architecture of the transcriptome and proteome within and across diverse populations.
We identified a subset of transcripts and proteins that are well predicted in one population but poorly predicted
in another, and showed that these differences are due, in part, to allele frequency and linkage disequilibrium
differences. When testing prediction accuracy, we have shown that both the similarity between training and test
population ancestries and the total training sample size must be considered to optimally predict gene expression or
protein abundance. In this proposal, we seek to drive mechanistic understanding of complex traits in diverse populations
by (1) improving omics-trait prediction models for maximum utility within and between diverse populations and
(2) investigating causal relationships between omics traits and complex traits and disease in diverse populations.
We will integrate multi-omics data from African, African American, East Asian, European, and Hispanic
populations in this project, including genome-wide genotype, transcriptome, proteome, and microbiome data.
Since allele frequencies and linkage disequilibrium structures differ between populations due to different
demographic histories, genetic prediction models trained in one population do not perform as well in another and
thus are currently of limited utility for risk prediction and mechanistic interpretation. We will use fine-mapping,
machine learning, and multivariate adaptive shrinkage to improve genotypic prediction of gene expression and
protein levels across populations. Predicting the transcriptome and proteome from genotype data allows
inference of whether high or low transcript or protein levels are associated with traits of interest, but false
positives often result from linkage disequilibrium. We will integrate Mendelian randomization and colocalization
sensitivity analyses into our PrediXcan method to test for causal effects of transcripts, proteins, gut
microbiota, or other exposures on disease outcomes across diverse populations. Together, our proposed aims
have the potential to identify likely causal genes and molecular pathways underlying complex diseases. Our aims
work toward developing effective risk assessment and identifying potential treatment targets in diverse populations.
Our team is well positioned to perform novel PrediXcan-based analyses of omics traits in diverse populations
and promises to maximize impact by making our scripts, models, and results publicly available.