Project Summary
Deeper understanding of the degree of transferability of genetic association results and implicated biological
mechanisms across populations is essential for equitable precision medicine implementation and can only be
accomplished by studying the genetic architecture of complex traits in diverse populations. In our initial project
period, we have shown that genetic correlation of gene expression depends on shared ancestry proportions in
African American, Hispanic, and European populations. We identified a subset of genes that are well predicted
in one population but poorly predicted in another, and showed that these differences are due to differences in allele
frequency between populations. Our results demonstrate that, when predicted expression levels are compared with
observed levels, prediction is optimized by balancing ancestry similarity between the training and test populations
against total training sample size. Our studies of lipid traits in Yoruba, Filipino, and Hispanic
populations uncovered key genes likely regulated by variants that are monomorphic or rare in European
populations, demonstrating why studies in diverse populations are crucial. We have optimized genetic
prediction models of gene expression levels in diverse populations and thus have broadened the scope of
PrediXcan. In this proposal, we seek to (1) optimize global and local ancestry-aware omics trait prediction
models within and across diverse populations and (2) predict the intermediate omics traits and perform poly-omic
PrediXcan analyses of complex traits in GWAS cohorts from diverse populations. We have gathered data
on multiple omics traits from diverse populations for this project (genome-wide genotypes, RNA-Seq,
methylomics, metabolomics, and microbiome). We will use machine learning to optimize genotypic prediction
models of gene expression levels, splicing ratios, methylation, metabolite levels, and microbial diversity. We
expect predictive power to vary across omics traits, depending on the heritability of each trait and on
differences in allele frequencies and effect sizes among populations. We will integrate regulatory data
and previous results from larger European populations, when appropriate, to prioritize functional variants in our
prediction models. For each omics trait, we will survey its genetic architecture to identify the most suitable prediction
model. Our models will account for global and local ancestry, and we will quantify the ancestry-specific
components of each omics trait. We will test the predicted omics traits for association with phenotypes of
interest using either individual-level genotypes or GWAS summary statistics. We will use colocalization methods to
determine whether the SNPs driving each omics trait prediction model are also those most associated with the
phenotype and are thus most likely to be causal. We will combine predicted omics traits in poly-omic models to
determine which genes and biological pathways are implicated in a particular phenotype. Our team is well positioned
to perform novel PrediXcan-based analyses of omics traits in diverse populations and will maximize
impact by making our scripts, models, and results publicly available.
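As a rough illustration of the PrediXcan framework described above, the following Python sketch is a minimal example under stated assumptions and not the project's pipeline. It trains a prediction model for one omics trait from genotype dosages, predicts the trait from new genotypes, and computes an S-PrediXcan-style z-score from GWAS summary statistics. The function names (fit_elastic_net, predict, pearson_r, summary_z), the penalty values (lam, alpha), and the toy data are hypothetical. Elastic net is assumed because it is the model commonly used for PrediXcan weights; the proposal's own machine-learning choices may differ. The summary-statistic function follows the published S-PrediXcan approximation, with the SNP covariance matrix assumed to come from an ancestry-matched reference panel.

```python
import math
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Lasso soft-thresholding operator S(z, t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X: Sequence[Sequence[float]], y: Sequence[float],
                    lam: float = 0.1, alpha: float = 0.5,
                    n_iter: int = 200) -> Tuple[List[float], float]:
    """Coordinate-descent elastic net (glmnet-style objective, hypothetical helper).

    X: n x p genotype dosages (0..2) from one training population.
    y: observed omics trait (e.g. expression of one gene) for the same n people.
    Returns (per-SNP weights on the dosage scale, intercept).
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residual of the centered trait
    means, sds, Z = [], [], []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((c - mu) ** 2 for c in col) / n)
        means.append(mu)
        sds.append(sd)
        # A SNP monomorphic in the training sample (sd == 0) carries no
        # information, so its standardized column is set to all zeros.
        Z.append([(c - mu) / sd if sd > 0 else 0.0 for c in col])
    w = [0.0] * p  # weights on the standardized scale
    for _ in range(n_iter):
        for j in range(p):
            if sds[j] == 0:
                continue
            zj = Z[j]
            # Correlation of SNP j with the partial residual (SNP j left out).
            rho = sum(zj[i] * (r[i] + zj[i] * w[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= zj[i] * delta
                w[j] = new
    # Convert standardized weights back to the dosage scale.
    weights = [w[j] / sds[j] if sds[j] > 0 else 0.0 for j in range(p)]
    intercept = y_mean - sum(weights[j] * means[j] for j in range(p))
    return weights, intercept


def predict(weights: Sequence[float], intercept: float,
            dosages: Sequence[float]) -> float:
    """Predicted (genetically regulated) trait: intercept + sum_j w_j * x_j."""
    return intercept + sum(wj * xj for wj, xj in zip(weights, dosages))


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Correlation, used here as a simple individual-level association statistic."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def summary_z(weights: Sequence[float], snp_z: Sequence[float],
              cov: Sequence[Sequence[float]]) -> float:
    """S-PrediXcan-style z-score computed from GWAS SNP z-scores.

    z_g ~= sum_k w_k * (sigma_k / sigma_g) * z_k, with sigma_g^2 = w^T Gamma w,
    where Gamma (cov) is the SNP dosage covariance from a reference panel.
    """
    p = len(weights)
    var_g = sum(weights[i] * weights[j] * cov[i][j]
                for i in range(p) for j in range(p))
    sigma_g = math.sqrt(var_g)
    return sum(weights[k] * math.sqrt(cov[k][k]) / sigma_g * snp_z[k]
               for k in range(p))
```

In this sketch a SNP that is monomorphic in the training sample always receives a weight of zero, which gives a simple view of why a model trained in one population can miss variants that matter in another. The summary-statistic z-score also depends on the covariance matrix through sigma_g, so an ancestry-mismatched reference panel would change the result.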