Genetic inference and prediction in diverse, multi-ancestry cohorts - Project Summary/Abstract
The overarching goal of the research in my lab is to make genetics research generalizable to diverse populations, thereby improving our fundamental understanding of the genetic basis of complex diseases. We are particularly interested in studying how and why disease risk varies across human populations and whether this is due to variation in genetic risk. Unfortunately, because of the Eurocentric representation in genome-wide association studies (GWAS), our catalog of disease-causing variants, our understanding of their effects, and our ability to predict genetic risk from them are limited, particularly for individuals of non-European ancestry. To rectify this imbalance, several large-scale efforts seeking to diversify study participants are underway across the globe, including here in the United States through the NIH All of Us program. While these are welcome and exciting developments, the increasing genetic complexity of these cohorts raises methodological challenges that need to be addressed to fully realize their potential. Our goal over the next five years is to understand these challenges and develop statistical tools to solve them. For example, it is common practice in GWAS and genetic risk prediction to group individuals from diverse cohorts by genetically defined ancestry (e.g., African, European) and analyze each group separately. This practice is primarily motivated by a need to reduce the confounding effects of population structure – a well-known problem that introduces biases in GWAS.
But splitting diverse cohorts into ancestry groups (i) is arbitrary, running the risk of reifying racial categories, (ii) reduces the effective sample size, leading to an overall loss in discovery power and an increase in the uncertainty of effect estimates, especially for genetic variants that might underlie health disparities, and (iii) can introduce errors when making genetic predictions, raising the risk of misdiagnosis, especially for under-represented populations. In fact, my previous research has shown that the problem of population structure persists even in ‘homogeneous’ populations, suggesting that we cannot avoid it and need to deal with it head-on. This raises several fundamental questions: Is it actually necessary to split diverse cohorts by ancestry? What is the evidence for that? How do we maximize the power of diverse cohorts while minimizing false-positive associations due to their complex genetic structure? My lab will combine rigorous theoretical modeling and empirical data analysis to answer these questions and develop a statistical framework to fully leverage human genetic diversity to empower genetic discovery and risk prediction.