Models and Methods for Population Genomics
Understanding genome-wide genetic variation and its role in health-related complex traits in humans is one of
the most important goals of modern biomedical research. There continues to be a substantial need for new
statistical models and methods that can be applied in these studies, particularly as study designs become more
ambitious and sample sizes increase. The overarching goal of this grant is to develop statistical theory, methods,
and software for analyzing population genomics studies that involve genome-wide genotyping, a wide
range of measured traits, very large sample sizes, structured populations, and varying study designs.
One of the most challenging aspects of modern population genomics studies is the complex
evolutionary history underlying the present-day genetic variation that we observe. Individuals are members of
structured populations with varying levels of relatedness that do not follow the simple assumptions that underlie
classical population genetics theory. There is a need to model and estimate arbitrary forms of structure and
relatedness so that genetic variation in human populations can be accurately characterized, which in turn allows
for an accurate understanding of the genetic basis of complex traits. Our first focus is on flexible, broadly
applicable models that adapt to this arbitrary population structure and relatedness, resulting in principled
statistical methods that make accurate inferences. We then show how our methods improve the ability to identify
genetic associations, estimate genome-wide heritability of traits, and contribute to an understanding of how
predictive polygenic risk scores can be robustly constructed.
The specific aims involve (1) introducing a parametric framework for estimating kinship and FST, thereby bridging
identity-by-descent models with random allele frequency coancestry models of structure; (2) advancing models
and methods for quantifying genome-wide heritability, testing for associations, and building polygenic risk scores
by incorporating our new estimation framework of kinship and FST; (3) developing and distributing software; and
(4) analyzing important data sets to discover new biology and validate our methods and software.