PROJECT SUMMARY
Genetic studies have improved our understanding of disease etiology and treatment. However, there are at
least two shortcomings that prevent current studies from reaching their full potential in elucidating the genetic
architecture of complex traits across all human populations. First, current genetic studies largely ignore the
genetic relationships among the individuals in a study. Although many of these relationships are distant, the
individuals are nonetheless connected by a genealogical tree at every position of the genome through the coalescent
process. The collection of these (unobserved) trees is encoded by the ancestral recombination graph (ARG). Second, genetic studies
are generally biased towards relatively homogeneous continental populations, such as those of European or East
Asian ancestry, in part due to a lack of methods tailored to admixed populations. In this proposal, we
aim to develop new methods that address both shortcomings. Our framework builds on recent breakthroughs
that allow, for the first time, accurate and scalable estimation of ARGs. In Aim 1, we will leverage a new
ARG-based estimator of relatedness that retains more relatedness information from incomplete genetic data
(e.g., array genotype data) than the current standard estimator. We will
use this estimator to estimate the heritability and cross-population genetic correlation of complex human traits
and diseases, as well as to correct for confounding due to population structure in genome-wide
association studies. In Aim 2, we will develop an association-testing framework that uses the ARG to identify
trait-associated genomic regions and prioritize trait-associated haplotypes. This principled approach can
naturally account for allelic heterogeneity and has the potential to increase the power of association studies
through a reduced multiple-testing burden, which is particularly important for understudied populations, in which
participant recruitment is more challenging. Finally, in Aim 3, we will develop a population genetic
framework that uses ARGs to model the admixture history of a population. Using this model, we will develop
new ways to detect genes that have responded to recent selection and identify complex traits that have
evolved under different kinds of phenotypic selection. Importantly, our framework will address these
evolutionary questions in each ancestral component of the admixed population. In every Aim, we will
benchmark our methods with extensive simulations and evaluate them empirically using large-scale, real-world
human genetic data. Ultimately, we will apply our methods to genotyping and sequencing data from admixed
populations to discover new loci that are associated with human diseases and/or have experienced natural
selection in the past. In summary, we will mine the wealth of information in the ARG to address
fundamental population- and human-genetic questions, particularly in understudied and admixed populations.