ABSTRACT
Thousands of genetic association studies have identified regions of the genome that contribute to
common diseases. The vast majority of association signals reside in the noncoding portion of the genome,
suggesting that genetic variation within regulatory elements significantly contributes to common disease
etiology by altering gene expression patterns. My long-term goals are (i) to enable rapid and routine
identification of the causal regulatory mechanisms underlying genetic associations, and (ii) to use that
information to prioritize candidate therapeutic targets later in my career. Doing so remains a significant
challenge, however, because of the limited resolution of genetic association signals and the typically low
throughput of experimental validation studies. The objective of this proposal, a step towards those long-term
goals, is to complete the first genome- and population-wide experimental assessment of the effects of
noncoding genetic variants on gene regulatory element activity. I will also evaluate several foundational
hypotheses regarding the frequency, location, genomic context, and combinatorial interactions of noncoding
alleles that are most likely to functionally contribute to disease. In Aim 1, I will quantify the regulatory effects of
tens of millions of genetic variants across diverse human populations. To do so, I will complete genome-wide
high-throughput reporter assays across the genomes of ~300 individuals that I will prioritize based on their
genetic diversity. In Aim 2, I will predict the effects of identified regulatory variants on gene expression. To do
so, I will develop and apply novel statistical methods to integrate the data from high-throughput reporter assays
with data from genetic association studies and genomic analyses of chromatin state and gene regulation. In
Aim 3, I will estimate the genome-wide impact of additive combinations of regulatory variants (commonly
observed in regions of genetic association) on the expression of a gene, and test whether those effects can
explain downstream phenotypes. To do so, I will use the empirically measured effects of individual regulatory
variants to estimate their combined effect in haplotypes with multiple variants and then validate their functional
impact on gene expression in vitro using CRISPR genome editing. The expected outcome will be the first
comprehensive dataset detailing the location and functional effects of regulatory variants from diverse human
populations. I will distribute those data via an online public database so that researchers can query which
regulatory variants may explain results from their own association studies. These results will thus make
causal genetic variants easier to identify and allow researchers to shift their focus toward developing
prevention and treatment options for patients. In future work, I hope to lead research teams in
using that information to prioritize new genes for targeted mechanistic evaluations and, ideally, for their
potential as therapeutic targets.
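
As an illustration of the additive estimate described in Aim 3, the following is a minimal sketch, not the proposal's actual method. It assumes that each variant's effect is measured as a log2 fold change in reporter activity (alternate versus reference allele) and that effects on a haplotype simply sum on that scale. The function name, variant identifiers, and effect values are all hypothetical.

```python
from typing import Dict, Iterable


def additive_haplotype_effect(
    variant_effects: Dict[str, float], haplotype: Iterable[str]
) -> float:
    """Sum the per-variant effects of the alternate alleles carried on a haplotype.

    Assumes effects are on a log2 fold-change scale, so that summing them
    corresponds to multiplying fold changes (an illustrative assumption).
    """
    return sum(variant_effects[variant] for variant in haplotype)


# Hypothetical per-variant effects (log2 fold change, alternate vs. reference).
variant_effects = {"var_A": 0.5, "var_B": -0.25, "var_C": 1.0}

# Hypothetical haplotype carrying the alternate alleles of two variants.
haplotype = ["var_A", "var_B"]

combined_log2 = additive_haplotype_effect(variant_effects, haplotype)
combined_fold_change = 2 ** combined_log2

print(f"Predicted combined effect (log2): {combined_log2}")
print(f"Predicted combined fold change: {combined_fold_change:.3f}")
```

Under these assumptions, a prediction of this kind could be compared with the expression change measured after introducing the same haplotype by genome editing; a large discrepancy would suggest non-additive interactions among the variants.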