PROJECT SUMMARY/ABSTRACT
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with
metabolic, cardiovascular, autoimmune, and other diseases. These variants have the potential to reveal
molecular mechanisms that underpin human diseases, but their interpretation is extremely challenging because
most lie within non-coding genomic regions of unknown function. The long-term goal of the proposed research
is to elucidate the molecular basis of complex diseases by assembling comprehensive catalogs of regulatory
sequences and illuminating how non-coding genetic variants affect gene regulation. This proposal will leverage
the power of high-throughput genomic perturbations and computational analyses to discover regulatory
sequences, interpret non-coding genetic variants, and connect disease-associated variants to the genes they
regulate. Research Focus 1 will systematically discover novel regulatory sequences using CRISPR-directed
tiling deletion screens, which can detect regulatory sequences that are invisible to other approaches. These
screens will be performed in primary T cells and applied to megabase-scale regions surrounding T cell
differentiation genes, which are rich in uncharacterized GWAS hits. To determine how frequently GWAS hits
affect novel regulatory sequences lacking canonical enhancer marks, fine-mapped GWAS variants will be
intersected with regulatory sequences discovered by the screens. The function of novel regulatory sequences
will be determined with deletions followed by experiments to measure 3D chromatin contacts, gene expression,
and cellular proliferation. Research Focus 2 will use single-cell genome perturbations to connect thousands
of variants associated with human diseases to the genes they regulate across multiple cell types. Sequences
containing potentially causal GWAS variants will be targeted with CRISPR interference, and gene expression will
be measured with single-cell RNA-seq in a mixture of disease-relevant immune cells. Using the single-cell data,
perturbed sequences will be connected to changes in gene expression in specific cell types. Variants predicted
to regulate gene expression will be validated by modifying alleles with genome editing. The expected outcomes
of this project are (i) systematic catalogs of regulatory sequences for genes involved in T cell differentiation, (ii)
molecular characterization of novel unmarked regulatory sequences that contain GWAS hits, and (iii) connections
between sequences containing GWAS hits and genes that they regulate in specific cell types. This proposal will
establish genomic perturbations as a new strategy to interpret non-coding variants, uncover important new
regulatory biology, and accelerate mechanistic understanding of disease-associated variants.