Multi-scale functional dissection and modeling of regulatory variation associated with human traits - Our ability to identify genetic sequence variation in humans has thus far outstripped the field’s ability to interpret these mutations. Genome-wide association studies have identified hundreds of thousands of genomic loci associated with disease risk and human phenotypic traits, yet in few instances do we know the identity of the exact causal mutation or the molecular mechanism behind its function. Much of this limitation arises because a large portion of this variation resides in cis-regulatory regions (CREs), where our inability to identify a variant’s regulatory impact or target gene(s) presents a major hurdle. A better understanding of regulatory grammar (the complex logic by which sequence content in CREs controls transcription) is a crucial next step for genomics, but it requires a vast expansion of well-characterized regulatory mutations. To achieve this goal, we will employ a multi-pronged approach to build a large-scale functional catalog of regulatory variants. We will focus on CREs harboring genetically fine-mapped, likely causal variants from global populations for a variety of metabolic traits and diseases (Aim 1). We will first identify CRE-gene interactions using highly sensitive and scalable endogenous CRISPR approaches. This large-scale mapping effort will inform our understanding of the CRE-gene targeting logic of regulatory grammar. We will use these data to map the transcriptional architecture of complex metabolic traits. We then propose to interrogate the sequence determinants of regulatory grammar for hundreds of trait-associated CREs at their endogenous locations in the genome (Aim 2). We will first develop an endogenous saturation mutagenesis system to generate hundreds of thousands of nucleotide changes in these CREs.
We will then assay the regulatory effects of these changes using multiplexed amplicon ChIP-sequencing to identify epigenetic changes and HCR-FlowFISH to detect transcriptional changes. In addition to identifying causal variants for a variety of metabolic diseases, this proposal will generate a repertoire of 300,000+ functionally characterized regulatory variants. This variant impact catalog will serve as an ideal training set for modeling regulatory grammar with our machine learning approaches. We will incorporate the endogenous saturation mutagenesis data into our variant effect prediction models (VEPs). Importantly, such models will find utility across global populations, as they will capture a universal regulatory code of the human genome and thus enable interpretation of population-specific variation. We will then apply these VEPs to understudied variation and to understudied populations. Overall, this proposal is structured to generate a functional characterization catalog at multiple levels: first, providing molecular mechanisms and gene targets for thousands of causal variants; second, building a comprehensive understanding of the genomic etiology of phenotypically related complex traits; and third, providing the scale of endogenous data necessary to improve VEPs. Our approach combines our group’s unique expertise spanning functional genomics, CRISPR screens, statistical genetics, and machine learning.