SUMMARY
Structural birth defects (SBDs) encompass a spectrum of congenital abnormalities affecting a wide range of
human organ systems. Progress in sequencing technologies has enabled significant advances in the discovery
of coding mutations underlying SBDs through whole-exome sequencing. Nonetheless, most cases remain
“unsolved” to date, creating a major barrier to diagnostic interpretation and therapeutic development.
In particular, the identification and interpretation of mutations in noncoding sequence, which constitutes 98% of
the human genome, have presented a formidable challenge. The present proposal addresses the hypothesis that
noncoding sequence represents a major reservoir of causal mutations explaining many unsolved SBD cases.
Specifically, we will focus on distant-acting transcriptional enhancers, a predominant class of noncoding genome
elements with critical regulatory functions in embryonic development. There are isolated examples of SBD-
causing enhancer mutations, but three principal hurdles have prevented their identification at scale: a) the lack
of whole-genome sequencing (WGS) data from unsolved cases; b) inadequate annotations of noncoding genome
functions; and c) the lack of testing pipelines to assess the in vivo relevance of enhancer mutations and determine
their causality. In this proposal, we address these challenges by creating an integrated pipeline for the
identification, function-based prioritization, and in vivo validation of causality of enhancer mutations in SBD
cases. This proposal will take advantage of growing aggregated WGS data, advanced analysis pipelines for
mutation identification, a unique catalog of prioritized predictions of developmental in vivo enhancers, and
advanced mouse engineering capabilities for in vivo validation of enhancers and enhancer mutations. Our
specific aims include: 1) Prioritize de novo noncoding gene regulatory mutations identified in growing
WGS catalogs from SBD patients. Taking advantage of preexisting aggregated WGS genetic data and innovative
analysis strategies, we will identify noncoding mutations in SBD at unprecedented scale. Noncoding findings will
be interpreted and prioritized using DevCisReg, a comprehensive catalog of gene regulatory sequences we
developed from analysis of >800 human and mouse epigenomic data sets. 2) Functionally test prioritized SBD
noncoding mutations for impacts on gene expression in scaled transgenic mouse enhancer assays. We
will use a targeted CRISPR-enabled transgenic approach to characterize 200 candidate enhancer alleles in mice
and determine which mutations affect gene expression in vivo. 3) Functionally model prioritized SBD
noncoding mutations in knockin mice. We will create and phenotype 40 knockin mouse lines with human
alleles to test the in vivo impact of regulatory mutations in live animals. We will focus on mutations from SBDs
that can be modeled and studied by streamlined phenotyping in mice to increase the likelihood that we can detect a
defect in vivo. Together, these efforts will create an integrated mutation-to-phenotype identification and testing
pipeline that will provide conclusive in vivo evidence for establishing the causality of enhancer mutations in SBD.