ABSTRACT
Large-scale reference datasets of genomic variation have both provided population-level allele frequency
information that is a critical component of variant interpretation in diagnostic settings, and enabled the
identification of genomic sequences under severe selective constraint (i.e., intolerant of genetic changes). We
have previously demonstrated that gene-level constraint metrics aid in disease gene discovery efforts, improve
variant interpretation, and highlight critical biological pathways. However, substantial gaps remain in our ability to
interpret genomic variation and predict its impact on biological processes, and these gaps hamper our ability to use
genetics to guide clinical care. Particularly for missense variation, there are numerous orthogonal pieces of
information that could be used to identify the most constrained residues. Here, we propose to leverage human
genomic datasets of unprecedented scale (>700,000 exomes at the outset of our project and >3,000,000 by
the end of the funding period) and to expand our previous constraint work in three directions. We will
incorporate functional (e.g., sites of post-translational modification) and structural (e.g., three-dimensional
protein structures) information in our evaluation of selective constraint (AIM 1). We will then investigate how
these signals of selective constraint are shared across protein complexes, broader molecular networks created
from proteomics data, and specifically at the interfaces of protein-protein interactions (AIM 2). Finally, we will
search for evidence of significant differences in constraint across diverse ancestral groups, which is made
possible by the increasingly large and diverse genomic datasets being generated by our group and others (AIM
3). All of our results will be widely and openly shared with the research and clinical communities. These aims
are poised to improve our ability to identify genomic variation under severe purifying selection, and thereby to strengthen
methods for variant prioritization and their application to gene discovery efforts.