Integrating genomic data and protein structures to improve measures of selective constraint

ABSTRACT

Large-scale reference datasets of genomic variation have both provided population-level allele frequency information that is a critical component of variant interpretation in diagnostic settings and enabled the identification of genomic sequences under severe selective constraint (i.e., intolerant of genetic changes). We have previously demonstrated that gene-level constraint metrics aid in disease gene discovery efforts, improve variant interpretation, and highlight critical biological pathways. However, substantial gaps remain in our ability to interpret genomic variation and predict its impact on biological processes, and these gaps hamper our ability to use genetics to guide clinical care. Particularly for missense variation, there are numerous orthogonal pieces of information that could be used to identify the most constrained residues. Here, we propose to leverage human genomic datasets of unprecedented scale (>700,000 exomes at the outset of our project and over 3,000,000 by the end of the funding period) and to expand our previous constraint work in three directions. We will incorporate functional (e.g., sites of post-translational modification) and structural (e.g., three-dimensional protein structures) information in our evaluation of selective constraint (AIM 1). We will then investigate how these signals of selective constraint are shared across protein complexes, across broader molecular networks created from proteomics data, and specifically at the interfaces of protein-protein interactions (AIM 2). Finally, we will search for evidence of significant differences in constraint across diverse ancestral groups, which is made possible by the increasingly large and diverse genomic datasets being generated by our group and others (AIM 3). All of our results will be widely and openly shared with the research and clinical communities.
These aims are poised to improve our ability to identify genomic variation under severe purifying selection, and thereby our methods for variant prioritization and applications to gene discovery efforts.