Abstract
Population sequencing and clinical genetic testing are producing vast and growing catalogs of human genetic
variation. Key emerging bottlenecks are understanding (a) which variants alter gene function, (b) the
mechanism(s) by which they act, and (c) the extent to which these functional changes influence traits and disease
risk. We address these questions by developing and applying massively parallel approaches to test many
thousands of gene variants under experimental functional selections. We have shown, in a diverse range of
genes, that the resulting maps are highly concordant with existing knowledge, and that they can guide the
interpretation of variants discovered during clinical genetic testing to increase its accuracy and actionability.
Here, we seek to capitalize on these methods’ scalability to apply them across an entire pathway of functionally
related genes involved in DNA mismatch repair (MMR), which is implicated in inherited cancer risk syndromes
affecting ~1:300 individuals worldwide. We will carefully interrogate variants that display intermediate levels of
function, which are commonly overlooked in such large-scale screens. We hypothesize that rather than being
entirely ‘noise’, many of these are hypomorphic alleles, and that they may affect disease risk more subtly (e.g.,
with reduced penetrance) or only conditionally (e.g., when combined in cis with a regulatory variant). In addition,
we are intrigued by anecdotal examples of epistatic interactions within and between MMR genes – that is, pairs
of variants that together yield effects that differ from the sum of their individual effects. We propose that the MMR
pathway genes will serve as a useful case study to assess how prevalent these types of variants and interactions
are, and to what extent they contribute to pathogenic burden. Finally, we propose to establish functional readouts
that enable, for the first time, pooled screens and/or selections for entire classes of clinically actionable human genes,
broadening our focus to DNA damage response genes beyond MMR, which are implicated in diverse disorders
including cancers as well as immunodeficiencies and neurodevelopmental disorders. To establish these maps’
real-world utility, we will validate and calibrate them by comparison to biobanks with linked genotype and health
record data. Importantly, by testing every possible variant, these approaches avoid population biases common
among current large genomic datasets. In sum, these studies will make hundreds of genes newly amenable to
large-scale structure-function mapping, and will improve the accurate and equitable interpretation of clinical
variants.