Almost all proteins function by interacting with other proteins. Previous studies have shown that the vast
majority of damaging single amino acid mutations in proteins disrupt only a subset of specific protein-protein
interactions, and that mutations in the same protein that disrupt different interactions tend to cause clinically
distinct disorders. It is therefore important to determine the interaction-specific disruptions caused by
each mutation. Furthermore, rapid advances in sequencing technologies have enabled the identification of tens
of millions of single nucleotide variants (SNVs) in the human population, driving an urgent need to understand
the impact of each SNV on the human interactome network. Unfortunately, there is currently no method that is
capable of predicting the specific impact of a large fraction of these SNVs on individual protein-protein
interactions. To address this issue, we propose to leverage our massively parallel site-directed mutagenesis
pipeline, Clone-seq, to generate clones for ~6,000 coding SNVs in the human population: ~4,000 from
gnomAD and ~2,000 to be submitted by the international human genetics community. We will then
experimentally examine the impact on protein stability and individual protein-protein interactions for every
variant using high-throughput DUAL-FLUO and InPOINT (integrating PCA, LUMIER, Y2H, and wNAPPA)
assays. This proposal brings together three groups with complementary expertise: the Yu lab in high-throughput
interactome experiments and network analysis, the Clark lab in genomic and population genetic studies, and
the Alexov lab in comprehensive biophysical and structural modeling of the impact of mutations on the binding
free energy of protein interactions. Out of the ~6,000 SNVs, we expect to identify ~1,200
disruptive SNVs and ~4,000 different SNV-interaction pairs where the SNV disrupts that specific interaction. The
data produced by our project will increase the available experimental information by >140× in number of
human proteins and >500× in number of interactions, allowing us for the first time to comprehensively assess
the relationships between the impact of SNVs on interactions and their various population genetic attributes
(including, but not limited to, allele frequency and flanking haplotype, inter-population differentiation, local rate
of recombination, allele age, and modes of selection). Finally, we will establish an integrated
computational-experimental iterative learning scheme to build a multi-layer random-forest-based framework, SIMPACT, which
can accurately predict specific impacts on all individual protein-protein interactions for all missense SNVs.
Our proposed work will fuel hypothesis-driven research, significantly improve our functional understanding
of variants, and likely fundamentally change the experimental design and data interpretation of future
whole-genome and whole-exome studies.