Summary
Advanced sequencing technologies provide ever-increasing quantities of data about human genetic variation
and viral evolution. However, predicting the outcomes of missense mutations in protein-coding regions remains
a challenge, creating a bottleneck in discriminating biomedically relevant variants from neutral ones (those with
little or no effect on phenotype). In particular, predictions are very poor when a missense mutation alters an amino
acid located far from a protein’s functional or binding sites. These shortcomings also impair protein
design. We propose to address these needs by developing quantitative computational models that predict
the effects of long-distance substitutions on binding interactions. To that end, we have developed an approach
in which (1) a protein’s collective motions are first revealed by molecular dynamics simulations and then (2) force
perturbation is used to disrupt the protein’s equilibrium, thereby approximating the effects of ligand binding. In
published studies and preliminary work, we have used this approach to illuminate how dynamical changes propagate
through a protein’s anisotropic network of interactions. These results, which suggest that changes in such dynamic
networks have crucial effects on protein function, lead to our central hypothesis: The effects of long-distance
substitutions on ligand binding are emergent properties of changes in the protein’s dynamically coupled,
anisotropic network. The goal of the current proposal is to extend this computational approach to develop
models that predict: (Aim 1) the magnitudes of binding affinity changes arising from long-distance, modulating
substitutions; (Aim 2) which pairs of non-contact substitutions have non-additive effects on binding affinities
(“epistasis”); and (Aim 3) which long-distance positions contribute to ligand specificity. We have a
well-established collaboration that allows us to iterate between computational predictions and experimental
testing, enabling the development of quantitative models whose accuracy can be quantified. Our preliminary studies
used the well-characterized E. coli lactose repressor protein (LacI), for which experimental results validate our
initial computational models and provide specific hypotheses for Aims 1-3. To show the generality of our approach,
we will also study additional model proteins: the LacI homolog PurR, the cAMP receptor protein, and the viral
SARS-CoV-2 main protease (Mpro). Results will be used to provide novel computational tools for
predicting functional outcomes of long-distance substitutions. The success of this project will catalyze research
at the interface of protein structural biology, molecular genetics, evolution and medicine by advancing the
mechanistic understanding of how substitutions distal from functional sites alter ligand binding.
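The two-step approach and the epistasis measure can be illustrated with a minimal sketch. It assumes a linear-response formulation in the style of perturbation response scanning: the positional covariance matrix C computed from an MD trajectory stands in for the inverse Hessian, so a force F applied at one residue yields displacements proportional to C·F (up to a factor 1/k_BT). It also assumes that non-additivity for substitutions A and B is measured as ΔΔG_AB − (ΔΔG_A + ΔΔG_B). The function names, the C-alpha-only input, and the averaging over random force directions are illustrative assumptions, not the implementation used in this project.

```python
import numpy as np


def positional_covariance(coords):
    """Covariance of positional fluctuations from an MD trajectory.

    coords: array of shape (n_frames, n_residues, 3), e.g. C-alpha positions,
    assumed to be already superposed onto a common reference frame.
    Returns the (3N, 3N) matrix <dR dR^T>.
    """
    n_frames, n_res, _ = coords.shape
    flat = coords.reshape(n_frames, 3 * n_res)
    fluct = flat - flat.mean(axis=0)  # deviations from the mean structure
    return fluct.T @ fluct / n_frames


def perturbation_response(cov, residue, force):
    """Per-residue displacement magnitude for a force applied at one residue.

    Linear response: dR = (1 / k_B T) * cov @ F. The constant prefactor is
    dropped because only relative responses are compared.
    """
    n_res = cov.shape[0] // 3
    f = np.zeros(3 * n_res)
    f[3 * residue:3 * residue + 3] = force  # force acts on one residue only
    dr = cov @ f
    return np.linalg.norm(dr.reshape(n_res, 3), axis=1)


def averaged_response(cov, residue, n_directions=100, seed=0):
    """Average response over random unit-force directions at `residue`."""
    rng = np.random.default_rng(seed)
    n_res = cov.shape[0] // 3
    total = np.zeros(n_res)
    for _ in range(n_directions):
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)  # unit-magnitude force
        total += perturbation_response(cov, residue, direction)
    return total / n_directions


def epistasis(ddg_a, ddg_b, ddg_ab):
    """Deviation of the double substitution from additivity (0 means additive)."""
    return ddg_ab - (ddg_a + ddg_b)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    traj = rng.normal(size=(200, 10, 3))  # synthetic stand-in for MD frames
    cov = positional_covariance(traj)
    print(averaged_response(cov, residue=0).round(3))
    print(epistasis(1.2, 0.8, 2.5))
```

In this sketch, residues with large averaged responses to a perturbation at a binding-site residue would be candidates for dynamically coupled, long-distance positions, and a nonzero `epistasis` value flags a non-additive pair.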