Project Summary / Abstract
In the last decade there has been an explosion in the number of high-resolution structures of G protein-coupled
receptors (GPCRs) and their complexes with several G proteins, collectively known as transducer proteins. GPCRs
are dynamic proteins that exist in multiple functional conformational states. Comparisons of three-dimensional
structures of the inactive and active states of GPCRs have led to identification of residue pair distances that show
distinct changes upon activation. Such residue pairs are known as “activation microswitches”. Molecular dynamics
(MD) simulation is an attractive tool for identifying (i) the residue pairs that are critical to GPCR activation, (ii)
residue pairs involved in allosteric communication from the ligand binding site to the G protein coupling site, and (iii)
residue pairs in the GPCR:G protein interfaces that contribute to their coupling strength and selectivity.
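
As an illustration of the residue-pair distance comparison described above, the following minimal Python sketch ranks residue pairs by the shift in their mean distance between inactive-state and active-state frames. It is only a sketch under stated assumptions: the residue labels (res_A, res_B, res_C), the toy coordinates, and the function names are hypothetical, each residue is represented by a single point (for example its C-alpha atom), and a real analysis would read coordinates from MD trajectories rather than hand-written frames.

```python
import math
from itertools import combinations
from statistics import mean


def pair_distances(frame):
    """Return {(res_i, res_j): distance} for every residue pair in one frame.

    `frame` maps a residue label to an (x, y, z) coordinate.
    """
    return {
        (a, b): math.dist(frame[a], frame[b])
        for a, b in combinations(sorted(frame), 2)
    }


def rank_microswitch_candidates(inactive_frames, active_frames):
    """Rank residue pairs by the absolute shift in mean distance between states."""
    inactive = [pair_distances(f) for f in inactive_frames]
    active = [pair_distances(f) for f in active_frames]
    shifts = {
        pair: abs(mean(d[pair] for d in active) - mean(d[pair] for d in inactive))
        for pair in inactive[0]
    }
    # Largest shift first: these pairs are the strongest microswitch candidates.
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy frames (coordinates in arbitrary units), not real GPCR data.
inactive_frames = [
    {"res_A": (0.0, 0.0, 0.0), "res_B": (3.0, 0.0, 0.0), "res_C": (0.0, 4.0, 0.0)},
    {"res_A": (0.0, 0.0, 0.0), "res_B": (3.0, 0.0, 0.0), "res_C": (0.0, 4.0, 0.0)},
]
active_frames = [
    {"res_A": (0.0, 0.0, 0.0), "res_B": (10.0, 0.0, 0.0), "res_C": (0.0, 4.0, 0.0)},
    {"res_A": (0.0, 0.0, 0.0), "res_B": (10.0, 0.0, 0.0), "res_C": (0.0, 4.0, 0.0)},
]

ranked = rank_microswitch_candidates(inactive_frames, active_frames)
for pair, shift in ranked:
    print(pair, round(shift, 2))
```

In this toy example the res_A/res_B pair moves apart the most and is ranked first, while the res_A/res_C distance does not change at all.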
While our ability to generate long-timescale MD trajectories has increased dramatically, the results of
MD simulations have largely been analyzed using prior knowledge of the GPCRs. There is a critical need to
adopt unbiased, data-driven systems biology tools to analyze long-timescale MD trajectory data and to mine
knowledge of the residue motions that provide information on the allosteric communication network in GPCRs. Our
overarching goal in this grant is to apply Bayesian Network (BN) modeling, an interpretable machine learning
methodology, to MD simulation trajectory data on GPCR:G protein complexes in order to identify the residues in
various GPCR structural regions that contribute to ligand selectivity. Network-centered approaches have not yet been
used to analyze high-dimensional residue-pair MD simulation data. BN modeling, in particular, has attractive
properties (interpretability, probabilistic nature of the data representation, statistical validation, tools for topology
comparison and analysis) that currently used secondary analysis methods for MD simulation data, such as principal
component analysis (PCA) of residue pairs, lack.
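
To make the statistical-validation and topology-comparison properties concrete, the sketch below scores alternative network topologies over discretized residue-pair distances with the Bayesian information criterion (BIC), one standard score for discrete Bayesian networks. It is a hedged illustration, not the proposed pipeline: the choice of BIC, the two-state discretization (0 = short, 1 = long), the variable names d1, d2, d3, and the toy data are all assumptions, and the parameter count uses only the parent configurations observed in the data, which is a simplification.

```python
import math
from collections import Counter


def bic_score(data, parents):
    """BIC score of a discrete Bayesian network structure (higher is better).

    data:    list of dicts, one per MD frame, mapping variable -> discrete state
    parents: dict mapping each variable -> tuple of its parent variables
    """
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        # Maximum-likelihood log-likelihood of var given its parents.
        score += sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Penalty: (states - 1) free parameters per observed parent configuration.
        n_states = len({row[var] for row in data})
        score -= 0.5 * math.log(n) * (n_states - 1) * len(marg)
    return score


# Hypothetical discretized distances: d1 and d2 always agree, d3 is independent.
data = [
    {"d1": a, "d2": a, "d3": b}
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
]

no_edges = {"d1": (), "d2": (), "d3": ()}
true_edge = {"d1": (), "d2": ("d1",), "d3": ()}
spurious_edge = {"d1": (), "d2": ("d3",), "d3": ()}

for name, structure in [("no edges", no_edges),
                        ("d1 -> d2", true_edge),
                        ("d3 -> d2", spurious_edge)]:
    print(name, round(bic_score(data, structure), 3))
```

On this toy data the topology with the d1 -> d2 edge scores highest, while the spurious d3 -> d2 edge scores below the empty network because the complexity penalty is not offset by any gain in likelihood.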
We propose to use BN modeling with large-scale MD trajectories (multiple short and multiple long trajectories)
of inactive-state GPCRs and fully active-state GPCR:G protein complexes to (i) identify the activation microswitches,
i.e., the residue pairs that show large-scale conformational changes upon activation, and (ii) outline the residue network
involved in allosteric communication from the agonist binding site to the G protein coupling site. We will also (iii)
identify the residues in the GPCR:G protein interface that contribute to selectivity in coupling to specific families of G
proteins (Gs, Gi, and Gq). In Aim 2, we will (iv) dissect time-correlated events using BN series and dynamic BN (DBN)
models to delineate and scrutinize the residue networks that lead to large-scale transitions.
Importantly, the deliverables will include an unprecedented “toolkit” (algorithms and software) incorporating
systems biology tools for analyzing MD simulation trajectory data that are generalizable to any protein complex.