An integrated toolkit combining computational systems biology techniques with molecular dynamics simulations to delineate functionality of GPCRs

Project Summary / Abstract

In the last decade there has been an explosion of high-resolution structures of G protein-coupled receptors (GPCRs) and their complexes with several G proteins, collectively known as transducer proteins. GPCRs are dynamic proteins that exist in multiple functional conformational states. Comparisons of three-dimensional structures of the inactive and active states of GPCRs have identified residue pairs whose distances change distinctly upon activation. Such residue pairs are known as “activation microswitches”. Molecular dynamics (MD) simulation is an attractive tool for identifying (i) the residue pairs that are critical to GPCR activation, (ii) the residue pairs involved in allosteric communication from the ligand binding site to the G protein coupling site, and (iii) the residue pairs in the GPCR:G protein interfaces that contribute to coupling strength and selectivity. While our ability to generate long-timescale MD trajectories has increased exponentially, the results of MD simulations have largely been analyzed using prior knowledge of the GPCRs. There is a critical need to adopt unbiased, data-driven systems biology tools for analyzing long-timescale MD trajectory data, in order to mine knowledge of the residue motions that carry information about the allosteric communication network in GPCRs. Our overarching goal in this grant is to apply Bayesian Network (BN) modeling, an interpretable machine learning methodology, to MD simulation trajectory data on GPCR:G protein complexes in order to identify the residues in various GPCR structural regions that contribute to ligand selectivity. Network-centered approaches have not so far been used to analyze high-dimensional residue-pair data from MD simulations.
BN modeling, in particular, has attractive properties (interpretability, a probabilistic representation of the data, statistical validation, and tools for topology comparison and analysis) that currently used secondary analysis methods for MD simulation data, such as principal component analysis (PCA) of residue pairs, lack. We propose to use BN modeling with large-scale MD trajectories (multiple short and multiple long trajectories) of inactive-state GPCRs and fully active-state GPCR:G protein complexes to (i) identify the activation microswitches, the residue pairs that show large-scale conformational changes upon activation, and (ii) outline the residue network involved in allosteric communication from the agonist binding site to the G protein coupling sites. We will also (iii) identify the residues in the GPCR:G protein interface that contribute to selectivity in coupling to specific families of G proteins (Gs, Gi, and Gq). In Aim 2, we will (iv) dissect time-correlated events using BN series and dynamic BN (DBN) models to delineate and scrutinize the residue networks that lead to large-scale transitions. Importantly, the deliverables will include an unprecedented “toolkit” (algorithms + software) that incorporates systems biology tools for analyzing MD simulation trajectory data and is generalizable to any protein complex.