Project Summary
Predicting disease phenotypes from genotypes is a grand challenge in biology and personalized medicine. Our
long-term goal is to address this challenge using a combination of computational and experimental
approaches. Working towards this goal, we have developed and deployed a powerful evolutionary systems
approach to map the complex relationships connecting sequence, structure, function, regulation and disease in
biomedically important protein super-families such as protein kinases. We have made important contributions
describing the unique modes of allosteric regulation in various protein kinases, deciphering the structural basis
of oncogenic activation in a subset of receptor tyrosine kinases, uncovering the regulation of pseudokinases,
and developing new tools and resources for addressing data integration challenges in the signaling field. We
propose to build on these studies to answer key questions emerging from our ongoing work, such
as: What are the functions of pseudokinases, the catalytically inert members of the kinome, and how can we
use pseudokinases to better predict and characterize non-catalytic functions of kinases? What are the
functions of conserved cysteine residues in regulatory sites of protein and small-molecule kinases, and are these residues
post-translationally modified during redox signaling and oxidative stress responses that are causally associated with
age-related disorders? How can we enhance existing computational models for predicting genome-phenome
relationships using structural information, and can machine learning on structurally enhanced knowledge
graphs reveal new relationships between patient-derived mutations and disease phenotypes?
We propose to answer these questions using a variety of approaches, including statistical mining of large
sequence datasets, molecular dynamics simulations, machine learning, mass spectrometry, biochemical
analysis and in vivo assays. Completion of this work is expected to reveal new allosteric sites for targeting
pseudokinase and kinase non-catalytic functions in diseases, and significantly advance our understanding of
kinase regulatory mechanisms in disease and normal states. Our work will create new tools and resources for
knowledge graph mining and provide explainable models for inferring causal relationships linking genomes and
phenomes with potential applications in personalized medicine. Finally, the scope and impact of our work will
be significantly broadened by participation in studies that extend the specialized tools and technological
approaches we developed for the study of kinases to other biomedically important gene families such as
glycosyltransferases and sulfotransferases.