PROJECT SUMMARY
At the root of every human genetic disease lies molecular dysfunction of a biological process or protein
complex. Conversely, proteins interacting in the same biochemical complex are often linked to similar genetic
traits. Despite the revolution in high-throughput biology, the molecular mechanisms underlying genetic
diseases remain only partly known. Previous studies have shown that highly conserved (ancient) proteins are
abundant across human cell types and tissues and are enriched for disease associations. My research aims to
exploit these trends by determining the most conserved protein interactions across the eukaryotic tree of life,
based on an analysis of available large-scale proteomics data, and using this information to suggest new
candidate genes for diverse human diseases.
A large portion of these deeply conserved disease-associated proteins are traceable to the last eukaryotic
common ancestor (LECA), an ancestral organism that lived ~2 billion years ago. My own preliminary data
suggest that ~9,700 genes in the human genome can be dated back to LECA. Importantly, these deeply
conserved genes are responsible for a large and diverse subset of major human diseases, spanning
developmental disorders (e.g., Noonan syndrome, Leigh syndrome, microcephaly, neural tube defects),
cancers (e.g., leukemia, breast cancer, colorectal cancer), chronic respiratory diseases (e.g., ciliary dyskinesia,
asthma), neurological disorders (e.g., Charcot-Marie-Tooth disease, encephalopathy, schizophrenia, autism)
and motor problems (e.g., dystonia, spastic paraplegia).
My lab has collected and assembled protein interaction data for ~30 evolutionarily diverse eukaryotic
organisms. These data directly measure tens of thousands of protein interactions in each species. I propose
developing a draft map of the multiprotein assemblies that date back to the last eukaryotic common ancestor.
This unprecedented effort represents a synthesis of >20,000 mass spectrometry experiments, and thus will
require significant programming skill, statistical know-how, and computational resources. Using guilt-by-
association, I will then link new candidate genes to diseases based on these conserved interactions. In
parallel, I will validate deep protein complex conservation as a means of associating genes with diseases
by functionally characterizing two novel proteins that we previously observed to interact with Dnai2.
Defects in Dnai2 are known to cause primary ciliary dyskinesia, a subtype of ciliopathy marked by defects in
motile cilia; thus, the genes encoding these two proteins are likely ciliopathy genes and may contribute to
primary ciliary dyskinesia.