Leveraging comparative proteomics to improve human disease models - PROJECT SUMMARY

At the root of every human genetic disease lies molecular dysfunction of a biological process or protein complex. Conversely, proteins that interact in the same biochemical complex are often linked to similar genetic traits. Despite the revolution in high-throughput biology, the molecular mechanisms underlying genetic diseases remain only partly known. Previous studies have shown that highly conserved (ancient) proteins are abundant across human cell types and tissues and are enriched for disease associations. My research aims to exploit these trends by determining the most conserved protein interactions across the eukaryotic tree of life, based on an analysis of available large-scale proteomics data, and by using this information to suggest new candidate genes for diverse human diseases. A large portion of these deeply conserved disease-associated proteins are traceable to the last eukaryotic common ancestor (LECA), an ancestral organism that lived ~2 billion years ago. My preliminary data suggest that ~9,700 genes in the human genome can be dated back to LECA. Importantly, these deeply conserved genes are responsible for a large and diverse subset of major human diseases, spanning developmental disorders (e.g., Noonan syndrome, Leigh syndrome, microcephaly, neural tube defects), cancers (e.g., leukemia, breast cancer, colorectal cancer), chronic respiratory diseases (e.g., ciliary dyskinesia, asthma), neurological disorders (e.g., Charcot-Marie-Tooth disease, encephalopathy, schizophrenia, autism), and motor disorders (e.g., dystonia, spastic paraplegia).

My lab has collected and assembled protein interaction data for ~30 evolutionarily diverse eukaryotic organisms. These data directly measure tens of thousands of protein interactions in each species. I propose to develop a draft map of the multiprotein assemblies that date back to LECA.
This unprecedented effort represents a synthesis of >20,000 mass spectrometry experiments and will therefore require substantial programming skill, statistical expertise, and computational resources. Using guilt-by-association, I will then link new candidate genes to diseases based on these conserved interactions. In parallel, I will test whether deep conservation of protein complexes is a valid basis for associating genes with diseases by functionally characterizing two novel proteins that we previously observed to interact with Dnai2. Defects in Dnai2 are known to cause primary ciliary dyskinesia, a subtype of ciliopathy marked by defects in motile cilia; these two novel proteins are therefore likely ciliopathy genes and may contribute to primary ciliary dyskinesia.
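
The guilt-by-association step described above can be illustrated with a minimal sketch. This is not the project's actual pipeline: the function name `guilt_by_association`, the input layout, and the placeholder gene names `candidate_1` and `candidate_2` are assumptions made purely for illustration. The idea is simply that a gene with no annotation inherits candidate disease labels from annotated partners in the same complex.

```python
from collections import defaultdict


def guilt_by_association(complexes, known):
    """Propose candidate gene-disease links from shared complex membership.

    complexes: iterable of sets of gene names, one set per protein complex
    known:     dict mapping gene name -> set of diseases already linked to it
    Returns a dict: gene -> {disease: count}, where count is the number of
    (complex, annotated partner) co-memberships supporting that candidate link.
    """
    candidates = defaultdict(lambda: defaultdict(int))
    for members in complexes:
        for gene in members:
            already_linked = known.get(gene, set())
            for partner in members:
                if partner == gene:
                    continue
                for disease in known.get(partner, set()):
                    # Only propose diseases the gene is not yet annotated with.
                    if disease not in already_linked:
                        candidates[gene][disease] += 1
    return {gene: dict(diseases) for gene, diseases in candidates.items()}


# Toy input modeled on the Dnai2 example in the text; the two partner
# names are hypothetical placeholders, not real gene identifiers.
complexes = [{"DNAI2", "candidate_1", "candidate_2"}]
known = {"DNAI2": {"primary ciliary dyskinesia"}}

result = guilt_by_association(complexes, known)
for gene in sorted(result):
    print(gene, result[gene])
```

In practice the support counts would presumably be weighted by how confidently and how deeply each interaction is conserved, rather than treated as equal votes; the flat count here is a simplifying assumption.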