Harnessing the Power of Data and Artificial Intelligence to Resolve the Human 3D Interactome

Project Summary

Protein-protein interactions (PPIs) are fundamental to nearly all cellular functions, and disruptions caused by mutations often lead to disease. Despite decades of research, a significant portion of the human PPI network remains unknown. The challenges in elucidating the human interactome stem from the vast number of potential interactions, high false-positive rates in high-throughput experiments, and the presence of weak, transient interactions that evade experimental detection. Leveraging breakthroughs in protein structure prediction using Deep Learning (DL) and extensive genomic data for coevolutionary analysis, we have developed pipelines for de novo PPI screening. Our method has shown superior performance compared to large-scale experimental screens and has provided valuable insights into yeast and bacterial pathogen proteomes. This proposal aims to enhance our pipeline and extend its application to resolve the human interactome.

First, we will leverage the unprecedented volume of sequence and structural data to transform our methods for proteome-wide PPI screening in humans. We will utilize petabytes of untapped genomic sequence data from draft eukaryotic genomes and genomic reads to enhance the statistical power of coevolutionary analysis. To efficiently perform proteome-wide predictions, we will develop fast and accurate DL networks for PPI prediction by adapting RoseTTAFold and AlphaFold networks and augmenting the PPI training datasets with domain-domain interactions from over 200 million AlphaFold models. Preliminary results indicate that these strategies can drastically boost our pipeline's performance, positioning us to uncover novel interactions in humans.

Second, we will address the challenge of weak and transient interactions, particularly those mediated by short linear motifs (SLiMs).
We will compile training datasets for biologically significant but weak interactions, detect interaction-mediating phosphorylation sites, and develop specialized DL networks to recognize these sites and weak interactions. We will integrate predicted interactions with experimental data and other bioinformatic analyses to catalog SLiMs in human proteomes and explore their functional roles.

Third, we will leverage the predicted human interactome to identify genetic variants that disrupt PPIs and cause disease. We will adopt and develop tools to predict PPI-disrupting mutations based on evolutionary data and physicochemical properties of the interface. These predictions will be integrated with the vast amount of human and mouse genotype-phenotype data, particularly data from the Sequencing Populations to Accelerate Research and Care (SPARC) program led by our McDermott Center at UT Southwestern. This approach will provide mechanistic insights into poorly understood diseases, aiding patient diagnosis.

In summary, we will develop and release a suite of computational tools to overcome current challenges in predicting human PPIs, use these tools to resolve the human interactome and catalog SLiM functions, and integrate our findings with patient sequencing data to uncover novel disease mechanisms.