Fast and slow prediction of stable and transient protein-protein interactions - Protein-protein interactions (PPIs) underpin processes ranging from signal transduction to maintenance of cellular structure. Identifying new human PPIs may uncover interactions targetable by PPI-modulating drugs, while identifying human-pathogen PPIs could shed light on processes driving infectious diseases. Many studies have experimentally mapped PPIs at scale, yet their cost and complexity, combined with the size of PPI space, have limited how completely PPIs can be mapped. In humans, only an estimated 20% of PPIs are known, while coverage of other organisms, including pathogens, is far lower. In silico methods can be faster and cheaper but have been beset by low accuracy and poor generalizability: they are often unable to predict PPIs involving proteins that differ in sequence or structure from those they were trained on. Recently, however, this has begun to change with the development of AlphaFold, which has shown a robust capacity for generalization, including in formal blind competitions. We hypothesize that structure-informed PPI prediction can be made accurate, general, and fast by using new machine learning models and data modalities that AlphaFold does not use. We aim to test this hypothesis in this proposal. Our team has been at the forefront of molecular machine learning and high-throughput characterization of PPIs, having developed key precursors to AlphaFold, OpenFold (the first trainable public implementation of AlphaFold), and some of the most complete experimental PPI maps. We will combine our expertise to tackle PPI prediction. First, we will develop complementary methods to predict transient PPIs involving peptide-binding domains and peptidic ligands. Transient PPIs are highly challenging for AlphaFold and require specialized treatment, in part because they regularly involve post-translational modifications that AlphaFold does not model.
We will pursue both supervised approaches that learn directly from domain-peptide binding data and unsupervised approaches that do not rely on binding data but instead detect patterns of co-evolution across whole proteomes to infer domain-peptide binding. We will complement method development with a curation effort to collect transient PPI data from the vast, untapped reservoir of primary literature sources. Second, we will develop a new version of AlphaFold designed to discriminate between true and false PPIs and trained on a wide array of data types, including structural and binding data. This version will have a fast mode that we expect to be as accurate as the current AlphaFold yet fast enough to screen PPIs at proteome scale, and a slow mode that we expect to be more accurate than AlphaFold at predicting the structures of protein complexes. We will assess our models using rigorous statistical methods that test their capacity to generalize to novel sequences and structures, and we will experimentally validate structurally novel predictions for both wild-type and mutated interacting proteins. In the future, we expect our models to help identify novel protein complexes as well as human and human-pathogen PPIs, and to elucidate the logic of signaling networks and their dysregulation by disease-causing mutations.