High-throughput discovery and modeling of protein stability and dynamics

Project Summary

Recent breakthroughs in machine learning have stunned the world by making it possible to accurately predict folded structures for an enormous variety of protein sequences. These breakthroughs signaled the power of innovative algorithms and large datasets to unlock the hidden information in a protein’s sequence. Still, this only scratches the surface of what protein sequences can tell us. The goal of this project is to uncover the next layer of information hidden in protein sequences by measuring global folding stability and folding energy landscapes for thousands to millions of protein domains. Due to experimental limitations, these properties have historically been challenging to investigate at scale. Using new experimental methods developed by our lab, this project will lead to unique, massive datasets quantifying these two properties. These datasets will empower computational researchers in our own lab and around the world to develop new predictive models that can be applied in drug and vaccine development as well as in basic research.

Global folding stability describes the physical propensity of a protein sequence to fold or unfold, and stability influences nearly every other protein property, including function, aggregation propensity, cellular abundance, immunogenicity, and more. Engineering higher-stability proteins is a major goal in drug and vaccine development, and determining how genetic variants influence stability is a key goal in precision medicine. Computational tools to predict stability are widely used, but these tools have limited accuracy due to the low quantity of stability data available for optimizing the models.
Here, we will use a new method developed by our lab to measure stability for three million new sequences (>100-fold more than all available traditional stability measurements), then use these data to develop a new accurate predictive model for protein stability.

Folding energy landscapes describe the relative energies of all the different conformational states of a protein, including folded, partially folded, and unfolded states. Even when two proteins have similar structures and similar global stabilities, they can have very different energy landscapes, leading to different behavior in biological systems and in therapeutic development. These energy landscapes are challenging to study experimentally and have never been analyzed on a large scale. Our lab recently broke this barrier by developing a new experimental approach to measure these energy landscapes for thousands of proteins in parallel. Here, we aim to extend our method to new, larger protein libraries and improve the level of detail we resolve about each protein’s landscape. Finally, we will use these improved datasets to develop machine learning models to predict energy landscapes.