High-throughput thermodynamic and kinetic measurements for variant effects prediction in a major protein superfamily - PROJECT SUMMARY Many disease-associated variants in coding regions of the genome affect translated protein and enzyme products by perturbing their folded conformation or their function, such as interactions with substrates or macromolecular partners. However, we lack a unified framework for predicting the functional effects of coding variants, limiting how genomic data can be used in precision medicine. Machine learning models trained on large sequence databases have claimed to predict deleterious effects from coding variants in several model proteins, but to date their practical usage has been limited because of two major challenges. The first is the lack of descriptive, “ground truth” biophysical datasets relating sequence variation to native protein properties, due to the low throughput of traditional biochemical and biophysical experiments. The second is that there is no well-established method for integrating these data into state-of-the-art predictive models. To address these critical limitations, I propose to apply cutting-edge microfluidic techniques to generate large quantitative biophysical datasets connecting sequence variation to function in human acylphosphatase (ACYP), a model protein of the alpha/beta fold family (found in ~10% of human proteins), and to leverage these data to enhance predictive models. This microfluidic platform (HT-MEK) contains an array of chambers that allow for parallel expression and purification of >1,700 proteins, and provides measurements of in vitro kinetic and thermodynamic constants for each. In Aim 1, I will engineer a series of ACYP functional assays using HT-MEK and derivative microfluidic technologies, first testing in vitro expression, on-chip stability, and catalytic turnover of a small library of ACYP variants and finally comparing the results with traditional biochemical measurements.
In Aim 2, I will rapidly generate scanning mutagenesis libraries in ACYP and make measurements across hundreds of ACYP variants on HT-MEK. In Aim 3, in collaboration with ML experts, I will use this unprecedented quantitative biochemical dataset to fine-tune a cutting-edge deep learning model to provide the first variant effects predictor enhanced by in vitro data at scale. My preliminary data have shown that this model can generate de novo ACYP sequences that fold and are catalytically proficient, suggesting that it will provide a strong foundation for functional prediction. Together, my results will provide insight into the utility of in vitro biochemical datasets from human proteins in training better predictors of disease phenotypes. The training that I will obtain in carrying out these Aims will allow me to (1) develop skills in research design, analysis, and interpretation of protein biophysics data; (2) learn advanced techniques in protein biochemistry and statistical sequence analysis; and (3) obtain a competitive postdoctoral fellowship with the long-term goal of establishing an independently funded laboratory at a research-intensive university.