Elucidating the sequence code for amyloid peptide self-assembly through all-atom simulations, machine learning, and experiments

ABSTRACT

Extensive efforts are being dedicated to designing β-sheet nanofibrils from amyloid-inspired peptides, as these fibrils exhibit mechanical properties that are desirable for various biomedical applications. These efforts require tools that accurately predict the propensity of a peptide to form fibrils from its amino acid sequence. Moreover, they must take into account that deposits of amyloid fibrils in different tissues and organs are emblematic of diseases such as Alzheimer’s and Parkinson’s. Accordingly, engineered non-toxic amyloids are expected to show a low degree of homology to disease-causing amyloids. Existing bioinformatic tools, which are informed by disease-causing amyloids, are often not suitable for describing this class of peptides. This project combines all-atom molecular dynamics (MD) simulations, machine learning, and experiments to develop and validate an approach that will be accessible, accurate, and efficient at predicting fibril formation for any peptide sequence. This project expands on recent studies showing that the combination of faster computers and more accurate force fields now allows all-atom molecular dynamics to simulate the spontaneous formation of amyloid fibrils from unbiased initial conditions. These studies have been used to identify intermediate states on the pathway to fibril formation and to describe the mechanisms by which peptides lock onto the fibril tip with atomic precision, which accounts for fibril growth. In addition, for a limited set of designed peptides, all-atom simulations predicted the propensity to form fibrils more accurately than bioinformatic tools, highlighting their predictive potential. However, simulations remain computationally intensive, requiring several weeks to complete.
Thus, they cannot be used for the high-throughput investigation of sequences required in efforts to design peptides for biomedical applications. This project addresses this gap and expands the scope of all-atom simulations to peptides that form complex fibrils more closely resembling those of disease-causing amyloids. Moreover, undergraduate students are involved in all aspects of this project, including managing, setting up, and running MD simulations. The project has three aims. Aim 1 develops machine learning algorithms that predict in a few seconds whether a peptide will self-assemble into amyloid fibrils in MD simulations. These predictions will be validated and tested experimentally to establish the scope of application of different MD force fields. Aim 2 performs a high-throughput analysis of the sequence space to identify peptides that form fibrils and to discover rules in the amino acid sequence that encode these structures. Aim 3 expands the use of unbiased all-atom simulations to study peptides that form complex fibrils characterized by parallel β-sheets connected to each other via β-arcs. The molecular mechanisms and pathways accounting for these fibrils will be investigated and used to provide insight into disease-causing amyloids.