Deep Learning Models for Metabolomics Analysis

PROJECT SUMMARY

Untargeted metabolomics using tandem mass spectrometry (MS) has achieved substantial success in biomarker discovery and in advancing our understanding of cellular metabolism. Despite this success, only a small fraction of measured spectra can currently be annotated (assigned a chemical identity). This bottleneck can be attributed to the limitations of current annotation tools, which have not yet exploited advances in deep learning or the available data modalities (spectra, peaks, molecules, and fragments). The goal of this application is to advance the interpretation of spectra collected through untargeted metabolomics. We focus on annotating data collected through liquid or gas chromatography followed by MS, or through MS/MS, as these three platforms have become the dominant technologies. Over the next five years, we plan to harness deep learning to address three problems: 1) annotation, 2) translation between spectra measured under different instrument settings, and 3) explainable models for annotation, where explainability arises from connecting peaks to their respective molecular fragments. The Hassoun lab has extensive deep learning experience relevant to these problems, as well as experience with the nuances of metabolomics datasets. The lab recently developed a novel deep learning annotation model that achieves 41% and 30% performance improvements over multi-layer neural networks and graph neural networks, respectively. Additionally, our lab has developed an ontology-traversal algorithm that yields correct-by-construction molecular substructures that can be assigned to peaks, thus giving rise to datasets that can be used to train explainable annotation models. The significance of this research is that it addresses fundamental barriers that hinder the development of deep learning annotation models.
Our models and datasets will be released on GitHub to benefit biological and biomedical applications and metabolomics research. Because of their expected high accuracy and explainability, the models will expedite the interpretation of experiments, improve our understanding of cellular metabolism, and facilitate data sharing among labs. The innovation lies in learning maximally from the available data modalities and in creating models that exploit the learned representations. Further, the annotation and translation problems are formulated as bidirectional mappings between domains, in contrast to current annotation models that assume unimodal mappings. These innovations are necessary to advance metabolomics research and will open new research horizons in the field.