Development and validation of data standards and computational methods for large scale metabolomics data analysis for data driven drug discovery

Project Summary

Data-driven drug discovery (D4) combines the phenotypic-based drug discovery paradigm with high-throughput omics technologies and machine learning. Like phenotypic-based drug discovery, D4 has the advantage of being unbiased. In addition, D4 yields mechanistic insights, because omics technologies comprehensively measure hundreds to hundreds of thousands of biomolecules or cellular features, enabling deep characterization of diseases and of compound or genetic perturbations. The predominant data type for D4 has historically been transcriptomics; more recently, image-based assays have also been used. Other biomolecular features (i.e., proteins and metabolites) are considered closer to the phenotype, but technical challenges in data generation and analysis, as well as the lack of standardized pipelines, have precluded the systematic use of these data types. Sinopia Bioscience and Omix Technologies have combined their strengths in systems biology data analysis, AI/ML, and LC-MS/MS-based metabolomics to develop a unique metabolomics-based drug discovery platform. The platform has enabled systematic metabolic characterization of a chemical library of ~3,300 small molecules covering more than 1,000 drug targets. Our preliminary results demonstrate that metabolomics is more sensitive and reproducible than transcriptomics, and more predictive of small-molecule properties such as molecular target, adverse drug reactions, and chemical structure. However, for a D4 platform to be applied successfully, standardized methods are required to generate robust, information-rich profiles of metabolism that can be compared between experiments. Whereas such methods have been developed and benchmarked for gene expression, multiple key challenges remain for metabolomics data.
In this grant, we will address these challenges. We will develop computational methods for generating and comparing profiles, and we will test them using computational benchmarking test cases and prospective experimental validation. In Phase II, we will expand the platform by increasing the number of compounds and diseases profiled, integrate metabolomics-based D4 with gene expression data, develop software for off-the-shelf application, and apply the platform in the fields of oncology and/or inflammation. The platform will ultimately be commercialized by pursuing internal drug development opportunities and/or partnering with biotech/pharma collaborators.