Development of neural network models to enable efficient metabolomic characterization of compound libraries for data-driven drug discovery - Project Summary

Data-driven drug discovery (D4) combines phenotypic-based drug discovery with high-throughput omics technologies and machine learning. Like phenotypic-based drug discovery, D4 is unbiased, but it offers additional biomolecular insight by using omics technologies to measure hundreds to hundreds of thousands of biomolecules, enabling deep characterization of diseases and compounds. The predominant data type for D4 has historically been transcriptomics; more recently, image-based assays have also been used. Other biomolecular features (i.e., proteins and metabolites) are considered closer to the phenotype, but technical challenges in data generation and analysis, as well as the lack of standardized pipelines, have precluded the systematic use of these data types. Sinopia Bioscience and Omix Technologies have combined their strengths in systems biology data analysis, AI/ML, and LC-MS/MS-based metabolomics to develop a unique metabolomics-based drug discovery platform (metD4), and have used it for systematic metabolic characterization of a chemical library of ~3,300 small molecules covering more than 1,000 drug targets. Our preliminary results demonstrate that metabolomics is more sensitive, more reproducible, and more predictive than transcriptomics of small-molecule properties such as molecular target, adverse drug reactions, and chemical structure. For a D4 platform to be applied successfully, large numbers of compounds must be screened to cover chemical space. In addition, these compounds need to be screened on many cell lines, because drug perturbations are often context dependent. The overall aim of this proposal is to develop an integrated workflow that combines computation and experimentation to efficiently expand the metD4 dataset.
Phase I will focus on a critical, and likely the most challenging, part of this workflow: development of computational methods to generate “virtual metabolomic profiles” of unscreened compounds. We will 1) develop methods to predict metabolomic profiles of unscreened cell line/compound combinations, which will enable more efficient screening of new cell lines by combining sparse screening with computational inference, 2) develop methods to predict the metabolomic perturbations induced by novel compounds based on their chemical structure, 3) generate metabolomics data to prospectively validate the developed algorithms, and 4) develop a confidence metric that estimates the accuracy of the virtual metabolomic profiles. Phase II will focus on developing strategies to decide which virtual samples to use and to select optimal experimental screening strategies that efficiently expand the coverage of the metD4 dataset in terms of both chemical space and biological context. The iterative computational and experimental workflow developed in Phase II will allow us to efficiently scale our platform to: 1) a significantly larger number of compounds screened, 2) a significantly larger number of cell lines screened, enabling more tailored and relevant screening for specific therapeutic areas of interest, and 3) screening on systems that inherently have low throughput (e.g., tissues and patient samples). These fundamental improvements will allow us to commercialize the platform through investment to pursue internal drug development opportunities and/or through partnership with biotech/pharma collaborators.