Development and validation of data standards and computational methods for large-scale metabolomics data analysis for data-driven drug discovery - Project Summary
Data-driven drug discovery (D4) combines the phenotypic-based drug discovery paradigm with high-throughput
omics technologies and machine learning. Like phenotypic-based drug discovery, D4 has the advantage of being
unbiased. In addition, D4 yields mechanistic insights because omics technologies comprehensively measure
hundreds to hundreds of thousands of biomolecules or cellular features, enabling deep characterization of
diseases and of compound and genetic perturbations. Historically, the predominant data type for D4 has been
transcriptomics; more recently, image-based assays have also been used.
Other biomolecular features (i.e., proteins and metabolites) are considered closer to the phenotype, but technical
challenges in data generation and analysis, as well as the lack of standardized pipelines, have precluded the
systematic use of these data types. Sinopia Bioscience and Omix Technologies have combined their strengths
in systems biology data analysis, AI/ML, and LC-MS/MS-based metabolomics to develop a unique metabolomics-
based drug discovery platform that has enabled systematic metabolic characterization of a chemical library
of ~3,300 small molecules covering more than 1,000 drug targets. Our preliminary results
demonstrate that metabolomics is more sensitive and reproducible than transcriptomics, and more predictive of
small-molecule properties such as molecular target, adverse drug reactions, and chemical structure. However, for
a D4 platform to be successfully applied, standardized methods are required to generate robust and information-
rich profiles of metabolism that can be compared between experiments. Whereas for gene expression such
methods have been developed and benchmarked, multiple key challenges remain for metabolomics data. In this
grant, we will address these challenges by developing computational methods for generating and comparing
metabolic profiles and testing them with computational benchmark cases and prospective experimental validation. In
Phase II, we will expand the platform by increasing the number of compounds and diseases profiled, integrate
metabolomics-based D4 with gene expression data, develop software for off-the-shelf application, and apply the
platform in the fields of oncology and/or inflammation. The platform will ultimately be commercialized by
pursuing internal drug development opportunities and/or partnerships with biotech/pharma collaborators.