Development of a metabolomics and machine learning based high-throughput screening platform for data-driven drug discovery - Project Summary
High-throughput omics technologies allow comprehensive measurement of a wide range of biomolecules, and over the
past decade they have become dramatically less expensive. Coupling these emerging technologies with
automation approaches and the phenotypic drug discovery paradigm enables “data-driven” drug
discovery (D4). D4 focuses on a complete cellular readout, quantitatively measuring hundreds to hundreds of thousands of
biomolecules, rather than focusing on a single protein, pathway, or physiological trait. The complexity of this data
requires computational tools for proper analysis and interpretation. A pioneering project and dataset for D4 is
the Broad Institute’s Connectivity Map, which is a transcriptomics screening and query platform for drug
characterization, discovery, and repositioning. Despite its many successes, the Connectivity Map measures
only a single biomolecule type, RNA, and ignores the downstream effects of chemical perturbations on proteins and
metabolites. In this proposal, we combine the strengths of experts in LC-MS/MS-based
metabolomics (Omix Technologies) and leaders in metabolic network modeling and metabolomics data analysis
(Sinopia Biosciences) to develop a metabolomics-based high-throughput compound screening platform. Our
preliminary data for two major drugs (doxorubicin and rapamycin) show that our approach is technically
valid and accurate and offers potential biological utility that complements transcriptomics-based approaches.
This Phase I proposal will assess the biological utility of a metabolomics-based screening platform. First, we will
profile ~250 FDA-approved small molecules from a broad range of drug classes. Second, we will develop the
necessary bioinformatics pipelines, mechanistic metabolic models, and machine learning algorithms to analyze
and interpret these complex datasets. Finally, we will assess whether adding metabolomics data to the
Connectivity Map improves D4 predictions, including determining compound mechanisms of action, assessing compound
similarity, identifying biomarkers of drug efficacy and safety, and identifying drug repurposing opportunities.
After the biological utility of this approach is demonstrated in Phase I, Phase II will focus on profiling novel
chemical and genetic perturbations to further demonstrate the power of the platform and identify commercial
opportunities for treating rare genetic diseases.