PROJECT SUMMARY
Today’s technologies allow profiling thousands of gene expression features for diseases and drugs at a very low
cost. This proposal, entitled “Virtual Compound Screening Using Gene Expression”, aims to develop novel data
science approaches to leverage emerging gene expression profiles to discover novel drugs. Previously, we
developed a scoring function called RGES to quantify a drug’s potency to reverse disease gene expression
based on the drug and disease expression profiles. We observed that RGES correlates with drug efficacy.
Using this idea, we and others identified drugs that could be repurposed to treat a number of diseases. However,
this approach currently does not support novel compound screening or lead optimization. To implement this
approach for screening a large compound library, we first need gene expression profiles
of the library compounds. Because such profiles are not available at scale for new
compounds, virtual compound screening was impossible until recent efforts, including ours, demonstrated the
feasibility of predicting gene expression solely from chemical structure. The objective of this project is thus
to develop novel machine learning methods to boost the performance of drug-gene expression prediction and
use the predicted profiles in practical drug discovery. To achieve these goals, we have assembled a team of
experts in computational drug discovery, machine learning, drug screening, and medicinal chemistry. First, we
will develop a robust, high-performance, and generalizable data-driven chemical structure embedding method
to enhance drug-induced gene expression prediction. Second, using the predicted profiles, we will apply RGES to score
compounds against given disease profiles. We will evaluate screening performance for liver
cancer inhibitors, SARS-CoV-2 inhibitors, and cell reprogramming regulators. Finally, we will apply the approach to lead
optimization. Our previous drug repurposing efforts identified and validated two candidates: niclosamide in liver
cancer and mycophenolic acid in diffuse intrinsic pontine glioma (DIPG). However, the poor solubility of niclosamide and the poor brain penetration of
mycophenolic acid hindered their further development. Accordingly, we will develop a deep
reinforcement learning framework to optimize these two drugs. In parallel, domain experts will
propose new analogs. We will synthesize the analogs and compare the performance of the domain experts
with that of the AI model. We expect this work to unleash the power of emerging omics data in drug discovery.