The goal of this study is to develop machine learning methods, especially deep learning models (DLMs), to
learn a better representation of activation states of cellular signaling pathways in an individual tumor and
use such information to predict its sensitivity to anti-cancer drugs. Cancer is mainly caused by somatic genome
alterations (SGAs) that perturb cellular signaling pathways, and the resulting pathway aberrations eventually
lead to cancer development. Precision oncology aims to accurately detect and target tumor-specific aberrations, but
challenges remain. Currently, there is no well-established method to detect the activation states of signaling
pathways, and the common practice of using the mutation status of a targeted gene as the indicator for
prescribing a molecularly targeted drug has limitations. To overcome these limitations, we hypothesize that, by
closely simulating the hierarchical organization of cellular signaling systems, DLMs can be used to
systematically identify major cancer signaling pathways, to detect tumor-specific aberrations in signaling
pathways, and to predict cancer cell sensitivity to anti-cancer drugs.
We will develop models that more precisely represent the state of signaling systems in cancer cells and use
such information to enhance precision oncology. We will design and apply innovative DLMs to cancer big data,
including large-scale pharmacogenomic data and cancer omics data, to learn a unified representation of
aberrations in signaling systems caused by driver SGAs in cancer cells, regardless of their growth
conditions, such as cell culture, patient-derived xenografts (PDXs), and real tumors. This will enable us to transfer the models trained using
cell lines and PDXs to clinical settings (real tumors) in the future. Because drugs may share common
target proteins, we will develop a model, DLM-MLT (a combination of a DLM and multi-task learning), to predict the
sensitivity of tumor samples to multiple drugs at once. Furthermore, we will develop a model, BioSI-DLM, to use
various perturbations (e.g., SGA/LINCS perturbation data) as side information to learn a better representation that
can potentially map latent variables in a DLM to biological entities. We hypothesize that the representation
learned from our designed models will significantly improve the prediction accuracy compared with the
conventional indication for drug treatment (e.g., the mutation status of the drug's target protein). In summary, our
study uses deep learning methods to learn a better, more concise representation
embedded in cancer omics data that reflects personalized genomic changes and could be used to
guide personalized treatment. Our study could significantly contribute to the development of cancer
ontology and promote the development of precision medicine.
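
As an illustration of the multi-task idea behind DLM-MLT, the following is a minimal sketch, not the proposed model itself. It assumes a binary SGA vector as input, a single shared hidden layer standing in for the learned representation of pathway states, and one sigmoid output unit per drug. The class name MultiTaskNet, the layer sizes, the random untrained weights, and the example SGA profile are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Fully connected layer followed by a sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_out, n_in, rng):
    """Random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


class MultiTaskNet:
    """Hypothetical multi-task network: one shared layer, one output per drug."""

    def __init__(self, n_features, n_hidden, n_drugs, seed=0):
        rng = random.Random(seed)
        # Shared hidden layer: the representation reused by every drug task.
        self.shared = init_layer(n_hidden, n_features, rng)
        # Task-specific heads: one output unit per drug.
        self.heads = init_layer(n_drugs, n_hidden, rng)

    def predict(self, features):
        hidden = dense(features, *self.shared)
        return dense(hidden, *self.heads)


# Hypothetical tumor sample: 1 marks a gene carrying an SGA, 0 marks no alteration.
sga_profile = [1, 0, 0, 1, 0, 1]
model = MultiTaskNet(n_features=6, n_hidden=4, n_drugs=3)
scores = model.predict(sga_profile)  # one sensitivity score per drug
```

Because every drug head reads the same hidden layer, training on all drugs jointly would push that layer toward a representation useful across drugs, which is the motivation for multi-task learning when drugs share target proteins.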