Project Summary
Cancer is a leading cause of death worldwide. In recent years, roughly 18.1 million new cases of cancer have
been diagnosed annually. Physicians often decide which treatment to give a patient with the
goals of prolonging overall survival, preventing recurrence, and minimizing complications. Generally,
randomized controlled trials (RCTs) are used to compare the efficacy of one therapy against another,
but they are untenable in many situations due to ethical and financial constraints. Recent work has leveraged
observational data to develop machine learning models that capture the progression of chronic diseases such
as cystic fibrosis and Parkinson’s disease. However, using machine learning to determine treatment efficacy and
predict important clinical endpoints, such as overall survival (OS) or progression-free survival (PFS), in cancer
has not been well studied. This gap in knowledge is due to a lack of benchmark cancer datasets, limited
sample sizes for rare cancers, and challenges specific to cancer management, such as tumor heterogeneity
within patients leading to differential treatment response. In spite of these challenges, recent methodological
improvements in machine learning, such as the use of inductive biases and auxiliary data to improve prediction
in data-scarce settings as well as improved treatment effect estimation methods, present an opportunity to test
the promise of machine learning in the cancer setting. Therefore, the overarching goal of the proposed
work is to develop methods that will enable training of machine learning models that capture the signal
in longitudinal, observational cancer data and ultimately improve prediction of clinical endpoints as
well as estimation of cancer treatment effects. As a case study and testbed for developing
these methods, I will focus on multiple myeloma, an incurable plasma cell cancer. Aim 1 of this proposal will
focus on improving prediction of survival endpoints and depth of treatment response. I will train a latent
variable model with a novel learning algorithm that will leverage auxiliary longitudinal data to improve the
power of the model, enabling better prediction of clinical endpoints. Aim 2 will tackle the related, yet distinct,
question of treatment effect estimation, particularly with respect to different combination chemotherapies.
Meta-learner models will be used to estimate average and conditional average treatment effects. A sensitivity
analysis framework with clinically interpretable sensitivity parameters will be used to assess the reliability of the
estimates. Finally, Aim 3 will provide a machine learning decision support tool to augment physician decision
making in cancer management. A user study will be conducted with the tool to determine if it improves
physician assessment of patients. This proposal provides a general methodological framework that can be
applied to any cancer dataset and improves understanding of how to effectively use machine learning models
trained on observational data to improve care of cancer patients.