Machine Learning Methods to Predict Cancer Progression and Estimate Treatment Effectiveness - Project Summary

Cancer is a leading cause of death worldwide. In recent years, roughly 18.1 million new cases of cancer have been diagnosed per year. Physicians choose treatments for a patient with the goals of prolonging overall survival, preventing recurrence, and minimizing complications. Randomized Controlled Trials (RCTs) are the standard way to determine the efficacy of one therapy relative to another, but they are untenable in many situations due to ethical and financial constraints. Recent work has leveraged observational data to develop machine learning models that capture the progression of chronic diseases such as cystic fibrosis and Parkinson’s disease. However, the use of machine learning to determine treatment efficacy and to predict important clinical endpoints in cancer, such as overall survival (OS) or progression-free survival (PFS), has not been well studied. This gap in knowledge is due to a lack of benchmark cancer datasets, limited sample sizes for rare cancers, and challenges specific to cancer management, such as tumor heterogeneity within patients, which leads to differential treatment response. Despite these challenges, recent methodological advances in machine learning present an opportunity to test its promise in the cancer setting. These advances include the use of inductive biases and auxiliary data to improve prediction in data-scarce settings, as well as improved methods for treatment effect estimation. Therefore, the overarching goal of the proposed work is to develop methods for training machine learning models that capture the signal in longitudinal, observational cancer data and ultimately improve both the prediction of clinical endpoints and the estimation of cancer treatment effects.
As a case study and test bed for developing these methods, I will focus on multiple myeloma, an incurable plasma cell cancer. Aim 1 of this proposal will focus on improving prediction of survival endpoints and depth of treatment response. I will train a latent variable model with a novel learning algorithm that leverages auxiliary longitudinal data to improve the power of the model, enabling better prediction of clinical endpoints. Aim 2 will tackle the related, yet distinct, question of treatment effect estimation, particularly with respect to different combination chemotherapies. Meta-learner models will be used to estimate average and conditional average treatment effects. A sensitivity analysis framework with clinically interpretable sensitivity parameters will be used to assess the reliability of the estimates. Finally, Aim 3 will provide a machine learning decision support tool to augment physician decision making in cancer management. A user study will be conducted with the tool to determine whether it improves physician assessment of patients. This proposal provides a general methodological framework that can be applied to any cancer dataset and improves understanding of how to effectively use machine learning models trained on observational data to improve the care of cancer patients.