Project Abstract
The paradigmatic approach to chemotherapy has been to identify and target driver mutations. However, after
an initial response to therapy, many patients develop recurrent, drug-resistant disease, leading to high mortality
rates. This resistance may be encoded, driven by somatic mutations, or adaptive, where changes in
epigenetic programs result in phenotypic plasticity. Critically, the relative contributions of encoded versus
adaptive mechanisms of drug resistance, and how these affect therapeutic response, are poorly
understood. Advances in single-cell multiomics have been crucial for the detection of rare genetic and epigenetic
events that may drive resistance and cannot be observed by bulk sequencing. However, progress has been
limited because most experiments profile only the encoded (via genome sequencing) or the adaptive (via
transcriptome or epigenome profiling) state, not both. Only recently have new techniques made it possible to measure
both modalities in the same cell or population of cells. This project proposes the development of a new class
of scalable statistical models that will help identify causal determinants of treatment failure in small cell lung
cancer (SCLC) and metastasis in high grade serous ovarian cancer (HGSOC) and gastric adenocarcinoma
(GAC) — all diseases with significant morbidity and low cure rates. These cancers each exemplify components
of intratumoral heterogeneity and its interplay with the tumor microenvironment. Each translational study in this
project generates datasets comprising high-dimensional covariates that require scalable computational methods
to analyze. Machine learning methods are highly scalable but have difficulty answering actionable interventional and
counterfactual queries, and do not account for confounding factors (covariates that affect both an intervention and
its target). Causal models, on the other hand, are designed to account for confounding factors but do not scale
well. Here, we address these two needs by developing novel computational methods at the intersection of
multiview learning and causal inference. In the K99 phase, the focus will be on developing a causal inference
framework and software to identify the impact of cell-intrinsic processes on patient response to therapy, inferred
from high-dimensional multiomic single-cell data. In the R00 phase, this framework will be extended to focus on
cell-extrinsic processes, including profiling the tumor microenvironment and cell-cell interactions. The methods
developed here will be applicable to any type of cancer. Thus, we anticipate that this project will not only improve
our understanding of SCLC, GAC, and HGSOC progression, but also have a broader impact on cancer research as
major consortia release similar data to the public. I have put together an interdisciplinary mentorship group with
expertise in genomics, phenotypic plasticity, and causal machine learning. This proposal also details a training
program that will help me achieve the project's goals and transition to a tenure-track scientific
career in cancer research.