PROJECT SUMMARY
This proposal outlines a five-year research and career development program aimed at building computational
frameworks for understanding the phenotypic effects of perturbations and somatic alterations in cancer. The
application builds on the candidate’s extensive PhD training in the world-renowned Computer Science Department
at Carnegie Mellon University. It is also grounded in the candidate’s prior experience as an Associate
Computational Biologist at the Broad Institute and in his broad network of leading physicians and scientists in
the cancer field. Finally, it leverages his current postdoctoral appointment under Dr. Gad Getz at the Broad
Institute and the unique resources, facilities, collaborations and expertise available there.
Along with a series of relevant didactics and career-building activities, the proposed research will form the
basis of his transition to an independent tenure-track position as a scientist guided by the long-term goal of
modeling and understanding cancer as a disease. The large-scale availability of next-generation sequencing
data has enabled an unprecedented characterization of the somatic alterations that occur in cancer.
Understanding their combinatorial phenotypic effects remains an open problem, and powerful in vitro perturbation
protocols have been designed to probe these effects experimentally. However, the number of possible
combinations grows rapidly with the number of candidate targets, making the search space for combinatorial
screens prohibitively large. The objective of this work is to develop principled, artificial intelligence
(AI)-driven methods for inferring the effects of perturbations and observed somatic alterations in cancer, a
crucial step toward understanding the underlying disease mechanisms. The proposed work draws on recent
developments in machine learning and causal discovery. In particular, it pursues two Specific Aims: (Aim 1)
inferring causal graphs from single-cell RNA-seq data, optionally paired with whole-exome/whole-genome
sequencing; and (Aim 2) using a deep generative model, together with paired whole-exome/whole-genome
sequencing, to learn latent factors of variation in single-cell RNA-seq data. The
proposed work also includes steps to validate the methods developed in these aims. When completed, this work
will advance the field by providing algorithms and resources for (1) using causal knowledge to computationally
select combinations of targets to test in the lab; and (2) computationally inferring the effects of somatic DNA
alterations of interest on gene expression, thereby improving downstream experimental design. Together, the
proposed aims are a crucial step toward understanding the mechanisms of cancer and toward discovering drugs
for this disease more efficiently.