An innovative integrated computational framework using gene signatures for patient stratification

Project Summary/Abstract

Cancer is a highly heterogeneous disease, with each patient's tumor driven by a specific set of genomic aberrations. As such, personalized treatment has been intensively investigated as a promising strategy for further improving patient prognosis. To aid personalized treatment, both genomic and expression-based biomarkers have been investigated. Somatic mutations and amplifications or deletions of genes, especially driver genes, have been used to predict cancer prognosis and to preselect patients for targeted treatment. Despite some successful examples, the overall effectiveness of these genomic biomarkers remains unclear. Similarly, many gene expression-based biomarkers have been proposed, but only a few have been translated into clinical applications. In this project, we propose a new strategy: to develop an innovative statistical framework that integrates genomic and transcriptomic data to define gene signatures by modeling the quantitative relationships between genomic aberrations and gene expression alterations. These signatures recapitulate the downstream oncogenic pathways underlying driver genomic events and, importantly, can capture pathway deregulation caused by other mechanisms. We will use this framework to leverage the vast amount of existing cancer data generated by previous studies. Specifically, we will use the TCGA, ICGC and TARGET data to define a comprehensive list of gene signatures that characterize all driver genomic aberrations in six cancer types: lung, breast, and pancreatic cancer, glioblastoma, melanoma, and acute myeloid leukemia. These gene signatures will then be combined to build integrative models that predict clinical outcomes, including patient prognosis and sensitivity to therapeutic treatment.
We will further incorporate immune infiltration scores and clinical factors to maximize the predictive power of these models. Following that, we will use a collection of 85 cancer datasets with matched gene expression profiles and survival information to develop prognostic prediction models. Outputs from these models can be used to stratify patients and so guide personalized treatment. In line with our long-term research interest, we will integrate in-house and existing lung cancer data to develop an optimized model for predicting the post-surgical recurrence risk of patients with early-stage non-small cell lung cancer. The resulting software, source code, gene signatures, prediction models and other resources from this project will be released in a timely manner. These resources will benefit a broad scientific community in the field of basic and translational cancer research.
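
As a purely illustrative sketch of the general idea, and not the framework proposed in this project, the following Python code shows one simple way a gene signature could be derived from a genomic aberration and then used to stratify samples. It assumes a genes-by-samples expression matrix and a per-sample mutation indicator; the function names (`define_signature`, `signature_score`, `stratify`), the Welch-style t-statistic weighting, the median split, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def define_signature(expr, mutated, top_n=2):
    """Pick the top_n genes whose expression differs most between
    aberration-positive and aberration-negative samples (illustrative only)."""
    mut, wt = expr[:, mutated], expr[:, ~mutated]
    diff = mut.mean(axis=1) - wt.mean(axis=1)
    se = np.sqrt(mut.var(axis=1, ddof=1) / mut.shape[1]
                 + wt.var(axis=1, ddof=1) / wt.shape[1])
    t = diff / (se + 1e-8)                   # Welch-style t-statistic per gene
    idx = np.argsort(-np.abs(t))[:top_n]     # genes with the largest |t|
    return idx, t[idx]

def signature_score(expr, idx, weights):
    """Weighted average of z-scored expression over the signature genes."""
    sub = expr[idx]
    z = (sub - sub.mean(axis=1, keepdims=True)) / (sub.std(axis=1, keepdims=True) + 1e-8)
    return weights @ z / np.abs(weights).sum()

def stratify(scores):
    """Split samples into high and low groups at the median score."""
    return np.where(scores > np.median(scores), "high", "low")

# Synthetic data (hypothetical): 50 genes, 40 samples, first 20 carry the aberration.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 40))
mutated = np.arange(40) < 20
expr[0, mutated] += 3.0   # gene 0 is up-regulated by the aberration
expr[1, mutated] -= 3.0   # gene 1 is down-regulated by the aberration

idx, weights = define_signature(expr, mutated)
scores = signature_score(expr, idx, weights)
groups = stratify(scores)
```

Because the score is computed from expression alone, a sample without the aberration can still receive a high score if the same genes are deregulated by another mechanism, which is the property the summary attributes to such signatures.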