Proteogenomic translator for cancer biomarker discovery towards precision medicine

PROJECT SUMMARY

The goal of our PGDAC is to improve our understanding of the proteogenomic complexity of tumors. Towards this goal, our First Aim is to apply multi-omics and network-based system learning to CPTAC proteogenomic data to reveal causative molecular regulatory relationships that contribute to a variety of cancer phenotypes. We will start with rigorous preprocessing and quality control, using a pipeline tailored to MS-based proteomics data to detect and correct batch effects, outliers, and sample labeling errors, and to impute missing values (Aim 1.1). We will then use novel statistical tools to jointly model ≥6 types of omics data and systematically characterize the functional impact of DNA alterations (such as mutations, CNA, and methylation) (Aim 1.2). The resulting cis-/trans-regulatory networks will help us elucidate how protein and pathway activities are shaped by genomic alterations in tumor cells. We will also construct protein/PTM co-expression networks based on global-, phospho-, glyco-, and other PTM-proteomics data (Aim 1.3). When constructing these networks, we will use and create advanced computational tools that effectively borrow information from the literature, publicly available databases, and transcriptome profiles. Moreover, we will study cell type composition in bulk tissue using novel multi-omics deconvolution analyses and identify immune subtypes with distinct immune activation or evasion mechanisms (Aim 1.4). Furthermore, we will comprehensively investigate kinase and transcription factor (TF) activities by leveraging publicly available data extracted and processed from many regulatory network databases (Aim 1.5). Together, Aims 1.2-1.5 will contribute a large collection of functionally related protein/PTM sets, co-expression network modules, immune signatures, and kinase/TF activity scores.
These features and feature sets will then be tested for association with disease phenotypes (Aim 1.6). For all analysis tasks in Aim 1, we will derive an integrated view of commonalities and differences across multiple tumor types via Pan-Cancer analyses.

Our Second Aim is to further develop methods, software, and web-based tools to optimize the data analyses of our PGDAC. We will develop novel statistical/computational tools; implement these methods as computationally efficient and user-friendly software; and construct an integrated data analysis pipeline (Aim 2.1). We also plan to develop a set of web-based services for querying, visualizing, and interpreting analysis results from CPTAC studies (Aim 2.2).

Our Third Aim is to nominate novel protein-based cancer biomarkers and drug targets for further investigation by targeted proteomics assays. We will first apply machine-learning-based prediction models to the features and feature sets from Aim 1 to identify protein biomarkers that predict disease outcome, treatment response, and therapeutically distinct disease subtypes (Aim 3.1). We will also query disease-related gene, protein, and PTM signatures against functional perturbation databases, such as the LINCS L1000 database, to prioritize small molecules and drugs that could be tested for attenuating tumor growth or improving treatment response (Aim 3.2).
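
The signature query described in Aim 3.2 can be illustrated with a minimal sketch. This is not the scoring method used by LINCS L1000 or by this project; it swaps in a plain Spearman rank correlation as a stand-in for a connectivity score, under the assumption that a compound whose perturbation signature is anti-correlated with the disease signature is a candidate for reversing it. All gene names, compound names, values, and function names below are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no ranking information
    return cov / (var_x * var_y) ** 0.5


def rank_reversing_compounds(disease_sig, perturbation_sigs, min_shared=3):
    """Sort compounds from most anti-correlated to most correlated.

    min_shared is an assumed cutoff: compounds sharing fewer genes
    with the disease signature are skipped.
    """
    scores = []
    for compound, sig in perturbation_sigs.items():
        shared = sorted(set(disease_sig) & set(sig))
        if len(shared) < min_shared:
            continue
        rho = spearman([disease_sig[g] for g in shared],
                       [sig[g] for g in shared])
        scores.append((compound, rho))
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical toy data: tumor-vs-normal log fold changes and
# compound-induced expression changes.
disease_sig = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.0, "GENE_D": -2.0}
perturbation_sigs = {
    "compound_1": {"GENE_A": -1.8, "GENE_B": -0.9, "GENE_C": 0.7, "GENE_D": 1.5},
    "compound_2": {"GENE_A": 1.2, "GENE_B": 0.4, "GENE_C": -0.3, "GENE_D": -1.1},
    "compound_3": {"GENE_A": 0.1, "GENE_B": -0.5},  # too few shared genes
}
ranking = rank_reversing_compounds(disease_sig, perturbation_sigs)
```

In this toy example, `compound_1` ranks first because its signature is the mirror image of the disease signature, while `compound_3` is dropped for sharing too few genes. A real query would need to handle signature size, replicate variability, and multiple testing, none of which this sketch addresses.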