Integrative Approaches to Study Cell-Type-Specific Protein Dysregulation in Human Diseases
Project Summary/Abstract
Complex diseases are driven by intricate interactions among the various cell types present at disease sites. Recent advances in single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics have enabled cell-type-specific characterization of molecular phenotypes at the RNA level. However, systematic profiling of protein-level dysregulation in different cell types remains a technical challenge. Addressing this unmet need is critical for developing and optimizing therapeutic strategies, as proteins constitute the majority of clinical biomarkers and druggable targets. In recent years, advanced mass spectrometry (MS)-based proteomics has been applied to primary bulk disease samples to detect genome-wide protein expression and modifications. Moreover, by simultaneously profiling the disease samples via next-generation sequencing (NGS), a new multiomics field, “proteogenomics”, has enabled a deeper understanding of biological regulation at different molecular levels. Our previous studies have demonstrated the value of proteogenomics data in understanding the functional consequences of genetic abnormalities and elucidating signaling pathway cascades. However, due to the moderate sensitivity of MS and the high read depth required by NGS, proteogenomics data are mostly generated from bulk disease samples and lack direct information about the constituent cell types. In this project, our goal is to develop innovative computational approaches to infer cell-type-specific protein phenotypes from MS-based proteogenomics data. First, for canonical proteins annotated in public databases, we will develop a data deconvolution pipeline to infer their cell-type-specific expression and active modifications. Second, we will create a proteogenomics pipeline to identify noncanonical peptides and proteins and map them to their cell type of origin.
Of note, our computational approaches will address the unique challenges of MS proteomic data quantification and leverage the integrative nature of proteogenomic data analysis. We will create two data portals, CellPathDb and cskAtlas, to host our methods and the results inferred from current proteogenomic datasets. These data portals will enable experimental biologists and clinical researchers to query cell-type-specific protein phenotypes interactively.