Advancing Protein Isoform Analysis through an Integrated Computational Framework

PROJECT SUMMARY

Over 90% of human genes undergo alternative splicing, generating multiple transcripts, or isoforms, per gene that can have distinct functions. This highly regulated process is often disrupted in cancer, leading to the production of harmful protein isoforms that contribute to tumor growth, survival, metastasis, and immune evasion. Accurately identifying these aberrant isoforms is essential for understanding cancer biology and developing targeted therapies. While RNA sequencing has advanced our understanding of alternative splicing in cancer, the study of protein isoforms at the proteomic level is still in its infancy. Advances in mass spectrometry (MS)-based shotgun proteomics have enabled unbiased identification and quantification of the products of more than 10,000 protein-coding genes from biological samples. However, shotgun proteomics data analysis identifies peptides by searching MS spectra against a reference protein database, and most identified peptides map to multiple protein isoforms, so it remains challenging to accurately identify and quantify known protein isoforms in the reference database. Furthermore, novel isoforms absent from the reference database cannot be identified. Efforts have been made to address these challenges, such as the development of our recently published SEPepQuant algorithm, but integrating known and novel protein isoform identification and quantification into routine cancer research remains elusive. This integration faces two primary hurdles. Firstly, there are software-related obstacles in protein isoform identification and quantification, including the lack of user-friendly tools, insufficient benchmarking to guide software selection, difficulty in result interpretation, and compatibility issues with rapidly evolving MS technologies.
Secondly, even with advanced software, its use is confined to bioinformaticians, so most biologists and clinicians cannot directly gain insights into protein isoforms from the wealth of public cancer proteomic datasets. The overarching goal of this project is to overcome these hurdles by developing an integrated computational framework for shotgun proteomics-based protein isoform analysis and by facilitating the dissemination of protein isoform data through accessible platforms. Leveraging the complementary expertise and unique resources of our investigator team, we will achieve this goal through three specific aims: Aim 1) Further develop SEPepQuant to improve robustness, interpretation, compatibility, and proteogenomic integration; Aim 2) Create a unified framework for protein isoform quantification and conduct systematic benchmarking; and Aim 3) Democratize protein isoform analysis for ordinary biologists and clinicians. Through these aims, we will create new informatics technologies to advance protein isoform analysis in cancer research, enhance our understanding of protein isoform regulation, and facilitate the identification of dysregulated isoforms as potential biomarkers and therapeutic targets.