m6A-suite: an informatics pipeline and resource for elucidating roles of m6A epitranscriptome in cancer - Project Summary
N6-methyladenosine (m6A) is the most abundant mRNA methylation in mammalian cells. Emerging evidence has linked m6A to phenotypes in many cancers, spurring a surge of research on m6A and cancer biology. However, the dysregulation of m6A effectors (writers, erasers, and readers) and the reprogramming of m6A sites are poorly characterized. How different modes of m6A regulation of gene expression mediate downstream cancer pathways and phenotypes remains largely unknown. We have developed several widely used informatics tools for m6A peak detection, differential m6A analysis, and functional prediction of m6A targets from MeRIP-seq m6A profiling data. Using these tools, we worked with cancer biology collaborators to reveal the reprogrammed viral and host m6A epitranscriptome in cells infected by the oncogenic virus KSHV, and discovered cross-talk among m6A writers, erasers, and readers that regulates cancer growth and progression. However, fast-moving m6A and cancer research poses many unmet informatics challenges. Among them, the ability to accurately identify single-base m6A sites and to predict key m6A regulatory mechanisms from profiling data is seriously lacking. In addition, a comprehensive database that catalogs, and enables queries of, the where, what, and how of m6A methylation and function in normal and cancer conditions is highly desirable. To address these challenges, we propose to develop m6A-Suite, an informatics toolbox, pipeline, and resource to facilitate the mechanistic study of m6A in cancer. A key obstacle to developing the tools in m6A-Suite is the lack of large, high-quality training datasets. Toward this end, we have collected 1,113 human and 680 mouse MeRIP-seq samples from cancer cell lines, tumors, and normal tissues and identified >4M m6A peaks.
In parallel, we have collected 194,060 single-base m6A sites in 9 cell lines and 3 tissues. We propose to leverage these data to construct the needed training datasets. Using these datasets, we will develop efficient and accurate tools for single-base m6A detection and quantification from MeRIP-seq and nanopore data (Aim 1), enable the prediction of m6A-mediated RNA decay and splicing (Aim 2), and establish a comprehensive, queryable m6A-KB knowledgebase that catalogs these predictions across an extensive collection of public MeRIP-seq and nanopore data from cancer and normal cells and tissues under diverse conditions (Aim 3). We will systematically test and evaluate these tools within this project and through many established and emerging collaborations inside and outside the ITCR consortium. We will make the tools and data freely available to the research community and continually seek feedback from collaborators and users for improvement. Given the emerging nature of m6A and cancer research, the addition of these tools to the ITCR program will positively impact this important, fast-growing area of cancer research.