Project Summary
Aberrant RNA base modifications have been correlated with the development of major diseases including breast
cancer, type 2 diabetes, obesity, and neurological disorders, each affecting millions of Americans. However,
these modifications are undetectable by current high-throughput RNA sequencing technologies, which do not
directly sequence RNAs, but instead sequence cDNAs that only contain the four canonical deoxynucleotides.
Other tools to sequence nucleobase modifications in RNA are usually tailored for a single specific modified
nucleotide and cannot provide positional information for modifications at single-base resolution. Thus, very few of the
over 160 identified RNA modifications have been studied. To better understand RNA with its rich modifications,
we have been developing a mass spectrometry (MS)-based 2-dimensional hydrophobic end-labeling sequencing
strategy (2-D HELS MS Seq) as: 1) a de novo and accurate method to directly sequence RNA and 2) a general
method to sequence all base modifications in any RNA type at single-base resolution. The method can currently
sequence purified or mixed samples of short synthetic RNAs and simultaneously identify, locate, and quantify
the frequency of a specific modification in a population. In this proposal, we focus on improving read length,
throughput, and sensitivity in order to sequence rare RNA modifications, quantify post-transcriptional base
modifications, and detect active isoforms in mixed cellular RNA samples. We propose to (a) sequence specific
and total cellular tRNAs (<100 nt) de novo by MS as proof-of-concept examples (Aim 1), (b) de novo sequence
complex endogenous RNA samples (up to 100 strands, 950 nt per run) (Aim 2), and (c) quantify genome-wide
post-transcriptional RNA modifications in metabolic disease models (Aim 3). This project is highly significant as
successful accomplishment of the proposed work will 1) bring the power of MS-based laddering technology to
RNA, thus providing a method, comparable to the analysis of peptide modifications in proteomics, that can reveal the
identity and position of various RNA modifications, 2) allow direct and de novo RNA sequencing without cDNA
synthesis, and 3) allow accurate reading of multiple base modifications at single-nucleotide resolution in one
experiment without prior knowledge of sequences and modifications, helping to address a long-standing unmet
need in the broad field of epitranscriptomics. Our tool will promote a better understanding of the functions of
post-transcriptional modifications and isoforms, including their correlations with human diseases; we will develop the
method into a gold standard for verifying other techniques for sequencing and annotating genome-wide base
modifications, thereby helping to build more accurate and inclusive reference epitranscriptomic databases.