PROJECT SUMMARY
An RNA sequence, together with all of its diverse modifications, constitutes the ‘true’ information content of the RNA. Defects in
RNA modifications account for >100 human diseases, such as breast cancer, type-2 diabetes and obesity,
affecting millions of Americans. Despite its significance, the true sequence of an RNA, i.e., the identity and location
of every nucleotide building block (modified or not) within a full-length RNA, remains a mystery, mainly
because no general method exists to directly sequence any nucleotide, especially modified nucleotides
(including unknown ones), at single-nucleotide resolution. No existing technology can sequence all modifications
simultaneously to reveal true RNA sequences at large scale or at the transcriptomic level.
What complicates RNA modification studies is that >170 modification types have been discovered, and
many modifications are not present at 100% stoichiometry at their RNA sites. Moreover, modifications are
undetectable by NGS-based technologies, which require converting RNA into cDNA that carries no
modification information. Existing mapping tools are limited to a few well-studied modifications and
usually analyze only one modification type at a time. Mass spectrometry (MS) is currently the only technique
that can characterize all RNA modifications; however, conventional MS methods lose information regarding the
location and co-occurrence of modified nucleotides.
To resolve these outstanding issues, we have recently developed a series of novel next-generation mass
spectrometry-based sequencing (NextGen MassSpec-Seq) approaches that can directly sequence tRNAs de novo
without cDNA conversion and can sequence and quantify all nucleotide modifications simultaneously. During
this project, we will further develop NextGen MassSpec-Seq to sequence tRNAs efficiently under different
cellular conditions, including disease states; make it scalable toward high throughput; and expand its
application to simultaneously sequence and quantitatively map all modifications on any RNA type and at the
transcriptomic level. Specifically, we propose to develop MS for large-scale de novo sequencing of full-length
tRNAs, together with all of their diverse nucleotide modifications (Aim 1); to empower MS to simultaneously
sequence and quantify multiple RNA modifications, allowing quantitative mapping at single-nucleotide resolution
and stoichiometric precision (Aim 2); and to scale up NextGen MassSpec-Seq and combine it with high-throughput
NGS for direct sequencing of diverse RNA modifications at the transcriptomic level (Aim 3). This approach will
address the long-standing question of how to reveal ‘true’ RNA sequences and provide a transformative tool for
studying RNA modifications, which will promote a better understanding of the functions of post-transcriptional
modifications and their correlations with RNA-related diseases and pandemics.