RNA molecules play fundamental roles in nearly all cellular processes at the level of gene expression and
regulation. Not surprisingly, emerging biomedical advances such as precision medicine and synthetic biology all
point to RNAs as central regulators and information carriers. Recently, realizing the potential of using RNA
to intervene in gene expression, scientists developed Onpattro, which in August 2018 became the first
FDA-approved RNA-based therapy.
Understanding RNA function and therapeutic applications requires knowledge of RNA structure. Unfortunately,
the number of known structures is currently a small fraction of what needs to be determined. This gap
has to be closed by computational methods. Furthermore, an RNA molecule is a highly charged polyanion, and
positively charged species such as metal ions bind to an RNA and form an integral part of its structure. Where and how
metal ions interact with an RNA can directly impact RNA structure and function as well as RNA-drug interactions.
Continuously supported by NIH for over 15 years, we have developed systematic computational tools for the
prediction of RNA structures, folding stability, kinetics, and metal ion effects. These tools have led to fruitful
applications in virology, microbiology, gene therapy, RNA biotechnology, and various RNA-based therapeutic designs.
However, despite over a decade of effort, many critical issues in computational RNA biology remain:
de novo prediction of non-Watson-Crick interactions, structure prediction for large RNAs, effective incorporation
of experimental data such as cryo-EM and NMR data into structure prediction, and modeling of metal ion effects.
In this grant, after 15 years of developing ab initio physics-based models, we propose to target the above
and other pressing issues using a fundamentally different approach: systematically developing data-driven
(such as deep-learning) or hybrid data-driven/physics-based simulation methods. The new approaches are motivated
by the increasing amount of experimental data and the pressing need for more efficient and reliable
computational tools for data interpretation, especially for structure determination experiments. We will use
experimental databases, such as the RNA-Puzzles database, PDB, EMDataBank, and BMRB, for large-scale benchmark
tests, and biochemical and NMR data collected by our well-established collaborators for in-depth and interactive
information on various experimental systems such as HCV genomic RNAs and HIV PBS systems. Our goal, if successfully
accomplished, will immediately impact experimental efforts including cryo-EM and
NMR-based structure determination, identification of metal ion sites, and rational design of RNA structures for
therapeutic applications.