Prediction of nearest neighbor parameters for folding RNAs with modified nucleotides

Natural and synthetic RNAs play key roles in cellular function, biotechnology, and medicine. RNAs fold into
intricate structures, which often drive their functions, so determining RNA structure is fundamental to biology
and biotechnology. Computational thermodynamics-based secondary structure modeling (TSSM) is a popular,
low-cost, and rapid approach to structure prediction, which has enabled transcriptome-wide structure-function
studies and massive structure-based screens of synthetic RNA libraries. However, recent evidence suggests
that diverse post-transcriptional chemical modifications of nucleotides also exert a profound impact on
local and/or global structure, ultimately modulating an RNA’s stability, expression, or regulatory function. Such
modifications are widespread in all domains of life and represent a new and poorly understood layer of gene
regulation, which has been implicated in disease. Moreover, they are routinely introduced into RNA medicines
as a means of evading the innate immune response. Taken together, the wealth of natural modifications and
development of novel artificial ones, the growing interest in their mechanism, and their centrality to RNA medicine
underscore a pressing need to determine structures of RNAs with modified nucleotides rapidly and accurately.
However, TSSM methods cannot account for the effects of modifications due to a lack of parameters to estimate
their folding stabilities. They rely on the feature-rich Turner nearest-neighbor (NN) thermodynamic model, which
is parameterized by 294 free-energy change values derived for canonical bases from 802 costly and laborious
UV melting experiments. Given the diverse and rapidly expanding pool of modifications, it is impractical to repeat
such experiments for each type. The premise of this proposal is that NN parameters can be learned more
efficiently from alternative experiments, which are affordable, widely accessible, and high throughput.
Specifically, next-generation sequencing has transformed RNA structure probing (SP) into a routine massively
parallel experiment, which reports structural information about local nucleotide dynamics. SP is widely used to
gain insights into RNA structure and function from genome-wide studies and to constrain TSSM algorithms to
improve their predictions. However, unlike melting assays, the relationship between RNA folding stability and SP
measurements is highly nontrivial, and thus the problem of recovering the parameters from SP data is difficult.
The goal of this proposal is to develop novel algorithms and software to estimate NN parameters from
high-throughput SP data. We will design statistical inference methods that reconcile information from folding
algorithms and SP experiments and apply them to data for unmodified and modified RNAs to estimate new
parameters for modified nucleotides. Because the link between SP data and folding thermodynamics is complex, and
because fitting the Turner parameters from SP data has not previously been explored, we will assess the
feasibility, accuracy, performance, and computational efficiency of the developed methods. Validation
efforts will include comparing to experimentally derived values and evaluating predictions over held-out data.
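
As a minimal illustration of the nearest-neighbor idea described above, the following Python sketch scores a perfectly paired helix by summing one stacking term for each pair of adjacent base pairs. The energy values, the helper name helix_energy, and the symbol P for a modified nucleotide are hypothetical placeholders chosen for illustration only; they are not the published Turner parameters and do not represent the inference methods proposed here. The lookup fails when a stack involving the modified nucleotide has no entry, which is the parameter gap this proposal aims to fill.

```python
from typing import Dict, Tuple

# Key: (dinucleotide on the top strand read 5'->3',
#       dinucleotide on the bottom strand read 3'->5').
StackTable = Dict[Tuple[str, str], float]

# ILLUSTRATIVE placeholder free energies in kcal/mol.
# These are NOT the published Turner values.
STACK: StackTable = {
    ("GG", "CC"): -3.0,
    ("GC", "CG"): -3.5,
    ("CG", "GC"): -2.5,
    ("CC", "GG"): -3.0,
}


def helix_energy(top: str, bottom: str, stack: StackTable) -> float:
    """Sum stacking terms over adjacent base pairs of a perfect helix.

    `top` is written 5'->3' and `bottom` 3'->5', so top[i] pairs with
    bottom[i]. Only stacking terms are included; initiation, terminal,
    and loop terms of a full nearest-neighbor model are omitted.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    total = 0.0
    for i in range(len(top) - 1):
        key = (top[i:i + 2], bottom[i:i + 2])
        if key not in stack:
            # No parameter for this stack, e.g. it contains a modified
            # nucleotide that the table does not cover.
            raise KeyError(f"no stacking parameter for {key}")
        total += stack[key]
    return total
```

With this table, helix_energy("GGC", "CCG", STACK) adds the GG/CC and GC/CG terms, whereas helix_energy("GPC", "CAG", STACK) raises KeyError until entries for the stacks containing P are supplied. Each distinct modification would require its own set of such entries.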