PROJECT SUMMARY / ABSTRACT
The rapid advancement of artificial intelligence (AI) and machine learning (ML) has led to significant
breakthroughs in molecular structure modeling, particularly in accurately predicting protein structures.
However, the prediction of RNA tertiary structures remains challenging because of the limited number of
experimentally determined RNA 3D structures and the absence of AI/ML-ready data for training advanced algorithms. The recent Critical
Assessment of Protein Structure Prediction (CASP15) competition revealed that traditional motif-based
approaches outperform deep-learning-driven methods in RNA 3D structure modeling. Nevertheless, traditional
methods struggle when applied to RNA molecules not well-represented in their template libraries. To overcome
this limitation, there is a need to advance ML-driven RNA structure prediction methods that can effectively
capture the relationship between nucleotides and structural motifs using extensive RNA sequence data. The
integration of RNA motif-based features with advanced AI/ML algorithms shows promise in enhancing RNA
structural analysis and prediction accuracy. To support this advancement, this proposal aims to develop an
automated RNA motif structure parsing pipeline that generates motif-based feature datasets for AI/ML-driven
RNA structural analysis. These datasets will facilitate the training of cutting-edge ML algorithms and
enable diverse RNA structure analysis applications.
Specific objectives are:
Aim 1: To develop an automated motif-based feature generation framework for improved RNA structure
prediction with machine learning.
Aim 2: To develop open-source computational workflows for RNA structure analysis using the AI/ML-ready
features.
Aim 3: To improve the modeling of sequence-structure relationships in full-length RNA folding by combining
RNA motif features with open-source AI/ML algorithms.
The proposed AI/ML-ready features will facilitate various computational workflows for RNA structural analysis,
including RNA motif clustering, the identification of RNA motif 3D interactions, and cryo-EM modeling for 3D
structure prediction. Additionally, this proposal will provide preprocessed datasets and ML pipelines on
publicly accessible platforms to encourage community engagement and collaborative research efforts. These initiatives will
strengthen the research and education environment at Saint Louis University, promoting interdisciplinary
collaboration and preparing students from diverse backgrounds to tackle future challenges in intelligent RNA
structure analysis. This research aligns with the mission of the NIH NIGMS and the objectives of the AREA
grant program by addressing the need for improved RNA structure prediction methods through AI/ML-driven
approaches, thereby offering valuable resources for research and education in this field.