Project Summary:
Novel deep learning frameworks for predicting nucleosome-binding proteins
Nucleosome-binding proteins (NBPs) such as pioneer transcription factors play an important role in cell fate
changes during organogenesis, cell differentiation, and reprogramming. Their ability to bind nucleosomes is
encoded within their DNA-binding domains. However, experimental identification of NBPs is expensive,
time-consuming, and of low sensitivity. The proposed research aims to develop novel deep learning frameworks to
predict the NBPs. In Aim 1, we will develop a novel two-stage transfer learning framework. In the first stage, the
knowledge of the protein language model ProtT5 will be extracted as feature embeddings to represent the amino
acid sequences of the proteins. The information from the 1-D sequences will be combined with that from their
3-D structures, which will be processed by a state-of-the-art Graph Neural Network (GNN) for 3-D molecules. In
the second stage, the GNN model will first be initialized by training on annotated DNA-binding proteins (DBPs) and then
fine-tuned on an NBP dataset. In Aim 2, we will design and implement a novel transfer learning-continual learning
framework to predict NBPs in a species-specific manner. The transfer learning module will initialize a novel
Bayesian online inference model with the knowledge acquired from the DBP prediction model of Aim 1, encoded as
a prior for generative probabilistic neural networks. The continual learning module, based on probabilistic
inference, will continually update the learned neural network parameters with the DBPs of a given species
without forgetting previously acquired knowledge. Lastly, the trained DBP prediction model will be fine-tuned
to predict species-specific NBPs. Overall, this proof-of-concept study aims to gain technical
knowledge of the proposed frameworks and evaluate their feasibility for predicting nucleosome binders.
These frameworks, if successful, have the potential to broadly advance biomedical research by being
generalized to other types of intermolecular interactions, e.g., protein-RNA interactions and protein-protein
interactions.
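
The idea in Aim 2 of carrying knowledge forward through a prior can be illustrated with a toy example. The sketch below is not the proposed model: it assumes a single-weight linear model with Gaussian noise in place of a probabilistic neural network, and the names GaussianBelief and bayes_update, as well as all data values, are hypothetical. It shows only the principle that a posterior learned on one dataset (standing in for general DBPs) can serve as the prior for the next dataset (standing in for one species), so that earlier evidence is retained rather than overwritten.

```python
from dataclasses import dataclass


@dataclass
class GaussianBelief:
    """Gaussian belief over a single model weight w (hypothetical helper)."""
    mean: float       # current estimate of w
    precision: float  # 1 / variance; grows as evidence accumulates


def bayes_update(prior, xs, ys, noise_precision=1.0):
    """Conjugate Bayesian update for y = w * x + Gaussian noise.

    Returns the posterior over w given the prior and the new (x, y) pairs.
    """
    precision = prior.precision + noise_precision * sum(x * x for x in xs)
    weighted_sum = prior.precision * prior.mean + noise_precision * sum(
        x * y for x, y in zip(xs, ys)
    )
    return GaussianBelief(mean=weighted_sum / precision, precision=precision)


# Toy data, invented for illustration only.
source_x, source_y = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]   # stands in for general DBPs
target_x, target_y = [1.5, 2.5], [3.2, 4.8]             # stands in for one species

broad_prior = GaussianBelief(mean=0.0, precision=1e-2)

# Stage 1: learn from the source data.
after_source = bayes_update(broad_prior, source_x, source_y)

# Stage 2: the stage-1 posterior becomes the prior for the new data.
after_target = bayes_update(after_source, target_x, target_y)

# Reference: a single update on all data at once.
joint = bayes_update(broad_prior, source_x + target_x, source_y + target_y)

print(after_target)
print(joint)
```

In this conjugate setting the sequential result matches the joint update, which is the exact form of "no forgetting". For neural networks the posterior is intractable and must be approximated, so this equality would hold only approximately.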