Structure-Function-Aware Large Protein Language Models for Enhanced Biomedical Applications

Large protein language models have shown their foundational role in biomedical research. However, two challenges block their broad application: (a) the absence of critical knowledge about protein structures and functions in the models, and (b) the lack of efficient approaches to adapt a trained protein language model. To address these challenges, we propose to develop protein language models with knowledge of protein structures and functions, together with adaptation methods that provide accurate predictions of protein properties. The goal is to develop and validate structure-function-aware large protein language models (SF-PLM) that can be adapted to challenging biomedical research tasks through few-shot learning. We hypothesize that (a) multi-view contrastive learning can fuse 2D/3D structural information into 1D representations, (b) reinforcement learning can align a large protein language model with related function annotations, and (c) prompt tuning can realize a few-shot learning process that adapts the trained models to specific biomedical tasks. Guided by these hypotheses, we propose three Specific Aims.

Aim 1: Develop large protein language models aware of 2D and 3D structures using multi-view contrastive learning. We will develop encoders for protein 1D, 2D, and 3D structures; optimize the model training procedure and contrastive loss functions; and validate and select the developed models on structure-oriented downstream tasks.

Aim 2: Develop a reinforcement learning-based method to align knowledge of protein functions with the structure-aware large protein language models. We will develop an initial policy model, then develop the reward model and proximal policy optimization to align the trained large protein language models, and finally validate and select the aligned models.

Aim 3: Develop prompt technologies and tools to adapt structure-function-aware large protein language models for downstream tasks. We will develop prompt tuning to adapt the trained protein language models for antimicrobial peptide design and for predicting the targets and phosphorylation strengths of polo-like kinase 1 (PLK1), a kinase overexpressed in cancer cells. We will build utilities that enable community use of the prompt tuning tools.

The success of the proposed research will lead to (a) novel large protein language models aware of structures and functions, (b) prompt-based efficient adaptation of trained large protein language models for downstream tasks, (c) several novel antimicrobial peptides, (d) a list of predicted substrates of PLK1 and their phosphorylation strengths, and (e) a library of Python code that enables the development of pre-trained protein language models and efficient prompt tuning. These outcomes will provide and validate fundamental deep learning tools for biomedical research. Outcomes (c) and (d) will further advance biomedical research on bacterial resistance and cancer treatment.
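The multi-view contrastive fusion in Aim 1 can be illustrated with a toy sketch. The NumPy snippet below implements an InfoNCE-style contrastive loss that pulls together embeddings of the same protein from two views (e.g., its 1D sequence and a 2D/3D structure view) and pushes apart mismatched pairs. The embedding dimension, batch size, and temperature here are illustrative assumptions, not the proposal's actual encoders or hyperparameters.

```python
import numpy as np

def info_nce(z_seq, z_struct, temperature=0.1):
    """InfoNCE contrastive loss: row i of each view is the same protein
    (a positive pair); all other rows in the batch serve as negatives."""
    # L2-normalize so the dot product is cosine similarity
    z_seq = z_seq / np.linalg.norm(z_seq, axis=1, keepdims=True)
    z_struct = z_struct / np.linalg.norm(z_struct, axis=1, keepdims=True)
    logits = z_seq @ z_struct.T / temperature  # (batch, batch) similarities
    # log-softmax over each row; the diagonal holds the positive pairs
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z_1d = rng.normal(size=(8, 32))                # toy 1D-sequence embeddings
z_3d = z_1d + 0.05 * rng.normal(size=(8, 32))  # well-aligned "structure view"
loss_aligned = info_nce(z_1d, z_3d)
loss_random = info_nce(z_1d, rng.normal(size=(8, 32)))
```

Minimizing this loss drives the sequence encoder toward representations that agree with the structural views, which is the sense in which 2D/3D information is "fused" into the 1D model: aligned view pairs yield a much lower loss than unrelated embeddings.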
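The alignment step in Aim 2 rests on proximal policy optimization (PPO). As a minimal sketch, the clipped surrogate objective below, in NumPy, shows the core mechanism: the clip keeps the updated policy from drifting too far from the policy that generated the data. The log-probabilities and advantages are made-up toy numbers; in the proposed work the advantages would derive from a reward model scoring agreement with function annotations.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized)."""
    ratio = np.exp(logp_new - logp_old)  # pi_new / pi_old per action
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # take the pessimistic (smaller) of the two, per PPO's clipped objective
    return np.mean(np.minimum(unclipped, clipped))

# toy log-probs under the data-collecting and updated policies
logp_old = np.array([-1.0, -2.0, -0.5])
logp_new = np.array([-0.5, -2.5, -0.4])
advantages = np.array([1.0, -0.5, 2.0])  # stand-in reward-model advantages
objective = ppo_clip_objective(logp_new, logp_old, advantages)
```

Because the objective is clipped on both sides, gradient ascent on it cannot be rewarded for pushing a probability ratio beyond 1 ± eps, which is what makes PPO a comparatively stable choice for aligning a large pre-trained policy model.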
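Prompt tuning, the adaptation mechanism in Aim 3, trains only a small set of soft-prompt vectors prepended to the frozen model's input embeddings. The sketch below shows the idea on a deliberately tiny one-layer "frozen model" in NumPy: gradient descent updates the prompt alone to steer the frozen model's output toward a target. All names, dimensions, and the scalar objective are hypothetical stand-ins for a real protein language model and task loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_prompt, seq_len, vocab = 16, 4, 10, 20

# Frozen pieces of a toy "pre-trained model" (stand-ins for a real PLM)
embed_table = rng.normal(size=(vocab, d_model))  # token embedding table, frozen
w_head = rng.normal(size=(d_model,))             # scalar readout head, frozen
tokens = rng.integers(0, vocab, size=seq_len)    # one toy input sequence

# The ONLY trainable parameters in prompt tuning: a few soft-prompt vectors
soft_prompt = np.zeros((n_prompt, d_model))

def forward(prompt):
    """Frozen model applied to [soft prompt; token embeddings]."""
    x = np.concatenate([prompt, embed_table[tokens]], axis=0)
    return np.tanh(x @ w_head).mean()

def prompt_grad(prompt, target):
    """Analytic gradient of (forward - target)^2 w.r.t. the prompt ONLY;
    the frozen embedding table and head receive no updates."""
    x = np.concatenate([prompt, embed_table[tokens]], axis=0)
    pre = x @ w_head
    err = np.tanh(pre).mean() - target
    dpre = (1.0 - np.tanh(pre) ** 2) / pre.shape[0]
    return 2.0 * err * dpre[:n_prompt, None] * w_head[None, :]

target = forward(soft_prompt) + 0.1  # steer the frozen model's output higher
loss_before = (forward(soft_prompt) - target) ** 2
for _ in range(500):
    soft_prompt -= 0.1 * prompt_grad(soft_prompt, target)
loss_after = (forward(soft_prompt) - target) ** 2
```

Because only `n_prompt * d_model` parameters are trained while the backbone stays fixed, this style of adaptation needs far fewer labeled examples and far less compute than full fine-tuning, which is what makes it suitable for the few-shot biomedical tasks targeted in Aim 3.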