Machine learning for identifying antigen-antibody interactions from massive sequencing data

Project Summary

We aim to leverage machine learning to predict antigen-antibody interactions from massive sequencing data in order to elucidate the roles of humoral immunity in various biological contexts, discover antibodies of therapeutic value, and develop diagnostic tools for immune-related diseases. Existing experimental methods for profiling antibody-antigen interactions are costly, time-consuming, and of low to moderate throughput, which motivates AI-driven prediction. However, existing bioinformatics tools mainly optimize antibodies for a given antigen target, whereas de novo detection of antibody-antigen interactions requires a different approach. Fortunately, recent scientific developments now make fast, cheap, and accurate detection of antigen-antibody interactions feasible: (1) high-throughput sequencing technologies such as LIBRA-seq, Beam-B, and TRAPnSeq have generated abundant antigen-antibody pairing data for training deep learning models; (2) protein structure prediction models such as AlphaFold3, RoseTTAFold, and ESMFold provide enabling tools; and (3) multiplexed scRNA-seq/scBCR-seq data add a complementary layer of evidence that can enhance the prediction of antigen-antibody interactions. We build on these new technologies and data, as well as on our prior achievements and unique expertise in the relevant fields. We plan to develop and validate deep learning models that accurately identify antibody-antigen interactions, and the binding epitopes on antigens, from massive sequencing data (Aim 1). In a complementary approach, we will integrate B cell transcriptomes with BCR (B cell receptor, i.e., antibody) sequences, distilling gene expression information to improve the accuracy of antibody-antigen predictions (Aim 2).
We seek to identify PD-L1-based PD-1 agonistic antibodies for autoimmune diseases, using both the tools we previously developed and those to be developed in this grant (Aim 3). This PD-L1 antibody case study is significant and innovative in its own right, and it will also help validate and tune the tools developed in Aims 1 and 2. The multidisciplinary team includes experts in bioinformatics, computer science, and immunology, each responsible for different aspects of the project. In particular, our prior research demonstrates expertise in deep learning of protein structures, development of novel therapeutic antibodies, and methods development for single-cell sequencing data analysis, all of which are essential to the success of this proposal. The expected impact of this research includes providing powerful tools for understanding the roles of B cells and BCRs in immune-related diseases, reducing the time and cost of identifying therapeutic antibodies (e.g., PD-L1-based PD-1 agonistic antibodies), aiding vaccine development, and deriving BCR-based diagnostic and prognostic biomarkers for diseases involving humoral immunity. We will disseminate our tools through workshops and web portals. Overall, the project addresses major unsolved problems in B cell informatics that have only recently become solvable owing to scientific advances and our latest work in this field.