PROJECT SUMMARY
The adaptive immune system is responsible for the specific recognition and elimination of antigens originating
from infection and disease. It recognizes antigens via an immense array of antigen-binding antibodies (B-cell
receptors, BCRs) and T-cell receptors (TCRs), collectively known as the immune repertoire. Reflecting the
enormous breadth of epitopes they recognize, immune repertoires are extremely diverse and dynamic. Advances
in immune repertoire sequencing (Rep-seq) enabled by next-generation sequencing have driven quantitative,
molecular-level profiling of immune repertoires, revealing the high-dimensional complexity of the immune
receptor sequence landscape. However, current analysis tools cannot track and examine the dynamic nature of
the repertoire across serial time points or correlate repertoire changes with clinical outcomes. We propose to
use network analysis, formulate a novel ensemble feature selection approach, and apply other advanced machine
learning and statistical techniques (e.g., Bayesian nonparametric and shrinkage estimation methods) to
interrogate and measure immune repertoire architecture longitudinally and in a clinical context. Network
analysis is a powerful approach for identifying TCRs that share antigen specificity and highly mutated BCRs,
which can help to develop new or improve existing immunotherapeutics and
immunodiagnostics. To integrate gene expression and single-cell Rep-seq (scRep-seq) data, we propose to apply
a multitable mixed-membership approach to construct a network that increases the resolution of T and B cell
clusters. In addition, we will assess the importance of shared clusters by introducing a Bayes factor that
incorporates both the clonal generation probability and the observed clonal abundance in the data. Because B
and T cell responses develop in parallel and influence one another, we will further study how BCR and TCR
network properties interact, in addition to
assessing each response individually. We will apply the proposed methods to multiple studies whose diverse and
rich data will demonstrate the flexibility and power of the proposed tools. These studies are unique and
generalizable because they include three cancer types, ranging from immunogenic to non-immunogenic, in both
metastatic and localized settings and treated with different immunotherapeutic modalities. The proposed
methods can also be used to study immune responses to diseases other than cancer, such as infection with
respiratory coronaviruses, including SARS-CoV-2. First, we will
investigate the landscape of bulk Rep-seq changes over serial time points in prostate cancer patients who
received sipuleucel-T and in COVID-19 patients. We will develop prognostic and predictive models that relate
network properties to clinical outcomes and characteristics in durvalumab-treated lung cancer patients, to
elucidate the clinically prognostic features of the network, as well as classifiers that distinguish
SARS-CoV-2-infected patients from healthy donors.
Moreover, leveraging the unique features of single-cell RNA sequencing, we will classify immune cells and study
T and B cell responses to immunotherapy (a CD40 agonist antibody) in patients with esophageal and
gastroesophageal junction cancer. Furthermore, we will develop bioinformatics software that implements the
proposed methods and techniques, tackles the complexity of immunosequencing data in a translational fashion,
and provides a comprehensive platform with user-friendly visualization tools.
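As a minimal illustration of the kind of repertoire network analysis described above, the Python sketch below builds a sequence-similarity network in which nodes are unique CDR3 amino-acid sequences and edges connect equal-length sequences that differ at no more than one position (Hamming distance 1), a common choice for grouping TCRs that may share antigen specificity. Connected components then serve as clusters, and node degree as a simple network property. The distance threshold, the helper names (`hamming`, `build_network`, `clusters`), and the toy sequences are hypothetical and chosen for illustration only; this is a sketch under those assumptions, not the project's actual method.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_network(seqs, max_dist=1):
    """Hypothetical helper: adjacency sets linking unique equal-length CDR3s
    that differ at no more than max_dist positions."""
    unique = sorted(set(seqs))
    adj = {s: set() for s in unique}
    by_len = defaultdict(list)
    for s in unique:
        by_len[len(s)].append(s)
    # Hamming distance is only defined for equal lengths, so compare within length groups.
    for group in by_len.values():
        for a, b in combinations(group, 2):
            if hamming(a, b) <= max_dist:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def clusters(adj):
    """Hypothetical helper: connected components via iterative depth-first
    search, returned largest first."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(sorted(comp))
    return sorted(comps, key=len, reverse=True)


# Toy CDR3 sequences (illustrative only, not study data).
cdr3s = ["CASSLGQGYEQYF", "CASSLGQGFEQYF", "CASSLGRGFEQYF",
         "CASSPDRGNTEAFF", "CSARDRTGNGYTF"]
net = build_network(cdr3s)
comps = clusters(net)
degrees = {s: len(nbs) for s, nbs in net.items()}  # node degree per sequence
print(len(comps), [len(c) for c in comps])  # prints "3 [3, 1, 1]"
```

Here the first three sequences form a chain (each neighboring pair differs at one position), so they collapse into one cluster while the other two remain singletons. The all-pairs comparison grows quadratically with the number of unique sequences, so large repertoires would typically need an indexing or approximate-matching strategy, and Levenshtein distance can be substituted where insertions and deletions should count as neighbors.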