PROJECT SUMMARY
The processing of antigens through proteolytic degradation and the recognition of epitopes are central to the
body’s ability to combat pathogens, such as viruses, by discriminating self from non-self. As a result, there
has been substantial research effort aimed at determining the outcomes of these processes for novel
pathogens to enable epitope-driven vaccine design. There has also been strong interest, at the intersection of
immunology and personalized medicine, in identifying host-specific epitopes, as these hold considerable
promise for the treatment of allergies and cancer, where the distinction between self and non-self becomes
blurred. Computational methods have emerged as promising approaches for predicting epitopes
that elicit a robust immune response given genetic information for an antigen. This is a challenging task,
made harder by the uncertainty that arises from genetic variability between
pathogen strains as well as between individuals. By the same logic, animal
models often have limited predictive value for evaluating the immune response elicited by epitopes, since
sequence differences between a model species and humans can substantially change both
the peptides formed during antigen processing and the epitopes recognized by immune cell receptors.
Accordingly, there is an unmet need for computational tools that can predict the outcomes of antigen
processing and epitope recognition in a host-dependent fashion, where the models take as input both
antigen-specific and host-specific genetic data. We propose to develop computational tools in three related areas to
meet this need: i) Prediction of peptides formed through antigen processing; ii) Prediction of epitope
recognition by MHC molecules and T-cell receptors; and iii) Probabilistic analysis of epitopes most likely to
elicit an immune response. In the proposed work, molecular modeling and machine learning will be used to
develop accurate models of antigen processing and epitope binding to MHC molecules and T-cell receptors.
Molecular models will first allow us to identify key interactions between the antigen and immune system
proteins, which, when coupled with statistical data, will allow us to understand how mutations affect those
interactions. This statistical analysis of mutation effects will be applied to large, publicly available
datasets to capture their impact on antigen processing and epitope recognition, and the results will
ultimately be incorporated into machine learning models. The proposed probabilistic models will apply a
scenario-driven approach for capturing uncertainty in epitope generation and recognition. We will sample
potential antigen and human sequences based on known distributions of mutation prevalence to estimate the
likelihood that an identified epitope will be generated and elicit a robust immune response. The proposed
computational tools, if successful, could have a substantial impact on epitope-driven vaccine design,
including personalized cancer vaccines, and on the identification of allergy-related epitopes.
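
The scenario-driven sampling described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the function names, the toy sequences, the allele labels, and the prevalence values are all hypothetical, and the two predicate functions stand in for the antigen-processing and epitope-recognition models that the project would develop.

```python
import random


def sample_variant(variants, rng):
    """Draw one variant from (value, prevalence) pairs, weighted by prevalence."""
    values = [value for value, _ in variants]
    weights = [weight for _, weight in variants]
    return rng.choices(values, weights=weights, k=1)[0]


def estimate_response_probability(epitope, antigen_variants, host_alleles,
                                  is_generated, is_recognized,
                                  n_scenarios=10_000, seed=0):
    """Monte Carlo estimate of the chance that an epitope is both generated
    from a sampled antigen variant and recognized in a sampled host."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_scenarios):
        antigen = sample_variant(antigen_variants, rng)
        allele = sample_variant(host_alleles, rng)
        if is_generated(epitope, antigen) and is_recognized(epitope, allele):
            hits += 1
    return hits / n_scenarios


# Hypothetical placeholder for an antigen-processing model:
# the epitope counts as "generated" if it occurs intact in the antigen.
def toy_is_generated(epitope, antigen):
    return epitope in antigen


# Hypothetical placeholder for an epitope-recognition model:
# a made-up lookup of which host alleles present the epitope.
TOY_PRESENTING_ALLELES = {"allele_A"}


def toy_is_recognized(epitope, allele):
    return allele in TOY_PRESENTING_ALLELES


# Made-up sequences and prevalences, for illustration only.
toy_antigen_variants = [("AAAKLMNPQAAA", 0.7), ("AAAKLMDPQAAA", 0.3)]
toy_host_alleles = [("allele_A", 0.6), ("allele_B", 0.4)]

probability = estimate_response_probability(
    "KLMNPQ", toy_antigen_variants, toy_host_alleles,
    toy_is_generated, toy_is_recognized,
)
print(f"Estimated probability of a response: {probability:.3f}")
```

In this toy setup the two draws are independent, so the estimate should land near the product of the two prevalences (0.7 x 0.6). In practice the placeholder predicates would be replaced by trained models.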