ABSTRACT
Recent years have witnessed a dramatic rise in interest in cancer epitopes in general, and in neoepitopes
that encompass mutations arising in a given tumor in particular. Current lines of research examine how the
epitope load in a given tumor relates to the success of checkpoint blockade treatments, and how to utilize
epitope-based vaccines and adoptive transfer of epitope-specific T cells for personalized therapies. For these
purposes, neoepitopes that are recognized recurrently across different individuals are of particular interest, a focus
that has also rekindled interest in epitopes identified in classic tumor-associated antigens. Along with the interest in
cancer epitopes, there is also interest in the TCRs and BCRs that specifically recognize them, as these receptors
could be used in therapeutic approaches and can aid basic studies that infer the specificity of T cells
or B cells characterized in single-cell sequencing data. This resurgence of interest in epitopes has created a
need to catalog all epitope data, together with their biological, immunological, and clinical contexts, and to make
them accessible to the scientific community. The ultimate goal is to come “full circle” and link epitope recognition and
immunological readouts to clinical outcomes and treatment strategies alike. In parallel, there is an urgent need
to develop resources that provide access to epitope prediction and analysis tools and that objectively evaluate
their performance in the relevant biological, immunological, and clinical contexts.
Recent years have also seen the publication of multiple original methodologies reporting sometimes
impressive gains in cancer epitope prediction. However, several of these studies were difficult to
evaluate because the methodologies and/or datasets were not fully available in a readily executable
format. As a result, their performance could not be properly benchmarked on independent datasets. Such
benchmarking is further hindered because it requires assembling new independent datasets of sufficient
size and diversity. To overcome these information technology challenges, we propose to design
and implement the Cancer Epitope Database and Analysis Resource (CEDAR), which will provide a freely
accessible, comprehensive collection of cancer epitope and receptor data curated from the literature, along with
easy-to-use epitope and TCR/BCR target prediction and analysis tools. As the cancer epitope data are
curated, they will be used as a transparent benchmark of how well prediction tools perform, and also to develop
new prediction tools for the analysis resource component of CEDAR. CEDAR will leverage our expertise from
developing the Immune Epitope Database and Analysis Resource (IEDB), which is fully operational and widely
used by researchers globally. CEDAR will directly complement other projects currently funded through the NIH
ITCR program that provide resources and tools related to cancer omics data. Finally, we will conduct outreach
activities to guide improvements to CEDAR's functionality, user interfaces, and interoperability with other ITCR tools,
and to promote the use of CEDAR in cancer research.