ABSTRACT
This application validates a minimally invasive, multi-stage artificial intelligence (AI)-based cytologic, histologic and
epigenetic biomarker to identify oral premalignant lesions (OPL) with high risk of progression to oral cavity squamous cell
carcinoma (OSCC), using a massive existing data set and a prospective study in racially and socioeconomically diverse OPL
patients. OSCC patients suffer from a 5-year mortality rate of 40%, accounting for one death per hour. Up to 10% of the
U.S. population has oral lesions, of which a small proportion are high-risk OPLs that transform to OSCC. Major challenges
exist in monitoring and risk-stratifying these OPLs. While dysplasia grade is used to guide treatment, its prognostic value is low.
There is currently no reliable clinical, histologic or molecular marker to determine individual risk in patients with the same
dysplasia grade. The quality of OPL grading on hematoxylin and eosin (H&E)-stained slides depends on the availability of a
surgeon and pathologist, who are typically absent in resource-constrained locations. Currently, noninvasive sampling methods that
can be used in settings with restricted access to care have not been validated to replace tissue diagnosis. Herein we design a staged
approach to diagnosing and monitoring OPLs, using cytology, histology, and epigenomics in a step-wise fashion in order to
minimize diagnostic invasiveness. Our approach will automate and improve prognostication of OPL risk by using deep
learning. Our central hypothesis is that histologic and molecular patterns within OPLs can be risk-stratified using deep
learning to individualize prognosis in patients with the same apparent OPL grade. We test our hypothesis through a series
of scientific aims, which, taken together, create a paradigm shift in the management of OPLs by establishing a layered strategy
that escalates the complexity of the diagnostic test (from brush swabs to surgical biopsy) with escalating cancer risk. Our
study proceeds with the following three aims. 1) Train deep learning-based digital pathology models for oral premalignant
lesion progression risk prediction. We will use a longitudinal cohort with known cancer outcomes to train deep learning
models using cytology and histology, respectively, to predict risk of progression to OSCC. 2) Validate and merge cytology
and histology with epigenomic signatures to create the multi-stage, multi-modal PROSPECT score using multiple cohorts.
We will refine the digital cytology and histology biomarkers in a separate existing validation cohort with known cancer
outcomes. Next, we will use brush swabs to predict biologically relevant epigenomic alterations in the transition from OPL
to OSCC. We will create the PROSPECT score (Premalignant Oral Lesions Pathology and Epigenetic Risk Prediction Tool),
a risk score that combines the cytologic, histologic, and epigenetic scores sequentially with clinical information to
predict risk of cancer progression. 3) Test the PROSPECT score in a prospective, multi-institutional clinical study of OPL
patients enrolled in geographically and racially diverse populations. We will refine our PROSPECT score to perform
robustly in brush biopsies and tissues from a separate prospective cohort of OPL patients, who will be recruited during the
course of this study from four clinical sites whose patient populations are diverse in race, socioeconomic status, and
health disparity indices. Testing of the PROSPECT score in this prospective cohort will set the stage for a large-scale
clinical study using noninvasive brush swabs to monitor OPLs with higher accuracy than current clinical standards.