Oral and oropharyngeal squamous cell carcinomas (OSCC) together rank as the sixth most common cancer
worldwide, accounting for 400,000 new cancer cases each year. Two-thirds of these cancers occur in low- and
middle-income countries (LMICs). While the 5-year survival rate in the U.S. is 62%, the survival rate in the
developing world is only 10-40%, with a cure rate of around 30%. To meet the need for technologies that enable
comprehensive oral cancer screening and diagnosis in low-resource settings (LRS), in the parent R01DE030682
project titled “Multimodal Intraoral Imaging System for Oral Cancer Detection and Diagnosis in Low Resource
Setting”, we have formed an interdisciplinary team with complementary expertise in optical imaging, oncology,
deep learning, technology translation, and commercialization to develop, validate, and clinically translate a
multimodal intraoral imaging system for oral cancer detection and diagnosis. We will achieve the project objective
through three Aims: (1) develop a portable, semi-flexible, and compact multimodal intraoral imaging system; (2)
evaluate the clinical feasibility of the prototyped intraoral imaging system and develop deep learning based image
processing algorithms for early detection, diagnosis, and mapping of oral dysplastic and malignant lesions; and
(3) validate the capability of the prototyped intraoral imaging system for diagnosing oral dysplasia and malignant
lesions.
In our UH3CA239682 project titled “Low-cost Mobile Oral Cancer Screening for Low Resource Setting”, we
have screened ~7,000 high-risk individuals for oral cancer, acquiring at least two pairs of dual-modal images
(white light and autofluorescence) from each patient and collecting more than 28,000 de-identified images and
related information. To our knowledge, this is the largest image dataset of its kind for oral cancer. With this
Administrative Supplement, we will make the image data AI/ML-ready by improving data compatibility with
AI/ML tools, cleaning the dataset, balancing the data, reducing uncertainty, improving the interoperability of the
data with ontologies, and improving the trustworthiness of AI/ML models using pixel-level annotations. We will
also demonstrate the use of the transformed data in AI/ML applications through (1) multi-class oral cancer
classification using the transformed multi-modal data and (2) an interpretable and trustworthy AI model using
image-level labels and pixel-level annotations.
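As a minimal illustrative sketch (not part of the proposed system design), the code below shows one way a dual-modal, multi-class classifier could be structured in PyTorch; the choice of ResNet-18 backbones, late fusion by feature concatenation, and three diagnostic classes are assumptions made only for demonstration.

import torch
import torch.nn as nn
from torchvision import models


class DualModalClassifier(nn.Module):
    """Illustrative late-fusion classifier for paired white-light and
    autofluorescence images (architecture details are assumptions)."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        # One backbone per modality; pretrained weights could be loaded in practice.
        self.wl_branch = models.resnet18(weights=None)
        self.af_branch = models.resnet18(weights=None)
        feat_dim = self.wl_branch.fc.in_features
        # Strip the original classification heads; keep per-modality feature vectors.
        self.wl_branch.fc = nn.Identity()
        self.af_branch.fc = nn.Identity()
        # Late fusion: concatenate features from both modalities, then classify.
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, white_light: torch.Tensor, autofluorescence: torch.Tensor) -> torch.Tensor:
        wl_feat = self.wl_branch(white_light)
        af_feat = self.af_branch(autofluorescence)
        fused = torch.cat([wl_feat, af_feat], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = DualModalClassifier(num_classes=3)
    wl = torch.randn(2, 3, 224, 224)   # batch of white-light images
    af = torch.randn(2, 3, 224, 224)   # paired autofluorescence images
    logits = model(wl, af)
    print(logits.shape)                # torch.Size([2, 3])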
The image data and machine learning models will be available through The University of Arizona Research
Data Repository (ReDATA). Completion of this project will accelerate the development of AI/ML-based
techniques for early oral cancer detection in low-resource settings, reducing morbidity and mortality. It will also
make the data FAIR (Findable, Accessible, Interoperable, and Reusable), with high impact on open science,
contributing to the NIH vision of a modernized and integrated biomedical data ecosystem. The parent R01 project
will directly benefit from this dataset and the developed AI/ML algorithms, as deep learning segmentation of the
dual-modal images will be used to locate suspicious regions for optical coherence tomography (OCT) imaging.
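As a brief illustration of the last point (assumptions only, not the project's implementation), the sketch below shows how a segmentation network's per-pixel lesion probability map could be post-processed into candidate regions that guide where OCT imaging is performed; the threshold, minimum area, and use of scipy.ndimage connected components are hypothetical choices, and the segmentation model producing prob_map is not shown.

import numpy as np
from scipy import ndimage


def suspicious_regions(prob_map: np.ndarray, threshold: float = 0.5, min_area: int = 100):
    """Threshold a per-pixel lesion probability map and return bounding boxes
    (row_start, row_stop, col_start, col_stop) of connected suspicious regions."""
    mask = prob_map >= threshold
    labeled, _ = ndimage.label(mask)
    boxes = []
    for sl in ndimage.find_objects(labeled):
        if sl is None:
            continue
        rows, cols = sl
        area = (rows.stop - rows.start) * (cols.stop - cols.start)
        if area >= min_area:  # ignore tiny, likely spurious detections
            boxes.append((rows.start, rows.stop, cols.start, cols.stop))
    return boxes


if __name__ == "__main__":
    # Synthetic probability map standing in for the segmentation network output.
    prob_map = np.zeros((256, 256))
    prob_map[60:120, 80:150] = 0.9
    print(suspicious_regions(prob_map))  # [(60, 120, 80, 150)]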