PROJECT SUMMARY/ABSTRACT
Human papillomavirus-positive (HPV+) oral cancer (OC), which accounts for over 70% of oropharyngeal cancer
cases in North America and Europe, has been found to be more aggressive, with a higher tendency to metastasize,
than HPV-negative OC. This aggressiveness is believed to be associated with the oncogenic mechanisms
triggered by HPV infection. HPV encodes two potent oncogenes, E6 and E7, that inactivate the key tumor
suppressors p53 and pRb, respectively, and subsequently alter gene expression profiles in oral
epithelial cells. To identify the molecular mechanisms of HPV oncogenesis, numerous studies have compared
the (epi)genomic profiles of HPV+ OC to those of normal oral epithelium, HPV-negative OC, or other cancer types. These
studies have generated high-throughput sequencing datasets using different methods (transcriptomic, genomic,
and epigenomic) and cellular conditions (normal, virus-infected, and cancerous). However, these datasets have
not been fully explored because of the lack of a unified analysis platform to interrogate them efficiently, especially when
heterogeneity and batch effects are high across studies. We propose to leverage our data science experience
as well as close wet-lab collaborations to perform an integrative analysis that identifies HPV-specific biomarkers in
HPV+ OC. We will integrate (epi)genomic next-generation sequencing datasets from 11 selected studies
(with more to be added as they become available), involving 3 data types, 13 cell lines, 3 viral infection stages, and 2
anatomically similar sites. The core of the analysis is to remove potential batch effects and biases from the
integrated datasets and to set proper controls for nominating oncogenic biomarkers. To validate the findings, we
propose both dry-lab and wet-lab experiments to evaluate candidate biomarkers. Insights from the proposed study
could advance our understanding of oral biology and potentially translate to novel therapeutics for HPV+ OC.