HistoTools: A suite of digital pathology tools for quality control, annotation and dataset identification

ABSTRACT: Roughly 40% of the US population will be diagnosed with some form of cancer in their lifetime. In
a majority of these cases, a definitive cancer diagnosis is only possible via histopathologic confirmation using a
tissue slide. Increasingly, these slides are being digitally scanned as high-resolution images for use in both
clinical and research digital pathology (DP) workflows. Our group has been pioneering the use of deep learning
(DL), a form of machine learning, for segmentation, detection, and classification of various cancers using digital
pathology images. DL learns features and their associated weights from large datasets to maximally
discriminate between user-labeled classes (e.g., cancer vs non-cancer, nuclei vs non-nuclei), a paradigm known
as “learn from data”. Unfortunately, this paradigm makes DL especially sensitive to low-quality slides, noise
induced by small errors in the manual user labeling process, and general dataset heterogeneity. Because many
groups do not deliberately account for these problems, they find that successful use of DL
technologies relies heavily on explicitly addressing challenges associated with (a) carefully curating high-quality
slides without preparation or scanning artifacts, (b) obtaining a large, precise collection of annotations
delineating objects of interest, and (c) selecting diverse datasets to ensure robust classifier performance when
clinically deploying the model. To address these challenges, we propose HistoTools, a suite of three modules
or “Apps”: (1) HistoQC examines slides for artifacts and computes metrics associated with slide presentation
characteristics (e.g., stain intensity, compression levels), helping to quantify ranges of acceptable
characteristics for downstream algorithmic evaluation. (2) HistoAnno drastically improves the efficiency of
annotation efforts using a combined active learning and deep learning approach to ensure experts focus only
on regions that are important for classifier improvement. (3) HistoFinder aids in selecting suitable training
and test cohorts to guarantee that various tissue level characteristics are well balanced, leading to increased
reproducibility. Our team already has working prototypes of HistoQC (100% concordance with a pathologist,
evaluated on n>1200 slides) and HistoAnno (30% efficiency improvement during annotation tasks). In this U01,
we seek to further develop and evaluate HistoTools in the context of enhancing two companion diagnostic
(CDx) assays being developed in our group. First, we will use HistoTools to perform quality control and annotate nuclei,
tubules, and mitoses for improving our CDx classifier for predicting recurrence in breast cancers using a cohort
of n>900 patients from the completed trial ECOG 2197. Second, HistoTools will be employed for quality control
and identification of tumor-infiltrating lymphocytes and cancer nuclei towards improving our CDx classifier for
predicting response to immunotherapy in lung cancer using n>700 patients from the completed clinical trials
CheckMate 017 and 057. These tools will build on our existing open-source tool repository to aid in real-time
feedback and dissemination throughout the ITCR and cancer research community.
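
As an illustration of the active learning idea behind HistoAnno, the sketch below ranks image regions by the uncertainty of a classifier's predictions so that an expert would be shown the least certain regions first. This is only a minimal sketch of generic uncertainty sampling, not the HistoAnno implementation; the function names, the region ids, and the example probabilities are all hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a binary prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def rank_regions_for_annotation(predicted_probs, budget):
    """Return the ids of the `budget` regions with the most uncertain predictions.

    predicted_probs maps a (hypothetical) region id to the model's predicted
    probability of the positive class, e.g. "cancer".
    """
    ranked = sorted(
        predicted_probs,
        key=lambda region_id: binary_entropy(predicted_probs[region_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical model outputs for four image tiles.
probs = {"tile_a": 0.98, "tile_b": 0.52, "tile_c": 0.10, "tile_d": 0.35}
print(rank_regions_for_annotation(probs, budget=2))  # ['tile_b', 'tile_d']
```

In a full annotate-and-retrain loop, the selected regions would be labeled by the expert, added to the training set, and the model retrained before the next ranking round; confidently classified regions such as tile_a would not need expert attention.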