M-ISIC: A Multimodal Open-Source International Skin Imaging Collaboration Informatics Platform for Automated Skin Cancer Detection - ABSTRACT
Skin cancer is the most common type of cancer in the United States. Early detection is critical because skin
cancers, especially melanoma, can be cured by surgery alone when caught early. As digital technology improves,
skin cancer detection, and especially automated skin cancer detection, is increasingly performed on
images, either in person or remotely via teledermatology. While artificial intelligence (AI) for skin cancer
detection exceeds human performance on static images, algorithm performance on representative, multimodal
data remains underdeveloped because data are collected piecemeal with different devices, without consistent
image acquisition standards or automated registration. A well-curated dataset of annotated skin images helps meet a
unique need beyond machine learning, as primary care clinicians also require expertly annotated images for
education and training. We will overcome the lack of imaging standards and the disparate data sources that
hamper dermatology imaging by developing an automated ingestion, organization, registration, and curation
pipeline to improve AI for skin cancer detection.
The International Skin Imaging Collaboration (ISIC) Archive has over 2,500 citations, 156,000 images, and 100
daily users, and has hosted 5 AI grand challenges with over 3,500 participants. The ISIC Archive is built upon
Girder, an open-source, NCI-supported, web-based data management platform. The Girder platform is
highly flexible and has been extended to multiple applications (e.g., pathology, radiology).
The flexibility of the Girder platform will enable us to address four major barriers that limit our ability
to efficiently ingest, host, and serve large amounts of multidimensional data at the scale of non-medical image
repositories (e.g., ImageNet): (1) the need for laborious expert data curation and quality assurance review for
protected health information, imaging artifacts, and incorrect labels (SA1.1); (2) limited metadata without
content-based features, which makes image retrieval cumbersome (SA1.2); (3) lack of multimodal viewing
capabilities (SA2); and (4) inadequate integration with existing AI and annotation software, preventing flexible,
hypothesis-driven experimentation (SA3).
The proposed informatics project, aimed at data ingestion, multimodal visualization, and organization through
machine learning and computer vision-based automation, builds on the initial success of the International Skin
Imaging Collaboration (ISIC) Archive and the Girder platform upon which it is built. It will enable scaling of the
Archive to millions of images, support multimodal experimentation with registered reflectance confocal
microscopy images, and nimbly facilitate AI and translational experimentation for improved skin cancer
detection.