M-ISIC: A Multimodal Open-Source International Skin Imaging Collaboration Informatics Platform for Automated Skin Cancer Detection

ABSTRACT

Skin cancer is the most common type of cancer in the United States. Early detection is critical because skin cancers, especially melanoma, can be cured by surgery alone when caught early. As digital technology improves, skin cancer detection, and especially automated skin cancer detection, is increasingly performed on images, either in person or remotely via teledermatology. While artificial intelligence (AI) for skin cancer detection exceeds human performance on static images, algorithm performance on representative, multimodal data remains underdeveloped because data are collected piecemeal with different devices, without consistent image acquisition standards or automated registration. A well-curated dataset of annotated skin images also meets a unique need beyond machine learning, as primary care clinicians require expertly annotated images for education and training. We will overcome the lack of imaging standards and the disparate data sources that hamper dermatology imaging by developing an automated ingestion, organization, registration, and curation pipeline to improve AI for skin cancer detection. The International Skin Imaging Collaboration (ISIC) Archive has over 2,500 citations, 156,000 images, 100 daily users, and 5 AI grand challenges with over 3,500 participants. The ISIC Archive is built upon Girder, an open-source, NCI-supported, web-based data management platform. The Girder platform is highly flexible and has been extended to multiple applications (e.g., pathology, radiology). This flexibility will enable us to address four major barriers that prevent us from efficiently ingesting, hosting, and serving large amounts of multidimensional data at the scale of non-medical image repositories (e.g.,
ImageNet): (1) the need for laborious expert data curation and quality assurance review for protected health information, imaging artifacts, and incorrect labels (SA1.1); (2) limited metadata without content-based features, which makes image retrieval cumbersome (SA1.2); (3) a lack of multimodal viewing capabilities (SA2); and (4) inadequate integration with existing AI and annotation software, which prevents flexible, hypothesis-driven experimentation (SA3). The proposed informatics projects, aimed at data ingestion, multimodal visualization, and organization through machine learning and computer vision-based automation, build on the initial success of the ISIC Archive and the Girder platform upon which it is built. They will enable scaling of the Archive to millions of images, enable multimodal experimentation with registered reflectance confocal microscopy images, and nimbly facilitate AI and translational experimentation for improved skin cancer detection.