Albert S. Chiou, M.D., M.B.A., Jesutofunmi A. Omiye, M.D., M.S., Haiwen Gui, B.A., Susan M. Swetter, M.D., Justin M. Ko, M.D., M.B.A., Brian Gastman, M.D., Joshua Arbesman, M.D., Zhuo Ran Cai, M.D., Olivier Gevaert, Ph.D., Christoph Sadée, M.Res., Veronica M. Rotemberg, M.D., Ph.D., Seung Seog Han, M.D., Ph.D., Philipp Tschandl, M.D., Ph.D., Meghan Dickman, M.D., Elizabeth Bailey, M.D., M.P.H., Gordon Bae, M.D., Philip Bailin, M.D., Jennifer Boldrick, M.D., Kiana Yekrang, B.S., Peter Caroline, B.S., Jackson Hanna, B.S., Nicholas R. Kurtansky, B.S., Jochen Weber, B.S., Niki A. See, B.S., Michelle Phung, M.S., Marianna Gallegos, B.S., Roxana Daneshjou, M.D., Ph.D., and Roberto A. Novoa, M.D.
Background: With an estimated 3 billion people lacking dermatologic care globally, artificial intelligence (AI) offers potential improvements in access. However, high-quality, diverse datasets are crucial for developing and testing these algorithms, including both unimodal and multimodal approaches. Most dermatology AI models are built on proprietary, siloed data, often from a single site with a single image type (i.e., clinical or dermoscopic). To address this, we introduce the Melanoma Research Alliance Multimodal Image Dataset for AI-based Skin Cancer (MIDAS), the largest publicly available, prospectively recruited dataset of biopsy-proven skin lesions with paired dermoscopic and clinical images.
Methods: We evaluated four previously published state-of-the-art (SOTA) models on real-world MIDAS cases and compared their diagnostic performance with that of clinicians. We also evaluated the algorithms on clinical photographs taken at different distances from the lesion to determine how imaging distance influenced performance across diagnostic categories.
Results: We prospectively enrolled 796 patients through an institutional review board–approved protocol with informed consent, representing 1290 unique lesions and 3830 total images (including dermoscopic and clinical images taken at 15-cm and 30-cm distances), to build MIDAS. The images represented a diverse range of lesions seen in general dermatology, including malignant, benign, and inflammatory types. Among these were melanocytic nevi (22.4%), invasive cutaneous melanomas (4.4%), and melanomas in situ (4.5%). We observed performance reduction across all the dermatology SOTA models compared with their previously published metrics. As a baseline, dermatologists achieved 79% accuracy in identifying malignant lesions, and dermoscopic images yielded higher sensitivity than clinical ones.
Conclusions: Improving our understanding of the strengths and weaknesses of AI diagnostic algorithms is critical as these tools advance toward widespread clinical deployment. While many models report high performance, caution is warranted due to a lack of model transportability across different patient populations. MIDAS's robust, multimodal dataset allows researchers to evaluate models on real-world images, better assessing their generalizability and helping to bridge the gap between performance and clinical applicability. (Funded by the L'Oréal Dermatological Beauty Brands-Melanoma Research Alliance Team Science Award and others.)