Satvik Tripathi, Kyla Gabriel, Suhani Dheer, Aastha Parajuli, Alisha Isabelle Augustin, Ameena Elahi, Omar Awan, Farouk Dako
J Am Coll Radiol. 2023 Sep;20(9):836-841. doi: 10.1016/j.jacr.2023.06.015. Epub 2023 Jul 16.
Artificial intelligence (AI) continues to show great potential in disease detection and diagnosis on medical imaging, with increasingly high accuracy. An important component of AI model creation is dataset development for training, validation, and testing. Diverse and high-quality datasets are critical to ensure robust and unbiased AI models that maintain validity, especially in traditionally underserved populations globally. Yet publicly available datasets demonstrate problems with quality and inclusivity. In this literature review, the authors evaluate publicly available medical imaging datasets for demographic, geographic, genetic, and disease representation, or lack thereof, and call for an increased emphasis on dataset development to maximize the impact of AI models.