Deep Learning and the FDA (regulations)
- AI in Clinical Trials
- Decision support in trial design
- Patient identification, recruitment and retention
- Outcome monitoring
- Side effect monitoring
- “Contrary to popular media speculation, AI alone is not enough to overcome the problems that society seeks to resolve; rather, machine learning depends on subject-matter experts in order to find solutions. AI provides us with numerous opportunities for advancement in the field of radiology: improved diagnostic certainty, suspicious case identification for early review, better patient prognosis, and a quicker turnaround. Machine learning depends on radiologists and our expertise and the convergence of radiologists and AI will bring forth the best outcomes for patients.”
AI as a Public Service.
Lavista Ferres JM, Fishman EK, Rowe SP, Chu LC, Lugo-Fagundo E.
J Am Coll Radiol. 2023 Mar 30:S1546-1440(23) in press - “One of the first lessons we learned was that if you want to impress people, your solution can be complex; but if you want to have an impact on the world, your solutions need to be simple enough to be implemented. This leads us to Big Data, a term created by Doug Laney in 2001, which he defines using the 3 Vs: velocity, variety, and volume. Big Data is simply data, and the objective is to solve problems with it. Using data for problem solving isn’t a novel practice. For instance, John Snow was able to trace the source of an 1854 cholera outbreak in London to a water pump on Broad Street by simply using data, but the way we utilize it today is different. We generate copious data and change how the data is collected, as well as significantly reduce the cost of storing and processing that data. We need enough data to solve problems, but we also need to focus on the problems.”
AI as a Public Service.
Lavista Ferres JM, Fishman EK, Rowe SP, Chu LC, Lugo-Fagundo E.
J Am Coll Radiol. 2023 Mar 30:S1546-1440(23) in press
- “Artificial intelligence (AI) is becoming more widespread within radiology. Capabilities that AI algorithms currently provide include detection, segmentation, classification, and quantification of pathological findings. Artificial intelligence software have created challenges for the traditional United States Food and Drug Administration (FDA) approval process for medical devices given their abilities to evolve over time with incremental data input. Currently, there are 190 FDA-approved radiology AI-based software devices, 42 of which pertain specifically to thoracic radiology. The majority of these algorithms are approved for the detection and/or analysis of pulmonary nodules, for monitoring placement of endotracheal tubes and indwelling catheters, for detection of emergent findings, and for assessment of pulmonary parenchyma; however, as technology evolves, there are many other potential applications that can be explored. For example, evaluation of non-idiopathic pulmonary fibrosis interstitial lung diseases, synthesis of imaging, clinical and/or laboratory data to yield comprehensive diagnoses, and survival or prognosis prediction of certain pathologies. With increasing physician and developer engagement, transparency and frequent communication between developers and regulatory agencies, such as the FDA, AI medical devices will be able to provide a critical supplement to patient management and ultimately enhance physicians’ ability to improve patient care.”
The current status and future of FDA-approved artificial intelligence tools in chest radiology in the United States
M.E. Milam, C.W. Koo
Clinical Radiology 2022 (in press) - “Rib fractures are commonly seen in the setting of thoracic trauma with an estimated 350,000 cases in the United States yearly. Presently, the only FDA-approved AI SaMD for rib fracture detection on CT is the uAI EasyTriage-Rib by Shanghai United Imaging Intelligence Co., Ltd. The tool consists of automatic vertebra localisation, rib segmentation and labelling, and rib fracture detection. There are currently no FDA-approved AI SaMD algorithms to detect rib fractures on chest radiographs.”
The current status and future of FDA-approved artificial intelligence tools in chest radiology in the United States
M.E. Milam, C.W. Koo
Clinical Radiology 2022 (in press) - “Although medical AI holds great promise, several barriers must be overcome before widespread clinical implementation. The process through which a model renders a decision remains unclear to human end-users, preventing trust-building. Although explainability methods, such as saliency maps, feature visualisation, and Shapley plots exist, it is increasingly apparent that these methods may be insufficient and more intuitive methods will need to be developed. In order to obtain buy-in from physicians, a change in physician point of view is needed, from perceiving AI as a rival threatening job security to seeing AI as a beneficial assistant. Physician engagement through training sessions that promote user understanding of software capability and limitations and having physicians as drivers of product development instead of just passive users may be key in turning such challenge into an opportunity. It cannot be emphasised enough that physician acceptance is one of the most important determinants in the successful initial institutional rollout of a medical AI product.”
The current status and future of FDA-approved artificial intelligence tools in chest radiology in the United States
M.E. Milam, C.W. Koo
Clinical Radiology 2022 (in press) - “AI is becoming ever increasingly more integrated within radiology. It has required the FDA to adapt their regulatory and approval process. Currently, there are 42 commercially available FDA-approved AI SaMD that have applications within chest radiology and as technologies continue to progress, more devices with different and enhanced capabilities currently under development will become available. Although several challenges remain for widespread adoption of AI SaMD, with increasing physician buy in, developer engagement, transparency, and frequent communication between developers and regulatory agencies such as the FDA, it will not be long before AI becomes an integral part of patient management and ultimately enhances patient care.”
The current status and future of FDA-approved artificial intelligence tools in chest radiology in the United States
M.E. Milam, C.W. Koo
Clinical Radiology 2022 (in press)
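The explainability methods named above (saliency maps, feature visualisation, Shapley plots) can be illustrated with a minimal gradient-based saliency sketch. The tiny PyTorch network and the random image below are placeholders for illustration only, not any FDA-cleared product or the software discussed in the paper.

```python
import torch
import torch.nn as nn

# Placeholder "classifier": a tiny CNN standing in for a real chest-imaging model.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),              # two classes, e.g., "finding" vs "no finding"
)
model.eval()

image = torch.rand(1, 1, 224, 224, requires_grad=True)  # fake grayscale image

logits = model(image)                     # forward pass
top_class = logits.argmax(dim=1).item()   # index of the predicted class
logits[0, top_class].backward()           # gradient of that score w.r.t. the pixels

# Saliency = absolute gradient per pixel; large values mark pixels whose change
# most affects the predicted score.
saliency = image.grad.abs().squeeze()     # shape (224, 224)
print(saliency.shape, float(saliency.max()))
```
In practice such a map would be overlaid on the source image and reviewed alongside the model output; as the quotation notes, these methods may still be insufficient for building clinician trust.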
- Purpose: To assess an FDA-approved and CE-certified deep learning (DL) software application compared to the performance of human radiologists in detecting intracranial hemorrhages (ICH).
Methods: Within a 20-week trial from January to May 2020, 2210 adult non-contrast head CT scans were performed in a single center and automatically analyzed by an artificial intelligence (AI) solution with workflow integration. After excluding 22 scans due to severe motion artifacts, images were retrospectively assessed for the presence of ICHs by a second-year resident and a certified radiologist under simulated time pressure. Disagreements were resolved by a subspecialized neuroradiologist serving as the reference standard. We calculated interrater agreement and diagnostic performance parameters, including the Breslow–Day and Cochran–Mantel–Haenszel tests.
Results: An ICH was present in 214 out of 2188 scans. The interrater agreement between the resident and the certified radiologist was very high (κ = 0.89) and even higher (κ = 0.93) between the resident and the reference standard. The software has delivered 64 false-positive and 68 false-negative results giving an overall sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of 68.2%, 96.8%, 69.5%, 96.6%, and 94.0%, respectively. Corresponding values for the resident were 94.9%, 99.2%, 93.1%, 99.4%, and 98.8%. The accuracy of the DL application was inferior (p < 0.001) to that of both the resident and the certified neuroradiologist.
Conclusion: A resident under time pressure outperformed an FDA-approved DL program in detecting ICH in CT scans. Our results underline the importance of thoughtful workflow integration and post-approval validation of AI applications in various clinical environments.
FDA‐approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single‐center study
Thomas Kau et al.
Neuroradiology (2022) 64:981–990
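The software's performance figures in the abstract follow directly from the reported counts (2,188 analyzed scans, 214 with ICH, 64 false positives, 68 false negatives); a short worked check:

```python
# Reconstruct the software's 2x2 table from the counts reported in the abstract.
total, positives = 2188, 214
fp, fn = 64, 68

tp = positives - fn            # 146 hemorrhages flagged correctly
tn = (total - positives) - fp  # 1910 negatives left unflagged

sensitivity = tp / (tp + fn)     # 0.682
specificity = tn / (tn + fp)     # 0.968
ppv         = tp / (tp + fp)     # 0.695
npv         = tn / (tn + fn)     # 0.966
accuracy    = (tp + tn) / total  # 0.940

print(f"sens {sensitivity:.3f}  spec {specificity:.3f}  "
      f"PPV {ppv:.3f}  NPV {npv:.3f}  acc {accuracy:.3f}")
```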
- “Individual AI algorithms have been developed to support clinicians in the identification and prioritization of cases suspected to be ICHs. So far, only a few vendors have received FDA approval for their solutions. The Aidoc software has been reported to perform with accuracy levels of up to 98%, notably with even higher specificity than sensitivity [20]. Nonetheless, the generalization of different datasets and clinical translation are known challenges restraining convolutional neural networks (CNN).”
FDA‐approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single‐center study
Thomas Kau et al.
Neuroradiology (2022) 64:981–990 - “The objective of our study was to assess the Aidoc software in a diverse clinical setting compared to the performance of human radiologists under simulated time pressure. Specifically, we investigated whether the diagnostic accuracy of this workflow-integrated AI solution was equivalent to that of a second-year resident.”
FDA‐approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single‐center study
Thomas Kau et al.
Neuroradiology (2022) 64:981–990 - "In conclusion, the radiological assessment detected ICHs in a total of 214 examinations (9.8%). Measured against the reference standard, the initial automatic DL software analysis delivered 64 false-positive and 68 false-negative results. This results in overall sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of 68.2%, 96.8%, 69.5%, 96.6%, and 94.0%, respectively, for the software-based detection of ICH. In contrast, the resi- dent achieved respective accuracy values of 94.9%, 99.2%, 93.1%, 99.4%, and 98.8%; the certified radiologist achieved values of 95.8%, 99.7%, 97.2%, 99.5%, and 99.3%, respec- tively (Table 1).”
FDA‐approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single‐center study
Thomas Kau et al.
Neuroradiology (2022) 64:981–990 - "The software application detected five ICHs not spotted by the resident. Discrepancies between the resident and the neuroradiologist were caused in nine cases by beam hardening artifacts; in four cases due to discordant assessment of subdural bleeding; in three cases related to subtle subarach- noid hemorrhage (SAH); in three cases due to dural thick- ening after osteoplastic craniotomy; and relating to the fol- lowing conditions in one case each: intracerebral hematoma, vascular malformation, brain tumor, dense venous sinus, and ischemic infarct with possible hemorrhagic transformation. Discrepancies between the certified radiologist and the neuroradiologist occurred in six cases of SDH; in three cases of subtle tumor hemorrhage or calcification, respectively; in two cases of dense sinus or tentorium; and in each one case related to beam hardening, tiny cortical bleeding, and calcifications of the falx or brain parenchyma.”
FDA‐approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single‐center study
Thomas Kau et al.
Neuroradiology (2022) 64:981–990 - "Ultimately, the combination of man and machine will most likely achieve the highest diagnostic accuracy. Our results support the expectation that well-integrated algo- rithms should be further improved to assist radiologists, especially in high-output situations and during on-call hours. O’Neill et al. recently reported that AI-assisted reprioritization of the reading worklist was beneficial in terms of turnaround time, especially for examinations ordered as routine. It remains to be seen which frequent tasks will be integrated into neuroradiological solutions over time, and to what extent rare differential diagnoses may move into the focus of development.”
FDA‐approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single‐center study
Thomas Kau et al.
Neuroradiology (2022) 64:981–990 - "Our study adds to the body of evidence required for the implementation of AI solutions in real-world scenarios. We assessed an FDA-approved DL software application for ICH detection in a routine setting compared to the perfor- mance of human radiologists under time-constrained study conditions. A second-year resident outperformed the AI tool in terms of both sensitivity and specificity. Since most erroneous alerts can be resolved by experienced radiologists, the software holds promise for prioritizing cranial CT exams. However, due to a notable rate of unflagged ICH scans, we doubt generalizability and recommend this AI solution be improved. Our results underline the need for external post-approval validation in various clinical environments. They warrant further research with a focus on the combination of human and artificial intelligence for an accurate and timely diagnosis of ICH.”
FDA‐approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single‐center study
Thomas Kau et al.
Neuroradiology (2022) 64:981–990
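The interrater agreement reported in the study abstract above (κ = 0.89 and 0.93) is Cohen's kappa. A minimal sketch of how such a value is computed from two readers' calls; the label vectors here are invented for illustration and are not the study's data:

```python
# Cohen's kappa for two readers' binary ICH calls (1 = hemorrhage, 0 = none).
from sklearn.metrics import cohen_kappa_score

resident  = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]   # invented example reads
reference = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0]

# Observed agreement versus agreement expected by chance.
n = len(resident)
p_obs = sum(r == s for r, s in zip(resident, reference)) / n
p_exp = (sum(resident) / n) * (sum(reference) / n) + \
        (1 - sum(resident) / n) * (1 - sum(reference) / n)

kappa_manual = (p_obs - p_exp) / (1 - p_exp)
print(round(kappa_manual, 2), round(cohen_kappa_score(resident, reference), 2))  # same value
```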
- Purpose: To develop DL algorithms for the automated classification of benign versus malignant ovarian tumors assessed with US and to compare algorithm performance to Ovarian-Adnexal Reporting and Data System (O-RADS) and subjective expert assessment for malignancy.
Results: A total of 422 women (mean age, 46.4 years ± 14.8 [SD]) with 304 benign and 118 malignant tumors were included; there were 337 women in the training and validation data set and 85 women in the test data set. DLfeature had an AUC of 0.93 (95% CI: 0.85, 0.97) for classifying malignant from benign ovarian tumors, comparable with O-RADS (AUC, 0.92; 95% CI: 0.85, 0.97; P = .88) and expert assessment (AUC, 0.97; 95% CI: 0.91, 0.99; P = .07), and similar to DLdecision (AUC, 0.90; 95% CI: 0.82, 0.96; P = .29). DLdecision, DLfeature, O-RADS, and expert assessment achieved sensitivities of 92%, 92%, 92%, and 96%, respectively, and specificities of 80%, 85%, 89%, and 87%, respectively, for malignancy.
Conclusion: Deep learning algorithms developed by using multimodal US images may distinguish malignant from benign ovarian tumors with diagnostic performance comparable to expert subjective and Ovarian-Adnexal Reporting and Data System assessment.
Deep Learning Prediction of Ovarian Malignancy at US Compared with O-RADS and Expert Assessment
Hui Chen et al.
Radiology 2022; 000:1–8 • https://doi.org/10.1148/radiol.211367 - Key Results
• In this retrospective study of 422 women, US-based deep learning with feature fusion (DLfeature) was comparable to Ovarian-Adnexal Reporting and Data System (O-RADS) risk categorization (area under the receiver operating characteristic curve [AUC], 0.93 vs 0.92, respectively; P = .88) and subjective expert assessment (AUC, 0.93 vs 0.97, respectively; P = .07) in distinguishing malignant from benign ovarian tumors.
• DLfeature, O-RADS, and expert assessment achieved sensitivities of 92%, 92%, and 96%, respectively, and specificities of 85%, 89%, and 87%, respectively, for malignancy.
Deep Learning Prediction of Ovarian Malignancy at US Compared with O-RADS and Expert Assessment
Hui Chen et al.
Radiology 2022; 000:1–8 • https://doi.org/10.1148/radiol.211367 - “Ovarian cancer is the second most common cause of cancer-related death worldwide among women, with a 5-year survival rate less than 45%. Early detection and accurate characterization of ovarian tumors is important for optimal patient treatment (3,4). Benign tumors can be treated conservatively, avoiding unnecessary costs and overtreatment, and preserving fertility. However, malignant tumors require referral to gynecologic oncology, appropriate staging, and consideration for radical surgery. To provide individualized and effective treatment options, it is critical to be able to distinguish benign and malignant ovarian tumors with high accuracy.”
Deep Learning Prediction of Ovarian Malignancy at US Compared with O-RADS and Expert Assessment
Hui Chen et al.
Radiology 2022; 000:1–8 • https://doi.org/10.1148/radiol.211367
- "In conclusion, we demonstrated that deep learning algorithms based on multimodal US images may predict ovarian malignancy with high diagnostic performance comparable to that of expert subjective and Ovarian-Adnexal Reporting and Data System assessment.”
Deep Learning Prediction of Ovarian Malignancy at US Compared with O-RADS and Expert Assessment
Hui Chen et al.
Radiology 2022; 000:1–8 • https://doi.org/10.1148/radiol.211367
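The AUC values and 95% CIs reported above come from standard ROC analysis. A minimal sketch of computing an AUC with a bootstrap confidence interval for a continuous malignancy score, using synthetic data rather than the study's images or model outputs:

```python
# AUC with a bootstrap 95% CI for a synthetic "DL malignancy score".
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)                  # 0 = benign, 1 = malignant
score = y * 0.8 + rng.normal(0.0, 0.5, size=n)  # fake model output, higher = more suspicious

auc = roc_auc_score(y, score)

boot = []
for _ in range(2000):                # resample cases with replacement
    idx = rng.integers(0, n, size=n)
    if y[idx].min() == y[idx].max(): # skip resamples containing a single class
        continue
    boot.append(roc_auc_score(y[idx], score[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"AUC {auc:.2f} (95% CI: {lo:.2f}, {hi:.2f})")
```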
- “There is a lack of standardization in the FDA clearance summary document reporting as well. Notably, what manufacturers provided in their FDA application is much more extensive, but only an abridged summary is publicly available. From this study, it was found that only 11 of all 118 FDA cleared AI algorithms (from 2008 to April 2021) had >1000 patients for validating the AI and from the 11, only one (Visage Breast Density, Visage Imaging, Inc.) reported validation from two different clinical sites.”
What’s Needed to Bridge the Gap Between US FDA Clearance and Real-world Use of AI Algorithms
MingDe Lin
Acad Radiol 2022; 29:567–568 - "There is also a gap between how results are reported in manuscripts (typically as Dice similarity coefficient [DSC], Hausdorff distance, responder operating characteristic curve [ROC], area under the ROC curve [AUC], confusion matrix, sensitivity/specificity/F1, etc.) and what is considered when evaluating an AI product, especially for algorithms that solve a niche clinical problem and depend upon other software/hardware underpinnings, return on investment, information about the degree of (uni or bidirectional) integration and flexibility with PACS, interoperability, workflow, ease of use, end user feedback, amongst others.”
What’s Needed to Bridge the Gap Between US FDA Clearance and Real-world Use of AI Algorithms
MingDe Lin
Acad Radiol 2022; 29:567–568 - "Many of the considerations could be helpful in making the requirements for FDA clearance more robust and transparent. These include: What are the training/ validation/test definitions and size? Was an external dataset (ideally from another institution) used for final testing? Were images generated from multiple (modality) vendors used to train and evaluate the AI algorithm? Was the AI algorithm trained using widely accepted standards of reference in the field? How was the performance of the AI measured and was it in reference to radiology experts and/or pathology? How was the potential for dataset bias mitigated? Of note, some of the considerations in the guide are for scientific reproducibility and would be at odds with business/intellectual property (IP)/productization interests and efforts (i.e., making the AI algorithm code publicly available).”
What’s Needed to Bridge the Gap Between US FDA Clearance and Real-world Use of AI Algorithms
MingDe Lin
Acad Radiol 2022; 29:567–568 - Rationale and Objectives: To assess key trends, strengths, and gaps in validation studies of the Food and Drug Administration (FDA)-regulated imaging-based artificial intelligence/machine learning (AI/ML) algorithms.
Results: We noted an increasing number of FDA-regulated AI/ML from 2008 to 2021. Seventeen (17/118) regulated AI/ML algorithms posted no validation claims or data. Just 9/118 reviewed AI/ML algorithms had validation dataset sizes of over 1000 patients. The most common types of AI/ML included image processing/quantification (IPQ; n = 59/118) and triage (CADt; n = 27/118). Brain, breast, and lungs dominated the targeted body regions of interest.
Conclusion: Insufficient public information on validation datasets in several FDA-regulated AI/ML algorithms makes it difficult to justify clinical applications since their generalizability and presence of bias cannot be inferred.
FDA-regulated AI Algorithms: Trends, Strengths, and Gaps of Validation Studies
Shadi Ebrahimian et al.
Acad Radiol 2022; 29:559–566 - “The FDA often classifies AI/ML algorithms based on one or more of these tasks into computer-aided triage (CADt), detection (CADe), diagnosis (CADx), detection/diagnosis (CADe/x), and acquisition/optimization (CADa/o). The FDA presented another category of software for image processing which are not disease-specific, and include software for quantification, image reconstruction, artifact reduction, segmentation, filters, and denoising (IPQ).”
FDA-regulated AI Algorithms: Trends, Strengths, and Gaps of Validation Studies
Shadi Ebrahimian et al.
Acad Radiol 2022; 29:559–566 - Based on their functions and whether the algorithm was related to AI/ML, we assigned one of the following AI/ML categories to each algorithm (see the sketch following this list):
AI-Pr (prioritization) - ability to prioritize interpretation of imaging exams for any number of findings
AI-Cx (characterization) - ability to characterize specific etiology such as benign or malignant, type of lesion histology (such as non-small cell lung cancer or small cell lung cancer) or provide a differential for an opacity (such as differentiating pneumonia, atelectasis, or nodule)
AI-Ct (categorization) - ability to categorize a finding into different categories - such as Lung-RADS, LI-RADS, PI-RADS, breast density, low-grade vs high-grade cancer, and TNM staging
AI-Im (improvement) - leading to improved acquisition with AI camera for centering, planning, contrast triggering OR improved image quality with reconstruction or image-based noise and/or artifact reduction
AI-De (detection) - ability to aid in detection of one or more abnormalities/findings
AI-Q (quantification) - ability to either measure size [right ventricle/left ventricle ratio, bladder size, low attenuation areas for chronic obstructive pulmonary diseases or bronchial wall thickening] or function [fractional flow reserve or FFR, perfusion] of an organ or system, or comparison of findings over serial exams
AI-S (segmentation) - ability to draw contours, highlight, place a bounding box, annotate, or mark up different anatomic structures, landmarks, or findings
FDA-regulated AI Algorithms: Trends, Strengths, and Gaps of Validation Studies
Shadi Ebrahimian et al.
Acad Radiol 2022; 29:559–566
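As referenced in the list above, the authors' category scheme maps naturally onto a simple lookup table. The sketch below is illustrative only; the device records are invented, not entries from the FDA database or the paper:

```python
# The seven study-defined categories as a lookup table, plus a toy tally.
from collections import Counter

CATEGORIES = {
    "AI-Pr": "prioritization of exams for any number of findings",
    "AI-Cx": "characterization (e.g., benign vs malignant, lesion histology)",
    "AI-Ct": "categorization (e.g., Lung-RADS, LI-RADS, PI-RADS, breast density)",
    "AI-Im": "improved acquisition or image quality (reconstruction, denoising)",
    "AI-De": "detection of one or more abnormalities or findings",
    "AI-Q":  "quantification of size or function, or comparison over serial exams",
    "AI-S":  "segmentation / contouring of anatomy or findings",
}

devices = [  # hypothetical cleared devices tagged with one category each
    {"name": "NoduleFind",  "category": "AI-De"},
    {"name": "StrokeSort",  "category": "AI-Pr"},
    {"name": "DenoiseCT",   "category": "AI-Im"},
    {"name": "LungSegment", "category": "AI-S"},
]

for code, count in Counter(d["category"] for d in devices).items():
    print(f"{code}: {count}  ({CATEGORIES[code]})")
```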
- “In conclusion, most FDA-regulated AI/ML algorithms lack adequate evaluation data or its full description which can affect their robustness and generalizability in the country with tremendous disparities in patients and imaging equipment and acquisition settings. This study calls for greater transparency in the FDA summary documents or release of the full FDA applications and validation data. It suggests that companies with FDA-regulated AI algorithms should provide additional details on their validation data, perhaps including this on the ACR website.”
FDA-regulated AI Algorithms: Trends, Strengths, and Gaps of Validation Studies
Shadi Ebrahimian et al.
Acad Radiol 2022; 29:559–566
- AI and the FDA
- “On the other hand, machine learning (ML) algorithms—also referred to as a data-based approach—“learn” from numerous examples in a dataset without being explicitly programmed to reach a particular answer or conclusion. ML algorithms can learn to decipher patterns in patient data at scales larger than a human can analyze while also potentially uncovering previously unrecognized correlations. Algorithms may also work at a faster pace than a human.”
How FDA Regulates Artificial Intelligence in Medical Products
Pew Charitable Trusts July 2021 - "Most ML-driven applications use a supervised approach in which the data used to train and validate the algorithm is labeled in advance by humans; for example, a collection of chest X-rays taken of people who have lung cancer and those who do not, with the two groups identified for the AI software. The algorithm examines all examples within the training dataset to “learn” which features of a chest X-ray are most closely correlated with the diagnosis of lung cancer and uses that analysis to predict new cases. Developers then test the algorithm to see how generalizable it is; that is, how well it performs on a new dataset, in this case, a new set of chest X-rays. Further validation is required by the end user, such as the health care practice, to ensure that the algorithm is accurate in real-world settings.”
How FDA Regulates Artificial Intelligence in Medical Products
Pew Charitable Trusts July 2021 - "Locked algorithms can degrade as new treatments and clinical practices arise or as populations alter overtime. These inevitable changes may make the real-world data entered into the AI program vastly different from its training data, leading the software to yield less accurate results. An adaptive algorithm could present an advantage in such situations, because it may learn to calibrate its recommendations in response to new data, potentially becoming more accurate than a locked model. However, allowing an adaptive algorithm to learn and adapt on its own also presents risks, including that it may infer patterns from biased practices or underperform in small subgroups of patients.”
How FDA Regulates Artificial Intelligence in Medical Products
Pew Charitable Trusts July 2021 - "In addition, patients are often not aware when an AI program has influenced the course of their care; these tools could, for example, be part of the reason a patient does not receive a certain treatment or is recommended fora potentially unnecessary procedure. Although there are many aspects of health care that a patient may not fully understand, in a recent patient engagement meeting hosted by FDA, some committee members—including patient advocates—expressed a desire to be notified when an AI product is part of their care. This desire included knowing if the data the model was trained on was representative of their particular demographics, or if it had been modified in some way that changed its intended use.”
How FDA Regulates Artificial Intelligence in Medical Products
Pew Charitable Trusts July 2021
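The supervised workflow the Pew excerpts describe (label the data, train, check generalizability on held-out data, then validate again in the deployment setting) can be sketched with a generic classifier. The "internal" and "external" datasets below are synthetic stand-ins, not real imaging data:

```python
# Train on labeled "internal" data, test on a held-out split, then re-check on an
# "external" dataset standing in for a new site; all data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def make_site(n, effect=0.7):
    """Fake imaging-derived features and labels; smaller `effect` = weaker signal."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(0.0, 1.0, size=(n, 20)) + y[:, None] * effect
    return X, y

X, y = make_site(1000)                      # internal development data
X_ext, y_ext = make_site(300, effect=0.4)   # external site with a weaker signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("internal test AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
print("external site AUC:", round(roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]), 3))
```
The drop one would typically see on the external set is the kind of gap that end-user validation is meant to catch, and it is also why a locked model's performance is worth re-checking as practice patterns change.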
- “An innovative framework proposed by the FDA seeks to address these issues by looking to current good manufacturing practices (cGMP) and adopting a total product lifecycle (TPLC) approach. If brought into force, this may reduce the regulatory burden incumbent on developers, while holding them to rigorous quality standards, maximizing safety, and permitting the field to mature.”
How the FDA Regulates AI
Harvey BH, Gowda V
Acad Radiol 2020; 27:58–61 - "The integration of man and machine to drive outcomes is not a concept particularly new to radiology. However, AI poses unique regulatory issues which set it apart from other advances in imaging technology. Unlike the case for the majority of pharmaceutical products, devices, and foods, the FDA has indicated its preference to regulate AI software based on function, rather than technical components or indicated use. Consequently, medical products incorporating AI will likely straddle the boundaries delineated in decent guidance documents and find incorporation into both CDS and regulated devices necessitating conventional premarket review.”
How the FDA Regulates AI
Harvey BH, Gowda V
Acad Radiol 2020; 27:58–61 - “As the market grows, the FDA will likely promulgate new regulations with a greater degree of specificity than those currently in existence, particularly in the realms of data security and privacy. This is critical to cloud-based systems susceptible to cyberattack. Further research in these and other areas will dynamically inform policymaking as the field matures.”
How the FDA Regulates AI
Harvey BH, Gowda V
Acad Radiol 2020; 27:58–61
- OBJECTIVE. Although extensive attention has been focused on the enormous potential of artificial intelligence (AI) technology, a major question remains: how should this fundamentally new technology be regulated? The purpose of this article is to provide an overview of the pathways developed by the U.S. Food and Drug Administration to regulate the incorporation of AI in medical imaging.
CONCLUSION. AI is the new wave of innovation in health care. The technology holds promising applications to revolutionize all aspects of medicine.
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “Unlike the regulation of drugs and devices, the regulation of AI by the FDA poses unique challenges. In its Digital Health Innovation Action Plan, the FDA acknowledged that the traditional approach to evaluating hardware-based medical devices is not suited for the faster iterative design of software-based medical technologies. This is partly because of the inherent variability in the parameters of AI-based technologies, which depend on both the nature and the source of the data. For example, in a recent study on deep learning algorithms for the automated detection of an anterior cruciate ligament tear on knee MRI, the algorithm had an AUC value of 0.824 for an external test dataset and an AUC value of 0.937 for an internal test dataset. The veracity of algorithms would have to be judged by two witnesses, so to speak.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “Traditional image processing techniques were rule based and predictable, and they relied on well-defined features such as the size, texture, and heterogeneity of a lesion. AI-based technologies often use deep learning in which large amounts of data are fed into a computer system and the computer develops rules to predict outcomes from the data. A technology that learns on its own has an explainability problem—that is, we do not know how it arrived at the rules it derived from the data. The explainability problem makes it difficult to benchmark AI.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - ”Class I devices, which are classified as low risk, are typically exempt from PMA review; an example of a class I device is an algorithm that merely labels nodules in a chest CT, rather than stating which nodule is malignant, so the nodules are brought to the attention of radiologists. Most AI algorithms are categorized as class I devices or are excluded from being designated as a device as outlined by the recent 21st Century Cures Act updated draft guidelines.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “Technology that lacks a predicate device (i.e., a predecessor) is a revolutionary technology. Because the technology is brand new, no evidence has accrued that can form a framework for regulation. Although AI-based imaging algorithms are fairly new, they can be either evolutionary or revolutionary. Quantification of coronary calcium or detection of a lung nodule with the use of machine learning techniques would be considered evolutionary because these tasks have already been performed by software using rule-based automatic and semiautomatic methods.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “Computer-aided detection—CAD systems flag abnormalities for review by radiologists but do not assist in diagnostic or clinical decision making. They focus on the detection of abnormalities rather than their characterization. Examples of CAD include identification of colonic polyps on CT colonography, filling defects on pulmonary embolism CT, or liver lesions on CT or MRI. Critically, CAD analysis does not include further analysis of these lesions; instead, it flags a finding for clinician review but does not directly make a diagnosis of colon cancer, pulmonary embolism, hepatic malignancy, or other abnormality.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “Computer-aided diagnosis—CADx systems take analytics to a higher level than CAD systems. The FDA characterizes CADx not only as identifying the disease but also as providing an assessment of the disease through either a specific diagnosis or differential diagnosis as well as determining the extent of disease, the prognosis, and the presence of other known conditions. Thus, CADx involves the role of CAD, although the opposite is not true. As an example, CADx technology might identify lung nodules on CT (CAD) and might also provide a malignancy score for those lesions (CADx).”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “PMA is the most stringent of the approval pathways. PMA approval is based on a determination by the FDA that there is sufficient valid scientific evidence to ensure that the device is safe and effective for its intended use. This generally requires rigorous nonclinical and clinical studies to be conducted that show evidence of safety and efficacy in a substantial population. This is generally the pathway for class III devices that are considered high risk for patients or those that are revolutionary.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “Although AI may potentially revolutionize health care, it is often considered only evolutionary from the FDA’s point of view, because often a predicate device can be identified so that the demanding PMA process can be avoided and a 510(k) approach can be pursued. For instance, newer image postprocessing algorithms that use deep learning have used commercially available postprocessing software that does not use deep learning as a predicate, and these algorithms have gone through the 510(k) pathway for FDA approval.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “More recently, the FDA developed the Digital Health Software Precertification (Pre-Cert) Program. This program is based on the assumption that because medical software evolves so rapidly, every iteration of a particular technology cannot realistically be reviewed by the FDA. This approach specifically regulates software by primarily evaluating the developer of the product rather than the product itself, thus deviating from the traditional approval processes that directly evaluated a particular product.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “The Pre-Cert program mirrors the Transportation Security Administration (TSA) Pre-Check program, because prevetted companies are given a higher level of trust after meeting certain rigorous certification criteria. Several participants, including major consumer electronic companies, have already been enrolled in an early pilot version of this program. The participants will provide the FDA access to the measures they use to develop, test, and maintain software products, including ways that they collect post-market data. After attaining certification, they will then undergo periodic audits rather than constant stepwise reviews as their dynamic products change. This approach may be a key solution to the rapid nature of software development and the associated workload burdens affecting the approval system.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “Medical device companies generally take one of three paths to gain regulatory approval: they seek approval in the United States first, seek approval overseas first, or seek approval in the United States and overseas in tandem. To develop a viable business strategy, a medical device company must understand the strengths and weaknesses of the regulatory system, its target market, the amount of internal and external resources required, and the amount of reimbursement available. In general, release in the United States requires a higher capital investment but gives a company access to the widest market, better intellectual property protection, and less foreign competition.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888 - “Although AI algorithms pose a unique challenge to medical regulation agencies, these challenges are being acknowledged and addressed by the FDA, which recognizes that the standards by which medical technology is evaluated may not apply to AI. By creating novel regulatory pathways, the FDA is encouraging the adoption of AI in medicine. The exact regulatory pathway and burden will be determined by intent—that is, whether AI is used for detection or diagnosis and whether it is used as an adjunct or a replacement. Regulatory standards are likely to evolve as AI algorithms become more robust and widespread.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888
- “To understand the strategic options for a vendor in bringing an AI product into the market and complying with regulations, it is important to review the regulatory process outside the United States. The European equivalent of the FDA is the Conformité Européenne (CE). The FDA typically requires evidence of both the safety and efficacy of a device, whereas a European CE mark requires only proof of safety and proof that the device performs consistently with the intended use expressed by the manufacturer. Understandably, it is easier to obtain a CE mark than FDA approval. It is for this reason that some companies launch their product outside of the United States.”
Concepts in U.S. Food and Drug Administration Regulation of Artificial Intelligence for Medical Imaging
Kohli A et al.
AJR 2019; 213:886–888
- “The FDA has not issued rules about test datasets, transparency, or verification procedures. It will probably evaluate models and associated test datasets on a case by case basis. How this will evolve is unclear at present. In addition, regulation that created the FDA was enacted before the availability of ML, and existing laws regarding devices are difficult to apply to ML algorithms.”
Implementing Machine Learning in Radiology Practice and Research
Kohli M et al.
AJR 2017; 208:754–760