Evaluating Translational AI: A Two-Way Moving Target Problem
Richard K. Leuchter, M.D., William B. Turner, B.A., and David Ouyang, M.D.

Abstract
Predictive artificial intelligence models are being deployed across health systems with dangerously inconsistent oversight, creating two critical gaps: a compliance gap, in which clinical tools that likely qualify as Software as a Medical Device are implemented without seeking U.S. Food and Drug Administration (FDA) authorization; and a regulatory gap, in which administrative and operational models are deployed without any external review despite their potential to influence care and widen disparities. Given that comprehensive FDA oversight of all such models is infeasible, the de facto onus of ensuring their safety and efficacy falls on the implementing institutions. However, this imperative for self-governance is undermined by a fundamental and previously unarticulated two-way moving target problem: (1) before implementation, concurrent-intervention confounding moves the target, as changes in clinical practice and operations shift the outcome during the time it takes to develop a model; and (2) after implementation, action-induced outcome bias moves the target again, as prediction-triggered interventions alter or censor the outcome. Together, these pitfalls render traditional evaluation methods inadequate. The authors argue that health systems must adopt a new default standard for implementing any model that predicts patient outcomes or utilization: short-term randomized deployment with a control group. This approach provides a crucial counterfactual for rigorous, independent assessment of model performance and intervention effectiveness. It offers a practical path forward for institutions to ensure that their artificial intelligence tools are safe, effective, and equitable, thereby building a foundation of trust worthy of the patients they serve. (Funded by the National Institutes of Health National Heart, Lung, and Blood Institute.)
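The mechanics of action-induced outcome bias, and of the randomized counterfactual the authors propose, can be made concrete with a minimal simulation sketch. The sketch below is illustrative only and is not from the article: the beta-distributed risk, the 0.4 flagging threshold, the assumed 50% risk reduction from the intervention, and the rank-based AUC helper are all hypothetical choices. It shows how pooling intervention and control arms makes a well-performing model look worse (the target has moved), while the control arm alone yields an unbiased estimate of discrimination and, by comparison across arms among flagged patients, an estimate of intervention effectiveness.

# Hypothetical simulation (not from the article): prediction-triggered
# interventions censor outcomes in exactly the high-score patients,
# biasing naive post-deployment evaluation; a randomized control arm
# preserves the counterfactual.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent patient risk and an imperfect but informative model score.
true_risk = rng.beta(2, 8, size=n)
score = np.clip(true_risk + rng.normal(0, 0.05, n), 0, 1)

# Randomized deployment: flagged patients receive the model-triggered
# intervention only in the intervention arm.
flagged = score > 0.4          # illustrative threshold
arm = rng.random(n) < 0.5      # True = intervention arm
treated = flagged & arm

# Assume the intervention halves outcome risk in treated patients.
effective_risk = np.where(treated, true_risk * 0.5, true_risk)
outcome = rng.random(n) < effective_risk

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U statistic)."""
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Naive evaluation pools both arms; the intervention has removed
# outcomes among high-score patients, so measured AUC drops.
print("AUC, pooled (biased):    %.3f" % auc(score, outcome))

# Control arm only: the outcome distribution is the one the model
# was trained to predict, so discrimination is estimated cleanly.
print("AUC, control (unbiased): %.3f" % auc(score[~arm], outcome[~arm]))

# The same randomization estimates intervention effectiveness
# among flagged patients.
print("Outcome rate, flagged + treated:  %.3f" % outcome[flagged & arm].mean())
print("Outcome rate, flagged controls:   %.3f" % outcome[flagged & ~arm].mean())

Under these assumptions the pooled AUC understates the control-arm AUC, and the flagged-patient outcome rates across arms recover the assumed treatment effect, which is the dual readout (model performance plus intervention effectiveness) that short-term randomized deployment is meant to provide.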