International Retrospective Observational Study of Continual Learning for AI on Endotracheal Tube Placement from Chest Radiographs
Emma Chen, M.S., Agustina Saenz, M.D., Oishi Banerjee, M.S., Henrik Marklund, M.S., Xiaoman Zhang, Ph.D., Shreya Johri, B.Tech., Hong-Yu Zhou, Ph.D., +51, and Pranav Rajpurkar, Ph.D.
Abstract
Background: Medical artificial intelligence (AI) models often underperform when deployed at new hospitals despite strong performance during development, creating a need for effective adaptation strategies that maintain institutional privacy. Continual learning (i.e., repeatedly retraining a model at each new hospital where it is deployed) can improve performance during and after deployment.
Methods: To determine whether continual learning could improve model performance across diverse clinical settings without requiring data exchange between hospitals or complex computational setups, we performed a retrospective study of a convolutional neural network model for endotracheal tube (ETT) assessment, comparing three deployment approaches (original model deployment, hospital-specific fine-tuning, and continual learning across hospitals). The study included 2313 intensive care unit chest radiographs from 23 hospitals spanning 12 countries and 5 continents, with 1 hospital held out for external validation. Participants were adults (≥18 years of age) with ETTs placed in 2021 (or earlier if a hospital found too few cases from 2021), identified through standardized keyword searches in hospital systems and verified by radiology reports where available. Each hospital contributed approximately 100 anteroposterior radiographs, 50 of which were used for training, along with associated deidentified reports and metadata. Each image was preprocessed by the contributing hospital to remove identifiable features and was independently reviewed and annotated by two radiologists.
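The three deployment approaches can be illustrated with a deliberately tiny stand-in. Everything in the sketch below is hypothetical and is not the study's pipeline: a two-parameter linear regressor (`predict`, `fine_tune`) replaces the convolutional network, synthetic sites from `make_hospital` replace the radiographs, plain SGD on squared error replaces the study's training procedure, and hospitals are visited in list order. It shows only the mechanics, assuming (as the stated goal of avoiding data exchange suggests) that only model weights move between hospitals: hospital-specific fine-tuning always restarts from the original weights, whereas continual learning resumes from the weights left by the previous hospital.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (input feature, target): hypothetical toy data
Params = Tuple[float, float]  # (slope w, intercept b) of a toy linear model
Site = Tuple[List[Point], List[Point]]  # (training split, test split)


def predict(params: Params, x: float) -> float:
    w, b = params
    return w * x + b


def fine_tune(params: Params, data: List[Point], lr: float = 0.05, epochs: int = 20) -> Params:
    """Plain SGD on squared error, starting from the given weights."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def mean_abs_error(params: Params, data: List[Point]) -> float:
    return sum(abs(predict(params, x) - y) for x, y in data) / len(data)


def make_hospital(rng: random.Random, shift: float, n: int = 100) -> Site:
    """Hypothetical site: shared slope of 2.0 plus a site-specific offset."""
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        points.append((x, 2.0 * x + shift + rng.gauss(0.0, 0.1)))
    return points[:50], points[50:]  # 50 for training, the rest for testing


def deploy(original: Params, sites: List[Site]) -> Dict[str, List[float]]:
    """Test-set MAE at each site under the three deployment approaches."""
    results: Dict[str, List[float]] = {"original": [], "site_specific": [], "continual": []}
    carried = original  # only weights travel between sites; raw data stay local
    for train, test in sites:
        results["original"].append(mean_abs_error(original, test))
        # Hospital-specific: always restart from the original weights.
        results["site_specific"].append(mean_abs_error(fine_tune(original, train), test))
        # Continual: resume from the weights left by the previous site.
        carried = fine_tune(carried, train)
        results["continual"].append(mean_abs_error(carried, test))
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [make_hospital(rng, shift) for shift in (0.5, -0.3, 0.8)]
    for name, errs in deploy((1.0, 0.0), sites).items():
        print(name, [round(e, 3) for e in errs])
```

In this toy, the continual and hospital-specific errors at the first site are identical by construction, because both start from the same weights and see the same data. The two approaches can only diverge from the second site onward.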
Results: Continual learning models achieved a significantly lower average ETT–carina error of 10.58 mm (standard deviation [SD], 12.98 mm), compared with 12.49 mm (SD, 15.37 mm) for models fine-tuned only at their deployment hospital and 16.39 mm (SD, 21.56 mm) for the original model. In addition, continual learning models outperformed the original model at all hospitals, and they outperformed models fine-tuned only at their deployment hospital at 21 of 22 hospitals.
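Summary figures of the kind reported above (mean error with SD, and a per-hospital win count) could be computed as in the sketch below. Several details are assumptions rather than facts taken from the study: that each image's error is the absolute difference between predicted and reference ETT–carina distances in millimeters, that the mean and SD pool all test images across hospitals, that the SD is the sample (n-1) SD, and that an approach "outperforms" another at a hospital when its mean error there is lower. The function names are hypothetical, and the demo values are illustrative, not study data.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def ett_carina_errors(pred_mm: Sequence[float], ref_mm: Sequence[float]) -> List[float]:
    """Per-image absolute error (mm) between predicted and reference ETT-carina distances."""
    if len(pred_mm) != len(ref_mm):
        raise ValueError("predictions and references must align one-to-one")
    return [abs(p - r) for p, r in zip(pred_mm, ref_mm)]


def pooled_mean_sd(errors_by_site: Dict[str, List[float]]) -> Tuple[float, float]:
    """Mean and sample SD over all images from all sites, pooled together."""
    pooled = [e for errs in errors_by_site.values() for e in errs]
    return mean(pooled), stdev(pooled)


def sites_won(a: Dict[str, List[float]], b: Dict[str, List[float]]) -> int:
    """Number of sites where approach `a` has a lower mean error than approach `b`."""
    return sum(1 for site in a if mean(a[site]) < mean(b[site]))


if __name__ == "__main__":
    # Illustrative numbers only; not data from the study.
    continual = {"site_1": ett_carina_errors([31.0, 45.0], [30.0, 50.0]),
                 "site_2": ett_carina_errors([22.0, 60.0], [25.0, 58.0])}
    fine_tuned = {"site_1": [4.0, 7.0], "site_2": [1.0, 2.0]}
    print(pooled_mean_sd(continual))
    print(sites_won(continual, fine_tuned), "of", len(continual))
```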
Conclusions: Despite using only 50 training radiographs from each hospital, continual learning surpassed traditional fine-tuning as a method for improving medical AI model generalization.