Aaron Long, Christopher M Haggerty, Joshua Finer, Dustin Hartzel, Linyuan Jing, Azadeh Keivani, Christopher Kelsey, Daniel Rocha, Jeffrey Ruhl, David vanMaanen, Gil Metser, Eamon Duffy, Thomas Mawson, Mathew Maurer, Andrew J Einstein, Ashley Beecy, Deepa Kumaraiah, Shunichi Homma, Qi Liu, Vratika Agarwal, Mark Lebehn, Martin Leon, Rebecca Hahn, Pierre Elias, Timothy J Poterucha
Circulation. 2024 Jun 17. doi: 10.1161/CIRCULATIONAHA.124.068996. Online ahead of print.
Background: Artificial intelligence, particularly deep learning (DL), has immense potential to improve the interpretation of transthoracic echocardiography (TTE). Mitral regurgitation (MR) is the most common valvular heart disease and presents unique challenges for DL, including the integration of multiple video-level assessments into a final study-level classification.
Methods: A novel DL system was developed to intake complete TTEs, identify color MR Doppler videos, and determine MR severity on a 4-step ordinal scale (none/trace, mild, moderate, and severe) using the reading cardiologist as a reference standard. This DL system was tested in internal and external test sets with performance assessed by agreement with the reading cardiologist, weighted κ, and area under the receiver-operating characteristic curve for binary classification of both moderate or greater and severe MR. In addition to the primary 4-step model, a 6-step MR assessment model was studied with the addition of the intermediate MR classes of mild-moderate and moderate-severe with performance assessed by both exact agreement and ±1 step agreement with the clinical MR interpretation.
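The agreement metrics named in the Methods (exact agreement, ±1 step agreement, and weighted κ) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the integer coding of the grades, the function names, and the toy labels are all assumptions, and the abstract does not say whether κ used linear or quadratic weights, so the weighting is left as a parameter that defaults to linear.

```python
from collections import Counter

# 4-step ordinal scale from the Methods. Mapping grades to the integers
# 0..3 is an illustrative assumption, not something the abstract specifies.
GRADES = ["none/trace", "mild", "moderate", "severe"]


def exact_agreement(ref, pred):
    """Fraction of studies where the model grade equals the reference grade."""
    return sum(r == p for r, p in zip(ref, pred)) / len(ref)


def within_one_step(ref, pred):
    """Fraction of studies where the model is at most one grade away."""
    return sum(abs(r - p) <= 1 for r, p in zip(ref, pred)) / len(ref)


def weighted_kappa(ref, pred, n_classes, power=1):
    """Cohen's weighted kappa for ordinal grades coded 0..n_classes-1.

    power=1 gives linear weights, power=2 quadratic weights. Which one the
    study used is not stated in the abstract (assumption: linear default).
    Undefined (division by zero) if expected disagreement is zero, for
    example when both raters use a single grade for every study.
    """
    n = len(ref)
    observed = Counter(zip(ref, pred))   # joint counts of (reference, model)
    ref_counts = Counter(ref)            # marginal counts for the reference
    pred_counts = Counter(pred)          # marginal counts for the model
    max_dist = (n_classes - 1) ** power

    obs_dis = 0.0  # observed weighted disagreement
    exp_dis = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = abs(i - j) ** power / max_dist
            obs_dis += w * observed[(i, j)] / n
            exp_dis += w * ref_counts[i] * pred_counts[j] / (n * n)
    return 1.0 - obs_dis / exp_dis


# Toy labels (made up for illustration, not study data).
toy_ref = [0, 0, 1, 1, 2, 2, 3, 3]
toy_pred = [0, 1, 1, 1, 2, 3, 3, 3]

if __name__ == "__main__":
    print("exact agreement:", exact_agreement(toy_ref, toy_pred))
    print("±1 step agreement:", within_one_step(toy_ref, toy_pred))
    print("linear weighted kappa:", weighted_kappa(toy_ref, toy_pred, len(GRADES)))
```

The same functions would apply to the 6-step scale by passing `n_classes=6` with grades coded 0 to 5. Weighted κ penalizes a none/trace versus severe disagreement more heavily than a none/trace versus mild one, which is why it suits an ordinal severity scale better than unweighted κ.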
Results: A total of 61 689 TTEs were split into train (n=43 811), validation (n=8891), and internal test (n=8987) sets with an additional external test set of 8208 TTEs. The model had high performance in MR classification in internal (exact accuracy, 82%; κ=0.84; area under the receiver-operating characteristic curve, 0.98 for moderate or greater MR) and external test sets (exact accuracy, 79%; κ=0.80; area under the receiver-operating characteristic curve, 0.98 for moderate or greater MR). Most (63% internal and 66% external) misclassification disagreements were between none/trace and mild MR. MR classification accuracy was slightly higher using multiple TTE views (accuracy, 82%) than with only apical 4-chamber views (accuracy, 80%). In subset analyses, the model was accurate in the classification of both primary and secondary MR with slightly lower performance in cases of eccentric MR. In the analysis of the 6-step classification system, the exact accuracy was 80% and 76% with a ±1 step agreement of 99% and 98% in the internal and external test sets, respectively.
Conclusions: This end-to-end DL system can intake entire echocardiogram studies to accurately classify MR severity and may be useful in helping clinicians refine MR assessments.