Automated Artificial Intelligence Model Trained on a Large Data Set Can Detect Pancreas Cancer on Diagnostic Computed Tomography Scans As Well As Visually Occult Preinvasive Cancer on Prediagnostic Computed Tomography Scans
Panagiotis Korfiatis, Garima Suman, Nandakumar G Patnam, Kamaxi H Trivedi, Aashna Karbhari, Sovanlal Mukherjee, Cole Cook, Jason R Klug, Anurima Patra, Hala Khasawneh, Naveen Rajamohan, Joel G Fletcher, Mark J Truty, Shounak Majumder, Candice W Bolan, Kumar Sandrasegaran, Suresh T Chari, Ajit H Goenka
Gastroenterology. 2023 Aug 30:S0016-5085(23)04958-2. doi: 10.1053/j.gastro.2023.08.034. Online ahead of print.
Background & aims: The aims of our case-control study were (1) to develop an automated 3-dimensional (3D) Convolutional Neural Network (CNN) for detection of pancreatic ductal adenocarcinoma (PDA) on diagnostic computed tomography scans (CTs), (2) to evaluate its generalizability on multi-institutional public data sets, (3) to assess its utility as a potential screening tool using a simulated cohort with high pretest probability, and (4) to assess its ability to detect visually occult preinvasive cancer on prediagnostic CTs.
Methods: A 3D-CNN classification system was trained using algorithmically generated bounding boxes and pancreatic masks on a curated data set of 696 portal phase diagnostic CTs with PDA and 1080 control images with a nonneoplastic pancreas. The model was evaluated on (1) an intramural hold-out test subset (409 CTs with PDA, 829 controls); (2) a simulated cohort with a case-control distribution that matched the risk of PDA in glycemically defined new-onset diabetes, and Enriching New-Onset Diabetes for Pancreatic Cancer score ≥3; (3) multi-institutional public data sets (194 CTs with PDA, 80 controls), and (4) a cohort of 100 prediagnostic CTs (i.e., CTs incidentally acquired 3-36 months before clinical diagnosis of PDA) without a focal mass, and 134 controls.
Results: Of the CTs in the intramural test subset, 798 (64%) were from other hospitals. The model correctly classified 360 CTs (88%) with PDA and 783 control CTs (94%), with a mean accuracy of 0.92 (95% CI, 0.91-0.94), area under the receiver operating characteristic (AUROC) curve of 0.97 (95% CI, 0.96-0.98), sensitivity of 0.88 (95% CI, 0.85-0.91), and specificity of 0.95 (95% CI, 0.93-0.96). Activation areas on heat maps overlapped with the tumor in 350 of 360 CTs (97%). Performance was high across tumor stages (sensitivity of 0.80, 0.87, 0.95, and 1.0 on T1 through T4 stages, respectively), comparable for hypodense vs isodense tumors (sensitivity, 0.90 vs 0.82) and across patient age, sex, CT slice thickness, and vendor (all P > .05), and generalizable on both the simulated cohort (accuracy, 0.95 [95% CI, 0.94-0.95]; AUROC curve, 0.97 [95% CI, 0.94-0.99]) and public data sets (accuracy, 0.86 [95% CI, 0.82-0.90]; AUROC curve, 0.90 [95% CI, 0.86-0.95]). Despite being exclusively trained on diagnostic CTs with larger tumors, the model could detect occult PDA on prediagnostic CTs (accuracy, 0.84 [95% CI, 0.79-0.88]; AUROC curve, 0.91 [95% CI, 0.86-0.94]; sensitivity, 0.75 [95% CI, 0.67-0.84]; and specificity, 0.90 [95% CI, 0.85-0.95]) at a median 475 days (range, 93-1082 days) before clinical diagnosis.
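As an illustration of how the intramural test-subset figures relate to one another, the following minimal sketch recomputes sensitivity, specificity, and accuracy directly from the counts given above (409 CTs with PDA, 829 controls, 360 and 783 correctly classified). The function name `confusion_metrics` and the dictionary keys are hypothetical and are not taken from the study; the sketch assumes simple pooled ratios, which may differ from how the authors computed their reported means.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true-positive rate among PDA CTs
        "specificity": tn / (tn + fp),            # true-negative rate among control CTs
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts from the intramural hold-out test subset reported in the abstract.
pda_total, control_total = 409, 829
tp, tn = 360, 783

metrics = confusion_metrics(
    tp=tp,
    fn=pda_total - tp,        # PDA CTs the model missed
    tn=tn,
    fp=control_total - tn,    # control CTs flagged as PDA
)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The pooled ratios agree with the reported 88% of PDA CTs and 94% of control CTs correctly classified. The abstract separately reports a specificity of 0.95 and does not state how the reported means were aggregated, so a small difference from the pooled value is to be expected.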
Conclusions: This automated artificial intelligence model trained on a large and diverse data set shows high accuracy and generalizable performance for detection of PDA on diagnostic CTs as well as for visually occult PDA on prediagnostic CTs. Prospective validation with blood-based biomarkers is warranted to assess the potential for early detection of sporadic PDA in high-risk individuals.