Methods: Retrospective cohort study using TriNetX electronic health records. Patients with hemoglobin A1c ≥6.5% were included. Sixty-five clinical variables were extracted at 90-day intervals. An XGBoost model was developed using patient-level 1:1 case-control sampling and compared with ENDPAC, Boursi, and Cheung models using AUROC, sensitivity, specificity, and lead time.
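As an illustration of the patient-level 1:1 case-control sampling named in the Methods, the sketch below keeps every case and draws one control per case at random, sampling at the patient level so that all of a patient's 90-day interval rows stay together. It is a sketch under stated assumptions, not the study's pipeline: the abstract does not say whether controls were matched on covariates, so simple random sampling is assumed here, and the names `sample_case_control`, `select_rows`, and the `patient_id` field are hypothetical.

```python
import random


def sample_case_control(labels_by_patient, seed=0):
    """Return sorted patient IDs: all cases plus one random control per case.

    labels_by_patient maps patient_id -> 1 (case) or 0 (control).
    Assumption: controls are drawn uniformly at random, without matching.
    """
    cases = sorted(pid for pid, y in labels_by_patient.items() if y == 1)
    controls = sorted(pid for pid, y in labels_by_patient.items() if y == 0)
    if len(controls) < len(cases):
        raise ValueError("not enough controls for 1:1 sampling")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sampled_controls = rng.sample(controls, len(cases))  # without replacement
    return sorted(cases + sampled_controls)


def select_rows(rows, kept_ids):
    """Keep every interval row belonging to a sampled patient."""
    kept = set(kept_ids)
    return [row for row in rows if row["patient_id"] in kept]
```

Sampling patient IDs first and filtering rows afterwards prevents a single patient's intervals from being split between the sampled and unsampled sets.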
Results: Among 3,213,551 patients (mean age 56.7 years; 46.8% female), 2,655 (0.08%) developed pancreatic cancer. XGBoost achieved an AUROC of 0.78 (95% CI, 0.77-0.79). At 90% sensitivity, specificity was 50%, with a median lead time of 9 months. The model scored 100% of patients versus <10% for existing models. On matched patient subsets with complete data, XGBoost significantly outperformed ENDPAC (AUROC 0.79 vs 0.63; P<0.001) and Cheung (AUROC 0.84 vs 0.75; P=0.045). Boursi could not be reliably evaluated due to insufficient scorable patients.
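An operating point such as "specificity at 90% sensitivity" can be read off a set of risk scores by choosing the highest threshold that still flags the target fraction of cases. The following is a minimal sketch of that calculation, assuming binary labels and higher scores meaning higher risk; the function name `specificity_at_sensitivity` and the example data are hypothetical and are not taken from the study.

```python
import math


def specificity_at_sensitivity(y_true, scores, target_sensitivity=0.90):
    """Return (threshold, sensitivity, specificity) at the target sensitivity.

    A patient is flagged when score >= threshold. The threshold is the
    highest case score that still captures the target fraction of cases.
    """
    pos = sorted((s for y, s in zip(y_true, scores) if y == 1), reverse=True)
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # Number of cases that must be flagged; the small tolerance guards
    # against floating-point error in the product.
    k = max(1, math.ceil(target_sensitivity * len(pos) - 1e-9))
    threshold = pos[k - 1]
    true_pos = sum(s >= threshold for s in pos)
    true_neg = sum(s < threshold for s in neg)
    return threshold, true_pos / len(pos), true_neg / len(neg)
```

Because tied scores at the threshold are all flagged, the achieved sensitivity can exceed the target slightly; it never falls below it.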
Conclusions: This XGBoost model predicts pancreatic cancer among patients with new-onset or prevalent diabetes, even when clinical information is limited, achieving 100% patient coverage versus <10% for existing models. External validation is needed before clinical implementation.
Impact: Existing models focus exclusively on new-onset diabetes and require complete historical data, so they score fewer than 10% of patients. This model risk-stratifies patients with new-onset or prevalent diabetes using limited clinical information and achieves 100% patient coverage, although it still requires external validation.