Pancreatic Cancer Risk Stratification Across Diabetes Stages: Development and Internal Validation of a Machine Learning Model

Salman Khan
Cancer Epidemiol Biomarkers Prev. 2026 Jun 8. doi: 10.1158/1055-9965.EPI-25-1782. Online ahead of print.
Abstract
Background: Pancreatic cancer is often diagnosed at an advanced stage in patients with diabetes. Existing prediction models require complete historical data and focus on new-onset diabetes, which limits their applicability. We developed a machine learning model that handles missing data and performs across the diabetes spectrum.

Methods: Retrospective cohort study using TriNetX electronic health records. Patients with hemoglobin A1c ≥6.5% were included. Sixty-five clinical variables were extracted at 90-day intervals. An XGBoost model was developed using patient-level 1:1 case-control sampling and compared with ENDPAC, Boursi, and Cheung models using AUROC, sensitivity, specificity, and lead time.
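
The patient-level 1:1 case-control sampling mentioned in Methods can be illustrated with a minimal sketch. The abstract does not describe the sampling procedure in detail, so everything here is an assumption: the function name `sample_case_control`, the `labels` mapping, and the use of unmatched random sampling of controls are hypothetical. The point of sampling by patient rather than by 90-day interval is that all intervals from one patient stay on the same side of the split.

```python
import random


def sample_case_control(labels, seed=0):
    """Draw one random control patient per case patient (1:1).

    labels: dict mapping patient_id -> 1 (developed pancreatic cancer) or 0.
    Sampling is done on patient IDs, not on per-interval rows (assumption).
    """
    cases = sorted(pid for pid, y in labels.items() if y == 1)
    controls = sorted(pid for pid, y in labels.items() if y == 0)
    if len(controls) < len(cases):
        raise ValueError("not enough control patients for 1:1 sampling")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    sampled_controls = rng.sample(controls, k=len(cases))
    return cases, sampled_controls


# Hypothetical toy cohort: 3 cases among 100 patients.
toy_labels = {f"p{i}": 1 if i < 3 else 0 for i in range(100)}
toy_cases, toy_controls = sample_case_control(toy_labels, seed=42)
```

Per-interval feature rows would then be selected by joining on the returned patient IDs before model fitting.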

Results: Among 3,213,551 patients (mean age 56.7 years; 46.8% female), 2,655 (0.08%) developed pancreatic cancer. XGBoost achieved AUROC 0.78 (95% CI, 0.77-0.79). At 90% sensitivity, specificity was 50% with median 9-month lead time. The model scored 100% of patients versus <10% for existing models. On matched patient subsets with complete data, XGBoost significantly outperformed ENDPAC (AUROC 0.79 vs 0.63; P<0.001) and Cheung (AUROC 0.84 vs 0.75; P=0.045). Boursi could not be reliably evaluated due to insufficient scorable patients.
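
As a rough illustration of how figures such as AUROC and "specificity at 90% sensitivity" are derived from model scores, the sketch below computes both from two lists of predicted risks. The function names and the toy scores are hypothetical and are not the study's data or code; the AUROC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
import math


def auroc(case_scores, control_scores):
    """AUROC as the fraction of case/control pairs ranked correctly."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half
    return wins / (len(case_scores) * len(control_scores))


def specificity_at_sensitivity(case_scores, control_scores, target=0.90):
    """Pick the highest threshold that flags at least `target` of cases,
    then report the specificity among controls at that threshold."""
    ranked = sorted(case_scores, reverse=True)
    # Number of cases that must score at or above the threshold.
    needed = max(1, math.ceil(target * len(ranked) - 1e-9))
    threshold = ranked[needed - 1]
    sensitivity = sum(s >= threshold for s in case_scores) / len(case_scores)
    specificity = sum(s < threshold for s in control_scores) / len(control_scores)
    return threshold, sensitivity, specificity


# Hypothetical scores, for illustration only.
toy_case_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.2]
toy_control_scores = [0.1, 0.15, 0.3, 0.25, 0.42, 0.5, 0.6, 0.05, 0.12, 0.33]
toy_result = specificity_at_sensitivity(toy_case_scores, toy_control_scores)
```

Fixing sensitivity first and reading off specificity reflects a screening-oriented operating point: the threshold is chosen so that few cases are missed, and the cost shows up as false positives among controls.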

Conclusions: This XGBoost model predicts pancreatic cancer among patients with both new-onset and prevalent diabetes in the setting of limited clinical information, achieving 100% patient coverage versus <10% for existing models. External validation is needed before clinical implementation.

Impact: Existing models focus exclusively on new-onset diabetes and require complete historical data, scoring fewer than 10% of patients. This model risk-stratifies both new-onset and prevalent diabetes patients with limited clinical information, achieving 100% patient coverage pending external validation.
