Methods: We conducted a retrospective study of 101 Japanese radiology reports from pancreatic cancer staging computed tomography (CT) examinations. Using a zero-shot approach, Claude 3.7 Sonnet was prompted with definitions from the 8th edition of the Japanese Pancreatic Cancer Classification to generate TNM stage and resectability categories. The model's outputs were compared with a reference standard established by a radiologist's interpretation of the same reports. Performance metrics included categorical accuracy and Cohen's kappa coefficients. Detailed error analysis was also performed to characterize common sources of misclassification.
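As an illustration of the two performance metrics named above, the following is a minimal sketch of how categorical accuracy and Cohen's kappa can be computed from paired labels. It assumes the unweighted form of kappa, which the abstract does not specify; the function names and the six toy labels are hypothetical and are not study data.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of cases where the predicted category matches the reference."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def cohen_kappa(reference, predicted):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by chance,
    computed from the marginal category frequencies of each rater.
    """
    n = len(reference)
    p_o = accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used a single identical category.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy labels for illustration only (not from the study).
reference_t = ["T1", "T2", "T3", "T3", "T4", "T2"]
model_t = ["T1", "T2", "T3", "T4", "T4", "T2"]

print(f"accuracy = {accuracy(reference_t, model_t):.3f}")
print(f"kappa    = {cohen_kappa(reference_t, model_t):.3f}")
```

Kappa discounts the agreement that the marginal category frequencies alone would produce, which is why it is reported alongside raw accuracy when category distributions are unbalanced.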
Results: Claude 3.7 Sonnet achieved accuracies of 84.1% for the T category, 92.1% for the N category, 98.0% for the M category, and 87.1% for resectability. Cohen's kappa values indicated substantial agreement for T (κ = 0.745) and almost perfect agreement for N (κ = 0.858), M (κ = 0.956), and resectability (κ = 0.812). The lower accuracy in T classification was mainly attributable to misinterpretation of nuanced vascular involvement. The model effectively detected missing information for TNM classification but showed limitations in identifying omissions relevant to resectability assessment.
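The agreement labels used above ("substantial", "almost perfect") match the conventional Landis and Koch bands for kappa. The sketch below assumes those bands, since the abstract does not name the interpretation scale; the function name `interpret_kappa` is hypothetical. The kappa values are the ones reported in the Results.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a Landis and Koch agreement label (assumed scale)."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Kappa values as reported in the Results.
reported = {"T": 0.745, "N": 0.858, "M": 0.956, "resectability": 0.812}
labels = {name: interpret_kappa(value) for name, value in reported.items()}

for name, label in labels.items():
    print(f"{name}: {label}")
```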
Conclusion: Claude 3.7 Sonnet demonstrated high accuracy in extracting structured pancreatic cancer staging information from unstructured Japanese radiology reports without task-specific training. While challenges remain in interpreting nuanced descriptions of vascular invasion and resectability, the model reliably identified most staging elements and omissions. These findings highlight the potential of LLMs as tools for semi-automated generation of structured data from routine free-text reports, which could improve reporting consistency, workflow efficiency, and secondary data utilization in oncology care.