Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening acc
Kristina Lång, Viktoria Josefsson, Anna-Maria Larsson, Stefan Larsson, Charlotte Högberg, Hanna Sartor, Solveig Hofvind, Ingvar Andersson, Aldana Rosso
Lancet Oncol . 2023 Aug;24(8):936-944. doi: 10.1016/S1470-2045(23)00298-X.
Background: Retrospective studies have shown promising results using artificial intelligence (AI) to improve mammography screening accuracy and reduce screen-reading workload; however, to our knowledge, a randomised trial has not yet been conducted. We aimed to assess the clinical safety of an AI-supported screen-reading protocol compared with standard screen reading by radiologists following mammography.
Methods: In this randomised, controlled, population-based trial, women aged 40-80 years eligible for mammography screening (including general screening with 1·5-2-year intervals and annual screening for those with moderate hereditary risk of breast cancer or a history of breast cancer) at four screening sites in Sweden were informed about the study as part of the screening invitation. Those who did not opt out were randomly allocated (1:1) to AI-supported screening (intervention group) or standard double reading without AI (control group). Screening examinations were automatically randomised by the Picture Archive and Communications System with a pseudo-random number generator after image acquisition. The participants and the radiographers acquiring the screening examinations, but not the radiologists reading the screening examinations, were masked to study group allocation. The AI system (Transpara version 1.7.0) provided an examination-based malignancy risk score on a 10-level scale that was used to triage screening examinations to single reading (score 1-9) or double reading (score 10), with AI risk scores (for all examinations) and computer-aided detection marks (for examinations with risk score 8-10) available to the radiologists doing the screen reading. Here we report the prespecified clinical safety analysis, to be done after 80 000 women were enrolled, to assess the secondary outcome measures of early screening performance (cancer detection rate, recall rate, false positive rate, positive predictive value [PPV] of recall, and type of cancer detected [invasive or in situ]) and screen-reading workload. Analyses were done in the modified intention-to-treat population (ie, all women randomly assigned to a group with one complete screening examination, excluding women recalled due to enlarged lymph nodes diagnosed with lymphoma). The lowest acceptable limit for safety in the intervention group was a cancer detection rate of more than 3 per 1000 participants screened. The trial is registered with ClinicalTrials.gov, NCT04838756, and is closed to accrual; follow-up is ongoing to assess the primary endpoint of the trial, interval cancer rate.
Findings: Between April 12, 2021, and July 28, 2022, 80 033 women were randomly assigned to AI-supported screening (n=40 003) or double reading without AI (n=40 030). 13 women were excluded from the analysis. The median age was 54·0 years (IQR 46·7-63·9). Race and ethnicity data were not collected. AI-supported screening among 39 996 participants resulted in 244 screen-detected cancers, 861 recalls, and a total of 46 345 screen readings. Standard screening among 40 024 participants resulted in 203 screen-detected cancers, 817 recalls, and a total of 83 231 screen readings. Cancer detection rates were 6·1 (95% CI 5·4-6·9) per 1000 screened participants in the intervention group, above the lowest acceptable limit for safety, and 5·1 (4·4-5·8) per 1000 in the control group-a ratio of 1·2 (95% CI 1·0-1·5; p=0·052). Recall rates were 2·2% (95% CI 2·0-2·3) in the intervention group and 2·0% (1·9-2·2) in the control group. The false positive rate was 1·5% (95% CI 1·4-1·7) in both groups. The PPV of recall was 28·3% (95% CI 25·3-31·5) in the intervention group and 24·8% (21·9-28·0) in the control group. In the intervention group, 184 (75%) of 244 cancers detected were invasive and 60 (25%) were in situ; in the control group, 165 (81%) of 203 cancers were invasive and 38 (19%) were in situ. The screen-reading workload was reduced by 44·3% using AI.
Interpretation: AI-supported mammography screening resulted in a similar cancer detection rate compared with standard double reading, with a substantially lower screen-reading workload, indicating that the use of AI in mammography screening is safe. The trial was thus not halted and the primary endpoint of interval cancer rate will be assessed in 100 000 enrolled participants after 2-years of follow up.