1
|
Lång K, Josefsson V, Larsson AM, Larsson S, Högberg C, Sartor H, Hofvind S, Andersson I, Rosso A. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol 2023; 24:936-944. [PMID: 37541274 DOI: 10.1016/s1470-2045(23)00298-x] [Citation(s) in RCA: 78] [Impact Index Per Article: 78.0] [Received: 04/26/2023] [Revised: 06/07/2023] [Accepted: 06/21/2023] [Indexed: 08/06/2023]
Abstract
BACKGROUND Retrospective studies have shown promising results using artificial intelligence (AI) to improve mammography screening accuracy and reduce screen-reading workload; however, to our knowledge, a randomised trial has not yet been conducted. We aimed to assess the clinical safety of an AI-supported screen-reading protocol compared with standard screen reading by radiologists following mammography. METHODS In this randomised, controlled, population-based trial, women aged 40-80 years eligible for mammography screening (including general screening with 1·5-2-year intervals and annual screening for those with moderate hereditary risk of breast cancer or a history of breast cancer) at four screening sites in Sweden were informed about the study as part of the screening invitation. Those who did not opt out were randomly allocated (1:1) to AI-supported screening (intervention group) or standard double reading without AI (control group). Screening examinations were automatically randomised by the Picture Archive and Communications System with a pseudo-random number generator after image acquisition. The participants and the radiographers acquiring the screening examinations, but not the radiologists reading the screening examinations, were masked to study group allocation. The AI system (Transpara version 1.7.0) provided an examination-based malignancy risk score on a 10-level scale that was used to triage screening examinations to single reading (score 1-9) or double reading (score 10), with AI risk scores (for all examinations) and computer-aided detection marks (for examinations with risk score 8-10) available to the radiologists doing the screen reading. 
Here we report the prespecified clinical safety analysis, to be done after 80 000 women were enrolled, to assess the secondary outcome measures of early screening performance (cancer detection rate, recall rate, false positive rate, positive predictive value [PPV] of recall, and type of cancer detected [invasive or in situ]) and screen-reading workload. Analyses were done in the modified intention-to-treat population (ie, all women randomly assigned to a group with one complete screening examination, excluding women recalled due to enlarged lymph nodes diagnosed with lymphoma). The lowest acceptable limit for safety in the intervention group was a cancer detection rate of more than 3 per 1000 participants screened. The trial is registered with ClinicalTrials.gov, NCT04838756, and is closed to accrual; follow-up is ongoing to assess the primary endpoint of the trial, interval cancer rate. FINDINGS Between April 12, 2021, and July 28, 2022, 80 033 women were randomly assigned to AI-supported screening (n=40 003) or double reading without AI (n=40 030). 13 women were excluded from the analysis. The median age was 54·0 years (IQR 46·7-63·9). Race and ethnicity data were not collected. AI-supported screening among 39 996 participants resulted in 244 screen-detected cancers, 861 recalls, and a total of 46 345 screen readings. Standard screening among 40 024 participants resulted in 203 screen-detected cancers, 817 recalls, and a total of 83 231 screen readings. Cancer detection rates were 6·1 (95% CI 5·4-6·9) per 1000 screened participants in the intervention group, above the lowest acceptable limit for safety, and 5·1 (4·4-5·8) per 1000 in the control group, a ratio of 1·2 (95% CI 1·0-1·5; p=0·052). Recall rates were 2·2% (95% CI 2·0-2·3) in the intervention group and 2·0% (1·9-2·2) in the control group. The false positive rate was 1·5% (95% CI 1·4-1·7) in both groups. 
The PPV of recall was 28·3% (95% CI 25·3-31·5) in the intervention group and 24·8% (21·9-28·0) in the control group. In the intervention group, 184 (75%) of 244 cancers detected were invasive and 60 (25%) were in situ; in the control group, 165 (81%) of 203 cancers were invasive and 38 (19%) were in situ. The screen-reading workload was reduced by 44·3% using AI. INTERPRETATION AI-supported mammography screening resulted in a similar cancer detection rate compared with standard double reading, with a substantially lower screen-reading workload, indicating that the use of AI in mammography screening is safe. The trial was thus not halted and the primary endpoint of interval cancer rate will be assessed in 100 000 enrolled participants after 2 years of follow-up. FUNDING Swedish Cancer Society, Confederation of Regional Cancer Centres, and the Swedish governmental funding for clinical research (ALF).
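The headline rates in this abstract can be recomputed from the raw counts it reports. The following is a minimal sketch, not the trial's analysis code: the dictionary layout and function name are illustrative, and it assumes that a false positive is a recall without a screen-detected cancer and that PPV of recall is cancers divided by recalls. Confidence intervals and the p value are not reproduced.

```python
# Counts as reported in the abstract (modified intention-to-treat population).
ARMS = {
    "ai": {"screened": 39996, "cancers": 244, "recalls": 861, "readings": 46345},
    "control": {"screened": 40024, "cancers": 203, "recalls": 817, "readings": 83231},
}


def screening_metrics(arm):
    """Derive early screening performance measures from raw counts."""
    n, cancers, recalls = arm["screened"], arm["cancers"], arm["recalls"]
    return {
        "cdr_per_1000": 1000 * cancers / n,
        "recall_pct": 100 * recalls / n,
        # Assumption: a false positive is a recall without a screen-detected cancer.
        "false_positive_pct": 100 * (recalls - cancers) / n,
        # Assumption: PPV of recall = screen-detected cancers / recalls.
        "ppv_pct": 100 * cancers / recalls,
    }


ai = screening_metrics(ARMS["ai"])
control = screening_metrics(ARMS["control"])

# Ratio of cancer detection rates (intervention / control).
cdr_ratio = ai["cdr_per_1000"] / control["cdr_per_1000"]

# Relative reduction in the number of screen readings.
workload_reduction_pct = 100 * (1 - ARMS["ai"]["readings"] / ARMS["control"]["readings"])

for name, metrics in (("AI-supported", ai), ("control", control)):
    print(name, {key: round(value, 1) for key, value in metrics.items()})
print("CDR ratio:", round(cdr_ratio, 1))
print("Workload reduction (%):", round(workload_reduction_pct, 1))
```

Under these assumptions the rounded values agree with the abstract: 6·1 versus 5·1 cancers per 1000, recall rates of 2·2% and 2·0%, a false positive rate of 1·5% in both groups, PPVs of 28·3% and 24·8%, and a 44·3% workload reduction.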
Affiliation(s)
- Kristina Lång
- Division of Diagnostic Radiology, Department of Translational Medicine, Lund University, Malmö, Sweden; Unilabs Mammography Unit, Skåne University Hospital, Malmö, Sweden.
| | - Viktoria Josefsson
- Division of Diagnostic Radiology, Department of Translational Medicine, Lund University, Malmö, Sweden; Unilabs Mammography Unit, Skåne University Hospital, Malmö, Sweden
| | - Anna-Maria Larsson
- Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
| | - Stefan Larsson
- Department of Technology and Society, Lund University, Lund, Sweden
| | | | - Hanna Sartor
- Division of Diagnostic Radiology, Department of Translational Medicine, Lund University, Malmö, Sweden; Unilabs Mammography Unit, Skåne University Hospital, Malmö, Sweden
| | - Solveig Hofvind
- Section for Breast Cancer Screening, Cancer Registry of Norway, Oslo, Norway; Health and Care Sciences, Faculty of Health Sciences, The Arctic University of Norway, Tromsø, Norway
| | - Ingvar Andersson
- Division of Diagnostic Radiology, Department of Translational Medicine, Lund University, Malmö, Sweden; Unilabs Mammography Unit, Skåne University Hospital, Malmö, Sweden
| | - Aldana Rosso
- Division of Diagnostic Radiology, Department of Translational Medicine, Lund University, Malmö, Sweden
2
|
Dratsch T, Chen X, Rezazade Mehrizi M, Kloeckner R, Mähringer-Kunz A, Püsken M, Baeßler B, Sauer S, Maintz D, Pinto Dos Santos D. Automation Bias in Mammography: The Impact of Artificial Intelligence BI-RADS Suggestions on Reader Performance. Radiology 2023; 307:e222176. [PMID: 37129490 DOI: 10.1148/radiol.222176] [Citation(s) in RCA: 62] [Impact Index Per Article: 62.0] [Indexed: 05/03/2023]
Abstract
Background Automation bias (the propensity for humans to favor suggestions from automated decision-making systems) is a known source of error in human-machine interactions, but its implications regarding artificial intelligence (AI)-aided mammography reading are unknown. Purpose To determine how automation bias can affect inexperienced, moderately experienced, and very experienced radiologists when reading mammograms with the aid of an artificial intelligence (AI) system. Materials and Methods In this prospective experiment, 27 radiologists read 50 mammograms and provided their Breast Imaging Reporting and Data System (BI-RADS) assessment assisted by a purported AI system. Mammograms were obtained between January 2017 and December 2019 and were presented in two randomized sets. The first was a training set of 10 mammograms, with the correct BI-RADS category suggested by the AI system. The second was a set of 40 mammograms in which an incorrect BI-RADS category was suggested for 12 mammograms. Reader performance, degree of bias in BI-RADS scoring, perceived accuracy of the AI system, and reader confidence in their own BI-RADS ratings were assessed using analysis of variance (ANOVA) and repeated-measures ANOVA followed by post hoc tests and Kruskal-Wallis tests followed by the Dunn post hoc test. Results The percentage of correctly rated mammograms by inexperienced (mean, 79.7% ± 11.7 [SD] vs 19.8% ± 14.0; P < .001; r = 0.93), moderately experienced (mean, 81.3% ± 10.1 vs 24.8% ± 11.6; P < .001; r = 0.96), and very experienced (mean, 82.3% ± 4.2 vs 45.5% ± 9.1; P = .003; r = 0.97) radiologists was significantly impacted by the correctness of the AI prediction of BI-RADS category. 
Inexperienced radiologists were significantly more likely to follow the suggestions of the purported AI when it incorrectly suggested a higher BI-RADS category than the actual ground truth compared with both moderately (mean degree of bias, 4.0 ± 1.8 vs 2.4 ± 1.5; P = .044; r = 0.46) and very (mean degree of bias, 4.0 ± 1.8 vs 1.2 ± 0.8; P = .009; r = 0.65) experienced readers. Conclusion The results show that inexperienced, moderately experienced, and very experienced radiologists reading mammograms are prone to automation bias when being supported by an AI-based system. This and other effects of human and machine interaction must be considered to ensure safe deployment and accurate diagnostic performance when combining human readers and AI. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Baltzer in this issue.
Affiliation(s)
- Thomas Dratsch
- From the Institute of Diagnostic and Interventional Radiology, University of Cologne, Faculty of Medicine and University Hospital Cologne, Kerpener Str 62, 50937 Cologne, Germany (T.D., X.C., M.P., D.M., D.P.d.S.); School of Business and Economics, Knowledge, Information and Innovation, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands (M.R.M.); Institute of Interventional Radiology, University Clinic Schleswig-Holstein, Kiel, Germany (R.K.); Department of Diagnostic and Interventional Radiology, University Medical Centre of the Johannes Gutenberg-University Mainz, Mainz, Germany (A.M.K.); and Institute of Diagnostic and Interventional Radiology, University Clinic Würzburg, Würzburg, Germany (B.B., S.S.)
| | - Xue Chen
| | - Mohammad Rezazade Mehrizi
| | - Roman Kloeckner
| | - Aline Mähringer-Kunz
| | - Michael Püsken
| | - Bettina Baeßler
| | - Stephanie Sauer
| | - David Maintz
| | - Daniel Pinto Dos Santos
3
|
Finck T, Moosbauer J, Probst M, Schlaeger S, Schuberth M, Schinz D, Yiğitsoy M, Byas S, Zimmer C, Pfister F, Wiestler B. Faster and Better: How Anomaly Detection Can Accelerate and Improve Reporting of Head Computed Tomography. Diagnostics (Basel) 2022; 12:diagnostics12020452. [PMID: 35204543 PMCID: PMC8871235 DOI: 10.3390/diagnostics12020452] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/13/2022] [Revised: 02/06/2022] [Accepted: 02/07/2022] [Indexed: 02/06/2023]
Abstract
Background: Most artificial intelligence (AI) systems are restricted to solving a pre-defined task, thus limiting their generalizability to unselected datasets. Anomaly detection relieves this shortfall by flagging all pathologies as deviations from a learned norm. Here, we investigate whether diagnostic accuracy and reporting times can be improved by an anomaly detection tool for head computed tomography (CT), tailored to provide patient-level triage and voxel-based highlighting of pathologies. Methods: Four neuroradiologists with 1–10 years of experience each investigated a set of 80 routinely acquired head CTs containing 40 normal scans and 40 scans with common pathologies. In a random order, scans were investigated with and without AI predictions. A 4-week wash-out period between runs was included to prevent a reminiscence effect. Performance metrics for identifying pathologies, reporting times, and subjectively assessed diagnostic confidence were determined for both runs. Results: AI support significantly increased the share of correctly classified scans (normal/pathological) from 309/320 scans to 317/320 scans (p = 0.0045), with a corresponding sensitivity, specificity, negative predictive value, and positive predictive value of 100%, 98.1%, 98.2% and 100%, respectively. Further, reporting was significantly accelerated with AI support, as evidenced by the 15.7% reduction in reporting times (65.1 ± 8.9 s vs. 54.9 ± 7.1 s; p < 0.0001). Diagnostic confidence was similar in both runs. Conclusion: Our study shows that AI-based triage of CTs can improve the diagnostic accuracy and accelerate reporting for experienced and inexperienced radiologists alike. Through ad hoc identification of normal CTs, anomaly detection promises to guide clinicians towards scans requiring urgent attention.
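The accuracy figures for the AI-supported run can be related to one another through a 2×2 confusion matrix. The sketch below is illustrative only: it assumes the 320 reads pool evenly into 160 pathological and 160 normal reads (4 readers × 40 scans each), that "pathological" is the positive class, and that all 3 misclassified reads were normal scans called pathological, which is the only split consistent with a sensitivity of 100% and 317/320 correct. The abstract gives no per-reader breakdown, and the function name is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # pathological reads correctly flagged
        "specificity": tn / (tn + fp),  # normal reads correctly cleared
        "ppv": tp / (tp + fp),          # flagged reads that are truly pathological
        "npv": tn / (tn + fn),          # cleared reads that are truly normal
    }


# Assumed pooled counts for the AI-supported run (see lead-in).
with_ai = binary_metrics(tp=160, fp=3, tn=157, fn=0)

# Relative reduction in mean reporting time, from the reported means in seconds.
time_reduction_pct = 100 * (65.1 - 54.9) / 65.1

for key, value in with_ai.items():
    print(f"{key}: {100 * value:.1f}%")
print(f"reporting time reduction: {time_reduction_pct:.1f}%")
```

With those assumed counts, sensitivity is 100% and specificity about 98.1%, matching the abstract, while the positive predictive value comes out near 98.2% and the negative predictive value at 100%. The order of the last two figures in the abstract may therefore be worth checking against the full text.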
Affiliation(s)
- Tom Finck
- Department of Diagnostic and Interventional Neuroradiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany; (M.P.); (S.S.); (M.S.); (D.S.); (C.Z.); (B.W.)
| | - Julia Moosbauer
- DeepC GmbH, Atelierstraße 29, 81671 Munich, Germany; (J.M.); (M.Y.); (S.B.); (F.P.)
| | - Monika Probst
- Department of Diagnostic and Interventional Neuroradiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany; (M.P.); (S.S.); (M.S.); (D.S.); (C.Z.); (B.W.)
| | - Sarah Schlaeger
- Department of Diagnostic and Interventional Neuroradiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany; (M.P.); (S.S.); (M.S.); (D.S.); (C.Z.); (B.W.)
| | - Madeleine Schuberth
- Department of Diagnostic and Interventional Neuroradiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany; (M.P.); (S.S.); (M.S.); (D.S.); (C.Z.); (B.W.)
| | - David Schinz
- Department of Diagnostic and Interventional Neuroradiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany; (M.P.); (S.S.); (M.S.); (D.S.); (C.Z.); (B.W.)
| | - Mehmet Yiğitsoy
- DeepC GmbH, Atelierstraße 29, 81671 Munich, Germany; (J.M.); (M.Y.); (S.B.); (F.P.)
| | - Sebastian Byas
- DeepC GmbH, Atelierstraße 29, 81671 Munich, Germany; (J.M.); (M.Y.); (S.B.); (F.P.)
| | - Claus Zimmer
- Department of Diagnostic and Interventional Neuroradiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany; (M.P.); (S.S.); (M.S.); (D.S.); (C.Z.); (B.W.)
| | - Franz Pfister
- DeepC GmbH, Atelierstraße 29, 81671 Munich, Germany; (J.M.); (M.Y.); (S.B.); (F.P.)
| | - Benedikt Wiestler
- Department of Diagnostic and Interventional Neuroradiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany; (M.P.); (S.S.); (M.S.); (D.S.); (C.Z.); (B.W.)
4
|
Factors Influencing the False Positive Rate in CT Lung Cancer Screening. Acad Radiol 2022; 29 Suppl 2:S18-S22. [PMID: 32893112 DOI: 10.1016/j.acra.2020.07.040] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/18/2020] [Revised: 07/28/2020] [Accepted: 07/30/2020] [Indexed: 12/20/2022]
Abstract
PURPOSE To identify factors influencing the likelihood of a false positive lung cancer screening (LCS) computed tomography (CT), which may lead to increased costs and patient anxiety. MATERIALS AND METHODS In this retrospective study, we examined all LCS CTs performed across our healthcare network from 2014 to 2018, recording Lung-RADS category and diagnosis of lung cancer. A false positive was defined by Lung-RADS 3-4X and no diagnosis of lung cancer within 1 year. Patient demographics and smoking history, presence of emphysema, diagnosis of chronic obstructive pulmonary disease, radiologist years of experience and annual volume, income level by patient zip code, and screening institution were evaluated in a multivariate logistic regression model for false positive exams. RESULTS A total of 5835 LCS CTs were included from 3735 patients. Lung cancer was diagnosed in 142 cases (2%). Of the LCS CTs, 905 (16%) were positive by Lung-RADS, and 766 (13%) represented false positives. Logistic regression analysis showed that screening institution (odds ratios [OR] 0.91 - 2.43), baseline scan (OR 1.43), radiologist experience (OR 0.59), patient age (OR 2.08), diagnosis of chronic obstructive pulmonary disease (OR 1.34), presence of emphysema (OR 1.32), and income level (OR 0.43) were significant predictors of false positives. CONCLUSION A number of patient-specific and site/radiologist-specific factors influence the false positive rate in CT LCS. In particular, radiologists with less experience had a higher false positive rate. Screening programs may wish to develop quality assurance programs to compare the false positive rates of their radiologists to national benchmarks.
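The false positive definition used in this study (Lung-RADS 3-4X with no lung cancer diagnosis within 1 year) can be written as a simple rule. This is a hedged sketch rather than the authors' code: the function and constant names are hypothetical, and the range "3-4X" is read here as the categories 3, 4A, 4B, and 4X.

```python
# Lung-RADS categories treated as a positive screen (assumed reading of "3-4X").
POSITIVE_CATEGORIES = {"3", "4A", "4B", "4X"}


def is_false_positive(lung_rads, cancer_within_1y):
    """True if the exam was positive by Lung-RADS but no lung cancer followed within 1 year."""
    return lung_rads in POSITIVE_CATEGORIES and not cancer_within_1y


# Shares derived from the counts reported in the abstract.
total_exams = 5835
positive_exams = 905
false_positives = 766

positive_pct = 100 * positive_exams / total_exams
false_positive_pct = 100 * false_positives / total_exams
fp_share_of_positives_pct = 100 * false_positives / positive_exams

print(is_false_positive("4A", cancer_within_1y=False))
print(is_false_positive("2", cancer_within_1y=False))
print(round(positive_pct), round(false_positive_pct), round(fp_share_of_positives_pct, 1))
```

The first two shares round to the 16% and 13% given in the abstract. The last line also shows that, on these counts, roughly 85% of positive screens were false positives.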
5
|
Walker MJ, Hartman K, Majpruz V, Leung YW, Fienberg S, Rabeneck L, Chiarelli AM. The Impact of Radiologist Screening Mammogram Reading Volume on Performance in the Ontario Breast Screening Program. Can Assoc Radiol J 2021; 73:362-370. [PMID: 34423685 DOI: 10.1177/08465371211031186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
PURPOSE Although some studies have shown increasing radiologists' mammography volumes improves performance, there is a lack of evidence specific to digital mammography and breast screening program performance targets. This study evaluates the relationship between digital screening volume and meeting performance targets. METHODS This retrospective cohort study included 493 radiologists in the Ontario Breast Screening Program who interpreted 1,762,173 screening mammograms in participants ages 50-90 between 2014 and 2016. Associations between annual screening volume and meeting performance targets for abnormal call rate, positive predictive value (PPV), invasive cancer detection rate (CDR), sensitivity, and specificity were modeled using mixed-effects multivariate logistic regression. RESULTS Most radiologists read 500-999 (36.7%) or 1,000-1,999 (31.0%) screens annually, and 18.5% read ≥2,000. Radiologists who read ≥2,000 annually were more likely to meet abnormal call rate (OR = 3.85; 95% CI: 1.17-12.61), PPV (OR = 5.36; 95% CI: 2.53-11.34), invasive CDR (OR = 4.14; 95% CI: 1.50-11.46), and specificity (OR = 4.07; 95% CI: 1.89-8.79) targets versus those who read 100-499 screens. Radiologists reading 1,000-1,999 screens annually were more likely to meet PPV (OR = 2.32; 95% CI: 1.22-4.40), invasive CDR (OR = 3.36; 95% CI: 1.49-7.59) and specificity (OR = 2.00; 95% CI: 1.04-3.84) targets versus those who read 100-499 screens. No significant differences were observed for sensitivity. CONCLUSIONS Annual reading volume requirements of 1,000 in Canada are supported as screening volume above 1,000 was strongly associated with achieving performance targets for nearly all measures. Increasing the minimum volume to 2,000 may further reduce the potential limitations of screening due to false positives, leading to improvements in overall breast screening program quality.
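Odds ratios from logistic regression are usually reported with confidence intervals that are symmetric on the log scale, so the standard error of the log odds ratio can be approximately recovered from the published bounds. The sketch below assumes Wald-type 95% intervals (z = 1.96), which the abstract does not state explicitly; the function names are illustrative. A quick consistency check is that the geometric mean of the bounds should land close to the reported odds ratio.

```python
import math

# Reported odds ratios (>=2,000 vs 100-499 screens annually) with 95% CI bounds.
REPORTED = {
    "abnormal call rate": (3.85, 1.17, 12.61),
    "PPV": (5.36, 2.53, 11.34),
    "invasive CDR": (4.14, 1.50, 11.46),
    "specificity": (4.07, 1.89, 8.79),
}


def log_or_standard_error(lower, upper, z=1.96):
    """Approximate SE of log(OR), assuming a Wald interval symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def geometric_midpoint(lower, upper):
    """Point estimate implied by a log-symmetric interval."""
    return math.sqrt(lower * upper)


for target, (odds_ratio, lower, upper) in REPORTED.items():
    se = log_or_standard_error(lower, upper)
    mid = geometric_midpoint(lower, upper)
    print(f"{target}: reported OR {odds_ratio}, implied OR {mid:.2f}, SE(log OR) {se:.2f}")
```

For all four targets the implied point estimate is within rounding of the reported odds ratio, which is consistent with the log-symmetric assumption. The wide intervals reflect large standard errors on the log scale.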
Affiliation(s)
- Meghan J Walker
- Prevention and Cancer Control, Ontario Health (Cancer Care Ontario), Toronto, Ontario, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Krystal Hartman
- Prevention and Cancer Control, Ontario Health (Cancer Care Ontario), Toronto, Ontario, Canada
| | - Vicky Majpruz
- Prevention and Cancer Control, Ontario Health (Cancer Care Ontario), Toronto, Ontario, Canada
| | - Yvonne W Leung
- Prevention and Cancer Control, Ontario Health (Cancer Care Ontario), Toronto, Ontario, Canada
| | - Samantha Fienberg
- Prevention and Cancer Control, Ontario Health (Cancer Care Ontario), Toronto, Ontario, Canada; Radiology, McMaster University, Hamilton, Ontario, Canada; Medical Imaging, Grand River Hospital, Kitchener, Ontario, Canada
| | - Linda Rabeneck
- Prevention and Cancer Control, Ontario Health (Cancer Care Ontario), Toronto, Ontario, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; Department of Medicine, University of Toronto, Toronto, Ontario, Canada; IC/ES, Toronto, Ontario, Canada
| | - Anna M Chiarelli
- Prevention and Cancer Control, Ontario Health (Cancer Care Ontario), Toronto, Ontario, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
6
|
Hoff SR, Myklebust TÅ, Lee CI, Hofvind S. Influence of Mammography Volume on Radiologists’ Performance: Results from BreastScreen Norway. Radiology 2019; 292:289-296. [DOI: 10.1148/radiol.2019182684] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/11/2022]
7
|
Jackson RL, Double CR, Munro HJ, Lynch J, Tapia KA, Trieu PD, Alakhras M, Ganesan A, Do TD, Soh BP, Brennan PC, Puslednik P. Breast Cancer Diagnostic Efficacy in a Developing South-East Asian Country. Asian Pac J Cancer Prev 2019; 20:727-731. [PMID: 30909671 PMCID: PMC6825776 DOI: 10.31557/apjcp.2019.20.3.727] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/25/2022]
Abstract
Background: Breast cancer is increasing in prevalence amongst South East (SE) Asian women, highlighting the need for high quality, early diagnoses. This study investigated radiologists’ detection efficacy in a developing (DC) and developed (DDC) SE Asian country, as compared to Australian radiologists. Methods: Using a test-set of 60 mammographic cases, 20 containing cancer, JAFROC figures of merit (FOM) and ROC area under the curves (AUC) were calculated as well as location sensitivity, sensitivity and specificity. The test set was examined by 35, 15, and 53 radiologists from a DC, a DDC and Australia, respectively. Results: DC radiologists, compared to both groups of counterparts, demonstrated significantly lower JAFROC FOM, ROC AUC and specificity scores. DC radiologists had a significantly lower location sensitivity than Australian radiologists. DC radiologists also demonstrated significantly lower values for age, hours of reading per week, and years of mammography experience when compared with other radiologists. Conclusion: Significant differences in breast cancer detection parameters can be attributed to the experience of DC radiologists. The development of inexpensive, innovative, interactive training programs is discussed. This nonuniform level of breast cancer detection between countries must be addressed to achieve the World Health Organisation goal of health equity.
Affiliation(s)
| | - Callan R Double
- St Matthews Catholic School, Mudgee, New South Wales, Australia.
| | - Hayden J Munro
- St Matthews Catholic School, Mudgee, New South Wales, Australia.
| | - Jessica Lynch
- St Matthews Catholic School, Mudgee, New South Wales, Australia.
| | - Kriscia A Tapia
- Faculty of Health Sciences, The University of Sydney, Australia
| | - Phuong Dung Trieu
- Faculty of Health Sciences, The University of Sydney, Australia; Department of Medical Imaging, Ho Chi Minh City University of Medicine and Pharmacy, Vietnam
| | - Maram Alakhras
- Faculty of Health Sciences, The University of Sydney, Australia
| | - Aarthi Ganesan
- Faculty of Health Sciences, The University of Sydney, Australia
| | - Thuan Doan Do
- Department of Diagnostic Imaging, Vietnam National Cancer Hospital, Vietnam
| | | | | | - Puslednik Puslednik
- St Matthews Catholic School, Mudgee, New South Wales, Australia; Faculty of Health Sciences, The University of Sydney, Australia
8
|
van Geel K, Kok EM, Aldekhayel AD, Robben SGF, van Merriënboer JJG. Chest X-ray evaluation training: impact of normal and abnormal image ratio and instructional sequence. Med Educ 2019; 53:153-164. [PMID: 30474292 PMCID: PMC6587445 DOI: 10.1111/medu.13756] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/27/2018] [Revised: 09/07/2018] [Accepted: 09/13/2018] [Indexed: 06/09/2023]
Abstract
CONTEXT Medical image perception training generally focuses on abnormalities, whereas normal images are more prevalent in medical practice. Furthermore, instructional sequences that let students practice prior to expert instruction (inductive) may lead to improved performance compared with methods that give students expert instruction before practice (deductive). This study investigates the effects of the proportion of normal images and practice-instruction order on learning to interpret medical images. It is hypothesised that manipulation of the proportion of normal images will lead to a sensitivity-specificity trade-off and that students in practice-first (inductive) conditions need more time per practice case but will correctly identify more test cases. METHODS Third-year medical students (n = 103) learned radiograph interpretation by practising cases with, respectively, 30% or 70% normal radiographs prior to expert instruction (practice-first order) or after expert instruction (instruction-first order). After training, students performed a test (60% normal) and sensitivity (% of correctly identified abnormal radiographs), specificity (% of correctly identified normal radiographs), diagnostic performance (% of correct diagnoses) and case duration were measured. RESULTS The conditions with 30% of normal images scored higher on sensitivity but the conditions with 70% of normal images scored higher on specificity, indicating a sensitivity and specificity trade-off. Those who participated in inductive conditions took less time per practice case but more per test case. They had similar test sensitivity, but scored lower on test specificity. CONCLUSIONS The proportion of normal images impacted the sensitivity-specificity trade-off. This trade-off should be an important consideration for the alignment of training with future practice. Furthermore, the deductive conditions unexpectedly scored higher on specificity when participants took less time per case. 
An inductive approach did not lead to higher diagnostic performance, possibly because participants might already have relevant prior knowledge. Deductive approaches are therefore advised for the training of advanced learners.
Affiliation(s)
- Koos van Geel: Department of Radiology, Maastricht University Medical Center, Maastricht, the Netherlands
- Ellen M Kok: Department of Education, Utrecht University, Utrecht, the Netherlands
- Abdullah D Aldekhayel: Department of Radiology, Maastricht University Medical Center, Maastricht, the Netherlands
- Simon G F Robben: Department of Radiology, Maastricht University Medical Center, Maastricht, the Netherlands
- Jeroen J G van Merriënboer: School of Health Professions Education, Department of Educational Research and Development, Maastricht University, Maastricht, the Netherlands
|
9
|
Ang ZZ, Rawashdeh MA, Heard R, Brennan PC, Lee W, Lewis SJ. Classification of normal screening mammograms is strongly influenced by perceived mammographic breast density. J Med Imaging Radiat Oncol 2017; 61:461-469. [PMID: 28052571 DOI: 10.1111/1754-9485.12576] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/14/2016] [Accepted: 11/20/2016] [Indexed: 12/01/2022]
Abstract
INTRODUCTION To investigate how breast screen readers classify normal screening cases using descriptors of normal mammographic features and to assess test cases for suitability for a single reading strategy. METHODS Fifteen breast screen readers interpreted a test set of 29 normal screening cases and classified them by firstly rating their perceived difficulty to reach a 'normal' decision, secondly identifying the cases' salient normal mammographic features and thirdly assessing the cases' suitability for a single reading strategy. RESULTS The relationship between the perceived difficulty in making 'normal' decisions and the normal mammographic features was investigated. Regular ductal pattern (Tb = -0.439, P = 0.001), uniform density (Tb = -0.527, P < 0.001), non-dense breasts (Tb = -0.736, P < 0.001), symmetrical mammographic features (Tb = -0.474, P = 0.001) and overlapped density (Tb = 0.630, P < 0.001) had a moderate to strong correlation with the difficulty to make 'normal' decisions. Cases with regular ductal pattern (Tb = 0.447, P = 0.002), uniform density (Tb = 0.550, P < 0.001), non-dense breasts (Tb = 0.748, P < 0.001) and symmetrical mammographic features (Tb = 0.460, P = 0.001) were considered to be more suitable for single reading, whereas cases with overlapped density were not (Tb = -0.679, P < 0.001). CONCLUSION The findings suggest that perceived mammographic breast density has a major influence on the difficulty for readers to classify cases as normal and hence their suitability for single reading.
Affiliation(s)
- Zoey Zy Ang: Medical Imaging Optimisation and Perception Group (MIOPeG), Faculty of Health Sciences, Discipline of Medical Radiation Sciences, The University of Sydney, Lidcombe, New South Wales, Australia; National Healthcare Group Diagnostics (NHGD), Singapore City, Singapore
- Mohammad A Rawashdeh: Medical Imaging Optimisation and Perception Group (MIOPeG), Faculty of Health Sciences, Discipline of Medical Radiation Sciences, The University of Sydney, Lidcombe, New South Wales, Australia; Faculty of Applied Medical Sciences, Jordan University of Science and Technology, Irbid, Jordan
- Rob Heard: Health Systems and Global Populations Research Group, Faculty of Health Sciences, Discipline of Behavioural and Social Sciences in Health, The University of Sydney, Lidcombe, New South Wales, Australia
- Patrick C Brennan: Medical Imaging Optimisation and Perception Group (MIOPeG), Faculty of Health Sciences, Discipline of Medical Radiation Sciences, The University of Sydney, Lidcombe, New South Wales, Australia
- Warwick Lee: Medical Imaging Optimisation and Perception Group (MIOPeG), Faculty of Health Sciences, Discipline of Medical Radiation Sciences, The University of Sydney, Lidcombe, New South Wales, Australia
- Sarah J Lewis: Medical Imaging Optimisation and Perception Group (MIOPeG), Faculty of Health Sciences, Discipline of Medical Radiation Sciences, The University of Sydney, Lidcombe, New South Wales, Australia
|
10
|
Mohd Norsuddin N, Reed W, Mello-Thoms C, Lewis S. Understanding recall rates in screening mammography: A conceptual framework review of the literature. Radiography (Lond) 2015. [DOI: 10.1016/j.radi.2015.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
11
|
Performance indicators evaluation of the population-based breast cancer screening programme in Northern Portugal using the European Guidelines. Cancer Epidemiol 2015; 39:783-9. [PMID: 26315486 DOI: 10.1016/j.canep.2015.08.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/26/2015] [Revised: 08/08/2015] [Accepted: 08/12/2015] [Indexed: 11/23/2022]
Abstract
OBJECTIVE To evaluate the first 10 years of operation of the population-based breast cancer screening programme implemented in the Northern Region of Portugal, using selected recommended standard performance indicators. METHODS Data from women aged 50-69 screened with two-view mammography, biennially, in the period 2000-2009, were included. Main performance indicators were compared with the recommended levels of the European Guidelines. RESULTS A total of 202,039 screening examinations were performed, 71,731 (35.5%) in the initial screening and 130,308 (64.5%) in the subsequent screening. Coverage rate by examination reached 74.3% of the target population in the last period evaluated. Recall rates were 8.1% and 2.4% and cancer detection rates were 4.4/1000 and 2.9/1000, respectively, for initial and subsequent screenings. The breast cancer detection rate, expressed as a multiple of the background expected incidence, was 3.1 in the initial screen and 2.2 in the subsequent screen. The incidence of invasive interval cancers met the desirable recommended levels in both the first and second years since the last screening examination, in the initial and subsequent screenings. Invasive tumours <15 mm were 50.4% and 53.8% of the invasive cancers detected in initial and subsequent screenings. Less favourable size, grading and biomarker expression were found in interval cancers compared to screen-detected cancers. CONCLUSIONS The breast cancer screening programme in the Northern Region of Portugal was well accepted by the population. Most of the performance indicators were consistent with the desirable levels of the European Guidelines, which indicates an effective screening programme. Future research should verify the consistency of some of these results by using updated information from a larger population.
|
12
|
Roman M, Skaane P, Hofvind S. The cumulative risk of false-positive screening results across screening centres in the Norwegian Breast Cancer Screening Program. Eur J Radiol 2014; 83:1639-44. [PMID: 24972452 DOI: 10.1016/j.ejrad.2014.05.038] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/06/2014] [Revised: 05/20/2014] [Accepted: 05/26/2014] [Indexed: 11/29/2022]
Abstract
BACKGROUND Recall for assessment in mammographic screening entails an inevitable number of false-positive screening results. This study aimed to investigate the variation in the cumulative risk of a false-positive screening result and the positive predictive value across the screening centres in the Norwegian Breast Cancer Screening Program. METHODS We studied 618,636 women aged 50-69 years who underwent 2,090,575 screening exams (1996-2010). Recall rate, positive predictive value, rate of screen-detected cancer, and the cumulative risk of a false-positive screening result, without and with invasive procedures, across the screening centres were calculated. Generalized linear models were used to estimate the probability of a false-positive screening result and to compute the cumulative false-positive risk for up to ten biennial screening examinations. RESULTS The cumulative risk of a false-positive screening exam varied from 10.7% (95% CI: 9.4-12.0%) to 41.5% (95% CI: 34.1-48.9%) across screening centres, with a highest to lowest ratio of 3.9 (95% CI: 3.7-4.0). The highest to lowest ratio for the cumulative risk of undergoing an invasive procedure with a benign outcome was 4.3 (95% CI: 4.0-4.6). The positive predictive value of recall varied between 12.0% (95% CI: 11.0-12.9%) and 19.9% (95% CI: 18.3-21.5%), with a highest to lowest ratio of 1.7 (95% CI: 1.5-1.9). CONCLUSIONS A substantial variation in the performance measures across the screening centres in the Norwegian Breast Cancer Screening Program was identified, despite similar administration, procedures, and quality assurance requirements. Differences in the readers' performance probably contribute to the variability. These results underscore the importance of continuous surveillance of the screening centres and the radiologists in order to sustain and improve the performance and effectiveness of screening programs.
Affiliation(s)
- M Roman: Cancer Registry of Norway, Oslo, Norway; Department of Women and Children's Health, Oslo University Hospital, Oslo, Norway.
- P Skaane: Department of Radiology, Oslo University Hospital Ullevaal, University of Oslo, Oslo, Norway.
- S Hofvind: Cancer Registry of Norway, Oslo, Norway; Oslo and Akershus University College of Applied Sciences, Faculty of Health Science, Oslo, Norway.
|
13
|
Rawashdeh MA, Lee WB, Bourne RM, Ryan EA, Pietrzyk MW, Reed WM, Heard RC, Black DA, Brennan PC. Markers of Good Performance in Mammography Depend on Number of Annual Readings. Radiology 2013; 269:61-7. [DOI: 10.1148/radiol.13122581] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Indexed: 11/11/2022]
|
14
|
Ascunce N, Delfrade J, Salas D, Zubizarreta R, Ederra M. Programas de detección precoz de cáncer de mama en España: características y principales resultados [Breast cancer early detection programmes in Spain: characteristics and main results]. Med Clin (Barc) 2013; 141:13-23. [DOI: 10.1016/j.medcli.2012.03.030] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/21/2011] [Revised: 03/14/2012] [Accepted: 03/15/2012] [Indexed: 10/28/2022]
|
15
|
Situación de la investigación en el cribado de cáncer de mama en España: implicaciones para la prevención [Status of research on breast cancer screening in Spain: implications for prevention]. Gaceta Sanitaria 2012; 26:574-81. [DOI: 10.1016/j.gaceta.2011.11.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/02/2011] [Revised: 11/15/2011] [Accepted: 11/16/2011] [Indexed: 11/19/2022]
|
16
|
Hofvind S, Ponti A, Patnick J, Ascunce N, Njor S, Broeders M, Giordano L, Frigerio A, Törnberg S. False-Positive Results in Mammographic Screening for Breast Cancer in Europe: A Literature Review and Survey of Service Screening Programmes. J Med Screen 2012; 19 Suppl 1:57-66. [PMID: 22972811 DOI: 10.1258/jms.2012.012083] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Indexed: 11/18/2022]
Affiliation(s)
- Solveig Hofvind: Researcher, Department of Research, Cancer Registry of Norway, Oslo, Norway
- Antonio Ponti: Epidemiologist, Epidemiology Unit, CPO Piemonte, AOU S. Giovanni Battista, Turin, Italy
- Nieves Ascunce: Public Health Doctor, Navarra Breast Cancer Screening Programme. Spanish Cancer Screening Network, Public Health Institute, Pamplona, Spain
- Sisse Njor: Post Doc, Centre for Epidemiology and Screening, University of Copenhagen, Copenhagen, Denmark
- Mireille Broeders: Senior Epidemiologist, Department of Epidemiology, Biostatistics and HTA, Radboud University Nijmegen Medical Centre, and National Expert and Training Centre for Breast Cancer Screening, Nijmegen, The Netherlands
- Livia Giordano: MD MPH, Epidemiologist, Epidemiology Unit, CPO Piemonte, AOU S. Giovanni Battista, Turin, Italy
- Alfonso Frigerio: Radiologist, Regional Reference Centre for Breast Cancer Screening, AOU S. Giovanni Battista, Turin, Italy
- Sven Törnberg: Oncologist and Director, Cancer Screening Unit, Oncologic Centre S3:00, Karolinska University Hospital, Stockholm, Sweden
|
17
|
Point: Generalism vs Subspecialization—The ACR Should Encourage Radiologists to Structure Their Practices Around a Model of Subspecialization. J Am Coll Radiol 2012; 9:535-6. [DOI: 10.1016/j.jacr.2012.04.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/02/2012] [Accepted: 04/04/2012] [Indexed: 11/23/2022]
|
18
|
Ascunce N, Ederra M, Delfrade J, Baroja A, Erdozain N, Zubizarreta R, Salas D, Castells X. Impact of intermediate mammography assessment on the likelihood of false-positive results in breast cancer screening programmes. Eur Radiol 2011; 22:331-40. [DOI: 10.1007/s00330-011-2263-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/18/2011] [Revised: 08/25/2011] [Accepted: 08/29/2011] [Indexed: 12/01/2022]
|