1
Dratsch T, Chen X, Rezazade Mehrizi M, Kloeckner R, Mähringer-Kunz A, Püsken M, Baeßler B, Sauer S, Maintz D, Pinto Dos Santos D. Automation Bias in Mammography: The Impact of Artificial Intelligence BI-RADS Suggestions on Reader Performance. Radiology 2023; 307:e222176. [PMID: 37129490 DOI: 10.1148/radiol.222176] [Citation(s) in RCA: 62] [Impact Index Per Article: 62.0] [Indexed: 05/03/2023]
Abstract
Background: Automation bias (the propensity for humans to favor suggestions from automated decision-making systems) is a known source of error in human-machine interactions, but its implications regarding artificial intelligence (AI)-aided mammography reading are unknown.

Purpose: To determine how automation bias can affect inexperienced, moderately experienced, and very experienced radiologists when reading mammograms with the aid of an AI system.

Materials and Methods: In this prospective experiment, 27 radiologists read 50 mammograms and provided their Breast Imaging Reporting and Data System (BI-RADS) assessment assisted by a purported AI system. Mammograms were obtained between January 2017 and December 2019 and were presented in two randomized sets. The first was a training set of 10 mammograms, with the correct BI-RADS category suggested by the AI system. The second was a set of 40 mammograms in which an incorrect BI-RADS category was suggested for 12 mammograms. Reader performance, degree of bias in BI-RADS scoring, perceived accuracy of the AI system, and reader confidence in their own BI-RADS ratings were assessed using analysis of variance (ANOVA) and repeated-measures ANOVA followed by post hoc tests and Kruskal-Wallis tests followed by the Dunn post hoc test.

Results: The percentage of correctly rated mammograms by inexperienced (mean, 79.7% ± 11.7 [SD] vs 19.8% ± 14.0; P < .001; r = 0.93), moderately experienced (mean, 81.3% ± 10.1 vs 24.8% ± 11.6; P < .001; r = 0.96), and very experienced (mean, 82.3% ± 4.2 vs 45.5% ± 9.1; P = .003; r = 0.97) radiologists was significantly impacted by the correctness of the AI prediction of BI-RADS category. Inexperienced radiologists were significantly more likely to follow the suggestions of the purported AI when it incorrectly suggested a higher BI-RADS category than the actual ground truth compared with both moderately (mean degree of bias, 4.0 ± 1.8 vs 2.4 ± 1.5; P = .044; r = 0.46) and very (mean degree of bias, 4.0 ± 1.8 vs 1.2 ± 0.8; P = .009; r = 0.65) experienced readers.

Conclusion: The results show that inexperienced, moderately experienced, and very experienced radiologists reading mammograms are prone to automation bias when being supported by an AI-based system. This and other effects of human and machine interaction must be considered to ensure safe deployment and accurate diagnostic performance when combining human readers and AI.

© RSNA, 2023. Supplemental material is available for this article. See also the editorial by Baltzer in this issue.
Affiliation(s)
- Thomas Dratsch
- Xue Chen
- Mohammad Rezazade Mehrizi
- Roman Kloeckner
- Aline Mähringer-Kunz
- Michael Püsken
- Bettina Baeßler
- Stephanie Sauer
- David Maintz
- Daniel Pinto Dos Santos

From the Institute of Diagnostic and Interventional Radiology, University of Cologne, Faculty of Medicine and University Hospital Cologne, Kerpener Str 62, 50937 Cologne, Germany (T.D., X.C., M.P., D.M., D.P.d.S.); School of Business and Economics, Knowledge, Information and Innovation, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands (M.R.M.); Institute of Interventional Radiology, University Clinic Schleswig-Holstein, Kiel, Germany (R.K.); Department of Diagnostic and Interventional Radiology, University Medical Centre of the Johannes Gutenberg-University Mainz, Mainz, Germany (A.M.K.); and Institute of Diagnostic and Interventional Radiology, University Clinic Würzburg, Würzburg, Germany (B.B., S.S.)
2
Larsen M, Aglen CF, Hoff SR, Lund-Hanssen H, Hofvind S. Possible strategies for use of artificial intelligence in screen-reading of mammograms, based on retrospective data from 122,969 screening examinations. Eur Radiol 2022; 32:8238-8246. [PMID: 35704111 DOI: 10.1007/s00330-022-08909-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 01/25/2022] [Revised: 05/23/2022] [Accepted: 05/25/2022] [Indexed: 12/09/2022]
Abstract
OBJECTIVES: Artificial intelligence (AI) has shown promising results when used on retrospective data from mammographic screening. However, few studies have explored the possible consequences of different strategies for combining AI and radiologists in screen-reading.

METHODS: A total of 122,969 digital screening examinations performed between 2009 and 2018 in BreastScreen Norway were retrospectively processed by an AI system, which scored the examinations from 1 to 10; 1 indicated low suspicion of malignancy and 10 high suspicion. Results were merged with information about screening outcome and used to explore consensus, recall, and cancer detection for 11 different scenarios of combining AI and radiologists.

RESULTS: Recall was 3.2%, screen-detected cancer 0.61% and interval cancer 0.17% after independent double reading and served as reference values. In a scenario where examinations with AI scores 1-5 were considered negative and 6-10 resulted in standard independent double reading, the estimated recall was 2.6% and screen-detected cancer 0.60%. When scores 1-9 were considered negative and score 10 double read, recall was 1.2% and screen-detected cancer 0.53%. In these two scenarios, potential rates of screen-detected cancer could be up to 0.63% and 0.56%, if the interval cancers selected for consensus were detected at screening. In the former scenario, screen-reading volume would be reduced by 50%, while the latter would reduce the volume by 90%.

CONCLUSION: Several theoretical scenarios with AI and radiologists have the potential to reduce the volume in screen-reading without affecting cancer detection substantially. Possible influence on recall and interval cancers must be evaluated in prospective studies.

KEY POINTS:
• Different scenarios using artificial intelligence in combination with radiologists could reduce the screen-reading volume by 50% and result in a rate of screen-detected cancer ranging from 0.59% to 0.60%, compared to 0.61% after standard independent double reading.
• The use of artificial intelligence in combination with radiologists has the potential to identify negative screening examinations with high precision in mammographic screening and to reduce the rate of interval cancer.
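The score-threshold scenarios in this abstract can be illustrated with a small sketch. It is not the authors' analysis code: the function name `triage_summary`, its parameter `negative_max`, and the synthetic score list are hypothetical, and the even spread of examinations across scores 1-10 is an assumption made only so that the arithmetic reproduces the 50% and 90% volume reductions quoted above.

```python
def triage_summary(ai_scores, negative_max):
    """Split examinations by AI score (1-10).

    Scores <= negative_max are treated as negative without human reading;
    all other examinations go to standard independent double reading.
    """
    if not 1 <= negative_max <= 10:
        raise ValueError("negative_max must be between 1 and 10")
    total = len(ai_scores)
    if total == 0:
        raise ValueError("ai_scores must not be empty")
    auto_negative = sum(1 for s in ai_scores if s <= negative_max)
    return {
        "auto_negative": auto_negative,
        "double_read": total - auto_negative,
        # Share of examinations radiologists no longer need to read
        "volume_reduction": auto_negative / total,
    }


# Synthetic data (assumption): 1,000 examinations, 100 per AI score.
scores = [s for s in range(1, 11) for _ in range(100)]

print(triage_summary(scores, 5)["volume_reduction"])  # 0.5
print(triage_summary(scores, 9)["volume_reduction"])  # 0.9
```

The sketch covers only reading volume. Recall and cancer-detection rates depend on screening outcomes per examination, which the abstract reports but which cannot be derived from the scores alone.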
Affiliation(s)
- Marthe Larsen
  - Section for Breast Cancer Screening, Cancer Registry of Norway, Oslo, Norway
- Camilla F Aglen
  - Section for Breast Cancer Screening, Cancer Registry of Norway, Oslo, Norway
- Solveig R Hoff
  - Department of Radiology, Ålesund Hospital, Møre og Romsdal Hospital Trust, Ålesund, Norway; Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway
- Håkon Lund-Hanssen
  - Department of Radiology and Nuclear Medicine, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
- Solveig Hofvind
  - Section for Breast Cancer Screening, Cancer Registry of Norway, Oslo, Norway; Department of Health and Care Sciences, Faculty of Health Sciences, The Arctic University of Norway, Tromsø, Norway