1. Siepmann R, Huppertz M, Rastkhiz A, Reen M, Corban E, Schmidt C, Wilke S, Schad P, Yüksel C, Kuhl C, Truhn D, Nebelung S. The virtual reference radiologist: comprehensive AI assistance for clinical image reading and interpretation. Eur Radiol 2024; 34:6652-6666. PMID: 38627289; PMCID: PMC11399201; DOI: 10.1007/s00330-024-10727-2.
Abstract
OBJECTIVES Large language models (LLMs) have shown potential in radiology, but their ability to aid radiologists in interpreting imaging studies remains unexplored. We investigated the effects of a state-of-the-art LLM (GPT-4) on the radiologists' diagnostic workflow. MATERIALS AND METHODS In this retrospective study, six radiologists of different experience levels read 40 selected radiographic (n = 10), CT (n = 10), MRI (n = 10), and angiographic (n = 10) studies unassisted (session one) and assisted by GPT-4 (session two). Each imaging study was presented with demographic data, the chief complaint, and associated symptoms, and diagnoses were registered using an online survey tool. The impact of artificial intelligence (AI) on diagnostic accuracy, confidence, user experience, input prompts, and generated responses was assessed. False information was registered. Linear mixed-effects models were used to quantify the factors (fixed: experience, modality, AI assistance; random: radiologist) influencing diagnostic accuracy and confidence. RESULTS When assessing whether the correct diagnosis was among the top-3 differential diagnoses, diagnostic accuracy improved slightly from 181/240 (75.4%, unassisted) to 188/240 (78.3%, AI-assisted). Similar improvements were found when only the top differential diagnosis was considered. AI assistance was used in 77.5% of the readings. Three hundred nine prompts were generated, primarily involving differential diagnoses (59.1%) and imaging features of specific conditions (27.5%). Diagnostic confidence was significantly higher when readings were AI-assisted (p < 0.001). Twenty-three responses (7.4%) were classified as hallucinations, while two (0.6%) were misinterpretations. CONCLUSION Integrating GPT-4 in the diagnostic process improved diagnostic accuracy slightly and diagnostic confidence significantly. Potentially harmful hallucinations and misinterpretations call for caution and highlight the need for further safeguarding measures.
CLINICAL RELEVANCE STATEMENT Using GPT-4 as a virtual assistant when reading images made six radiologists of different experience levels feel more confident and provide more accurate diagnoses; yet, GPT-4 gave factually incorrect and potentially harmful information in 7.4% of its responses.
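The study's headline metric, whether the correct diagnosis appears among a reader's top-3 differential diagnoses, reduces to a simple top-k accuracy over the case set. A minimal sketch with invented readings (the study's actual case data are not public):

```python
def topk_accuracy(readings, k=3):
    """Fraction of cases whose ground-truth diagnosis appears among
    the reader's top-k differential diagnoses."""
    hits = sum(1 for truth, differentials in readings
               if truth in differentials[:k])
    return hits / len(readings)

# Hypothetical (truth, ranked differentials) pairs for illustration
readings = [
    ("pneumothorax", ["pneumothorax", "bullae", "pneumonia"]),   # hit at rank 1
    ("meningioma",   ["glioma", "metastasis", "meningioma"]),    # hit at rank 3
    ("appendicitis", ["diverticulitis", "colitis", "ileitis"]),  # miss
]
print(topk_accuracy(readings, k=3))  # 2 of 3 cases correct
```

Evaluating the same readings with k=1 recovers the stricter "top differential only" variant the abstract also reports.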
Affiliation(s)
- Robert Siepmann, Marc Huppertz, Annika Rastkhiz, Matthias Reen, Eric Corban, Christian Schmidt, Stephan Wilke, Philipp Schad, Can Yüksel, Christiane Kuhl, Daniel Truhn, Sven Nebelung: Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
2. Fu H, Novak A, Robert D, Kumar S, Tanamala S, Oke J, Bhatia K, Shah R, Romsauerova A, Das T, Espinosa A, Grzeda MT, Narbone M, Dharmadhikari R, Harrison M, Vimalesvaran K, Gooch J, Woznitza N, Salik N, Campbell A, Khan F, Lowe DJ, Shuaib H, Ather S. AI assisted reader evaluation in acute CT head interpretation (AI-REACT): protocol for a multireader multicase study. BMJ Open 2024; 14:e079824. PMID: 38346874; PMCID: PMC10862304; DOI: 10.1136/bmjopen-2023-079824.
Abstract
INTRODUCTION A non-contrast CT head scan (NCCTH) is the most common cross-sectional imaging investigation requested in the emergency department. Advances in computer vision have led to the development of several artificial intelligence (AI) tools to detect abnormalities on NCCTH. These tools are intended to provide clinical decision support for clinicians, rather than stand-alone diagnostic devices. However, validation studies mostly compare AI performance against radiologists, and there is a relative paucity of evidence on the impact of AI assistance on other healthcare staff who review NCCTH in their daily clinical practice. METHODS AND ANALYSIS A retrospective data set of 150 NCCTH will be compiled, to include 60 control cases and 90 cases with intracranial haemorrhage, hypodensities suggestive of infarct, midline shift, mass effect or skull fracture. The intracranial haemorrhage cases will be subclassified into extradural, subdural, subarachnoid, intraparenchymal and intraventricular. Thirty readers will be recruited across four National Health Service (NHS) trusts, including 10 general radiologists, 15 emergency medicine clinicians and 5 CT radiographers of varying experience. Readers will interpret each scan first without, then with, the assistance of the qER EU 2.0 AI tool, with an intervening 2-week washout period. Using a panel of neuroradiologists as ground truth, the stand-alone performance of qER will be assessed, and its impact on the readers' performance will be analysed as change in accuracy (area under the curve), median review time per scan and self-reported diagnostic confidence. Subgroup analyses will be performed by reader professional group, reader seniority, pathological finding, and neuroradiologist-rated difficulty. ETHICS AND DISSEMINATION The study has been approved by the UK Healthcare Research Authority (IRAS 310995, approved 13 December 2022). The use of anonymised retrospective NCCTH has been authorised by Oxford University Hospitals.
The results will be presented at relevant conferences and published in a peer-reviewed journal. TRIAL REGISTRATION NUMBER NCT06018545.
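The protocol's primary performance endpoint is the change in reader accuracy expressed as area under the curve. As a minimal sketch (with invented suspicion scores, not study data), AUC can be computed directly as the Mann-Whitney probability that a randomly chosen abnormal scan receives a higher suspicion score than a normal one:

```python
def auc(abnormal_scores, normal_scores):
    """AUC as the probability that an abnormal case outranks a
    normal case; ties count as half a win."""
    wins = sum((a > n) + 0.5 * (a == n)
               for a in abnormal_scores for n in normal_scores)
    return wins / (len(abnormal_scores) * len(normal_scores))

# Hypothetical reader suspicion scores, unaided vs AI-assisted
unaided = auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.2])
aided   = auc([0.9, 0.8, 0.7], [0.3, 0.5, 0.2])
print(round(aided - unaided, 3))  # per-reader change in AUC
```

In a multireader multicase design such as AI-REACT, per-reader AUC deltas like this are then pooled across readers and cases (e.g., with the Obuchowski-Rockette or similar MRMC analysis) rather than compared directly.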
Affiliation(s)
- Howell Fu: Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Alex Novak: Emergency Medicine Research Oxford, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Jason Oke: Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Kanika Bhatia: Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Ruchir Shah: Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Tilak Das: Department of Clinical Radiology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Abdalá Espinosa: Emergency Medicine Research Oxford, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Mark Harrison: Emergency Department, Northumbria Specialist Emergency Care Hospital, Cramlington, UK
- Kavitha Vimalesvaran: Clinical Scientific Computing, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Jane Gooch: College of Health, Psychology & Social Care, University of Derby, Derby, UK
- Nicholas Woznitza: Radiology Department, University College London Hospitals NHS Foundation Trust, London, UK; School of Allied and Public Health Professions, Canterbury Christ Church University, Canterbury, UK
- Alan Campbell: Radiology Department, University College London Hospitals NHS Foundation Trust, London, UK
- Farhaan Khan: Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Haris Shuaib: Clinical Scientific Computing, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Sarim Ather: Oxford University Hospitals NHS Foundation Trust, Oxford, UK
3. Schlaeger S, Shit S, Eichinger P, Hamann M, Opfer R, Krüger J, Dieckmeyer M, Schön S, Mühlau M, Zimmer C, Kirschke JS, Wiestler B, Hedderich DM. AI-based detection of contrast-enhancing MRI lesions in patients with multiple sclerosis. Insights Imaging 2023; 14:123. PMID: 37454342; DOI: 10.1186/s13244-023-01460-3.
Abstract
BACKGROUND Contrast-enhancing (CE) lesions are an important finding on brain magnetic resonance imaging (MRI) in patients with multiple sclerosis (MS) but can be missed easily. Automated solutions for reliable CE lesion detection are emerging; however, independent validation of artificial intelligence (AI) tools in the clinical routine is still rare. METHODS A three-dimensional convolutional neural network for CE lesion segmentation was trained externally on 1488 datasets of 934 MS patients from 81 scanners using concatenated information from FLAIR and T1-weighted post-contrast imaging. This externally trained model was tested on an independent dataset comprising 504 T1-weighted post-contrast and FLAIR image datasets of MS patients from clinical routine. Two neuroradiologists (R1, R2) labeled CE lesions for gold standard definition in the clinical test dataset. The algorithmic output was evaluated at both the patient and lesion level. RESULTS At the patient level, recall, specificity, precision, and accuracy of the AI tool to predict patients with CE lesions were 0.75, 0.99, 0.91, and 0.96, respectively. The agreement between the AI tool and both readers was within the range of inter-rater agreement (Cohen's kappa; AI vs. R1: 0.69; AI vs. R2: 0.76; R1 vs. R2: 0.76). At the lesion level, false negative lesions were predominantly found in infratentorial locations and were significantly smaller and of lower contrast than true positive lesions (p < 0.05). CONCLUSIONS AI-based identification of CE lesions on brain MRI is feasible, approaching human reader performance in independent clinical data, and might be of help as a second reader in the neuroradiological assessment of active inflammation in MS patients. CRITICAL RELEVANCE STATEMENT AI-based detection of contrast-enhancing multiple sclerosis lesions approaches human reader performance, but careful visual inspection is still needed, especially for infratentorial, small, and low-contrast lesions.
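The patient-level metrics reported here (recall, specificity, precision, accuracy) and the Cohen's kappa agreement all derive from simple counts. A self-contained sketch using hypothetical counts and labels, not the study's actual confusion matrix:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Recall, specificity, precision, and accuracy from a 2x2 confusion matrix."""
    return (tp / (tp + fn),                   # recall (sensitivity)
            tn / (tn + fp),                   # specificity
            tp / (tp + fp),                   # precision (PPV)
            (tp + tn) / (tp + fp + tn + fn))  # accuracy

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters' label lists."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    labels = set(r1) | set(r2)
    p_exp = sum((r1.count(l) / n) * (r2.count(l) / n) for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

print(confusion_metrics(tp=9, fp=1, tn=90, fn=3))
print(cohens_kappa(["CE", "no", "CE", "no"], ["CE", "no", "no", "no"]))
```

Reporting the AI-vs-reader kappa alongside the reader-vs-reader kappa, as the study does, shows whether the tool's disagreements with a reader exceed ordinary inter-rater variability.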
Affiliation(s)
- Sarah Schlaeger, Suprosanna Shit, Paul Eichinger, Claus Zimmer, Jan S Kirschke, Benedikt Wiestler, Dennis M Hedderich: Department of Diagnostic and Interventional Neuroradiology, School of Medicine, Klinikum rechts der Isar, Technical University of Munich, Ismaninger Str. 22, 81675 Munich, Germany
- Michael Dieckmeyer: Department of Diagnostic and Interventional Neuroradiology, School of Medicine, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany; Department of Diagnostic, Interventional and Pediatric Radiology, Inselspital, University Hospital, University of Bern, Bern, Switzerland
- Simon Schön: Department of Diagnostic and Interventional Neuroradiology, School of Medicine, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany; DIE RADIOLOGIE, Munich, Germany
- Mark Mühlau: Department of Neurology, School of Medicine, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
4. Dratsch T, Chen X, Rezazade Mehrizi M, Kloeckner R, Mähringer-Kunz A, Püsken M, Baeßler B, Sauer S, Maintz D, Pinto Dos Santos D. Automation Bias in Mammography: The Impact of Artificial Intelligence BI-RADS Suggestions on Reader Performance. Radiology 2023; 307:e222176. PMID: 37129490; DOI: 10.1148/radiol.222176.
Abstract
Background Automation bias (the propensity for humans to favor suggestions from automated decision-making systems) is a known source of error in human-machine interactions, but its implications regarding artificial intelligence (AI)-aided mammography reading are unknown. Purpose To determine how automation bias can affect inexperienced, moderately experienced, and very experienced radiologists when reading mammograms with the aid of an artificial intelligence (AI) system. Materials and Methods In this prospective experiment, 27 radiologists read 50 mammograms and provided their Breast Imaging Reporting and Data System (BI-RADS) assessment assisted by a purported AI system. Mammograms were obtained between January 2017 and December 2019 and were presented in two randomized sets. The first was a training set of 10 mammograms, with the correct BI-RADS category suggested by the AI system. The second was a set of 40 mammograms in which an incorrect BI-RADS category was suggested for 12 mammograms. Reader performance, degree of bias in BI-RADS scoring, perceived accuracy of the AI system, and reader confidence in their own BI-RADS ratings were assessed using analysis of variance (ANOVA) and repeated-measures ANOVA followed by post hoc tests and Kruskal-Wallis tests followed by the Dunn post hoc test. Results The percentage of correctly rated mammograms by inexperienced (mean, 79.7% ± 11.7 [SD] vs 19.8% ± 14.0; P < .001; r = 0.93), moderately experienced (mean, 81.3% ± 10.1 vs 24.8% ± 11.6; P < .001; r = 0.96), and very experienced (mean, 82.3% ± 4.2 vs 45.5% ± 9.1; P = .003; r = 0.97) radiologists was significantly impacted by the correctness of the AI prediction of BI-RADS category. 
Inexperienced radiologists were significantly more likely to follow the suggestions of the purported AI when it incorrectly suggested a higher BI-RADS category than the actual ground truth compared with both moderately (mean degree of bias, 4.0 ± 1.8 vs 2.4 ± 1.5; P = .044; r = 0.46) and very (mean degree of bias, 4.0 ± 1.8 vs 1.2 ± 0.8; P = .009; r = 0.65) experienced readers. Conclusion The results show that inexperienced, moderately experienced, and very experienced radiologists reading mammograms are prone to automation bias when supported by an AI-based system. This and other effects of human and machine interaction must be considered to ensure safe deployment and accurate diagnostic performance when combining human readers and AI. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Baltzer in this issue.
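Automation bias of this kind can be quantified from the cases where the (purported) AI suggestion was wrong: how often the reader followed it, and how far the reader's rating drifted from ground truth. A sketch with invented BI-RADS data; the "degree of bias" here is taken as the reader's absolute deviation from ground truth on incorrectly-suggested cases, which is an assumption and may differ from the study's exact operationalisation:

```python
# Each tuple: (ground-truth BI-RADS, AI suggestion, reader's rating)
cases = [
    (2, 4, 4),  # wrong suggestion, reader followed it
    (3, 5, 4),  # wrong suggestion, reader partially resisted
    (1, 1, 1),  # correct suggestion, excluded from the bias analysis
    (2, 5, 5),  # wrong suggestion, reader followed it
]

# Restrict to cases where the AI suggestion disagreed with ground truth
wrong = [(t, ai, r) for t, ai, r in cases if ai != t]

# Fraction of wrong suggestions the reader adopted verbatim
followed = sum(r == ai for t, ai, r in wrong) / len(wrong)

# Mean absolute deviation of the reader's rating from ground truth
bias = sum(abs(r - t) for t, ai, r in wrong) / len(wrong)

print(followed, bias)
```

Comparing these quantities across experience strata, as the study does with ANOVA and post hoc tests, is what exposes the stronger pull of wrong suggestions on inexperienced readers.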
Affiliation(s)
- From the Institute of Diagnostic and Interventional Radiology, University of Cologne, Faculty of Medicine and University Hospital Cologne, Kerpener Str 62, 50937 Cologne, Germany (T.D., X.C., M.P., D.M., D.P.d.S.); School of Business and Economics, Knowledge, Information and Innovation, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands (M.R.M.); Institute of Interventional Radiology, University Clinic Schleswig-Holstein, Kiel, Germany (R.K.); Department of Diagnostic and Interventional Radiology, University Medical Centre of the Johannes Gutenberg-University Mainz, Mainz, Germany (A.M.K.); and Institute of Diagnostic and Interventional Radiology, University Clinic Würzburg, Würzburg, Germany (B.B., S.S.)