1
Bojsen JA, Elhakim MT, Graumann O, Gaist D, Nielsen M, Harbo FSG, Krag CH, Sagar MV, Kruuse C, Boesen MP, Rasmussen BSB. Artificial intelligence for MRI stroke detection: a systematic review and meta-analysis. Insights Imaging 2024; 15:160. [PMID: 38913106 PMCID: PMC11196541 DOI: 10.1186/s13244-024-01723-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 05/23/2024] [Indexed: 06/25/2024]
Abstract
OBJECTIVES This systematic review and meta-analysis aimed to assess the stroke detection performance of artificial intelligence (AI) in magnetic resonance imaging (MRI) and to identify reporting insufficiencies.
METHODS PRISMA guidelines were followed. MEDLINE, Embase, Cochrane Central, and IEEE Xplore were searched for studies utilising MRI and AI for stroke detection. The protocol was prospectively registered with PROSPERO (CRD42021289748). Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve were the primary outcomes. Only studies using MRI in adults were included. The intervention was AI for stroke detection, with ischaemic and haemorrhagic stroke in separate categories. Manual labelling of any kind served as the comparator. A modified QUADAS-2 tool was used for bias assessment. The minimum information about clinical artificial intelligence modelling (MI-CLAIM) checklist was used to assess reporting insufficiencies. Meta-analyses were performed for sensitivity, specificity, and hierarchical summary ROC (HSROC) on studies with a low risk of bias.
RESULTS Thirty-three studies were eligible for inclusion. Fifteen studies had a low risk of bias. Studies with a low risk of bias reported MI-CLAIM items more completely. Only one study examined a CE-approved AI algorithm. Forest plots showed a detection sensitivity and specificity of 93% each, with identical performance in the HSROC analysis, and positive and negative likelihood ratios of 12.6 and 0.079, respectively.
CONCLUSION Current AI technology can detect ischaemic stroke in MRI. There is a need for further validation of haemorrhagic detection. The clinical usability of AI stroke detection in MRI is yet to be investigated.
CRITICAL RELEVANCE STATEMENT This first meta-analysis concludes that AI, utilising diffusion-weighted MRI sequences, can accurately aid the detection of ischaemic brain lesions and that its clinical utility is ready to be uncovered in clinical trials.
KEY POINTS
• There is a growing interest in AI solutions for detection aid.
• The performance is unknown for MRI stroke assessment.
• AI detection sensitivity and specificity were 93% and 93% for ischaemic lesions.
• There is limited evidence for the detection of patients with haemorrhagic lesions.
• AI can accurately detect patients with ischaemic stroke in MRI.
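The likelihood ratios quoted in the abstract above are derived from sensitivity and specificity. The sketch below shows the standard textbook relationship; the function name `likelihood_ratios` is hypothetical, and nothing here reproduces the authors' meta-analytic model. Plugging in the rounded 93% figures gives roughly 13.3 and 0.075 rather than the reported 12.6 and 0.079; a plausible explanation, which the abstract does not confirm, is that the reported ratios came from unrounded pooled estimates.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary diagnostic test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 < specificity < 1.0:
        # specificity of exactly 0 or 1 would divide by zero in one of the ratios
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values from the abstract (93% sensitivity, 93% specificity).
lr_pos, lr_neg = likelihood_ratios(0.93, 0.93)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")
```

A positive likelihood ratio above 10 and a negative one below 0.1 are conventionally read as strong evidence for ruling a condition in or out, which is consistent with the abstract's conclusion for ischaemic lesions.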
Affiliation(s)
- Jonas Asgaard Bojsen
  - Research and Innovation Unit of Radiology, Odense University Hospital, University of Southern Denmark, Odense, Denmark
- Mohammad Talal Elhakim
  - Research and Innovation Unit of Radiology, Odense University Hospital, University of Southern Denmark, Odense, Denmark
- Ole Graumann
  - Research Unit of Radiology, Aarhus University Hospital, Aarhus University, Aarhus, Denmark
- David Gaist
  - Research Unit for Neurology, Odense University Hospital, University of Southern Denmark, Odense, Denmark
- Mads Nielsen
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Frederik Severin Gråe Harbo
  - Research and Innovation Unit of Radiology, Odense University Hospital, University of Southern Denmark, Odense, Denmark
- Christian Hedeager Krag
  - Radiological AI Test Center, Copenhagen University Hospital-Bispebjerg, Frederiksberg, Herlev and Gentofte Hospital, Copenhagen, Denmark
  - Department of Radiology, Copenhagen University Hospital-Herlev and Gentofte, Copenhagen, Denmark
  - Institute of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Malini Vendela Sagar
  - Institute of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
  - Department of Neurology, Copenhagen University Hospital-Herlev and Gentofte, Copenhagen, Denmark
- Christina Kruuse
  - Institute of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
  - Department of Neurology, Copenhagen University Hospital-Rigshospitalet, Copenhagen, Denmark
- Mikael Ploug Boesen
  - Radiological AI Test Center, Copenhagen University Hospital-Bispebjerg, Frederiksberg, Herlev and Gentofte Hospital, Copenhagen, Denmark
  - Institute of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
  - Department of Radiology, Copenhagen University Hospital-Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Benjamin Schnack Brandt Rasmussen
  - Research and Innovation Unit of Radiology, Odense University Hospital, University of Southern Denmark, Odense, Denmark
  - Centre for Clinical Artificial Intelligence, Odense University Hospital, University of Southern Denmark, Odense, Denmark
2
Ewals LJS, Heesterbeek LJJ, Yu B, van der Wulp K, Mavroeidis D, Funk M, Snijders CCP, Jacobs I, Nederend J, Pluyter JR. The Impact of Expectation Management and Model Transparency on Radiologists' Trust and Utilization of AI Recommendations for Lung Nodule Assessment on Computed Tomography: Simulated Use Study. JMIR AI 2024; 3:e52211. [PMID: 38875574 PMCID: PMC11041414 DOI: 10.2196/52211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Revised: 11/14/2023] [Accepted: 02/03/2024] [Indexed: 06/16/2024]
Abstract
BACKGROUND Many promising artificial intelligence (AI) and computer-aided detection and diagnosis systems have been developed, but few have been successfully integrated into clinical practice. This is partially owing to a lack of user-centered design of AI-based computer-aided detection or diagnosis (AI-CAD) systems.
OBJECTIVE We aimed to assess the impact of different onboarding tutorials and levels of AI model explainability on radiologists' trust in AI and the use of AI recommendations in lung nodule assessment on computed tomography (CT) scans.
METHODS In total, 20 radiologists from 7 Dutch medical centers performed lung nodule assessment on CT scans under different conditions in a simulated use study as part of a 2×2 repeated-measures quasi-experimental design. Two types of AI onboarding tutorials (reflective vs informative) and 2 levels of AI output (black box vs explainable) were designed. The radiologists first received an onboarding tutorial that was either informative or reflective. Subsequently, each radiologist assessed 7 CT scans, first without AI recommendations. The AI recommendations were then shown, and the radiologist could adjust the initial assessment. Half of the participants received the recommendations via black box AI output and half received explainable AI output. Mental model and psychological trust were measured before onboarding, after onboarding, and after assessing the 7 CT scans. We recorded whether radiologists changed their assessment of the nodules found, malignancy prediction, and follow-up advice for each CT assessment. In addition, we analyzed whether radiologists' trust in their assessments had changed based on the AI recommendations.
RESULTS Both variations of onboarding tutorials resulted in a significantly improved mental model of the AI-CAD system (informative P=.01 and reflective P=.01). After using AI-CAD, psychological trust significantly decreased for the group with explainable AI output (P=.02). On the basis of the AI recommendations, radiologists changed the number of reported nodules in 27 of 140 assessments, malignancy prediction in 32 of 140 assessments, and follow-up advice in 12 of 140 assessments. The changes were mostly an increased number of reported nodules, a higher estimated probability of malignancy, and earlier follow-up. The radiologists' confidence in the nodules they found changed in 82 of 140 assessments, in their estimated probability of malignancy in 50 of 140 assessments, and in their follow-up advice in 28 of 140 assessments. These changes were predominantly increases in confidence. The number of changed assessments and radiologists' confidence did not significantly differ between the groups that received different onboarding tutorials and AI outputs.
CONCLUSIONS Onboarding tutorials help radiologists gain a better understanding of AI-CAD and facilitate the formation of a correct mental model. If AI explanations do not consistently substantiate the probability of malignancy across patient cases, radiologists' trust in the AI-CAD system can be impaired. Radiologists' confidence in their assessments was improved by using the AI recommendations.
Affiliation(s)
- Lotte J S Ewals
  - Catharina Cancer Institute, Catharina Hospital Eindhoven, Eindhoven, Netherlands
- Bin Yu
  - Research Center for Marketing and Supply Chain Management, Nyenrode Business University, Breukelen, Netherlands
- Kasper van der Wulp
  - Catharina Cancer Institute, Catharina Hospital Eindhoven, Eindhoven, Netherlands
- Mathias Funk
  - Department of Industrial Design, Eindhoven University of Technology, Eindhoven, Netherlands
- Chris C P Snijders
  - Department of Human Technology Interaction, Eindhoven University of Technology, Eindhoven, Netherlands
- Igor Jacobs
  - Department of Hospital Services and Informatics, Philips Research, Eindhoven, Netherlands
- Joost Nederend
  - Catharina Cancer Institute, Catharina Hospital Eindhoven, Eindhoven, Netherlands
- Jon R Pluyter
  - Department of Experience Design, Royal Philips, Eindhoven, Netherlands
3
Alike Y, Li C, Hou J, Zhou C, Long Y, Zhang Z, Zeng W, Zhang Y, Wang DM, Ye M, Yang R. A two-step neural network-based guiding system for obtaining reliable radiographs for critical shoulder angle measurement. Quant Imaging Med Surg 2024; 14:1406-1416. [PMID: 38415118 PMCID: PMC10895144 DOI: 10.21037/qims-23-610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/21/2023] [Indexed: 02/29/2024]
Abstract
Background: The critical shoulder angle (CSA) has been reported to be highly associated with rotator cuff tears (RCTs) and an increased risk of RCT re-tears. However, the measurement of the CSA is greatly affected by malpositioning of the shoulder. To address this issue, a two-step neural network-based guiding system was developed to obtain reliable CSA radiographs, and its feasibility and accuracy were evaluated.
Methods: A total of 1,754 shoulder anteroposterior (AP) radiographs were retrospectively acquired to train and validate a two-step neural network-based guiding system to obtain reliable CSA radiographs. The study included patients aged 18 years or older who underwent X-rays and/or computed tomography (CT) scans of the shoulder. Patients who had undergone shoulder surgery, had a confirmed fracture, or were diagnosed with a musculoskeletal tumor or glenoid defect were excluded from the study. The system consisted of a two-step neural network that, in the first step, localized the region of interest of the shoulder and, in the second step, classified the radiograph according to type [i.e., 'forward' when the non-overlapping coracoid process is above the glenoid rim, 'backward' when the non-overlapping coracoid process is below or aligned with the glenoid rim, a ratio of the transverse to longitudinal diameter of the glenoid projection (RTL) ≤0.25, or an RTL >0.25]. The performance of the model was assessed in an offline, prospective manner, focusing on the sensitivity and specificity for the forward, backward, RTL ≤0.25, and RTL >0.25 types (denoted SensF, SensB, Sens-, and Sens+ and SpecF, SpecB, Spec-, and Spec+, respectively); Cohen's kappa was also reported.
Results: Of 273 cases in the offline prospective test, the SensF, SensB, Sens-, and Sens+ were 88.88% [95% confidence interval (CI): 50.67-99.41%], 94.11% (95% CI: 82.77-98.47%), 96.96% (95% CI: 91.94-99.02%), and 95.06% (95% CI: 87.15-98.40%), respectively. The SpecF, SpecB, Spec-, and Spec+ were 98.48% (95% CI: 95.90-99.51%), 99.55% (95% CI: 97.12-99.97%), 95.04% (95% CI: 89.65-97.81%), and 97.39% (95% CI: 93.69-99.03%), respectively. A high classification rate (93.41%; 95% CI: 89.14-96.24%) and almost perfect agreement (Cohen's kappa: 0.903, 95% CI: 0.86-0.95) were achieved.
Conclusions: The guiding system can rapidly and accurately classify the types of AP shoulder radiographs, thereby guiding the adjustment of patient positioning. This will make it easier to rapidly obtain reliable AP radiographs on which to measure the CSA.
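The abstract above reports agreement as Cohen's kappa. As a minimal sketch of how that statistic is computed from a multi-class confusion matrix, the example below uses a hypothetical helper, `cohens_kappa`, and made-up counts for the four radiograph types; the study's actual confusion matrix is not given in the abstract, so the numbers here are illustrative only.

```python
def cohens_kappa(confusion: list[list[int]]) -> float:
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: prediction)."""
    size = len(confusion)
    if any(len(row) != size for row in confusion):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix must contain at least one case")

    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total

    # Chance agreement: product of the marginal proportions, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# HYPOTHETICAL counts, not data from the study.
# Class order: forward, backward, RTL <= 0.25, RTL > 0.25.
example_confusion = [
    [10, 1, 0, 0],
    [2, 20, 1, 0],
    [0, 1, 30, 2],
    [0, 0, 3, 25],
]
print(f"kappa = {cohens_kappa(example_confusion):.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is reported alongside the overall classification rate rather than instead of it.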
Affiliation(s)
- Yamuhanmode Alike
  - Department of Orthopedics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Cheng Li
  - Department of Orthopedics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Jingyi Hou
  - Department of Orthopedics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Chuanhai Zhou
  - Department of Orthopedics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Yi Long
  - Department of Orthopedics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Zongda Zhang
  - Department of Orthopedics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Weike Zeng
  - Department of Radiology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Yuanhao Zhang
  - School of Biomedical Sciences, Institute for Tissue Engineering and Regenerative Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Dan Michelle Wang
  - School of Biomedical Sciences, Institute for Tissue Engineering and Regenerative Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Mengjie Ye
  - Intelligent Engineering and Education Application Research Center, Zhuhai Campus of Beijing Normal University, Zhuhai, China
- Rui Yang
  - Department of Orthopedics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
4
Al-Bazzaz H, Janicijevic M, Strand F. Reader bias in breast cancer screening related to cancer prevalence and artificial intelligence decision support-a reader study. Eur Radiol 2024:10.1007/s00330-023-10514-5. [PMID: 38165430 DOI: 10.1007/s00330-023-10514-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/25/2023] [Revised: 09/23/2023] [Accepted: 11/01/2023] [Indexed: 01/03/2024]
Abstract
OBJECTIVES The aim of our study was to examine how breast radiologists would be affected by high cancer prevalence and the use of artificial intelligence (AI) for decision support.
MATERIALS AND METHODS This reader study was based on a selection of screening mammograms, including the original radiologist assessment, acquired from 2010 to 2013 at the Karolinska University Hospital, with a 1:1 ratio of cancer to healthy cases based on a 2-year follow-up. A commercial AI system generated an exam-level positive or negative read, and image markers. Double-reading and consensus discussions were first performed without AI and later with AI, with a 6-week wash-out period in between. The chi-squared test was used to test for differences in contingency tables.
RESULTS Mammograms of 758 women were included, half with cancer and half healthy. Of these women, 52% were aged 40-55 years and 48% were aged 56-75 years. In the original non-enriched screening setting, the sensitivity was 61% (232/379) at a specificity of 98% (373/379). In the reader study, the sensitivity without and with AI was 81% (307/379) and 75% (284/379), respectively (p < 0.001). The specificity without and with AI was 67% (255/379) and 86% (326/379), respectively (p < 0.001). The tendency to change assessment from positive to negative based on erroneous AI information differed between readers and was affected by the type and number of image signs of malignancy.
CONCLUSION Breast radiologists reading a list with high cancer prevalence performed at considerably higher sensitivity and lower specificity than the original screen-readers. Adding AI information, calibrated to a screening setting, decreased sensitivity and increased specificity.
CLINICAL RELEVANCE STATEMENT Radiologist screening mammography assessments will be biased towards higher sensitivity and lower specificity by high-risk triaging and nudged towards the sensitivity and specificity setting of AI reads. After AI implementation in clinical practice, there is reason to carefully follow screening metrics to ensure the impact is desired.
KEY POINTS
• Breast radiologists' sensitivity and specificity will be affected by changes brought by artificial intelligence.
• Reading in a high cancer prevalence setting markedly increased sensitivity and decreased specificity.
• Reviewing the binary reads by AI, negative or positive, biased screening radiologists towards the sensitivity and specificity of the AI system.
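The reader-study percentages in the abstract above follow directly from the quoted counts (379 cancer cases and 379 healthy cases). The sketch below recomputes them and expresses the shift with AI support in percentage points. The variable and function names are hypothetical, and no significance test is attempted because the abstract does not describe how the paired reads were tabulated.

```python
N_CANCER = 379   # women with cancer in the enriched reading list
N_HEALTHY = 379  # healthy women in the enriched reading list

# Counts quoted in the abstract: correctly flagged cancers (true positives)
# and correctly cleared healthy cases (true negatives) for each reading round.
reader_counts = {
    "without AI": {"true_positives": 307, "true_negatives": 255},
    "with AI": {"true_positives": 284, "true_negatives": 326},
}


def sensitivity_specificity(true_positives: int, true_negatives: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) as fractions for the fixed case mix above."""
    return true_positives / N_CANCER, true_negatives / N_HEALTHY


results = {
    label: sensitivity_specificity(**counts) for label, counts in reader_counts.items()
}

for label, (sens, spec) in results.items():
    print(f"{label}: sensitivity {sens:.1%}, specificity {spec:.1%}")

# Change attributable to adding the AI read, in percentage points.
sens_shift = (results["with AI"][0] - results["without AI"][0]) * 100
spec_shift = (results["with AI"][1] - results["without AI"][1]) * 100
print(f"shift with AI: sensitivity {sens_shift:+.1f} pp, specificity {spec_shift:+.1f} pp")
```

Seen this way, the trade-off the authors describe is explicit: the AI read moves the readers' operating point towards fewer detected cancers and fewer false alarms.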
Affiliation(s)
- Hanen Al-Bazzaz
  - Mälarsjukhuset Eskilstuna, Kungsvägen 42, 633 49, Eskilstuna, Sweden
- Fredrik Strand
  - Department of Oncology-Pathology, Karolinska Institutet, L2:03, Karolinska Vägen 8, 171 64, Solna, Sweden
  - Breast Radiology, Medical Diagnostics Karolinska, Karolinska University Hospital, NB1:03, Gävlegatan 55, 171 76, Stockholm, Sweden