1
Singer C, Saban M, Luxenburg O, Yellin LB, Hierath M, Sosna J, Karoussou-Schreiner A, Brkljačić B. Computed tomography referral guidelines adherence in Europe: insights from a seven-country audit. Eur Radiol 2024. PMID: 39384590; DOI: 10.1007/s00330-024-11083-x. Received 06/01/2024; revised 07/21/2024; accepted 09/09/2024.
Abstract
BACKGROUND Ensuring appropriate computed tomography (CT) utilization optimizes patient care while minimizing radiation exposure. Decision support tools show promise for standardizing appropriateness. OBJECTIVES We aimed to assess CT appropriateness rates using the European Society of Radiology (ESR) iGuide criteria across seven European countries, to identify factors associated with variability in appropriateness, and to examine recommended alternative exams. METHODS As part of the European Commission-funded EU-JUST-CT project, 6734 anonymized CT referrals were audited across 125 centers in Belgium, Denmark, Estonia, Finland, Greece, Hungary, and Slovenia. In each country, two blinded radiologists independently scored each exam's appropriateness using the ESR iGuide and noted any recommended alternatives based on the presented indications; disagreements between auditors were resolved by arbitration. Associations between appropriateness rate and institution type, patient age and sex, inpatient/outpatient status, anatomical area, and referring physician's specialty were examined statistically within each country. RESULTS The average appropriateness rate was 75%, ranging from 58% in Greece to 86% in Denmark. Higher rates were associated with public hospitals, inpatient settings, and referrals from specialists. Appropriateness varied by country, anatomical area, patient age, and sex. The most commonly recommended alternative exams were magnetic resonance imaging, X-ray, and ultrasound. CONCLUSION This multi-country evaluation found that even with a standardized imaging guideline, significant variation in CT appropriateness persists, ranging from 58% to 86% across the participating countries. The study provides valuable insights into real-world utilization patterns and identifies opportunities to optimize practices and reduce clinical and demographic disparities in CT use.
KEY POINTS
Question: The largest multinational study of its kind (7 EU countries, 6734 CT referrals) assessed real-world CT appropriateness using the ESR iGuide, enabling cross-system comparisons.
Findings: Significant variability in appropriateness rates across institution type, patient status, age, sex, exam area, and physician specialty highlighted opportunities to optimize practices based on local factors.
Clinical relevance: International collaboration on imaging guidelines and decision support can maximize CT benefits while optimizing radiation exposure; ongoing research is crucial for refining evidence-based guidelines globally.
Affiliation(s)
- Clara Singer
- The Gertner Institute for Epidemiology and Health Policy Research, Chaim Sheba Medical Center, Tel Hashomer, Ramat-Gan, Israel
- Mor Saban
- The Gertner Institute for Epidemiology and Health Policy Research, Chaim Sheba Medical Center, Tel Hashomer, Ramat-Gan, Israel
- Nursing Department, School of Health Sciences, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Osnat Luxenburg
- Medical Technology, Health Information and Research Directorate, Ministry of Health, Jerusalem, Israel
- Lucia Bergovoy Yellin
- The Gertner Institute for Epidemiology and Health Policy Research, Chaim Sheba Medical Center, Tel Hashomer, Ramat-Gan, Israel
- Jacob Sosna
- Department of Radiology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Boris Brkljačić
- Department of Radiology, University Hospital Dubrava, School of Medicine, University of Zagreb, Zagreb, Croatia
2
Sarangi PK, Datta S, Swarup MS, Panda S, Nayak DSK, Malik A, Datta A, Mondal H. Radiologic Decision-Making for Imaging in Pulmonary Embolism: Accuracy and Reliability of Large Language Models-Bing, Claude, ChatGPT, and Perplexity. Indian J Radiol Imaging 2024;34:653-660. PMID: 39318561; PMCID: PMC11419749; DOI: 10.1055/s-0044-1787974.
Abstract
Background Artificial intelligence (AI) chatbots have demonstrated potential to enhance clinical decision-making and streamline healthcare workflows, potentially alleviating administrative burdens. However, the contribution of AI chatbots to radiologic decision-making for clinical scenarios remains insufficiently explored. This study evaluated the accuracy and reliability of four prominent large language models (LLMs) - Microsoft Bing, Claude, ChatGPT 3.5, and Perplexity - in offering clinical decision support for initial imaging in suspected pulmonary embolism (PE). Methods Open-ended (OE) and select-all-that-apply (SATA) questions were crafted covering four variant case scenarios of PE, in line with the American College of Radiology Appropriateness Criteria. These questions were presented to the LLMs by three radiologists from diverse geographical regions and practice settings. Responses were evaluated against established scoring criteria, with a maximum achievable score of 2 points per OE response and 1 point for each correct answer in SATA questions. To enable comparative analysis, scores were normalized (score divided by the maximum achievable score). Results In OE questions, Perplexity achieved the highest accuracy (0.83), while Claude had the lowest (0.58); Bing and ChatGPT each scored 0.75. For SATA questions, Bing led with an accuracy of 0.96, Perplexity was lowest at 0.56, and Claude and ChatGPT each scored 0.6. Overall, OE questions yielded higher scores (0.73) than SATA questions (0.68). There was poor agreement among the radiologists' scores for OE questions (intraclass correlation coefficient [ICC] = -0.067, p = 0.54) but strong agreement for SATA questions (ICC = 0.875, p < 0.001). Conclusion The study revealed variations in accuracy across LLMs for both OE and SATA questions: Perplexity performed best on OE questions, Bing excelled on SATA questions, and OE queries yielded better overall results.
The current inconsistencies in LLM accuracy highlight the need for further refinement before these tools can be reliably integrated into clinical practice; additional fine-tuning and judicious model selection by radiologists will be required to achieve consistent, reliable support for decision-making.
Affiliation(s)
- Pradosh Kumar Sarangi
- Department of Radiodiagnosis, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India
- Suvrankar Datta
- Department of Radiodiagnosis, All India Institute of Medical Sciences New Delhi, New Delhi, India
- M. Sarthak Swarup
- Department of Radiodiagnosis, Vardhman Mahavir Medical College and Safdarjung Hospital New Delhi, New Delhi, India
- Swaha Panda
- Department of Otorhinolaryngology and Head and Neck Surgery, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India
- Debasish Swapnesh Kumar Nayak
- Department of Computer Science and Engineering, SOET, Centurion University of Technology and Management, Bhubaneswar, Odisha, India
- Archana Malik
- Department of Pulmonary Medicine, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India
- Ananda Datta
- Department of Pulmonary Medicine, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India
- Himel Mondal
- Department of Physiology, All India Institute of Medical Sciences Deoghar, Deoghar, Jharkhand, India
3
Sosna J. Insights from a web-based questionnaire: examining diagnostic procedures prior to magnetic resonance imaging. Isr J Health Policy Res 2024;13:49. PMID: 39294783; PMCID: PMC11409486; DOI: 10.1186/s13584-024-00636-6. Received 07/29/2024; accepted 09/06/2024.
Abstract
The appropriate use of diagnostic imaging, particularly MRI, is a critical concern in modern healthcare. This paper examines the current state of MRI utilization in Israel, drawing on a recent study by Kaim et al. that surveyed 557 Israeli adults who underwent MRI in the public health system. The study revealed that 60% of participants had undergone other imaging tests before their MRI, with 23% having had more than one prior examination. While these findings highlight potential inefficiencies in the diagnostic pathway, they also underscore the complexity of medical decision-making in imaging. The paper discusses various factors influencing MRI utilization, including regulatory pressures, healthcare system structure, and the need for evidence-based guidelines. It explores potential strategies for optimizing MRI justification and scheduling, such as implementing clinical decision support systems, enhancing interdisciplinary communication, and leveraging artificial intelligence (AI) for predictive analytics and resource optimization. Finally, it presents the need for comprehensive research into MRI justification and scheduling optimization; key areas for investigation include the effectiveness of decision support tools, patient outcomes, economic analyses, and the application of quality improvement methodologies.
Affiliation(s)
- Jacob Sosna
- Department of Radiology, Faculty of Medicine, Hadassah Medical Center, Hebrew University of Jerusalem, Jerusalem, 91120, Israel.
4
Luxenburg O, Vaknin S, Wilf-Miron R, Saban M. Evaluating the Accuracy and Impact of the ESR-iGuide Decision Support Tool in Optimizing CT Imaging Referral Appropriateness. Journal of Imaging Informatics in Medicine 2024. PMID: 39028357; DOI: 10.1007/s10278-024-01197-5. Received 05/12/2024; revised 07/01/2024; accepted 07/02/2024.
Abstract
Radiology referral quality impacts patient care, yet the factors influencing quality are poorly understood. This study assessed the quality of computed tomography (CT) referrals, identified associated characteristics, and evaluated the ESR-iGuide clinical decision support tool's ability to optimize referrals. A retrospective review analyzed 300 consecutive CT referrals from an acute care hospital. Referral quality was evaluated on a 5-point scale by three expert reviewers (inter-rater reliability κ = 0.763-0.97). The ESR-iGuide tool provided appropriateness scores and estimated radiation exposure levels for both the actual referred exams and the recommended exams, and scores were compared between the two. Associations between ESR-iGuide scores and referral characteristics, including the specialty of the ordering physician (surgical vs. non-surgical), were explored. Of the referrals, 67.1% were rated as appropriate. The most common exams were head and abdomen/pelvis CTs. The ESR-iGuide deemed 70% of the actual referrals "usually appropriate" and found that the recommended exams had lower estimated radiation exposure than the actual exams. Logistic regression analysis showed that non-surgical physicians were more likely than surgical physicians to order inappropriate exams. Over one-third of the referrals showed suboptimal quality in the unstructured referral system. The ESR-iGuide clinical decision support tool identified opportunities to optimize appropriateness and reduce radiation exposure. Implementation of such a tool warrants consideration to improve communication and maximize the quality of patient care.
Affiliation(s)
- Osnat Luxenburg
- Medical Technology, Health Information and Research Directorate, Ministry of Health, Jerusalem, Israel
- Sharona Vaknin
- The Gertner Institute for Health Policy and Epidemiology, Ramat-Gan, Israel
- Rachel Wilf-Miron
- Department of Health Promotion, School of Public Health, Faculty of Medical & Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Mor Saban
- School of Health Professions, Faculty of Medical & Health Sciences, Tel-Aviv University, Tel-Aviv-Yafo, Israel
5
Russe MF, Rau A, Ermer MA, Rothweiler R, Wenger S, Klöble K, Schulze RKW, Bamberg F, Schmelzeisen R, Reisert M, Semper-Hogg W. A content-aware chatbot based on GPT 4 provides trustworthy recommendations for Cone-Beam CT guidelines in dental imaging. Dentomaxillofac Radiol 2024;53:109-114. PMID: 38180877; PMCID: PMC11003655; DOI: 10.1093/dmfr/twad015. Received 10/30/2023; revised 12/01/2023; accepted 12/18/2023.
Abstract
OBJECTIVES To develop a content-aware chatbot based on GPT-3.5-Turbo and GPT-4 with specialized knowledge of the German S2 Cone-Beam CT (CBCT) dental imaging guideline, and to compare its performance against humans. METHODS The LlamaIndex software library was used to integrate the guideline context into the chatbots. Based on the CBCT S2 guideline, 40 questions were posed to the content-aware chatbots; early career and senior practitioners with different levels of experience served as reference. The chatbots' performance was compared in terms of recommendation accuracy and explanation quality. The chi-square test and the one-tailed Wilcoxon signed rank test were used to evaluate accuracy and explanation quality, respectively. RESULTS The GPT-4-based chatbot provided 100% correct recommendations and superior explanation quality compared with the GPT-3.5-Turbo-based chatbot (87.5% vs. 57.5% for GPT-3.5-Turbo; P = .003). Moreover, it outperformed early career practitioners in correct answers (P = .002 and P = .032) and earned higher trust than the chatbot using GPT-3.5-Turbo (P = .006). CONCLUSIONS A content-aware chatbot using GPT-4 reliably provided recommendations in line with current consensus guidelines. Its responses were deemed trustworthy and transparent, which facilitates the integration of artificial intelligence into clinical decision-making.
Affiliation(s)
- Maximilian Frederik Russe
- Department of Diagnostic and Interventional Radiology, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Alexander Rau
- Department of Diagnostic and Interventional Radiology, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Department of Neuroradiology, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Michael Andreas Ermer
- Department of Oral and Maxillofacial Surgery, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- René Rothweiler
- Department of Oral and Maxillofacial Surgery, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Sina Wenger
- Department of Oral and Maxillofacial Surgery, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Klara Klöble
- Department of Oral and Maxillofacial Surgery, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Ralf K W Schulze
- Division of Oral Diagnostic Sciences, Department of Oral Surgery and Stomatology and Oral Diagnostics, School of Dental Medicine, University of Bern, Bern 3010, Switzerland
- Fabian Bamberg
- Department of Diagnostic and Interventional Radiology, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Rainer Schmelzeisen
- Department of Oral and Maxillofacial Surgery, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Marco Reisert
- Division of Medical Physics, Department of Diagnostic and Interventional Radiology, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Department of Stereotactic and Functional Neurosurgery, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
- Wiebke Semper-Hogg
- Department of Oral and Maxillofacial Surgery, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg 79106, Germany
6
Rosen S, Singer C, Vaknin S, Kaim A, Luxenburg O, Makori A, Goldberg N, Rad M, Gitman S, Saban M. Inappropriate CT examinations: how much, who and where? Insights from a clinical decision support system (CDSS) analysis. Eur Radiol 2023;33:7796-7804. PMID: 37646812; DOI: 10.1007/s00330-023-10136-x. Received 03/27/2023; revised 06/29/2023; accepted 07/04/2023.
Abstract
OBJECTIVE To assess the appropriateness of computed tomography (CT) examinations using the ESR-iGuide. MATERIALS AND METHODS A retrospective study was conducted in 2022 in a medium-sized acute care teaching hospital. A total of 278 consecutive CT referrals were included. For each imaging referral, the ESR-iGuide provided an appropriateness score on a scale of 1-9 and a Relative Radiation Level on a scale of 0-5. These were then compared with the appropriateness score and radiation level of the exam recommended by the ESR-iGuide. DATA ANALYSIS Pearson's chi-square test or Fisher's exact test was used to explore the correlation between ESR-iGuide appropriateness level and physician, patient, and shift characteristics. A stepwise logistic regression model was used to capture the contribution of each of these factors. RESULTS Most of the exams performed were head CT (63.67%) or abdominal-pelvis CT (23.74%). Seventy percent of the actual imaging referrals received an ESR-iGuide score corresponding to "usually appropriate." The mean radiation level for the actual exam was 3.2 ± 0.45, compared with 2.16 ± 1.56 for the recommended exam. In a stepwise logistic regression modeling the probability of a non-appropriate score, both physician specialty and status were significant (p = 0.0011 and p = 0.0192, respectively). Non-surgical and specialist physicians were more likely than surgical physicians to order inappropriate exams. CONCLUSIONS The ESR-iGuide indicates a substantial rate of inappropriate head and abdominal-pelvis CT exams and unnecessary radiation exposure, mainly in the ED. Inappropriate exams were related to the physicians' specialty and seniority. CLINICAL RELEVANCE STATEMENT These findings underscore the urgent need for improved imaging referral practices to ensure appropriate healthcare delivery and effective resource management. They also highlight the potential benefits of integrating CDSS as standard medical practice: by implementing CDSS, healthcare providers can make more informed decisions, leading to enhanced patient care, optimized resource allocation, and improved overall healthcare outcomes.
KEY POINTS
• The overall mean appropriateness score for the actual exam according to the ESR-iGuide was 6.62 ± 2.69 on a scale of 0-9.
• Seventy percent of the actual imaging referrals received an ESR-iGuide score corresponding to "usually appropriate."
• Inappropriate examinations were related to both the specialty and the seniority status of the requesting physician.
Affiliation(s)
- Shani Rosen
- Department of Health Technology and Policy Evaluation, Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel HaShomer, Israel
- Clara Singer
- Department of Health Technology and Policy Evaluation, Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel HaShomer, Israel
- Sharona Vaknin
- Department of Health Technology and Policy Evaluation, Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel HaShomer, Israel
- Arielle Kaim
- Department of Emergency and Disaster Management, School of Public Health, Faculty of Medicine, Tel-Aviv University, Tel-Aviv-Yafo, Israel
- National Center for Trauma and Emergency Medicine Research, Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel-HaShomer, Israel
- Osnat Luxenburg
- Medical Technology, Health Information and Research Directorate, Ministry of Health, Jerusalem, Israel
- Arnon Makori
- Community Medical Services Division, Clalit Health Services, Tel Aviv, Israel
- Moran Rad
- Research Division, Carmel Medical Center, Haifa, Israel
- Shani Gitman
- Research Division, Carmel Medical Center, Haifa, Israel
- Mor Saban
- Nursing Department, School of Health Sciences, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel