51
Bortoli M, Fiore M, Tedeschi S, Oliveira V, Sousa R, Bruschi A, Campanacci DA, Viale P, De Paolis M, Sambri A. GPT-based chatbot tools are still unreliable in the management of prosthetic joint infections. Musculoskelet Surg 2024. PMID: 38954323; DOI: 10.1007/s12306-024-00846-w.
Abstract
BACKGROUND Artificial intelligence chatbot tools might discern patterns and correlations that elude human observation, leading to more accurate and timely interventions. However, their reliability in answering healthcare-related questions is still debated. This study aimed to assess the performance of three versions of GPT-based chatbots on prosthetic joint infections (PJI). METHODS Thirty questions concerning the diagnosis and treatment of hip and knee PJIs, stratified by a priori established difficulty, were generated by a team of experts and administered to ChatGPT 3.5, BingChat, and ChatGPT 4.0. Responses were rated by three orthopedic surgeons and two infectious diseases physicians using a five-point Likert-like scale with numerical values to quantify the quality of responses. Inter-rater reliability was assessed with intraclass correlation statistics. RESULTS Responses averaged "good-to-very good" for all chatbots examined, in both diagnosis and treatment, with no significant differences according to question difficulty. However, BingChat ratings were significantly lower in the treatment setting (p = 0.025), particularly in terms of accuracy (p = 0.02) and completeness (p = 0.004). Agreement in ratings among examiners was very poor. CONCLUSIONS On average, experts rated the quality of responses positively, but ratings frequently varied widely. This suggests that AI chatbot tools are still unreliable in the management of PJI.
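The inter-rater statistic named here, the intraclass correlation coefficient (ICC), can be computed directly from a subjects-by-raters score matrix. Below is a minimal Python sketch of one common form, ICC(2,1); the study does not report which ICC variant or software it used, and the example ratings are hypothetical.

```python
import numpy as np

def icc2_1(ratings: np.ndarray) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC(2,1).

    ratings: (n_subjects, k_raters) matrix of scores.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)

    # Mean squares from the two-way ANOVA decomposition.
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_total = ((ratings - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Shrout & Fleiss ICC(2,1).
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical example: 5 raters scoring 4 chatbot responses on a 1-5 scale.
scores = np.array([
    [4, 5, 3, 4, 4],
    [2, 4, 1, 3, 2],
    [5, 5, 4, 5, 5],
    [3, 2, 2, 4, 1],
])
print(f"ICC(2,1) = {icc2_1(scores):.3f}")
```

Values near 0 indicate the very poor agreement the abstract describes; values near 1 indicate strong agreement.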
Affiliation(s)
- M Bortoli
- Orthopedic and Traumatology Unit, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy
- M Fiore
- Orthopedic and Traumatology Unit, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy
- Department of Medical and Surgical Sciences, Alma Mater Studiorum University of Bologna, 40138, Bologna, Italy
- S Tedeschi
- Department of Medical and Surgical Sciences, Alma Mater Studiorum University of Bologna, 40138, Bologna, Italy
- Infectious Disease Unit, Department for Integrated Infectious Risk Management, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy
- V Oliveira
- Department of Orthopedics, Centro Hospitalar Universitário de Santo António, 4099-001, Porto, Portugal
- R Sousa
- Department of Orthopedics, Centro Hospitalar Universitário de Santo António, 4099-001, Porto, Portugal
- A Bruschi
- Orthopedic and Traumatology Unit, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy
- D A Campanacci
- Orthopedic Oncology Unit, Azienda Ospedaliera Universitaria Careggi, 50134, Florence, Italy
- P Viale
- Department of Medical and Surgical Sciences, Alma Mater Studiorum University of Bologna, 40138, Bologna, Italy
- Infectious Disease Unit, Department for Integrated Infectious Risk Management, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy
- M De Paolis
- Orthopedic and Traumatology Unit, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy
- A Sambri
- Orthopedic and Traumatology Unit, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy
52
Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024;105:251-265. PMID: 38679540; DOI: 10.1016/j.diii.2024.04.003.
Abstract
PURPOSE The purpose of this study was to systematically review the reported performance of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications. MATERIALS AND METHODS After a comprehensive review of the PubMed, Web of Science, Embase, and Google Scholar databases, studies published up to January 1, 2024 that used ChatGPT for clinical radiology applications were identified. RESULTS Of 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported quantitative measures of ChatGPT's performance. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and five (5/24; 20.8%) studies recorded a median agreement of 83.6% between ChatGPT outcomes and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPT-4 outperformed ChatGPT-3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks. CONCLUSION Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, multiple pitfalls and limitations remain to be addressed. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.
Affiliation(s)
- Pedram Keshavarz
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; School of Science and Technology, The University of Georgia, Tbilisi 0171, Georgia
- Sara Bagherieh
- Independent Clinical Radiology Researcher, Los Angeles, CA 90024, USA
- Hamid Chalian
- Department of Radiology, Cardiothoracic Imaging, University of Washington, Seattle, WA 98195, USA
- Amir Ali Rahsepar
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Grace Hyun J Kim
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Radiological Sciences, Center for Computer Vision and Imaging Biomarkers, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Cameron Hassani
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Steven S Raman
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Arash Bedayat
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
53
Al-Anezi FM. Exploring the use of ChatGPT as a virtual health coach for chronic disease management. Learn Health Syst 2024;8:e10406. PMID: 39036525; PMCID: PMC11257053; DOI: 10.1002/lrh2.10406.
Abstract
Introduction ChatGPT has been widely researched for its potential in healthcare applications. However, its efficacy as a virtual health coach is one of the important areas in which it could significantly contribute to the sustainability of healthcare operations, especially in managing critical illnesses. Therefore, this study aims to analyze the use of ChatGPT as a virtual health coach for chronic disease management. Methods This study used a quasi-experimental design because ChatGPT is a relatively new technology and few people have experience with it. Patients who were receiving care outside of the hospital were included. Semi-structured interviews were conducted after a 2-week period in which participants used ChatGPT to search for health information about chronic disease management. Thirty-nine outpatients were interviewed, and thematic analysis was used to analyze the interview data. Results The findings suggested both opportunities and challenges of using ChatGPT as a virtual health coach for chronic disease management. The major opportunities identified included life-long learning, improved health literacy, cost-effectiveness, behavioral change support, scalability, and accessibility. The major challenges identified included limited physical examination, lack of human connection, legal and ethical complications, and lack of accuracy and reliability. Conclusion ChatGPT-based technologies may serve as a supplementary or intermediate support system. However, such applications for managing chronic diseases must protect privacy and promote both short- and long-term positive outcomes.
Affiliation(s)
- Fahad M. Al-Anezi
- Department of Management Information Systems, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
54
Aburumman R, Al Annan K, Mrad R, Brunaldi VO, Gala K, Abu Dayyeh BK. Assessing ChatGPT vs. Standard Medical Resources for Endoscopic Sleeve Gastroplasty Education: A Medical Professional Evaluation Study. Obes Surg 2024;34:2718-2724. PMID: 38758515; DOI: 10.1007/s11695-024-07283-5.
Abstract
BACKGROUND AND AIMS The Chat Generative Pre-Trained Transformer (ChatGPT) represents a significant advancement in artificial intelligence (AI) chatbot technology. While ChatGPT offers promising capabilities, concerns remain about its reliability and accuracy. This study aims to evaluate ChatGPT's responses to patients' frequently asked questions about Endoscopic Sleeve Gastroplasty (ESG). METHODS Expert Gastroenterologists and Bariatric Surgeons, with experience in ESG, were invited to evaluate ChatGPT-generated answers to eight ESG-related questions, and answers sourced from hospital websites. The evaluation criteria included ease of understanding, scientific accuracy, and overall answer satisfaction. They were also tasked with discerning whether each response was AI generated or not. RESULTS Twelve medical professionals with expertise in ESG participated, 83.3% of whom had experience performing the procedure independently. The entire cohort possessed substantial knowledge about ESG. ChatGPT's utility among participants, rated on a scale of one to five, averaged 2.75. The raters demonstrated a 54% accuracy rate in distinguishing AI-generated responses, with a sensitivity of 39% and specificity of 60%, resulting in an average of 17.6 correct identifications out of a possible 31. Overall, there were no significant differences between AI-generated and non-AI responses in terms of scientific accuracy, understandability, and satisfaction, with one notable exception. For the question defining ESG, the AI-generated definition scored higher in scientific accuracy (4.33 vs. 3.61, p = 0.007) and satisfaction (4.33 vs. 3.58, p = 0.009) compared to the non-AI versions. CONCLUSIONS This study underscores ChatGPT's efficacy in providing medical information on ESG, demonstrating its comparability to traditional sources in scientific accuracy.
Affiliation(s)
- Razan Aburumman
- Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Karim Al Annan
- Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Rudy Mrad
- Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Vitor O Brunaldi
- Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Khushboo Gala
- Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Barham K Abu Dayyeh
- Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
55
Lecler A, Soyer P, Gong B. The potential and pitfalls of ChatGPT in radiology. Diagn Interv Imaging 2024;105:249-250. PMID: 38811261; DOI: 10.1016/j.diii.2024.05.003.
Affiliation(s)
- Augustin Lecler
- Department of Neuroradiology, Foundation Adolphe de Rothschild Hospital, 75019, Paris, France; Université Paris Cité, Faculté de Médecine, 75006, Paris, France
- Philippe Soyer
- Université Paris Cité, Faculté de Médecine, 75006, Paris, France; Department of Radiology, Hôpital Cochin, AP-HP, 75014, Paris, France
- Bo Gong
- Department of Radiology, University of British Columbia, Vancouver, BC, V6T 1M9, Canada
56
Shiraishi M, Tomioka Y, Miyakuni A, Ishii S, Hori A, Park H, Ohba J, Okazaki M. Performance of ChatGPT in Answering Clinical Questions on the Practical Guideline of Blepharoptosis. Aesthetic Plast Surg 2024;48:2389-2398. PMID: 38684536; DOI: 10.1007/s00266-024-04005-1.
Abstract
BACKGROUND ChatGPT is a free artificial intelligence (AI) language model developed and released by OpenAI in late 2022. This study aimed to evaluate the ability of ChatGPT to accurately answer clinical questions (CQs) on the Guideline for the Management of Blepharoptosis published by the American Society of Plastic Surgeons (ASPS) in 2022. METHODS CQs in the guideline were used as question sources in both English and Japanese. For each question, ChatGPT provided answers for CQs, evidence quality, recommendation strength, reference match, and answered word counts. We compared the performance of ChatGPT on each component between English and Japanese queries. RESULTS A total of 11 questions were included in the final analysis, and ChatGPT answered 61.3% of them correctly. ChatGPT demonstrated a higher accuracy rate for English CQs than for Japanese CQs (76.4% versus 46.4%; p = 0.004) and longer answers (123 words versus 35.9 words; p = 0.004). No statistical differences were noted for evidence quality, recommendation strength, or reference match. A total of 697 references were proposed, but only 216 of them (31.0%) existed. CONCLUSIONS ChatGPT demonstrates potential as an adjunctive tool in the management of blepharoptosis. However, it is crucial to recognize that the existing AI model has distinct limitations, and its primary role should be to complement the expertise of medical professionals. LEVEL OF EVIDENCE V Observational study under respected authorities. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yoko Tomioka
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Ami Miyakuni
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Saaya Ishii
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Asei Hori
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Hwayoung Park
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jun Ohba
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
57
Alkhamees A. Evaluation of Artificial Intelligence as a Search Tool for Patients: Can ChatGPT-4 Provide Accurate Evidence-Based Orthodontic-Related Information? Cureus 2024;16:e65820. PMID: 39219978; PMCID: PMC11363007; DOI: 10.7759/cureus.65820.
Abstract
INTRODUCTION Artificial intelligence (AI) is already a part of our reality. Many people have started using ChatGPT in their daily lives, often in place of existing web browsers. The confidence people place in ChatGPT's ability to provide accurate medical information is increasing, making proper assessment of its safety and reliability crucial. OBJECTIVE This study aimed to assess the accuracy, reliability, and quality of information provided by ChatGPT-4 on three specific orthodontic topics, namely, impacted canines, interceptive orthodontic treatment, and orthognathic surgery, as evaluated by five experienced orthodontists using a Likert scale ranking method. MATERIALS AND METHODS Using ChatGPT-4, the 20 most commonly asked questions were generated and answered for each of the following topics: impacted canines, interceptive treatment, and orthognathic surgery. Five experienced orthodontists evaluated the quality of the answers provided, using a Likert scale ranking method. RESULTS The evaluators rated each question-answer pair on a five-point scale from "very poor" to "very good." ChatGPT-4 produced generally good-quality information for all three topics, with no significant difference among them. The inter-rater agreement among the experts was low, indicating some variability in their judgments. CONCLUSION This study demonstrates that ChatGPT-4 can provide generally good information on impacted canines, interceptive treatment, and orthognathic surgery. However, the answers provided should be handled with caution due to variability and lack of reliability, and should not be considered a substitute for professional opinion.
Affiliation(s)
- Amani Alkhamees
- Department of Orthodontics and Pediatric Dentistry, College of Dentistry, Qassim University, Buraydah, Saudi Arabia
58
Kumar RP, Sivan V, Bachir H, Sarwar SA, Ruzicka F, O'Malley GR, Lobo P, Morales IC, Cassimatis ND, Hundal JS, Patel NV. Can Artificial Intelligence Mitigate Missed Diagnoses by Generating Differential Diagnoses for Neurosurgeons? World Neurosurg 2024;187:e1083-e1088. PMID: 38759788; DOI: 10.1016/j.wneu.2024.05.052.
Abstract
BACKGROUND/OBJECTIVE Neurosurgery emphasizes the criticality of accurate differential diagnoses, with diagnostic delays posing significant health and economic challenges. As large language models (LLMs) emerge as transformative tools in healthcare, this study seeks to elucidate their role in assisting neurosurgeons with the differential diagnosis process, especially during preliminary consultations. METHODS This study employed three chat-based LLM platforms: ChatGPT (versions 3.5 and 4.0), Perplexity AI, and Bard AI. Each LLM was prompted with clinical vignettes, and its responses were recorded to generate differential diagnoses for 20 common and uncommon neurosurgical disorders. Disease-specific prompts were crafted using DynaMed, a clinical reference tool. The accuracy of the LLMs was determined by their ability to correctly identify the target disease within their top differential diagnoses. RESULTS For the initial differential, ChatGPT 3.5 achieved an accuracy of 52.63%, while ChatGPT 4.0 performed slightly better at 53.68%. Perplexity AI and Bard AI demonstrated 40.00% and 29.47% accuracy, respectively. As the number of considered differentials increased from 2 to 5, ChatGPT 3.5 reached its peak accuracy of 77.89% for the top 5 differentials. Bard AI and Perplexity AI had varied performances, with Bard AI improving to 62.11% for the top 5 differentials. On a disease-specific note, the LLMs excelled in diagnosing conditions such as epilepsy and cervical spine stenosis but faced challenges with more complex diseases such as Moyamoya disease and amyotrophic lateral sclerosis. CONCLUSIONS LLMs showcase the potential to enhance diagnostic accuracy and decrease the incidence of missed diagnoses in neurosurgery.
Affiliation(s)
- Rohit Prem Kumar
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Vijay Sivan
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Hanin Bachir
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Syed A Sarwar
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Francis Ruzicka
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Geoffrey R O'Malley
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Paulo Lobo
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Ilona Cazorla Morales
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Nicholas D Cassimatis
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Jasdeep S Hundal
- Department of Neurology, HMH-Jersey Shore University Medical Center, Neptune, New Jersey, USA
- Nitesh V Patel
- Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA; Department of Neurosurgery, HMH-Jersey Shore University Medical Center, Neptune, New Jersey, USA
59
Park J, Oh K, Han K, Lee YH. Patient-centered radiology reports with generative artificial intelligence: adding value to radiology reporting. Sci Rep 2024;14:13218. PMID: 38851825; PMCID: PMC11162416; DOI: 10.1038/s41598-024-63824-z.
Abstract
The purposes of this study were to assess the efficacy of AI-generated radiology reports in terms of report summarization, patient-friendliness, and recommendations, and to evaluate their consistency in report quality and accuracy, contributing to the advancement of radiology workflow. A total of 685 spine MRI reports were retrieved from our hospital database. AI-generated radiology reports were generated in three formats: (1) summary reports, (2) patient-friendly reports, and (3) recommendations. The occurrence of artificial hallucinations was evaluated in the AI-generated reports. Two radiologists conducted qualitative and quantitative assessments considering the original report as a standard reference. Two non-physician raters assessed their understanding of the content of original and patient-friendly reports using a 5-point Likert scale. The AI-generated radiology reports received overall high average scores across all three formats. The average comprehension score for the original reports was 2.71 ± 0.73, while the score for the patient-friendly reports significantly increased to 4.69 ± 0.48 (p < 0.001). There were 1.12% artificial hallucinations and 7.40% potentially harmful translations. In conclusion, the potential benefits of using generative AI assistants to generate these reports include improved report quality and greater efficiency in radiology workflow for producing summaries, patient-centered reports, and recommendations, moving toward patient-centered radiology.
Affiliation(s)
- Jiwoo Park
- Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, 50-1 Yonsei-Ro, Seodaemun-Gu, Seoul, 03722, South Korea
- Kangrok Oh
- Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, 50-1 Yonsei-Ro, Seodaemun-Gu, Seoul, 03722, South Korea
- Kyunghwa Han
- Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, 50-1 Yonsei-Ro, Seodaemun-Gu, Seoul, 03722, South Korea
- Young Han Lee
- Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, 50-1 Yonsei-Ro, Seodaemun-Gu, Seoul, 03722, South Korea
- Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, South Korea
60
Rao SJ, Isath A, Krishnan P, Tangsrivimol JA, Virk HUH, Wang Z, Glicksberg BS, Krittanawong C. ChatGPT: A Conceptual Review of Applications and Utility in the Field of Medicine. J Med Syst 2024;48:59. PMID: 38836893; DOI: 10.1007/s10916-024-02075-x.
Abstract
Artificial intelligence, specifically advanced language models such as ChatGPT, has the potential to revolutionize various aspects of healthcare, medical education, and research. In this narrative review, we evaluate the myriad applications of ChatGPT in diverse healthcare domains. We discuss its potential role in clinical decision-making, exploring how it can assist physicians by providing rapid, data-driven insights for diagnosis and treatment. We review the benefits of ChatGPT in personalized patient care, particularly in geriatric care, medication management, weight loss and nutrition, and physical activity guidance. We further delve into its potential to enhance medical research through the analysis of large datasets and the development of novel methodologies. In the realm of medical education, we investigate the utility of ChatGPT as an information retrieval tool and personalized learning resource for medical students and professionals. There are numerous promising applications of ChatGPT that will likely induce paradigm shifts in healthcare practice, education, and research. The use of ChatGPT may come with several benefits in areas such as clinical decision making, geriatric care, medication management, weight loss and nutrition, physical fitness, scientific research, and medical education. Nevertheless, it is important to note that issues surrounding ethics, data privacy, transparency, inaccuracy, and inadequacy persist. Prior to widespread use in medicine, it is imperative to objectively evaluate the impact of ChatGPT in a real-world setting using a risk-based approach.
Affiliation(s)
- Shiavax J Rao
- Department of Medicine, MedStar Union Memorial Hospital, Baltimore, MD, USA
- Ameesh Isath
- Department of Cardiology, Westchester Medical Center and New York Medical College, Valhalla, NY, USA
- Parvathy Krishnan
- Department of Pediatrics, Westchester Medical Center and New York Medical College, Valhalla, NY, USA
- Jonathan A Tangsrivimol
- Division of Neurosurgery, Department of Surgery, Chulabhorn Hospital, Chulabhorn Royal Academy, Bangkok, 10210, Thailand
- Department of Neurological Surgery, Weill Cornell Medicine Brain and Spine Center, New York, NY, 10022, USA
- Hafeez Ul Hassan Virk
- Harrington Heart & Vascular Institute, Case Western Reserve University, University Hospitals Cleveland Medical Center, Cleveland, OH, USA
- Zhen Wang
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
- Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Benjamin S Glicksberg
- Hasso Plattner Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chayakrit Krittanawong
- Cardiology Division, NYU Langone Health and NYU School of Medicine, 550 First Avenue, New York, NY, 10016, USA
61
Mickley JP, Kaji ES, Khosravi B, Mulford KL, Taunton MJ, Wyles CC. Overview of Artificial Intelligence Research Within Hip and Knee Arthroplasty. Arthroplast Today 2024;27:101396. PMID: 39071822; PMCID: PMC11282426; DOI: 10.1016/j.artd.2024.101396.
Abstract
Hip and knee arthroplasty are high-volume procedures undergoing rapid growth. The large volume of procedures generates a vast amount of data available for next-generation analytics. Techniques in the field of artificial intelligence (AI) can assist in large-scale pattern recognition and lead to clinical insights. AI methodologies have become more prevalent in orthopaedic research. This review first provides an overview of AI in the medical field, followed by a description of the three arthroplasty research areas in which AI is commonly used (risk modeling, automated radiographic measurements, and arthroplasty registry construction). Finally, we discuss the next frontier of AI research, focusing on model deployment and uncertainty quantification.
Affiliation(s)
- John P. Mickley
- Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Elizabeth S. Kaji
- Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Bardia Khosravi
- Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Radiology Informatics Lab (RIL), Department of Radiology, Mayo Clinic, Rochester, MN, USA
- Kellen L. Mulford
- Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Michael J. Taunton
- Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Cody C. Wyles
- Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Department of Clinical Anatomy, Mayo Clinic, Rochester, MN, USA
62
Kuai H, Chen J, Tao X, Cai L, Imamura K, Matsumoto H, Liang P, Zhong N. Never-Ending Learning for Explainable Brain Computing. Adv Sci (Weinh) 2024;11:e2307647. PMID: 38602432; PMCID: PMC11200082; DOI: 10.1002/advs.202307647.
Abstract
Exploring the nature of human intelligence and behavior is a longstanding pursuit in cognitive neuroscience, driven by the accumulation of knowledge, information, and data across various studies. However, achieving a unified and transparent interpretation of findings presents formidable challenges. In response, an explainable brain computing framework is proposed that employs the never-ending learning paradigm, integrating evidence combination and fusion computing within a Knowledge-Information-Data (KID) architecture. The framework supports continuous brain cognition investigation, utilizing joint knowledge-driven forward inference and data-driven reverse inference, bolstered by the pre-trained language modeling techniques and the human-in-the-loop mechanisms. In particular, it incorporates internal evidence learning through multi-task functional neuroimaging analyses and external evidence learning via topic modeling of published neuroimaging studies, all of which involve human interactions at different stages. Based on two case studies, the intricate uncertainty surrounding brain localization in human reasoning is revealed. The present study also highlights the potential of systematization to advance explainable brain computing, offering a finer-grained understanding of brain activity patterns related to human intelligence.
Affiliation(s)
- Hongzhi Kuai
- Faculty of Engineering, Maebashi Institute of Technology, Gunma 371-0816, Japan
- School of Psychology and Beijing Key Laboratory of Learning and Cognition, Capital Normal University, Beijing 100048, China
- Jianhui Chen
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing 100124, China
- Xiaohui Tao
- School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba 4350, Australia
- Lingyun Cai
- School of Psychology and Beijing Key Laboratory of Learning and Cognition, Capital Normal University, Beijing 100048, China
- Kazuyuki Imamura
- Faculty of Engineering, Maebashi Institute of Technology, Gunma 371-0816, Japan
- Hiroki Matsumoto
- Faculty of Engineering, Maebashi Institute of Technology, Gunma 371-0816, Japan
- Peipeng Liang
- School of Psychology and Beijing Key Laboratory of Learning and Cognition, Capital Normal University, Beijing 100048, China
- Ning Zhong
- Faculty of Engineering, Maebashi Institute of Technology, Gunma 371-0816, Japan
- School of Psychology and Beijing Key Laboratory of Learning and Cognition, Capital Normal University, Beijing 100048, China
- Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing 100124, China
63
Farquhar S, Kossen J, Kuhn L, Gal Y. Detecting hallucinations in large language models using semantic entropy. Nature 2024;630:625-630. PMID: 38898292; PMCID: PMC11186750; DOI: 10.1038/s41586-024-07421-0.
Abstract
Large language model (LLM) systems, such as ChatGPT [1] or Gemini [2], can show impressive reasoning and question-answering capabilities but often 'hallucinate' false outputs and unsubstantiated answers [3,4]. Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents [5] or untrue facts in news articles [6] and even posing a risk to human life in medical domains such as radiology [7]. Encouraging truthfulness through supervision or reinforcement has been only partially successful [8]. Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations (confabulations), which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability.
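The method samples several answers to one prompt, clusters them by meaning, and computes entropy over the meaning clusters rather than over token sequences. Below is a minimal Python sketch of that idea; it is illustrative only, not the authors' implementation, and the same_meaning predicate (bidirectional entailment via a natural language inference model in the paper) and the toy data are assumptions.

```python
import math
from typing import Callable

def semantic_entropy(
    answers: list[str],
    probs: list[float],
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Entropy over meaning clusters rather than token sequences.

    answers: sampled model outputs for one prompt.
    probs: their sequence probabilities (need not sum to 1; normalized below).
    same_meaning: True if two answers express the same idea, e.g. judged
        by bidirectional entailment with an NLI model.
    """
    # Greedily cluster answers that share a meaning.
    clusters: list[list[int]] = []
    for i, ans in enumerate(answers):
        for cluster in clusters:
            if same_meaning(answers[cluster[0]], ans):
                cluster.append(i)
                break
        else:
            clusters.append([i])

    # Aggregate probability mass within each meaning cluster,
    # then take the entropy of the cluster distribution.
    total = sum(probs)
    cluster_mass = [sum(probs[i] for i in c) / total for c in clusters]
    return -sum(p * math.log(p) for p in cluster_mass if p > 0)

# Toy usage: two paraphrases and one contradiction. The crude keyword
# predicate stands in for a real entailment model.
answers = ["Paris is the capital.", "The capital is Paris.", "It is Lyon."]
probs = [0.5, 0.3, 0.2]
crude = lambda a, b: ("Paris" in a) == ("Paris" in b)
print(f"semantic entropy = {semantic_entropy(answers, probs, crude):.3f}")
```

High semantic entropy flags prompts where the model's answers disagree in meaning, the signature of a likely confabulation.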
Affiliation(s)
- Sebastian Farquhar
- OATML, Department of Computer Science, University of Oxford, Oxford, UK
- Jannik Kossen
- OATML, Department of Computer Science, University of Oxford, Oxford, UK
- Lorenz Kuhn
- OATML, Department of Computer Science, University of Oxford, Oxford, UK
- Yarin Gal
- OATML, Department of Computer Science, University of Oxford, Oxford, UK
64
Saba L, Fu CL, Khouri J, Faiman B, Anwer F, Chaulagain CP. Evaluating ChatGPT as an educational resource for patients with multiple myeloma: A preliminary investigation. Am J Hematol 2024;99:1205-1207. PMID: 38602288; DOI: 10.1002/ajh.27318.
Abstract
The findings of this study highlight a 95% accuracy rate in ChatGPT responses, as assessed by five myeloma specialists, underscoring its potential as a reliable educational tool.
Affiliation(s)
- Ludovic Saba
- Department of Hematology and Medical Oncology, Cleveland Clinic Florida, Weston, Florida, USA
- Chieh-Lin Fu
- Department of Hematology and Medical Oncology, Cleveland Clinic Florida, Weston, Florida, USA
- Jack Khouri
- Department of Hematology and Medical Oncology, Cleveland Clinic Main Campus, Cleveland, Ohio, USA
- Beth Faiman
- Department of Hematologic Oncology and Blood Disorders, Cleveland Clinic Main Campus, Cleveland, Ohio, USA
- Faiz Anwer
- Department of Hematology and Medical Oncology, Cleveland Clinic Main Campus, Cleveland, Ohio, USA
- Chakra P Chaulagain
- Department of Hematology and Medical Oncology, Cleveland Clinic Florida, Weston, Florida, USA
65
Brunner J, Rinne S. Large Language Models as a Tool for Health Services Researchers: An Exploration of High-Value Applications. Ann Am Thorac Soc 2024;21:845-848. PMID: 38445982; DOI: 10.1513/annalsats.202311-980ps.
Affiliation(s)
- Julian Brunner
- Center for the Study of Healthcare Innovation, Implementation, and Policy, VA Greater Los Angeles Health Care, Los Angeles, California
- Seppo Rinne
- Center for Healthcare Organization and Implementation Research, Bedford VA Medical Center, Bedford, Massachusetts
- Department of Medicine, Dartmouth Geisel School of Medicine, Hanover, New Hampshire
66
Burnette H, Pabani A, von Itzstein MS, Switzer B, Fan R, Ye F, Puzanov I, Naidoo J, Ascierto PA, Gerber DE, Ernstoff MS, Johnson DB. Use of artificial intelligence chatbots in clinical management of immune-related adverse events. J Immunother Cancer 2024;12:e008599. PMID: 38816231; PMCID: PMC11141185; DOI: 10.1136/jitc-2023-008599.
Abstract
BACKGROUND Artificial intelligence (AI) chatbots have become a major source of general and medical information, though their accuracy and completeness are still being assessed. Their utility in answering questions surrounding immune-related adverse events (irAEs), common and potentially dangerous toxicities from cancer immunotherapy, is not well defined. METHODS We developed 50 distinct questions with answers available in guidelines surrounding 10 irAE categories and queried two AI chatbots (ChatGPT and Bard), along with an additional 20 patient-specific scenarios. Experts in irAE management scored answers for accuracy and completeness using a Likert scale ranging from 1 (least accurate/complete) to 4 (most accurate/complete). Answers were compared across categories and across engines. RESULTS Overall, both engines scored highly for accuracy (mean scores for ChatGPT and Bard were 3.87 vs 3.5, p<0.01) and completeness (3.83 vs 3.46, p<0.01). Scores of 1-2 (completely or mostly inaccurate or incomplete) were particularly rare for ChatGPT (6/800 answer-ratings, 0.75%). Of the 50 questions, all eight physician raters gave ChatGPT a rating of 4 (fully accurate or complete) for 22 questions (for accuracy) and 16 questions (for completeness). In the 20 patient scenarios, the average accuracy score was 3.725 (median 4) and the average completeness score was 3.61 (median 4). CONCLUSIONS AI chatbots provided largely accurate and complete information regarding irAEs, and wildly inaccurate information ("hallucinations") was uncommon. However, until accuracy and completeness increase further, appropriate guidelines remain the gold standard to follow.
Affiliation(s)
- Hannah Burnette
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Aliyah Pabani
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, USA
- Mitchell S von Itzstein
- Harold C Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Benjamin Switzer
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
- Run Fan
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Fei Ye
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Igor Puzanov
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
- Paolo A Ascierto
- Department of Melanoma, Cancer Immunotherapy and Development Therapeutics, Istituto Nazionale Tumori IRCCS Fondazione Pascale, Napoli, Campania, Italy
- David E Gerber
- Harold C Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Marc S Ernstoff
- ImmunoOncology Branch (IOB), Developmental Therapeutics Program, Cancer Therapy and Diagnosis Division, National Cancer Institute (NCI), National Institutes of Health, Bethesda, Maryland, USA
- Douglas B Johnson
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
67
Choudhury A, Shamszare H. The Impact of Performance Expectancy, Workload, Risk, and Satisfaction on Trust in ChatGPT: Cross-Sectional Survey Analysis. JMIR Hum Factors 2024;11:e55399. PMID: 38801658; PMCID: PMC11165287; DOI: 10.2196/55399.
Abstract
BACKGROUND ChatGPT (OpenAI) is a powerful tool for a wide range of tasks, from entertainment and creativity to health care queries. There are potential risks and benefits associated with this technology. In the discourse concerning the deployment of ChatGPT and similar large language models, it is sensible to recommend their use primarily for tasks a human user can execute accurately. As we transition into the subsequent phase of ChatGPT deployment, establishing realistic performance expectations and understanding users' perceptions of risk associated with its use are crucial in determining the successful integration of this artificial intelligence (AI) technology. OBJECTIVE The aim of the study is to explore how perceived workload, satisfaction, performance expectancy, and risk-benefit perception influence users' trust in ChatGPT. METHODS A semistructured, web-based survey was conducted with 607 adults in the United States who actively use ChatGPT. The survey questions were adapted from constructs used in various models and theories such as the technology acceptance model, the theory of planned behavior, the unified theory of acceptance and use of technology, and research on trust and security in digital environments. To test our hypotheses and structural model, we used the partial least squares structural equation modeling method, a widely used approach for multivariate analysis. RESULTS A total of 607 people responded to our survey. A significant portion of the participants held at least a high school diploma (n=204, 33.6%), and the majority had a bachelor's degree (n=262, 43.1%). The primary motivations for participants to use ChatGPT were for acquiring information (n=219, 36.1%), amusement (n=203, 33.4%), and addressing problems (n=135, 22.2%). Some participants used it for health-related inquiries (n=44, 7.2%), while a few others (n=6, 1%) used it for miscellaneous activities such as brainstorming, grammar verification, and blog content creation. Our model explained 64.6% of the variance in trust. Our analysis indicated a significant relationship between (1) workload and satisfaction, (2) trust and satisfaction, (3) performance expectations and trust, and (4) risk-benefit perception and trust. CONCLUSIONS The findings underscore the importance of ensuring user-friendly design and functionality in AI-based applications to reduce workload and enhance user satisfaction, thereby increasing user trust. Future research should further explore the relationship between risk-benefit perception and trust in the context of AI chatbots.
Affiliation(s)
- Avishek Choudhury
- Industrial and Management Systems Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, WV, United States
- Hamid Shamszare
- Industrial and Management Systems Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, WV, United States
68
Buldur M, Sezer B. Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam. BMC Oral Health 2024;24:605. PMID: 38789962; PMCID: PMC11127407; DOI: 10.1186/s12903-024-04358-8.
Abstract
BACKGROUND The use of artificial intelligence in the health sciences is becoming widespread. It is known that patients benefit from artificial intelligence applications on various health issues, especially after the pandemic period. One of the most important issues in this regard is the accuracy of the information provided by artificial intelligence applications. OBJECTIVE The purpose of this study was to pose the frequently asked questions about dental amalgam, as determined by the United States Food and Drug Administration (FDA), one of these information resources, to Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) and to compare the content of the answers given by the application with the answers of the FDA. METHODS The questions were directed to ChatGPT-4 on May 8th and May 16th, 2023, and the responses were recorded and compared at the word and meaning levels using ChatGPT. The answers from the FDA webpage were also recorded. ChatGPT-4's responses and the FDA's responses were compared for content similarity in terms of "Main Idea", "Quality Analysis", "Common Ideas", and "Inconsistent Ideas". RESULTS ChatGPT-4 provided similar responses at one-week intervals. In comparison with FDA guidance, it provided answers with similar information content to the frequently asked questions. However, although there were some similarities in the general aspects of the recommendation regarding amalgam removal, the two texts were not the same, and they offered different perspectives on the replacement of fillings. CONCLUSIONS The findings of this study indicate that ChatGPT-4, an artificial intelligence-based application, encompasses current and accurate information regarding dental amalgam and its removal, providing it to individuals seeking access to such information. Nevertheless, numerous studies are required to assess the validity and reliability of ChatGPT-4 across diverse subjects.
Affiliation(s)
- Mehmet Buldur
- Department of Restorative Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
- Berkant Sezer
- Department of Pediatric Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
69
Sireci F, Lorusso F, Immordino A, Centineo M, Gerardi I, Patti G, Rusignuolo S, Manzella R, Gallina S, Dispenza F. ChatGPT as a New Tool to Select a Biological for Chronic Rhino Sinusitis with Polyps, "Caution Advised" or "Distant Reality"? J Pers Med 2024;14:563. PMID: 38929784; PMCID: PMC11204527; DOI: 10.3390/jpm14060563.
Abstract
ChatGPT is an advanced language model developed by OpenAI, designed for natural language understanding and generation. It employs deep learning technology to comprehend and generate human-like text, making it versatile for various applications. The aim of this study was to assess the alignment between the Rhinology Board's indications and ChatGPT's recommendations for treating patients with chronic rhinosinusitis with nasal polyps (CRSwNP) using biologic therapy. An observational cohort study involving 72 patients was conducted to evaluate various parameters of type 2 inflammation and assess the concordance in therapy choices between ChatGPT and the Rhinology Board. The observed results highlight the potential of ChatGPT in guiding optimal biological therapy selection, with a concordance of 68% and a kappa coefficient of 0.69 (95% CI: 0.50-0.75). In particular, the concordance was 79.6% for dupilumab, 20% for mepolizumab, and 0% for omalizumab. This research represents a significant advancement in managing CRSwNP, a condition lacking robust biomarkers, and provides valuable insights into the potential of AI, specifically ChatGPT, to assist otolaryngologists in determining the optimal biological therapy for personalized patient care. Our results support implementing this tool to effectively aid clinicians.
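The kappa coefficient reported here is a chance-corrected measure of agreement between two raters, in this case ChatGPT and the Rhinology Board. A minimal Python sketch follows; the labels in the example are illustrative and are not data from the study.

```python
import numpy as np

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    labels = np.unique(np.concatenate([a, b]))
    p_observed = (a == b).mean()
    # Expected agreement if both raters chose labels independently
    # according to their own marginal frequencies.
    p_expected = sum((a == l).mean() * (b == l).mean() for l in labels)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical example: biologic chosen by ChatGPT vs. the Rhinology Board
# for six patients (these assignments are made up for illustration).
gpt   = ["dupilumab", "dupilumab", "mepolizumab", "dupilumab", "omalizumab", "dupilumab"]
board = ["dupilumab", "mepolizumab", "mepolizumab", "dupilumab", "dupilumab", "dupilumab"]
print(f"kappa = {cohens_kappa(gpt, board):.2f}")
```

A kappa of 0.69, as reported, is conventionally read as substantial agreement, notably stronger than the raw 68% concordance alone would imply.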
Affiliation(s)
- Federico Sireci
- Otorhinolaryngology Section, Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Francesco Lorusso
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Angelo Immordino
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Ignazio Gerardi
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Gaetano Patti
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Simona Rusignuolo
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Riccardo Manzella
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Salvatore Gallina
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Francesco Dispenza
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
70
Saygin M, Bekmezci M, Dinçer E. Artificial Intelligence Model ChatGPT-4: Entrepreneur Candidate and Entrepreneurship Example. F1000Res 2024;13:308. PMID: 38845823; PMCID: PMC11153998; DOI: 10.12688/f1000research.144671.2.
Abstract
Background Although artificial intelligence technologies are still in their infancy, they can evoke both hope and anxiety for the future. This research focuses on examining ChatGPT-4, one of the best-known artificial intelligence applications and one claimed to have a self-learning feature, within the scope of business establishment processes. Methods To this end, the assessment questions in the Entrepreneurship Handbook, published as open access by the Small and Medium Enterprises Development Organization of Turkey, which guides entrepreneurial processes in Turkey and shapes the perception of entrepreneurship, were posed to the artificial intelligence model ChatGPT-4 and analysed in three stages. The model's way of solving the questions and the answers it provided could then be compared with the entrepreneurship literature. Results ChatGPT-4, itself an outstanding example of entrepreneurship, succeeded in answering the questions posed across the 16 modules of the entrepreneurship handbook in an original way, with deep analysis. Conclusion ChatGPT-4 also proved quite creative in developing new alternatives to the correct answers specified in the entrepreneurship handbook. The original aspect of this research is that it is one of the pioneering studies on artificial intelligence and entrepreneurship in the literature.
71
Tripathi S, Sukumaran R, Cook TS. Efficient healthcare with large language models: optimizing clinical workflow and enhancing patient care. J Am Med Inform Assoc 2024; 31:1436-1440. [PMID: 38273739 PMCID: PMC11105142 DOI: 10.1093/jamia/ocad258]
Abstract
PURPOSE This article explores the potential of large language models (LLMs) to automate administrative tasks in healthcare, alleviating the burden on clinicians caused by electronic medical records. POTENTIAL LLMs offer opportunities in clinical documentation, prior authorization, patient education, and access to care. They can personalize patient scheduling, improve documentation accuracy, streamline insurance prior authorization, increase patient engagement, and address barriers to healthcare access. CAUTION However, integrating LLMs requires careful attention to security and privacy concerns, protecting patient data, and complying with regulations like the Health Insurance Portability and Accountability Act (HIPAA). It is crucial to acknowledge that LLMs should supplement, not replace, the human connection and care provided by healthcare professionals. CONCLUSION By prudently utilizing LLMs alongside human expertise, healthcare organizations can improve patient care and outcomes. Implementation should be approached with caution and consideration to ensure the safe and effective use of LLMs in the clinical setting.
Affiliation(s)
- Satvik Tripathi
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Rithvik Sukumaran
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Tessa S Cook
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
72
Dimitriadis F, Tsigkriki L, Charisopoulou D, Tsaousidis A, Siarkos M, Koulaouzidis G. Letter Re: Response to Luan et al. Angiology 2024:33197241256685. [PMID: 38769649 DOI: 10.1177/00033197241256685]
Affiliation(s)
- Fotis Dimitriadis
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
- Lamprini Tsigkriki
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
- Adam Tsaousidis
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
- Michail Siarkos
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
- George Koulaouzidis
- Department of Biochemical Sciences, Pomeranian Medical University, Szczecin, Poland
73
Simsek O, Manteghinejad A, Vossough A. A Comparative Review of Imaging Journal Policies for Use of AI in Manuscript Generation. Acad Radiol 2024:S1076-6332(24)00290-3. [PMID: 38772797 DOI: 10.1016/j.acra.2024.05.006]
Abstract
RATIONALE AND OBJECTIVES Artificial intelligence (AI) technologies are rapidly evolving and offering new advances almost on a day-by-day basis, including various tools for manuscript generation and modification. On the other hand, these potentially time- and effort-saving solutions come with potential bias, factual error, and plagiarism risks. Some journals have started to update their author guidelines in reference to AI-generated or AI-assisted manuscripts. The purpose of this paper is to evaluate author guidelines for including AI use policies in radiology journals and compare scientometric data between journals with and without explicit AI use policies. MATERIALS AND METHODS This cross-sectional study included 112 MEDLINE-indexed imaging journals and evaluated their author guidelines between 13 October 2023 and 16 October 2023. Journals were identified based on subject matter and association with a radiological society. The authors' guidelines and editorial policies were evaluated for the use of AI in manuscript preparation and specific AI-generated image policies. We assessed the existence of an AI usage policy among subspecialty imaging journals. The scientometric scores of journals with and without AI use policies were compared using the Wilcoxon signed-rank test. RESULTS Among 112 MEDLINE-indexed radiology journals, 80 journals were affiliated with an imaging society, and 32 were not. 69 (61.6%) of 112 imaging journals had an AI usage policy, and 40 (57.9%) of 69 mentioned a specific policy about AI-generated figures. CiteScore (4.9 vs 4, p = 0.023), Source Normalized Impact per Paper (1.12 vs 0.83, p = 0.06), Scientific Journal Ranking (0.75 vs 0.54, p = 0.010) and Journal Citation Indicator (0.77 vs 0.62, p = 0.038) were significantly higher in journals with an AI policy. CONCLUSION The majority of imaging journals provide guidelines for AI-generated content, but still, a substantial number of journals do not have AI usage policies or do not require disclosure for non-human-created manuscripts. Journals with an established AI policy had higher citation and impact scores.
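The group comparison behind these figures is straightforward to reproduce in outline. The sketch below uses invented scores, not the study's data; note also that the abstract names a Wilcoxon test, and for two independent, unequally sized groups of journals the rank-sum (Mann-Whitney U) form shown here is the applicable variant.

```python
# Illustrative sketch (not the study's code or data): comparing a scientometric
# score between journals with and without an AI-use policy.
from scipy.stats import mannwhitneyu

# Hypothetical CiteScore values for the two groups of journals.
with_policy = [4.9, 5.3, 6.1, 4.2, 7.0, 5.8]
without_policy = [4.0, 3.6, 4.4, 2.9, 5.1]

stat, p_value = mannwhitneyu(with_policy, without_policy, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")  # small p suggests a group difference
```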
Affiliation(s)
- Onur Simsek
- Division of Neuroradiology, Department of Radiology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA (O.S., A.M., A.V.).
- Amirreza Manteghinejad
- Division of Neuroradiology, Department of Radiology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA (O.S., A.M., A.V.)
- Arastoo Vossough
- Division of Neuroradiology, Department of Radiology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA (O.S., A.M., A.V.); Department of Radiology, Children's Hospital of Philadelphia, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA (A.V.)
74
Rau S, Rau A, Nattenmüller J, Fink A, Bamberg F, Reisert M, Russe MF. A retrieval-augmented chatbot based on GPT-4 provides appropriate differential diagnosis in gastrointestinal radiology: a proof of concept study. Eur Radiol Exp 2024; 8:60. [PMID: 38755410 PMCID: PMC11098977 DOI: 10.1186/s41747-024-00457-x]
Abstract
BACKGROUND We investigated the potential of an imaging-aware GPT-4-based chatbot to provide diagnoses based on imaging descriptions of abdominal pathologies. METHODS Utilizing zero-shot learning via the LlamaIndex framework, GPT-4 was enhanced with the 96 documents from the Radiographics Top 10 Reading List on gastrointestinal imaging, creating a gastrointestinal imaging-aware chatbot (GIA-CB). To assess its diagnostic capability, 50 cases on a variety of abdominal pathologies were created, comprising radiological findings in fluoroscopy, MRI, and CT. We compared the GIA-CB to the generic GPT-4 chatbot (g-CB) in providing the primary and 2 additional differential diagnoses, using interpretations from senior-level radiologists as ground truth. The trustworthiness of the GIA-CB was evaluated by investigating the source documents as provided by the knowledge-retrieval mechanism. The Mann-Whitney U test was employed. RESULTS The GIA-CB identified the most appropriate differential diagnosis in 39/50 cases (78%), significantly surpassing the g-CB, which did so in 27/50 cases (54%) (p = 0.006). Notably, the GIA-CB offered the primary differential among the top 3 differential diagnoses in 45/50 cases (90%) versus the g-CB in 37/50 cases (74%) (p = 0.022), and always with appropriate explanations. The median response time was 29.8 s for the GIA-CB and 15.7 s for the g-CB, and the mean cost per case was $0.15 and $0.02, respectively. CONCLUSIONS The GIA-CB not only provided an accurate diagnosis for gastrointestinal pathologies, but also direct access to source documents, providing insight into the decision-making process, a step towards trustworthy and explainable AI. Integrating context-specific data into AI models can support evidence-based clinical decision-making. RELEVANCE STATEMENT A context-aware GPT-4 chatbot demonstrates high accuracy in providing differential diagnoses based on imaging descriptions, surpassing the generic GPT-4. It provided formulated rationale and source excerpts supporting the diagnoses, thus enhancing trustworthy decision-support. KEY POINTS • Knowledge retrieval enhances differential diagnoses in a gastrointestinal imaging-aware chatbot (GIA-CB). • GIA-CB outperformed the generic counterpart, providing formulated rationale and source excerpts. • GIA-CB has the potential to pave the way for AI-assisted decision support systems.
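For readers unfamiliar with the retrieval-augmented setup described here, the following minimal sketch shows the general pattern with the LlamaIndex framework the authors name. The folder path, query, and retrieval settings are illustrative assumptions, and the import path follows recent llama-index releases; this is not the authors' code.

```python
# A minimal retrieval-augmented sketch in the spirit of the GIA-CB.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Index the reference documents (e.g., a folder of gastrointestinal imaging papers).
documents = SimpleDirectoryReader("gi_reading_list/").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query with an imaging description; retrieved passages ground the answer.
engine = index.as_query_engine(similarity_top_k=4)
response = engine.query(
    "CT shows a target sign of the bowel wall with mesenteric vessel engorgement. "
    "Provide the most likely diagnosis and two differentials."
)
print(response)

# The retrieved source passages make the answer auditable, as the study emphasizes.
for source in response.source_nodes:
    print(source.score, source.node.metadata.get("file_name"))
```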
Affiliation(s)
- Stephan Rau
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg Im Breisgau, Germany.
- Alexander Rau
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg Im Breisgau, Germany
- Department of Neuroradiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, Hugstetter Str. 55, 79106, Freiburg Im Breisgau, Germany
- Johanna Nattenmüller
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg Im Breisgau, Germany
- Anna Fink
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg Im Breisgau, Germany
- Fabian Bamberg
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg Im Breisgau, Germany
- Marco Reisert
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg Im Breisgau, Germany
- Maximilian F Russe
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg Im Breisgau, Germany
75
Bhatia A, Khalvati F, Ertl-Wagner BB. Artificial Intelligence in the Future Landscape of Pediatric Neuroradiology: Opportunities and Challenges. AJNR Am J Neuroradiol 2024; 45:549-553. [PMID: 38176730 PMCID: PMC11288527 DOI: 10.3174/ajnr.a8086]
Abstract
This paper will review how artificial intelligence (AI) will play an increasingly important role in pediatric neuroradiology in the future. A safe, transparent, and human-centric AI is needed to tackle the quadruple aim of improved health outcomes, enhanced patient and family experience, reduced costs, and improved well-being of the healthcare team in pediatric neuroradiology. Equity, diversity and inclusion, data safety, and access to care will need to always be considered. In the next decade, AI algorithms are expected to play an increasingly important role in access to care, workflow management, abnormality detection, classification, response prediction, prognostication, report generation, as well as in the patient and family experience in pediatric neuroradiology. Also, AI algorithms will likely play a role in recognizing and flagging rare diseases and in pattern recognition to identify previously unknown disorders. While AI algorithms will play an important role, humans will not only need to be in the loop, but in the center of pediatric neuroimaging. AI development and deployment will need to be closely watched and monitored by experts in the field. Patient and data safety need to be at the forefront, and the risks of a dependency on technology will need to be contained. The applications and implications of AI in pediatric neuroradiology will differ from adult neuroradiology.
Affiliation(s)
- Aashim Bhatia
- From the Children's Hospital of Philadelphia (A.B.), Philadelphia, Pennsylvania
- Farzad Khalvati
- Hospital for Sick Children (F.K., B.B.E.-W.), Toronto, Ontario, Canada
76
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. [PMID: 38172581 PMCID: PMC11076576 DOI: 10.1038/s41433-023-02915-z]
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges in incorporating LLMs into research without allowing "AI-plagiarism" or publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been generated in the past few years. We discuss recent literature evaluating the role of these language models in medicine with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are being generated rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
- Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
77
Tepe M, Emekli E. Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy. Cureus 2024; 16:e59960. [PMID: 38726360 PMCID: PMC11080394 DOI: 10.7759/cureus.59960]
Abstract
Background Large language models (LLMs), such as ChatGPT-4, Gemini, and Microsoft Copilot, have been instrumental in various domains, including healthcare, where they enhance health literacy and aid in patient decision-making. Given the complexities involved in breast imaging procedures, accurate and comprehensible information is vital for patient engagement and compliance. This study aims to evaluate the readability and accuracy of the information provided by three prominent LLMs, ChatGPT-4, Gemini, and Microsoft Copilot, in response to frequently asked questions in breast imaging, assessing their potential to improve patient understanding and facilitate healthcare communication. Methodology We collected the most common questions on breast imaging from clinical practice and posed them to LLMs. We then evaluated the responses in terms of readability and accuracy. Responses from LLMs were analyzed for readability using the Flesch Reading Ease and Flesch-Kincaid Grade Level tests and for accuracy through a radiologist-developed Likert-type scale. Results The study found significant variations among LLMs. Gemini and Microsoft Copilot scored higher on readability scales (p < 0.001), indicating their responses were easier to understand. In contrast, ChatGPT-4 demonstrated greater accuracy in its responses (p < 0.001). Conclusions While LLMs such as ChatGPT-4 show promise in providing accurate responses, readability issues may limit their utility in patient education. Conversely, Gemini and Microsoft Copilot, despite being less accurate, are more accessible to a broader patient audience. Ongoing adjustments and evaluations of these models are essential to ensure they meet the diverse needs of patients, emphasizing the need for continuous improvement and oversight in the deployment of artificial intelligence technologies in healthcare.
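The two readability measures named in the methodology are simple closed-form scores. The sketch below computes both from word, sentence, and syllable counts; the syllable counter is a crude vowel-group heuristic standing in for whatever validated tooling the study used.

```python
# Sketch of the Flesch Reading Ease and Flesch-Kincaid Grade Level formulas.
import re

def count_syllables(word: str) -> int:
    # Approximate: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # words per sentence
    spw = syllables / len(words)   # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch Reading Ease
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # Flesch-Kincaid Grade Level
    return fre, fkgl

fre, fkgl = readability(
    "A mammogram is an X-ray picture of the breast. It helps find cancer early."
)
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")  # higher FRE = easier; FKGL = US grade
```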
Affiliation(s)
- Murat Tepe
- Radiology, Mediclinic City Hospital, Dubai, ARE
- Emre Emekli
- Radiology, Eskişehir Osmangazi University Health Practice and Research Hospital, Eskişehir, TUR
78
Wessel D, Pogrebnyakov N. Using Social Media as a Source of Real-World Data for Pharmaceutical Drug Development and Regulatory Decision Making. Drug Saf 2024; 47:495-511. [PMID: 38446405 PMCID: PMC11018692 DOI: 10.1007/s40264-024-01409-5]
Abstract
INTRODUCTION While pharmaceutical companies aim to leverage real-world data (RWD) to bridge the gap between clinical drug development and real-world patient outcomes, extant research has mainly focused on the use of social media in a post-approval safety-surveillance setting. Recent regulatory and technological developments indicate that social media may serve as a rich source to expand the evidence base to pre-approval and drug development activities. However, use cases related to drug development have been largely omitted, thereby missing some of the benefits of RWD. In addition, an applied end-to-end understanding of RWD rooted in both industry and regulations is lacking. OBJECTIVE We aimed to investigate how social media can be used as a source of RWD to support regulatory decision making and drug development in the pharmaceutical industry. We aimed to specifically explore the data pipeline and examine how social-media derived RWD can align with regulatory guidance from the US Food and Drug Administration and industry needs. METHODS A machine learning pipeline was developed to extract patient insights related to anticoagulants from X (Twitter) data. These findings were then analysed from an industry perspective, and complemented by interviews with professionals from a pharmaceutical company. RESULTS The analysis reveals several use cases where RWD derived from social media can be beneficial, particularly in generating hypotheses around patient and therapeutic area needs. We also note certain limitations of social media data, particularly around inferring causality. CONCLUSIONS Social media display considerable potential as a source of RWD for guiding efforts in pharmaceutical drug development and pre-approval settings. Although further regulatory guidance on the use of social media for RWD is needed to encourage its use, regulatory and technological developments are suggested to warrant at least exploratory uses for drug development.
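One building block of such a pipeline, classifying posts that mention a drug into patient-experience insights versus irrelevant content, can be sketched as follows. The posts, labels, and model choice are invented for illustration and are not the study's pipeline.

```python
# Hedged sketch of a text-classification stage for social-media RWD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "Started warfarin last month, the weekly blood tests are exhausting",
    "New clinic opening downtown next week",
    "Bruising much more easily since switching anticoagulants",
    "Stock prices for pharma companies rose today",
]
labels = [1, 0, 1, 0]  # 1 = patient-experience insight, 0 = irrelevant

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(posts, labels)
print(model.predict(["Any tips for managing bleeding risk on apixaban?"]))
```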
Affiliation(s)
- Didrik Wessel
- Copenhagen Business School, Frederiksberg, Denmark.
- Nørrebrogade 18A 3TH, 2200, Copenhagen N, Denmark.
79
Dağci M, Çam F, Dost A. Reliability and Quality of the Nursing Care Planning Texts Generated by ChatGPT. Nurse Educ 2024; 49:E109-E114. [PMID: 37994523 DOI: 10.1097/nne.0000000000001566]
Abstract
BACKGROUND The research on ChatGPT-generated nursing care planning texts is critical for enhancing nursing education through innovative and accessible learning methods, improving reliability and quality. PURPOSE The aim of the study was to examine the quality, authenticity, and reliability of nursing care planning texts produced using ChatGPT. METHODS The study sample comprised 40 texts generated by ChatGPT for selected nursing diagnoses included in NANDA 2021-2023. The texts were evaluated using a descriptive criteria form and the DISCERN tool for evaluating health information. RESULTS The DISCERN total average score of the texts was 45.93 ± 4.72. All texts had a moderate level of reliability, and 97.5% of them had a moderate information-quality subscale score. A statistically significant relationship was found between the number of accessible references and both the reliability (r = 0.408) and the quality subscale score (r = 0.379) of the texts (P < .05). CONCLUSION ChatGPT-generated texts exhibited moderate reliability, moderate quality of nursing care information, and moderate overall quality, despite low similarity rates.
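The reported association between accessible references and text reliability is a plain correlation. A toy recomputation with invented values, assuming the abstract's r denotes Pearson's correlation coefficient:

```python
# Illustrative sketch (invented numbers) of the reference-count vs. reliability
# association; pearsonr returns the coefficient and its two-sided p value.
from scipy.stats import pearsonr

references = [0, 1, 2, 2, 3, 4, 5, 6]           # accessible references per text
reliability = [38, 41, 43, 45, 46, 48, 50, 52]  # DISCERN reliability scores

r, p = pearsonr(references, reliability)
print(f"r = {r:.3f}, p = {p:.3f}")
```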
Affiliation(s)
- Mahmut Dağci
- Author Affiliation: Department of Nursing, Bezmialem Vakif University, Faculty of Health Sciences, Istanbul, Turkey
80
Yilmaz Muluk S. Enhancing Musculoskeletal Injection Safety: Evaluating Checklists Generated by Artificial Intelligence and Revising the Preformed Checklist. Cureus 2024; 16:e59708. [PMID: 38841023 PMCID: PMC11150897 DOI: 10.7759/cureus.59708]
Abstract
Background Musculoskeletal disorders are a significant global health issue, necessitating advanced management strategies such as intra-articular and extra-articular injections to alleviate pain, inflammation, and mobility challenges. As the adoption of these interventions by physicians grows, the importance of robust safety protocols becomes paramount. This study evaluates the effectiveness of conversational artificial intelligence (AI), particularly versions 3.5 and 4 of Chat Generative Pre-trained Transformer (ChatGPT), in creating patient safety checklists for managing musculoskeletal injections to enhance the preparation of safety documentation. Methodology A quantitative analysis was conducted to evaluate AI-generated safety checklists against a preformed checklist adapted from reputable medical sources. Adherence of the generated checklists to the preformed checklist was calculated and classified. The Wilcoxon signed-rank test was used to assess the performance differences between ChatGPT versions 3.5 and 4. Results ChatGPT-4 showed superior adherence to the preformed checklist compared to ChatGPT-3.5, with both versions classified as very good in safety protocol creation. Although no significant differences were present in the sign-in and sign-out parts of the checklists of both versions, ChatGPT-4 had significantly higher scores in the procedure planning part (p = 0.007), and its overall performance was also higher (p < 0.001). Subsequently, the preformed checklist was revised to incorporate new contributions from ChatGPT. Conclusions ChatGPT, especially version 4, proved effective in generating patient safety checklists for musculoskeletal injections, highlighting the potential of AI to streamline clinical practices. Further enhancements are necessary to fully meet the medical standards.
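Because both ChatGPT versions were scored on the same checklist items, the comparison is a paired one, which is what the Wilcoxon signed-rank test handles. A sketch with invented per-item scores:

```python
# Paired comparison of two model versions on the same checklist items.
from scipy.stats import wilcoxon

gpt35_scores = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]  # per-item adherence, version 3.5
gpt4_scores  = [4, 5, 3, 5, 4, 5, 4, 3, 5, 4]  # per-item adherence, version 4

stat, p = wilcoxon(gpt35_scores, gpt4_scores)
print(f"W = {stat}, p = {p:.3f}")  # small p favors a systematic version difference
```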
81
Tippareddy C, Faraji N, Awan OA. The Application of ChatGPT to Enhance Medical Education. Acad Radiol 2024; 31:2185-2187. [PMID: 38724132 DOI: 10.1016/j.acra.2023.04.015]
Affiliation(s)
- Charit Tippareddy
- University Hospitals Cleveland Medical Center, Cleveland, Ohio (C.T., N.F.)
- Navid Faraji
- University Hospitals Cleveland Medical Center, Cleveland, Ohio (C.T., N.F.)
- Omer A Awan
- University of Maryland School of Medicine, 655 W Baltimore St, Baltimore, MD 21201 (O.A.A.).
82
Pinto DS, Noronha SM, Saigal G, Quencer RM. Comparison of an AI-Generated Case Report With a Human-Written Case Report: Practical Considerations for AI-Assisted Medical Writing. Cureus 2024; 16:e60461. [PMID: 38883028 PMCID: PMC11179998 DOI: 10.7759/cureus.60461]
Abstract
INTRODUCTION The utility of ChatGPT has recently caused consternation in the medical world. While it has been utilized to write manuscripts, only a few studies have evaluated the quality of manuscripts generated by AI (artificial intelligence). OBJECTIVE We evaluate the ability of ChatGPT to write a case report when provided with a framework. We also provide practical considerations for manuscript writing using AI. METHODS We compared a manuscript written by a blinded human author (10 years of medical experience) with a manuscript written by ChatGPT on a rare presentation of a common disease. We used multiple iterations of the manuscript generation request to derive the best ChatGPT output. Participants, outcomes, and measures: 22 human reviewers compared the manuscripts using parameters that characterize human writing and relevant standard manuscript assessment criteria, viz., scholarly impact quotient (SIQ). We also compared the manuscripts using the "average perplexity score" (APS), "burstiness score" (BS), and "highest perplexity of a sentence" (GPTZero parameters to detect AI-generated content). RESULTS The human manuscript had a significantly higher quality of presentation and nuanced writing (p<0.05). Both manuscripts had a logical flow. 12/22 reviewers were able to identify the AI-generated manuscript (p<0.05), but 4/22 reviewers wrongly identified the human-written manuscript as AI-generated. GPTZero software erroneously identified four sentences of the human-written manuscript to be AI-generated. CONCLUSION Though AI showed an ability to highlight the novelty of the case report and project a logical flow comparable to the human manuscript, it could not outperform the human writer on all parameters. The human manuscript showed a better quality of presentation and more nuanced writing. The practical considerations we provide for AI-assisted medical writing will help to better utilize AI in manuscript writing.
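The GPTZero parameters mentioned here reduce to simple statistics over model-assigned token probabilities. The sketch below stubs the log-probabilities to show only the arithmetic: perplexity is the exponential of the negative mean token log-probability, and burstiness can be summarized as the spread of sentence-level perplexities.

```python
# Rough sketch of perplexity-style AI-detection signals; log-probs are stubbed.
import math
from statistics import mean, stdev

def perplexity(token_logprobs: list[float]) -> float:
    # PPL = exp(-average log-probability per token)
    return math.exp(-mean(token_logprobs))

# Hypothetical per-token log-probs for three sentences of a manuscript.
sentence_logprobs = [
    [-1.2, -0.8, -1.5, -0.9],
    [-2.9, -3.4, -1.7, -2.2, -2.8],
    [-1.0, -1.1, -0.9],
]
ppls = [perplexity(lp) for lp in sentence_logprobs]
print("average perplexity:", round(mean(ppls), 2))
print("highest sentence perplexity:", round(max(ppls), 2))
print("burstiness (spread of PPLs):", round(stdev(ppls), 2))
```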
Affiliation(s)
- Gaurav Saigal
- Radiology, University of Miami Miller School of Medicine, Miami, USA
- Robert M Quencer
- Radiology, University of Miami Miller School of Medicine, Miami, USA
83
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement From the ACR, CAR, ESR, RANZCR & RSNA. Can Assoc Radiol J 2024; 75:226-244. [PMID: 38251882 DOI: 10.1177/08465371231222229]
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- Data Science Institute, American College of Radiology, Reston, VA, USA
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, QC, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- American College of Radiology, Reston, VA, USA
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, SA, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
84
Simms RC. Work With ChatGPT, Not Against: 3 Teaching Strategies That Harness the Power of Artificial Intelligence. Nurse Educ 2024; 49:158-161. [PMID: 38502607 DOI: 10.1097/nne.0000000000001634]
Abstract
BACKGROUND Technological advances have expanded nursing education to include generative artificial intelligence (AI) tools such as ChatGPT. PROBLEM Generative AI tools challenge academic integrity, complicate the validation of information accuracy, and require strategies to ensure the credibility of AI-generated information. APPROACH This article presents a dual-purpose approach integrating AI tools into prelicensure nursing education to enhance learning while promoting critical evaluation skills. Constructivist theories and Vygotsky's Zone of Proximal Development framework support this integration, with AI as a scaffold for developing critical thinking. OUTCOMES The approach involves practical activities for students to engage critically with AI-generated content, thereby reinforcing clinical judgment and preparing them for AI-prevalent health care environments. CONCLUSIONS Incorporating AI tools such as ChatGPT into nursing curricula represents a strategic educational advancement, equipping students with essential skills to navigate modern health care.
Affiliation(s)
- Rachel Cox Simms
- Author Affiliation: Assistant Professor, School of Nursing, MGH Institute of Health Professions, Boston, Massachusetts
85
Schlussel L, Samaan JS, Chan Y, Chang B, Yeo YH, Ng WH, Rezaie A. Evaluating the accuracy and reproducibility of ChatGPT-4 in answering patient questions related to small intestinal bacterial overgrowth. Artif Intell Gastroenterol 2024; 5:90503. [DOI: 10.35712/aig.v5.i1.90503]
Abstract
BACKGROUND Small intestinal bacterial overgrowth (SIBO) poses diagnostic and treatment challenges due to its complex management and evolving guidelines. Patients often seek online information related to their health, prompting interest in large language models, like GPT-4, as potential sources of patient education.
AIM To investigate ChatGPT-4's accuracy and reproducibility in responding to patient questions related to SIBO.
METHODS A total of 27 patient questions related to SIBO were curated from professional societies, Facebook groups, and Reddit threads. Each question was entered into GPT-4 twice, on separate days, to examine the reproducibility of accuracy across occasions. GPT-4-generated responses were independently evaluated for accuracy and reproducibility by two motility fellowship-trained gastroenterologists; a third senior fellowship-trained gastroenterologist resolved disagreements. Accuracy of responses was graded using the scale: (1) Comprehensive; (2) Correct but inadequate; (3) Some correct and some incorrect; or (4) Completely incorrect.
RESULTS In evaluating GPT-4's effectiveness at answering SIBO-related questions, it provided responses with correct information to 18/27 (66.7%) of questions, with 16/27 (59.3%) of responses graded as comprehensive and 2/27 (7.4%) responses graded as correct but inadequate. The model provided responses with incorrect information to 9/27 (33.3%) of questions, with 4/27 (14.8%) of responses graded as completely incorrect and 5/27 (18.5%) of responses graded as mixed correct and incorrect data. Accuracy varied by question category, with questions related to “basic knowledge” achieving the highest proportion of comprehensive responses (90%) and no incorrect responses. On the other hand, the “treatment” related questions yielded the lowest proportion of comprehensive responses (33.3%) and highest percent of completely incorrect responses (33.3%). A total of 77.8% of questions yielded reproducible responses.
CONCLUSION Though GPT-4 shows promise as a supplementary tool for SIBO-related patient education, the model requires further refinement and validation in subsequent iterations prior to its integration into patient care.
Affiliation(s)
- Lauren Schlussel
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Jamil S Samaan
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Yin Chan
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Bianca Chang
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Yee Hui Yeo
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Wee Han Ng
- Bristol Medical School, University of Bristol, BS8 1TH, Bristol, United Kingdom
- Ali Rezaie
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Medically Associated Science and Technology Program, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
86
Raman R, Lathabai HH, Mandal S, Das P, Kaur T, Nedungadi P. ChatGPT: Literate or intelligent about UN sustainable development goals? PLoS One 2024; 19:e0297521. [PMID: 38656952 PMCID: PMC11042716 DOI: 10.1371/journal.pone.0297521]
Abstract
Generative AI tools, such as ChatGPT, are progressively transforming numerous sectors, demonstrating a capacity to impact human life dramatically. This research evaluates the UN Sustainable Development Goals (SDG) literacy of ChatGPT, which is crucial for the diverse stakeholders involved in SDG-related policies. Experimental outcomes from two widely used sustainability assessment tests, the UN SDG Fitness Test and the Sustainability Literacy Test (SULITEST), suggest that ChatGPT exhibits high SDG literacy, yet its comprehensive SDG intelligence needs further exploration. The Fitness Test gauges eight vital competencies across introductory, intermediate, and advanced levels; accurate mapping of these to the test questions is essential for even a partial evaluation of SDG intelligence. To assess SDG intelligence, the questions from both tests were mapped to the 17 SDGs and eight cross-cutting SDG core competencies, but both test questionnaires were found to be insufficient. SULITEST could satisfactorily map only 5 of 8 competencies, whereas the Fitness Test managed 6 of 8. Both tests also fell short in their coverage of the 17 SDGs: most SDGs were underrepresented in both instruments, and certain SDGs were not represented at all. Consequently, both tools proved ineffective for assessing SDG intelligence through SDG coverage. The study recommends that future versions of ChatGPT enhance competencies such as collaboration, critical thinking, and systems thinking to help achieve the SDGs. It concludes that while AI models like ChatGPT hold considerable potential in sustainable development, their usage must be approached carefully, considering current limitations and ethical implications.
Affiliation(s)
- Raghu Raman
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India
- Santanu Mandal
- Amrita School of Business, Amaravati, Andhra Pradesh, India
- Payel Das
- Amrita School of Business, Amaravati, Andhra Pradesh, India
- Tavleen Kaur
- Fortune Institute of International Business, New Delhi, India
87
Wu C, Chen L, Han M, Li Z, Yang N, Yu C. Application of ChatGPT-based blended medical teaching in clinical education of hepatobiliary surgery. Med Teach 2024:1-5. [PMID: 38614458 DOI: 10.1080/0142159x.2024.2339412]
Abstract
OBJECTIVE This study evaluates the effectiveness of incorporating the Chat Generative Pre-trained Transformer (ChatGPT) into the clinical teaching of hepatobiliary surgery for undergraduate medical students. MATERIALS AND METHODS A group of 61 medical undergraduates from the Affiliated Hospital of Guizhou Medical University, undergoing hepatobiliary surgery training, were randomly assigned to either an experimental group (31 students) using ChatGPT-based blended teaching or a control group (30 students) with traditional teaching methods. The evaluation metrics included final exam scores, teaching satisfaction, and teaching effectiveness ratings, analyzed using SPSS 26.0 (SPSS Inc., Chicago, IL) with t-tests and χ2 tests. RESULTS The experimental group significantly outperformed the control group in final exam theoretical scores (86.44 ± 5.59 vs. 77.86 ± 4.16, p < .001) and clinical skills scores (83.84 ± 6.13 vs. 79.12 ± 4.27, p = .001). Additionally, the experimental group reported higher teaching satisfaction (17.23 ± 1.33) and self-evaluation of teaching effectiveness (9.14 ± 0.54) compared to the control group (15.38 ± 1.5 and 8.46 ± 0.70, respectively, p < .001). CONCLUSIONS The integration of ChatGPT into hepatobiliary surgery education significantly enhances theoretical knowledge, clinical skills, and overall satisfaction among medical undergraduates, suggesting a beneficial impact on their educational development.
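The two analyses the abstract names, a between-group t test for scores and a χ2 test for categorical outcomes, can be outlined as follows. The data are simulated around the reported means and the satisfaction counts are invented, so this only mirrors the study's approach, not its records.

```python
# Sketch of the study's statistical approach on simulated data.
import numpy as np
from scipy.stats import ttest_ind, chi2_contingency

rng = np.random.default_rng(0)
experimental = rng.normal(86.44, 5.59, 31)  # ChatGPT-based blended teaching
control = rng.normal(77.86, 4.16, 30)       # traditional teaching

t, p = ttest_ind(experimental, control)
print(f"t = {t:.2f}, p = {p:.4f}")

# Hypothetical satisfied/unsatisfied counts per group for the chi-squared test.
table = np.array([[28, 3], [21, 9]])
chi2, p2, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p2:.3f}")
```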
Affiliation(s)
- Changhao Wu
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Liwen Chen
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Min Han
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Zhu Li
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Nenghong Yang
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Chao Yu
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
88
Wu Y, Zheng Y, Feng B, Yang Y, Kang K, Zhao A. Embracing ChatGPT for Medical Education: Exploring Its Impact on Doctors and Medical Students. JMIR Med Educ 2024; 10:e52483. [PMID: 38598263 PMCID: PMC11043925 DOI: 10.2196/52483]
Abstract
ChatGPT (OpenAI), a cutting-edge natural language processing model, holds immense promise for revolutionizing medical education. With its remarkable performance in language-related tasks, ChatGPT offers personalized and efficient learning experiences for medical students and doctors. Through training, it enhances clinical reasoning and decision-making skills, leading to improved case analysis and diagnosis. The model facilitates simulated dialogues, intelligent tutoring, and automated question-answering, enabling the practical application of medical knowledge. However, integrating ChatGPT into medical education raises ethical and legal concerns. Safeguarding patient data and adhering to data protection regulations are critical. Transparent communication with students, physicians, and patients is essential to ensure their understanding of the technology's purpose and implications, as well as the potential risks and benefits. Maintaining a balance between personalized learning and face-to-face interactions is crucial to avoid hindering critical thinking and communication skills. Despite challenges, ChatGPT offers transformative opportunities. Integrating it with problem-based learning, team-based learning, and case-based learning methodologies can further enhance medical education. With proper regulation and supervision, ChatGPT can contribute to a well-rounded learning environment, nurturing skilled and knowledgeable medical professionals ready to tackle health care challenges. By emphasizing ethical considerations and human-centric approaches, ChatGPT's potential can be fully harnessed in medical education, benefiting both students and patients alike.
Affiliation(s)
- Yijun Wu
- Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
- Yue Zheng
- Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
- Baijie Feng
- West China School of Medicine, Sichuan University, Chengdu, China
- Yuqi Yang
- West China School of Medicine, Sichuan University, Chengdu, China
- Kai Kang
- Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
- Ailin Zhao
- Department of Hematology, West China Hospital, Sichuan University, Chengdu, China
89
Gande S, Gould M, Ganti L. Bibliometric analysis of ChatGPT in medicine. Int J Emerg Med 2024; 17:50. [PMID: 38575866 PMCID: PMC10993428 DOI: 10.1186/s12245-024-00624-2]
Abstract
INTRODUCTION The emergence of artificial intelligence (AI) chat programs has opened two distinct paths, one enhancing interaction and another potentially replacing personal understanding. Ethical and legal concerns arise due to the rapid development of these programs. This paper investigates academic discussions on AI in medicine, analyzing the context, frequency, and reasons behind these conversations. METHODS The study collected data from the Web of Science database on articles containing the keyword "ChatGPT" published from January to September 2023, resulting in 786 medically related journal articles. The inclusion criteria were peer-reviewed articles in English related to medicine. RESULTS The United States led in publications (38.1%), followed by India (15.5%) and China (7.0%). Keywords such as "patient" (16.7%), "research" (12%), and "performance" (10.6%) were prevalent. The Cureus Journal of Medical Science (11.8%) had the most publications, followed by the Annals of Biomedical Engineering (8.3%). August 2023 had the highest number of publications (29.3%), with significant growth between February to March and April to May. Medical General Internal (21.0%) was the most common category, followed by Surgery (15.4%) and Radiology (7.9%). DISCUSSION The prominence of India in ChatGPT research, despite lower research funding, indicates the platform's popularity and highlights the importance of monitoring its use for potential medical misinformation. China's interest in ChatGPT research suggests a focus on Natural Language Processing (NLP) AI applications, despite public bans on the platform. Cureus' success in publishing ChatGPT articles can be attributed to its open-access, rapid publication model. The study identifies research trends in plastic surgery, radiology, and obstetric gynecology, emphasizing the need for ethical considerations and reliability assessments in the application of ChatGPT in medical practice. CONCLUSION ChatGPT's presence in medical literature is growing rapidly across various specialties, but concerns related to safety, privacy, and accuracy persist. More research is needed to assess its suitability for patient care and implications for non-medical use. Skepticism and thorough review of research are essential, as current studies may face retraction as more information emerges.
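The percentages in such a bibliometric analysis come from straightforward tallying of exported records. A minimal sketch follows, with the record structure assumed rather than taken from the Web of Science export format.

```python
# Minimal sketch of bibliometric tallying by country and keyword.
from collections import Counter

records = [
    {"country": "USA", "keywords": ["patient", "performance"]},
    {"country": "India", "keywords": ["research"]},
    {"country": "USA", "keywords": ["patient"]},
    {"country": "China", "keywords": ["research", "patient"]},
]

by_country = Counter(r["country"] for r in records)
by_keyword = Counter(k for r in records for k in r["keywords"])

total = len(records)
for country, n in by_country.most_common():
    print(f"{country}: {n} ({100 * n / total:.1f}%)")
print(by_keyword.most_common(3))
```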
Affiliation(s)
- Latha Ganti
- University of Central Florida, Orlando, FL, USA.
- Warren Alpert Medical School of Brown University, Providence, RI, USA.
90
Alanezi F. Examining the role of ChatGPT in promoting health behaviors and lifestyle changes among cancer patients. Nutr Health 2024:2601060241244563. [PMID: 38567408 DOI: 10.1177/02601060241244563]
Abstract
Purpose: This study aims to investigate the role of ChatGPT in promoting health behavior changes among cancer patients. Methods: A quasi-experimental design with a qualitative approach was adopted, as ChatGPT is a novel technology of which many people are unaware. The participants were outpatients at a public hospital. In the experiment, participants used ChatGPT to seek cancer-related information for two weeks, followed by focus group (FG) discussions. A total of 72 outpatients participated in ten focus groups. Results: Three main themes with 14 sub-themes were identified, reflecting the role of ChatGPT in promoting health behavior changes. Its most prominent role was in developing health literacy and promoting self-management of conditions through emotional, informational, and motivational support. Three challenges were identified: privacy, lack of personalization, and reliability issues. Conclusion: Although ChatGPT has huge potential for promoting health behavior changes among cancer patients, its usefulness is limited by several factors, such as regulatory, reliability, and privacy issues. Further evidence is needed to generalize the results across regions.
Affiliation(s)
- Fahad Alanezi
- College of Business Administration, Department of Management Information Systems, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
91
van Diessen E, van Amerongen RA, Zijlmans M, Otte WM. Potential merits and flaws of large language models in epilepsy care: A critical review. Epilepsia 2024; 65:873-886. [PMID: 38305763 DOI: 10.1111/epi.17907]
Abstract
The current pace of development and application of large language models (LLMs) is unprecedented and will significantly impact future medical care. In this critical review, we provide the background needed to understand these novel artificial intelligence (AI) models and how LLMs may be used in the daily care of people with epilepsy. Considering the importance of clinical history taking in diagnosing and monitoring epilepsy, combined with the established use of electronic health records, great potential exists to integrate LLMs in epilepsy care. We present the currently available LLM studies in epilepsy. Furthermore, we highlight and compare the most commonly used LLMs and elaborate on how these models can be applied in epilepsy. We further discuss important drawbacks and risks of LLMs, and we provide recommendations for overcoming these limitations.
Affiliation(s)
- Eric van Diessen
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Department of Pediatrics, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands
- Ramon A van Amerongen
- Faculty of Science, Bioinformatics and Biocomplexity, Utrecht University, Utrecht, The Netherlands
- Maeike Zijlmans
- Department of Neurology and Neurosurgery, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Stichting Epilepsie Instellingen Nederland, Heemstede, The Netherlands
- Willem M Otte
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
92
Gertz RJ, Dratsch T, Bunck AC, Lennartz S, Iuga AI, Hellmich MG, Persigehl T, Pennig L, Gietzen CH, Fervers P, Maintz D, Hahnfeldt R, Kottlors J. Potential of GPT-4 for Detecting Errors in Radiology Reports: Implications for Reporting Accuracy. Radiology 2024; 311:e232714. [PMID: 38625012 DOI: 10.1148/radiol.232714]
Abstract
Background Errors in radiology reports may occur because of resident-to-attending discrepancies, speech recognition inaccuracies, and large workload. Large language models, such as GPT-4 (ChatGPT; OpenAI), may assist in generating reports. Purpose To assess effectiveness of GPT-4 in identifying common errors in radiology reports, focusing on performance, time, and cost-efficiency. Materials and Methods In this retrospective study, 200 radiology reports (radiography and cross-sectional imaging [CT and MRI]) were compiled between June 2023 and December 2023 at one institution. A total of 150 errors from five common error categories (omission, insertion, spelling, side confusion, and other) were intentionally inserted into 100 of the reports and used as the reference standard. Six radiologists (two senior radiologists, two attending physicians, and two residents) and GPT-4 were tasked with detecting these errors. Overall error detection performance, error detection in the five error categories, and reading time were assessed using Wald χ2 tests and paired-sample t tests. Results GPT-4 (detection rate, 82.7%; 124 of 150; 95% CI: 75.8, 87.9) matched the average detection performance of radiologists independent of their experience (senior radiologists, 89.3% [134 of 150; 95% CI: 83.4, 93.3]; attending physicians, 80.0% [120 of 150; 95% CI: 72.9, 85.6]; residents, 80.0% [120 of 150; 95% CI: 72.9, 85.6]; P value range, .522-.99). One senior radiologist outperformed GPT-4 (detection rate, 94.7%; 142 of 150; 95% CI: 89.8, 97.3; P = .006). GPT-4 required less processing time per radiology report than the fastest human reader in the study (mean reading time, 3.5 seconds ± 0.5 [SD] vs 25.1 seconds ± 20.1, respectively; P < .001; Cohen d = -1.08). The use of GPT-4 resulted in lower mean correction cost per report than the most cost-efficient radiologist ($0.03 ± 0.01 vs $0.42 ± 0.41; P < .001; Cohen d = -1.12). Conclusion The radiology report error detection rate of GPT-4 was comparable with that of radiologists, potentially reducing work hours and cost. © RSNA, 2024 See also the editorial by Forman in this issue.
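The paired effect sizes reported here (Cohen d) are the mean paired difference divided by the standard deviation of the differences. A sketch with invented reading times, not the study's measurements:

```python
# Paired effect size and paired t test for per-report reading times.
import numpy as np
from scipy.stats import ttest_rel

gpt4_seconds = np.array([3.1, 3.6, 3.4, 4.0, 3.2, 3.8])
reader_seconds = np.array([22.0, 31.5, 18.9, 40.2, 25.7, 28.3])

diff = gpt4_seconds - reader_seconds
d = diff.mean() / diff.std(ddof=1)  # Cohen's d for paired data
t, p = ttest_rel(gpt4_seconds, reader_seconds)
print(f"Cohen's d = {d:.2f}, t = {t:.2f}, p = {p:.4f}")
```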
Affiliation(s)
- Roman Johannes Gertz, Thomas Dratsch, Alexander Christian Bunck, Simon Lennartz, Andra-Iza Iuga, Thorsten Persigehl, Lenhard Pennig, Carsten Herbert Gietzen, Philipp Fervers, David Maintz, Robert Hahnfeldt, Jonathan Kottlors: Institute of Diagnostic and Interventional Radiology, Faculty of Medicine, University Hospital Cologne, University of Cologne, Kerpener Strasse 62, 50937 Cologne, Germany
- Martin Gunnar Hellmich: Institute of Medical Statistics and Bioinformatics, Faculty of Medicine, University Hospital Cologne, University of Cologne, Kerpener Strasse 62, 50937 Cologne, Germany

93
Mira FA, Favier V, Dos Santos Sobreira Nunes H, de Castro JV, Carsuzaa F, Meccariello G, Vicini C, De Vito A, Lechien JR, Chiesa-Estomba C, Maniaci A, Iannella G, Rojas EP, Cornejo JB, Cammaroto G. Chat GPT for the management of obstructive sleep apnea: do we have a polar star? Eur Arch Otorhinolaryngol 2024; 281:2087-2093. [PMID: 37980605 DOI: 10.1007/s00405-023-08270-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2023] [Accepted: 09/29/2023] [Indexed: 11/21/2023]
Abstract
PURPOSE This study explores the potential of the Chat-Generative Pre-Trained Transformer (Chat-GPT), a large language model (LLM), to assist healthcare professionals in the diagnosis of obstructive sleep apnea (OSA). It aims to assess the agreement between Chat-GPT's responses and those of expert otolaryngologists, shedding light on the role of AI-generated content in medical decision-making. METHODS A prospective, cross-sectional study was conducted, involving 350 otolaryngologists from 25 countries who responded to a specialized OSA survey. Chat-GPT was tasked with answering the same survey questions. Responses were assessed by super-experts and statistically analyzed for agreement. RESULTS Chat-GPT and the experts shared a common answer in over 75% of cases for individual questions, but overall consensus was achieved on only four questions. Super-expert assessments showed a moderate level of agreement, with Chat-GPT scoring slightly lower than the experts. Statistically, Chat-GPT's responses differed significantly from the experts' opinions (p = 0.0009). Sub-analysis revealed areas for improvement for Chat-GPT, particularly on questions where super-experts rated its responses lower than the expert consensus. CONCLUSIONS Chat-GPT demonstrates potential as a valuable resource for OSA diagnosis, especially where access to specialists is limited. The study emphasizes the importance of AI-human collaboration, with Chat-GPT serving as a complementary tool rather than a replacement for medical professionals. This research contributes to the discourse in otolaryngology and encourages further exploration of AI-driven healthcare applications. While Chat-GPT exhibits a commendable level of consensus with expert responses, ongoing refinement of AI-based healthcare tools holds significant promise for the future of medicine, addressing the underdiagnosis and undertreatment of OSA and improving patient outcomes.
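The per-question agreement described above amounts to comparing each Chat-GPT answer against the experts' most common (modal) answer. A minimal sketch of that comparison, using invented question labels and answer options rather than the study's actual survey items:

```python
from collections import Counter

# Hypothetical survey data: per question, the experts' chosen options and
# Chat-GPT's option. The study's real items and answers are not reproduced here.
expert_answers = {
    "Q1": ["CPAP", "CPAP", "surgery", "CPAP"],
    "Q2": ["DISE", "PSG", "PSG", "PSG"],
}
chatgpt_answers = {"Q1": "CPAP", "Q2": "DISE"}

shared = 0
for q, answers in expert_answers.items():
    modal_answer, _ = Counter(answers).most_common(1)[0]  # experts' modal choice
    if chatgpt_answers[q] == modal_answer:
        shared += 1
print(f"Chat-GPT matched the expert modal answer on {shared}/{len(expert_answers)} questions")
```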
Affiliation(s)
- Felipe Ahumada Mira: ENT Department, Hospital of Linares, Linares, Chile; Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Valentin Favier: ENT Department, University Hospital of Montpellier, Montpellier, France; YO-IFOS, Paris, France
- Heloisa Dos Santos Sobreira Nunes: ENT and Sleep Medicine Department, Nucleus of Otolaryngology, Head and Neck Surgery and Sleep Medicine of São Paulo, São Paulo, Brazil; YO-IFOS, Paris, France
- Joana Vaz de Castro: ENT Department, Armed Forces Hospital, Lisbon, Portugal; YO-IFOS, Paris, France
- Florent Carsuzaa: ENT Department, University Hospital of Poitiers, Poitiers, France; YO-IFOS, Paris, France
- Giuseppe Meccariello, Claudio Vicini, Andrea De Vito: Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy
- Jerome R Lechien: Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology and Head and Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons, Mons, Belgium; YO-IFOS, Paris, France
- Carlos Chiesa-Estomba: Department of Otorhinolaryngology, Biodonostia Research Institute, Donostia University Hospital, Osakidetza, 20014, San Sebastian, Spain; YO-IFOS, Paris, France
- Antonino Maniaci: Department of Medical and Surgical Sciences and Advanced Technologies "GF Ingrassia", ENT Section, University of Catania, Piazza Università 2, 95100, Catania, Italy; YO-IFOS, Paris, France
- Giannicola Iannella: Department of 'Organi di Senso', University "Sapienza", Viale Dell'Università 33, 00185, Rome, Italy; YO-IFOS, Paris, France
- Giovanni Cammaroto: Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy; YO-IFOS, Paris, France

94
Ni Z, Peng R, Zheng X, Xie P. Embracing the future: Integrating ChatGPT into China's nursing education system. Int J Nurs Sci 2024; 11:295-299. [PMID: 38707690 PMCID: PMC11064564 DOI: 10.1016/j.ijnss.2024.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Revised: 02/13/2024] [Accepted: 03/06/2024] [Indexed: 05/07/2024] Open
Abstract
This article delves into the role of ChatGPT within the rapidly evolving field of artificial intelligence, highlighting its significant potential in nursing education. The paper first presents the notable advances ChatGPT has achieved in facilitating interactive learning and providing real-time feedback, along with the academic community's growing interest in this technology. It then summarizes research on ChatGPT's applications in nursing education across various clinical disciplines and scenarios, showcasing its potential for multidisciplinary education and for addressing clinical problems. Comparing the performance of several large language models (LLMs) on China's National Nursing Licensure Examination, we observed that ChatGPT demonstrated a higher accuracy rate than its counterparts, providing a solid theoretical foundation for its application in Chinese nursing education and clinical settings. Educational institutions should establish a targeted and effective regulatory framework to leverage ChatGPT in localized nursing education while assuming the corresponding responsibilities. Through standardized training for users and adjustments to existing educational assessment methods aimed at preventing misuse and abuse, the full potential of ChatGPT as an innovative auxiliary tool in China's nursing education system can be realized, in line with the developmental needs of modern teaching methodologies.
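Comparing LLM accuracy rates on a fixed set of exam questions is a two-proportion problem. The abstract reports only that ChatGPT outscored its counterparts and gives no item counts, so the numbers below are placeholders; a sketch of how such a comparison could be tested:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts -- the abstract does not give item-level numbers, so
# these are placeholders for illustration only.
correct = [412, 376]   # ChatGPT, comparator LLM
answered = [500, 500]  # questions attempted by each model

stat, p = proportions_ztest(correct, answered)
print(f"accuracy {correct[0]/answered[0]:.1%} vs {correct[1]/answered[1]:.1%}, "
      f"z = {stat:.2f}, p = {p:.4f}")
```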
Affiliation(s)
- Zhengxin Ni: School of Nursing, Yangzhou University, Yangzhou, China
- Rui Peng: Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Xiaofei Zheng: Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Ping Xie: Department of External Cooperation, Northern Jiangsu People’s Hospital, Nanjing, China

95
Bajaj S, Gandhi D, Nayar D. Potential Applications and Impact of ChatGPT in Radiology. Acad Radiol 2024; 31:1256-1261. [PMID: 37802673 DOI: 10.1016/j.acra.2023.08.039] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Revised: 08/15/2023] [Accepted: 08/28/2023] [Indexed: 10/08/2023]
Abstract
Radiology has always gone hand in hand with technology, and artificial intelligence (AI) is not new to the field. Various AI devices and algorithms have already been integrated into daily clinical radiology practice, with applications ranging from scheduling patient appointments to detecting and diagnosing certain clinical conditions on imaging, and the use of natural language processing and large language model based software has long been under discussion. Algorithms like ChatGPT can help improve patient outcomes, increase the efficiency of radiology interpretation, and aid the overall workflow of radiologists; here we discuss some of their potential applications.
Affiliation(s)
- Suryansh Bajaj: Department of Radiology, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205 (S.B.)
- Darshan Gandhi: Department of Diagnostic Radiology, University of Tennessee Health Science Center, Memphis, Tennessee 38103 (D.G.)
- Divya Nayar: Department of Neurology, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205 (D.N.)

96
Cozzi A, Pinker K, Hidber A, Zhang T, Bonomo L, Lo Gullo R, Christianson B, Curti M, Rizzo S, Del Grande F, Mann RM, Schiaffino S, Panzer A. BI-RADS Category Assignments by GPT-3.5, GPT-4, and Google Bard: A Multilanguage Study. Radiology 2024; 311:e232133. [PMID: 38687216 PMCID: PMC11070611 DOI: 10.1148/radiol.232133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 03/08/2024] [Accepted: 03/12/2024] [Indexed: 05/02/2024]
Abstract
Background The performance of publicly available large language models (LLMs) remains unclear for complex clinical tasks. Purpose To evaluate the agreement between human readers and LLMs for Breast Imaging Reporting and Data System (BI-RADS) categories assigned based on breast imaging reports written in three languages and to assess the impact of discordant category assignments on clinical management. Materials and Methods This retrospective study included reports for women who underwent MRI, mammography, and/or US for breast cancer screening or diagnostic purposes at three referral centers. Reports with findings categorized as BI-RADS 1-5 and written in Italian, English, or Dutch were collected between January 2000 and October 2023. Board-certified breast radiologists and the LLMs GPT-3.5 and GPT-4 (OpenAI) and Bard, now called Gemini (Google), assigned BI-RADS categories using only the findings described by the original radiologists. Agreement between human readers and LLMs for BI-RADS categories was assessed using the Gwet agreement coefficient (AC1 value). Frequencies were calculated for changes in BI-RADS category assignments that would affect clinical management (ie, BI-RADS 0 vs BI-RADS 1 or 2 vs BI-RADS 3 vs BI-RADS 4 or 5) and compared using the McNemar test. Results Across 2400 reports, agreement between the original and reviewing radiologists was almost perfect (AC1 = 0.91), while agreement between the original radiologists and GPT-4, GPT-3.5, and Bard was moderate (AC1 = 0.52, 0.48, and 0.42, respectively). Across human readers and LLMs, differences were observed in the frequency of BI-RADS category upgrades or downgrades that would result in changed clinical management (118 of 2400 [4.9%] for human readers, 611 of 2400 [25.5%] for Bard, 573 of 2400 [23.9%] for GPT-3.5, and 435 of 2400 [18.1%] for GPT-4; P < .001) and that would negatively impact clinical management (37 of 2400 [1.5%] for human readers, 435 of 2400 [18.1%] for Bard, 344 of 2400 [14.3%] for GPT-3.5, and 255 of 2400 [10.6%] for GPT-4; P < .001). Conclusion LLMs achieved moderate agreement with human reader-assigned BI-RADS categories across reports written in three languages but also yielded a high percentage of discordant BI-RADS categories that would negatively impact clinical management. © RSNA, 2024 Supplemental material is available for this article.
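Gwet's agreement coefficient (AC1), used above, corrects observed agreement for chance using the average marginal category proportions. A minimal two-rater sketch with invented BI-RADS assignments (the study's report-level data are not public):

```python
import numpy as np

def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 agreement coefficient for two raters."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    n, q = len(a), len(categories)
    pa = np.mean(a == b)  # observed agreement
    # average marginal proportion per category across both raters
    pi = np.array([((a == k).sum() + (b == k).sum()) / (2 * n) for k in categories])
    pe = (pi * (1 - pi)).sum() / (q - 1)  # chance-agreement probability
    return (pa - pe) / (1 - pe)

# Hypothetical BI-RADS categories (1-5) assigned to 10 reports by the original
# radiologist and by an LLM; invented for illustration only.
original = [1, 2, 2, 3, 4, 5, 4, 3, 2, 1]
llm      = [1, 2, 3, 3, 4, 5, 3, 3, 2, 2]
print(f"AC1 = {gwet_ac1(original, llm, categories=range(1, 6)):.2f}")  # ~0.63
```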
Affiliation(s)
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)

97
Teixeira-Marques F, Medeiros N, Nazaré F, Alves S, Lima N, Ribeiro L, Gama R, Oliveira P. Exploring the role of ChatGPT in clinical decision-making in otorhinolaryngology: a ChatGPT designed study. Eur Arch Otorhinolaryngol 2024; 281:2023-2030. [PMID: 38345613 DOI: 10.1007/s00405-024-08498-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2023] [Accepted: 01/23/2024] [Indexed: 03/16/2024]
Abstract
PURPOSE Since the beginning of 2023, ChatGPT has emerged as a hot topic in healthcare research. Its potential as a valuable tool in clinical practice is compelling, particularly for improving clinical decision support by helping physicians make decisions based on the best available medical knowledge. We aim to investigate ChatGPT's ability to identify, diagnose, and manage patients with otorhinolaryngology-related symptoms. METHODS A prospective, cross-sectional study was designed, based on an idea suggested by ChatGPT, to assess the level of agreement between ChatGPT and five otorhinolaryngologists (ENTs) in 20 reality-inspired clinical cases. The clinical cases were presented to the chatbot on two different occasions (ChatGPT-1 and ChatGPT-2) to assess its temporal stability. RESULTS The mean score of ChatGPT-1 was 4.4 (SD 1.2; min 1, max 5) and of ChatGPT-2 was 4.15 (SD 1.3; min 1, max 5), while the ENTs' mean score was 4.91 (SD 0.3; min 3, max 5). The Mann-Whitney U test revealed a statistically significant difference (p < 0.001) between both ChatGPT rounds and the ENTs' scores. ChatGPT-1 and ChatGPT-2 gave different answers on five occasions. CONCLUSIONS Artificial intelligence will be an important instrument in clinical decision-making in the near future, and ChatGPT is the most promising chatbot so far. Although it needs further development to be used safely, there is room for improvement and potential to aid otorhinolaryngology residents and specialists in making the best decision for the patient.
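The score comparison above is a Mann-Whitney U test on ordinal Likert ratings. A minimal sketch with placeholder scores (the study's raw ratings are not reproduced here):

```python
from scipy.stats import mannwhitneyu

# Hypothetical 1-5 ratings for the 20 clinical cases -- placeholders for
# illustration, not the paper's data.
chatgpt_scores = [5, 5, 4, 3, 5, 4, 5, 2, 5, 5, 4, 5, 3, 5, 5, 4, 5, 5, 1, 5]
ent_scores     = [5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 5, 5]

stat, p = mannwhitneyu(chatgpt_scores, ent_scores, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```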
Affiliation(s)
- Francisco Teixeira-Marques, Nuno Medeiros, Francisco Nazaré, Sandra Alves, Nuno Lima, Leandro Ribeiro, Rita Gama, Pedro Oliveira: Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal

98
Arango SD, Flynn JC, Zeitlin J, Lorenzana DJ, Miller AJ, Wilson MS, Strohl AB, Weiss LE, Weir TB. The Performance of ChatGPT on the American Society for Surgery of the Hand Self-Assessment Examination. Cureus 2024; 16:e58950. [PMID: 38800302 PMCID: PMC11126365 DOI: 10.7759/cureus.58950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/24/2024] [Indexed: 05/29/2024] Open
Abstract
BACKGROUND This study aims to compare the performance of ChatGPT-3.5 (GPT-3.5) and ChatGPT-4 (GPT-4) on the American Society for Surgery of the Hand (ASSH) Self-Assessment Examination (SAE) to determine their potential as educational tools. METHODS This study compared the proportion of correct answers to text-based questions on the 2021 and 2022 ASSH SAE between untrained ChatGPT versions. Secondary analyses assessed the performance of ChatGPT by question difficulty and question category. The outcomes of ChatGPT were compared with the performance of actual examinees on the ASSH SAE. RESULTS A total of 238 questions were included in the analysis. Compared with GPT-3.5, GPT-4 provided significantly more correct answers overall (58.0% versus 68.9%, respectively; P = 0.013), on the 2022 SAE (55.9% versus 72.9%; P = 0.007), and on more difficult questions (48.8% versus 63.6%; P = 0.02). In a multivariable logistic regression analysis, correct answers were associated with GPT-4 (odds ratio [OR], 1.66; P = 0.011), increased question difficulty (OR, 0.59; P = 0.009), Bone and Joint questions (OR, 0.18; P < 0.001), and Soft Tissue questions (OR, 0.30; P = 0.013). Actual examinees scored a mean of 21.6% above GPT-3.5 and 10.7% above GPT-4. The mean percentage of correct answers by actual examinees was significantly higher for questions that ChatGPT answered correctly than for those it answered incorrectly. CONCLUSIONS GPT-4 demonstrated improved performance over GPT-3.5 on the ASSH SAE, especially on more difficult questions. Actual examinees scored higher than both versions of ChatGPT, but GPT-4 cut the margin in half.
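The multivariable analysis above models the probability of a correct answer and reports exponentiated coefficients as odds ratios. A sketch of that workflow on simulated question-level data (variable names and effect sizes are illustrative, not the study's):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated question-level data mirroring the study's design: outcome is
# whether the answer was correct; predictors are model version and difficulty.
rng = np.random.default_rng(0)
n = 238
df = pd.DataFrame({
    "gpt4": rng.integers(0, 2, n),       # 1 = GPT-4, 0 = GPT-3.5
    "difficulty": rng.uniform(1, 5, n),  # higher = harder question
})
logit = -0.5 + 0.5 * df["gpt4"] - 0.5 * df["difficulty"]
df["correct"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(df[["gpt4", "difficulty"]])
res = sm.Logit(df["correct"], X).fit(disp=0)
print(np.exp(res.params))  # exponentiated coefficients = odds ratios
```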
Affiliation(s)
- Sebastian D Arango, Jacob Zeitlin, Daniel J Lorenzana, Andrew J Miller, Matthew S Wilson, Adam B Strohl, Tristan B Weir: Department of Orthopaedic Surgery, Philadelphia Hand to Shoulder Center, Philadelphia, USA
- Jason C Flynn: Department of Orthopaedic Surgery, Sidney Kimmel Medical College, Philadelphia, USA
- Lawrence E Weiss: Division of Orthopaedic Hand Surgery, OAA Orthopaedic Specialists, Allentown, USA

99
Zampatti S, Peconi C, Megalizzi D, Calvino G, Trastulli G, Cascella R, Strafella C, Caltagirone C, Giardina E. Innovations in Medicine: Exploring ChatGPT's Impact on Rare Disorder Management. Genes (Basel) 2024; 15:421. [PMID: 38674356 PMCID: PMC11050022 DOI: 10.3390/genes15040421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024] Open
Abstract
Artificial intelligence (AI) is rapidly transforming the field of medicine, heralding a new era of innovation and efficiency. Among AI programs designed for general use, ChatGPT holds a prominent position, built on an innovative language model developed by OpenAI. Thanks to deep learning techniques, ChatGPT stands out as an exceptionally versatile tool, renowned for generating human-like responses to queries. Various medical specialties, including rheumatology, oncology, psychiatry, internal medicine, and ophthalmology, have been explored for ChatGPT integration, with pilot studies and trials revealing each field's potential benefits and challenges. The fields of genetics, genetic counseling, and rare disorders, however, remain ripe for exploration, given their complex datasets and the need for personalized patient care. In this review, we synthesize the wide range of potential applications for ChatGPT in the medical field, highlighting its benefits and limitations. We pay special attention to rare and genetic disorders, aiming to shed light on the future roles of AI-driven chatbots in healthcare. Our goal is to pave the way for a healthcare system that is more knowledgeable, efficient, and centered on patient needs.
Affiliation(s)
- Stefania Zampatti, Cristina Peconi, Domenica Megalizzi, Giulia Calvino, Giulia Trastulli, Raffaella Cascella, Claudia Strafella, Emiliano Giardina: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Domenica Megalizzi, Giulia Calvino: Department of Science, Roma Tre University, 00146 Rome, Italy
- Giulia Trastulli: Department of System Medicine, Tor Vergata University, 00133 Rome, Italy
- Raffaella Cascella: Department of Chemical-Toxicological and Pharmacological Evaluation of Drugs, Catholic University Our Lady of Good Counsel, 1000 Tirana, Albania
- Carlo Caltagirone: Department of Clinical and Behavioral Neurology, IRCCS Fondazione Santa Lucia, 00179 Rome, Italy
- Emiliano Giardina: Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy

100
Popkov AA, Barrett TS. AI vs academia: Experimental study on AI text detectors' accuracy in behavioral health academic writing. Account Res 2024:1-17. [PMID: 38516933 DOI: 10.1080/08989621.2024.2331757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2023] [Accepted: 03/13/2024] [Indexed: 03/23/2024]
Abstract
Artificial intelligence (AI) language models continue to expand in both access and capability. As these models have evolved, a growing number of academic journals in medicine and healthcare have explored policies regarding AI-generated text. Implementing such policies requires accurate AI detection tools: inaccurate detectors risk unnecessary penalties for human authors and may compromise effective enforcement of guidelines against AI-generated content. Yet the accuracy of AI text detection tools in distinguishing human-written from AI-generated content varies across published studies. This experimental study used a sample of behavioral health publications and found problematic false positive and false negative rates for both free and paid AI detection tools. The study assessed 100 research articles published from 2016 to 2018 in behavioral health and psychiatry journals and 200 texts produced by AI chatbots (100 by "ChatGPT" and 100 by "Claude"). The free AI detector identified a median of 27.2% of the academic text as AI-generated, while the commercial tool Originality.AI performed better but still had limitations, especially in detecting texts generated by Claude. These error rates raise doubts about relying on AI detectors to enforce strict policies around AI text generation in behavioral health publications.
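Detector evaluation here reduces to a false positive rate (human text flagged as AI) and a false negative rate (AI text passed as human) over labeled documents. A minimal sketch with invented detector calls, not the study's per-document outputs:

```python
def detector_error_rates(flags, labels):
    """False positive/negative rates for a binary AI-text detector.

    flags:  detector output, True = flagged as AI-generated
    labels: ground truth,    True = actually AI-generated
    """
    fp = sum(f and not l for f, l in zip(flags, labels))  # human text flagged
    fn = sum(not f and l for f, l in zip(flags, labels))  # AI text missed
    n_human = sum(not l for l in labels)
    n_ai = sum(labels)
    return fp / n_human, fn / n_ai

# Hypothetical calls for 5 human-written and 5 AI-generated texts.
flags  = [True, False, False, True, False, True, True, False, True, True]
labels = [False, False, False, False, False, True, True, True, True, True]
fpr, fnr = detector_error_rates(flags, labels)
print(f"false positive rate {fpr:.0%}, false negative rate {fnr:.0%}")
```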
Affiliation(s)
- Andrey A Popkov: Highmark Health, Pittsburgh, PA, USA; Contigo Health, LLC, a subsidiary of Premier, Inc, Charlotte, NC, USA