1. Park SH, Pinto-Powell R, Thesen T, Lindqwister A, Levy J, Chacko R, Gonzalez D, Bridges C, Schwendt A, Byrum T, Fong J, Shasavari S, Hassanpour S. Preparing healthcare leaders of the digital age with an integrative artificial intelligence curriculum: a pilot study. Med Educ Online 2024;29:2315684. PMID: 38351737; PMCID: PMC10868429; DOI: 10.1080/10872981.2024.2315684.
Abstract
Artificial intelligence (AI) is rapidly being introduced into the clinical workflow of many specialties. Despite the need to train physicians who understand the utility and implications of AI and to mitigate a growing skills gap, no established consensus exists on how best to introduce AI concepts to medical students during preclinical training. This study examined the effectiveness of a pilot Digital Health Scholars (DHS) non-credit enrichment elective that paralleled the Dartmouth Geisel School of Medicine's first-year preclinical curriculum, introducing AI algorithms and their applications in the concurrently occurring systems blocks. From September 2022 to March 2023, ten self-selected first-year students enrolled in the elective, which ran in parallel with four existing curricular blocks (Immunology, Hematology, Cardiology, and Pulmonology). Each DHS block consisted of a journal club, a live-coding demonstration, and an integration session led by a researcher in that field. Students' confidence in explaining the content objectives (high-level knowledge, implications, and limitations of AI) was measured before and after each block and compared using Mann-Whitney U tests. Students reported significant increases in confidence after all four blocks (Immunology: U = 4.5, p = 0.030; Hematology: U = 1.0, p = 0.009; Cardiology: U = 4.0, p = 0.019; Pulmonology: U = 4.0, p = 0.030), as well as an average overall satisfaction of 4.29/5 for the curriculum content. Our study demonstrates that a digital health enrichment elective that runs in parallel to an institution's preclinical curriculum and embeds AI concepts into relevant clinical topics can enhance students' confidence in describing content objectives pertaining to high-level algorithmic understanding, implications, and limitations of the studied models. Building on this elective curricular design, further studies with larger enrollments can help determine the most effective approach to preparing future physicians for an AI-enhanced clinical workflow.
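The pre/post confidence comparison described in this abstract rests on the Mann-Whitney U statistic. The following is a minimal stdlib-only sketch of how that statistic is computed for Likert-style ratings; the rating values are hypothetical placeholders, not the study's data.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x versus sample y.

    Counts pairs where x beats y, with ties contributing half a count.
    (Real analyses, as in the study, would also derive a p-value.)
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical 1-5 Likert confidence ratings for one block, pre vs. post.
pre = [2, 2, 3, 1, 2, 3, 2, 1, 2, 3]
post = [4, 5, 4, 4, 3, 5, 4, 4, 5, 4]

print(mann_whitney_u(pre, post))  # small U: pre ratings rarely exceed post
```

A useful sanity check on any implementation: the two one-sided U statistics must sum to the number of pairs, n1 * n2.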
Affiliation(s)
- Soo Hwan Park, Thomas Thesen, Joshua Levy, Rachael Chacko, Connor Bridges, Adam Schwendt, Travis Byrum, Justin Fong: Geisel School of Medicine at Dartmouth, Hanover, NH, USA
2. Zhui L, Fenghe L, Xuehu W, Qining F, Wei R. Ethical Considerations and Fundamental Principles of Large Language Models in Medical Education: A Viewpoint. J Med Internet Res 2024. PMID: 38971715; DOI: 10.2196/60083.
Abstract
This viewpoint article first explores the ethical challenges associated with the future application of large language models (LLMs) in medical education. These include concerns related to the development of LLMs, such as artificial intelligence (AI) hallucinations, information bias, privacy and data risks, and deficiencies in transparency and interpretability, as well as issues concerning their application, including deficiencies in emotional intelligence, educational inequities, problems with academic integrity, and questions of responsibility and copyright ownership. The paper then analyzes existing AI-related legal and ethical frameworks and highlights their limitations with regard to LLMs in medical education. To ensure that LLMs are integrated responsibly and safely, the authors recommend developing a unified ethical framework tailored specifically to LLMs in this field, based on eight fundamental principles: quality control and supervision mechanisms; privacy and data protection; transparency and interpretability; fairness and equal treatment; academic integrity and moral norms; accountability and traceability; protection of and respect for intellectual property; and the promotion of educational research and innovation. The authors further discuss specific measures for implementing these principles, laying a foundation for a comprehensive and actionable ethical framework. Such a framework can provide clear guidance and support for applying LLMs in medical education, helping to balance technological advancement with ethical safeguards so that medical education can progress without compromising fairness, justice, or patient safety, and establishing a more equitable, safer, and more efficient educational environment.
Affiliation(s)
- Li Zhui, Li Fenghe, Wang Xuehu, Fu Qining, Ren Wei: Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, No. 1 of Youyi Road, Yuzhong District, Chongqing, China
3. Yasaka K, Kanzawa J, Kanemaru N, Koshino S, Abe O. Fine-Tuned Large Language Model for Extracting Patients on Pretreatment for Lung Cancer from a Picture Archiving and Communication System Based on Radiological Reports. J Imaging Inform Med 2024. PMID: 38955964; DOI: 10.1007/s10278-024-01186-8.
Abstract
This study investigated the performance of a fine-tuned large language model (LLM) in extracting patients on pretreatment for lung cancer from a picture archiving and communication system (PACS) and compared it with that of radiologists. Patients whose radiological reports contained the term lung cancer (3111 for training, 124 for validation, and 288 for testing) were included in this retrospective study. Based on the clinical indication and diagnosis sections of the radiological report (used as input data), they were classified into four groups (used as reference data): group 0 (no lung cancer), group 1 (pretreatment lung cancer present), group 2 (after treatment for lung cancer), and group 3 (planning radiation therapy). Using the training and validation datasets, fine-tuning of the pretrained LLM was conducted ten times. Due to group imbalance, group 2 data were undersampled during training. The model that performed best on the validation dataset was then assessed on the independent test dataset. For testing purposes, two radiologists (readers 1 and 2) also classified the radiological reports. The overall accuracy of the fine-tuned LLM, reader 1, and reader 2 was 0.983, 0.969, and 0.969, respectively. The sensitivity for differentiating groups 0/1/2/3 by the LLM, reader 1, and reader 2 was 1.000/0.948/0.991/1.000, 0.750/0.879/0.996/1.000, and 1.000/0.931/0.978/1.000, respectively. The time required for classification by the LLM, reader 1, and reader 2 was 46 s, 2539 s, and 1538 s, respectively. The fine-tuned LLM effectively extracted patients on pretreatment for lung cancer from the PACS, with performance comparable to radiologists in a substantially shorter time.
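The overall accuracy and per-group sensitivity figures reported here follow directly from a 4x4 confusion matrix over the reference groups. A minimal sketch of those two computations; the counts below are invented for illustration, not the study's results.

```python
# Hypothetical confusion matrix: rows = reference group, cols = predicted group.
# Groups: 0 = no lung cancer, 1 = pretreatment, 2 = after treatment, 3 = radiation planning.
confusion = [
    [20, 1, 0, 0],
    [0, 110, 5, 1],
    [2, 3, 130, 0],
    [0, 0, 1, 15],
]


def overall_accuracy(cm):
    """Fraction of all cases on the diagonal (predicted group == reference group)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def sensitivity(cm, group):
    """Fraction of reference cases in `group` that were predicted as `group`."""
    return cm[group][group] / sum(cm[group])


print(overall_accuracy(confusion))
print([sensitivity(confusion, g) for g in range(4)])
```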
Affiliation(s)
- Koichiro Yasaka, Jun Kanzawa, Noriko Kanemaru, Saori Koshino, Osamu Abe: Department of Radiology, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
4. Bortoli M, Fiore M, Tedeschi S, Oliveira V, Sousa R, Bruschi A, Campanacci DA, Viale P, De Paolis M, Sambri A. GPT-based chatbot tools are still unreliable in the management of prosthetic joint infections. Musculoskelet Surg 2024. PMID: 38954323; DOI: 10.1007/s12306-024-00846-w.
Abstract
BACKGROUND Artificial intelligence chatbot tools might discern patterns and correlations that elude human observation, potentially leading to more accurate and timely interventions. However, their reliability in answering healthcare-related questions is still debated. This study aimed to assess the performance of three versions of GPT-based chatbots on prosthetic joint infections (PJI). METHODS Thirty questions concerning the diagnosis and treatment of hip and knee PJIs, stratified by a priori established difficulty, were generated by a team of experts and administered to ChatGPT 3.5, BingChat, and ChatGPT 4.0. Responses were rated by three orthopedic surgeons and two infectious diseases physicians using a five-point Likert-like scale with numerical values to quantify the quality of responses. Inter-rater reliability was assessed with intraclass correlation statistics. RESULTS Responses averaged "good-to-very good" for all chatbots examined, in both diagnosis and treatment, with no significant differences according to question difficulty. However, BingChat's ratings were significantly lower in the treatment setting (p = 0.025), particularly in terms of accuracy (p = 0.02) and completeness (p = 0.004). Agreement in ratings among examiners was very poor. CONCLUSIONS On average, experts rated the quality of responses positively, but ratings frequently varied widely. This suggests that AI chatbot tools are still unreliable in the management of PJI.
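Inter-rater reliability of Likert-style ratings like these is commonly summarized with an intraclass correlation coefficient (ICC). One common formulation is the one-way random-effects ICC(1,1); the sketch below implements it from its ANOVA definition, with made-up ratings rather than the study's data.

```python
def icc_oneway(ratings):
    """ICC(1,1): one-way random-effects intraclass correlation.

    `ratings` is a list of rated items, each a list of k raters' scores.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB is the
    between-item mean square and MSW the within-item mean square.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    item_means = [sum(r) / k for r in ratings]
    msb = k * sum((m - grand) ** 2 for m in item_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for r, m in zip(ratings, item_means) for x in r
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical ratings: 4 chatbot answers, each scored by 3 examiners.
ratings = [[4, 5, 3], [2, 2, 1], [5, 4, 5], [3, 3, 2]]
print(icc_oneway(ratings))
```

Widely scattered scores for the same answer push MSW up and the ICC down, which is how "very poor" agreement among examiners shows up numerically.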
Affiliation(s)
- M Bortoli, A Bruschi, M De Paolis, A Sambri: Orthopedic and Traumatology Unit, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy
- M Fiore: Orthopedic and Traumatology Unit, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy; Department of Medical and Surgical Sciences, Alma Mater Studiorum University of Bologna, 40138, Bologna, Italy
- S Tedeschi, P Viale: Department of Medical and Surgical Sciences, Alma Mater Studiorum University of Bologna, 40138, Bologna, Italy; Infectious Disease Unit, Department for Integrated Infectious Risk Management, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, 40138, Bologna, Italy
- V Oliveira, R Sousa: Department of Orthopedics, Centro Hospitalar Universitário de Santo António, 4099-001, Porto, Portugal
- D A Campanacci: Orthopedic Oncology Unit, Azienda Ospedaliera Universitaria Careggi, 50134, Florence, Italy
5. Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024;105:251-265. PMID: 38679540; DOI: 10.1016/j.diii.2024.04.003.
Abstract
PURPOSE The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications. MATERIALS AND METHODS After a comprehensive search of the PubMed, Web of Science, Embase, and Google Scholar databases, published studies using ChatGPT for clinical radiology applications were identified up to January 1, 2024. RESULTS Of the 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (84.1%) demonstrated high performance, and seven (15.9%) indicated lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (54.5%) studies quantified ChatGPT's performance: 19 (79.2%) recorded a median accuracy of 70.5%, and five (20.8%) reported a median agreement of 83.6% between ChatGPT outcomes and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (90.9%), ChatGPT v4 outperformed v3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks. CONCLUSION Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, multiple pitfalls and limitations remain to be addressed. It is too soon to confirm its complete proficiency and accuracy; more extensive multicenter studies using diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.
Affiliation(s)
- Pedram Keshavarz: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; School of Science and Technology, The University of Georgia, Tbilisi 0171, Georgia
- Sara Bagherieh: Independent Clinical Radiology Researcher, Los Angeles, CA 90024, USA
- Hamid Chalian: Department of Radiology, Cardiothoracic Imaging, University of Washington, Seattle, WA 98195, USA
- Amir Ali Rahsepar, Cameron Hassani, Steven S Raman, Arash Bedayat: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Grace Hyun J Kim: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Radiological Sciences, Center for Computer Vision and Imaging Biomarkers, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
6. Aburumman R, Al Annan K, Mrad R, Brunaldi VO, Gala K, Abu Dayyeh BK. Assessing ChatGPT vs. Standard Medical Resources for Endoscopic Sleeve Gastroplasty Education: A Medical Professional Evaluation Study. Obes Surg 2024;34:2718-2724. PMID: 38758515; DOI: 10.1007/s11695-024-07283-5.
Abstract
BACKGROUND AND AIMS The Chat Generative Pre-Trained Transformer (ChatGPT) represents a significant advancement in artificial intelligence (AI) chatbot technology. While ChatGPT offers promising capabilities, concerns remain about its reliability and accuracy. This study aimed to evaluate ChatGPT's responses to patients' frequently asked questions about endoscopic sleeve gastroplasty (ESG). METHODS Expert gastroenterologists and bariatric surgeons with experience in ESG were invited to evaluate ChatGPT-generated answers to eight ESG-related questions, alongside answers sourced from hospital websites. The evaluation criteria included ease of understanding, scientific accuracy, and overall answer satisfaction. Evaluators were also tasked with discerning whether each response was AI-generated. RESULTS Twelve medical professionals with expertise in ESG participated, 83.3% of whom had experience performing the procedure independently. The entire cohort possessed substantial knowledge about ESG. ChatGPT's utility among participants, rated on a scale of one to five, averaged 2.75. The raters demonstrated a 54% accuracy rate in distinguishing AI-generated responses, with a sensitivity of 39% and a specificity of 60%, averaging 17.6 correct identifications out of a possible 31. Overall, there were no significant differences between AI-generated and non-AI responses in scientific accuracy, understandability, and satisfaction, with one notable exception: for the question defining ESG, the AI-generated definition scored higher in scientific accuracy (4.33 vs. 3.61, p = 0.007) and satisfaction (4.33 vs. 3.58, p = 0.009) than the non-AI versions. CONCLUSIONS This study underscores ChatGPT's efficacy in providing medical information on ESG, demonstrating its comparability to traditional sources in scientific accuracy.
Affiliation(s)
- Razan Aburumman, Karim Al Annan, Rudy Mrad, Vitor O Brunaldi, Khushboo Gala, Barham K Abu Dayyeh: Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
7. Lecler A, Soyer P, Gong B. The potential and pitfalls of ChatGPT in radiology. Diagn Interv Imaging 2024;105:249-250. PMID: 38811261; DOI: 10.1016/j.diii.2024.05.003.
Affiliation(s)
- Augustin Lecler: Department of Neuroradiology, Foundation Adolphe de Rothschild Hospital, 75019, Paris, France; Université Paris Cité, Faculté de Médecine, 75006, Paris, France
- Philippe Soyer: Université Paris Cité, Faculté de Médecine, 75006, Paris, France; Department of Radiology, Hôpital Cochin, AP-HP, 75014, Paris, France
- Bo Gong: Department of Radiology, University of British Columbia, Vancouver, BC, V6T 1M9, Canada
8. Kumar RP, Sivan V, Bachir H, Sarwar SA, Ruzicka F, O'Malley GR, Lobo P, Morales IC, Cassimatis ND, Hundal JS, Patel NV. Can Artificial Intelligence Mitigate Missed Diagnoses by Generating Differential Diagnoses for Neurosurgeons? World Neurosurg 2024;187:e1083-e1088. PMID: 38759788; DOI: 10.1016/j.wneu.2024.05.052.
Abstract
BACKGROUND/OBJECTIVE Neurosurgery emphasizes the criticality of accurate differential diagnoses, with diagnostic delays posing significant health and economic challenges. As large language models (LLMs) emerge as transformative tools in healthcare, this study seeks to elucidate their role in assisting neurosurgeons with the differential diagnosis process, especially during preliminary consultations. METHODS This study employed three chat-based LLMs, ChatGPT (versions 3.5 and 4.0), Perplexity AI, and Bard AI, to evaluate their diagnostic accuracy. Each LLM was prompted using clinical vignettes, and their responses were recorded to generate differential diagnoses for 20 common and uncommon neurosurgical disorders. Disease-specific prompts were crafted using DynaMed, a clinical reference tool. The accuracy of the LLMs was determined by their ability to correctly identify the target disease within their top differential diagnoses. RESULTS For the initial differential, ChatGPT 3.5 achieved an accuracy of 52.63%, while ChatGPT 4.0 performed slightly better at 53.68%. Perplexity AI and Bard AI demonstrated 40.00% and 29.47% accuracy, respectively. As the number of considered differentials increased from 2 to 5, ChatGPT 3.5 reached its peak accuracy of 77.89% for the top 5 differentials. Bard AI and Perplexity AI had varied performances, with Bard AI improving to 62.11% for the top 5 differentials. On a disease-specific note, the LLMs excelled in diagnosing conditions like epilepsy and cervical spine stenosis but struggled with more complex diseases such as moyamoya disease and amyotrophic lateral sclerosis. CONCLUSIONS LLMs show potential to enhance diagnostic accuracy and decrease the incidence of missed diagnoses in neurosurgery.
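The accuracy metric described here, whether the target disease appears within the model's top-k differentials, reduces to a membership check over ranked lists. A minimal sketch; the cases and disease names are placeholders, not the study's vignettes.

```python
def top_k_accuracy(cases, k):
    """Fraction of cases whose target diagnosis appears in the top-k differential.

    `cases` is a list of (target, ranked_differential) pairs, where the
    differential is ordered from most to least likely.
    """
    hits = sum(1 for target, differential in cases if target in differential[:k])
    return hits / len(cases)


# Hypothetical (target, ranked differential) pairs for illustration.
cases = [
    ("epilepsy", ["epilepsy", "syncope", "migraine"]),
    ("moyamoya disease", ["stroke", "vasculitis", "moyamoya disease"]),
    ("cervical spine stenosis", ["cervical spine stenosis", "disc herniation", "ALS"]),
]

print(top_k_accuracy(cases, 1))
print(top_k_accuracy(cases, 3))
```

Widening k can only raise the score, which matches the abstract's pattern of accuracy climbing as more differentials are considered.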
Affiliation(s)
- Rohit Prem Kumar, Vijay Sivan, Hanin Bachir, Syed A Sarwar, Francis Ruzicka, Geoffrey R O'Malley, Paulo Lobo, Ilona Cazorla Morales, Nicholas D Cassimatis: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Jasdeep S Hundal: Department of Neurology, HMH-Jersey Shore University Medical Center, Neptune, New Jersey, USA
- Nitesh V Patel: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA; Department of Neurosurgery, HMH-Jersey Shore University Medical Center, Neptune, New Jersey, USA
9. Suleiman A, von Wedel D, Munoz-Acuna R, Redaelli S, Santarisi A, Seibold EL, Ratajczak N, Kato S, Said N, Sundar E, Goodspeed V, Schaefer MS. Assessing ChatGPT's ability to emulate human reviewers in scientific research: A descriptive and qualitative approach. Comput Methods Programs Biomed 2024;254:108313. PMID: 38954915; DOI: 10.1016/j.cmpb.2024.108313.
Abstract
BACKGROUND ChatGPT is an AI platform whose relevance in the peer review of scientific articles is steadily growing. Nonetheless, it has sparked debates over its potential biases and inaccuracies. This study aims to assess ChatGPT's ability to qualitatively emulate human reviewers in scientific research. METHODS We included the first submitted versions of the latest twenty original research articles published by July 3, 2023, in a high-profile medical journal. Each article underwent evaluation by a minimum of three human reviewers during the initial review stage. Subsequently, three researchers with medical backgrounds and expertise in manuscript revision independently and qualitatively assessed the agreement between the peer reviews generated by ChatGPT version GPT-4 and the comments provided by human reviewers for these articles. The level of agreement was categorized as complete, partial, none, or contradictory. RESULTS A total of 720 human reviewer comments were assessed. Agreement among the three assessors was good (overall kappa > 0.6). ChatGPT's comments demonstrated complete agreement in quality and substance with 48 (6.7%) of the human reviewers' comments; partial agreement with 92 (12.8%), identifying issues that necessitated further elaboration or recommending supplementary steps to address concerns; no agreement with 565 (78.5%); and contradiction of 15 (2.1%). ChatGPT's comments on methods had the lowest proportion of complete agreement (13 comments, 3.6%), while general comments on the manuscript displayed the highest (17 comments, 22.1%). CONCLUSION ChatGPT version GPT-4 has a limited ability to emulate human reviewers within the peer review process of scientific research.
Affiliation(s)
- Aiman Suleiman: Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Anesthesia, Critical Care and Pain Medicine, Albert Einstein College of Medicine, Montefiore Medical Center, Bronx, NY, USA
- Dario von Wedel, Ricardo Munoz-Acuna, Simone Redaelli, Eva-Lotte Seibold, Nikolai Ratajczak, Shinichiro Kato, Valerie Goodspeed: Department of Anesthesia, Critical Care and Pain Medicine, and Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Abeer Santarisi: Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Emergency Medicine, Disaster Medicine Fellowship, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Nader Said: Department of Industrial Engineering, Faculty of Engineering Technologies and Sciences, Higher Colleges of Technology, DWC, Dubai, United Arab Emirates
- Eswar Sundar: Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Maximilian S Schaefer: Department of Anesthesia, Critical Care and Pain Medicine, and Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Klinik für Anästhesiologie, Universitätsklinikum Düsseldorf, Düsseldorf, Germany
10. Dehdab R, Brendlin A, Werner S, Almansour H, Gassenmaier S, Brendel JM, Nikolaou K, Afat S. Evaluating ChatGPT-4V in chest CT diagnostics: a critical image interpretation assessment. Jpn J Radiol 2024. PMID: 38867035; DOI: 10.1007/s11604-024-01606-3.
Abstract
PURPOSE To assess the diagnostic accuracy of ChatGPT-4V in interpreting a set of four chest CT slices for each case of COVID-19, non-small cell lung cancer (NSCLC), and control cases, thereby evaluating its potential as an AI tool in radiological diagnostics. MATERIALS AND METHODS In this retrospective study, 60 CT scans from The Cancer Imaging Archive, covering COVID-19, NSCLC, and control cases, were analyzed using ChatGPT-4V. A radiologist selected four CT slices from each scan for evaluation. ChatGPT-4V's interpretations were compared against the gold standard diagnoses and assessed by two radiologists. Statistical analyses focused on accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), along with an examination of the impact of pathology location and lobe involvement. RESULTS ChatGPT-4V showed an overall diagnostic accuracy of 56.76%. For NSCLC, sensitivity was 27.27% and specificity was 60.47%. In COVID-19 detection, sensitivity was 13.64% and specificity was 64.29%. For control cases, sensitivity was 31.82%, with a specificity of 95.24%. The highest sensitivity (83.33%) was observed in cases involving all lung lobes. The chi-squared statistical analysis indicated significant differences in sensitivity across categories and in relation to the location and lobar involvement of pathologies. CONCLUSION ChatGPT-4V demonstrated variable diagnostic performance in chest CT interpretation, with notable proficiency in specific scenarios. This underscores the challenges of cross-modal AI models like ChatGPT-4V in radiology, pointing toward significant areas for improvement to ensure dependability. The study emphasizes the importance of enhancing these models for broader, more reliable medical use.
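The accuracy, sensitivity, specificity, PPV, and NPV figures reported in this abstract all derive from 2x2 confusion-matrix counts; a minimal sketch of the standard formulas (the counts below are hypothetical, not the study's data):

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard diagnostic-accuracy metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts for illustration only
metrics = diagnostic_metrics(tp=8, fp=2, tn=9, fn=1)
print(metrics)
```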
Affiliation(s)
- Reza Dehdab
- Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany.
| | - Andreas Brendlin
- Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany
| | - Sebastian Werner
- Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany
| | - Haidara Almansour
- Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany
| | - Sebastian Gassenmaier
- Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany
| | - Jan Michael Brendel
- Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany
| | - Konstantin Nikolaou
- Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany
| | - Saif Afat
- Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany
11
Park J, Oh K, Han K, Lee YH. Patient-centered radiology reports with generative artificial intelligence: adding value to radiology reporting. Sci Rep 2024; 14:13218. [PMID: 38851825 PMCID: PMC11162416 DOI: 10.1038/s41598-024-63824-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Accepted: 06/03/2024] [Indexed: 06/10/2024] Open
Abstract
The purposes of this study were to assess the efficacy of AI-generated radiology reports in terms of report summary, patient-friendliness, and recommendations, and to evaluate the consistency of report quality and accuracy, contributing to the advancement of radiology workflow. A total of 685 spine MRI reports were retrieved from our hospital database. AI-generated radiology reports were generated in three formats: (1) summary reports, (2) patient-friendly reports, and (3) recommendations. The occurrence of artificial hallucinations was evaluated in the AI-generated reports. Two radiologists conducted qualitative and quantitative assessments considering the original report as a standard reference. Two non-physician raters assessed their understanding of the content of original and patient-friendly reports using a 5-point Likert scale. The AI-generated radiology reports received high average scores overall across all three formats. The average comprehension score for the original reports was 2.71 ± 0.73, while the score for the patient-friendly reports significantly increased to 4.69 ± 0.48 (p < 0.001). There were 1.12% artificial hallucinations and 7.40% potentially harmful translations. In conclusion, the potential benefits of using generative AI assistants to generate these reports include improved report quality; greater efficiency in the radiology workflow for producing summaries, patient-centered reports, and recommendations; and a move toward patient-centered radiology.
Affiliation(s)
- Jiwoo Park
- Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, 50-1 Yonsei-Ro, Seodaemun-Gu, Seoul, 03722, South Korea
| | - Kangrok Oh
- Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, 50-1 Yonsei-Ro, Seodaemun-Gu, Seoul, 03722, South Korea
| | - Kyunghwa Han
- Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, 50-1 Yonsei-Ro, Seodaemun-Gu, Seoul, 03722, South Korea.
| | - Young Han Lee
- Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, 50-1 Yonsei-Ro, Seodaemun-Gu, Seoul, 03722, South Korea.
- Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, South Korea.
12
Rao SJ, Isath A, Krishnan P, Tangsrivimol JA, Virk HUH, Wang Z, Glicksberg BS, Krittanawong C. ChatGPT: A Conceptual Review of Applications and Utility in the Field of Medicine. J Med Syst 2024; 48:59. [PMID: 38836893 DOI: 10.1007/s10916-024-02075-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2024] [Accepted: 05/07/2024] [Indexed: 06/06/2024]
Abstract
Artificial intelligence, specifically advanced language models such as ChatGPT, has the potential to revolutionize various aspects of healthcare, medical education, and research. In this narrative review, we evaluate the myriad applications of ChatGPT in diverse healthcare domains. We discuss its potential role in clinical decision-making, exploring how it can assist physicians by providing rapid, data-driven insights for diagnosis and treatment. We review the benefits of ChatGPT in personalized patient care, particularly in geriatric care, medication management, weight loss and nutrition, and physical activity guidance. We further delve into its potential to enhance medical research through the analysis of large datasets and the development of novel methodologies. In the realm of medical education, we investigate the utility of ChatGPT as an information retrieval tool and personalized learning resource for medical students and professionals. There are numerous promising applications of ChatGPT that will likely induce paradigm shifts in healthcare practice, education, and research. The use of ChatGPT may come with several benefits in areas such as clinical decision making, geriatric care, medication management, weight loss and nutrition, physical fitness, scientific research, and medical education. Nevertheless, it is important to note that issues surrounding ethics, data privacy, transparency, inaccuracy, and inadequacy persist. Prior to widespread use in medicine, it is imperative to objectively evaluate the impact of ChatGPT in a real-world setting using a risk-based approach.
Affiliation(s)
- Shiavax J Rao
- Department of Medicine, MedStar Union Memorial Hospital, Baltimore, MD, USA
| | - Ameesh Isath
- Department of Cardiology, Westchester Medical Center and New York Medical College, Valhalla, NY, USA
| | - Parvathy Krishnan
- Department of Pediatrics, Westchester Medical Center and New York Medical College, Valhalla, NY, USA
| | - Jonathan A Tangsrivimol
- Division of Neurosurgery, Department of Surgery, Chulabhorn Hospital, Chulabhorn Royal Academy, Bangkok, 10210, Thailand
- Department of Neurological Surgery, Weill Cornell Medicine Brain and Spine Center, New York, NY, 10022, USA
| | - Hafeez Ul Hassan Virk
- Harrington Heart & Vascular Institute, Case Western Reserve University, University Hospitals Cleveland Medical Center, Cleveland, OH, USA
| | - Zhen Wang
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
- Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Benjamin S Glicksberg
- Hasso Plattner Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Chayakrit Krittanawong
- Cardiology Division, NYU Langone Health and NYU School of Medicine, 550 First Avenue, New York, NY, 10016, USA.
13
Farquhar S, Kossen J, Kuhn L, Gal Y. Detecting hallucinations in large language models using semantic entropy. Nature 2024; 630:625-630. [PMID: 38898292 PMCID: PMC11186750 DOI: 10.1038/s41586-024-07421-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Accepted: 04/12/2024] [Indexed: 06/21/2024]
Abstract
Large language model (LLM) systems, such as ChatGPT or Gemini, can show impressive reasoning and question-answering capabilities but often 'hallucinate' false outputs and unsubstantiated answers. Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents or untrue facts in news articles, and even posing a risk to human life in medical domains such as radiology. Encouraging truthfulness through supervision or reinforcement has been only partially successful. Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations, confabulations, which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability.
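The core idea above, computing entropy over meanings rather than over token sequences, can be sketched as follows: sample several answers to the same prompt, group them into meaning clusters with a pairwise equivalence check, and take the entropy of the cluster probabilities. This is only an illustrative sketch; the paper uses bidirectional entailment between answers to decide equivalence, for which the exact-match check below is a crude hypothetical stand-in.

```python
import math

def semantic_entropy(samples, equivalent):
    """Entropy over meaning clusters of sampled answers.

    samples: list of answer strings drawn from the model for one prompt.
    equivalent: pairwise predicate deciding whether two answers share a meaning.
    """
    clusters = []
    for s in samples:
        for c in clusters:
            if equivalent(s, c[0]):  # join the first cluster with a matching meaning
                c.append(s)
                break
        else:
            clusters.append([s])     # otherwise start a new meaning cluster
    n = len(samples)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical stand-in for semantic equivalence (real method: bidirectional entailment)
eq = lambda a, b: a.strip().lower() == b.strip().lower()

# Two samples mean "Paris", one means "Lyon": moderate semantic uncertainty
print(semantic_entropy(["Paris", "paris", "Lyon"], eq))
```

High semantic entropy flags prompts likely to produce confabulations, even when the surface wording of the sampled answers varies.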
Affiliation(s)
- Sebastian Farquhar
- OATML, Department of Computer Science, University of Oxford, Oxford, UK.
| | - Jannik Kossen
- OATML, Department of Computer Science, University of Oxford, Oxford, UK
| | - Lorenz Kuhn
- OATML, Department of Computer Science, University of Oxford, Oxford, UK
| | - Yarin Gal
- OATML, Department of Computer Science, University of Oxford, Oxford, UK
14
Kuai H, Chen J, Tao X, Cai L, Imamura K, Matsumoto H, Liang P, Zhong N. Never-Ending Learning for Explainable Brain Computing. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2307647. [PMID: 38602432 PMCID: PMC11200082 DOI: 10.1002/advs.202307647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Revised: 03/24/2024] [Indexed: 04/12/2024]
Abstract
Exploring the nature of human intelligence and behavior is a longstanding pursuit in cognitive neuroscience, driven by the accumulation of knowledge, information, and data across various studies. However, achieving a unified and transparent interpretation of findings presents formidable challenges. In response, an explainable brain computing framework is proposed that employs the never-ending learning paradigm, integrating evidence combination and fusion computing within a Knowledge-Information-Data (KID) architecture. The framework supports continuous brain cognition investigation, utilizing joint knowledge-driven forward inference and data-driven reverse inference, bolstered by pre-trained language modeling techniques and human-in-the-loop mechanisms. In particular, it incorporates internal evidence learning through multi-task functional neuroimaging analyses and external evidence learning via topic modeling of published neuroimaging studies, all of which involve human interactions at different stages. Based on two case studies, the intricate uncertainty surrounding brain localization in human reasoning is revealed. The present study also highlights the potential of systematization to advance explainable brain computing, offering a finer-grained understanding of brain activity patterns related to human intelligence.
Affiliation(s)
- Hongzhi Kuai
- Faculty of Engineering, Maebashi Institute of Technology, Gunma 371-0816, Japan
- School of Psychology and Beijing Key Laboratory of Learning and Cognition, Capital Normal University, Beijing 100048, China
| | - Jianhui Chen
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing 100124, China
| | - Xiaohui Tao
- School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba 4350, Australia
| | - Lingyun Cai
- School of Psychology and Beijing Key Laboratory of Learning and Cognition, Capital Normal University, Beijing 100048, China
| | - Kazuyuki Imamura
- Faculty of Engineering, Maebashi Institute of Technology, Gunma 371-0816, Japan
| | - Hiroki Matsumoto
- Faculty of Engineering, Maebashi Institute of Technology, Gunma 371-0816, Japan
| | - Peipeng Liang
- School of Psychology and Beijing Key Laboratory of Learning and Cognition, Capital Normal University, Beijing 100048, China
| | - Ning Zhong
- Faculty of Engineering, Maebashi Institute of Technology, Gunma 371-0816, Japan
- School of Psychology and Beijing Key Laboratory of Learning and Cognition, Capital Normal University, Beijing 100048, China
- Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing 100124, China
15
Saba L, Fu CL, Khouri J, Faiman B, Anwer F, Chaulagain CP. Evaluating ChatGPT as an educational resource for patients with multiple myeloma: A preliminary investigation. Am J Hematol 2024; 99:1205-1207. [PMID: 38602288 DOI: 10.1002/ajh.27318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2024] [Revised: 03/07/2024] [Accepted: 03/11/2024] [Indexed: 04/12/2024]
Abstract
The findings of this study highlight a 95% accuracy rate in ChatGPT responses, as assessed by five myeloma specialists, underscoring its potential as a reliable educational tool.
Affiliation(s)
- Ludovic Saba
- Department of Hematology and Medical Oncology, Cleveland Clinic Florida, Weston, Florida, USA
| | - Chieh-Lin Fu
- Department of Hematology and Medical Oncology, Cleveland Clinic Florida, Weston, Florida, USA
| | - Jack Khouri
- Department of Hematology and Medical Oncology, Cleveland Clinic Main Campus, Cleveland, Ohio, USA
| | - Beth Faiman
- Department of Hematologic Oncology and Blood Disorders, Cleveland Clinic Main Campus, Cleveland, Ohio, USA
| | - Faiz Anwer
- Department of Hematology and Medical Oncology, Cleveland Clinic Main Campus, Cleveland, Ohio, USA
| | - Chakra P Chaulagain
- Department of Hematology and Medical Oncology, Cleveland Clinic Florida, Weston, Florida, USA
16
Brunner J, Rinne S. Large Language Models as a Tool for Health Services Researchers: An Exploration of High-Value Applications. Ann Am Thorac Soc 2024; 21:845-848. [PMID: 38445982 DOI: 10.1513/annalsats.202311-980ps] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 03/05/2024] [Indexed: 03/07/2024] Open
Affiliation(s)
- Julian Brunner
- Center for the Study of Healthcare Innovation, Implementation, and Policy, VA Greater Los Angeles Health Care, Los Angeles, California
| | - Seppo Rinne
- Center for Healthcare Organization and Implementation Research, Bedford VA Medical Center, Bedford, Massachusetts; and
- Department of Medicine, Dartmouth Geisel School of Medicine, Hanover, New Hampshire
17
Burnette H, Pabani A, von Itzstein MS, Switzer B, Fan R, Ye F, Puzanov I, Naidoo J, Ascierto PA, Gerber DE, Ernstoff MS, Johnson DB. Use of artificial intelligence chatbots in clinical management of immune-related adverse events. J Immunother Cancer 2024; 12:e008599. [PMID: 38816231 PMCID: PMC11141185 DOI: 10.1136/jitc-2023-008599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/14/2024] [Indexed: 06/01/2024] Open
Abstract
BACKGROUND Artificial intelligence (AI) chatbots have become a major source of general and medical information, though their accuracy and completeness are still being assessed. Their utility for answering questions about immune-related adverse events (irAEs), common and potentially dangerous toxicities of cancer immunotherapy, is not well defined. METHODS We developed 50 distinct questions with answers in available guidelines surrounding 10 irAE categories and queried two AI chatbots (ChatGPT and Bard), along with an additional 20 patient-specific scenarios. Experts in irAE management scored answers for accuracy and completeness using a Likert scale ranging from 1 (least accurate/complete) to 4 (most accurate/complete). Answers across categories and across engines were compared. RESULTS Overall, both engines scored highly for accuracy (mean scores for ChatGPT and Bard were 3.87 vs 3.5, p<0.01) and completeness (3.83 vs 3.46, p<0.01). Scores of 1-2 (completely or mostly inaccurate or incomplete) were particularly rare for ChatGPT (6/800 answer-ratings, 0.75%). Of the 50 questions, all eight physician raters gave ChatGPT a rating of 4 (fully accurate or complete) for 22 questions (for accuracy) and 16 questions (for completeness). In the 20 patient scenarios, the average accuracy score was 3.725 (median 4) and the average completeness score was 3.61 (median 4). CONCLUSIONS AI chatbots provided largely accurate and complete information regarding irAEs, and wildly inaccurate information ("hallucinations") was uncommon. However, until accuracy and completeness increase further, appropriate guidelines remain the gold standard to follow.
Affiliation(s)
- Hannah Burnette
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Aliyah Pabani
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Mitchell S von Itzstein
- Harold C Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
| | - Benjamin Switzer
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
| | - Run Fan
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Fei Ye
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Igor Puzanov
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
| | | | - Paolo A Ascierto
- Department of Melanoma, Cancer Immunotherapy and Development Therapeutics, Istituto Nazionale Tumori IRCCS Fondazione Pascale, Napoli, Campania, Italy
| | - David E Gerber
- Harold C Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
| | - Marc S Ernstoff
- ImmunoOncology Branch (IOB), Developmental Therapeutics Program, Cancer Therapy and Diagnosis Division, National Cancer Institute (NCI), National Institutes of Health, Bethesda, Maryland, USA
| | - Douglas B Johnson
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
18
Choudhury A, Shamszare H. The Impact of Performance Expectancy, Workload, Risk, and Satisfaction on Trust in ChatGPT: Cross-Sectional Survey Analysis. JMIR Hum Factors 2024; 11:e55399. [PMID: 38801658 PMCID: PMC11165287 DOI: 10.2196/55399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2023] [Revised: 03/25/2024] [Accepted: 04/07/2024] [Indexed: 05/29/2024] Open
Abstract
BACKGROUND ChatGPT (OpenAI) is a powerful tool for a wide range of tasks, from entertainment and creativity to health care queries. There are potential risks and benefits associated with this technology. In the discourse concerning the deployment of ChatGPT and similar large language models, it is sensible to recommend their use primarily for tasks a human user can execute accurately. As we transition into the subsequent phase of ChatGPT deployment, establishing realistic performance expectations and understanding users' perceptions of risk associated with its use are crucial in determining the successful integration of this artificial intelligence (AI) technology. OBJECTIVE The aim of the study is to explore how perceived workload, satisfaction, performance expectancy, and risk-benefit perception influence users' trust in ChatGPT. METHODS A semistructured, web-based survey was conducted with 607 adults in the United States who actively use ChatGPT. The survey questions were adapted from constructs used in various models and theories such as the technology acceptance model, the theory of planned behavior, the unified theory of acceptance and use of technology, and research on trust and security in digital environments. To test our hypotheses and structural model, we used the partial least squares structural equation modeling method, a widely used approach for multivariate analysis. RESULTS A total of 607 people responded to our survey. A significant portion of the participants held at least a high school diploma (n=204, 33.6%), and the majority had a bachelor's degree (n=262, 43.1%). The primary motivations for participants to use ChatGPT were for acquiring information (n=219, 36.1%), amusement (n=203, 33.4%), and addressing problems (n=135, 22.2%). Some participants used it for health-related inquiries (n=44, 7.2%), while a few others (n=6, 1%) used it for miscellaneous activities such as brainstorming, grammar verification, and blog content creation. 
Our model explained 64.6% of the variance in trust. Our analysis indicated a significant relationship between (1) workload and satisfaction, (2) trust and satisfaction, (3) performance expectations and trust, and (4) risk-benefit perception and trust. CONCLUSIONS The findings underscore the importance of ensuring user-friendly design and functionality in AI-based applications to reduce workload and enhance user satisfaction, thereby increasing user trust. Future research should further explore the relationship between risk-benefit perception and trust in the context of AI chatbots.
Affiliation(s)
- Avishek Choudhury
- Industrial and Management Systems Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, WV, United States
| | - Hamid Shamszare
- Industrial and Management Systems Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, WV, United States
19
Ye F, Zhang H, Luo X, Wu T, Yang Q, Shi Z. Evaluating ChatGPT's Performance in Answering Questions About Allergic Rhinitis and Chronic Rhinosinusitis. Otolaryngol Head Neck Surg 2024. [PMID: 38796735 DOI: 10.1002/ohn.832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Revised: 04/30/2024] [Accepted: 05/04/2024] [Indexed: 05/28/2024]
Abstract
OBJECTIVE This study aims to evaluate the accuracy of ChatGPT in answering allergic rhinitis (AR) and chronic rhinosinusitis (CRS) related questions. STUDY DESIGN This is a cross-sectional study. SETTING Each question was input as a separate, independent prompt. METHODS Responses to AR (n = 189) and CRS (n = 242) related questions, generated by GPT-3.5 and GPT-4, were independently graded for accuracy by 2 senior rhinology professors, with disagreements adjudicated by a third reviewer. RESULTS Overall, ChatGPT demonstrated satisfactory performance, accurately answering over 80% of questions across all categories. Specifically, GPT-4.0's accuracy in responding to AR-related questions significantly exceeded that of GPT-3.5, but this distinction was not evident for CRS-related questions. Patient-originated questions were answered with significantly higher accuracy than doctor-originated questions when GPT-4.0 responded to AR-related questions. This discrepancy was not observed with GPT-3.5 or in the context of CRS-related questions. Across different types of content, ChatGPT excelled in covering basic knowledge, prevention, and emotion for AR and CRS. However, it experienced challenges when addressing questions about recent advancements, a trend consistent across both the GPT-3.5 and GPT-4.0 iterations. Importantly, the accuracy of responses remained unaffected when questions were posed in Chinese. CONCLUSION Our findings suggest ChatGPT's capability to convey accurate information to AR and CRS patients, and offer insights into its performance across various domains, guiding its utilization and improvement.
Affiliation(s)
- Fan Ye
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
| | - He Zhang
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
| | - Xin Luo
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
| | - Tong Wu
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
| | - Qintai Yang
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Naso-Orbital-Maxilla and Skull Base Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Airway Inflammatory Disease Research and Innovative Technology Translation, Guangzhou, China
| | - Zhaohui Shi
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Naso-Orbital-Maxilla and Skull Base Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Airway Inflammatory Disease Research and Innovative Technology Translation, Guangzhou, China
20
Buldur M, Sezer B. Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam. BMC Oral Health 2024; 24:605. [PMID: 38789962 PMCID: PMC11127407 DOI: 10.1186/s12903-024-04358-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Accepted: 05/09/2024] [Indexed: 05/26/2024] Open
Abstract
BACKGROUND The use of artificial intelligence in the field of health sciences is becoming widespread. It is known that patients benefit from artificial intelligence applications on various health issues, especially after the pandemic period. One of the most important issues in this regard is the accuracy of the information provided by artificial intelligence applications. OBJECTIVE The purpose of this study was to pose the frequently asked questions about dental amalgam, as determined by the United States Food and Drug Administration (FDA), to Chat Generative Pre-trained Transformer version 4 (ChatGPT-4), one of these information resources, and to compare the content of the answers given by the application with the answers of the FDA. METHODS The questions were directed to ChatGPT-4 on May 8th and May 16th, 2023, and the responses were recorded and compared at the word and meaning levels using ChatGPT. The answers from the FDA webpage were also recorded. The responses were compared for content similarity in "Main Idea", "Quality Analysis", "Common Ideas", and "Inconsistent Ideas" between ChatGPT-4's responses and the FDA's responses. RESULTS ChatGPT-4 provided similar responses at one-week intervals. In comparison with FDA guidance, it provided answers with similar information content to frequently asked questions. However, although there were some similarities in the general aspects of the recommendation regarding amalgam removal, the two texts are not the same, and they offered different perspectives on the replacement of fillings. CONCLUSIONS The findings of this study indicate that ChatGPT-4, an artificial intelligence-based application, encompasses current and accurate information regarding dental amalgam and its removal, providing it to individuals seeking access to such information. Nevertheless, we believe that numerous studies are required to assess the validity and reliability of ChatGPT-4 across diverse subjects.
Affiliation(s)
- Mehmet Buldur
- Department of Restorative Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
| | - Berkant Sezer
- Department of Pediatric Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye.
21
Sireci F, Lorusso F, Immordino A, Centineo M, Gerardi I, Patti G, Rusignuolo S, Manzella R, Gallina S, Dispenza F. ChatGPT as a New Tool to Select a Biological for Chronic Rhino Sinusitis with Polyps, "Caution Advised" or "Distant Reality"? J Pers Med 2024; 14:563. [PMID: 38929784 PMCID: PMC11204527 DOI: 10.3390/jpm14060563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2024] [Revised: 05/07/2024] [Accepted: 05/23/2024] [Indexed: 06/28/2024] Open
Abstract
ChatGPT is an advanced language model developed by OpenAI, designed for natural language understanding and generation. It employs deep learning technology to comprehend and generate human-like text, making it versatile for various applications. The aim of this study was to assess the alignment between the Rhinology Board's indications and ChatGPT's recommendations for treating patients with chronic rhinosinusitis with nasal polyps (CRSwNP) using biologic therapy. An observational cohort study involving 72 patients was conducted to evaluate various parameters of type 2 inflammation and assess the concordance in therapy choices between ChatGPT and the Rhinology Board. The observed results highlight the potential of ChatGPT in guiding optimal biological therapy selection, with a concordance of 68% and a Kappa coefficient of 0.69 (95% CI [0.50; 0.75]). Concordance was, respectively, 79.6% for dupilumab, 20% for mepolizumab, and 0% for omalizumab. This research represents a significant advancement in managing CRSwNP, a condition lacking robust biomarkers, and provides valuable insight into the potential of AI, specifically ChatGPT, to assist otolaryngologists in determining the optimal biological therapy for personalized patient care. Our results support implementing this tool to aid clinicians effectively.
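The agreement statistic reported above (Kappa = 0.69) is Cohen's kappa, which discounts the agreement two raters would reach by chance. A minimal pure-Python sketch, with invented therapy labels standing in for the study's 72 cases:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases."""
    n = len(rater_a)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical therapy choices for 10 patients (illustration only, not the study's data).
board = ["dupilumab"] * 6 + ["mepolizumab"] * 3 + ["omalizumab"]
model = ["dupilumab", "dupilumab", "dupilumab", "dupilumab", "dupilumab",
         "mepolizumab", "mepolizumab", "mepolizumab", "dupilumab", "dupilumab"]
```

By the common Landis-Koch convention, values of 0.61-0.80 are read as substantial agreement, which matches how a kappa of 0.69 would ordinarily be interpreted.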
Affiliation(s)
- Federico Sireci
- Otorhinolaryngology Section, Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Francesco Lorusso
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Angelo Immordino
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Ignazio Gerardi
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Gaetano Patti
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Simona Rusignuolo
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Riccardo Manzella
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Salvatore Gallina
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
- Francesco Dispenza
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnostics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy
22
Saygin M, Bekmezci M, Dinçer E. Artificial Intelligence Model ChatGPT-4: Entrepreneur Candidate and Entrepreneurship Example. F1000Res 2024; 13:308. [PMID: 38845823 PMCID: PMC11153998 DOI: 10.12688/f1000research.144671.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 05/20/2024] [Indexed: 06/09/2024] Open
Abstract
Background Although artificial intelligence technologies are still in their infancy, they inspire both hope and anxiety about the future. This research examines ChatGPT-4, one of the best-known artificial intelligence applications and one claimed to be capable of self-learning, in the context of business establishment processes. Methods The assessment questions in the Entrepreneurship Handbook, published as open access by the Small and Medium Enterprises Development Organization of Turkey to guide entrepreneurial processes and shape the perception of entrepreneurship in Turkey, were posed to ChatGPT-4 and analysed in three stages. The model's approach to solving the questions and the answers it produced could then be compared with the entrepreneurship literature. Results ChatGPT-4, itself a striking example of entrepreneurship, answered the questions posed across the 16 modules of the entrepreneurship handbook in an original way and with deep analysis. Conclusion The model also proved quite creative in developing new alternatives to the correct answers given in the handbook. The original aspect of this research is that it is among the first studies on artificial intelligence and entrepreneurship in the literature.
23
Tripathi S, Sukumaran R, Cook TS. Efficient healthcare with large language models: optimizing clinical workflow and enhancing patient care. J Am Med Inform Assoc 2024; 31:1436-1440. [PMID: 38273739 PMCID: PMC11105142 DOI: 10.1093/jamia/ocad258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 12/01/2023] [Accepted: 12/29/2023] [Indexed: 01/27/2024] Open
Abstract
PURPOSE This article explores the potential of large language models (LLMs) to automate administrative tasks in healthcare, alleviating the burden on clinicians caused by electronic medical records. POTENTIAL LLMs offer opportunities in clinical documentation, prior authorization, patient education, and access to care. They can personalize patient scheduling, improve documentation accuracy, streamline insurance prior authorization, increase patient engagement, and address barriers to healthcare access. CAUTION However, integrating LLMs requires careful attention to security and privacy concerns, protecting patient data, and complying with regulations like the Health Insurance Portability and Accountability Act (HIPAA). It is crucial to acknowledge that LLMs should supplement, not replace, the human connection and care provided by healthcare professionals. CONCLUSION By prudently utilizing LLMs alongside human expertise, healthcare organizations can improve patient care and outcomes. Implementation should be approached with caution and consideration to ensure the safe and effective use of LLMs in the clinical setting.
Affiliation(s)
- Satvik Tripathi
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Rithvik Sukumaran
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Tessa S Cook
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
24
Dimitriadis F, Tsigkriki L, Charisopoulou D, Tsaousidis A, Siarkos M, Koulaouzidis G. Letter Re: Response to Luan et al. Angiology 2024:33197241256685. [PMID: 38769649 DOI: 10.1177/00033197241256685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/22/2024]
Affiliation(s)
- Fotis Dimitriadis
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
- Lamprini Tsigkriki
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
- Adam Tsaousidis
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
- Michail Siarkos
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
- George Koulaouzidis
- Department of Biochemical Sciences, Pomeranian Medical University, Szczecin, Poland
25
Simsek O, Manteghinejad A, Vossough A. A Comparative Review of Imaging Journal Policies for Use of AI in Manuscript Generation. Acad Radiol 2024:S1076-6332(24)00290-3. [PMID: 38772797 DOI: 10.1016/j.acra.2024.05.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Revised: 05/06/2024] [Accepted: 05/06/2024] [Indexed: 05/23/2024]
Abstract
RATIONALE AND OBJECTIVES Artificial intelligence (AI) technologies are evolving rapidly, offering new advances almost daily, including various tools for manuscript generation and modification. These potentially time- and effort-saving solutions, however, carry risks of bias, factual error, and plagiarism. Some journals have started to update their author guidelines to address AI-generated or AI-assisted manuscripts. The purpose of this paper is to evaluate radiology journals' author guidelines for AI use policies and to compare scientometric data between journals with and without explicit AI use policies. MATERIALS AND METHODS This cross-sectional study included 112 MEDLINE-indexed imaging journals and evaluated their author guidelines between 13 October 2023 and 16 October 2023. Journals were identified based on subject matter and association with a radiological society. The author guidelines and editorial policies were evaluated for policies on the use of AI in manuscript preparation and for specific policies on AI-generated images. We assessed the existence of an AI usage policy among subspecialty imaging journals. The scientometric scores of journals with and without AI use policies were compared using the Wilcoxon signed-rank test. RESULTS Among the 112 MEDLINE-indexed radiology journals, 80 were affiliated with an imaging society and 32 were not. Of the 112 imaging journals, 69 (61.6%) had an AI usage policy, and 40 of those 69 (57.9%) mentioned a specific policy on AI-generated figures. CiteScore (4.9 vs 4, p = 0.023), Scientific Journal Ranking (0.75 vs 0.54, p = 0.010), and Journal Citation Indicator (0.77 vs 0.62, p = 0.038) were significantly higher in journals with an AI policy, while Source Normalized Impact per Paper trended higher without reaching significance (1.12 vs 0.83, p = 0.06).
CONCLUSION Most imaging journals provide guidelines for AI-generated content, but a substantial number still lack AI usage policies or do not require disclosure of non-human-generated manuscripts. Journals with an established AI policy had higher citation and impact scores.
Affiliation(s)
- Onur Simsek
- Division of Neuroradiology, Department of Radiology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Amirreza Manteghinejad
- Division of Neuroradiology, Department of Radiology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Arastoo Vossough
- Division of Neuroradiology, Department of Radiology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Department of Radiology, Children's Hospital of Philadelphia, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
26
Rau S, Rau A, Nattenmüller J, Fink A, Bamberg F, Reisert M, Russe MF. A retrieval-augmented chatbot based on GPT-4 provides appropriate differential diagnosis in gastrointestinal radiology: a proof of concept study. Eur Radiol Exp 2024; 8:60. [PMID: 38755410 PMCID: PMC11098977 DOI: 10.1186/s41747-024-00457-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2024] [Accepted: 03/12/2024] [Indexed: 05/18/2024] Open
Abstract
BACKGROUND We investigated the potential of an imaging-aware GPT-4-based chatbot in providing diagnoses based on imaging descriptions of abdominal pathologies. METHODS Utilizing zero-shot learning via the LlamaIndex framework, GPT-4 was enhanced using the 96 documents from the Radiographics Top 10 Reading List on gastrointestinal imaging, creating a gastrointestinal imaging-aware chatbot (GIA-CB). To assess its diagnostic capability, 50 cases on a variety of abdominal pathologies were created, comprising radiological findings in fluoroscopy, MRI, and CT. We compared the GIA-CB to the generic GPT-4 chatbot (g-CB) in providing the primary and 2 additional differential diagnoses, using interpretations from senior-level radiologists as ground truth. The trustworthiness of the GIA-CB was evaluated by investigating the source documents as provided by the knowledge-retrieval mechanism. Mann-Whitney U test was employed. RESULTS The GIA-CB demonstrated a high capability to identify the most appropriate differential diagnosis in 39/50 cases (78%), significantly surpassing the g-CB in 27/50 cases (54%) (p = 0.006). Notably, the GIA-CB offered the primary differential in the top 3 differential diagnoses in 45/50 cases (90%) versus g-CB with 37/50 cases (74%) (p = 0.022) and always with appropriate explanations. The median response time was 29.8 s for GIA-CB and 15.7 s for g-CB, and the mean cost per case was $0.15 and $0.02, respectively. CONCLUSIONS The GIA-CB not only provided an accurate diagnosis for gastrointestinal pathologies, but also direct access to source documents, providing insight into the decision-making process, a step towards trustworthy and explainable AI. Integrating context-specific data into AI models can support evidence-based clinical decision-making. RELEVANCE STATEMENT A context-aware GPT-4 chatbot demonstrates high accuracy in providing differential diagnoses based on imaging descriptions, surpassing the generic GPT-4. 
It provided formulated rationale and source excerpts supporting the diagnoses, thus enhancing trustworthy decision-support. KEY POINTS • Knowledge retrieval enhances differential diagnoses in a gastrointestinal imaging-aware chatbot (GIA-CB). • GIA-CB outperformed the generic counterpart, providing formulated rationale and source excerpts. • GIA-CB has the potential to pave the way for AI-assisted decision support systems.
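The knowledge-retrieval mechanism described above can be sketched in miniature: score each reference document against the query, keep the top match, and prepend it to the prompt. The bag-of-words cosine below is a deliberately crude stand-in for the embedding-based retrieval that frameworks such as LlamaIndex perform, and the two-document corpus and query are invented for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words vector; a stand-in for a learned text embedding.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Prepend retrieved context so the model answers from source documents
    # that can later be shown to the user as supporting evidence.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nFindings: {query}\nMost likely diagnosis?"

# Invented two-document mini-corpus standing in for the 96 reading-list documents.
corpus = [
    "pneumatosis intestinalis shows gas within the bowel wall on ct",
    "hepatic steatosis shows diffuse low attenuation of the liver",
]
```

Returning the retrieved passages alongside the answer is what gives this design its explainability: the user can check the source excerpt rather than trust the model's output alone.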
Affiliation(s)
- Stephan Rau
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg im Breisgau, Germany
- Alexander Rau
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg im Breisgau, Germany
- Department of Neuroradiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, Hugstetter Str. 55, 79106, Freiburg im Breisgau, Germany
- Johanna Nattenmüller
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg im Breisgau, Germany
- Anna Fink
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg im Breisgau, Germany
- Fabian Bamberg
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg im Breisgau, Germany
- Marco Reisert
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg im Breisgau, Germany
- Maximilian F Russe
- Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, University of Freiburg, 79106, Freiburg im Breisgau, Germany
27
Bhatia A, Khalvati F, Ertl-Wagner BB. Artificial Intelligence in the Future Landscape of Pediatric Neuroradiology: Opportunities and Challenges. AJNR Am J Neuroradiol 2024; 45:549-553. [PMID: 38176730 DOI: 10.3174/ajnr.a8086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Accepted: 10/17/2023] [Indexed: 01/06/2024]
Abstract
This paper will review how artificial intelligence (AI) will play an increasingly important role in pediatric neuroradiology in the future. A safe, transparent, and human-centric AI is needed to tackle the quadruple aim of improved health outcomes, enhanced patient and family experience, reduced costs, and improved well-being of the healthcare team in pediatric neuroradiology. Equity, diversity and inclusion, data safety, and access to care will need to always be considered. In the next decade, AI algorithms are expected to play an increasingly important role in access to care, workflow management, abnormality detection, classification, response prediction, prognostication, report generation, as well as in the patient and family experience in pediatric neuroradiology. Also, AI algorithms will likely play a role in recognizing and flagging rare diseases and in pattern recognition to identify previously unknown disorders. While AI algorithms will play an important role, humans will not only need to be in the loop, but in the center of pediatric neuroimaging. AI development and deployment will need to be closely watched and monitored by experts in the field. Patient and data safety need to be at the forefront, and the risks of a dependency on technology will need to be contained. The applications and implications of AI in pediatric neuroradiology will differ from adult neuroradiology.
Affiliation(s)
- Aashim Bhatia
- Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Farzad Khalvati
- Hospital for Sick Children, Toronto, Ontario, Canada
28
D'Anna G, Van Cauter S, Thurnher M, Van Goethem J, Haller S. Can large language models pass official high-grade exams of the European Society of Neuroradiology courses? A direct comparison between OpenAI chatGPT 3.5, OpenAI GPT4 and Google Bard. Neuroradiology 2024:10.1007/s00234-024-03371-6. [PMID: 38705899 DOI: 10.1007/s00234-024-03371-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Accepted: 04/30/2024] [Indexed: 05/07/2024]
Abstract
We compared different LLMs, notably ChatGPT 3.5, GPT-4, and Google Bard, and tested whether their performance differs across subspecialty domains by executing examinations from four courses of the European Society of Neuroradiology (ESNR): anatomy/embryology, neuro-oncology, head and neck, and pediatrics. Written ESNR exams were used as input data: anatomy/embryology (30 questions), neuro-oncology (50 questions), head and neck (50 questions), and pediatrics (50 questions). All exams together, and each exam separately, were introduced to the three LLMs: ChatGPT 3.5, GPT-4, and Google Bard. Statistical analyses included a group-wise Friedman test followed by pair-wise Wilcoxon tests with multiple-comparison correction. Overall, there was a significant difference between the three LLMs (p < 0.0001), with GPT-4 having the highest accuracy (70%), followed by ChatGPT 3.5 (54%) and Google Bard (36%). The pair-wise comparisons showed significant differences between ChatGPT 3.5 vs GPT-4 (p < 0.0001), ChatGPT 3.5 vs Bard (p < 0.0023), and GPT-4 vs Bard (p < 0.0001). Analyses per subspecialty showed the largest gap between the best LLM (GPT-4, 70%) and the worst (Google Bard, 24%) in the head and neck exam, while the difference was least pronounced in neuro-oncology (GPT-4, 62% vs Google Bard, 48%). We observed significant differences in the performance of the three LLMs on official exams organized by the ESNR. Overall, GPT-4 performed best and Google Bard worst, with the difference varying by subspecialty and most pronounced in head and neck.
Affiliation(s)
- Gennaro D'Anna
- Neuroimaging Unit, ASST Ovest Milanese, Legnano, Milan, Italy
- Sofie Van Cauter
- Department of Medical Imaging, Ziekenhuis Oost-Limburg, Genk, Belgium
- Department of Medicine and Life Sciences, Hasselt University, Hasselt, Belgium
- Majda Thurnher
- Department for Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, Vienna, Austria
- Johan Van Goethem
- Department of Medical and Molecular Imaging, VITAZ, Sint-Niklaas, Belgium
- Department of Radiology, University Hospital Antwerp, Antwerp, Belgium
- Sven Haller
- CIMC-Centre d'Imagerie Médicale de Cornavin, Geneva, Switzerland
- Department of Surgical Sciences, Radiology, Uppsala University, Uppsala, Sweden
- Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Department of Radiology, Beijing Tiantan Hospital, Capital Medical University, Beijing, People's Republic of China
29
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. [PMID: 38172581 PMCID: PMC11076576 DOI: 10.1038/s41433-023-02915-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Revised: 11/23/2023] [Accepted: 12/20/2023] [Indexed: 01/05/2024] Open
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges in incorporating LLMs into research without allowing "AI-plagiarism" or publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been developed in the past few years. We discuss recent literature evaluating the role of these language models in medicine with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are being generated rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
- Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
30
Tepe M, Emekli E. Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy. Cureus 2024; 16:e59960. [PMID: 38726360 PMCID: PMC11080394 DOI: 10.7759/cureus.59960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/09/2024] [Indexed: 05/12/2024] Open
Abstract
Background Large language models (LLMs), such as ChatGPT-4, Gemini, and Microsoft Copilot, have been instrumental in various domains, including healthcare, where they enhance health literacy and aid in patient decision-making. Given the complexities involved in breast imaging procedures, accurate and comprehensible information is vital for patient engagement and compliance. This study aims to evaluate the readability and accuracy of the information provided by three prominent LLMs, ChatGPT-4, Gemini, and Microsoft Copilot, in response to frequently asked questions in breast imaging, assessing their potential to improve patient understanding and facilitate healthcare communication. Methodology We collected the most common questions on breast imaging from clinical practice and posed them to the LLMs. Responses were analyzed for readability using the Flesch Reading Ease and Flesch-Kincaid Grade Level tests, and for accuracy using a radiologist-developed Likert-type scale. Results The study found significant variations among the LLMs. Gemini and Microsoft Copilot scored higher on readability scales (p < 0.001), indicating their responses were easier to understand. In contrast, ChatGPT-4 demonstrated greater accuracy in its responses (p < 0.001). Conclusions While LLMs such as ChatGPT-4 show promise in providing accurate responses, readability issues may limit their utility in patient education. Conversely, Gemini and Microsoft Copilot, despite being less accurate, are more accessible to a broader patient audience. Ongoing adjustments and evaluations of these models are essential to ensure they meet the diverse needs of patients, emphasizing the need for continuous improvement and oversight in the deployment of artificial intelligence technologies in healthcare.
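The two readability measures used in this study are closed-form formulas over sentence length and syllable density. A rough sketch with a naive vowel-group syllable counter (established tools use dictionaries and hyphenation rules, so absolute scores will differ, but relative rankings usually hold):

```python
import re

def syllables(word: str) -> int:
    # Crude heuristic: count vowel groups, with a minimum of one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syl = sum(syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syl / len(words)              # average syllables per word
    flesch_reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    flesch_kincaid_grade = 0.39 * wps + 11.8 * spw - 15.59
    return flesch_reading_ease, flesch_kincaid_grade

# Invented sample answers: a plain-language one and a jargon-heavy one.
plain = "The scan is safe. It does not hurt."
dense = ("Mammographic parenchymal asymmetries occasionally necessitate "
         "supplementary ultrasonographic evaluation.")
```

Higher Reading Ease and lower Grade Level both indicate easier text, which is why a model can score well on one study's readability axis while losing on the accuracy axis.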
Affiliation(s)
- Murat Tepe
- Radiology, Mediclinic City Hospital, Dubai, ARE
- Emre Emekli
- Radiology, Eskişehir Osmangazi University Health Practice and Research Hospital, Eskişehir, TUR
31
Wessel D, Pogrebnyakov N. Using Social Media as a Source of Real-World Data for Pharmaceutical Drug Development and Regulatory Decision Making. Drug Saf 2024; 47:495-511. [PMID: 38446405 PMCID: PMC11018692 DOI: 10.1007/s40264-024-01409-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/07/2024] [Indexed: 03/07/2024]
Abstract
INTRODUCTION While pharmaceutical companies aim to leverage real-world data (RWD) to bridge the gap between clinical drug development and real-world patient outcomes, extant research has mainly focused on the use of social media in a post-approval safety-surveillance setting. Recent regulatory and technological developments indicate that social media may serve as a rich source to expand the evidence base to pre-approval and drug development activities. However, use cases related to drug development have been largely omitted, thereby missing some of the benefits of RWD. In addition, an applied end-to-end understanding of RWD rooted in both industry and regulations is lacking. OBJECTIVE We investigated how social media can be used as a source of RWD to support regulatory decision making and drug development in the pharmaceutical industry, specifically exploring the data pipeline and examining how social-media-derived RWD can align with regulatory guidance from the US Food and Drug Administration and with industry needs. METHODS A machine learning pipeline was developed to extract patient insights related to anticoagulants from X (Twitter) data. These findings were then analysed from an industry perspective and complemented by interviews with professionals from a pharmaceutical company. RESULTS The analysis reveals several use cases where RWD derived from social media can be beneficial, particularly in generating hypotheses around patient and therapeutic area needs. We also note certain limitations of social media data, particularly around inferring causality. CONCLUSIONS Social media displays considerable potential as a source of RWD for guiding pharmaceutical drug development and pre-approval efforts. Although further regulatory guidance on the use of social media for RWD is needed to encourage its use, regulatory and technological developments suggest it warrants at least exploratory use for drug development.
Affiliation(s)
- Didrik Wessel
- Copenhagen Business School, Frederiksberg, Denmark
- Nørrebrogade 18A 3TH, 2200, Copenhagen N, Denmark
32
Dağci M, Çam F, Dost A. Reliability and Quality of the Nursing Care Planning Texts Generated by ChatGPT. Nurse Educ 2024; 49:E109-E114. [PMID: 37994523 DOI: 10.1097/nne.0000000000001566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2023]
Abstract
BACKGROUND Research on ChatGPT-generated nursing care planning texts is critical for enhancing nursing education through innovative and accessible learning methods and for improving reliability and quality. PURPOSE The aim of the study was to examine the quality, authenticity, and reliability of nursing care planning texts produced using ChatGPT. METHODS The study sample comprised 40 texts generated by ChatGPT for selected nursing diagnoses included in NANDA 2021-2023. The texts were evaluated using a descriptive criteria form and the DISCERN tool for evaluating health information. RESULTS The mean DISCERN total score of the texts was 45.93 ± 4.72. All texts had a moderate level of reliability, and 97.5% of them had a moderate information-quality subscale score. A statistically significant relationship was found between the number of accessible references and both the reliability (r = 0.408) and quality subscale (r = 0.379) scores of the texts (P < .05). CONCLUSION ChatGPT-generated texts exhibited moderate reliability, quality of nursing care information, and overall quality, despite low similarity rates.
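The reported r values are plain correlation coefficients between reference counts and DISCERN scores. A minimal Pearson implementation, with invented scores for illustration (the study's per-text data are not reproduced here, and the study does not state which correlation coefficient it used):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented example: accessible-reference counts vs DISCERN reliability scores.
references = [0, 1, 1, 2, 3, 4, 5, 6]
reliability = [38, 40, 42, 41, 45, 47, 46, 52]
```

A positive r here would mean texts citing more accessible references tend to score as more reliable, which is the direction of the association the study reports.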
Affiliation(s)
- Mahmut Dağci
- Department of Nursing, Faculty of Health Sciences, Bezmialem Vakif University, Istanbul, Turkey
33
Yilmaz Muluk S. Enhancing Musculoskeletal Injection Safety: Evaluating Checklists Generated by Artificial Intelligence and Revising the Preformed Checklist. Cureus 2024; 16:e59708. [PMID: 38841023 PMCID: PMC11150897 DOI: 10.7759/cureus.59708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/02/2024] [Indexed: 06/07/2024] Open
Abstract
Background Musculoskeletal disorders are a significant global health issue, necessitating advanced management strategies such as intra-articular and extra-articular injections to alleviate pain, inflammation, and mobility challenges. As the adoption of these interventions by physicians grows, the importance of robust safety protocols becomes paramount. This study evaluates the effectiveness of conversational artificial intelligence (AI), particularly versions 3.5 and 4 of the Chat Generative Pre-trained Transformer (ChatGPT), in creating patient safety checklists for musculoskeletal injections to enhance the preparation of safety documentation. Methodology A quantitative analysis was conducted to evaluate AI-generated safety checklists against a preformed checklist adapted from reputable medical sources. Adherence of the generated checklists to the preformed checklist was calculated and classified. The Wilcoxon signed-rank test was used to assess the performance differences between ChatGPT versions 3.5 and 4. Results ChatGPT-4 showed superior adherence to the preformed checklist compared to ChatGPT-3.5, with both versions classified as very good in safety protocol creation. Although no significant differences were present in the sign-in and sign-out parts of the checklists of both versions, ChatGPT-4 had significantly higher scores in the procedure planning part (p = 0.007), and its overall performance was also higher (p < 0.001). Subsequently, the preformed checklist was revised to incorporate new contributions from ChatGPT. Conclusions ChatGPT, especially version 4, proved effective in generating patient safety checklists for musculoskeletal injections, highlighting the potential of AI to streamline clinical practices. Further enhancements are necessary to fully meet medical standards.
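The paired comparison above uses the Wilcoxon signed-rank test. A minimal sketch of its core statistic, computed on invented per-item checklist scores (the study's data and scoring scale are not reproduced here), looks like this; a p-value would then come from the W distribution via a table or statistical software:

```python
# Sketch: Wilcoxon signed-rank W statistics for paired scores, e.g. per-item
# checklist adherence from ChatGPT-3.5 vs ChatGPT-4. Scores are made up.

def wilcoxon_w(x, y):
    """Return (W+, W-): rank sums of positive and negative differences d = y - x.
    Zero differences are dropped; ties in |d| receive average ranks."""
    diffs = [b - a for a, b in zip(x, y) if b - a != 0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average rank across the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical adherence scores for the same checklist items.
gpt35 = [3, 4, 2, 5, 3, 4]
gpt4 = [4, 4, 4, 5, 4, 5]
print(wilcoxon_w(gpt35, gpt4))
```

The smaller of W+ and W- is the test statistic; strongly one-sided rank sums, as in this toy example, correspond to the kind of significant difference the study reports for procedure planning.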
34
Tippareddy C, Faraji N, Awan OA. The Application of ChatGPT to Enhance Medical Education. Acad Radiol 2024; 31:2185-2187. [PMID: 38724132 DOI: 10.1016/j.acra.2023.04.015]
Affiliation(s)
- Charit Tippareddy
- University Hospitals Cleveland Medical Center, Cleveland, Ohio (C.T., N.F.)
- Navid Faraji
- University Hospitals Cleveland Medical Center, Cleveland, Ohio (C.T., N.F.)
- Omer A Awan
- University of Maryland School of Medicine, 655 W Baltimore St, Baltimore, MD 21201 (O.A.A.).

35
Pinto DS, Noronha SM, Saigal G, Quencer RM. Comparison of an AI-Generated Case Report With a Human-Written Case Report: Practical Considerations for AI-Assisted Medical Writing. Cureus 2024; 16:e60461. [PMID: 38883028 PMCID: PMC11179998 DOI: 10.7759/cureus.60461]
Abstract
INTRODUCTION The rise of ChatGPT has recently caused consternation in the medical world. While it has been utilized to write manuscripts, only a few studies have evaluated the quality of manuscripts generated by artificial intelligence (AI). OBJECTIVE We evaluate the ability of ChatGPT to write a case report when provided with a framework, and we offer practical considerations for manuscript writing using AI. METHODS We compared a manuscript written by a blinded human author (10 years of medical experience) with a manuscript written by ChatGPT on a rare presentation of a common disease, using multiple iterations of the manuscript generation request to derive the best ChatGPT output. PARTICIPANTS, OUTCOMES, AND MEASURES Twenty-two human reviewers compared the manuscripts using parameters that characterize human writing and relevant standard manuscript assessment criteria, viz., the scholarly impact quotient (SIQ). We also compared the manuscripts using the "average perplexity score" (APS), "burstiness score" (BS), and "highest perplexity of a sentence" (GPTZero parameters for detecting AI-generated content). RESULTS The human manuscript had a significantly higher quality of presentation and more nuanced writing (p<0.05). Both manuscripts had a logical flow. Twelve of 22 reviewers correctly identified the AI-generated manuscript (p<0.05), but 4 of 22 wrongly identified the human-written manuscript as AI-generated. GPTZero software erroneously flagged four sentences of the human-written manuscript as AI-generated. CONCLUSION Though AI showed an ability to highlight the novelty of the case report and project a logical flow comparable to the human manuscript, it could not outperform the human writer on all parameters: the human manuscript showed better quality of presentation and more nuanced writing. The practical considerations we provide for AI-assisted medical writing will help to better utilize AI in manuscript writing.
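GPTZero's actual scoring is proprietary, but the two quantities named above have simple textbook definitions: perplexity is the exponentiated average negative log-probability per token, and burstiness captures how much sentence-level perplexity varies. The sketch below illustrates those definitions on invented token probabilities; a real detector would obtain the probabilities from a language model:

```python
# Sketch of the ideas behind perplexity-based AI-text detection metrics.
# Probabilities are invented; this is not GPTZero's implementation.
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability per token."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

def burstiness(sentence_ppls):
    """Sample standard deviation of per-sentence perplexities:
    human writing tends to vary more than model output."""
    n = len(sentence_ppls)
    mean = sum(sentence_ppls) / n
    return (sum((x - mean) ** 2 for x in sentence_ppls) / (n - 1)) ** 0.5

# Hypothetical per-token probabilities for three sentences.
sentences = [[0.9, 0.8, 0.95], [0.2, 0.4, 0.3], [0.6, 0.7, 0.5]]
ppls = [perplexity(s) for s in sentences]
print([round(p, 2) for p in ppls], round(burstiness(ppls), 2))
```

Under this framing, "highest perplexity of a sentence" is simply `max(ppls)`; uniformly low perplexity and low burstiness are what flag a passage as machine-generated.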
Affiliation(s)
- Gaurav Saigal
- Radiology, University of Miami Miller School of Medicine, Miami, USA
- Robert M Quencer
- Radiology, University of Miami Miller School of Medicine, Miami, USA

36
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement From the ACR, CAR, ESR, RANZCR & RSNA. Can Assoc Radiol J 2024; 75:226-244. [PMID: 38251882 DOI: 10.1177/08465371231222229]
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- Data Science Institute, American College of Radiology, Reston, VA, USA
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, QC, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- American College of Radiology, Reston, VA, USA
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre, Adelaide, SA, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia

37
Simms RC. Work With ChatGPT, Not Against: 3 Teaching Strategies That Harness the Power of Artificial Intelligence. Nurse Educ 2024; 49:158-161. [PMID: 38502607 DOI: 10.1097/nne.0000000000001634]
Abstract
BACKGROUND Technological advances have expanded nursing education to include generative artificial intelligence (AI) tools such as ChatGPT. PROBLEM Generative AI tools challenge academic integrity, complicate the validation of information accuracy, and require strategies to ensure the credibility of AI-generated information. APPROACH This article presents a dual-purpose approach that integrates AI tools into prelicensure nursing education to enhance learning while promoting critical evaluation skills. Constructivist theories and Vygotsky's Zone of Proximal Development framework support this integration, with AI serving as a scaffold for developing critical thinking. OUTCOMES The approach involves practical activities in which students engage critically with AI-generated content, thereby reinforcing clinical judgment and preparing them for AI-prevalent health care environments. CONCLUSIONS Incorporating AI tools such as ChatGPT into nursing curricula represents a strategic educational advancement, equipping students with essential skills to navigate modern health care.
Affiliation(s)
- Rachel Cox Simms
- Author Affiliation: Assistant Professor, School of Nursing, MGH Institute of Health Professions, Boston, Massachusetts
38
Schlussel L, Samaan JS, Chan Y, Chang B, Yeo YH, Ng WH, Rezaie A. Evaluating the accuracy and reproducibility of ChatGPT-4 in answering patient questions related to small intestinal bacterial overgrowth. Artif Intell Gastroenterol 2024; 5:90503. [DOI: 10.35712/aig.v5.i1.90503]
Abstract
BACKGROUND Small intestinal bacterial overgrowth (SIBO) poses diagnostic and treatment challenges due to its complex management and evolving guidelines. Patients often seek online information related to their health, prompting interest in large language models, like GPT-4, as potential sources of patient education.
AIM To investigate ChatGPT-4's accuracy and reproducibility in responding to patient questions related to SIBO.
METHODS A total of 27 patient questions related to SIBO were curated from professional societies, Facebook groups, and Reddit threads. Each question was entered into GPT-4 twice, on separate days, to examine the reproducibility of accuracy across occasions. GPT-4-generated responses were independently evaluated for accuracy and reproducibility by two motility fellowship-trained gastroenterologists; a third senior fellowship-trained gastroenterologist resolved disagreements. The accuracy of responses was graded using the following scale: (1) Comprehensive; (2) Correct but inadequate; (3) Some correct and some incorrect; or (4) Completely incorrect.
RESULTS In evaluating GPT-4's effectiveness at answering SIBO-related questions, it provided responses with correct information to 18/27 (66.7%) of questions, with 16/27 (59.3%) of responses graded as comprehensive and 2/27 (7.4%) responses graded as correct but inadequate. The model provided responses with incorrect information to 9/27 (33.3%) of questions, with 4/27 (14.8%) of responses graded as completely incorrect and 5/27 (18.5%) of responses graded as mixed correct and incorrect data. Accuracy varied by question category, with questions related to “basic knowledge” achieving the highest proportion of comprehensive responses (90%) and no incorrect responses. On the other hand, the “treatment” related questions yielded the lowest proportion of comprehensive responses (33.3%) and highest percent of completely incorrect responses (33.3%). A total of 77.8% of questions yielded reproducible responses.
CONCLUSION Though GPT-4 shows promise as a supplementary tool for SIBO-related patient education, the model requires further refinement and validation in subsequent iterations prior to its integration into patient care.
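The tallies above come from grading two responses per question on a four-point scale and checking whether the two runs agree. A minimal sketch of that bookkeeping, with an invented grade list and one plausible operationalization of "reproducible" (both runs fall on the same side of the correct/incorrect divide), is:

```python
# Sketch of the grading tally described above. Grades are invented, and the
# reproducibility rule is an assumed operationalization, not the study's code.
from collections import Counter

CORRECT = {1, 2}  # 1 = comprehensive, 2 = correct but inadequate

def summarize(grades):
    """grades: list of (run1_grade, run2_grade) per question.
    Returns the first-run grade distribution and the reproducibility rate."""
    dist = Counter(g1 for g1, _ in grades)
    reproducible = sum(
        1 for g1, g2 in grades if (g1 in CORRECT) == (g2 in CORRECT)
    )
    return dist, reproducible / len(grades)

# Hypothetical grades for six questions, two GPT-4 runs each.
grades = [(1, 1), (1, 2), (2, 1), (3, 3), (4, 1), (1, 1)]
dist, repro_rate = summarize(grades)
print(dist, repro_rate)
```

Dividing `dist` counts by the number of questions reproduces proportions of the kind reported in the abstract (e.g. 16/27 comprehensive, 77.8% reproducible).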
Affiliation(s)
- Lauren Schlussel
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Jamil S Samaan
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Yin Chan
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Bianca Chang
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Yee Hui Yeo
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Wee Han Ng
- Bristol Medical School, University of Bristol, BS8 1TH, Bristol, United Kingdom
- Ali Rezaie
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Medically Associated Science and Technology Program, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States

39
Shiraishi M, Tomioka Y, Miyakuni A, Ishii S, Hori A, Park H, Ohba J, Okazaki M. Performance of ChatGPT in Answering Clinical Questions on the Practical Guideline of Blepharoptosis. Aesthetic Plast Surg 2024. [PMID: 38684536 DOI: 10.1007/s00266-024-04005-1]
Abstract
BACKGROUND ChatGPT is a free artificial intelligence (AI) language model developed and released by OpenAI in late 2022. This study aimed to evaluate how accurately ChatGPT answers clinical questions (CQs) on the Guideline for the Management of Blepharoptosis published by the American Society of Plastic Surgeons (ASPS) in 2022. METHODS CQs in the guideline were used as question sources in both English and Japanese. For each question, ChatGPT's output was assessed for the answer to the CQ, evidence quality, recommendation strength, reference match, and word count. We compared the performance of ChatGPT on each component between English and Japanese queries. RESULTS A total of 11 questions were included in the final analysis, and ChatGPT answered 61.3% of them correctly. ChatGPT demonstrated a higher accuracy rate for English CQs than for Japanese CQs (76.4% versus 46.4%; p = 0.004) and produced longer answers in English (123 words versus 35.9 words; p = 0.004). No statistical differences were noted for evidence quality, recommendation strength, or reference match. A total of 697 references were proposed, but only 216 of them (31.0%) existed. CONCLUSIONS ChatGPT demonstrates potential as an adjunctive tool in the management of blepharoptosis. However, it is crucial to recognize that the existing AI model has distinct limitations, and its primary role should be to complement the expertise of medical professionals. LEVEL OF EVIDENCE V Observational study under respected authorities. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
- Yoko Tomioka
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Ami Miyakuni
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Saaya Ishii
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Asei Hori
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Hwayoung Park
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jun Ohba
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan

40
Raman R, Lathabai HH, Mandal S, Das P, Kaur T, Nedungadi P. ChatGPT: Literate or intelligent about UN sustainable development goals? PLoS One 2024; 19:e0297521. [PMID: 38656952 PMCID: PMC11042716 DOI: 10.1371/journal.pone.0297521]
Abstract
Generative AI tools, such as ChatGPT, are progressively transforming numerous sectors, demonstrating a capacity to impact human life dramatically. This research seeks to evaluate the UN Sustainable Development Goals (SDG) literacy of ChatGPT, which is crucial for the diverse stakeholders involved in SDG-related policies. Experimental outcomes from two widely used sustainability assessment tests, the UN SDG Fitness Test and the Sustainability Literacy Test (SULITEST), suggest that ChatGPT exhibits high SDG literacy, yet its comprehensive SDG intelligence needs further exploration. The Fitness Test gauges eight vital competencies across introductory, intermediate, and advanced levels, and accurate mapping of these competencies to the test questions is essential for even a partial evaluation of SDG intelligence. To assess SDG intelligence, the questions from both tests were mapped to the 17 SDGs and to eight cross-cutting SDG core competencies, but both questionnaires were found to be insufficient: SULITEST could satisfactorily map only 5 of the 8 competencies, whereas the Fitness Test managed 6 of 8. Both tests also fell short in their coverage of the 17 SDGs; most SDGs were underrepresented in both instruments, and certain SDGs were not represented at all. Consequently, both tools proved ineffective for assessing SDG intelligence through SDG coverage. The study recommends that future versions of ChatGPT enhance competencies such as collaboration, critical thinking, and systems thinking to help achieve the SDGs. It concludes that while AI models like ChatGPT hold considerable potential in sustainable development, their usage must be approached carefully, considering current limitations and ethical implications.
Affiliation(s)
- Raghu Raman
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India
- Santanu Mandal
- Amrita School of Business, Amaravati, Andhra Pradesh, India
- Payel Das
- Amrita School of Business, Amaravati, Andhra Pradesh, India
- Tavleen Kaur
- Fortune Institute of International Business, New Delhi, India

41
Wu C, Chen L, Han M, Li Z, Yang N, Yu C. Application of ChatGPT-based blended medical teaching in clinical education of hepatobiliary surgery. MEDICAL TEACHER 2024:1-5. [PMID: 38614458 DOI: 10.1080/0142159x.2024.2339412]
Abstract
OBJECTIVE This study evaluates the effectiveness of incorporating the Chat Generative Pre-trained Transformer (ChatGPT) into the clinical teaching of hepatobiliary surgery for undergraduate medical students. MATERIALS AND METHODS A group of 61 medical undergraduates from the Affiliated Hospital of Guizhou Medical University, undergoing hepatobiliary surgery training, were randomly assigned to either an experimental group (31 students) using ChatGPT-based blended teaching or a control group (30 students) with traditional teaching methods. The evaluation metrics included final exam scores, teaching satisfaction, and teaching effectiveness ratings, analyzed using SPSS 26.0 (SPSS Inc., Chicago, IL) with t-tests and χ2 tests. RESULTS The experimental group significantly outperformed the control group in final exam theoretical scores (86.44 ± 5.59 vs. 77.86 ± 4.16, p < .001) and clinical skills scores (83.84 ± 6.13 vs. 79.12 ± 4.27, p = .001). Additionally, the experimental group reported higher teaching satisfaction (17.23 ± 1.33) and self-evaluation of teaching effectiveness (9.14 ± 0.54) compared to the control group (15.38 ± 1.5 and 8.46 ± 0.70, respectively, p < .001). CONCLUSIONS The integration of ChatGPT into hepatobiliary surgery education significantly enhances theoretical knowledge, clinical skills, and overall satisfaction among medical undergraduates, suggesting a beneficial impact on their educational development.
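The score comparisons above are independent two-sample t-tests (run in SPSS in the study). A minimal sketch of the pooled-variance Student's t statistic, on invented exam scores rather than the study's data, is:

```python
# Sketch: pooled-variance two-sample t statistic for comparing exam scores
# between an experimental and a control group. Scores below are invented.
from statistics import mean, variance

def pooled_t(a, b):
    """Student's t for independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    # Pooled sample variance with na + nb - 2 degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Hypothetical theory-exam scores for two small groups.
experimental = [86, 88, 84, 90, 85]
control = [78, 80, 76, 79, 77]
print(round(pooled_t(experimental, control), 2))
```

The p-value then comes from the t distribution with na + nb - 2 degrees of freedom, typically via a table or a statistics package.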
Affiliation(s)
- Changhao Wu
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Liwen Chen
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Min Han
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Zhu Li
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Nenghong Yang
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China
- Chao Yu
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, China
- Department of Surgery, Guizhou Medical University, Guiyang, China
- College of Clinical Medicine, Guizhou Medical University, Guiyang, China
- Guizhou Provincial Institute of Hepatobiliary, Pancreatic and Splenic Diseases, Guiyang, China

42
Wu Y, Zheng Y, Feng B, Yang Y, Kang K, Zhao A. Embracing ChatGPT for Medical Education: Exploring Its Impact on Doctors and Medical Students. JMIR MEDICAL EDUCATION 2024; 10:e52483. [PMID: 38598263 PMCID: PMC11043925 DOI: 10.2196/52483]
Abstract
ChatGPT (OpenAI), a cutting-edge natural language processing model, holds immense promise for revolutionizing medical education. With its remarkable performance in language-related tasks, ChatGPT offers personalized and efficient learning experiences for medical students and doctors. Through training, it enhances clinical reasoning and decision-making skills, leading to improved case analysis and diagnosis. The model facilitates simulated dialogues, intelligent tutoring, and automated question-answering, enabling the practical application of medical knowledge. However, integrating ChatGPT into medical education raises ethical and legal concerns. Safeguarding patient data and adhering to data protection regulations are critical. Transparent communication with students, physicians, and patients is essential to ensure their understanding of the technology's purpose and implications, as well as the potential risks and benefits. Maintaining a balance between personalized learning and face-to-face interactions is crucial to avoid hindering critical thinking and communication skills. Despite challenges, ChatGPT offers transformative opportunities. Integrating it with problem-based learning, team-based learning, and case-based learning methodologies can further enhance medical education. With proper regulation and supervision, ChatGPT can contribute to a well-rounded learning environment, nurturing skilled and knowledgeable medical professionals ready to tackle health care challenges. By emphasizing ethical considerations and human-centric approaches, ChatGPT's potential can be fully harnessed in medical education, benefiting both students and patients alike.
Affiliation(s)
- Yijun Wu
- Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
- Yue Zheng
- Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
- Baijie Feng
- West China School of Medicine, Sichuan University, Chengdu, China
- Yuqi Yang
- West China School of Medicine, Sichuan University, Chengdu, China
- Kai Kang
- Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
- Ailin Zhao
- Department of Hematology, West China Hospital, Sichuan University, Chengdu, China

43
Bhayana R, Biswas S, Cook TS, Kim W, Kitamura FC, Gichoya J, Yi PH. From Bench to Bedside With Large Language Models: AJR Expert Panel Narrative Review. AJR Am J Roentgenol 2024. [PMID: 38598354 DOI: 10.2214/ajr.24.30928]
Abstract
Large language models (LLMs) hold immense potential to revolutionize radiology. However, their integration into practice requires careful consideration. Artificial intelligence (AI) chatbots and general-purpose LLMs have potential pitfalls related to privacy, transparency, and accuracy, limiting their current clinical readiness. Thus, LLM-based tools must be optimized for radiology practice to overcome these limitations. While research and validation for radiology applications remain in their infancy, commercial products incorporating LLMs are becoming available alongside promises of transforming practice. To help radiologists navigate this landscape, this AJR Expert Panel Narrative Review provides a multidimensional perspective on LLMs, encompassing considerations from bench (development and optimization) to bedside (use in practice). At present, LLMs are not autonomous entities that can replace expert decision-making, and radiologists remain responsible for the content of their reports. Patient-facing tools, particularly medical AI chatbots, require additional guardrails to ensure safety and prevent misuse. Still, if responsibly implemented, LLMs are well-positioned to transform efficiency and quality in radiology. Radiologists must be well-informed and proactively involved in guiding the implementation of LLMs in practice to mitigate risks and maximize benefits to patient care.
Affiliation(s)
- Rajesh Bhayana
- University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, University of Toronto, Toronto, ON, Canada
- Som Biswas
- Department of Radiology, Le Bonheur Children's Hospital, University of Tennessee Health Science Center, Memphis, TN, USA
- Tessa S Cook
- Department of Radiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Woojin Kim
- Department of Radiology, Palo Alto VA Medical Center, Palo Alto, CA, USA
- Felipe C Kitamura
- Department of Diagnostic Imaging, Universidade Federal de São Paulo, São Paulo, Brazil
- Dasa, São Paulo, Brazil
- Judy Gichoya
- Department of Radiology, Emory University School of Medicine, Atlanta, GA, USA
- Paul H Yi
- Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, MD, USA

44
Gande S, Gould M, Ganti L. Bibliometric analysis of ChatGPT in medicine. Int J Emerg Med 2024; 17:50. [PMID: 38575866 PMCID: PMC10993428 DOI: 10.1186/s12245-024-00624-2]
Abstract
INTRODUCTION The emergence of artificial intelligence (AI) chat programs has opened two distinct paths, one enhancing interaction and another potentially replacing personal understanding. Ethical and legal concerns arise from the rapid development of these programs. This paper investigates academic discussions of AI in medicine, analyzing the context, frequency, and reasons behind these conversations. METHODS The study collected data from the Web of Science database on articles containing the keyword "ChatGPT" published from January to September 2023, resulting in 786 medically related journal articles. The inclusion criteria were peer-reviewed articles in English related to medicine. RESULTS The United States led in publications (38.1%), followed by India (15.5%) and China (7.0%). Keywords such as "patient" (16.7%), "research" (12%), and "performance" (10.6%) were prevalent. The Cureus Journal of Medical Science (11.8%) had the most publications, followed by the Annals of Biomedical Engineering (8.3%). August 2023 had the highest number of publications (29.3%), with significant growth from February to March and from April to May. Medicine, General & Internal (21.0%) was the most common category, followed by Surgery (15.4%) and Radiology (7.9%). DISCUSSION The prominence of India in ChatGPT research, despite lower research funding, indicates the platform's popularity and highlights the importance of monitoring its use for potential medical misinformation. China's interest in ChatGPT research suggests a focus on natural language processing (NLP) AI applications, despite public bans on the platform. Cureus' success in publishing ChatGPT articles can be attributed to its open-access, rapid-publication model. The study identifies research trends in plastic surgery, radiology, and obstetrics and gynecology, emphasizing the need for ethical considerations and reliability assessments in the application of ChatGPT to medical practice.
CONCLUSION ChatGPT's presence in medical literature is growing rapidly across various specialties, but concerns related to safety, privacy, and accuracy persist. More research is needed to assess its suitability for patient care and implications for non-medical use. Skepticism and thorough review of research are essential, as current studies may face retraction as more information emerges.
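The keyword percentages reported above come from simple frequency counts over the retrieved records. A minimal sketch of that kind of tally, using made-up records in place of the Web of Science export (the study's actual data and tooling are not described in the abstract):

```python
from collections import Counter

# Hypothetical keyword lists standing in for Web of Science records;
# the real study analyzed 786 ChatGPT-related medical articles.
article_keywords = [
    ["patient", "performance", "chatgpt"],
    ["research", "patient", "education"],
    ["performance", "radiology"],
]

# Tally every keyword occurrence across all records.
counts = Counter(kw for kws in article_keywords for kw in kws)
total = sum(counts.values())

# Report each keyword's share of all occurrences, most frequent first.
for kw, n in counts.most_common(3):
    print(f"{kw}: {n / total:.1%}")
```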
Affiliation(s)
- Latha Ganti
- University of Central Florida, Orlando, FL, USA.
- Warren Alpert Medical School of Brown University, Providence, RI, USA.
45
Alanezi F. Examining the role of ChatGPT in promoting health behaviors and lifestyle changes among cancer patients. Nutr Health 2024:2601060241244563. [PMID: 38567408] [DOI: 10.1177/02601060241244563]
Abstract
Purpose: This study investigates the role of ChatGPT in promoting health behavior changes among cancer patients. Methods: A quasi-experimental design with a qualitative approach was adopted, as ChatGPT is a novel technology that many people are unaware of. Participants were outpatients at a public hospital. In the experiment, participants used ChatGPT to seek cancer-related information for two weeks, followed by focus group (FG) discussions. A total of 72 outpatients participated in ten focus groups. Results: Three main themes with 14 sub-themes were identified, reflecting the role of ChatGPT in promoting health behavior changes. Its most prominent role was in developing health literacy and promoting self-management of conditions through emotional, informational, and motivational support. Three challenges were identified: privacy, lack of personalization, and reliability. Conclusion: Although ChatGPT has considerable potential to promote health behavior changes among cancer patients, this potential is limited by several factors, including regulatory, reliability, and privacy issues. Further evidence is needed to generalize the results across regions.
Affiliation(s)
- Fahad Alanezi
- College of Business Administration, Department of Management Information Systems, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
46
Gertz RJ, Dratsch T, Bunck AC, Lennartz S, Iuga AI, Hellmich MG, Persigehl T, Pennig L, Gietzen CH, Fervers P, Maintz D, Hahnfeldt R, Kottlors J. Potential of GPT-4 for Detecting Errors in Radiology Reports: Implications for Reporting Accuracy. Radiology 2024; 311:e232714. [PMID: 38625012] [DOI: 10.1148/radiol.232714]
Abstract
Background Errors in radiology reports may occur because of resident-to-attending discrepancies, speech recognition inaccuracies, and high workload. Large language models, such as GPT-4 (ChatGPT; OpenAI), may assist in generating reports. Purpose To assess the effectiveness of GPT-4 in identifying common errors in radiology reports, focusing on performance, time, and cost-efficiency. Materials and Methods In this retrospective study, 200 radiology reports (radiography and cross-sectional imaging [CT and MRI]) were compiled between June 2023 and December 2023 at one institution. There were 150 errors from five common error categories (omission, insertion, spelling, side confusion, and other) intentionally inserted into 100 of the reports and used as the reference standard. Six radiologists (two senior radiologists, two attending physicians, and two residents) and GPT-4 were tasked with detecting these errors. Overall error detection performance, error detection in the five error categories, and reading time were assessed using Wald χ2 tests and paired-sample t tests. Results GPT-4 (detection rate, 82.7%; 124 of 150; 95% CI: 75.8, 87.9) matched the average detection performance of radiologists independent of their experience (senior radiologists, 89.3% [134 of 150; 95% CI: 83.4, 93.3]; attending physicians, 80.0% [120 of 150; 95% CI: 72.9, 85.6]; residents, 80.0% [120 of 150; 95% CI: 72.9, 85.6]; P value range, .522-.99). One senior radiologist outperformed GPT-4 (detection rate, 94.7%; 142 of 150; 95% CI: 89.8, 97.3; P = .006). GPT-4 required less processing time per radiology report than the fastest human reader in the study (mean reading time, 3.5 seconds ± 0.5 [SD] vs 25.1 seconds ± 20.1, respectively; P < .001; Cohen d = -1.08). The use of GPT-4 resulted in lower mean correction cost per report than the most cost-efficient radiologist ($0.03 ± 0.01 vs $0.42 ± 0.41; P < .001; Cohen d = -1.12).
Conclusion The radiology report error detection rate of GPT-4 was comparable with that of radiologists, potentially reducing work hours and cost. © RSNA, 2024 See also the editorial by Forman in this issue.
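The detection rates above carry 95% confidence intervals. The abstract does not state which CI method the authors used, but the bounds reported for GPT-4 (124 of 150; 75.8, 87.9) are reproduced by a Wilson score interval, sketched here purely as an illustration:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion (z=1.96 -> ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# GPT-4 detected 124 of 150 inserted errors.
lo, hi = wilson_ci(124, 150)
print(f"{124/150:.1%} (95% CI: {lo:.1%}, {hi:.1%})")  # → 82.7% (95% CI: 75.8%, 87.9%)
```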
Affiliation(s)
- From the Institute of Diagnostic and Interventional Radiology (R.J.G., T.D., A.C.B., S.L., A.I.I., T.P., L.P., C.H.G., P.F., D.M., R.H., J.K.) and Institute of Medical Statistics and Bioinformatics (M.G.H.), Faculty of Medicine, University Hospital Cologne, University of Cologne, Kerpener Strasse 62, 50937 Cologne, Germany
47
van Diessen E, van Amerongen RA, Zijlmans M, Otte WM. Potential merits and flaws of large language models in epilepsy care: A critical review. Epilepsia 2024; 65:873-886. [PMID: 38305763] [DOI: 10.1111/epi.17907]
Abstract
The current pace of development and application of large language models (LLMs) is unprecedented and will significantly affect future medical care. In this critical review, we provide the background needed to understand these novel artificial intelligence (AI) models and how LLMs could be used in the daily care of people with epilepsy. Given the importance of clinical history taking in diagnosing and monitoring epilepsy, combined with the established use of electronic health records, great potential exists to integrate LLMs into epilepsy care. We present the currently available LLM studies in epilepsy. Furthermore, we highlight and compare the most commonly used LLMs and elaborate on how these models can be applied in epilepsy. We further discuss important drawbacks and risks of LLMs, and we provide recommendations for overcoming these limitations.
Affiliation(s)
- Eric van Diessen
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Department of Pediatrics, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands
- Ramon A van Amerongen
- Faculty of Science, Bioinformatics and Biocomplexity, Utrecht University, Utrecht, The Netherlands
- Maeike Zijlmans
- Department of Neurology and Neurosurgery, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Stichting Epilepsie Instellingen Nederland, Heemstede, The Netherlands
- Willem M Otte
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
48
Mira FA, Favier V, Dos Santos Sobreira Nunes H, de Castro JV, Carsuzaa F, Meccariello G, Vicini C, De Vito A, Lechien JR, Chiesa-Estomba C, Maniaci A, Iannella G, Rojas EP, Cornejo JB, Cammaroto G. Chat GPT for the management of obstructive sleep apnea: do we have a polar star? Eur Arch Otorhinolaryngol 2024; 281:2087-2093. [PMID: 37980605] [DOI: 10.1007/s00405-023-08270-9]
Abstract
PURPOSE This study explores the potential of the Chat-Generative Pre-Trained Transformer (Chat-GPT), a large language model (LLM), to assist healthcare professionals in the diagnosis of obstructive sleep apnea (OSA). It assesses the agreement between Chat-GPT's responses and those of expert otolaryngologists, shedding light on the role of AI-generated content in medical decision-making. METHODS A prospective, cross-sectional study was conducted involving 350 otolaryngologists from 25 countries who responded to a specialized OSA survey. Chat-GPT was tasked with answering the same survey questions. Responses were rated by super-experts and statistically analyzed for agreement. RESULTS Chat-GPT and expert responses shared a common answer in over 75% of cases for individual questions; however, overall consensus was achieved for only four questions. Super-expert assessments showed a moderate level of agreement, with Chat-GPT scoring slightly lower than the experts. Statistically, Chat-GPT's responses differed significantly from the experts' opinions (p = 0.0009). Sub-analysis revealed areas for improvement, particularly in questions where super-experts rated Chat-GPT's responses lower than the expert consensus. CONCLUSIONS Chat-GPT shows potential as a valuable resource for OSA diagnosis, especially where access to specialists is limited. The study emphasizes the importance of AI-human collaboration, with Chat-GPT serving as a complementary tool rather than a replacement for medical professionals. This research contributes to the discourse in otolaryngology and encourages further exploration of AI-driven healthcare applications. While Chat-GPT exhibits a commendable level of consensus with expert responses, ongoing refinement of AI-based healthcare tools holds significant promise for addressing the underdiagnosis and undertreatment of OSA and improving patient outcomes.
Affiliation(s)
- Felipe Ahumada Mira
- ENT Department, Hospital of Linares, Linares, Chile
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Valentin Favier
- ENT Department, University Hospital of Montpellier, Montpellier, France
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Heloisa Dos Santos Sobreira Nunes
- ENT and Sleep Medicine Department, Nucleus of Otolaryngology, Head and Neck Surgery and Sleep Medicine of São Paulo, São Paulo, Brazil
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Joana Vaz de Castro
- ENT Department, Armed Forces Hospital, Lisbon, Portugal
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Florent Carsuzaa
- ENT Department, University Hospital of Poitiers, Poitiers, France
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Giuseppe Meccariello
- Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy
- Claudio Vicini
- Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy
- Andrea De Vito
- Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy
- Jerome R Lechien
- Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology and Head and Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons, Mons, Belgium
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Carlos Chiesa-Estomba
- Department of Otorhinolaryngology, Biodonostia Research Institute, Donostia University Hospital, Osakidetza, 20014, San Sebastian, Spain
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Antonino Maniaci
- Department of Medical and Surgical Sciences and Advanced Technologies "GF Ingrassia", ENT Section, University of Catania, Piazza Università 2, 95100, Catania, Italy
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Giannicola Iannella
- Department of 'Organi di Senso', University "Sapienza", Viale Dell'Università 33, 00185, Rome, Italy
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Giovanni Cammaroto
- Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy.
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France.
49
Ni Z, Peng R, Zheng X, Xie P. Embracing the future: Integrating ChatGPT into China's nursing education system. Int J Nurs Sci 2024; 11:295-299. [PMID: 38707690] [PMCID: PMC11064564] [DOI: 10.1016/j.ijnss.2024.03.006]
Abstract
This article examines the role of ChatGPT within the rapidly evolving field of artificial intelligence, highlighting its significant potential in nursing education. It first presents the notable advances ChatGPT has achieved in facilitating interactive learning and providing real-time feedback, along with the academic community's growing interest in the technology. It then summarizes research on ChatGPT's applications in nursing education across various clinical disciplines and scenarios, showcasing its potential for multidisciplinary education and for addressing clinical issues. Comparing the performance of several large language models (LLMs) on China's National Nursing Licensure Examination, we observed that ChatGPT achieved a higher accuracy rate than its counterparts, providing a solid theoretical foundation for its application in Chinese nursing education and clinical settings. Educational institutions should establish a targeted and effective regulatory framework to leverage ChatGPT in localized nursing education while assuming the corresponding responsibilities. Through standardized user training and adjustments to existing educational assessment methods aimed at preventing potential misuse and abuse, the full potential of ChatGPT as an innovative auxiliary tool in China's nursing education system can be realized, in line with the needs of modern teaching methodologies.
Affiliation(s)
- Zhengxin Ni
- School of Nursing, Yangzhou University, Yangzhou, China
- Rui Peng
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Xiaofei Zheng
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Ping Xie
- Department of External Cooperation, Northern Jiangsu People’s Hospital, Nanjing, China
50
Bajaj S, Gandhi D, Nayar D. Potential Applications and Impact of ChatGPT in Radiology. Acad Radiol 2024; 31:1256-1261. [PMID: 37802673] [DOI: 10.1016/j.acra.2023.08.039]
Abstract
Radiology has always gone hand-in-hand with technology, and artificial intelligence (AI) is not new to the field. Various AI devices and algorithms have already been integrated into daily clinical radiology practice, with applications ranging from scheduling patient appointments to detecting and diagnosing certain clinical conditions on imaging, and the use of natural language processing and large language model-based software has long been under discussion. Algorithms like ChatGPT can help improve patient outcomes, increase the efficiency of radiology interpretation, and aid the overall workflow of radiologists; here we discuss some of its potential applications.
Affiliation(s)
- Suryansh Bajaj
- Department of Radiology, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205 (S.B.)
- Darshan Gandhi
- Department of Diagnostic Radiology, University of Tennessee Health Science Center, Memphis, Tennessee 38103 (D.G.).
- Divya Nayar
- Department of Neurology, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205 (D.N.)