51. Rodriguez DV, Lawrence K, Gonzalez J, Brandfield-Harvey B, Xu L, Tasneem S, Levine DL, Mann D. Leveraging Generative AI Tools to Support the Development of Digital Solutions in Health Care Research: Case Study. JMIR Hum Factors 2024;11:e52885. [PMID: 38446539] [PMCID: PMC10955400] [DOI: 10.2196/52885]
Abstract
BACKGROUND Generative artificial intelligence has the potential to revolutionize health technology product development by improving coding quality, efficiency, documentation, quality assessment and review, and troubleshooting. OBJECTIVE This paper explores the application of a commercially available generative artificial intelligence tool (ChatGPT) to the development of a digital health behavior change intervention designed to support patient engagement in a commercial digital diabetes prevention program. METHODS We examined the capacity, advantages, and limitations of ChatGPT to support digital product idea conceptualization, intervention content development, and the software engineering process, including software requirement generation, software design, and code production. In total, 11 evaluators, each with at least 10 years of experience in fields of study ranging from medicine and implementation science to computer science, participated in the output review process (ChatGPT vs human-generated output). All had familiarity or prior exposure to the original personalized automatic messaging system intervention. The evaluators rated the ChatGPT-produced outputs in terms of understandability, usability, novelty, relevance, completeness, and efficiency. RESULTS Most metrics received positive scores. We identified that ChatGPT can (1) support developers to achieve high-quality products faster and (2) facilitate nontechnical communication and system understanding between technical and nontechnical team members around the development goal of rapid and easy-to-build computational solutions for medical technologies. CONCLUSIONS ChatGPT can serve as a usable facilitator for researchers engaging in the software development life cycle, from product conceptualization to feature identification and user story development to code generation. TRIAL REGISTRATION ClinicalTrials.gov NCT04049500; https://clinicaltrials.gov/ct2/show/NCT04049500.
Affiliation(s)
- Danissa V Rodriguez
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Katharine Lawrence
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
  - Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States
- Javier Gonzalez
  - Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States
- Beatrix Brandfield-Harvey
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Lynn Xu
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Sumaiya Tasneem
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Defne L Levine
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Devin Mann
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
  - Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States
52. Mu Y, He D. The Potential Applications and Challenges of ChatGPT in the Medical Field. Int J Gen Med 2024;17:817-826. [PMID: 38476626] [PMCID: PMC10929156] [DOI: 10.2147/ijgm.s456659]
Abstract
ChatGPT, an AI-driven conversational large language model (LLM), has garnered significant scholarly attention since its inception, owing to its manifold applications in the realm of medical science. This study primarily examines the merits, limitations, anticipated developments, and practical applications of ChatGPT in clinical practice, healthcare, medical education, and medical research. It underscores the necessity for further research and development to enhance its performance and deployment. Moreover, future research avenues encompass ongoing enhancements and standardization of ChatGPT, mitigating its limitations, and exploring its integration and applicability in translational and personalized medicine. Reflecting the narrative nature of this review, a focused literature search was performed to identify relevant publications on ChatGPT's use in medicine. This process was aimed at gathering a broad spectrum of insights to provide a comprehensive overview of the current state and future prospects of ChatGPT in the medical domain. The objective is to aid healthcare professionals in understanding the groundbreaking advancements associated with the latest artificial intelligence tools, while also acknowledging the opportunities and challenges presented by ChatGPT.
Affiliation(s)
- Yonglin Mu
  - Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Dawei He
  - Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
53. Spotnitz M, Idnay B, Gordon ER, Shyu R, Zhang G, Liu C, Cimino JJ, Weng C. A Survey of Clinicians' Views of the Utility of Large Language Models. Appl Clin Inform 2024;15:306-312. [PMID: 38442909] [PMCID: PMC11023712] [DOI: 10.1055/a-2281-7092]
Abstract
OBJECTIVES Large language models (LLMs) like Generative pre-trained transformer (ChatGPT) are powerful algorithms that have been shown to produce human-like text from input data. Several potential clinical applications of this technology have been proposed and evaluated by biomedical informatics experts. However, few have surveyed health care providers for their opinions about whether the technology is fit for use. METHODS We distributed a validated mixed-methods survey to gauge practicing clinicians' comfort with LLMs for a breadth of tasks in clinical practice, research, and education, which were selected from the literature. RESULTS A total of 30 clinicians fully completed the survey. Of the 23 tasks, 16 were rated positively by more than 50% of the respondents. Based on our qualitative analysis, health care providers considered LLMs to have excellent synthesis skills and efficiency. However, our respondents had concerns that LLMs could generate false information and propagate training data bias. Our survey respondents were most comfortable with scenarios that allow LLMs to function in an assistive role, like a physician extender or trainee. CONCLUSION In a mixed-methods survey of clinicians about LLM use, health care providers were encouraging of having LLMs in health care for many tasks, and especially in assistive roles. There is a need for continued human-centered development of both LLMs and artificial intelligence in general.
Affiliation(s)
- Matthew Spotnitz
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Betina Idnay
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Emily R. Gordon
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
  - Department of Dermatology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, New York, United States
- Rebecca Shyu
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Gongbo Zhang
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Cong Liu
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- James J. Cimino
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
  - Department of Biomedical Informatics and Data Science, Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
54. Wei Q, Yao Z, Cui Y, Wei B, Jin Z, Xu X. Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. J Biomed Inform 2024;151:104620. [PMID: 38462064] [DOI: 10.1016/j.jbi.2024.104620]
Abstract
OBJECTIVE Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and provide direction for future research. METHODS An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data was extracted on general study characteristics, question sources, conversation processes, assessment metrics, and performance of ChatGPT. An evaluation framework for LLM in medical inquiries was proposed by integrating insights from selected literature. This study is registered with PROSPERO, CRD42023456327. RESULTS A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%, I² = 87%) in addressing medical queries. However, the studies varied in question resource, question-asking process, and evaluation metrics. As per our proposed evaluation framework, many studies failed to report methodological details, such as the date of inquiry, version of ChatGPT, and inter-rater consistency. CONCLUSION This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of the study design and insufficient reporting might affect the results' reliability. Our proposed evaluation framework provides insights for the future study design and transparent reporting of LLM in responding to medical questions.
Affiliation(s)
- Qiuhong Wei
  - Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China
  - Children Nutrition Research Center, Children's Hospital of Chongqing Medical University, Chongqing, China
  - National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Child Neurodevelopment and Cognitive Disorders, Chongqing, China
- Zhengxiong Yao
  - Department of Neurology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ying Cui
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Bo Wei
  - Department of Global Statistics and Data Science, BeiGene USA Inc., San Mateo, CA, USA
- Zhezhen Jin
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Ximing Xu
  - Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China
55. Bužančić I, Belec D, Držaić M, Kummer I, Brkić J, Fialová D, Ortner Hadžiabdić M. Clinical decision-making in benzodiazepine deprescribing by healthcare providers vs. AI-assisted approach. Br J Clin Pharmacol 2024;90:662-674. [PMID: 37949663] [DOI: 10.1111/bcp.15963]
Abstract
AIMS The aim of this study was to compare the clinical decision-making for benzodiazepine deprescribing between a healthcare provider (HCP) and an artificial intelligence (AI) chatbot GPT4 (ChatGPT-4). METHODS We analysed real-world data from a Croatian cohort of community-dwelling benzodiazepine patients (n = 154) within the EuroAgeism H2020 ESR 7 project. HCPs evaluated the data using pre-established deprescribing criteria to assess benzodiazepine discontinuation potential. The research team devised and tested AI prompts to ensure consistency with HCP judgements. An independent researcher employed ChatGPT-4 with predetermined prompts to simulate clinical decisions for each patient case. Data derived from human-HCP and ChatGPT-4 decisions were compared for agreement rates and Cohen's kappa. RESULTS Both HCP and ChatGPT identified patients for benzodiazepine deprescribing (96.1% and 89.6%, respectively), showing an agreement rate of 95% (κ = .200, P = .012). Agreement on four deprescribing criteria ranged from 74.7% to 91.3% (lack of indication κ = .352, P < .001; prolonged use κ = .088, P = .280; safety concerns κ = .123, P = .006; incorrect dosage κ = .264, P = .001). Important limitations of GPT-4 responses were identified, including 22.1% ambiguous outputs, generic answers and inaccuracies, posing inappropriate decision-making risks. CONCLUSIONS While AI-HCP agreement is substantial, sole AI reliance poses a risk for unsuitable clinical decision-making. This study's findings reveal both strengths and areas for enhancement of ChatGPT-4 in the deprescribing recommendations within a real-world sample. Our study underscores the need for additional research on chatbot functionality in patient therapy decision-making, further fostering the advancement of AI for optimal performance.
Affiliation(s)
- Iva Bužančić
  - Center for Applied Pharmacy, Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
  - City Pharmacy Zagreb, Zagreb, Croatia
- Dora Belec
  - Center for Applied Pharmacy, Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
- Margita Držaić
  - Center for Applied Pharmacy, Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
  - City Pharmacy Zagreb, Zagreb, Croatia
- Ingrid Kummer
  - Department of Social and Clinical Pharmacy, Faculty of Pharmacy in Hradec Králové, Charles University, Hradec Králové, Czech Republic
- Jovana Brkić
  - Department of Social and Clinical Pharmacy, Faculty of Pharmacy in Hradec Králové, Charles University, Hradec Králové, Czech Republic
  - Department of Social Pharmacy and Pharmaceutical Legislation, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Daniela Fialová
  - Department of Social and Clinical Pharmacy, Faculty of Pharmacy in Hradec Králové, Charles University, Hradec Králové, Czech Republic
  - Department of Geriatrics and Gerontology, 1st Faculty of Medicine in Prague, Charles University, Prague, Czech Republic
- Maja Ortner Hadžiabdić
  - Center for Applied Pharmacy, Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
56. Ge J, Buenaventura A, Berrean B, Purvis J, Fontil V, Lai JC, Pletcher MJ. Applying human-centered design to the construction of a cirrhosis management clinical decision support system. Hepatol Commun 2024;8:e0394. [PMID: 38407255] [PMCID: PMC10898661] [DOI: 10.1097/hc9.0000000000000394]
Abstract
BACKGROUND Electronic health record (EHR)-based clinical decision support is a scalable way to help standardize clinical care. Clinical decision support systems have not been extensively investigated in cirrhosis management. Human-centered design (HCD) is an approach that engages with potential users in intervention development. In this study, we applied HCD to design the features and interface for a clinical decision support system for cirrhosis management, called CirrhosisRx. METHODS We conducted technical feasibility assessments to construct a visual blueprint that outlines the basic features of the interface. We then convened collaborative-design workshops with generalist and specialist clinicians. We elicited current workflows for cirrhosis management, assessed gaps in existing EHR systems, evaluated potential features, and refined the design prototype for CirrhosisRx. At the conclusion of each workshop, we analyzed recordings and transcripts. RESULTS Workshop feedback showed that the aggregation of relevant clinical data into 6 cirrhosis decompensation domains (defined as common inpatient clinical scenarios) was the most important feature. Automatic inference of clinical events from EHR data, such as gastrointestinal bleeding from hemoglobin changes, was not accepted due to accuracy concerns. Visualizations for risk stratification scores were deemed not necessary. Lastly, the HCD co-design workshops allowed us to identify the target user population (generalists). CONCLUSIONS This is one of the first applications of HCD to design the features and interface for an electronic intervention for cirrhosis management. The HCD process altered features, modified the design interface, and likely improved CirrhosisRx's overall usability. The finalized design for CirrhosisRx proceeded to development and production and will be tested for effectiveness in a pragmatic randomized controlled trial. This work provides a model for the creation of other EHR-based interventions in hepatology care.
Affiliation(s)
- Jin Ge
  - Department of Medicine, Division of Gastroenterology and Hepatology, University of California—San Francisco, San Francisco, California, USA
- Ana Buenaventura
  - School of Medicine Technology Services, University of California—San Francisco, San Francisco, California, USA
- Beth Berrean
  - School of Medicine Technology Services, University of California—San Francisco, San Francisco, California, USA
- Jory Purvis
  - School of Medicine Technology Services, University of California—San Francisco, San Francisco, California, USA
- Valy Fontil
  - Family Health Centers, NYU-Langone Medical Center, Brooklyn, New York, USA
- Jennifer C. Lai
  - Department of Medicine, Division of Gastroenterology and Hepatology, University of California—San Francisco, San Francisco, California, USA
- Mark J. Pletcher
  - Department of Epidemiology and Biostatistics, University of California—San Francisco, San Francisco, California, USA
57. Chandra A, Chakraborty A. Exploring the role of large language models in radiation emergency response. J Radiol Prot 2024;44:011510. [PMID: 38324900] [DOI: 10.1088/1361-6498/ad270c]
Abstract
In recent times, the field of artificial intelligence (AI) has been transformed by the introduction of large language models (LLMs). These models, popularized by OpenAI's GPT-3, have demonstrated the emergent capabilities of AI in comprehending and producing text resembling human language, which has helped them transform several industries. But their role has yet to be explored in the nuclear industry, specifically in managing radiation emergencies. The present work explores LLMs' contextual awareness, natural language interaction, and their capacity to comprehend diverse queries in a radiation emergency response setting. In this study, we identify different user types and their specific LLM use-cases in radiation emergencies. Their possible interactions with ChatGPT, a popular LLM, have also been simulated and preliminary results are presented. Drawing on the insights gained from this exercise and to address concerns of reliability and misinformation, this study advocates for expert-guided and domain-specific LLMs trained on radiation safety protocols and historical data. This study aims to guide radiation emergency management practitioners and decision-makers in effectively incorporating LLMs into their decision support framework.
Affiliation(s)
- Anirudh Chandra
  - Radiation Safety Systems Division, Bhabha Atomic Research Centre, Mumbai 400085, India
- Abinash Chakraborty
  - Health Physics Division, Bhabha Atomic Research Centre, Mumbai 400085, India
58. Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024;44:329-343. [PMID: 37562022] [DOI: 10.1093/asj/sjad260]
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLM, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
59. Stengel FC, Stienen MN, Ivanov M, Gandía-González ML, Raffa G, Ganau M, Whitfield P, Motov S. Can AI pass the written European Board Examination in Neurological Surgery? - Ethical and practical issues. Brain Spine 2024;4:102765. [PMID: 38510593] [PMCID: PMC10951784] [DOI: 10.1016/j.bas.2024.102765]
Abstract
Introduction Artificial intelligence (AI)-based large language models (LLMs) contain enormous potential in education and training. Recent publications demonstrated that they are able to outperform participants in written medical exams. Research question We aimed to explore the accuracy of AI in the written part of the EANS board exam. Material and methods Eighty-six representative single best answer (SBA) questions, included at least ten times in prior EANS board exams, were selected by the current EANS board exam committee. The questions' content was classified as 75 text-based (TB) and 11 image-based (IB) and their structure as 50 interpretation-weighted, 30 theory-based and 6 true-or-false. Questions were tested with ChatGPT 3.5, Bing and Bard. The AI and participant results were statistically analyzed through ANOVA tests with Stata SE 15 (StataCorp, College Station, TX). P-values of <0.05 were considered statistically significant. Results The Bard LLM achieved the highest accuracy with 62% correct questions overall and 69% excluding IB, outperforming human exam participants 59% (p = 0.67) and 59% (p = 0.42), respectively. All LLMs scored highest in theory-based questions, excluding IB questions (ChatGPT: 79%; Bing: 83%; Bard: 86%) and significantly better than the human exam participants (60%; p = 0.03). AI could not answer any IB question correctly. Discussion and conclusion AI passed the written EANS board exam based on representative SBA questions and achieved results close to or even better than the human exam participants. Our results raise several ethical and practical implications, which may impact the current concept for the written EANS board exam.
Affiliation(s)
- Felix C. Stengel
  - Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland
- Martin N. Stienen
  - Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland
- Marcel Ivanov
  - Royal Hallamshire Hospital, Sheffield, United Kingdom
- Giovanni Raffa
  - Division of Neurosurgery, BIOMORF Department, University of Messina, Messina, Italy
- Mario Ganau
  - Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
- Stefan Motov
  - Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland
60. Abdullahi T, Singh R, Eickhoff C. Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models. JMIR Med Educ 2024;10:e51391. [PMID: 38349725] [PMCID: PMC10900078] [DOI: 10.2196/51391]
Abstract
BACKGROUND Patients with rare and complex diseases often experience delayed diagnoses and misdiagnoses because comprehensive knowledge about these diseases is limited to only a few medical experts. In this context, large language models (LLMs) have emerged as powerful knowledge aggregation tools with applications in clinical decision support and education domains. OBJECTIVE This study aims to explore the potential of 3 popular LLMs, namely Bard (Google LLC), ChatGPT-3.5 (OpenAI), and GPT-4 (OpenAI), in medical education to enhance the diagnosis of rare and complex diseases while investigating the impact of prompt engineering on their performance. METHODS We conducted experiments on publicly available complex and rare cases to achieve these objectives. We implemented various prompt strategies to evaluate the performance of these models using both open-ended and multiple-choice prompts. In addition, we used a majority voting strategy to leverage diverse reasoning paths within language models, aiming to enhance their reliability. Furthermore, we compared their performance with the performance of human respondents and MedAlpaca, a generative LLM specifically designed for medical tasks. RESULTS Notably, all LLMs outperformed the average human consensus and MedAlpaca, with a minimum margin of 5% and 13%, respectively, across all 30 cases from the diagnostic case challenge collection. On the frequently misdiagnosed cases category, Bard tied with MedAlpaca but surpassed the human average consensus by 14%, whereas GPT-4 and ChatGPT-3.5 outperformed MedAlpaca and the human respondents on the moderately often misdiagnosed cases category with minimum accuracy scores of 28% and 11%, respectively. The majority voting strategy, particularly with GPT-4, demonstrated the highest overall score across all cases from the diagnostic complex case collection, surpassing that of other LLMs. On the Medical Information Mart for Intensive Care-III data sets, Bard and GPT-4 achieved the highest diagnostic accuracy scores, with multiple-choice prompts scoring 93%, whereas ChatGPT-3.5 and MedAlpaca scored 73% and 47%, respectively. Furthermore, our results demonstrate that there is no one-size-fits-all prompting approach for improving the performance of LLMs and that a single strategy does not universally apply to all LLMs. CONCLUSIONS Our findings shed light on the diagnostic capabilities of LLMs and the challenges associated with identifying an optimal prompting strategy that aligns with each language model's characteristics and specific task requirements. The significance of prompt engineering is highlighted, providing valuable insights for researchers and practitioners who use these language models for medical training. Furthermore, this study represents a crucial step toward understanding how LLMs can enhance diagnostic reasoning in rare and complex medical cases, paving the way for developing effective educational tools and accurate diagnostic aids to improve patient care and outcomes.
Affiliation(s)
- Tassallah Abdullahi
  - Department of Computer Science, Brown University, Providence, RI, United States
- Ritambhara Singh
  - Department of Computer Science, Brown University, Providence, RI, United States
  - Center for Computational Molecular Biology, Brown University, Providence, RI, United States
61. Wang Z, Zhang Z, Traverso A, Dekker A, Qian L, Sun P. Assessing the role of GPT-4 in thyroid ultrasound diagnosis and treatment recommendations: enhancing interpretability with a chain of thought approach. Quant Imaging Med Surg 2024;14:1602-1615. [PMID: 38415150] [PMCID: PMC10895085] [DOI: 10.21037/qims-23-1180]
Abstract
Background As artificial intelligence (AI) becomes increasingly prevalent in the medical field, the effectiveness of AI-generated medical reports in disease diagnosis remains to be evaluated. ChatGPT is a large language model developed by OpenAI with a notable capacity for text abstraction and comprehension. This study aimed to explore the capabilities, limitations, and potential of Generative Pre-trained Transformer (GPT)-4 in analyzing thyroid cancer ultrasound reports, providing diagnoses, and recommending treatment plans. Methods Using 109 diverse thyroid cancer cases, we evaluated GPT-4's performance by comparing its generated reports to those from doctors with various levels of experience. We also conducted a Turing Test and a consistency analysis. To enhance the interpretability of the model, we applied the Chain of Thought (CoT) method to deconstruct the decision-making chain of the GPT model. Results GPT-4 demonstrated proficiency in report structuring, professional terminology, and clarity of expression, but showed limitations in diagnostic accuracy. In addition, our consistency analysis highlighted certain discrepancies in the AI's performance. The CoT method effectively enhanced the interpretability of the AI's decision-making process. Conclusions GPT-4 exhibits potential as a supplementary tool in healthcare, especially for generating thyroid gland diagnostic reports. Our proposed online platform, "ThyroAIGuide", alongside the CoT method, underscores the potential of AI to augment diagnostic processes, elevate healthcare accessibility, and advance patient education. However, the journey towards fully integrating AI into healthcare is ongoing, requiring continuous research, development, and careful monitoring by medical professionals to ensure patient safety and quality of care.
Affiliation(s)
- Zhixiang Wang
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Department of Radiation Oncology (Maastro), GROW-School for Oncology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Zhen Zhang
- Department of Radiation Oncology (Maastro), GROW-School for Oncology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Alberto Traverso
- Department of Radiation Oncology (Maastro), GROW-School for Oncology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Andre Dekker
- Department of Radiation Oncology (Maastro), GROW-School for Oncology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Linxue Qian
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Pengfei Sun
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
62
Segal S, Saha AK, Khanna AK. Appropriateness of Answers to Common Preanesthesia Patient Questions Composed by the Large Language Model GPT-4 Compared to Human Authors. Anesthesiology 2024; 140:333-335. [PMID: 38193737 DOI: 10.1097/aln.0000000000004824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2024]
Affiliation(s)
- Scott Segal
- Wake Forest University School of Medicine, Atrium Health Wake Forest Baptist Medical Center, Winston-Salem, North Carolina (S.S.).
63
Liao Z, Wang J, Shi Z, Lu L, Tabata H. Revolutionary Potential of ChatGPT in Constructing Intelligent Clinical Decision Support Systems. Ann Biomed Eng 2024; 52:125-129. [PMID: 37332008 DOI: 10.1007/s10439-023-03288-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Accepted: 06/13/2023] [Indexed: 06/20/2023]
Abstract
Recently, Chatbot Generative Pre-trained Transformer (ChatGPT) has been recognized as a promising clinical decision support system (CDSS) in the medical field owing to its advanced text analysis capabilities and interactive design. However, ChatGPT primarily focuses on learning text semantics rather than learning complex data structures and conducting real-time data analysis, which typically necessitate the development of intelligent CDSS employing specialized machine learning algorithms. Although ChatGPT cannot directly execute specific algorithms, it aids in algorithm design for intelligent CDSS at the textual level. In this study, besides discussing the types of CDSS and their relationship with ChatGPT, we mainly investigate the benefits and drawbacks of employing ChatGPT as an auxiliary design tool for intelligent CDSS. Our findings indicate that by collaborating with human expertise, ChatGPT has the potential to revolutionize the development of robust and effective intelligent CDSS.
Affiliation(s)
- Zhiqiang Liao
- Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan.
- Jian Wang
- Department of Orthopaedics, Qilu Hospital of Shandong University, Jinan, 250012, People's Republic of China
- Zhuozheng Shi
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan
- Lintao Lu
- Department of Orthopaedics, Qilu Hospital of Shandong University, Jinan, 250012, People's Republic of China.
- Department of Orthopaedics, Qilu Hospital of Shandong University Dezhou Hospital, Dezhou, 253000, People's Republic of China.
- Hitoshi Tabata
- Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan
64
Jin X, Frock A, Nagaraja S, Wallqvist A, Reifman J. AI algorithm for personalized resource allocation and treatment of hemorrhage casualties. Front Physiol 2024; 15:1327948. [PMID: 38332989 PMCID: PMC10851938 DOI: 10.3389/fphys.2024.1327948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2023] [Accepted: 01/09/2024] [Indexed: 02/10/2024] Open
Abstract
A deep neural network-based artificial intelligence (AI) model was assessed for its utility in predicting vital signs of hemorrhage patients and optimizing the management of fluid resuscitation in mass casualties. With the use of a cardio-respiratory computational model to generate synthetic data of hemorrhage casualties, an application was created where a limited data stream (the initial 10 min of vital-sign monitoring) could be used to predict the outcomes of different fluid resuscitation allocations 60 min into the future. The predicted outcomes were then used to select the optimal resuscitation allocation for various simulated mass-casualty scenarios. This allowed the assessment of the potential benefits of using an allocation method based on personalized predictions of future vital signs versus a static population-based method that only uses currently available vital-sign information. The theoretical benefits of this approach included up to 46% additional casualties restored to healthy vital signs and a 119% increase in fluid-utilization efficiency. Although the study is not immune from limitations associated with synthetic data under specific assumptions, the work demonstrated the potential for incorporating neural network-based AI technologies in hemorrhage detection and treatment. The simulated injury and treatment scenarios used delineated possible benefits and opportunities available for using AI in pre-hospital trauma care. The greatest benefit of this technology lies in its ability to provide personalized interventions that optimize clinical outcomes under resource-limited conditions, such as in civilian or military mass-casualty events, involving moderate and severe hemorrhage.
Affiliation(s)
- Xin Jin
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine Inc., Bethesda, MD, United States
- Andrew Frock
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine Inc., Bethesda, MD, United States
- Sridevi Nagaraja
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine Inc., Bethesda, MD, United States
- Anders Wallqvist
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
- Jaques Reifman
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
65
Liu X, Wu J, Shao A, Shen W, Ye P, Wang Y, Ye J, Jin K, Yang J. Uncovering Language Disparity of ChatGPT on Retinal Vascular Disease Classification: Cross-Sectional Study. J Med Internet Res 2024; 26:e51926. [PMID: 38252483 PMCID: PMC10845019 DOI: 10.2196/51926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Revised: 10/07/2023] [Accepted: 11/30/2023] [Indexed: 01/23/2024] Open
Abstract
BACKGROUND Benefiting from rich knowledge and the exceptional ability to understand text, large language models like ChatGPT have shown great potential in English clinical environments. However, the performance of ChatGPT in non-English clinical settings, as well as its reasoning, have not been explored in depth. OBJECTIVE This study aimed to evaluate ChatGPT's diagnostic performance and inference abilities for retinal vascular diseases in a non-English clinical environment. METHODS In this cross-sectional study, we collected 1226 fundus fluorescein angiography reports and corresponding diagnoses written in Chinese and tested ChatGPT with 4 prompting strategies (direct diagnosis or diagnosis with a step-by-step reasoning process and in Chinese or English). RESULTS Compared with ChatGPT using Chinese prompts for direct diagnosis that achieved an F1-score of 70.47%, ChatGPT using English prompts for direct diagnosis achieved the best diagnostic performance (80.05%), which was inferior to ophthalmologists (89.35%) but close to ophthalmologist interns (82.69%). As for its inference abilities, although ChatGPT can derive a reasoning process with a low error rate (0.4 per report) for both Chinese and English prompts, ophthalmologists identified that the latter brought more reasoning steps with less incompleteness (44.31%), misinformation (1.96%), and hallucinations (0.59%) (all P<.001). Also, analysis of the robustness of ChatGPT with different language prompts indicated significant differences in the recall (P=.03) and F1-score (P=.04) between Chinese and English prompts. In short, when prompted in English, ChatGPT exhibited enhanced diagnostic and inference capabilities for retinal vascular disease classification based on Chinese fundus fluorescein angiography reports. 
CONCLUSIONS ChatGPT can serve as a helpful medical assistant to provide diagnosis in non-English clinical environments, but there are still performance gaps, language disparities, and errors compared to professionals, which demonstrate the potential limitations and the need to continually explore more robust large language models in ophthalmology practice.
Affiliation(s)
- Xiaocong Liu
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- School of Public Health, Zhejiang University School of Medicine, Zhejiang, China
- Jiageng Wu
- School of Public Health, Zhejiang University School of Medicine, Zhejiang, China
- An Shao
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Wenyue Shen
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Panpan Ye
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Yao Wang
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Juan Ye
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Kai Jin
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Jie Yang
- School of Public Health, Zhejiang University School of Medicine, Zhejiang, China
66
dos Santos ML, Victória VNG. Critical evaluation of applications of artificial intelligence based linguistic models in Occupational Health. Rev Bras Med Trab 2024; 22:e20231241. [PMID: 39165532 PMCID: PMC11333049 DOI: 10.47626/1679-4435-2023-1241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Accepted: 10/30/2023] [Indexed: 08/22/2024] Open
Abstract
This article explores the impact and potential applications of large language models in Occupational Medicine. Large language models have the ability to provide support for medical decision-making, patient screening, summarization and creation of technical, scientific, and legal documents, training and education for doctors and occupational health teams, as well as patient education, potentially leading to lower costs, reduced time expenditure, and a lower incidence of human errors. Despite promising results and a wide range of applications, large language models also have significant limitations in terms of their accuracy, the risk of generating false information, and incorrect recommendations. Various ethical aspects that have not been well elucidated by the medical and academic communities should also be considered, and the lack of regulation by government entities can create areas of legal uncertainty regarding their use in Occupational Medicine and in the legal environment. Significant future improvements can be expected in these models in the coming years, and further studies on the applications of large language models in Occupational Medicine should be encouraged.
Affiliation(s)
- Mateus Lins dos Santos
- 6ª Vara, Justiça Federal em Sergipe, Itabaiana, SE, Brazil
- 9ª Vara, Justiça Federal em Sergipe, Propriá, SE, Brazil
67
Harrington L. ChatGPT Is Trending: Trust but Verify. AACN Adv Crit Care 2023; 34:280-286. [PMID: 37619604 DOI: 10.4037/aacnacc2023129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/26/2023]
Affiliation(s)
- Linda Harrington
- Linda Harrington is an Independent Consultant, Health Informatics and Digital Strategy, and Adjunct Faculty at Texas Christian University, 2800 South University Drive, Fort Worth, TX 76109
68
Zawiah M, Al-Ashwal FY, Gharaibeh L, Abu Farha R, Alzoubi KH, Abu Hammour K, Qasim QA, Abrah F. ChatGPT and Clinical Training: Perception, Concerns, and Practice of Pharm-D Students. J Multidiscip Healthc 2023; 16:4099-4110. [PMID: 38116306 PMCID: PMC10729768 DOI: 10.2147/jmdh.s439223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2023] [Accepted: 12/04/2023] [Indexed: 12/21/2023] Open
Abstract
Background The emergence of Chat-Generative Pre-trained Transformer (ChatGPT) by OpenAI has revolutionized AI technology, demonstrating significant potential in healthcare and pharmaceutical education, yet its real-world applicability in clinical training warrants further investigation. Methods A cross-sectional study was conducted between April and May 2023 to assess PharmD students' perceptions, concerns, and experiences regarding the integration of ChatGPT into clinical pharmacy education. The study utilized a convenience sampling method through online platforms and involved a questionnaire with sections on demographics, perceived benefits, concerns, and experience with ChatGPT. Statistical analysis was performed using SPSS, including descriptive and inferential analyses. Results The findings of the study involving 211 PharmD students revealed that the majority of participants were male (77.3%) and had prior experience with artificial intelligence (68.2%). Over two-thirds were aware of ChatGPT. Most students (n=139, 65.9%) perceived potential benefits in using ChatGPT for various clinical tasks, with concerns including over-reliance, accuracy, and ethical considerations. Adoption of ChatGPT in clinical training varied, with some students not using it at all, while others utilized it for tasks like evaluating drug-drug interactions and developing care plans. Previous users tended to have higher perceived benefits and lower concerns, but the differences were not statistically significant. Conclusion Utilizing ChatGPT in clinical training offers opportunities, but students' lack of trust in it for clinical decisions highlights the need for collaborative human-ChatGPT decision-making. It should complement healthcare professionals' expertise and be used strategically to compensate for human limitations. Further research is essential to optimize ChatGPT's effective integration.
Affiliation(s)
- Mohammed Zawiah
- Department of Clinical Pharmacy, College of Pharmacy, Northern Border University, Rafha, 91911, Saudi Arabia
- Department of Pharmacy Practice, College of Clinical Pharmacy, Hodeidah University, Al Hodeidah, Yemen
- Fahmi Y Al-Ashwal
- Department of Clinical Pharmacy, College of Pharmacy, Al-Ayen University, Thi-Qar, Iraq
- Lobna Gharaibeh
- Pharmacological and Diagnostic Research Center, Faculty of Pharmacy, Al-Ahliyya Amman University, Amman, Jordan
- Rana Abu Farha
- Clinical Pharmacy and Therapeutics Department, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Karem H Alzoubi
- Department of Pharmacy Practice and Pharmacotherapeutics, University of Sharjah, Sharjah, 27272, United Arab Emirates
- Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, 22110, Jordan
- Khawla Abu Hammour
- Department of Clinical Pharmacy and Biopharmaceutics, Faculty of Pharmacy, University of Jordan, Amman, Jordan
- Qutaiba A Qasim
- Department of Clinical Pharmacy, College of Pharmacy, Al-Ayen University, Thi-Qar, Iraq
- Fahd Abrah
- Discipline of Social and Administrative Pharmacy, School of Pharmaceutical Sciences, Universiti Sains Malaysia, Penang, Malaysia
69
Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Qureshi F, Cheungpasitporn W. Innovating Personalized Nephrology Care: Exploring the Potential Utilization of ChatGPT. J Pers Med 2023; 13:1681. [PMID: 38138908 PMCID: PMC10744377 DOI: 10.3390/jpm13121681] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Revised: 12/02/2023] [Accepted: 12/02/2023] [Indexed: 12/24/2023] Open
Abstract
The rapid advancement of artificial intelligence (AI) technologies, particularly machine learning, has brought substantial progress to the field of nephrology, enabling significant improvements in the management of kidney diseases. ChatGPT, a revolutionary language model developed by OpenAI, is a versatile AI model designed to engage in meaningful and informative conversations. Its applications in healthcare have been notable, with demonstrated proficiency in various medical knowledge assessments. However, ChatGPT's performance varies across different medical subfields, posing challenges in nephrology-related queries. At present, comprehensive reviews regarding ChatGPT's potential applications in nephrology remain lacking despite the surge of interest in its role in various domains. This article seeks to fill this gap by presenting an overview of the integration of ChatGPT in nephrology. It discusses the potential benefits of ChatGPT in nephrology, encompassing dataset management, diagnostics, treatment planning, and patient communication and education, as well as medical research and education. It also explores ethical and legal concerns regarding the utilization of AI in medical practice. The continuous development of AI models like ChatGPT holds promise for the healthcare realm but also underscores the necessity of thorough evaluation and validation before implementing AI in real-world medical scenarios. This review serves as a valuable resource for nephrologists and healthcare professionals interested in fully utilizing the potential of AI in innovating personalized nephrology care.
Affiliation(s)
- Jing Miao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Charat Thongprayoon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Supawadee Suppadungsuk
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand
- Oscar A. Garcia Valencia
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Fawad Qureshi
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
70
Wang G, Liu Q, Chen G, Xia B, Zeng D, Chen G, Guo C. AI's deep dive into complex pediatric inguinal hernia issues: a challenge to traditional guidelines? Hernia 2023; 27:1587-1599. [PMID: 37843604 DOI: 10.1007/s10029-023-02900-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Accepted: 09/19/2023] [Indexed: 10/17/2023]
Abstract
OBJECTIVE This study utilized ChatGPT, an artificial intelligence program based on large language models, to explore controversial issues in pediatric inguinal hernia surgery and compare its responses with the guidelines of the European Association of Pediatric Surgeons (EUPSA). METHODS Six contentious issues raised by EUPSA were submitted to ChatGPT 4.0 for analysis, for which two independent responses were generated for each issue. These generated answers were subsequently compared with systematic reviews and guidelines. To ensure content accuracy and reliability, a content analysis was conducted, and expert evaluations were solicited for validation. Content analysis evaluated the consistency or discrepancy between ChatGPT 4.0's responses and the guidelines. An expert scoring method assessed the quality, reliability, and applicability of responses. The TF-IDF model tested the stability and consistency of the two responses. RESULTS The responses generated by ChatGPT 4.0 were mostly consistent with the guidelines. However, some differences and contradictions were noted. The average quality score was 3.33, reliability score was 2.75, and applicability score was 3.46 (out of 5). The average similarity between the two responses was 0.72 (out of 1). Content analysis and expert ratings yielded consistent conclusions, enhancing the credibility of our research. CONCLUSION ChatGPT can provide valuable responses to clinical questions, but it has limitations and requires further improvement. It is recommended to combine ChatGPT with other reliable data sources to improve clinical practice and decision-making.
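The TF-IDF consistency check described in this abstract (pairwise similarity between two independently generated responses to the same question) can be sketched in a few lines of plain Python. The whitespace tokenization and the smoothed-IDF weighting below are illustrative assumptions, not the authors' actual pipeline.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weight dicts for a small corpus of token lists (smoothed IDF)."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                     for t, c in tf.items()})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse weight dicts, in [0, 1]."""
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

# Two hypothetical responses to the same contentious question:
r1 = "laparoscopic repair is recommended for most cases".split()
r2 = "laparoscopic repair is recommended in selected cases".split()
v1, v2 = tfidf_vectors([r1, r2])
similarity = cosine(v1, v2)  # a value in [0, 1]
```

Averaging such pairwise similarities over the six questions would correspond to the reported mean of 0.72 (out of 1).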
Affiliation(s)
- G Wang
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Pediatrics, Children's Hospital, Chongqing Medical University, Chongqing, People's Republic of China
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China
- Q Liu
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
- G Chen
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
- B Xia
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
- D Zeng
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
- G Chen
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China.
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China.
- Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Women and Children's Hospital of Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- C Guo
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China.
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China.
71
Levartovsky A, Ben-Horin S, Kopylov U, Klang E, Barash Y. Towards AI-Augmented Clinical Decision-Making: An Examination of ChatGPT's Utility in Acute Ulcerative Colitis Presentations. Am J Gastroenterol 2023; 118:2283-2289. [PMID: 37611254 DOI: 10.14309/ajg.0000000000002483] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 08/04/2023] [Indexed: 08/25/2023]
Abstract
This study explores the potential of OpenAI's ChatGPT as a decision support tool for acute ulcerative colitis presentations in the setting of an emergency department. We assessed ChatGPT's performance in determining disease severity using Truelove and Witts criteria and the necessity of hospitalization for patients with ulcerative colitis, comparing results with those of expert gastroenterologists. Of 20 cases, ChatGPT's assessments were found to be 80% consistent with gastroenterologist evaluations and indicated a high degree of reliability. This suggests that ChatGPT could serve as a clinical decision support tool in assessing acute ulcerative colitis, acting as an adjunct to clinical judgment.
Affiliation(s)
- Asaf Levartovsky
- Department of Gastroenterology, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- Shomron Ben-Horin
- Department of Gastroenterology, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- Uri Kopylov
- Department of Gastroenterology, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- Eyal Klang
- Department of Diagnostic Imaging, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- DeepVision Lab, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- Yiftach Barash
- Department of Diagnostic Imaging, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- DeepVision Lab, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
72
Benary M, Wang XD, Schmidt M, Soll D, Hilfenhaus G, Nassir M, Sigler C, Knödler M, Keller U, Beule D, Keilholz U, Leser U, Rieke DT. Leveraging Large Language Models for Decision Support in Personalized Oncology. JAMA Netw Open 2023; 6:e2343689. [PMID: 37976064 PMCID: PMC10656647 DOI: 10.1001/jamanetworkopen.2023.43689] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Accepted: 10/04/2023] [Indexed: 11/19/2023] Open
Abstract
Importance Clinical interpretation of complex biomarkers for precision oncology currently requires manual investigations of previous studies and databases. Conversational large language models (LLMs) might be beneficial as automated tools for assisting clinical decision-making. Objective To assess the performance of 4 recent LLMs and define their role as support tools for precision oncology. Design, Setting, and Participants This diagnostic study examined 10 fictional cases of patients with advanced cancer with genetic alterations. Each case was submitted to 4 different LLMs (ChatGPT, Galactica, Perplexity, and BioMedLM) and 1 expert physician to identify personalized treatment options in 2023. Treatment options were masked and presented to a molecular tumor board (MTB), whose members rated the likelihood of a treatment option coming from an LLM on a scale from 0 to 10 (0, extremely unlikely; 10, extremely likely) and decided whether the treatment option was clinically useful. Main Outcomes and Measures Number of treatment options, precision, recall, F1 score of LLMs compared with human experts, recognizability, and usefulness of recommendations. Results For 10 fictional cancer patients (4 with lung cancer, 6 with other; median [IQR] 3.5 [3.0-4.8] molecular alterations per patient), a median (IQR) number of 4.0 (4.0-4.0) compared with 3.0 (3.0-5.0), 7.5 (4.3-9.8), 11.5 (7.8-13.0), and 13.0 (11.3-21.5) treatment options each was identified by the human expert and 4 LLMs, respectively. When considering the expert as a criterion standard, LLM-proposed treatment options reached F1 scores of 0.04, 0.17, 0.14, and 0.19 across all patients combined. Combining treatment options from different LLMs allowed a precision of 0.29 and a recall of 0.29 for an F1 score of 0.29. LLM-generated treatment options were recognized as AI-generated with a median (IQR) 7.5 (5.3-9.0) points in contrast to 2.0 (1.0-3.0) points for manually annotated cases.
A crucial reason for identifying AI-generated treatment options was insufficient accompanying evidence. For each patient, at least 1 LLM generated a treatment option that was considered helpful by MTB members. Two unique useful treatment options (including 1 unique treatment strategy) were identified only by LLM. Conclusions and Relevance In this diagnostic study, treatment options of LLMs in precision oncology did not reach the quality and credibility of human experts; however, they generated helpful ideas that might have complemented established procedures. Considering technological progress, LLMs could play an increasingly important role in assisting with screening and selecting relevant biomedical literature to support evidence-based, personalized treatment decisions.
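The combined figures reported above (precision 0.29, recall 0.29, F1 score 0.29) are internally consistent, since F1 is the harmonic mean of precision and recall and the harmonic mean of two equal values is that value. A minimal sketch of the calculation (the function name is illustrative, not from the study):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Combined-LLM figures reported in the abstract:
# equal precision and recall give an identical F1 score.
print(round(f1_score(0.29, 0.29), 2))  # prints 0.29
```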
Affiliation(s)
- Manuela Benary
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Core Unit Bioinformatics, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany
- Xing David Wang
- Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Berlin, Germany
- Max Schmidt
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Dominik Soll
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Endocrinology and Metabolic Diseases, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Georg Hilfenhaus
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Hematology, Oncology and Cancer Immunology, Campus Charité Mitte, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Mani Nassir
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Hematology, Oncology and Cancer Immunology, Campus Charité Mitte, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Christian Sigler
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Maren Knödler
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Ulrich Keller
- Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium and German Cancer Research Center, Partner Site Berlin, Germany
- Dieter Beule
- Core Unit Bioinformatics, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany
- Ulrich Keilholz
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium and German Cancer Research Center, Partner Site Berlin, Germany
- Ulf Leser
- Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Berlin, Germany
- Damian T. Rieke
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium and German Cancer Research Center, Partner Site Berlin, Germany
73
Waters MR, Aneja S, Hong JC. Unlocking the Power of ChatGPT, Artificial Intelligence, and Large Language Models: Practical Suggestions for Radiation Oncologists. Pract Radiat Oncol 2023; 13:e484-e490. [PMID: 37598727 DOI: 10.1016/j.prro.2023.06.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 06/28/2023] [Accepted: 06/29/2023] [Indexed: 08/22/2023]
Abstract
Recent advances in artificial intelligence (AI), such as generative AI and large language models (LLMs), have generated significant excitement about the potential of AI to revolutionize our lives, work, and interaction with technology. This article explores the practical applications of LLMs, particularly ChatGPT, in the field of radiation oncology. We offer a guide on how radiation oncologists can interact with LLMs like ChatGPT in their routine clinical and administrative tasks, highlighting potential use cases of the present and future. We also highlight limitations and ethical considerations, including the current state of LLMs in decision making, protection of sensitive data, and the important role of human review of AI-generated content.
Affiliation(s)
- Michael R Waters
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, Missouri
- Sanjay Aneja
- Department of Radiation Oncology, Yale School of Medicine, New Haven, Connecticut
- Julian C Hong
- Department of Radiation Oncology and Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California.
74
Yu P, Xu H, Hu X, Deng C. Leveraging Generative AI and Large Language Models: A Comprehensive Roadmap for Healthcare Integration. Healthcare (Basel) 2023; 11:2776. [PMID: 37893850 PMCID: PMC10606429 DOI: 10.3390/healthcare11202776] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Revised: 10/13/2023] [Accepted: 10/17/2023] [Indexed: 10/29/2023] Open
Abstract
Generative artificial intelligence (AI) and large language models (LLMs), exemplified by ChatGPT, are promising for revolutionizing data and information management in healthcare and medicine. However, there is scant literature guiding their integration for non-AI professionals. This study conducts a scoping literature review to address the critical need for guidance on integrating generative AI and LLMs into healthcare and medical practices. It elucidates the distinct mechanisms underpinning these technologies, such as reinforcement learning from human feedback (RLHF), few-shot learning, and chain-of-thought reasoning, which differentiate them from traditional, rule-based AI systems. Realizing these benefits requires an inclusive, collaborative co-design process that engages all pertinent stakeholders, including clinicians and consumers. Although global research is examining both opportunities and challenges, including ethical and legal dimensions, LLMs offer promising advancements in healthcare by enhancing data management, information retrieval, and decision-making processes. Continued innovation in data acquisition, model fine-tuning, prompt strategy development, evaluation, and system implementation is imperative for realizing the full potential of these technologies. Organizations should proactively engage with these technologies to improve healthcare quality, safety, and efficiency, adhering to ethical and legal guidelines for responsible application.
Affiliation(s)
- Ping Yu
- School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia
- Hua Xu
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, 100 College Street, Fl 9, New Haven, CT 06510, USA;
- Xia Hu
- Department of Computer Science, Rice University, P.O. Box 1892, Houston, TX 77251-1892, USA;
- Chao Deng
- School of Medical, Indigenous and Health Sciences, University of Wollongong, Wollongong, NSW 2522, Australia;
75
Draschl A, Hauer G, Fischerauer SF, Kogler A, Leitner L, Andreou D, Leithner A, Sadoghi P. Are ChatGPT's Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful? J Clin Med 2023; 12:6655. [PMID: 37892793 PMCID: PMC10607052 DOI: 10.3390/jcm12206655] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 10/12/2023] [Accepted: 10/17/2023] [Indexed: 10/29/2023] Open
Abstract
BACKGROUND This study aimed to evaluate ChatGPT's performance on questions about periprosthetic joint infections (PJI) of the hip and knee. METHODS Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text responses were evaluated by three orthopedic surgeons using a five-point Likert scale. Inter-rater reliability (IRR) was assessed via Fleiss' kappa (FK). RESULTS Overall, near-perfect IRR was found for disagreement on the presence of factual errors (FK: 0.880, 95% CI [0.724, 1.035], p < 0.001) and agreement on information completeness (FK: 0.848, 95% CI [0.699, 0.996], p < 0.001). Substantial IRR was observed for disagreement on misleading information (FK: 0.743, 95% CI [0.601, 0.886], p < 0.001) and agreement on suitability for patients (FK: 0.627, 95% CI [0.478, 0.776], p < 0.001). Moderate IRR was observed for agreement on "up-to-dateness" (FK: 0.584, 95% CI [0.434, 0.734], p < 0.001) and suitability for orthopedic surgeons (FK: 0.505, 95% CI [0.383, 0.628], p < 0.001). Question- and subtopic-specific analysis revealed diverse IRR levels ranging from near-perfect to poor. CONCLUSIONS ChatGPT's free-text responses to complex orthopedic questions were predominantly reliable and useful for orthopedic surgeons and patients. Given variations in performance by question and subtopic, consulting additional sources and exercising careful interpretation should be emphasized for reliable medical decision-making.
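The qualitative labels used in this abstract (near-perfect, substantial, moderate) match the conventional Landis and Koch interpretation bands for kappa statistics. A minimal sketch mapping a Fleiss' kappa value to its conventional label (the function name, and the assumption that these exact bands were applied, are illustrative rather than stated by the study):

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the conventional Landis & Koch agreement label."""
    if kappa < 0:
        return "poor"  # worse than chance agreement
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"  # 0.81-1.00; called "near-perfect" in the abstract

# Fleiss' kappa values reported in the abstract
for fk in (0.880, 0.743, 0.627, 0.584, 0.505):
    print(fk, interpret_kappa(fk))
```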
Affiliation(s)
- Alexander Draschl
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Division of Plastic, Aesthetic and Reconstructive Surgery, Department of Surgery, Medical University of Graz, Auenbruggerplatz 29/4, 8036 Graz, Austria
- Georg Hauer
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Stefan Franz Fischerauer
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Angelika Kogler
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Department of Dermatology and Venereology, Medical University of Graz, Auenbruggerplatz 8, 8036 Graz, Austria
- Lukas Leitner
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Dimosthenis Andreou
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Andreas Leithner
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Patrick Sadoghi
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
76
Wosny M, Strasser LM, Hastings J. Experience of Health Care Professionals Using Digital Tools in the Hospital: Qualitative Systematic Review. JMIR Hum Factors 2023; 10:e50357. [PMID: 37847535 PMCID: PMC10618886 DOI: 10.2196/50357] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Revised: 08/25/2023] [Accepted: 08/25/2023] [Indexed: 10/18/2023] Open
Abstract
BACKGROUND The digitalization of health care has many potential benefits, but it may also negatively impact health care professionals' well-being. Burnout can, in part, result from inefficient work processes related to the suboptimal implementation and use of health information technologies. Although strategies to reduce stress and mitigate clinician burnout typically involve individual-based interventions, emerging evidence suggests that improving the experience of using health information technologies can have a notable impact. OBJECTIVE The aim of this systematic review was to collect evidence of the benefits and challenges associated with the use of digital tools in hospital settings with a particular focus on the experiences of health care professionals using these tools. METHODS We conducted a systematic literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to explore the experience of health care professionals with digital tools in hospital settings. Using a rigorous selection process to ensure the methodological quality and validity of the study results, we included qualitative studies with distinct data that described the experiences of physicians and nurses. A panel of 3 independent researchers performed iterative data analysis and identified thematic constructs. RESULTS Of the 1175 unique primary studies, we identified 17 (1.45%) publications that focused on health care professionals' experiences with various digital tools in their day-to-day practice. Of the 17 studies, 10 (59%) focused on clinical decision support tools, followed by 6 (35%) studies focusing on electronic health records and 1 (6%) on a remote patient-monitoring tool. We propose a theoretical framework for understanding the complex interplay between the use of digital tools, experience, and outcomes. 
We identified 6 constructs that encompass the positive and negative experiences of health care professionals when using digital tools, along with moderators and outcomes. Positive experiences included feeling confident, responsible, and satisfied, whereas negative experiences included frustration, feeling overwhelmed, and feeling frightened. Positive moderators that may reinforce the use of digital tools included sufficient training and adequate workflow integration, whereas negative moderators comprised unfavorable social structures and the lack of training. Positive outcomes included improved patient care and increased workflow efficiency, whereas negative outcomes included increased workload, increased safety risks, and issues with information quality. CONCLUSIONS Although positive and negative outcomes and moderators that may affect the use of digital tools were commonly reported, the experiences of health care professionals, such as their thoughts and emotions, were less frequently discussed. On the basis of this finding, this study highlights the need for further research specifically targeting experiences as an important mediator of clinician well-being. It also emphasizes the importance of considering differences in the nature of specific tools as well as the profession and role of individual users. TRIAL REGISTRATION PROSPERO CRD42023393883; https://tinyurl.com/2htpzzxj.
Affiliation(s)
- Marie Wosny
- School of Medicine, University of St Gallen (HSG), St Gallen, Switzerland
- Janna Hastings
- School of Medicine, University of St Gallen (HSG), St Gallen, Switzerland
- Institute for Implementation Science in Health Care, University of Zurich (UZH), Zurich, Switzerland
77
Goodman RS, Patrinely JR, Stone CA, Zimmerman E, Donald RR, Chang SS, Berkowitz ST, Finn AP, Jahangir E, Scoville EA, Reese TS, Friedman DL, Bastarache JA, van der Heijden YF, Wright JJ, Ye F, Carter N, Alexander MR, Choe JH, Chastain CA, Zic JA, Horst SN, Turker I, Agarwal R, Osmundson E, Idrees K, Kiernan CM, Padmanabhan C, Bailey CE, Schlegel CE, Chambless LB, Gibson MK, Osterman TJ, Wheless LE, Johnson DB. Accuracy and Reliability of Chatbot Responses to Physician Questions. JAMA Netw Open 2023; 6:e2336483. [PMID: 37782499 PMCID: PMC10546234 DOI: 10.1001/jamanetworkopen.2023.36483] [Citation(s) in RCA: 58] [Impact Index Per Article: 58.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 08/22/2023] [Indexed: 10/03/2023] Open
Abstract
Importance Natural language processing tools, such as ChatGPT (generative pretrained transformer, hereafter referred to as chatbot), have the potential to radically enhance the accessibility of medical information for health professionals and patients. Assessing the safety and efficacy of these tools in answering physician-generated questions is critical to determining their suitability in clinical settings, facilitating complex decision-making, and optimizing health care efficiency. Objective To assess the accuracy and comprehensiveness of chatbot-generated responses to physician-developed medical queries, highlighting the reliability and limitations of artificial intelligence-generated medical information. Design, Setting, and Participants Thirty-three physicians across 17 specialties generated 284 medical questions that they subjectively classified as easy, medium, or hard with either binary (yes or no) or descriptive answers. The physicians then graded the chatbot-generated answers to these questions for accuracy (6-point Likert scale with 1 being completely incorrect and 6 being completely correct) and completeness (3-point Likert scale, with 1 being incomplete and 3 being complete plus additional context). Scores were summarized with descriptive statistics and compared using the Mann-Whitney U test or the Kruskal-Wallis test. The study (including data analysis) was conducted from January to May 2023. Main Outcomes and Measures Accuracy, completeness, and consistency over time and between 2 different versions (GPT-3.5 and GPT-4) of chatbot-generated medical responses. Results Across all questions (n = 284) generated by 33 physicians (31 faculty members and 2 recent graduates from residency or fellowship programs) across 17 specialties, the median accuracy score was 5.5 (IQR, 4.0-6.0) (between almost completely and completely correct) with a mean (SD) score of 4.8 (1.6) (between mostly and almost completely correct).
The median completeness score was 3.0 (IQR, 2.0-3.0) (complete and comprehensive) with a mean (SD) score of 2.5 (0.7). For questions rated easy, medium, and hard, the median accuracy scores were 6.0 (IQR, 5.0-6.0), 5.5 (IQR, 5.0-6.0), and 5.0 (IQR, 4.0-6.0), respectively (mean [SD] scores were 5.0 [1.5], 4.7 [1.7], and 4.6 [1.6], respectively; P = .05). Accuracy scores for binary and descriptive questions were similar (median score, 6.0 [IQR, 4.0-6.0] vs 5.0 [IQR, 3.4-6.0]; mean [SD] score, 4.9 [1.6] vs 4.7 [1.6]; P = .07). Of 36 questions with scores of 1.0 to 2.0, 34 were requeried or regraded 8 to 17 days later with substantial improvement (median score 2.0 [IQR, 1.0-3.0] vs 4.0 [IQR, 2.0-5.3]; P < .01). A subset of questions, regardless of initial scores (version 3.5), were regenerated and rescored using version 4 with improvement (mean accuracy [SD] score, 5.2 [1.5] vs 5.7 [0.8]; median score, 6.0 [IQR, 5.0-6.0] for original and 6.0 [IQR, 6.0-6.0] for rescored; P = .002). Conclusions and Relevance In this cross-sectional study, chatbot generated largely accurate information to diverse medical queries as judged by academic physician specialists with improvement over time, although it had important limitations. Further research and model development are needed to correct inaccuracies and for validation.
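Scores throughout this abstract are summarized as median (IQR), the standard summary for ordinal Likert-scale data. A minimal sketch of how such a summary can be computed with the Python standard library (the example ratings are hypothetical, not the study's data):

```python
import statistics

def likert_summary(scores):
    """Return (median, (Q1, Q3)) for a list of ordinal scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return statistics.median(scores), (q1, q3)

# Hypothetical 6-point accuracy ratings for one set of questions
ratings = [6, 5, 6, 4, 3, 6, 5, 2]
median, (q1, q3) = likert_summary(ratings)
print(f"median {median} (IQR, {q1}-{q3})")  # prints: median 5.0 (IQR, 3.75-6.0)
```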
Affiliation(s)
- J. Randall Patrinely
- Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
- Cosby A. Stone
- Department of Allergy, Pulmonology, and Critical Care, Vanderbilt University Medical Center, Nashville, Tennessee
- Eli Zimmerman
- Department of Neurology, Vanderbilt University Medical Center, Nashville, Tennessee
- Rebecca R. Donald
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee
- Sam S. Chang
- Department of Urology, Vanderbilt University Medical Center, Nashville, Tennessee
- Sean T. Berkowitz
- Vanderbilt Eye Institute, Department of Ophthalmology, Vanderbilt University Medical Center, Nashville, Tennessee
- Avni P. Finn
- Vanderbilt Eye Institute, Department of Ophthalmology, Vanderbilt University Medical Center, Nashville, Tennessee
- Eiman Jahangir
- Department of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Elizabeth A. Scoville
- Department of Gastroenterology, Hepatology, and Nutrition, Vanderbilt University Medical Center, Nashville, Tennessee
- Tyler S. Reese
- Department of Rheumatology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee
- Debra L. Friedman
- Department of Pediatric Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Julie A. Bastarache
- Department of Allergy, Pulmonology, and Critical Care, Vanderbilt University Medical Center, Nashville, Tennessee
- Yuri F. van der Heijden
- Department of Infectious Disease, Vanderbilt University Medical Center, Nashville, Tennessee
- Jordan J. Wright
- Department of Diabetes, Endocrinology, and Metabolism, Vanderbilt University Medical Center, Nashville, Tennessee
- Fei Ye
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Nicholas Carter
- Division of Trauma and Surgical Critical Care, University of Miami Miller School of Medicine, Miami, Florida
- Matthew R. Alexander
- Department of Cardiovascular Medicine and Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee
- Jennifer H. Choe
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Cody A. Chastain
- Department of Infectious Disease, Vanderbilt University Medical Center, Nashville, Tennessee
- John A. Zic
- Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
- Sara N. Horst
- Department of Gastroenterology, Hepatology, and Nutrition, Vanderbilt University Medical Center, Nashville, Tennessee
- Isik Turker
- Department of Cardiology, Washington University School of Medicine in St Louis, St Louis, Missouri
- Rajiv Agarwal
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Evan Osmundson
- Department of Radiation Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Kamran Idrees
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Colleen M. Kiernan
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Chandrasekhar Padmanabhan
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Christina E. Bailey
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Cameron E. Schlegel
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Lola B. Chambless
- Department of Neurological Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Michael K. Gibson
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Travis J. Osterman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Lee E. Wheless
- Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
- Douglas B. Johnson
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
78
Ong J, Hariprasad SM, Chhablani J. ChatGPT and GPT-4 in Ophthalmology: Applications of Large Language Model Artificial Intelligence in Retina. Ophthalmic Surg Lasers Imaging Retina 2023; 54:557-562. [PMID: 37847163 DOI: 10.3928/23258160-20230926-01] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2023]
79
Eguia H, Sanz García JF. [Artificial intelligence, ChatGPT and primary care]. Semergen 2023; 49:102069. [PMID: 37647848 DOI: 10.1016/j.semerg.2023.102069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 07/13/2023] [Indexed: 09/01/2023]
Affiliation(s)
- Hans Eguia
- Rudkøbing Lægehuset, Denmark; member of the SEMERGEN working group on new technologies; member of DSAM, Denmark.
- Javier Francisco Sanz García
- Conselleria de Sanitat Valenciana; coordinator of the SEMERGEN working group on new technologies. https://twitter.com/@javikin84
80
Liu J, Zheng J, Cai X, Wu D, Yin C. A descriptive study based on the comparison of ChatGPT and evidence-based neurosurgeons. iScience 2023; 26:107590. [PMID: 37705958 PMCID: PMC10495632 DOI: 10.1016/j.isci.2023.107590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Revised: 06/21/2023] [Accepted: 08/04/2023] [Indexed: 09/15/2023] Open
Abstract
ChatGPT is an artificial intelligence product developed by OpenAI. This study aims to investigate whether ChatGPT can respond in accordance with evidence-based medicine in neurosurgery. We generated 50 neurosurgical questions covering neurosurgical diseases. Each question was posed three times to GPT-3.5 and GPT-4.0. We also recruited three neurosurgeons with high, middle, and low seniority to respond to questions. The results were analyzed regarding ChatGPT's overall performance score, mean scores by the items' specialty classification, and question type. In conclusion, GPT-3.5's ability to respond in accordance with evidence-based medicine was comparable to that of neurosurgeons with low seniority, and GPT-4.0's ability was comparable to that of neurosurgeons with high seniority. Although ChatGPT is yet to be comparable to a neurosurgeon with high seniority, future upgrades could enhance its performance and abilities.
Affiliation(s)
- Jiayu Liu
- Department of Neurosurgery, the First Medical Centre, Chinese PLA General Hospital, Beijing 100853, China
- Jiqi Zheng
- School of Health Humanities, Peking University, Beijing 100191, China
- Xintian Cai
- Department of Graduate School, Xinjiang Medical University, Urumqi 830001, China
- Dongdong Wu
- Department of Information, Daping Hospital, Army Medical University, Chongqing 400042, China
- Chengliang Yin
- Faculty of Medicine, Macau University of Science and Technology, Macau 999078, China
81
Unleashing the Potential of ChatGPT. ARTIFICIAL INTELLIGENCE APPLICATIONS USING CHATGPT IN EDUCATION 2023:84-92. [DOI: 10.4018/978-1-6684-9300-7.ch008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/02/2024]
Abstract
ChatGPT's extensive and strong capabilities make it useful for various applications, such as scholarly research, content creation, and language translation. Natural language processing is revolutionized by its capacity to produce human-like writing and deliver pertinent information quickly and accurately. Artificial intelligence aids in class material preparation can improve the efficiency, accessibility, engagement, personalization, and collaboration of the teaching and learning process. As technology evolves, it will likely become an even more essential academic tool. Academics can benefit greatly from ChatGPT's ability to provide content suited to particular subject matter or writing styles. It enables individuals to provide instructional materials that are more precise, pertinent, and consistent with their own preferred writing style. The accessibility of ChatGPT is a major advantage for academics, as it allows them to create class materials quickly and conveniently from anywhere with an internet connection.
82
Khlaif ZN, Mousa A, Hattab MK, Itmazi J, Hassan AA, Sanmugam M, Ayyoub A. The Potential and Concerns of Using AI in Scientific Research: ChatGPT Performance Evaluation. JMIR MEDICAL EDUCATION 2023; 9:e47049. [PMID: 37707884 PMCID: PMC10636627 DOI: 10.2196/47049] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2023] [Revised: 06/04/2023] [Accepted: 07/21/2023] [Indexed: 09/15/2023]
Abstract
BACKGROUND Artificial intelligence (AI) has many applications in daily life, including health care; criminal, civil, business, and liability law; and education. One aspect of AI that has gained significant attention is natural language processing (NLP), the ability of computers to understand and generate human language. OBJECTIVE This study aims to examine the potential for, and concerns about, using AI in scientific research. For this purpose, research articles were generated with ChatGPT, the quality of the resulting reports was analyzed, and the application's impact on the research framework, data analysis, and literature review was assessed. The study also explored concerns around ownership and the integrity of research when using AI-generated text. METHODS A total of 4 articles were generated using ChatGPT and then evaluated by 23 reviewers, using an evaluation form the researchers developed to assess the quality of the generated articles. Additionally, 50 abstracts were generated using ChatGPT and their quality was evaluated. Quantitative ratings were analyzed with ANOVA, and the reviewers' qualitative feedback was examined through thematic analysis. RESULTS When given detailed prompts and the context of the study, ChatGPT generated high-quality research that could be published in high-impact journals. However, ChatGPT had only a minor impact on developing the research framework and data analysis, and the literature review was the primary area needing improvement. Reviewers also expressed concerns around ownership and the integrity of research when using AI-generated text. Nonetheless, ChatGPT has strong potential to increase human productivity in research and can be used in academic writing. CONCLUSIONS AI-generated text has the potential to improve the quality of high-impact research articles. The findings suggest that decision makers and researchers should focus more on the methodology of the research, including research design, development of research tools, and in-depth data analysis, to draw strong theoretical and practical implications in the era of AI. The practical implications of this study extend to fields such as medical education, where such tools could help deliver materials that develop basic competencies for both medical students and faculty members.
Affiliation(s)
- Zuheir N Khlaif
- Faculty of Humanities and Educational Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
- Allam Mousa
- Artificial Intelligence and Virtual Reality Research Center, Department of Electrical and Computer Engineering, An-Najah National University, Nablus, Occupied Palestinian Territory
- Muayad Kamal Hattab
- Faculty of Law and Political Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
- Jamil Itmazi
- Department of Information Technology, College of Engineering and Information Technology, Palestine Ahliya University, Bethlehem, Occupied Palestinian Territory
- Amjad A Hassan
- Faculty of Law and Political Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
- Mageswaran Sanmugam
- Centre for Instructional Technology and Multimedia, Universiti Sains Malaysia, Penang, Malaysia
- Abedalkarim Ayyoub
- Faculty of Humanities and Educational Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
83
Huang Y, Gomaa A, Semrau S, Haderlein M, Lettmaier S, Weissmann T, Grigo J, Tkhayat HB, Frey B, Gaipl U, Distel L, Maier A, Fietkau R, Bert C, Putz F. Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for ai-assisted medical education and decision making in radiation oncology. Front Oncol 2023; 13:1265024. [PMID: 37790756 PMCID: PMC10543650 DOI: 10.3389/fonc.2023.1265024] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2023] [Accepted: 08/23/2023] [Indexed: 10/05/2023] Open
Abstract
Purpose The potential of large language models in medicine for education and decision-making has been demonstrated by their decent scores on medical exams such as the United States Medical Licensing Examination (USMLE) and the MedQA exam. This work aims to evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology. Methods The 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases were used to benchmark the performance of ChatGPT-4. The TXIT exam contains 300 questions covering various topics in radiation oncology; the 2022 Gray Zone collection contains 15 complex clinical cases. Results On the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieved scores of 62.05% and 78.77%, respectively, highlighting the advantage of the newer ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology can be identified to some extent. Specifically, per the ACR knowledge domains, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than of bone & soft tissue and gynecology. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than in brachytherapy and dosimetry, and it lacks proficiency in the in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 was able to suggest a personalized treatment approach for each case with high correctness and comprehensiveness; importantly, for many cases it provided novel treatment aspects that were not suggested by any of the human experts. Conclusion Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and cancer patients, as well as its potential to aid clinical decision-making, while acknowledging its limitations in certain domains. Owing to the risk of hallucination, content generated by models such as ChatGPT must be verified for accuracy.
Affiliation(s)
- Yixing Huang
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Ahmed Gomaa
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Sabine Semrau
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Marlen Haderlein
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Sebastian Lettmaier
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Thomas Weissmann
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Johanna Grigo
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Hassen Ben Tkhayat
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Benjamin Frey
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Udo Gaipl
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Luitpold Distel
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Andreas Maier
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Rainer Fietkau
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Christoph Bert
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Florian Putz
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
84
Pugliese G, Maccari A, Felisati E, Felisati G, Giudici L, Rapolla C, Pisani A, Saibene AM. Are artificial intelligence large language models a reliable tool for difficult differential diagnosis? An a posteriori analysis of a peculiar case of necrotizing otitis externa. Clin Case Rep 2023; 11:e7933. [PMID: 37736475 PMCID: PMC10509342 DOI: 10.1002/ccr3.7933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Revised: 08/30/2023] [Accepted: 09/12/2023] [Indexed: 09/23/2023] Open
Abstract
Key Clinical Message Large language models have made artificial intelligence readily available to the general public and potentially have a role in healthcare; however, their use in difficult differential diagnosis is still limited, as demonstrated by a case of necrotizing otitis externa. Abstract This case report presents a peculiar case of necrotizing otitis externa (NOE) with skull base involvement that proved diagnostically challenging. The initial presentation and imaging of the 78-year-old patient suggested a neoplastic rhinopharyngeal lesion, and only after several unsuccessful biopsies was the patient transferred to our unit. Upon re-evaluation of the clinical picture, a clinical hypothesis of NOE with skull base erosion was made and confirmed by identifying Pseudomonas aeruginosa in biopsy specimens of skull base bone and external auditory canal skin. Once the diagnosis was confirmed, the patient was treated with culture-oriented long-term antibiotics, with complete resolution of the disease. Given the complex clinical presentation, we chose to submit this NOE case a posteriori to two large language models (LLMs) to test their ability to handle difficult differential diagnoses. LLMs are easily approachable artificial intelligence tools that enable human-like interaction with the user, relying on large information databases to analyze queries. The LLMs of choice were ChatGPT-3 and ChatGPT-4, and they were asked to analyze the case when provided with only objective clinical and imaging data.
Affiliation(s)
- Giorgia Pugliese
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Alberto Maccari
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Elena Felisati
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Giovanni Felisati
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Leonardo Giudici
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Chiara Rapolla
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Antonia Pisani
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Alberto Maria Saibene
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
85
Lu Y, Wu H, Qi S, Cheng K. Artificial Intelligence in Intensive Care Medicine: Toward a ChatGPT/GPT-4 Way? Ann Biomed Eng 2023; 51:1898-1903. [PMID: 37179277 PMCID: PMC10182840 DOI: 10.1007/s10439-023-03234-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Accepted: 05/08/2023] [Indexed: 05/15/2023]
Abstract
Although intensive care medicine (ICM) is a relatively young discipline, it has rapidly developed into a full-fledged and highly specialized specialty covering several fields of medicine. The COVID-19 pandemic led to a surge in intensive care unit demand and also brought unprecedented development opportunities to this area, as multiple new technologies such as artificial intelligence (AI) and machine learning (ML) were gradually applied in the field. In this study, through an online survey, we summarize the potential uses of ChatGPT/GPT-4 in ICM, ranging from knowledge augmentation, device management, clinical decision-making support, and early warning systems to the establishment of an intensive care unit (ICU) database.
Affiliation(s)
- Yanqiu Lu
- Department of Intensive Care Unit, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Haiyang Wu
- Department of Graduate School, Tianjin Medical University, Tianjin, China
- Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, NC, USA
- Shaoyan Qi
- Department of Intensive Care Unit, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Kunming Cheng
- Department of Intensive Care Unit, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
86
Ge J, Fontil V, Ackerman S, Pletcher MJ, Lai JC. Clinical decision support and electronic interventions to improve care quality in chronic liver diseases and cirrhosis. Hepatology 2023:01515467-990000000-00546. [PMID: 37611253 PMCID: PMC10998693 DOI: 10.1097/hep.0000000000000583] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 07/17/2023] [Indexed: 08/25/2023]
Abstract
Significant quality gaps exist in the management of chronic liver diseases and cirrhosis. Clinical decision support systems (information-driven tools based in and launched from the electronic health record) are attractive and potentially scalable prospective interventions that could help standardize clinical care in hepatology. Yet clinical decision support systems have had a mixed record in clinical medicine due to issues with interoperability and compatibility with clinical workflows. In this review, we discuss the conceptual origins of clinical decision support systems, existing applications in liver diseases, issues and challenges with implementation, and emerging strategies to improve their integration in hepatology care.
Affiliation(s)
- Jin Ge
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California – San Francisco, San Francisco, California, USA
- Valy Fontil
- Department of Medicine, NYU Grossman School of Medicine and Family Health Centers at NYU-Langone Medical Center, Brooklyn, New York, USA
- Sara Ackerman
- Department of Social and Behavioral Sciences, University of California – San Francisco, San Francisco, California, USA
- Mark J. Pletcher
- Department of Epidemiology and Biostatistics, University of California – San Francisco, San Francisco, California, USA
- Jennifer C. Lai
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California – San Francisco, San Francisco, California, USA
87
Liu S, McCoy AB, Wright AP, Carew B, Genkins JZ, Huang SS, Peterson JF, Steitz B, Wright A. Leveraging Large Language Models for Generating Responses to Patient Messages. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.14.23292669. [PMID: 37503263 PMCID: PMC10370222 DOI: 10.1101/2023.07.14.23292669] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Objective This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal. Methods Using a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to update physician responses from an open-source dataset into a format with informative paragraphs that offered patient education while emphasizing empathy and professionalism. By combining this dataset with the first, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used ten representative patient portal questions in primary care to generate responses and asked primary care physicians to review the responses generated by our models and ChatGPT, rating them for empathy, responsiveness, accuracy, and usefulness. Results The dataset consisted of 499,794 pairs of patient messages and corresponding responses from the patient portal, plus 5,000 patient messages with ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short was able to generate concise responses similar to providers' responses. CLAIR-Long responses provided more patient educational content than CLAIR-Short responses and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy and a neutral rating for usefulness. Conclusion Leveraging large language models to generate responses to patient messages demonstrates significant potential for facilitating communication between patients and primary care providers.
88
Wei WQ, Yan C, Grabowska M, Dickson A, Li B, Wen Z, Roden D, Stein C, Embí P, Peterson J, Feng Q, Malin B. Leveraging Generative AI to Prioritize Drug Repurposing Candidates: Validating Identified Candidates for Alzheimer's Disease in Real-World Clinical Datasets. RESEARCH SQUARE 2023:rs.3.rs-3125859. [PMID: 37503019 PMCID: PMC10371084 DOI: 10.21203/rs.3.rs-3125859/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Drug repurposing represents an attractive alternative to the costly and time-consuming process of new drug development, particularly for serious, widespread conditions with limited effective treatments, such as Alzheimer's disease (AD). Emerging generative artificial intelligence (GAI) technologies like ChatGPT offer the promise of expediting the review and summary of scientific knowledge. To examine the feasibility of using GAI for identifying drug repurposing candidates, we iteratively tasked ChatGPT with proposing the twenty most promising drugs for repurposing in AD, and tested the top ten for risk of incident AD in exposed and unexposed individuals over age 65 in two large clinical datasets: 1) Vanderbilt University Medical Center and 2) the All of Us Research Program. Among the candidates suggested by ChatGPT, metformin, simvastatin, and losartan were associated with lower AD risk in meta-analysis. These findings suggest GAI technologies can assimilate scientific insights from an extensive Internet-based search space, helping to prioritize drug repurposing candidates and facilitate the treatment of diseases.
Affiliation(s)
- Chao Yan
- Vanderbilt University Medical Center
- Dan Roden
- Vanderbilt University Medical Center
- C Stein
- Vanderbilt University Medical Center
89
Yan C, Grabowska ME, Dickson AL, Li B, Wen Z, Roden DM, Stein CM, Embí PJ, Peterson JF, Feng Q, Malin BA, Wei WQ. Leveraging Generative AI to Prioritize Drug Repurposing Candidates: Validating Identified Candidates for Alzheimer's Disease in Real-World Clinical Datasets. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.07.23292388. [PMID: 37461512 PMCID: PMC10350158 DOI: 10.1101/2023.07.07.23292388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
Drug repurposing represents an attractive alternative to the costly and time-consuming process of new drug development, particularly for serious, widespread conditions with limited effective treatments, such as Alzheimer's disease (AD). Emerging generative artificial intelligence (GAI) technologies like ChatGPT offer the promise of expediting the review and summary of scientific knowledge. To examine the feasibility of using GAI for identifying drug repurposing candidates, we iteratively tasked ChatGPT with proposing the twenty most promising drugs for repurposing in AD, and tested the top ten for risk of incident AD in exposed and unexposed individuals over age 65 in two large clinical datasets: 1) Vanderbilt University Medical Center and 2) the All of Us Research Program. Among the candidates suggested by ChatGPT, metformin, simvastatin, and losartan were associated with lower AD risk in meta-analysis. These findings suggest GAI technologies can assimilate scientific insights from an extensive Internet-based search space, helping to prioritize drug repurposing candidates and facilitate the treatment of diseases.
90
Gupta NK, Doyle DM, D'Amico RS. Response to "Large language model artificial intelligence: the current state and future of ChatGPT in neuro-oncology publishing". J Neurooncol 2023; 163:731-733. [PMID: 37440100 DOI: 10.1007/s11060-023-04396-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Accepted: 07/08/2023] [Indexed: 07/14/2023]
Affiliation(s)
- Nithin K Gupta
- Campbell University School of Osteopathic Medicine, Lillington, NC, USA.
| | - David M Doyle
- Central Michigan University College of Medicine, Mount Pleasant, MI, USA
| | | |
91
Bakken S. AI in health: keeping the human in the loop. J Am Med Inform Assoc 2023; 30:1225-1226. [PMID: 37337923 PMCID: PMC10280340 DOI: 10.1093/jamia/ocad091] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Accepted: 05/10/2023] [Indexed: 06/21/2023] Open
Affiliation(s)
- Suzanne Bakken
- School of Nursing, Department of Biomedical Informatics, and Data Science Institute, Columbia University, New York, New York, USA
92
Agrawal A. Laying an equitable data foundation for foundation models. THE LANCET REGIONAL HEALTH. SOUTHEAST ASIA 2023; 13:100221. [PMID: 37383558 PMCID: PMC10305916 DOI: 10.1016/j.lansea.2023.100221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Accepted: 05/09/2023] [Indexed: 06/30/2023]
93
Sorin V, Barash Y, Konen E, Klang E. Large language models for oncological applications. J Cancer Res Clin Oncol 2023:10.1007/s00432-023-04824-w. [PMID: 37160626 DOI: 10.1007/s00432-023-04824-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2023] [Accepted: 04/28/2023] [Indexed: 05/11/2023]
Abstract
Large language models such as ChatGPT have gained public and scientific attention. These models may support oncologists in their work. Oncologists should be familiar with large language models to harness their potential while being aware of potential dangers and limitations.
Affiliation(s)
- Vera Sorin
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Yiftach Barash
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eli Konen
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eyal Klang
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Sami Sagol AI Hub, ARC, Chaim Sheba Medical Center, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel