1. Masanneck L, Schmidt L, Seifert A, Kölsche T, Huntemann N, Jansen R, Mehsin M, Bernhard M, Meuth SG, Böhm L, Pawlitzki M. Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study. J Med Internet Res 2024; 26:e53297. PMID: 38875696; PMCID: PMC11214027; DOI: 10.2196/53297.
Abstract
BACKGROUND Large language models (LLMs) have demonstrated impressive performance across various medical domains, prompting an exploration of their potential utility within the high-demand setting of emergency department (ED) triage. This study evaluated the triage proficiency of different LLMs and ChatGPT, an LLM-based chatbot, compared to professionally trained ED staff and untrained personnel. We further explored whether LLM responses could guide untrained staff in effective triage. OBJECTIVE This study aimed to assess the efficacy of LLMs and the associated product ChatGPT in ED triage compared to personnel of varying training status and to investigate whether the models' responses can enhance the triage proficiency of untrained personnel. METHODS A total of 124 anonymized case vignettes were triaged by untrained doctors; different versions of currently available LLMs; ChatGPT; and professionally trained raters, who subsequently agreed on a consensus set according to the Manchester Triage System (MTS). The prototypical vignettes were adapted from cases at a tertiary ED in Germany. The main outcome was the level of agreement between raters' MTS level assignments, measured via quadratic-weighted Cohen κ. The extent of over- and undertriage was also determined. Notably, instances of ChatGPT were prompted using zero-shot approaches without extensive background information on the MTS. The tested LLMs included raw GPT-4, Llama 3 70B, Gemini 1.5, and Mixtral 8x7B. RESULTS GPT-4-based ChatGPT and untrained doctors showed substantial agreement with the consensus triage of professional raters (mean κ=0.67, SD 0.037 and mean κ=0.68, SD 0.056, respectively), significantly exceeding the performance of GPT-3.5-based ChatGPT (mean κ=0.54, SD 0.024; P<.001). When untrained doctors used this LLM for second-opinion triage, there was a slight but statistically nonsignificant performance increase (mean κ=0.70, SD 0.047; P=.97).
Other tested LLMs performed similarly to or worse than GPT-4-based ChatGPT or showed odd triaging behavior with the parameters used. LLMs and ChatGPT models tended toward overtriage, whereas untrained doctors undertriaged. CONCLUSIONS While LLMs and the LLM-based product ChatGPT do not yet match professionally trained raters, their best models' triage proficiency equals that of untrained ED doctors. In their current form, LLMs and ChatGPT thus did not demonstrate gold-standard performance in ED triage and, in the setting of this study, failed to significantly improve untrained doctors' triage when used as decision support. Notable performance enhancements in newer LLM versions over older ones hint at future improvements with further technological development and specific training.
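Editor's note: the main outcome above, quadratic-weighted Cohen κ, penalizes rater disagreement by the squared distance between ordinal categories (here, MTS levels), so near-misses cost little and large discrepancies cost a lot. A minimal pure-Python sketch of the metric; the ratings below are illustrative examples, not data from the study:

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with quadratic weights for ordered categories
    (e.g., MTS levels 1-5); 1 = perfect agreement, 0 = chance level."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed contingency table of paired ratings
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[idx[a]][idx[b]] += 1
    row = [sum(r) for r in observed]  # rater A marginal counts
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals
    obs_disagree = exp_disagree = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2        # quadratic penalty by distance
            obs_disagree += w * observed[i][j]     # observed weighted disagreement
            exp_disagree += w * row[i] * col[j] / n  # disagreement expected by chance
    return 1.0 - obs_disagree / exp_disagree

# Illustrative MTS-style assignments from two hypothetical raters, levels 1-5
print(quadratic_weighted_kappa([1, 2, 2, 3, 5], [1, 2, 3, 3, 4], [1, 2, 3, 4, 5]))
```

scikit-learn's `cohen_kappa_score` with `weights='quadratic'` computes the same quantity.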
Affiliation(s)
- Lars Masanneck
- Department of Neurology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Linea Schmidt
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Antonia Seifert
- Emergency Department, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tristan Kölsche
- Department of Neurology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Niklas Huntemann
- Department of Neurology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Robin Jansen
- Department of Neurology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mohammed Mehsin
- Department of Neurology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Michael Bernhard
- Emergency Department, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Sven G Meuth
- Department of Neurology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Lennert Böhm
- Emergency Department, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Marc Pawlitzki
- Department of Neurology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
2. Amacher SA, Arpagaus A, Sahmer C, Becker C, Gross S, Urben T, Tisljar K, Sutter R, Marsch S, Hunziker S. Prediction of outcomes after cardiac arrest by a generative artificial intelligence model. Resusc Plus 2024; 18:100587. PMID: 38433764; PMCID: PMC10906512; DOI: 10.1016/j.resplu.2024.100587.
Abstract
Aims To investigate the prognostic accuracy of a non-medical generative artificial intelligence model (Chat Generative Pre-Trained Transformer 4, ChatGPT-4) as a novel approach to predicting death and poor neurological outcome at hospital discharge based on real-life data from cardiac arrest patients. Methods This prospective cohort study investigates the prognostic performance of ChatGPT-4 in predicting outcomes at hospital discharge of adult cardiac arrest patients admitted to intensive care at a large Swiss tertiary academic medical center (COMMUNICATE/PROPHETIC cohort study). We prompted ChatGPT-4 with sixteen prognostic parameters derived from established post-cardiac arrest scores for each patient. We compared ChatGPT-4 with three cardiac arrest scores (Out-of-Hospital Cardiac Arrest [OHCA], Cardiac Arrest Hospital Prognosis [CAHP], and PROgnostication using LOGistic regression model for Unselected adult cardiac arrest patients in the Early stages [PROLOGUE]) regarding the area under the curve (AUC), sensitivity, specificity, positive and negative predictive values, and likelihood ratios for in-hospital mortality and poor neurological outcome. Results Mortality at hospital discharge was 43% (n = 309/713), and 54% of patients (n = 387/713) had a poor neurological outcome. ChatGPT-4 showed good discrimination regarding in-hospital mortality with an AUC of 0.85, similar to the OHCA, CAHP, and PROLOGUE scores (AUCs of 0.82, 0.83, and 0.84, respectively). For poor neurological outcome, ChatGPT-4 showed a prediction similar to the post-cardiac arrest scores (AUC 0.83). Conclusions ChatGPT-4 showed performance similar to that of validated post-cardiac arrest scores in predicting mortality and poor neurological outcome. However, more research is needed, particularly regarding illogical answers, before an LLM can be incorporated into multimodal outcome prognostication after cardiac arrest.
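Editor's note: the AUC values compared above have a simple probabilistic reading: the AUC is the probability that a randomly chosen positive case (e.g., a patient who died in hospital) receives a higher risk score than a randomly chosen negative case. A minimal sketch of that Mann-Whitney formulation; the labels and scores below are made-up illustrations, not study data:

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs whose scores are
    ranked correctly; ties count as half a correct ranking."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores: label 1 = poor outcome, 0 = good outcome.
# Three of the four positive/negative pairs are ranked correctly -> 0.75.
print(roc_auc([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to a score that separates the groups perfectly.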
Affiliation(s)
- Simon A. Amacher
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Emergency Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Armon Arpagaus
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Christian Sahmer
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Christoph Becker
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Emergency Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Sebastian Gross
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Tabita Urben
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Kai Tisljar
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Raoul Sutter
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Medical Faculty, University of Basel, Basel, Switzerland
- Division of Neurophysiology, Department of Neurology, University Hospital Basel, Basel, Switzerland
- Stephan Marsch
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Medical Faculty, University of Basel, Basel, Switzerland
- Sabina Hunziker
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Medical Faculty, University of Basel, Basel, Switzerland
- Post-Intensive Care Clinic, University Hospital Basel, Basel, Switzerland
3. Petrella RJ. The AI Future of Emergency Medicine. Ann Emerg Med 2024:S0196-0644(24)00043-X. PMID: 38795081; DOI: 10.1016/j.annemergmed.2024.01.031.
Abstract
In the coming years, artificial intelligence (AI) and machine learning will likely give rise to profound changes in the field of emergency medicine, and medicine more broadly. This article discusses these anticipated changes in terms of 3 overlapping yet distinct stages of AI development. It reviews some fundamental concepts in AI and explores their relation to clinical practice, with a focus on emergency medicine. In addition, it describes some of the applications of AI in disease diagnosis, prognosis, and treatment, as well as some of the practical issues that they raise, the barriers to their implementation, and some of the legal and regulatory challenges they create.
Affiliation(s)
- Robert J Petrella
- Emergency Departments, CharterCARE Health Partners, Providence and North Providence, RI; Emergency Department, Boston VA Medical Center, Boston, MA; Emergency Departments, Steward Health Care System, Boston and Methuen, MA; Harvard Medical School, Boston, MA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA; Department of Medicine, Brigham and Women's Hospital, Boston, MA.
4. Preiksaitis C, Ashenburg N, Bunney G, Chu A, Kabeer R, Riley F, Ribeira R, Rose C. The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review. JMIR Med Inform 2024; 12:e53787. PMID: 38728687; PMCID: PMC11127144; DOI: 10.2196/53787.
Abstract
BACKGROUND Artificial intelligence (AI), more specifically large language models (LLMs), holds significant potential in revolutionizing emergency care delivery by optimizing clinical workflows and enhancing the quality of decision-making. Although enthusiasm for integrating LLMs into emergency medicine (EM) is growing, the existing literature is characterized by a disparate collection of individual studies, conceptual analyses, and preliminary implementations. Given these complexities and gaps in understanding, a cohesive framework is needed to comprehend the existing body of knowledge on the application of LLMs in EM. OBJECTIVE Given the absence of a comprehensive framework for exploring the roles of LLMs in EM, this scoping review aims to systematically map the existing literature on LLMs' potential applications within EM and identify directions for future research. Addressing this gap will allow for informed advancements in the field. METHODS Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria, we searched Ovid MEDLINE, Embase, Web of Science, and Google Scholar for papers published between January 2018 and August 2023 that discussed LLMs' use in EM. We excluded other forms of AI. A total of 1994 unique titles and abstracts were screened, and each full-text paper was independently reviewed by 2 authors. Data were abstracted independently, and 5 authors performed a collaborative quantitative and qualitative synthesis of the data. RESULTS A total of 43 papers were included. Studies were predominantly from 2022 to 2023 and conducted in the United States and China. 
We uncovered four major themes: (1) clinical decision-making and support was highlighted as a pivotal area, with LLMs playing a substantial role in enhancing patient care, notably through their application in real-time triage, allowing early recognition of patient urgency; (2) efficiency, workflow, and information management demonstrated the capacity of LLMs to significantly boost operational efficiency, particularly through the automation of patient record synthesis, which could reduce administrative burden and enhance patient-centric care; (3) risks, ethics, and transparency were identified as areas of concern, especially regarding the reliability of LLMs' outputs, and specific studies highlighted the challenges of ensuring unbiased decision-making amidst potentially flawed training data sets, stressing the importance of thorough validation and ethical oversight; and (4) education and communication possibilities included LLMs' capacity to enrich medical training, such as through using simulated patient interactions that enhance communication skills. CONCLUSIONS LLMs have the potential to fundamentally transform EM, enhancing clinical decision-making, optimizing workflows, and improving patient outcomes. This review sets the stage for future advancements by identifying key research areas: prospective validation of LLM applications, establishing standards for responsible use, understanding provider and patient perceptions, and improving physicians' AI literacy. Effective integration of LLMs into EM will require collaborative efforts and thorough evaluation to ensure these technologies can be safely and effectively applied.
Affiliation(s)
- Carl Preiksaitis
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Nicholas Ashenburg
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Gabrielle Bunney
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Andrew Chu
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Rana Kabeer
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Fran Riley
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Ryan Ribeira
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Christian Rose
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
5. Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton EW, Malin BA, Yin Z. A Systematic Review of ChatGPT and Other Conversational Large Language Models in Healthcare. medRxiv [Preprint] 2024:2024.04.26.24306390. PMID: 38712148; PMCID: PMC11071576; DOI: 10.1101/2024.04.26.24306390.
Abstract
Background The launch of the Chat Generative Pre-trained Transformer (ChatGPT) in November 2022 has attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including healthcare. Numerous studies have since been conducted regarding how to employ state-of-the-art LLMs in health-related scenarios to assist patients, doctors, and public health administrators. Objective This review aims to summarize the applications and concerns of applying conversational LLMs in healthcare and provide an agenda for future research on LLMs in healthcare. Methods We utilized PubMed, ACM, and IEEE digital libraries as primary sources for this review. We followed the guidance of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to screen and select peer-reviewed research articles that (1) were related to both healthcare applications and conversational LLMs and (2) were published before September 1st, 2023, the date when we started paper collection and screening. We investigated these papers and classified them according to their applications and concerns. Results Our search initially identified 820 papers according to targeted keywords, out of which 65 papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT from OpenAI (60), followed by Bard from Google (1), Large Language Model Meta AI (LLaMA) from Meta (1), and other LLMs (5). These papers were classified into four categories in terms of their applications: 1) summarization, 2) medical knowledge inquiry, 3) prediction, and 4) administration, and four categories of concerns: 1) reliability, 2) bias, 3) privacy, and 4) public acceptability. Forty-nine (75%) of the papers used LLMs for summarization and/or medical knowledge inquiry, and 58 (89%) expressed concerns about reliability and/or bias.
We found that conversational LLMs exhibit promising results in summarization and in providing medical knowledge to patients with relatively high accuracy. However, conversational LLMs like ChatGPT are not able to provide reliable answers to complex health-related tasks that require specialized domain expertise. Additionally, none of the reviewed papers conducted experiments to thoroughly examine how conversational LLMs lead to bias or privacy issues in healthcare research. Conclusions Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms by which LLM applications introduce bias and privacy issues. Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs and to promote, improve, and regulate their application in healthcare.
Affiliation(s)
- Leyao Wang
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Zhiyu Wan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
- Congning Ni
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Qingyuan Song
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Yang Li
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Ellen Wright Clayton
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA, 37203
- Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, Tennessee, USA, 37203
- Bradley A. Malin
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
- Department of Biostatistics, Vanderbilt University Medical Center, TN, USA, 37203
- Zhijun Yin
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
6. Paslı S, Şahin AS, Beşer MF, Topçuoğlu H, Yadigaroğlu M, İmamoğlu M. Assessing the precision of artificial intelligence in ED triage decisions: Insights from a study with ChatGPT. Am J Emerg Med 2024; 78:170-175. PMID: 38295466; DOI: 10.1016/j.ajem.2024.01.037.
Abstract
BACKGROUND The rise in emergency department presentations globally poses challenges for efficient patient management. To address this, various strategies aim to expedite patient management. The consistent performance and rapid data interpretation of artificial intelligence (AI) extend its healthcare applications, especially in emergencies. The introduction of a robust AI tool like ChatGPT, based on GPT-4 developed by OpenAI, can benefit patients and healthcare professionals by improving the speed and accuracy of resource allocation. This study examines ChatGPT's capability to predict triage outcomes based on local emergency department rules. METHODS This was a single-center prospective observational study. The study population consisted of all patients who presented to the emergency department with any symptoms and agreed to participate. The study was conducted on three non-consecutive days for a total of 72 h. Patients' chief complaints, vital parameters, medical history, and the area to which they were directed by the triage team in the emergency department were recorded. Concurrently, an emergency medicine physician inputted the same data into GPT-4, which had previously been trained according to local rules. The triage decisions made by GPT-4 on the basis of these data were recorded. In the same process, an emergency medicine specialist determined where the patient should be directed based on the data collected, and this decision was considered the gold standard. Accuracy rates and reliability for directing patients to specific areas by the triage team and GPT-4 were evaluated using Cohen's kappa test. Furthermore, the accuracy of the patient triage process performed by the triage team and GPT-4 was assessed by receiver operating characteristic (ROC) analysis. A value of p < 0.05 was considered statistically significant. RESULTS The study was carried out on 758 patients. Among the participants, 416 (54.9%) were male and 342 (45.1%) were female.
Evaluating the primary endpoints of our study - the agreement between the decisions of the triage team, GPT-4 decisions in emergency department triage, and the gold standard - we observed almost perfect agreement both between the triage team and the gold standard and between GPT-4 and the gold standard (Cohen's kappa 0.893 and 0.899, respectively; p < 0.001 for each). CONCLUSION Our findings suggest that GPT-4 possesses outstanding predictive skills in triaging patients in an emergency setting. GPT-4 can serve as an effective tool to support the triage process.
Affiliation(s)
- Sinan Paslı
- Karadeniz Technical University, Faculty of Medicine, Department of Emergency Medicine, Trabzon, Turkey.
- Hazal Topçuoğlu
- Siirt Education & Research Hospital, Department of Emergency Medicine, Siirt, Turkey
- Metin Yadigaroğlu
- Samsun University, Faculty of Medicine, Department of Emergency Medicine, Samsun, Turkey
- Melih İmamoğlu
- Karadeniz Technical University, Faculty of Medicine, Department of Emergency Medicine, Trabzon, Turkey
7. Barak-Corren Y, Wolf R, Rozenblum R, Creedon JK, Lipsett SC, Lyons TW, Michelson KA, Miller KA, Shapiro DJ, Reis BY, Fine AM. Harnessing the Power of Generative AI for Clinical Summaries: Perspectives From Emergency Physicians. Ann Emerg Med 2024:S0196-0644(24)00078-7. PMID: 38483426; DOI: 10.1016/j.annemergmed.2024.01.039.
Abstract
STUDY OBJECTIVE The workload of clinical documentation contributes to health care costs and professional burnout. The advent of generative artificial intelligence language models presents a promising solution. The perspective of clinicians may contribute to effective and responsible implementation of such tools. This study sought to evaluate 3 uses of generative artificial intelligence for clinical documentation in pediatric emergency medicine, measuring time savings, effort reduction, and physician attitudes and identifying potential risks and barriers. METHODS This mixed-methods study was performed with 10 pediatric emergency medicine attending physicians from a single pediatric emergency department. Participants were asked to write a supervisory note for 4 clinical scenarios, with varying levels of complexity, twice without any assistance and twice with the assistance of ChatGPT Version 4.0. Participants evaluated 2 additional ChatGPT-generated clinical summaries: a structured handoff and a visit summary for a family written at an 8th grade reading level. Finally, a semistructured interview was performed to assess physicians' perspectives on the use of ChatGPT in pediatric emergency medicine. Main outcomes and measures included between-subjects comparisons of the effort and time taken to complete the supervisory note with and without ChatGPT assistance. Effort was measured using a self-reported Likert scale of 0 to 10. Physicians' scoring of and attitude toward the ChatGPT-generated summaries were measured using a 0 to 10 Likert scale and open-ended questions. Summaries were scored for completeness, accuracy, efficiency, readability, and overall satisfaction. A thematic analysis was performed to analyze the content of the open-ended questions and to identify key themes. RESULTS ChatGPT yielded a 40% reduction in time and a 33% decrease in effort for supervisory notes in intricate cases, with no discernible effect on simpler notes.
ChatGPT-generated summaries for structured handoffs and family letters were highly rated, ranging from 7.0 to 9.0 out of 10, and most participants favored their inclusion in clinical practice. However, there were several critical reservations, from which a set of general recommendations for applying ChatGPT to clinical summaries was formulated. CONCLUSION Pediatric emergency medicine attendings in our study perceived that ChatGPT can deliver high-quality summaries while saving time and effort in many scenarios, but not all.
Affiliation(s)
- Yuval Barak-Corren
- Predictive Medicine Group, Computational Health Informatics Program, Boston Children's Hospital, Boston, MA; Division of Cardiology, Children's Hospital of Philadelphia, Philadelphia, PA.
- Rebecca Wolf
- Emergency Medicine, Boston Children's Hospital, Boston, MA
- Ronen Rozenblum
- Harvard Medical School, Boston, MA; Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA
- Jessica K Creedon
- Emergency Medicine, Boston Children's Hospital, Boston, MA; Harvard Medical School, Boston, MA
- Susan C Lipsett
- Emergency Medicine, Boston Children's Hospital, Boston, MA; Harvard Medical School, Boston, MA
- Todd W Lyons
- Emergency Medicine, Boston Children's Hospital, Boston, MA; Harvard Medical School, Boston, MA
- Kelsey A Miller
- Emergency Medicine, Boston Children's Hospital, Boston, MA; Harvard Medical School, Boston, MA
- Daniel J Shapiro
- Division of Pediatric Emergency Medicine, University of California, San Francisco, San Francisco, CA
- Ben Y Reis
- Predictive Medicine Group, Computational Health Informatics Program, Boston Children's Hospital, Boston, MA; Harvard Medical School, Boston, MA
- Andrew M Fine
- Emergency Medicine, Boston Children's Hospital, Boston, MA; Harvard Medical School, Boston, MA
8. Lee KH, Lee RW. ChatGPT's Accuracy on Magnetic Resonance Imaging Basics: Characteristics and Limitations Depending on the Question Type. Diagnostics (Basel) 2024; 14:171. PMID: 38248048; PMCID: PMC10814518; DOI: 10.3390/diagnostics14020171.
Abstract
Our study aimed to assess the accuracy and limitations of ChatGPT in the domain of magnetic resonance imaging (MRI), focusing on its performance in answering simple knowledge questions and specialized multiple-choice questions related to MRI. A two-step approach was used to evaluate ChatGPT. In the first step, 50 simple MRI-related questions were asked, and ChatGPT's answers were categorized as correct, partially correct, or incorrect by independent researchers. In the second step, 75 multiple-choice questions covering various MRI topics were posed, and the answers were similarly categorized. The study utilized Cohen's kappa coefficient for assessing interobserver agreement. ChatGPT demonstrated high accuracy in answering straightforward MRI questions, with over 85% classified as correct. However, its performance varied significantly across multiple-choice questions, with accuracy rates ranging from 40% to 66.7%, depending on the topic. This indicated a notable gap in its ability to handle more complex, specialized questions requiring deeper understanding and context. In conclusion, this study critically evaluates the accuracy of ChatGPT in addressing MRI-related questions, highlighting its potential and limitations in the healthcare sector, particularly in radiology. Our findings demonstrate that ChatGPT, while proficient in responding to straightforward MRI-related questions, exhibits variability in its ability to accurately answer complex multiple-choice questions that require more profound, specialized knowledge of MRI. This discrepancy underscores the nuanced role AI can play in medical education and healthcare decision-making, necessitating a balanced approach to its application.
Affiliation(s)
- Ro-Woon Lee
- Department of Radiology, Inha University College of Medicine, Incheon 22212, Republic of Korea
9. Ray PP. ChatGPT's competence in addressing urolithiasis: myth or reality? Int Urol Nephrol 2024; 56:149-150. PMID: 37726510; DOI: 10.1007/s11255-023-03802-y.
10. Ćirković A, Katz T. Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study. JMIR Form Res 2023; 7:e51798. PMID: 38153777; PMCID: PMC10784977; DOI: 10.2196/51798.
Abstract
BACKGROUND Refractive surgery research aims to optimally precategorize patients by their suitability for various types of surgery. Recent advances have led to the development of artificial intelligence-powered algorithms, including machine learning approaches, to assess risks and enhance workflow. Large language models (LLMs) like ChatGPT-4 (OpenAI LP) have emerged as potential general artificial intelligence tools that can assist across various disciplines, possibly including refractive surgery decision-making. However, their actual capabilities in precategorizing refractive surgery patients based on real-world parameters remain unexplored. OBJECTIVE This exploratory study aimed to validate ChatGPT-4's capabilities in precategorizing refractive surgery patients based on commonly used clinical parameters. The goal was to assess whether ChatGPT-4's performance when categorizing batch inputs is comparable to the categorizations made by a refractive surgeon. A simple binary set of categories (patient suitable for laser refractive surgery or not) as well as a more detailed set were compared. METHODS Data from 100 consecutive patients from a refractive clinic were anonymized and analyzed. Parameters included age, sex, manifest refraction, visual acuity, and various corneal measurements and indices from Scheimpflug imaging. This study compared ChatGPT-4's performance with a clinician's categorizations using the Cohen κ coefficient, a chi-square test, a confusion matrix, accuracy, precision, recall, F1-score, and receiver operating characteristic area under the curve. RESULTS A statistically significant noncoincidental accordance was found between ChatGPT-4's and the clinician's categorizations, with a Cohen κ coefficient of 0.399 for 6 categories (95% CI 0.256-0.537) and 0.610 for binary categorization (95% CI 0.372-0.792). The model showed temporal instability and response variability, however.
The chi-square test on 6 categories indicated an association between the 2 raters' distributions (χ²(5)=94.7; P<.001). For 6 categories, the accuracy was 0.68, precision 0.75, recall 0.68, and F1-score 0.70. For 2 categories, the accuracy was 0.88, precision 0.88, recall 0.88, F1-score 0.88, and area under the curve 0.79. CONCLUSIONS This study revealed that ChatGPT-4 exhibits potential as a precategorization tool in refractive surgery, showing promising agreement with clinician categorizations. However, its main limitations include reliance on a single human rater, the small sample size, the instability and variability of ChatGPT's (OpenAI LP) output between iterations, and the nontransparency of the underlying models. The results encourage further exploration of the application of LLMs like ChatGPT-4 in health care, particularly in decision-making processes that require understanding vast clinical data. Future research should focus on defining the model's accuracy with prompt and vignette standardization, detecting confounding factors, and comparing it with other versions of ChatGPT-4 and other LLMs to pave the way for larger-scale validation and real-world implementation.
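The agreement metrics reported in this study (accuracy, precision, recall, F1-score) all derive from the binary confusion matrix: true/false positives and true/false negatives against the clinician's reference labels. A minimal illustration in Python, using hypothetical suitability labels rather than the study's data:

```python
def binary_metrics(y_true, y_pred, positive="suitable"):
    """Accuracy, precision, recall, and F1 from a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp)          # of predicted positives, how many are real
    recall = tp / (tp + fn)             # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Clinician (reference) vs. model suitability calls for 10 hypothetical patients.
truth = ["suitable"] * 4 + ["unsuitable"] * 6
model = ["suitable", "suitable", "suitable", "unsuitable",
         "suitable", "unsuitable", "unsuitable", "unsuitable",
         "unsuitable", "unsuitable"]
acc, prec, rec, f1 = binary_metrics(truth, model)
print(acc, prec, rec, f1)  # → 0.8 0.75 0.75 0.75
```

Reporting precision and recall alongside accuracy matters here because, with an imbalanced suitable/unsuitable split, a model could score high accuracy while missing most of one class.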
Affiliation(s)
- Toam Katz
- Department of Ophthalmology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

11
11
|
Birkun AA, Gautam A. Large Language Model (LLM)-Powered Chatbots Fail to Generate Guideline-Consistent Content on Resuscitation and May Provide Potentially Harmful Advice. Prehosp Disaster Med 2023; 38:757-763. [PMID: 37927093 DOI: 10.1017/s1049023x23006568] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2023]
Abstract
INTRODUCTION Innovative large language model (LLM)-powered chatbots, which are extremely popular nowadays, represent potential sources of information on resuscitation for the general public. For instance, chatbot-generated advice could be used for community resuscitation education or for just-in-time informational support of untrained lay rescuers in a real-life emergency. STUDY OBJECTIVE This study assessed the performance of two prominent LLM-based chatbots, particularly the quality of their advice on how to help a non-breathing victim. METHODS In May 2023, the new Bing (Microsoft Corporation, USA) and Bard (Google LLC, USA) chatbots were each queried 20 times: "What to do if someone is not breathing?" The content of the chatbots' responses was evaluated for compliance with the 2021 Resuscitation Council United Kingdom guidelines using a pre-developed checklist. RESULTS Both chatbots provided context-dependent textual responses to the query. However, coverage of the guideline-consistent instructions for helping a non-breathing victim was poor: the mean percentage of responses completely satisfying the checklist criteria was 9.5% for Bing and 11.4% for Bard (P>.05). Essential elements of bystander action, including early initiation and uninterrupted performance of chest compressions with adequate depth, rate, and chest recoil, as well as requesting and using an automated external defibrillator (AED), were missing as a rule. Moreover, 55.0% of Bard's responses contained plausible-sounding but nonsensical guidance, called artificial hallucinations, which creates a risk of inadequate care and harm to the victim. CONCLUSION The LLM-powered chatbots' advice on helping a non-breathing victim omits essential details of resuscitation technique and occasionally contains deceptive, potentially harmful directives. Further research and regulatory measures are required to mitigate the risks of chatbot-generated misinformation on resuscitation reaching the public.
Affiliation(s)
- Alexei A Birkun
- Department of General Surgery, Anaesthesiology, Resuscitation and Emergency Medicine, Medical Academy named after S.I. Georgievsky of V.I. Vernadsky Crimean Federal University, Simferopol, 295051, Russian Federation
- Adhish Gautam
- Regional Government Hospital, Una (H.P.), 174303, India
12
Denecke K. Potential and pitfalls of conversational agents in health care. Nat Rev Dis Primers 2023; 9:66. [PMID: 37996477 DOI: 10.1038/s41572-023-00482-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/25/2023]
Affiliation(s)
- Kerstin Denecke
- Institute for Patient-centered Digital Health, Bern University of Applied Sciences, Bern, Switzerland.
13
Abujaber AA, Abd-Alrazaq A, Al-Qudimat AR, Nashwan AJ. A Strengths, Weaknesses, Opportunities, and Threats (SWOT) Analysis of ChatGPT Integration in Nursing Education: A Narrative Review. Cureus 2023; 15:e48643. [PMID: 38090452 PMCID: PMC10711690 DOI: 10.7759/cureus.48643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/11/2023] [Indexed: 03/25/2024] Open
Abstract
Amidst evolving healthcare demands, nursing education plays a pivotal role in preparing future nurses for complex challenges. Traditional approaches, however, must evolve to meet modern healthcare needs. ChatGPT, an AI-based chatbot, has garnered significant attention for its ability to personalize learning experiences, enhance virtual clinical simulations, and foster collaborative learning in nursing education. This review aims to thoroughly assess the potential impact of integrating ChatGPT into nursing education. The hypothesis is that a comprehensive SWOT analysis examining the strengths, weaknesses, opportunities, and threats associated with ChatGPT can provide valuable insights for stakeholders, enabling informed decisions about its integration that prioritize improved learning outcomes. A thorough narrative literature review was undertaken to provide a solid foundation for the SWOT analysis. The materials included scholarly articles and reports, ensuring the study's credibility and allowing for a holistic and unbiased assessment. The analysis identified accessibility, consistency, adaptability, cost-effectiveness, and staying up-to-date as crucial factors influencing the strengths, weaknesses, opportunities, and threats associated with ChatGPT integration in nursing education. These themes provided a framework to understand the potential risks and benefits of integrating ChatGPT into nursing education. This review highlights the importance of responsible and effective use of ChatGPT in nursing education and the need for collaboration among educators, policymakers, and AI developers. Addressing the identified challenges and leveraging the strengths of ChatGPT can lead to improved learning outcomes and enriched educational experiences for students.
The findings emphasize the importance of responsibly integrating ChatGPT in nursing education, balancing technological advancement with careful consideration of associated risks, to achieve optimal outcomes.
Affiliation(s)
- Alaa Abd-Alrazaq
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, QAT
- Ahmad R Al-Qudimat
- Department of Public Health, Qatar University, Doha, QAT
- Surgical Research Section, Department of Surgery, Hamad Medical Corporation, Doha, QAT