1
Ge J, Kim WR, Kwong AJ. Common definitions and variables are needed for the United States to join the conversation on acute-on-chronic liver failure. Am J Transplant 2024;24:1755-1760. [PMID: 38977243] [PMCID: PMC11439574] [DOI: 10.1016/j.ajt.2024.06.021]
Abstract
Acute-on-chronic liver failure (ACLF) is a variably defined syndrome characterized by acute decompensation of cirrhosis with organ failures. At least 13 different definitions and diagnostic criteria for ACLF have been proposed, and there is increasing recognition that patients with ACLF may face disadvantages in the current United States liver allocation system. There is a need, therefore, for more standardized data collection and consensus to improve study design and outcome assessment in ACLF. In this article, we discuss the current landscape of transplantation for patients with ACLF, strategies to optimize organ utility, and data opportunities based on emerging technologies to facilitate improved data collection.
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California - San Francisco, San Francisco, California, USA
- W Ray Kim
- Division of Gastroenterology and Hepatology, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Allison J Kwong
- Division of Gastroenterology and Hepatology, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
2
Al-Abdullatif AM, Alsubaie MA. ChatGPT in Learning: Assessing Students' Use Intentions through the Lens of Perceived Value and the Influence of AI Literacy. Behav Sci (Basel) 2024;14:845. [PMID: 39336060] [PMCID: PMC11428673] [DOI: 10.3390/bs14090845]
Abstract
This study sought to understand students' intentions regarding the use of ChatGPT in learning from the perspective of perceived value, exploring the influence of artificial intelligence (AI) literacy. Drawing on a sample of 676 university students from diverse academic backgrounds, we employed a structured survey questionnaire to measure their perceptions of ChatGPT as a learning tool. The collected data were then analyzed using structural equation modeling (SEM) via SmartPLS 4 software. The findings showed a strong effect of the students' perceived value of ChatGPT on their intention to use it. Our findings suggest that perceived usefulness, perceived enjoyment, and perceived fees had a significant influence on students' perceived value of ChatGPT, while perceived risk showed no effect. Moreover, the role of AI literacy emerged as pivotal in shaping these perceptions. Students with higher AI literacy demonstrated an enhanced ability to discern the value of ChatGPT. AI literacy proved to be a strong predictor of students' perceptions of usefulness, enjoyment, and fees for using ChatGPT in learning; however, it did not predict their perception of the risk of using ChatGPT in learning. This study underscores the growing importance of integrating AI literacy into educational curricula to optimize the reception and utilization of innovative AI tools in academic scenarios. Future interventions aiming to boost the adoption of such tools should consider incorporating AI literacy components to maximize perceived value and, subsequently, use intention.
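For readers unfamiliar with SEM, a rough sketch of the structural model described above follows. This is an assumption-laden illustration only: semopy fits covariance-based SEM rather than the PLS-SEM run in SmartPLS 4, and every variable name and data point below is invented.

```python
# Hedged sketch of the hypothesized paths (perceived value -> use intention);
# semopy is covariance-based SEM, not PLS-SEM, and all data are synthetic.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(0)
n = 676  # sample size reported in the abstract
df = pd.DataFrame({k: rng.normal(size=n) for k in
                   ["usefulness", "enjoyment", "fees", "risk"]})
df["perceived_value"] = 0.5 * df["usefulness"] + 0.3 * df["enjoyment"] + rng.normal(size=n)
df["use_intention"] = 0.6 * df["perceived_value"] + rng.normal(size=n)

model = semopy.Model("""
perceived_value ~ usefulness + enjoyment + fees + risk
use_intention ~ perceived_value
""")
model.fit(df)
print(model.inspect())  # path coefficients and p-values
```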
Affiliation(s)
- Merfat Ayesh Alsubaie
- Department of Curriculum and Instruction, King Faisal University (KFU), Al-Hasa P.O. Box 400, Saudi Arabia
3
Gravina AG, Pellegrino R, Palladino G, Imperio G, Ventura A, Federico A. Charting new AI education in gastroenterology: Cross-sectional evaluation of ChatGPT and perplexity AI in medical residency exam. Dig Liver Dis 2024;56:1304-1311. [PMID: 38503659] [DOI: 10.1016/j.dld.2024.02.019]
Abstract
BACKGROUND Conversational chatbots, fueled by large language models, spark debate over their potential in education and medical career exams. There is debate in the literature about the scientific integrity of the outputs produced by these chatbots. AIMS This study evaluates the cross-sectional performance of ChatGPT 3.5 and Perplexity AI in responding to questions from the 2023 Italian national residency admission exam (SSM23), comparing results and the chatbots' concordance with previous years' SSMs. METHODS Gastroenterology-related SSM23 questions were input into ChatGPT 3.5 and Perplexity AI, and their performance was evaluated in terms of correct responses and total scores. This process was repeated with questions from the three preceding years. Additionally, chatbot concordance was assessed using Cohen's kappa. RESULTS In SSM23, ChatGPT 3.5 outperformed Perplexity AI with 94.11% correct responses, demonstrating consistency across years. Concordance weakened in 2023 (κ=0.203, P = 0.148), but ChatGPT consistently maintained a higher standard than Perplexity AI. CONCLUSION ChatGPT 3.5 and Perplexity AI exhibit promise in addressing gastroenterological queries, emphasizing potential educational roles. However, their variable performance mandates cautious use as supplementary tools alongside conventional study methods. Clear guidelines are crucial for educators to balance traditional approaches and innovative systems, enhancing educational standards.
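For illustration, the concordance statistic reported above can be sketched with standard tooling. This is a minimal example assuming hypothetical per-question correctness labels; it is not the study's code or data.

```python
# Hedged sketch: Cohen's kappa between two chatbots' answers on the same
# exam questions; the labels below are hypothetical, not the study's data.
from sklearn.metrics import cohen_kappa_score

# 1 = correct answer, 0 = incorrect, one entry per SSM question.
chatgpt_35 = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
perplexity = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1]

kappa = cohen_kappa_score(chatgpt_35, perplexity)
print(f"Cohen's kappa = {kappa:.3f}")  # the paper reports kappa = 0.203 for SSM23
```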
Affiliation(s)
- Antonietta Gerarda Gravina
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Raffaele Pellegrino
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Giovanna Palladino
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Giuseppe Imperio
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Andrea Ventura
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Alessandro Federico
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
4
Knoedler L, Vogt A, Alfertshofer M, Camacho JM, Najafali D, Kehrer A, Prantl L, Iske J, Dean J, Hoefer S, Knoedler C, Knoedler S. The law code of ChatGPT and artificial intelligence-how to shield plastic surgeons and reconstructive surgeons against Justitia's sword. Front Surg 2024;11:1390684. [PMID: 39132668] [PMCID: PMC11312379] [DOI: 10.3389/fsurg.2024.1390684]
Abstract
Large Language Models (LLMs) like ChatGPT 4 (OpenAI), Claude 2 (Anthropic), and Llama 2 (Meta AI) have emerged as novel technologies to integrate artificial intelligence (AI) into everyday work. LLMs in particular, and AI in general, carry immense potential to streamline clinical workflows, outsource resource-intensive tasks, and disburden the healthcare system. While a plethora of trials is elucidating the untapped capabilities of this technology, the sheer pace of scientific progress also takes its toll. Legal guidelines hold a key role in regulating upcoming technologies, safeguarding patients, and determining individual and institutional liabilities. To date, there is a paucity of research work delineating the legal regulations of LLMs and AI for clinical scenarios in plastic and reconstructive surgery (PRS). This knowledge gap poses the risk of lawsuits and penalties against plastic surgeons. Thus, we aim to provide the first overview of legal guidelines and pitfalls of LLMs and AI for plastic surgeons. Our analysis encompasses models like ChatGPT, Claude 2, and Llama 2, among others, regardless of their closed or open-source nature. Ultimately, this line of research may help clarify the legal responsibilities of plastic surgeons and seamlessly integrate such cutting-edge technologies into the field of PRS.
Affiliation(s)
- Leonard Knoedler
- Department of Plastic, Hand, and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Alexander Vogt
- Corporate/M&A Department, Dentons Europe (Germany) GmbH & Co. KG, Munich, Germany
- UC Law San Francisco (Formerly UC Hastings), San Francisco, CA, United States
- Michael Alfertshofer
- Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians-University Munich, Munich, Germany
- Justin M. Camacho
- College of Medicine, Drexel University, Philadelphia, PA, United States
- Daniel Najafali
- Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL, United States
- Andreas Kehrer
- Department of Plastic, Hand, and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Lukas Prantl
- Department of Plastic, Hand, and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Jasper Iske
- Department of Cardiothoracic and Vascular Surgery, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany
- Jillian Dean
- School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Christoph Knoedler
- Faculty of Applied Social and Health Sciences, Regensburg University of Applied Sciences, Regensburg, Germany
- Samuel Knoedler
- Department of Plastic, Hand, and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
5
Wang D, Liang J, Ye J, Li J, Li J, Zhang Q, Hu Q, Pan C, Wang D, Liu Z, Shi W, Shi D, Li F, Qu B, Zheng Y. Enhancement of Large Language Models' Performance in Diabetes Education: Retrieval-Augmented Generation Approach. J Med Internet Res 2024. [PMID: 39046096] [DOI: 10.2196/58041]
Abstract
BACKGROUND Large language models (LLMs) have demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the RISE framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries. OBJECTIVE This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve LLMs' ability to respond accurately and safely to diabetes-related inquiries. METHODS RISE, an innovative retrieval augmentation framework, comprises four steps: Rewriting Query, Information Retrieval, Summarization, and Execution. Using a set of 43 common diabetes-related questions, we evaluated three base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions. Assessments were conducted by clinicians for accuracy and comprehensiveness, and by patients for understandability. RESULTS The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all three base LLMs. On average, the proportion of accurate responses increased by 12% (from 107/129 to 122/129) with RISE. Specifically, accurate responses increased by 7% (from 39/43 to 42/43) for GPT-4, 19% (from 31/43 to 39/43) for Claude 2, and 9% (from 37/43 to 41/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44. Understandability was enhanced by 0.19 on average. Data collection was conducted from September 30, 2023, to February 5, 2024. CONCLUSIONS RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.
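For readers unfamiliar with retrieval augmentation, the four RISE steps map naturally onto a simple pipeline. The sketch below is an assumption-laden outline, not the authors' implementation: `llm` and `vector_store` are hypothetical stand-ins for a language model client and a medical-document index.

```python
# Hedged outline of the four RISE steps (Rewriting Query, Information
# Retrieval, Summarization, Execution); `llm` and `vector_store` are
# hypothetical interfaces, not the authors' actual components.
def rise_answer(question: str, llm, vector_store, k: int = 5) -> str:
    # 1. Rewriting Query: reformulate the patient's question for retrieval.
    query = llm(f"Rewrite as a concise medical search query: {question}")
    # 2. Information Retrieval: fetch the k most relevant passages.
    passages = vector_store.search(query, top_k=k)
    # 3. Summarization: condense the retrieved evidence.
    summary = llm("Summarize this evidence faithfully:\n" + "\n".join(passages))
    # 4. Execution: answer the original question grounded in the summary.
    return llm(f"Using only this evidence:\n{summary}\n\nAnswer: {question}")
```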
Affiliation(s)
- Dingqiao Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jiangbo Liang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jinguo Ye
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingni Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingpeng Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Qikai Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Qiuling Hu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Caineng Pan
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Dongliang Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhong Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Wen Shi
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Danli Shi
- Research Centre for SHARP Vision, The Hong Kong Polytechnic University, Hong Kong, China
- Fei Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Bo Qu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yingfeng Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
6
Lahat A, Sharif K, Zoabi N, Shneor Patt Y, Sharif Y, Fisher L, Shani U, Arow M, Levin R, Klang E. Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4. J Med Internet Res 2024;26:e54571. [PMID: 38935937] [PMCID: PMC11240076] [DOI: 10.2196/54571]
Abstract
BACKGROUND Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. OBJECTIVE This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types. METHODS A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. RESULTS Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions. CONCLUSIONS ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.
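As a rough illustration of the senior-versus-resident rating comparison above: the abstract does not name the exact statistical test, so the sketch below assumes a Mann-Whitney U test for ordinal 1-5 ratings, with invented data.

```python
# Hedged sketch: comparing senior vs. resident ratings on the 1-5 scale.
# The test choice is an assumption (the abstract reports only P values),
# and the ratings below are made up.
from scipy.stats import mannwhitneyu

senior_ratings   = [5, 4, 5, 4, 5, 4, 4, 5, 5, 4]
resident_ratings = [4, 3, 4, 4, 3, 4, 3, 4, 4, 3]

stat, p = mannwhitneyu(senior_ratings, resident_ratings, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
```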
Affiliation(s)
- Adi Lahat
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Department of Gastroenterology, Samson Assuta Ashdod Medical Center, Affiliated with Ben Gurion University of the Negev, Be'er Sheva, Israel
- Kassem Sharif
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Narmin Zoabi
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Yousra Sharif
- Department of Internal Medicine C, Hadassah Medical Center, Jerusalem, Israel
- Lior Fisher
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Uria Shani
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Mohamad Arow
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Roni Levin
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States
7
Calvo-Lorenzo I, Uriarte-Llano I. [Massive generation of synthetic medical records with ChatGPT: An example in hip fractures]. [Article in Spanish] Med Clin (Barc) 2024;162:549-554. [PMID: 38290872] [DOI: 10.1016/j.medcli.2023.11.027]
Affiliation(s)
- Isidoro Calvo-Lorenzo
- Department of Orthopedic Surgery and Traumatology, Hospital Universitario Galdakao Usansolo, Galdakao, Vizcaya, Spain
- Iker Uriarte-Llano
- Department of Orthopedic Surgery and Traumatology, Hospital Universitario Galdakao Usansolo, Galdakao, Vizcaya, Spain
8
Abi-Rafeh J, Cattelan L, Xu HH, Bassiri-Tehrani B, Kazan R, Nahai F. Artificial Intelligence-Generated Social Media Content Creation and Management Strategies for Plastic Surgeons. Aesthet Surg J 2024;44:769-778. [PMID: 38366026] [DOI: 10.1093/asj/sjae036]
Abstract
BACKGROUND Social media platforms have come to represent integral components of the professional marketing and advertising strategy for plastic surgeons. Effective and consistent content development, however, remains technically demanding and time-consuming, prompting most to employ, at non-negligible cost, social media marketing specialists for content planning and development. OBJECTIVES In the present study, we aimed to investigate the ability of presently available artificial intelligence (AI) models to assist plastic surgeons in their social media content development and sharing plans. METHODS An AI large language model was prompted on the study's objectives through a series of standardized user interactions. Social media platforms of interest, on which the AI model was prompted, included Instagram, TikTok, and X (formerly Twitter). RESULTS A 1-year, entirely AI-generated social media plan, comprising a total of 1091 posts for the 3 aforementioned social media platforms, is presented. Themes of the AI-generated content proposed for each platform were classified into 6 categories: patient-related, practice-related, educational, "uplifting," interactive, and promotional posts. Overall, 91 publicly recognized holidays and observance and awareness days were incorporated into the content calendars. The AI model demonstrated an ability to differentiate between the distinct formats of each of the 3 social media platforms investigated, generating unique ideas for each and providing detailed content development and posting instructions, scripts, and post captions, leveraging features specific to each platform. CONCLUSIONS By providing detailed and actionable social media content creation and posting plans to plastic surgeons, presently available AI models can be readily leveraged to assist in, and significantly alleviate, the burden associated with social media account management, content generation, and potentially patient conversion.
9
Amacher SA, Arpagaus A, Sahmer C, Becker C, Gross S, Urben T, Tisljar K, Sutter R, Marsch S, Hunziker S. Prediction of outcomes after cardiac arrest by a generative artificial intelligence model. Resusc Plus 2024;18:100587. [PMID: 38433764] [PMCID: PMC10906512] [DOI: 10.1016/j.resplu.2024.100587]
Abstract
Aims To investigate the prognostic accuracy of a non-medical generative artificial intelligence model (Chat Generative Pre-Trained Transformer 4, ChatGPT-4) in predicting death and poor neurological outcome at hospital discharge based on real-life data from cardiac arrest patients. Methods This prospective cohort study investigates the prognostic performance of ChatGPT-4 in predicting outcomes at hospital discharge of adult cardiac arrest patients admitted to intensive care at a large Swiss tertiary academic medical center (COMMUNICATE/PROPHETIC cohort study). We prompted ChatGPT-4 with sixteen prognostic parameters derived from established post-cardiac arrest scores for each patient. We then compared ChatGPT-4 with three cardiac arrest scores (Out-of-Hospital Cardiac Arrest [OHCA], Cardiac Arrest Hospital Prognosis [CAHP], and PROgnostication using LOGistic regression model for Unselected adult cardiac arrest patients in the Early stages [PROLOGUE]) for in-hospital mortality and poor neurological outcome, in terms of area under the curve (AUC), sensitivity, specificity, positive and negative predictive values, and likelihood ratios. Results Mortality at hospital discharge was 43% (n = 309/713); 54% of patients (n = 387/713) had a poor neurological outcome. ChatGPT-4 showed good discrimination for in-hospital mortality, with an AUC of 0.85, similar to the OHCA, CAHP, and PROLOGUE scores (AUCs of 0.82, 0.83, and 0.84, respectively). For poor neurological outcome, ChatGPT-4's prediction was likewise similar to that of the post-cardiac arrest scores (AUC 0.83). Conclusions ChatGPT-4 performed similarly to validated post-cardiac arrest scores in predicting mortality and poor neurological outcome. However, more research on illogical answers is needed before an LLM could be incorporated into multimodal outcome prognostication after cardiac arrest.
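The discrimination comparison above reduces to computing AUCs for competing predictors on the same outcomes. A minimal sketch with invented data (not the cohort's) follows; AUC is rank-based, so raw score points and predicted probabilities can be compared directly.

```python
# Hedged sketch: discrimination (AUC) of model-derived risk vs. an
# established score; outcomes and scores below are invented for illustration.
from sklearn.metrics import roc_auc_score

outcomes  = [1, 0, 1, 1, 0, 0, 1, 0]                           # 1 = in-hospital death
gpt4_risk = [0.90, 0.20, 0.70, 0.80, 0.30, 0.10, 0.60, 0.40]   # predicted probabilities
cahp_pts  = [210, 120, 180, 200, 140, 100, 170, 150]           # raw score points

print("ChatGPT-4 AUC:", roc_auc_score(outcomes, gpt4_risk))
print("CAHP AUC:", roc_auc_score(outcomes, cahp_pts))
```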
Affiliation(s)
- Simon A. Amacher
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Emergency Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Armon Arpagaus
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Christian Sahmer
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Christoph Becker
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Emergency Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Sebastian Gross
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Tabita Urben
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Kai Tisljar
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Raoul Sutter
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Medical Faculty, University of Basel, Basel, Switzerland
- Division of Neurophysiology, Department of Neurology, University Hospital Basel, Basel, Switzerland
- Stephan Marsch
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Medical Faculty, University of Basel, Basel, Switzerland
- Sabina Hunziker
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Medical Faculty, University of Basel, Basel, Switzerland
- Post-Intensive Care Clinic, University Hospital Basel, Basel, Switzerland
10
Marti-Aguado D, Pazó J, Diaz-Gonzalez A, de Las Heras Páez de la Cadena B, Conthe A, Gallego Duran R, Rodríguez-Gandía MA, Turnes J, Romero-Gomez M. LiverAI: New tool in the landscape for liver health. Gastroenterol Hepatol 2024;47:646-648. [PMID: 38582150] [DOI: 10.1016/j.gastrohep.2024.04.001]
Affiliation(s)
- David Marti-Aguado
- Digestive Disease Department, Clinic University Hospital, INCLIVA Health Research Institute, Valencia, Spain
- Javier Pazó
- AI and IT Solutions Manager, Spanish Association for the Study of the Liver (AEEH), Spain
- Alvaro Diaz-Gonzalez
- Gastroenterology and Hepatology Department, Clinical and Translational Research in Digestive Diseases Group, Valdecilla Research Institute (IDIVAL), Marqués de Valdecilla University Hospital, Santander, Spain
- Andres Conthe
- Department of Gastroenterology and Hepatology, Hospital General Universitario Gregorio Marañón, Madrid, Spain
- Rocio Gallego Duran
- Digestive Diseases Unit and CIBERehd, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville (HUVR/CSIC/US), University of Seville, Seville, Spain
- Miguel A Rodríguez-Gandía
- Department of Gastroenterology and Hepatology, Hospital Universitario Ramón y Cajal, Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain
- Juan Turnes
- Department of Gastroenterology and Hepatology, Complejo Hospitalario Universitario Pontevedra & IIS Galicia Sur, Spain
- Manuel Romero-Gomez
- Digestive Diseases Unit and CIBERehd, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville (HUVR/CSIC/US), University of Seville, Seville, Spain
11
Saeidnia HR, Kozak M, Lund BD, Hassanzadeh M. Evaluation of ChatGPT's responses to information needs and information seeking of dementia patients. Sci Rep 2024;14:10273. [PMID: 38704403] [PMCID: PMC11069588] [DOI: 10.1038/s41598-024-61068-5]
Abstract
Many people in the advanced stages of dementia require full-time caregivers, most of whom are family members who provide informal (non-specialized) care. It is important to provide these caregivers with high-quality information to help them understand and manage the symptoms and behaviors of dementia patients. This study aims to evaluate ChatGPT, a chatbot built using the Generative Pre-trained Transformer (GPT) large language model, in responding to information needs and information seeking of such informal caregivers. We identified the information needs of dementia patients based on the relevant literature (22 articles were selected from 2442 retrieved articles). From this analysis, we created a list of 31 items that describe these information needs, and used them to formulate 118 relevant questions. We then asked these questions to ChatGPT and investigated its responses. In the next phase, we asked 15 informal and 15 formal dementia-patient caregivers to analyze and evaluate these ChatGPT responses, using both quantitative (questionnaire) and qualitative (interview) approaches. In the interviews conducted, informal caregivers were more positive towards the use of ChatGPT to obtain non-specialized information about dementia compared to formal caregivers. However, ChatGPT struggled to provide satisfactory responses to more specialized (clinical) inquiries. In the questionnaire study, informal caregivers gave higher ratings to ChatGPT's responsiveness on the 31 items describing information needs, giving an overall mean score of 3.77 (SD 0.98) out of 5; the mean score among formal caregivers was 3.13 (SD 0.65), indicating that formal caregivers showed less trust in ChatGPT's responses compared to informal caregivers. ChatGPT's responses to non-clinical information needs related to dementia patients were generally satisfactory at this stage. As this tool is still under heavy development, it holds promise for providing even higher-quality information in response to information needs, particularly when developed in collaboration with healthcare professionals. Thus, large language models such as ChatGPT can serve as valuable sources of information for informal caregivers, although they may not fully meet the needs of formal caregivers who seek specialized (clinical) answers. Nevertheless, even in its current state, ChatGPT was able to provide responses to some of the clinical questions related to dementia that were asked.
Affiliation(s)
- Hamid Reza Saeidnia
- Department of Knowledge and Information Science, Tarbiat Modares University, Tehran, Iran
- Marcin Kozak
- Department of Media, Journalism and Social Communication, University of Information Technology and Management in Rzeszow, Rzeszow, Poland
- Brady D Lund
- Department of Information Science, University of North Texas, Denton, USA
- Mohammad Hassanzadeh
- Department of Knowledge and Information Science, Tarbiat Modares University, Tehran, Iran
12
Ge J, Chen IY, Pletcher MJ, Lai JC. Prompt Engineering for Generative Artificial Intelligence in Gastroenterology and Hepatology. Am J Gastroenterol 2024;119. [PMID: 38294157] [PMCID: PMC11413230] [DOI: 10.14309/ajg.0000000000002689]
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California, San Francisco, San Francisco, California, USA
- Irene Y Chen
- UCSF and UC Berkeley Joint Program in Computational Precision Health, Berkeley, California, USA
- Mark J Pletcher
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California, USA
- Jennifer C Lai
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California, San Francisco, San Francisco, California, USA
13
Ge J, Sun S, Owens J, Galvez V, Gologorskaya O, Lai JC, Pletcher MJ, Lai K. Development of a liver disease-specific large language model chat interface using retrieval-augmented generation. Hepatology 2024. [PMID: 38451962] [DOI: 10.1097/hep.0000000000000834]
Abstract
BACKGROUND AND AIMS Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows the embedding of customized data into LLMs. This approach "specializes" the LLMs and is thought to reduce hallucinations. APPROACH AND RESULTS We developed "LiVersa," a liver disease-specific LLM, by using our institution's protected health information-compliant text embedding and LLM platform, "Versa." We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases guidance documents to be incorporated into LiVersa. We evaluated LiVersa's performance by conducting 2 rounds of testing. First, we compared LiVersa's outputs versus those of trainees from a previously published knowledge assessment: LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI's ChatGPT 4, and Meta's Large Language Model Meta AI 2; LiVersa's outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4. CONCLUSIONS In this demonstration, we built disease-specific and protected health information-compliant LLMs using RAG. While LiVersa demonstrated higher accuracy in answering questions related to hepatology, there were some deficiencies due to limitations set by the number of documents used for RAG. LiVersa will likely require further refinement before potential live deployment. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical use cases.
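The retrieval step at the heart of such a RAG system can be sketched independently of any platform. The snippet below assumes precomputed embeddings for guideline chunks; the institutional "Versa" platform and its APIs are not reproduced here, and all data are placeholders.

```python
# Hedged sketch of RAG retrieval: rank guideline chunks by cosine similarity
# to the question embedding, then pass the top hits to the LLM as context.
# Embeddings and chunks below are placeholders, not AASLD guidance text.
import numpy as np

def top_chunks(question_vec, chunk_vecs, chunks, k=3):
    # Cosine similarity between the question and every chunk embedding.
    sims = chunk_vecs @ question_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(question_vec)
    )
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

chunks = ["guideline chunk A ...", "guideline chunk B ...", "guideline chunk C ..."]
chunk_vecs = np.random.rand(3, 8)   # stand-in for real text embeddings
question_vec = np.random.rand(8)
context = top_chunks(question_vec, chunk_vecs, chunks, k=2)
# The retrieved context is prepended to the prompt so the model answers from
# guideline text rather than from its parametric memory alone.
```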
Affiliation(s)
- Jin Ge
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California-San Francisco, San Francisco, California, USA
- Steve Sun
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
- Joseph Owens
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
- Victor Galvez
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
- Oksana Gologorskaya
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
- Bakar Computational Health Sciences Institute, University of California-San Francisco, San Francisco, California, USA
- Jennifer C Lai
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California-San Francisco, San Francisco, California, USA
- Mark J Pletcher
- Department of Epidemiology and Biostatistics, University of California-San Francisco, San Francisco, California, USA
- Ki Lai
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
14
Ge J, Buenaventura A, Berrean B, Purvis J, Fontil V, Lai JC, Pletcher MJ. Applying human-centered design to the construction of a cirrhosis management clinical decision support system. Hepatol Commun 2024;8:e0394. [PMID: 38407255] [PMCID: PMC10898661] [DOI: 10.1097/hc9.0000000000000394]
Abstract
BACKGROUND Electronic health record (EHR)-based clinical decision support is a scalable way to help standardize clinical care. Clinical decision support systems have not been extensively investigated in cirrhosis management. Human-centered design (HCD) is an approach that engages with potential users in intervention development. In this study, we applied HCD to design the features and interface for a clinical decision support system for cirrhosis management, called CirrhosisRx. METHODS We conducted technical feasibility assessments to construct a visual blueprint that outlines the basic features of the interface. We then convened collaborative-design workshops with generalist and specialist clinicians. We elicited current workflows for cirrhosis management, assessed gaps in existing EHR systems, evaluated potential features, and refined the design prototype for CirrhosisRx. At the conclusion of each workshop, we analyzed recordings and transcripts. RESULTS Workshop feedback showed that the aggregation of relevant clinical data into 6 cirrhosis decompensation domains (defined as common inpatient clinical scenarios) was the most important feature. Automatic inference of clinical events from EHR data, such as gastrointestinal bleeding from hemoglobin changes, was not accepted due to accuracy concerns. Visualizations for risk stratification scores were deemed not necessary. Lastly, the HCD co-design workshops allowed us to identify the target user population (generalists). CONCLUSIONS This is one of the first applications of HCD to design the features and interface for an electronic intervention for cirrhosis management. The HCD process altered features, modified the design interface, and likely improved CirrhosisRx's overall usability. The finalized design for CirrhosisRx proceeded to development and production and will be tested for effectiveness in a pragmatic randomized controlled trial. This work provides a model for the creation of other EHR-based interventions in hepatology care.
Affiliation(s)
- Jin Ge
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California-San Francisco, San Francisco, California, USA
- Ana Buenaventura
- School of Medicine Technology Services, University of California-San Francisco, San Francisco, California, USA
- Beth Berrean
- School of Medicine Technology Services, University of California-San Francisco, San Francisco, California, USA
- Jory Purvis
- School of Medicine Technology Services, University of California-San Francisco, San Francisco, California, USA
- Valy Fontil
- Family Health Centers, NYU-Langone Medical Center, Brooklyn, New York, USA
- Jennifer C. Lai
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California-San Francisco, San Francisco, California, USA
- Mark J. Pletcher
- Department of Epidemiology and Biostatistics, University of California-San Francisco, San Francisco, California, USA
15
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024;44:329-343. [PMID: 37562022] [DOI: 10.1093/asj/sjad260]
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
16
Eppler M, Ganjavi C, Ramacciotti LS, Piazza P, Rodler S, Checcucci E, Gomez Rivas J, Kowalewski KF, Belenchón IR, Puliatti S, Taratkin M, Veccia A, Baekelandt L, Teoh JYC, Somani BK, Wroclawski M, Abreu A, Porpiglia F, Gill IS, Murphy DG, Canes D, Cacciamani GE. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-153. [PMID: 37926642] [DOI: 10.1016/j.eururo.2023.10.014]
Abstract
BACKGROUND Since its release in November 2022, ChatGPT has captivated society and shown potential for various aspects of health care. OBJECTIVE To investigate potential use of ChatGPT, a large language model (LLM), in urology by gathering opinions from urologists worldwide. DESIGN, SETTING, AND PARTICIPANTS An open web-based survey was distributed via social media and e-mail chains to urologists between April 20, 2023 and May 5, 2023. Participants were asked to answer questions related to their knowledge and experience with artificial intelligence, as well as their opinions of potential use of ChatGPT/LLMs in research and clinical practice. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS Data are reported as the mean and standard deviation for continuous variables, and the frequency and percentage for categorical variables. Charts and tables are used as appropriate, with descriptions of the chart types and the measures used. The data are reported in accordance with the Checklist for Reporting Results of Internet E-Surveys (CHERRIES). RESULTS AND LIMITATIONS A total of 456 individuals completed the survey (64% completion rate). Nearly half (47.7%) reported that they use ChatGPT/LLMs in their academic practice, with fewer using the technology in clinical practice (19.8%). More than half (62.2%) believe there are potential ethical concerns when using ChatGPT for scientific or academic writing, and 53% reported that they have experienced limitations when using ChatGPT in academic practice. CONCLUSIONS Urologists recognise the potential of ChatGPT/LLMs in research but have concerns regarding ethics and patient acceptance. There is a desire for regulations and guidelines to ensure appropriate use. In addition, measures should be taken to establish rules and guidelines to maximise safety and efficiency when using this novel technology. PATIENT SUMMARY A survey asked 456 urologists from around the world about using an artificial intelligence tool called ChatGPT in their work. Almost half of them use ChatGPT for research, but not many use it for patient care. The responders think ChatGPT could be helpful, but they worry about problems like ethics and want rules to make sure it is used safely.
Affiliation(s)
- Michael Eppler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Conner Ganjavi
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Lorenzo Storino Ramacciotti
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Pietro Piazza
- Division of Urology, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Severin Rodler
- Department of Urology, Klinikum der Universität München, Munich, Germany
- Enrico Checcucci
- Department of Surgery, FPO-IRCCS Candiolo Cancer Institute, Candiolo, Italy
- Juan Gomez Rivas
- Department of Urology, Clinico San Carlos University Hospital, Madrid, Spain
- Karl F Kowalewski
- Department of Urology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany
- Ines Rivero Belenchón
- Urology and Nephrology Department, Virgen del Rocío University Hospital, Seville, Spain
- Stefano Puliatti
- Urology Department, University of Modena and Reggio Emilia, Modena, Italy
- Mark Taratkin
- Institute for Urology and Reproductive Health, Sechenov University, Moscow, Russia
- Alessandro Veccia
- Department of Urology, Azienda Ospedaliera Universitaria Integrata Verona, Verona, Italy
- Loïc Baekelandt
- Department of Urology, University Hospitals Leuven, Leuven, Belgium
- Jeremy Y-C Teoh
- Department of Surgery, S.H. Ho Urology Centre, The Chinese University of Hong Kong, Hong Kong, China
- Bhaskar K Somani
- University Hospital Southampton NHS Foundation Trust, Southampton, UK
- Marcelo Wroclawski
- Hospital Israelita Albert Einstein, São Paulo, Brazil; Beneficência Portuguesa de São Paulo, São Paulo, Brazil; Faculdade de Medicina do ABC, Santo Andre, Brazil
- Andre Abreu
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Inderbir S Gill
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Declan G Murphy
- Division of Cancer Surgery, Peter MacCallum Cancer Centre, University of Melbourne, Melbourne, Australia
- David Canes
- Division of Urology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
17
Jain N, Gottlich C, Fisher J, Campano D, Winston T. Assessing ChatGPT's orthopedic in-service training exam performance and applicability in the field. J Orthop Surg Res 2024;19:27. [PMID: 38167093] [PMCID: PMC10762835] [DOI: 10.1186/s13018-023-04467-0]
Abstract
BACKGROUND ChatGPT has gained widespread attention for its ability to understand and provide human-like responses to inputs. However, few works have focused on its use in Orthopedics. This study assessed ChatGPT's performance on the Orthopedic In-Service Training Exam (OITE) and evaluated its decision-making process to determine whether adoption as a resource in the field is practical. METHODS ChatGPT's performance on three OITE exams was evaluated by inputting multiple choice questions. Questions were classified by their orthopedic subject area. Yearly OITE technical reports were used to gauge scores against resident physicians. ChatGPT's rationales were compared with testmaker explanations using six different groups denoting answer accuracy and logic consistency. Variables were analyzed using contingency table construction and Chi-squared analyses. RESULTS Of 635 questions, 360 were usable as inputs (56.7%). ChatGPT-3.5 scored 55.8%, 47.7%, and 54% for the years 2020, 2021, and 2022, respectively. Of 190 correct outputs, 179 provided a consistent logic (94.2%). Of 170 incorrect outputs, 133 provided an inconsistent logic (78.2%). Significant associations were found between test topic and correct answer (p = 0.011), and between type of logic used and tested topic (p < 0.001). Basic Science and Sports had adjusted residuals greater than 1.96. Basic Science and correct, no logic; Basic Science and incorrect, inconsistent logic; Sports and correct, no logic; and Sports and incorrect, inconsistent logic had adjusted residuals greater than 1.96. CONCLUSIONS Based on annual OITE technical reports for resident physicians, ChatGPT-3.5 performed around the PGY-1 level. When answering correctly, it displayed congruent reasoning with testmakers. When answering incorrectly, it exhibited some understanding of the correct answer. It outperformed in Basic Science and Sports, likely due to its ability to output rote facts. These findings suggest that it lacks the fundamental capabilities to be a comprehensive tool in Orthopedic Surgery in its current form. LEVEL OF EVIDENCE II.
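For readers unfamiliar with the adjusted-residual criterion used above, the following is a minimal sketch of a contingency-table analysis with standardized residuals; the 2x2 counts are hypothetical, not the paper's data.

```python
# Hedged sketch: chi-squared test plus adjusted (standardized) residuals,
# the statistic the study flags when its absolute value exceeds 1.96.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: topic (e.g., Basic Science, Sports); columns: correct, incorrect.
table = np.array([[30, 10],
                  [25, 15]])
chi2, p, dof, expected = chi2_contingency(table)

n = table.sum()
row_frac = table.sum(axis=1, keepdims=True) / n
col_frac = table.sum(axis=0, keepdims=True) / n
adj_resid = (table - expected) / np.sqrt(expected * (1 - row_frac) * (1 - col_frac))
print(f"p = {p:.4f}")
print(adj_resid)  # cells with |residual| > 1.96 drive the association
```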
Affiliation(s)
- Neil Jain
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- Caleb Gottlich
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- John Fisher
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- Dominic Campano
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- Travis Winston
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
18
Suárez A, Díaz-Flores García V, Algar J, Gómez Sánchez M, Llorente de Pedro M, Freire Y. Unveiling the ChatGPT phenomenon: Evaluating the consistency and accuracy of endodontic question answers. Int Endod J 2024;57:108-113. [PMID: 37814369] [DOI: 10.1111/iej.13985]
Abstract
AIM Chatbot Generative Pre-trained Transformer (ChatGPT) is a generative artificial intelligence (AI) software based on large language models (LLMs), designed to simulate human conversations and generate novel content based on the training data it has been exposed to. The aim of this study was to evaluate the consistency and accuracy of ChatGPT-generated answers to clinical questions in endodontics, compared to answers provided by human experts. METHODOLOGY Ninety-one dichotomous (yes/no) questions were designed and categorized into three levels of difficulty. Twenty questions were randomly selected from each difficulty level. Sixty answers were generated by ChatGPT for each question. Two endodontic experts independently answered the 60 questions. Statistical analysis was performed using the SPSS program to calculate the consistency and accuracy of the answers generated by ChatGPT compared to the experts. Confidence intervals (95%) and standard deviations were used to estimate variability. RESULTS The answers generated by ChatGPT showed high consistency (85.44%). No significant differences in consistency were found based on question difficulty. In terms of answer accuracy, ChatGPT achieved an average accuracy of 57.33%. However, significant differences in accuracy were observed based on question difficulty, with lower accuracy for easier questions. CONCLUSIONS Currently, ChatGPT is not capable of replacing dentists in clinical decision-making. As ChatGPT's performance improves through deep learning, it is expected to become more useful and effective in the field of endodontics. However, careful attention and ongoing evaluation are needed to ensure its accuracy, reliability and safety in endodontics.
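The consistency and accuracy figures above are proportions with 95% confidence intervals. A minimal sketch follows, using the normal approximation; the counts are illustrative only (60 answers per question across 60 questions would give 3600 outputs, and 57.33% of 3600 is roughly 2064).

```python
# Hedged sketch: a 95% confidence interval for answer accuracy treated as a
# proportion (normal approximation); counts below are illustrative, not the
# authors' SPSS output.
import math

def proportion_ci(successes, n, z=1.96):
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

p, lo, hi = proportion_ci(2064, 3600)
print(f"accuracy = {p:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```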
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Víctor Díaz-Flores García
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Juan Algar
- Department of Clinical Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Margarita Gómez Sánchez
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- María Llorente de Pedro
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
19
Riedel M, Kaefinger K, Stuehrenberg A, Ritter V, Amann N, Graf A, Recker F, Klein E, Kiechle M, Riedel F, Meyer B. ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice. Front Med (Lausanne) 2023;10:1296615. [PMID: 38155661] [PMCID: PMC10753765] [DOI: 10.3389/fmed.2023.1296615]
Abstract
Background Chat Generative Pre-Trained Transformer (ChatGPT) is an artificial intelligence (AI)-based large language model tool developed by OpenAI in 2022. It utilizes deep learning algorithms to process natural language and generate responses, which renders it suitable for conversational interfaces. ChatGPT's potential to transform medical education and clinical practice is currently being explored, but its capabilities and limitations in this domain remain incompletely investigated. The present study aimed to assess ChatGPT's performance in medical knowledge competency for problem assessment in obstetrics and gynecology (OB/GYN). Methods Two datasets were established for analysis: questions (1) from OB/GYN course exams at a German university hospital and (2) from the German medical state licensing exams. To assess ChatGPT's performance, questions were entered into the chat interface, and responses were documented. A quantitative analysis compared ChatGPT's accuracy with that of medical students for different levels of difficulty and types of questions. Additionally, a qualitative analysis assessed the quality of ChatGPT's responses regarding ease of understanding, conciseness, accuracy, completeness, and relevance. Non-obvious insights generated by ChatGPT were evaluated, and a density index of insights was established to quantify the tool's ability to provide students with relevant and concise medical knowledge. Results ChatGPT demonstrated consistent and comparable performance across both datasets. It provided correct responses at a rate comparable with that of medical students, thereby indicating its ability to handle a diverse spectrum of questions ranging from general knowledge to complex clinical case presentations. The tool's accuracy was partly affected by question difficulty in the medical state exam dataset. Our qualitative assessment revealed that ChatGPT provided mostly accurate, complete, and relevant answers. ChatGPT additionally provided many non-obvious insights, especially in correctly answered questions, which indicates its potential for enhancing autonomous medical learning. Conclusion ChatGPT has promise as a supplementary tool in medical education and clinical practice. Its ability to provide accurate and insightful responses showcases its adaptability to complex clinical scenarios. As AI technologies continue to evolve, ChatGPT and similar tools may contribute to more efficient and personalized learning experiences and assistance for health care providers.
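The core quantitative comparison here, model accuracy versus student accuracy stratified by difficulty, could be tabulated as in the short sketch below. The records and column names are illustrative placeholders, not the study's dataset.

```python
import pandas as pd

# Illustrative records: one row per exam question (not the study's data).
df = pd.DataFrame({
    "difficulty": ["easy", "easy", "medium", "medium", "hard", "hard"],
    "chatgpt_correct": [1, 1, 1, 0, 0, 1],          # 1 = ChatGPT answered correctly
    "student_mean_correct": [0.92, 0.88, 0.71, 0.64, 0.45, 0.52],
})

# Compare ChatGPT's accuracy with the students' mean score per difficulty level.
summary = df.groupby("difficulty")[["chatgpt_correct", "student_mean_correct"]].mean()
print(summary)
```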
Collapse
Affiliation(s)
- Maximilian Riedel
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Katharina Kaefinger
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Antonia Stuehrenberg
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Viktoria Ritter
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Niklas Amann
- Department of Gynecology and Obstetrics, Friedrich–Alexander-University Erlangen–Nuremberg (FAU), Erlangen, Germany
| | - Anna Graf
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Florian Recker
- Department of Gynecology and Obstetrics, Bonn University Hospital, Bonn, Germany
| | - Evelyn Klein
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Marion Kiechle
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Fabian Riedel
- Department of Gynecology and Obstetrics, Heidelberg University Hospital, Heidelberg, Germany
| | - Bastian Meyer
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| |
Collapse
|
20
|
Enomoto M, Tseng CH, Hsu YC, Thuy LTT, Nguyen MH. Collaborating with AI in literature search-An important frontier. Hepatol Commun 2023; 7:e0336. [PMID: 38055656 PMCID: PMC10984654 DOI: 10.1097/hc9.0000000000000336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 12/08/2023] Open
Affiliation(s)
- Masaru Enomoto
- Department of Hepatology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
| | - Cheng-Hao Tseng
- Division of Gastroenterology and Hepatology, E-Da Hospital, Kaohsiung, Taiwan
| | - Yao-Chun Hsu
- Division of Gastroenterology and Hepatology, E-Da Hospital, Kaohsiung, Taiwan
| | - Le Thi Thanh Thuy
- Department of Hepatology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
| | - Mindie H. Nguyen
- Division of Gastroenterology and Hepatology, Stanford University Medical Center, Palo Alto, California, USA
- Department of Epidemiology and Population Health, Stanford University Medical Center, Palo Alto, California, USA
| |
Collapse
|
21
|
Dang F, Samarasena JB. Generative Artificial Intelligence for Gastroenterology: Neither Friend nor Foe. Am J Gastroenterol 2023; 118:2146-2147. [PMID: 38033225 DOI: 10.14309/ajg.0000000000002573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023]
Affiliation(s)
- Frances Dang
- Division of Gastroenterology/Hepatology, University of California Irvine School of Medicine, Orange, California, USA
| | | |
Collapse
|
22
|
Ge J, Sun S, Owens J, Galvez V, Gologorskaya O, Lai JC, Pletcher MJ, Lai K. Development of a Liver Disease-Specific Large Language Model Chat Interface using Retrieval Augmented Generation. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.11.10.23298364. [PMID: 37986764 PMCID: PMC10659484 DOI: 10.1101/2023.11.10.23298364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2023]
Abstract
Background Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating incorrect or hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows embedding of customized data into LLMs. This approach "specializes" the LLMs and is thought to reduce hallucinations. Methods We developed "LiVersa," a liver disease-specific LLM, by using our institution's protected health information (PHI)-compliant text embedding and LLM platform, "Versa." We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases (AASLD) guidelines and guidance documents to be incorporated into LiVersa. We evaluated LiVersa's performance by comparing its responses with those of trainees from a previously published knowledge assessment study regarding hepatitis B (HBV) treatment and hepatocellular carcinoma (HCC) surveillance. Results LiVersa answered all 10 questions correctly when forced to provide a "yes" or "no" answer. Full detailed responses with justifications and rationales, however, were not completely correct for three of the questions. Discussion In this study, we demonstrated the ability to build disease-specific and PHI-compliant LLMs using RAG. While our LLM, LiVersa, demonstrated more specificity in answering questions related to clinical hepatology, there were some knowledge deficiencies due to limitations set by the number and types of documents used for RAG. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical uses and a potential strategy to realize personalized medicine in the future.
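To make the RAG architecture concrete, here is a minimal sketch of the usual pipeline: chunk the guideline documents, embed the chunks, retrieve the most similar chunks for a question by cosine similarity, and prepend them to the prompt. This is not the LiVersa implementation; the `embed` function is a deliberately toy stand-in for whatever embedding endpoint an institutional platform exposes.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: hashed bag-of-words vector.
    (Consistent only within a single process run; illustrative only.)"""
    v = np.zeros(256)
    for tok in text.lower().split():
        v[hash(tok) % 256] += 1.0
    return v

def chunk(document: str, size: int = 1000) -> list[str]:
    """Split a guideline document into fixed-size character chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def top_k_chunks(question: str, chunks: list[str], k: int = 5) -> list[str]:
    """Retrieve the k chunks most similar to the question by cosine similarity."""
    q = embed(question)
    scored = []
    for c in chunks:
        v = embed(c)
        sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
        scored.append((sim, c))
    scored.sort(key=lambda p: p[0], reverse=True)
    return [c for _, c in scored[:k]]

def build_prompt(question: str, retrieved: list[str]) -> str:
    """Ground the LLM's answer in the retrieved guideline text."""
    context = "\n---\n".join(retrieved)
    return f"Answer using only the guideline excerpts below.\n{context}\n\nQuestion: {question}"
```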
Collapse
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Steve Sun
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| | - Joseph Owens
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| | - Victor Galvez
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| | - Oksana Gologorskaya
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
- Bakar Computational Health Sciences Institute, University of California – San Francisco, San Francisco, CA
| | - Jennifer C. Lai
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Mark J. Pletcher
- Department of Epidemiology and Biostatistics, University of California – San Francisco, San Francisco, CA
| | - Ki Lai
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| |
Collapse
|
23
|
Barash Y, Klang E, Konen E, Sorin V. ChatGPT-4 Assistance in Optimizing Emergency Department Radiology Referrals and Imaging Selection. J Am Coll Radiol 2023; 20:998-1003. [PMID: 37423350 DOI: 10.1016/j.jacr.2023.06.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Revised: 06/19/2023] [Accepted: 06/29/2023] [Indexed: 07/11/2023]
Abstract
PURPOSE The quality of radiology referrals influences patient management and imaging interpretation by radiologists. The aim of this study was to evaluate ChatGPT-4 as a decision support tool for selecting imaging examinations and generating radiology referrals in the emergency department (ED). METHODS Five consecutive clinical notes from the ED were retrospectively extracted for each of the following pathologies: pulmonary embolism, obstructing kidney stones, acute appendicitis, diverticulitis, small bowel obstruction, acute cholecystitis, acute hip fracture, and testicular torsion. A total of 40 cases were included. These notes were entered into ChatGPT-4, requesting recommendations on the most appropriate imaging examinations and protocols. The chatbot was also asked to generate radiology referrals. Two independent radiologists graded the referral on a scale ranging from 1 to 5 for clarity, clinical relevance, and differential diagnosis. The chatbot's imaging recommendations were compared with the ACR Appropriateness Criteria (AC) and with the examinations performed in the ED. Agreement between readers was assessed using linear weighted Cohen's κ coefficient. RESULTS ChatGPT-4's imaging recommendations aligned with the ACR AC and ED examinations in all cases. Protocol discrepancies between ChatGPT and the ACR AC were observed in two cases (5%). ChatGPT-4-generated referrals received mean scores of 4.6 and 4.8 for clarity, 4.5 and 4.4 for clinical relevance, and 4.9 from both reviewers for differential diagnosis. Agreement between readers was moderate for clinical relevance and clarity and substantial for differential diagnosis grading. CONCLUSIONS ChatGPT-4 has shown potential in aiding imaging study selection for select clinical cases. As a complementary tool, large language models may improve radiology referral quality. Radiologists should stay informed about this technology and be mindful of potential challenges and risks.
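The linear weighted Cohen's κ used for inter-reader agreement weights disagreements in proportion to their distance on the 1-5 scale, so a 4-versus-5 split is penalized less than a 2-versus-5 split. A minimal sketch using scikit-learn is below; the reader grades are illustrative, not the study's data.

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative 1-5 grades from two readers for the same referrals (not the study's data).
reader_1 = [5, 4, 5, 3, 4, 5, 4, 2, 5, 4]
reader_2 = [5, 5, 4, 3, 4, 4, 4, 3, 5, 5]

# weights="linear" penalizes disagreements proportionally to their distance on the scale.
kappa = cohen_kappa_score(reader_1, reader_2, weights="linear")
print(f"linear weighted Cohen's kappa = {kappa:.2f}")
```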
Collapse
Affiliation(s)
- Yiftach Barash
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Tel Hashomer, Israel; Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Israel.
| | - Eyal Klang
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Tel Hashomer, Israel; Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; Head, Sami Sagol AI Hub, ARC, Chaim Sheba Medical Center, Tel Hashomer, Israel; DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Israel
| | - Eli Konen
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; Head, Department of Diagnostic Imaging, Chaim Sheba Medical Center, Tel Hashomer, Israel
| | - Vera Sorin
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Tel Hashomer, Israel; Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Israel
| |
Collapse
|
24
|
Hart SN, Hoffman NG, Gershkovich P, Christenson C, McClintock DS, Miller LJ, Jackups R, Azimi V, Spies N, Brodsky V. Organizational preparedness for the use of large language models in pathology informatics. J Pathol Inform 2023; 14:100338. [PMID: 37860713 PMCID: PMC10582733 DOI: 10.1016/j.jpi.2023.100338] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Revised: 09/25/2023] [Accepted: 09/28/2023] [Indexed: 10/21/2023] Open
Abstract
In this paper, we consider the current and potential role of the latest generation of Large Language Models (LLMs) in medical informatics, particularly within the realms of clinical and anatomic pathology. We aim to provide a thorough understanding of the considerations that arise when employing LLMs in healthcare settings, such as determining appropriate use cases and evaluating the advantages and limitations of these models. Furthermore, this paper will consider the infrastructural and organizational requirements necessary for the successful implementation and utilization of LLMs in healthcare environments. We will discuss the importance of addressing education, security, bias, and privacy concerns associated with LLMs in clinical informatics, as well as the need for a robust framework to overcome regulatory, compliance, and legal challenges.
Collapse
Affiliation(s)
- Steven N. Hart
- Division of Computational Pathology and AI, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
| | - Noah G. Hoffman
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, United States
| | - Peter Gershkovich
- Yale Medical School Department of Pathology, New Haven, CT, United States
| | - Chancey Christenson
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Jacksonville, FL, United States
| | - David S. McClintock
- Division of Computational Pathology and AI, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
| | - Lauren J. Miller
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States
| | - Ronald Jackups
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
| | - Vahid Azimi
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
| | - Nicholas Spies
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
| | - Victor Brodsky
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
| |
Collapse
|
25
|
Ge J, Li M, Delk MB, Lai JC. A comparison of large language model versus manual chart review for extraction of data elements from the electronic health record. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.08.31.23294924. [PMID: 37693398 PMCID: PMC10491368 DOI: 10.1101/2023.08.31.23294924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/12/2023]
Abstract
Importance Large language models (LLMs) have proven useful for extracting data from publicly available sources, but their uses in clinical settings and with clinical data remain unknown. Objective To determine the accuracy of data extraction using "Versa Chat," a chat implementation of the general-purpose OpenAI gpt-35-turbo LLM, versus manual chart review for hepatocellular carcinoma (HCC) imaging reports. Design We engineered a prompt for the data extraction task of six distinct data elements and input 182 abdominal imaging reports that were also manually tagged. We evaluated performance by calculating accuracy, precision, recall, and F1 scores. Setting/Participants Cross-sectional abdominal imaging reports of patients diagnosed with hepatocellular carcinoma enrolled in the Functional Assessment in Liver Transplantation (FrAILT) study.
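Scoring the LLM's extractions against the manual tags reduces, per data element, to standard classification metrics. Below is a hedged sketch for one binary element; the label vectors are illustrative placeholders, not the study's data.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative binary labels across reports for one data element (not the study's data):
# 1 = element present per manual chart review (truth) / per the LLM's extraction (prediction).
manual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
llm    = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

precision, recall, f1, _ = precision_recall_fscore_support(manual, llm, average="binary")
print(f"accuracy={accuracy_score(manual, llm):.2f} "
      f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```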
Collapse
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Michael Li
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Molly B. Delk
- Section of Gastroenterology and Hepatology, Department of Medicine, Tulane University School of Medicine, New Orleans, LA
| | - Jennifer C. Lai
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| |
Collapse
|
26
|
Lim ZW, Pushpanathan K, Yew SME, Lai Y, Sun CH, Lam JSH, Chen DZ, Goh JHL, Tan MCJ, Sheng B, Cheng CY, Koh VTC, Tham YC. Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine 2023; 95:104770. [PMID: 37625267 PMCID: PMC10470220 DOI: 10.1016/j.ebiom.2023.104770] [Citation(s) in RCA: 78] [Impact Index Per Article: 78.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 07/21/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023] Open
Abstract
BACKGROUND Large language models (LLMs) are garnering wide interest due to their human-like and contextually relevant responses. However, LLMs' accuracy across specific medical domains has not yet been thoroughly evaluated. Myopia is a frequent topic on which patients and parents commonly seek information online. Our study evaluated the performance of three LLMs, namely ChatGPT-3.5, ChatGPT-4.0, and Google Bard, in delivering accurate responses to common myopia-related queries. METHODS We curated thirty-one commonly asked myopia care-related questions, which were categorised into six domains: pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis. Each question was posed to the LLMs, and their responses were independently graded by three consultant-level paediatric ophthalmologists on a three-point accuracy scale (poor, borderline, good). A majority consensus approach was used to determine the final rating for each response. 'Good' rated responses were further evaluated for comprehensiveness on a five-point scale. Conversely, 'poor' rated responses were further prompted for self-correction and then re-evaluated for accuracy. FINDINGS ChatGPT-4.0 demonstrated superior accuracy, with 80.6% of responses rated as 'good', compared to 61.3% in ChatGPT-3.5 and 54.8% in Google Bard (Pearson's chi-squared test, all p ≤ 0.009). All three LLM-Chatbots showed high mean comprehensiveness scores (Google Bard: 4.35; ChatGPT-4.0: 4.23; ChatGPT-3.5: 4.11, out of a maximum score of 5). All LLM-Chatbots also demonstrated substantial self-correction capabilities: 66.7% (2 in 3) of ChatGPT-4.0's, 40% (2 in 5) of ChatGPT-3.5's, and 60% (3 in 5) of Google Bard's responses improved after self-correction. The LLM-Chatbots performed consistently across domains, except for 'treatment and prevention'. However, ChatGPT-4.0 still performed superiorly in this domain, receiving 70% 'good' ratings, compared to 40% in ChatGPT-3.5 and 45% in Google Bard (Pearson's chi-squared test, all p ≤ 0.001). INTERPRETATION Our findings underscore the potential of LLMs, particularly ChatGPT-4.0, for delivering accurate and comprehensive responses to myopia-related queries. Continuous strategies and evaluations to improve LLMs' accuracy remain crucial. FUNDING Dr Yih-Chung Tham was supported by the National Medical Research Council of Singapore (NMRC/MOH/HCSAINV21nov-0001).
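Two methodological steps here, majority-consensus grading across three raters and Pearson's chi-squared comparison of 'good' rates between models, are easy to sketch. The counts below are illustrative, not the study's data, and the exact pairwise testing scheme is an assumption.

```python
from collections import Counter
from scipy.stats import chi2_contingency

def consensus(grades: tuple[str, str, str]) -> str:
    """Final rating = the grade assigned by at least two of the three graders."""
    label, count = Counter(grades).most_common(1)[0]
    return label if count >= 2 else "no consensus"

print(consensus(("good", "good", "borderline")))  # -> good

# Illustrative 'good' vs. not-'good' counts for two models (not the study's data).
table = [[25, 6],   # model A
         [19, 12]]  # model B
# correction=False gives the plain Pearson chi-squared statistic for a 2x2 table.
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```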
Collapse
Affiliation(s)
- Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
| | - Samantha Min Er Yew
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
| | - Yien Lai
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Chen-Hsin Sun
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Janice Sing Harn Lam
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - David Ziyou Chen
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | | | - Marcus Chun Jin Tan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Bin Sheng
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China; MoE Key Lab of Artificial Intelligence, Artificial Intelligence Institute, Shanghai Jiao Tong University, Shanghai, China
| | - Ching-Yu Cheng
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore
| | - Victor Teck Chang Koh
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Yih-Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore.
| |
Collapse
|
27
|
Li W, Zhang Y, Chen F. ChatGPT in Colorectal Surgery: A Promising Tool or a Passing Fad? Ann Biomed Eng 2023; 51:1892-1897. [PMID: 37162695 DOI: 10.1007/s10439-023-03232-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Accepted: 05/03/2023] [Indexed: 05/11/2023]
Abstract
Colorectal surgery is a specialized branch of surgery that involves the diagnosis and treatment of conditions affecting the colon, rectum, and anus. In recent years, the use of artificial intelligence (AI) has gained considerable interest in various medical specialties, including surgery. Chat Generative Pre-Trained Transformer (ChatGPT), an AI-based chatbot developed by OpenAI, has shown great potential in improving the quality of healthcare delivery by providing accurate and timely information to both patients and healthcare professionals. In this paper, we investigate the potential application of ChatGPT in colorectal surgery. We also discuss the potential advantages and challenges associated with the implementation of ChatGPT in the surgical setting. Furthermore, we address the socio-ethical implications of utilizing ChatGPT in healthcare. This includes concerns over patient privacy, liability, and the potential impact on the doctor-patient relationship. Our findings suggest that ChatGPT has the potential to revolutionize the field of colorectal surgery by providing personalized and precise medical information, reducing errors and complications, and improving patient outcomes.
Collapse
Affiliation(s)
- Wenbo Li
- Department of Nursing, Jinzhou Medical University, Jinzhou, China
| | - Yinxu Zhang
- Department of Colorectal Surgery, The First Affiliated Hospital, Jinzhou Medical University, Jinzhou, 121001, China
| | - Fengmin Chen
- Department of Colorectal Surgery, The First Affiliated Hospital, Jinzhou Medical University, Jinzhou, 121001, China.
| |
Collapse
|
28
|
Ge J, Fontil V, Ackerman S, Pletcher MJ, Lai JC. Clinical decision support and electronic interventions to improve care quality in chronic liver diseases and cirrhosis. Hepatology 2023:01515467-990000000-00546. [PMID: 37611253 PMCID: PMC10998693 DOI: 10.1097/hep.0000000000000583] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 07/17/2023] [Indexed: 08/25/2023]
Abstract
Significant quality gaps exist in the management of chronic liver diseases and cirrhosis. Clinical decision support systems, information-driven tools based in and launched from the electronic health record, are attractive and potentially scalable prospective interventions that could help standardize clinical care in hepatology. Yet, clinical decision support systems have had a mixed record in clinical medicine due to issues with interoperability and compatibility with clinical workflows. In this review, we discuss the conceptual origins of clinical decision support systems, existing applications in liver diseases, issues and challenges with implementation, and emerging strategies to improve their integration in hepatology care.
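As a concrete illustration of the kind of EHR-launched rule such systems encode, here is a hypothetical sketch of a surveillance reminder: flag patients with cirrhosis who lack HCC surveillance imaging within the guideline-recommended six-month interval. The data model and alert text are assumptions for illustration, not any described system.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Patient:
    has_cirrhosis: bool
    last_hcc_surveillance_imaging: date | None  # None = never imaged

def hcc_surveillance_alert(p: Patient, today: date) -> str | None:
    """Fire a reminder if a patient with cirrhosis has no surveillance imaging
    within roughly six months (hypothetical rule for illustration)."""
    if not p.has_cirrhosis:
        return None
    last = p.last_hcc_surveillance_imaging
    if last is None or today - last > timedelta(days=182):
        return "Reminder: order HCC surveillance imaging."
    return None

print(hcc_surveillance_alert(Patient(True, date(2023, 1, 5)), date(2023, 9, 1)))
```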
Collapse
Affiliation(s)
- Jin Ge
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California – San Francisco, San Francisco, California, USA
| | - Valy Fontil
- Department of Medicine, NYU Grossman School of Medicine and Family Health Centers at NYU-Langone Medical Center, Brooklyn, New York, USA
| | - Sara Ackerman
- Department of Social and Behavioral Sciences, University of California – San Francisco, San Francisco, California, USA
| | - Mark J. Pletcher
- Department of Epidemiology and Biostatistics, University of California – San Francisco, San Francisco, California, USA
| | - Jennifer C. Lai
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California – San Francisco, San Francisco, California, USA
| |
Collapse
|
29
|
Samaan JS, Yeo YH, Ng WH, Ting PS, Trivedi H, Vipani A, Yang JD, Liran O, Spiegel B, Kuo A, Ayoub WS. ChatGPT's ability to comprehend and answer cirrhosis related questions in Arabic. Arab J Gastroenterol 2023; 24:145-148. [PMID: 37673708 DOI: 10.1016/j.ajg.2023.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 07/24/2023] [Accepted: 08/18/2023] [Indexed: 09/08/2023]
Abstract
BACKGROUND AND STUDY AIMS Cirrhosis is a chronic progressive disease which requires complex care. Its incidence is rising in the Arab countries, making it the 7th leading cause of death in the Arab League in 2010. ChatGPT is a large language model with a growing body of literature demonstrating its ability to answer clinical questions. We examined ChatGPT's accuracy in responding to cirrhosis related questions in Arabic and compared its performance to English. MATERIALS AND METHODS ChatGPT's responses to 91 questions in Arabic and English were graded by a transplant hepatologist fluent in both languages. Accuracy of responses was assessed using the scale: 1. Comprehensive, 2. Correct but inadequate, 3. Mixed with correct and incorrect/outdated data, and 4. Completely incorrect. Accuracy of Arabic compared to English responses was assessed using the scale: 1. Arabic response is more accurate, 2. Similar accuracy, 3. Arabic response is less accurate. RESULTS The model provided 22 (24.2%) comprehensive, 44 (48.4%) correct but inadequate, 13 (14.3%) mixed with correct and incorrect/outdated data and 12 (13.2%) completely incorrect Arabic responses. When comparing the accuracy of Arabic and English responses, 9 (9.9%) of the Arabic responses were graded as more accurate, 52 (57.1%) similar in accuracy and 30 (33.0%) as less accurate compared to English. CONCLUSION ChatGPT has the potential to serve as an adjunct source of information for Arabic speaking patients with cirrhosis. The model provided correct responses in Arabic to 72.5% of questions, although its performance in Arabic was less accurate than in English. The model produced completely incorrect responses to 13.2% of questions, reinforcing its potential role as an adjunct to, and not a replacement for, care by licensed healthcare professionals. Future studies to refine this technology are needed to help Arabic speaking patients with cirrhosis across the globe understand their disease and improve their outcomes.
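The reported percentages follow directly from the per-grade tallies (22, 44, 13, and 12 of 91 questions, with "correct" = comprehensive + correct but inadequate = 66/91 = 72.5%). A small sketch reproducing that arithmetic is below; the per-question grade list is reconstructed from the abstract's counts, not the underlying dataset.

```python
from collections import Counter

SCALE = ["comprehensive", "correct but inadequate", "mixed", "completely incorrect"]

# Per-question Arabic grades reconstructed from the abstract's reported counts.
arabic_grades = (["comprehensive"] * 22 + ["correct but inadequate"] * 44
                 + ["mixed"] * 13 + ["completely incorrect"] * 12)

counts = Counter(arabic_grades)
n = len(arabic_grades)
for grade in SCALE:
    print(f"{grade}: {counts[grade]} ({counts[grade] / n:.1%})")

# 'Correct' responses = comprehensive + correct but inadequate.
correct = counts["comprehensive"] + counts["correct but inadequate"]
print(f"correct overall: {correct / n:.1%}")  # -> 72.5%
```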
Collapse
Affiliation(s)
- Jamil S Samaan
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Yee Hui Yeo
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Wee Han Ng
- Bristol Medical School, University of Bristol, Bristol, UK
| | | | - Hirsh Trivedi
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Comprehensive Transplant Center, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Aarshi Vipani
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Ju Dong Yang
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Comprehensive Transplant Center, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Omer Liran
- Department of Psychiatry and Behavioral Sciences, Cedars-Sinai, Los Angeles, CA, USA; Division of Health Services Research, Department of Medicine, Cedars-Sinai, Los Angeles, CA, USA
| | - Brennan Spiegel
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Division of Health Services Research, Department of Medicine, Cedars-Sinai, Los Angeles, CA, USA
| | - Alexander Kuo
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Comprehensive Transplant Center, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Walid S Ayoub
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Comprehensive Transplant Center, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
| |
Collapse
|
30
|
Lahat A, Shachar E, Avidan B, Glicksberg B, Klang E. Evaluating the Utility of a Large Language Model in Answering Common Patients' Gastrointestinal Health-Related Questions: Are We There Yet? Diagnostics (Basel) 2023; 13:diagnostics13111950. [PMID: 37296802 DOI: 10.3390/diagnostics13111950] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Revised: 05/28/2023] [Accepted: 06/01/2023] [Indexed: 06/12/2023] Open
Abstract
BACKGROUND AND AIMS Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI's ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim is to evaluate the performance of ChatGPT in answering patients' questions regarding gastrointestinal health. METHODS To evaluate the performance of ChatGPT in answering patients' questions, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists. The accuracy, clarity, and efficacy of the answers provided by ChatGPT were assessed. RESULTS ChatGPT was able to provide accurate and clear answers to patients' questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For symptoms questions, the average accuracy, clarity, and efficacy scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For diagnostic test questions, the average accuracy, clarity, and efficacy scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively. CONCLUSIONS While ChatGPT has potential as a source of information, further development is needed. The quality of information is contingent upon the quality of the online information provided. These findings may be useful for healthcare providers and patients alike in understanding the capabilities and limitations of ChatGPT.
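The mean ± SD figures per question category reduce to simple summary statistics over the 1-5 ratings, as in this brief sketch; the rating values shown are illustrative placeholders, not the study's data.

```python
from statistics import mean, stdev

# Illustrative 1-5 consensus ratings for one question category (not the study's data).
ratings = {
    "accuracy": [4, 5, 4, 3, 4, 4, 3, 5],
    "clarity":  [4, 4, 5, 3, 4, 4, 4, 5],
    "efficacy": [3, 4, 3, 3, 4, 3, 3, 4],
}
for dimension, scores in ratings.items():
    print(f"{dimension}: {mean(scores):.1f} ± {stdev(scores):.1f}")
```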
Collapse
Affiliation(s)
- Adi Lahat
- Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel
| | - Eyal Shachar
- Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel
| | - Benjamin Avidan
- Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel
| | - Benjamin Glicksberg
- Mount Sinai Clinical Intelligence Center, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Eyal Klang
- The Sami Sagol AI Hub, ARC Innovation Center, Chaim Sheba Medical Center, Affiliated to Tel-Aviv University, Tel Aviv 69978, Israel
| |
Collapse
|
31
|
Javan R, Kim T, Mostaghni N, Sarin S. ChatGPT's Potential Role in Interventional Radiology. Cardiovasc Intervent Radiol 2023:10.1007/s00270-023-03448-4. [PMID: 37127733 DOI: 10.1007/s00270-023-03448-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/02/2023] [Accepted: 04/12/2023] [Indexed: 05/03/2023]
Affiliation(s)
- Ramin Javan
- Department of Radiology, George Washington University Hospital, 900 23Rd St NW, Suite G2092, Washington, DC, 20037, USA.
| | - Theodore Kim
- George Washington University School of Medicine and Health Sciences, Washington, DC, 20037, USA
| | - Navid Mostaghni
- School of Medicine, California University of Science and Medicine, Colton, CA, 92324, USA
| | - Shawn Sarin
- Department of Interventional Radiology, George Washington University Hospital, Washington, DC, 20037, USA
| |
Collapse
|
32
|
Oanh NK, Na BK, Yoo WG. The Potential Breakthroughs with ChatGPT in Parasitology. IRANIAN JOURNAL OF PARASITOLOGY 2023; 18:275-278. [PMID: 37583636 PMCID: PMC10423911 DOI: 10.18502/ijpa.v18i2.13197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/14/2023] [Accepted: 05/26/2023] [Indexed: 08/17/2023]
Affiliation(s)
- Nguyen Kim Oanh
- Department of Parasitology and Tropical Medicine, Gyeongsang National University College of Medicine, Jinju 52727, Republic of Korea
- Department of Convergence Medical Science, Gyeongsang National University, Jinju 52727, Republic of Korea
| | - Byoung-Kuk Na
- Department of Parasitology and Tropical Medicine, Gyeongsang National University College of Medicine, Jinju 52727, Republic of Korea
- Department of Convergence Medical Science, Gyeongsang National University, Jinju 52727, Republic of Korea
| | - Won Gi Yoo
- Department of Parasitology and Tropical Medicine, Gyeongsang National University College of Medicine, Jinju 52727, Republic of Korea
- Department of Convergence Medical Science, Gyeongsang National University, Jinju 52727, Republic of Korea
| |
Collapse
|