1
Ge J, Kim WR, Kwong AJ. Common definitions and variables are needed for the United States to join the conversation on acute-on-chronic liver failure. Am J Transplant 2024;24:1755-1760. [PMID: 38977243] [PMCID: PMC11439574] [DOI: 10.1016/j.ajt.2024.06.021]
Abstract
Acute-on-chronic liver failure (ACLF) is a variably defined syndrome characterized by acute decompensation of cirrhosis with organ failures. At least 13 different definitions and diagnostic criteria for ACLF have been proposed, and there is increasing recognition that patients with ACLF may face disadvantages in the current United States liver allocation system. There is a need, therefore, for more standardized data collection and consensus to improve study design and outcome assessment in ACLF. In this article, we discuss the current landscape of transplantation for patients with ACLF, strategies to optimize organ utility, and data opportunities based on emerging technologies to facilitate improved data collection.
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California - San Francisco, San Francisco, California, USA
- W Ray Kim
- Division of Gastroenterology and Hepatology, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Allison J Kwong
- Division of Gastroenterology and Hepatology, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
2
Al-Abdullatif AM, Alsubaie MA. ChatGPT in Learning: Assessing Students' Use Intentions through the Lens of Perceived Value and the Influence of AI Literacy. Behav Sci (Basel) 2024;14:845. [PMID: 39336060] [PMCID: PMC11428673] [DOI: 10.3390/bs14090845]
Abstract
This study sought to understand students' intentions regarding the use of ChatGPT in learning from the perspective of perceived value, exploring the influence of artificial intelligence (AI) literacy. Drawing on a sample of 676 university students from diverse academic backgrounds, we employed a structured survey questionnaire to measure their perceptions of ChatGPT as a learning tool. The collected data were then analyzed using structural equation modeling (SEM) via SmartPLS 4 software. The findings showed a strong effect of the students' perceived value of ChatGPT on their intention to use it. Our findings suggest that perceived usefulness, perceived enjoyment, and perceived fees had a significant influence on students' perceived value of ChatGPT, while perceived risk showed no effect. Moreover, the role of AI literacy emerged as pivotal in shaping these perceptions. Students with higher AI literacy demonstrated an enhanced ability to discern the value of ChatGPT. AI literacy proved to be a strong predictor of students' perceptions of usefulness, enjoyment, and fees for using ChatGPT in learning; however, it did not predict their perception of the risk of using ChatGPT in learning. This study underscores the growing importance of integrating AI literacy into educational curricula to optimize the reception and utilization of innovative AI tools in academic scenarios. Future interventions aiming to boost the adoption of such tools should consider incorporating AI literacy components to maximize perceived value and, subsequently, use intention.
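For readers unfamiliar with SEM, a rough sketch of the structural model described above follows. This is an assumption-laden illustration only: semopy fits covariance-based SEM rather than the PLS-SEM run in SmartPLS 4, and every variable name and data point below is invented.

```python
# Hedged sketch of the hypothesized paths (perceived value -> use intention);
# semopy is covariance-based SEM, not PLS-SEM, and all data are synthetic.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(0)
n = 676  # sample size reported in the abstract
df = pd.DataFrame({k: rng.normal(size=n) for k in
                   ["usefulness", "enjoyment", "fees", "risk"]})
df["perceived_value"] = 0.5 * df["usefulness"] + 0.3 * df["enjoyment"] + rng.normal(size=n)
df["use_intention"] = 0.6 * df["perceived_value"] + rng.normal(size=n)

model = semopy.Model("""
perceived_value ~ usefulness + enjoyment + fees + risk
use_intention ~ perceived_value
""")
model.fit(df)
print(model.inspect())  # path coefficients and p-values
```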
Affiliation(s)
- Merfat Ayesh Alsubaie
- Department of Curriculum and Instruction, King Faisal University (KFU), Al-Hasa P.O. Box 400, Saudi Arabia
3
Gravina AG, Pellegrino R, Palladino G, Imperio G, Ventura A, Federico A. Charting new AI education in gastroenterology: Cross-sectional evaluation of ChatGPT and perplexity AI in medical residency exam. Dig Liver Dis 2024;56:1304-1311. [PMID: 38503659] [DOI: 10.1016/j.dld.2024.02.019]
Abstract
BACKGROUND Conversational chatbots, fueled by large language models, spark debate over their potential in education and medical career exams. There is debate in the literature about the scientific integrity of the outputs produced by these chatbots. AIMS This study evaluates the cross-sectional performance of ChatGPT 3.5 and Perplexity AI in responding to questions from the 2023 Italian national residency admission exam (SSM23), comparing results and the chatbots' concordance with previous years' SSMs. METHODS Gastroenterology-related SSM23 questions were input into ChatGPT 3.5 and Perplexity AI, and their performance was evaluated in terms of correct responses and total scores. This process was repeated with questions from the three preceding years. Additionally, chatbot concordance was assessed using Cohen's kappa. RESULTS In SSM23, ChatGPT 3.5 outperformed Perplexity AI with 94.11% correct responses, demonstrating consistency across years. Concordance weakened in 2023 (κ=0.203, P = 0.148), but ChatGPT consistently maintained a higher standard than Perplexity AI. CONCLUSION ChatGPT 3.5 and Perplexity AI exhibit promise in addressing gastroenterological queries, emphasizing potential educational roles. However, their variable performance mandates cautious use as supplementary tools alongside conventional study methods. Clear guidelines are crucial for educators to balance traditional approaches and innovative systems, enhancing educational standards.
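For illustration, the concordance statistic reported above can be sketched with standard tooling. This is a minimal example assuming hypothetical per-question correctness labels; it is not the study's code or data.

```python
# Hedged sketch: Cohen's kappa between two chatbots' answers on the same
# exam questions; the labels below are hypothetical, not the study's data.
from sklearn.metrics import cohen_kappa_score

# 1 = correct answer, 0 = incorrect, one entry per SSM question.
chatgpt_35 = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
perplexity = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1]

kappa = cohen_kappa_score(chatgpt_35, perplexity)
print(f"Cohen's kappa = {kappa:.3f}")  # the paper reports kappa = 0.203 for SSM23
```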
Affiliation(s)
- Antonietta Gerarda Gravina
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Raffaele Pellegrino
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Giovanna Palladino
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Giuseppe Imperio
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Andrea Ventura
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
- Alessandro Federico
- Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy
4
Knoedler L, Vogt A, Alfertshofer M, Camacho JM, Najafali D, Kehrer A, Prantl L, Iske J, Dean J, Hoefer S, Knoedler C, Knoedler S. The law code of ChatGPT and artificial intelligence-how to shield plastic surgeons and reconstructive surgeons against Justitia's sword. Front Surg 2024;11:1390684. [PMID: 39132668] [PMCID: PMC11312379] [DOI: 10.3389/fsurg.2024.1390684]
Abstract
Large Language Models (LLMs) like ChatGPT 4 (OpenAI), Claude 2 (Anthropic), and Llama 2 (Meta AI) have emerged as novel technologies to integrate artificial intelligence (AI) into everyday work. LLMs in particular, and AI in general, carry immense potential to streamline clinical workflows, outsource resource-intensive tasks, and disburden the healthcare system. While a plethora of trials is elucidating the untapped capabilities of this technology, the sheer pace of scientific progress also takes its toll. Legal guidelines hold a key role in regulating upcoming technologies, safeguarding patients, and determining individual and institutional liabilities. To date, there is a paucity of research work delineating the legal regulations of LLMs and AI for clinical scenarios in plastic and reconstructive surgery (PRS). This knowledge gap poses the risk of lawsuits and penalties against plastic surgeons. Thus, we aim to provide the first overview of legal guidelines and pitfalls of LLMs and AI for plastic surgeons. Our analysis encompasses models like ChatGPT, Claude 2, and Llama 2, among others, regardless of their closed or open-source nature. Ultimately, this line of research may help clarify the legal responsibilities of plastic surgeons and seamlessly integrate such cutting-edge technologies into the field of PRS.
Affiliation(s)
- Leonard Knoedler
- Department of Plastic, Hand, and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Alexander Vogt
- Corporate/M&A Department, Dentons Europe (Germany) GmbH & Co. KG, Munich, Germany
- UC Law San Francisco (Formerly UC Hastings), San Francisco, CA, United States
- Michael Alfertshofer
- Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians-University Munich, Munich, Germany
- Justin M. Camacho
- College of Medicine, Drexel University, Philadelphia, PA, United States
- Daniel Najafali
- Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL, United States
- Andreas Kehrer
- Department of Plastic, Hand, and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Lukas Prantl
- Department of Plastic, Hand, and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Jasper Iske
- Department of Cardiothoracic and Vascular Surgery, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany
- Jillian Dean
- School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Christoph Knoedler
- Faculty of Applied Social and Health Sciences, Regensburg University of Applied Sciences, Regensburg, Germany
- Samuel Knoedler
- Department of Plastic, Hand, and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
5
Wang D, Liang J, Ye J, Li J, Li J, Zhang Q, Hu Q, Pan C, Wang D, Liu Z, Shi W, Shi D, Li F, Qu B, Zheng Y. Enhancement of Large Language Models' Performance in Diabetes Education: Retrieval-Augmented Generation Approach. J Med Internet Res 2024. [PMID: 39046096] [DOI: 10.2196/58041]
Abstract
BACKGROUND Large language models (LLMs) have demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the RISE framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries. OBJECTIVE This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve LLMs' ability to respond accurately and safely to diabetes-related inquiries. METHODS RISE, an innovative retrieval augmentation framework, comprises four steps: Rewriting Query, Information Retrieval, Summarization, and Execution. Using a set of 43 common diabetes-related questions, we evaluated three base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions. Assessments were conducted by clinicians for accuracy and comprehensiveness, and by patients for understandability. RESULTS The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all three base LLMs. On average, the proportion of accurate responses increased by 12% (from 107/129 to 122/129) with RISE. Specifically, accurate responses increased by 7% (from 39/43 to 42/43) for GPT-4, 19% (from 31/43 to 39/43) for Claude 2, and 9% (from 37/43 to 41/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44. Understandability was enhanced by 0.19 on average. Data collection was conducted from September 30, 2023, to February 5, 2024. CONCLUSIONS RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.
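For readers unfamiliar with retrieval augmentation, the four RISE steps map naturally onto a simple pipeline. The sketch below is an assumption-laden outline, not the authors' implementation: `llm` and `vector_store` are hypothetical stand-ins for a language model client and a medical-document index.

```python
# Hedged outline of the four RISE steps (Rewriting Query, Information
# Retrieval, Summarization, Execution); `llm` and `vector_store` are
# hypothetical interfaces, not the authors' actual components.
def rise_answer(question: str, llm, vector_store, k: int = 5) -> str:
    # 1. Rewriting Query: reformulate the patient's question for retrieval.
    query = llm(f"Rewrite as a concise medical search query: {question}")
    # 2. Information Retrieval: fetch the k most relevant passages.
    passages = vector_store.search(query, top_k=k)
    # 3. Summarization: condense the retrieved evidence.
    summary = llm("Summarize this evidence faithfully:\n" + "\n".join(passages))
    # 4. Execution: answer the original question grounded in the summary.
    return llm(f"Using only this evidence:\n{summary}\n\nAnswer: {question}")
```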
Affiliation(s)
- Dingqiao Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jiangbo Liang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jinguo Ye
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingni Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingpeng Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Qikai Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Qiuling Hu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Caineng Pan
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Dongliang Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhong Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Wen Shi
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Danli Shi
- Research Centre for SHARP Vision, The Hong Kong Polytechnic University, Hong Kong, China
- Fei Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Bo Qu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yingfeng Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
6
Lahat A, Sharif K, Zoabi N, Shneor Patt Y, Sharif Y, Fisher L, Shani U, Arow M, Levin R, Klang E. Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4. J Med Internet Res 2024;26:e54571. [PMID: 38935937] [PMCID: PMC11240076] [DOI: 10.2196/54571]
Abstract
BACKGROUND Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. OBJECTIVE This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types. METHODS A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. RESULTS Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions. CONCLUSIONS ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.
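As a rough illustration of the senior-versus-resident rating comparison above: the abstract does not name the exact statistical test, so the sketch below assumes a Mann-Whitney U test for ordinal 1-5 ratings, with invented data.

```python
# Hedged sketch: comparing senior vs. resident ratings on the 1-5 scale.
# The test choice is an assumption (the abstract reports only P values),
# and the ratings below are made up.
from scipy.stats import mannwhitneyu

senior_ratings   = [5, 4, 5, 4, 5, 4, 4, 5, 5, 4]
resident_ratings = [4, 3, 4, 4, 3, 4, 3, 4, 4, 3]

stat, p = mannwhitneyu(senior_ratings, resident_ratings, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
```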
Affiliation(s)
- Adi Lahat
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Department of Gastroenterology, Samson Assuta Ashdod Medical Center, Affiliated with Ben Gurion University of the Negev, Be'er Sheva, Israel
- Kassem Sharif
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Narmin Zoabi
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Yousra Sharif
- Department of Internal Medicine C, Hadassah Medical Center, Jerusalem, Israel
- Lior Fisher
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Uria Shani
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Mohamad Arow
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Roni Levin
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States
7
Calvo-Lorenzo I, Uriarte-Llano I. [Massive generation of synthetic medical records with ChatGPT: An example in hip fractures]. [Article in Spanish] Med Clin (Barc) 2024;162:549-554. [PMID: 38290872] [DOI: 10.1016/j.medcli.2023.11.027]
Affiliation(s)
- Isidoro Calvo-Lorenzo
- Department of Orthopedic Surgery and Traumatology, Hospital Universitario Galdakao Usansolo, Galdakao, Vizcaya, Spain
- Iker Uriarte-Llano
- Department of Orthopedic Surgery and Traumatology, Hospital Universitario Galdakao Usansolo, Galdakao, Vizcaya, Spain
8
Abi-Rafeh J, Cattelan L, Xu HH, Bassiri-Tehrani B, Kazan R, Nahai F. Artificial Intelligence-Generated Social Media Content Creation and Management Strategies for Plastic Surgeons. Aesthet Surg J 2024;44:769-778. [PMID: 38366026] [DOI: 10.1093/asj/sjae036]
Abstract
BACKGROUND Social media platforms have come to represent integral components of the professional marketing and advertising strategy for plastic surgeons. Effective and consistent content development, however, remains technically demanding and time-consuming, prompting most to employ, at non-negligible cost, social media marketing specialists for content planning and development. OBJECTIVES In the present study, we aimed to investigate the ability of presently available artificial intelligence (AI) models to assist plastic surgeons in their social media content development and sharing plans. METHODS An AI large language model was prompted on the study's objectives through a series of standardized user interactions. Social media platforms of interest, on which the AI model was prompted, included Instagram, TikTok, and X (formerly Twitter). RESULTS A 1-year, entirely AI-generated social media plan, comprising a total of 1091 posts for the 3 aforementioned social media platforms, is presented. Themes of the AI-generated content proposed for each platform were classified into 6 categories: patient-related, practice-related, educational, "uplifting," interactive, and promotional posts. Overall, 91 publicly recognized holidays and observance and awareness days were incorporated into the content calendars. The AI model demonstrated an ability to differentiate between the distinct formats of each of the 3 social media platforms investigated, generating unique ideas for each and providing detailed content development and posting instructions, scripts, and post captions, leveraging features specific to each platform. CONCLUSIONS By providing detailed and actionable social media content creation and posting plans to plastic surgeons, presently available AI models can be readily leveraged to assist in, and significantly alleviate, the burden associated with social media account management, content generation, and potentially patient conversion.
9
Amacher SA, Arpagaus A, Sahmer C, Becker C, Gross S, Urben T, Tisljar K, Sutter R, Marsch S, Hunziker S. Prediction of outcomes after cardiac arrest by a generative artificial intelligence model. Resusc Plus 2024;18:100587. [PMID: 38433764] [PMCID: PMC10906512] [DOI: 10.1016/j.resplu.2024.100587]
Abstract
Aims To investigate the prognostic accuracy of a non-medical generative artificial intelligence model (Chat Generative Pre-Trained Transformer 4, ChatGPT-4) in predicting death and poor neurological outcome at hospital discharge based on real-life data from cardiac arrest patients. Methods This prospective cohort study investigates the prognostic performance of ChatGPT-4 in predicting outcomes at hospital discharge of adult cardiac arrest patients admitted to intensive care at a large Swiss tertiary academic medical center (COMMUNICATE/PROPHETIC cohort study). We prompted ChatGPT-4 with sixteen prognostic parameters derived from established post-cardiac arrest scores for each patient. We then compared ChatGPT-4 with three cardiac arrest scores (Out-of-Hospital Cardiac Arrest [OHCA], Cardiac Arrest Hospital Prognosis [CAHP], and PROgnostication using LOGistic regression model for Unselected adult cardiac arrest patients in the Early stages [PROLOGUE]) for in-hospital mortality and poor neurological outcome, in terms of area under the curve (AUC), sensitivity, specificity, positive and negative predictive values, and likelihood ratios. Results Mortality at hospital discharge was 43% (n = 309/713); 54% of patients (n = 387/713) had a poor neurological outcome. ChatGPT-4 showed good discrimination for in-hospital mortality, with an AUC of 0.85, similar to the OHCA, CAHP, and PROLOGUE scores (AUCs of 0.82, 0.83, and 0.84, respectively). For poor neurological outcome, ChatGPT-4's prediction was likewise similar to that of the post-cardiac arrest scores (AUC 0.83). Conclusions ChatGPT-4 performed similarly to validated post-cardiac arrest scores in predicting mortality and poor neurological outcome. However, more research on illogical answers is needed before an LLM could be incorporated into multimodal outcome prognostication after cardiac arrest.
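The discrimination comparison above reduces to computing AUCs for competing predictors on the same outcomes. A minimal sketch with invented data (not the cohort's) follows; AUC is rank-based, so raw score points and predicted probabilities can be compared directly.

```python
# Hedged sketch: discrimination (AUC) of model-derived risk vs. an
# established score; outcomes and scores below are invented for illustration.
from sklearn.metrics import roc_auc_score

outcomes  = [1, 0, 1, 1, 0, 0, 1, 0]                           # 1 = in-hospital death
gpt4_risk = [0.90, 0.20, 0.70, 0.80, 0.30, 0.10, 0.60, 0.40]   # predicted probabilities
cahp_pts  = [210, 120, 180, 200, 140, 100, 170, 150]           # raw score points

print("ChatGPT-4 AUC:", roc_auc_score(outcomes, gpt4_risk))
print("CAHP AUC:", roc_auc_score(outcomes, cahp_pts))
```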
Affiliation(s)
- Simon A. Amacher
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Emergency Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Armon Arpagaus
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Christian Sahmer
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Christoph Becker
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Emergency Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Sebastian Gross
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Tabita Urben
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Kai Tisljar
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Raoul Sutter
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Medical Faculty, University of Basel, Basel, Switzerland
- Division of Neurophysiology, Department of Neurology, University Hospital Basel, Basel, Switzerland
- Stephan Marsch
- Intensive Care Medicine, Department of Acute Medical Care, University Hospital Basel, Basel, Switzerland
- Medical Faculty, University of Basel, Basel, Switzerland
- Sabina Hunziker
- Medical Communication and Psychosomatic Medicine, University Hospital Basel, Basel, Switzerland
- Medical Faculty, University of Basel, Basel, Switzerland
- Post-Intensive Care Clinic, University Hospital Basel, Basel, Switzerland
10
Marti-Aguado D, Pazó J, Diaz-Gonzalez A, de Las Heras Páez de la Cadena B, Conthe A, Gallego Duran R, Rodríguez-Gandía MA, Turnes J, Romero-Gomez M. LiverAI: New tool in the landscape for liver health. Gastroenterol Hepatol 2024;47:646-648. [PMID: 38582150] [DOI: 10.1016/j.gastrohep.2024.04.001]
Affiliation(s)
- David Marti-Aguado
- Digestive Disease Department, Clinic University Hospital, INCLIVA Health Research Institute, Valencia, Spain
- Javier Pazó
- AI and IT Solutions Manager, Spanish Association for the Study of the Liver (AEEH), Spain
- Alvaro Diaz-Gonzalez
- Gastroenterology and Hepatology Department, Clinical and Translational Research in Digestive Diseases Group, Valdecilla Research Institute (IDIVAL), Marqués de Valdecilla University Hospital, Santander, Spain
- Andres Conthe
- Department of Gastroenterology and Hepatology, Hospital General Universitario Gregorio Marañón, Madrid, Spain
- Rocio Gallego Duran
- Digestive Diseases Unit and CIBERehd, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville (HUVR/CSIC/US), University of Seville, Seville, Spain
- Miguel A Rodríguez-Gandía
- Department of Gastroenterology and Hepatology, Hospital Universitario Ramón y Cajal, Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain
- Juan Turnes
- Department of Gastroenterology and Hepatology, Complejo Hospitalario Universitario Pontevedra & IIS Galicia Sur, Spain
- Manuel Romero-Gomez
- Digestive Diseases Unit and CIBERehd, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville (HUVR/CSIC/US), University of Seville, Seville, Spain
11
Saeidnia HR, Kozak M, Lund BD, Hassanzadeh M. Evaluation of ChatGPT's responses to information needs and information seeking of dementia patients. Sci Rep 2024;14:10273. [PMID: 38704403] [PMCID: PMC11069588] [DOI: 10.1038/s41598-024-61068-5]
Abstract
Many people in the advanced stages of dementia require full-time caregivers, most of whom are family members who provide informal (non-specialized) care. It is important to provide these caregivers with high-quality information to help them understand and manage the symptoms and behaviors of dementia patients. This study aims to evaluate ChatGPT, a chatbot built using the Generative Pre-trained Transformer (GPT) large language model, in responding to information needs and information seeking of such informal caregivers. We identified the information needs of dementia patients based on the relevant literature (22 articles were selected from 2442 retrieved articles). From this analysis, we created a list of 31 items that describe these information needs, and used them to formulate 118 relevant questions. We then asked these questions to ChatGPT and investigated its responses. In the next phase, we asked 15 informal and 15 formal dementia-patient caregivers to analyze and evaluate these ChatGPT responses, using both quantitative (questionnaire) and qualitative (interview) approaches. In the interviews conducted, informal caregivers were more positive towards the use of ChatGPT to obtain non-specialized information about dementia compared to formal caregivers. However, ChatGPT struggled to provide satisfactory responses to more specialized (clinical) inquiries. In the questionnaire study, informal caregivers gave higher ratings to ChatGPT's responsiveness on the 31 items describing information needs, giving an overall mean score of 3.77 (SD 0.98) out of 5; the mean score among formal caregivers was 3.13 (SD 0.65), indicating that formal caregivers showed less trust in ChatGPT's responses compared to informal caregivers. ChatGPT's responses to non-clinical information needs related to dementia patients were generally satisfactory at this stage. As this tool is still under heavy development, it holds promise for providing even higher-quality information in response to information needs, particularly when developed in collaboration with healthcare professionals. Thus, large language models such as ChatGPT can serve as valuable sources of information for informal caregivers, although they may not fully meet the needs of formal caregivers who seek specialized (clinical) answers. Nevertheless, even in its current state, ChatGPT was able to provide responses to some of the clinical questions related to dementia that were asked.
Affiliation(s)
- Hamid Reza Saeidnia
- Department of Knowledge and Information Science, Tarbiat Modares University, Tehran, Iran
- Marcin Kozak
- Department of Media, Journalism and Social Communication, University of Information Technology and Management in Rzeszow, Rzeszow, Poland
- Brady D Lund
- Department of Information Science, University of North Texas, Denton, USA
- Mohammad Hassanzadeh
- Department of Knowledge and Information Science, Tarbiat Modares University, Tehran, Iran
12
Ge J, Chen IY, Pletcher MJ, Lai JC. Prompt Engineering for Generative Artificial Intelligence in Gastroenterology and Hepatology. Am J Gastroenterol 2024;119. [PMID: 38294157] [PMCID: PMC11413230] [DOI: 10.14309/ajg.0000000000002689]
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California, San Francisco, San Francisco, California, USA
- Irene Y Chen
- UCSF and UC Berkeley Joint Program in Computational Precision Health, Berkeley, California, USA
- Mark J Pletcher
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California, USA
- Jennifer C Lai
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California, San Francisco, San Francisco, California, USA
13
Ge J, Sun S, Owens J, Galvez V, Gologorskaya O, Lai JC, Pletcher MJ, Lai K. Development of a liver disease-specific large language model chat interface using retrieval-augmented generation. Hepatology 2024. [PMID: 38451962] [DOI: 10.1097/hep.0000000000000834]
Abstract
BACKGROUND AND AIMS Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows the embedding of customized data into LLMs. This approach "specializes" the LLMs and is thought to reduce hallucinations. APPROACH AND RESULTS We developed "LiVersa," a liver disease-specific LLM, by using our institution's protected health information-compliant text embedding and LLM platform, "Versa." We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases guidance documents to be incorporated into LiVersa. We evaluated LiVersa's performance by conducting 2 rounds of testing. First, we compared LiVersa's outputs versus those of trainees from a previously published knowledge assessment: LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI's ChatGPT 4, and Meta's Large Language Model Meta AI 2; LiVersa's outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4. CONCLUSIONS In this demonstration, we built disease-specific and protected health information-compliant LLMs using RAG. While LiVersa demonstrated higher accuracy in answering questions related to hepatology, there were some deficiencies due to limitations set by the number of documents used for RAG. LiVersa will likely require further refinement before potential live deployment. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical use cases.
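The retrieval step at the heart of such a RAG system can be sketched independently of any platform. The snippet below assumes precomputed embeddings for guideline chunks; the institutional "Versa" platform and its APIs are not reproduced here, and all data are placeholders.

```python
# Hedged sketch of RAG retrieval: rank guideline chunks by cosine similarity
# to the question embedding, then pass the top hits to the LLM as context.
# Embeddings and chunks below are placeholders, not AASLD guidance text.
import numpy as np

def top_chunks(question_vec, chunk_vecs, chunks, k=3):
    # Cosine similarity between the question and every chunk embedding.
    sims = chunk_vecs @ question_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(question_vec)
    )
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

chunks = ["guideline chunk A ...", "guideline chunk B ...", "guideline chunk C ..."]
chunk_vecs = np.random.rand(3, 8)   # stand-in for real text embeddings
question_vec = np.random.rand(8)
context = top_chunks(question_vec, chunk_vecs, chunks, k=2)
# The retrieved context is prepended to the prompt so the model answers from
# guideline text rather than from its parametric memory alone.
```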
Affiliation(s)
- Jin Ge
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California-San Francisco, San Francisco, California, USA
- Steve Sun
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
- Joseph Owens
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
- Victor Galvez
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
- Oksana Gologorskaya
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
- Bakar Computational Health Sciences Institute, University of California-San Francisco, San Francisco, California, USA
- Jennifer C Lai
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California-San Francisco, San Francisco, California, USA
- Mark J Pletcher
- Department of Epidemiology and Biostatistics, University of California-San Francisco, San Francisco, California, USA
- Ki Lai
- UCSF Health Information Technology, University of California-San Francisco, San Francisco, California, USA
14
Ge J, Buenaventura A, Berrean B, Purvis J, Fontil V, Lai JC, Pletcher MJ. Applying human-centered design to the construction of a cirrhosis management clinical decision support system. Hepatol Commun 2024;8:e0394. [PMID: 38407255] [PMCID: PMC10898661] [DOI: 10.1097/hc9.0000000000000394]
Abstract
BACKGROUND Electronic health record (EHR)-based clinical decision support is a scalable way to help standardize clinical care. Clinical decision support systems have not been extensively investigated in cirrhosis management. Human-centered design (HCD) is an approach that engages with potential users in intervention development. In this study, we applied HCD to design the features and interface for a clinical decision support system for cirrhosis management, called CirrhosisRx. METHODS We conducted technical feasibility assessments to construct a visual blueprint that outlines the basic features of the interface. We then convened collaborative-design workshops with generalist and specialist clinicians. We elicited current workflows for cirrhosis management, assessed gaps in existing EHR systems, evaluated potential features, and refined the design prototype for CirrhosisRx. At the conclusion of each workshop, we analyzed recordings and transcripts. RESULTS Workshop feedback showed that the aggregation of relevant clinical data into 6 cirrhosis decompensation domains (defined as common inpatient clinical scenarios) was the most important feature. Automatic inference of clinical events from EHR data, such as gastrointestinal bleeding from hemoglobin changes, was not accepted due to accuracy concerns. Visualizations for risk stratification scores were deemed not necessary. Lastly, the HCD co-design workshops allowed us to identify the target user population (generalists). CONCLUSIONS This is one of the first applications of HCD to design the features and interface for an electronic intervention for cirrhosis management. The HCD process altered features, modified the design interface, and likely improved CirrhosisRx's overall usability. The finalized design for CirrhosisRx proceeded to development and production and will be tested for effectiveness in a pragmatic randomized controlled trial. This work provides a model for the creation of other EHR-based interventions in hepatology care.
Affiliation(s)
- Jin Ge
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California-San Francisco, San Francisco, California, USA
- Ana Buenaventura
- School of Medicine Technology Services, University of California-San Francisco, San Francisco, California, USA
- Beth Berrean
- School of Medicine Technology Services, University of California-San Francisco, San Francisco, California, USA
- Jory Purvis
- School of Medicine Technology Services, University of California-San Francisco, San Francisco, California, USA
- Valy Fontil
- Family Health Centers, NYU-Langone Medical Center, Brooklyn, New York, USA
- Jennifer C. Lai
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California-San Francisco, San Francisco, California, USA
- Mark J. Pletcher
- Department of Epidemiology and Biostatistics, University of California-San Francisco, San Francisco, California, USA
15
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024;44:329-343. [PMID: 37562022] [DOI: 10.1093/asj/sjad260]
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
16
Eppler M, Ganjavi C, Ramacciotti LS, Piazza P, Rodler S, Checcucci E, Gomez Rivas J, Kowalewski KF, Belenchón IR, Puliatti S, Taratkin M, Veccia A, Baekelandt L, Teoh JYC, Somani BK, Wroclawski M, Abreu A, Porpiglia F, Gill IS, Murphy DG, Canes D, Cacciamani GE. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-153. [PMID: 37926642] [DOI: 10.1016/j.eururo.2023.10.014]
Abstract
BACKGROUND Since its release in November 2022, ChatGPT has captivated society and shown potential for various aspects of health care. OBJECTIVE To investigate potential use of ChatGPT, a large language model (LLM), in urology by gathering opinions from urologists worldwide. DESIGN, SETTING, AND PARTICIPANTS An open web-based survey was distributed via social media and e-mail chains to urologists between April 20, 2023 and May 5, 2023. Participants were asked to answer questions related to their knowledge and experience with artificial intelligence, as well as their opinions of potential use of ChatGPT/LLMs in research and clinical practice. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS Data are reported as the mean and standard deviation for continuous variables, and the frequency and percentage for categorical variables. Charts and tables are used as appropriate, with descriptions of the chart types and the measures used. The data are reported in accordance with the Checklist for Reporting Results of Internet E-Surveys (CHERRIES). RESULTS AND LIMITATIONS A total of 456 individuals completed the survey (64% completion rate). Nearly half (47.7%) reported that they use ChatGPT/LLMs in their academic practice, with fewer using the technology in clinical practice (19.8%). More than half (62.2%) believe there are potential ethical concerns when using ChatGPT for scientific or academic writing, and 53% reported that they have experienced limitations when using ChatGPT in academic practice. CONCLUSIONS Urologists recognise the potential of ChatGPT/LLMs in research but have concerns regarding ethics and patient acceptance. There is a desire for regulations and guidelines to ensure appropriate use. In addition, measures should be taken to establish rules and guidelines to maximise safety and efficiency when using this novel technology. PATIENT SUMMARY A survey asked 456 urologists from around the world about using an artificial intelligence tool called ChatGPT in their work. Almost half of them use ChatGPT for research, but not many use it for patient care. The responders think ChatGPT could be helpful, but they worry about problems like ethics and want rules to make sure it is used safely.
Affiliation(s)
- Michael Eppler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Conner Ganjavi
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Lorenzo Storino Ramacciotti
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Pietro Piazza
- Division of Urology, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Severin Rodler
- Department of Urology, Klinikum der Universität München, Munich, Germany
- Enrico Checcucci
- Department of Surgery, FPO-IRCCS Candiolo Cancer Institute, Candiolo, Italy
- Juan Gomez Rivas
- Department of Urology, Clinico San Carlos University Hospital, Madrid, Spain
- Karl F Kowalewski
- Department of Urology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany
- Ines Rivero Belenchón
- Urology and Nephrology Department, Virgen del Rocío University Hospital, Seville, Spain
- Stefano Puliatti
- Urology Department, University of Modena and Reggio Emilia, Modena, Italy
- Mark Taratkin
- Institute for Urology and Reproductive Health, Sechenov University, Moscow, Russia
- Alessandro Veccia
- Department of Urology, Azienda Ospedaliera Universitaria Integrata Verona, Verona, Italy
- Loïc Baekelandt
- Department of Urology, University Hospitals Leuven, Leuven, Belgium
- Jeremy Y-C Teoh
- Department of Surgery, S.H. Ho Urology Centre, The Chinese University of Hong Kong, Hong Kong, China
- Bhaskar K Somani
- University Hospital Southampton NHS Foundation Trust, Southampton, UK
- Marcelo Wroclawski
- Hospital Israelita Albert Einstein, São Paulo, Brazil; Beneficência Portuguesa de São Paulo, São Paulo, Brazil; Faculdade de Medicina do ABC, Santo Andre, Brazil
- Andre Abreu
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Inderbir S Gill
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Declan G Murphy
- Division of Cancer Surgery, Peter MacCallum Cancer Centre, University of Melbourne, Melbourne, Australia
- David Canes
- Division of Urology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
17
Jain N, Gottlich C, Fisher J, Campano D, Winston T. Assessing ChatGPT's orthopedic in-service training exam performance and applicability in the field. J Orthop Surg Res 2024;19:27. [PMID: 38167093] [PMCID: PMC10762835] [DOI: 10.1186/s13018-023-04467-0]
Abstract
BACKGROUND ChatGPT has gained widespread attention for its ability to understand and provide human-like responses to inputs. However, few works have focused on its use in Orthopedics. This study assessed ChatGPT's performance on the Orthopedic In-Service Training Exam (OITE) and evaluated its decision-making process to determine whether adoption as a resource in the field is practical. METHODS ChatGPT's performance on three OITE exams was evaluated by inputting multiple choice questions. Questions were classified by their orthopedic subject area. Yearly OITE technical reports were used to gauge scores against resident physicians. ChatGPT's rationales were compared with testmaker explanations using six different groups denoting answer accuracy and logic consistency. Variables were analyzed using contingency table construction and Chi-squared analyses. RESULTS Of 635 questions, 360 were usable as inputs (56.7%). ChatGPT-3.5 scored 55.8%, 47.7%, and 54% for the years 2020, 2021, and 2022, respectively. Of 190 correct outputs, 179 provided a consistent logic (94.2%). Of 170 incorrect outputs, 133 provided an inconsistent logic (78.2%). Significant associations were found between test topic and correct answer (p = 0.011), and between type of logic used and tested topic (p < 0.001). Basic Science and Sports had adjusted residuals greater than 1.96. Basic Science and correct, no logic; Basic Science and incorrect, inconsistent logic; Sports and correct, no logic; and Sports and incorrect, inconsistent logic had adjusted residuals greater than 1.96. CONCLUSIONS Based on annual OITE technical reports for resident physicians, ChatGPT-3.5 performed around the PGY-1 level. When answering correctly, it displayed congruent reasoning with testmakers. When answering incorrectly, it exhibited some understanding of the correct answer. It outperformed in Basic Science and Sports, likely due to its ability to output rote facts. These findings suggest that it lacks the fundamental capabilities to be a comprehensive tool in Orthopedic Surgery in its current form. LEVEL OF EVIDENCE II.
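For readers unfamiliar with the adjusted-residual criterion used above, the following is a minimal sketch of a contingency-table analysis with standardized residuals; the 2x2 counts are hypothetical, not the paper's data.

```python
# Hedged sketch: chi-squared test plus adjusted (standardized) residuals,
# the statistic the study flags when its absolute value exceeds 1.96.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: topic (e.g., Basic Science, Sports); columns: correct, incorrect.
table = np.array([[30, 10],
                  [25, 15]])
chi2, p, dof, expected = chi2_contingency(table)

n = table.sum()
row_frac = table.sum(axis=1, keepdims=True) / n
col_frac = table.sum(axis=0, keepdims=True) / n
adj_resid = (table - expected) / np.sqrt(expected * (1 - row_frac) * (1 - col_frac))
print(f"p = {p:.4f}")
print(adj_resid)  # cells with |residual| > 1.96 drive the association
```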
Affiliation(s)
- Neil Jain
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- Caleb Gottlich
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- John Fisher
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- Dominic Campano
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- Travis Winston
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
18
Suárez A, Díaz-Flores García V, Algar J, Gómez Sánchez M, Llorente de Pedro M, Freire Y. Unveiling the ChatGPT phenomenon: Evaluating the consistency and accuracy of endodontic question answers. Int Endod J 2024;57:108-113. [PMID: 37814369] [DOI: 10.1111/iej.13985]
Abstract
AIM Chatbot Generative Pre-trained Transformer (ChatGPT) is a generative artificial intelligence (AI) software based on large language models (LLMs), designed to simulate human conversations and generate novel content based on the training data it has been exposed to. The aim of this study was to evaluate the consistency and accuracy of ChatGPT-generated answers to clinical questions in endodontics, compared to answers provided by human experts. METHODOLOGY Ninety-one dichotomous (yes/no) questions were designed and categorized into three levels of difficulty. Twenty questions were randomly selected from each difficulty level. Sixty answers were generated by ChatGPT for each question. Two endodontic experts independently answered the 60 questions. Statistical analysis was performed using the SPSS program to calculate the consistency and accuracy of the answers generated by ChatGPT compared to the experts. Confidence intervals (95%) and standard deviations were used to estimate variability. RESULTS The answers generated by ChatGPT showed high consistency (85.44%). No significant differences in consistency were found based on question difficulty. In terms of answer accuracy, ChatGPT achieved an average accuracy of 57.33%. However, significant differences in accuracy were observed based on question difficulty, with lower accuracy for easier questions. CONCLUSIONS Currently, ChatGPT is not capable of replacing dentists in clinical decision-making. As ChatGPT's performance improves through deep learning, it is expected to become more useful and effective in the field of endodontics. However, careful attention and ongoing evaluation are needed to ensure its accuracy, reliability and safety in endodontics.
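The consistency and accuracy figures above are proportions with 95% confidence intervals. A minimal sketch follows, using the normal approximation; the counts are illustrative only (60 answers per question across 60 questions would give 3600 outputs, and 57.33% of 3600 is roughly 2064).

```python
# Hedged sketch: a 95% confidence interval for answer accuracy treated as a
# proportion (normal approximation); counts below are illustrative, not the
# authors' SPSS output.
import math

def proportion_ci(successes, n, z=1.96):
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

p, lo, hi = proportion_ci(2064, 3600)
print(f"accuracy = {p:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```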
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Víctor Díaz-Flores García
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Juan Algar
- Department of Clinical Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Margarita Gómez Sánchez
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- María Llorente de Pedro
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
19
Riedel M, Kaefinger K, Stuehrenberg A, Ritter V, Amann N, Graf A, Recker F, Klein E, Kiechle M, Riedel F, Meyer B. ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice. Front Med (Lausanne) 2023;10:1296615. [PMID: 38155661] [PMCID: PMC10753765] [DOI: 10.3389/fmed.2023.1296615]
Abstract
Background Chat Generative Pre-Trained Transformer (ChatGPT) is an artificial intelligence (AI)-based large language model tool developed by OpenAI in 2022. It utilizes deep learning algorithms to process natural language and generate responses, which renders it suitable for conversational interfaces. ChatGPT's potential to transform medical education and clinical practice is currently being explored, but its capabilities and limitations in this domain remain incompletely investigated. The present study aimed to assess ChatGPT's performance in medical knowledge competency for problem assessment in obstetrics and gynecology (OB/GYN). Methods Two datasets were established for analysis: questions (1) from OB/GYN course exams at a German university hospital and (2) from the German medical state licensing exams. To assess ChatGPT's performance, questions were entered into the chat interface, and responses were documented. A quantitative analysis compared ChatGPT's accuracy with that of medical students for different levels of difficulty and types of questions. Additionally, a qualitative analysis assessed the quality of ChatGPT's responses regarding ease of understanding, conciseness, accuracy, completeness, and relevance. Non-obvious insights generated by ChatGPT were evaluated, and a density index of insights was established to quantify the tool's ability to provide students with relevant and concise medical knowledge. Results ChatGPT demonstrated consistent and comparable performance across both datasets. It provided correct responses at a rate comparable with that of medical students, thereby indicating its ability to handle a diverse spectrum of questions ranging from general knowledge to complex clinical case presentations. The tool's accuracy was partly affected by question difficulty in the medical state exam dataset. Our qualitative assessment revealed that ChatGPT provided mostly accurate, complete, and relevant answers. ChatGPT additionally provided many non-obvious insights, especially in correctly answered questions, which indicates its potential for enhancing autonomous medical learning. Conclusion ChatGPT has promise as a supplementary tool in medical education and clinical practice. Its ability to provide accurate and insightful responses showcases its adaptability to complex clinical scenarios. As AI technologies continue to evolve, ChatGPT and similar tools may contribute to more efficient and personalized learning experiences and assistance for health care providers.
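The core quantitative comparison here, model accuracy versus student accuracy stratified by difficulty, could be tabulated as in the short sketch below. The records and column names are illustrative placeholders, not the study's dataset.

```python
import pandas as pd

# Illustrative records: one row per exam question (not the study's data).
df = pd.DataFrame({
    "difficulty": ["easy", "easy", "medium", "medium", "hard", "hard"],
    "chatgpt_correct": [1, 1, 1, 0, 0, 1],          # 1 = ChatGPT answered correctly
    "student_mean_correct": [0.92, 0.88, 0.71, 0.64, 0.45, 0.52],
})

# Compare ChatGPT's accuracy with the students' mean score per difficulty level.
summary = df.groupby("difficulty")[["chatgpt_correct", "student_mean_correct"]].mean()
print(summary)
```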
Collapse
Affiliation(s)
- Maximilian Riedel
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Katharina Kaefinger
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Antonia Stuehrenberg
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Viktoria Ritter
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Niklas Amann
- Department of Gynecology and Obstetrics, Friedrich–Alexander-University Erlangen–Nuremberg (FAU), Erlangen, Germany
| | - Anna Graf
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Florian Recker
- Department of Gynecology and Obstetrics, Bonn University Hospital, Bonn, Germany
| | - Evelyn Klein
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Marion Kiechle
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| | - Fabian Riedel
- Department of Gynecology and Obstetrics, Heidelberg University Hospital, Heidelberg, Germany
| | - Bastian Meyer
- Department of Gynecology and Obstetrics, Klinikum Rechts der Isar, Technical University Munich (TU), Munich, Germany
| |
Collapse
|
20
|
Enomoto M, Tseng CH, Hsu YC, Thuy LTT, Nguyen MH. Collaborating with AI in literature search-An important frontier. Hepatol Commun 2023; 7:e0336. [PMID: 38055656 PMCID: PMC10984654 DOI: 10.1097/hc9.0000000000000336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 12/08/2023] Open
Affiliation(s)
- Masaru Enomoto
- Department of Hepatology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
| | - Cheng-Hao Tseng
- Division of Gastroenterology and Hepatology, E-Da Hospital, Kaohsiung, Taiwan
| | - Yao-Chun Hsu
- Division of Gastroenterology and Hepatology, E-Da Hospital, Kaohsiung, Taiwan
| | - Le Thi Thanh Thuy
- Department of Hepatology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
| | - Mindie H. Nguyen
- Division of Gastroenterology and Hepatology, Stanford University Medical Center, Palo Alto, California, USA
- Department of Epidemiology and Population Health, Stanford University Medical Center, Palo Alto, California, USA
| |
Collapse
|
21
|
Dang F, Samarasena JB. Generative Artificial Intelligence for Gastroenterology: Neither Friend nor Foe. Am J Gastroenterol 2023; 118:2146-2147. [PMID: 38033225 DOI: 10.14309/ajg.0000000000002573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023]
Affiliation(s)
- Frances Dang
- Division of Gastroenterology/Hepatology, University of California Irvine School of Medicine, Orange, California, USA
| | | |
Collapse
|
22
|
Ge J, Sun S, Owens J, Galvez V, Gologorskaya O, Lai JC, Pletcher MJ, Lai K. Development of a Liver Disease-Specific Large Language Model Chat Interface using Retrieval Augmented Generation. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.11.10.23298364. [PMID: 37986764 PMCID: PMC10659484 DOI: 10.1101/2023.11.10.23298364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2023]
Abstract
Background Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating incorrect or hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows embedding of customized data into LLMs. This approach "specializes" the LLMs and is thought to reduce hallucinations. Methods We developed "LiVersa," a liver disease-specific LLM, by using our institution's protected health information (PHI)-compliant text embedding and LLM platform, "Versa." We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases (AASLD) guidelines and guidance documents to be incorporated into LiVersa. We evaluated LiVersa's performance by comparing its responses with those of trainees from a previously published knowledge assessment study regarding hepatitis B (HBV) treatment and hepatocellular carcinoma (HCC) surveillance. Results LiVersa answered all 10 questions correctly when forced to provide a "yes" or "no" answer. Full detailed responses with justifications and rationales, however, were not completely correct for three of the questions. Discussion In this study, we demonstrated the ability to build disease-specific and PHI-compliant LLMs using RAG. While our LLM, LiVersa, demonstrated more specificity in answering questions related to clinical hepatology, there were some knowledge deficiencies due to limitations set by the number and types of documents used for RAG. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical uses and a potential strategy to realize personalized medicine in the future.
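To make the RAG architecture concrete, here is a minimal sketch of the usual pipeline: chunk the guideline documents, embed the chunks, retrieve the most similar chunks for a question by cosine similarity, and prepend them to the prompt. This is not the LiVersa implementation; the `embed` function is a deliberately toy stand-in for whatever embedding endpoint an institutional platform exposes.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: hashed bag-of-words vector.
    (Consistent only within a single process run; illustrative only.)"""
    v = np.zeros(256)
    for tok in text.lower().split():
        v[hash(tok) % 256] += 1.0
    return v

def chunk(document: str, size: int = 1000) -> list[str]:
    """Split a guideline document into fixed-size character chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def top_k_chunks(question: str, chunks: list[str], k: int = 5) -> list[str]:
    """Retrieve the k chunks most similar to the question by cosine similarity."""
    q = embed(question)
    scored = []
    for c in chunks:
        v = embed(c)
        sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
        scored.append((sim, c))
    scored.sort(key=lambda p: p[0], reverse=True)
    return [c for _, c in scored[:k]]

def build_prompt(question: str, retrieved: list[str]) -> str:
    """Ground the LLM's answer in the retrieved guideline text."""
    context = "\n---\n".join(retrieved)
    return f"Answer using only the guideline excerpts below.\n{context}\n\nQuestion: {question}"
```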
Collapse
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Steve Sun
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| | - Joseph Owens
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| | - Victor Galvez
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| | - Oksana Gologorskaya
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
- Bakar Computational Health Sciences Institute, University of California – San Francisco, San Francisco, CA
| | - Jennifer C. Lai
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Mark J. Pletcher
- Department of Epidemiology and Biostatistics, University of California – San Francisco, San Francisco, CA
| | - Ki Lai
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| |
Collapse
|
23
|
Barash Y, Klang E, Konen E, Sorin V. ChatGPT-4 Assistance in Optimizing Emergency Department Radiology Referrals and Imaging Selection. J Am Coll Radiol 2023; 20:998-1003. [PMID: 37423350 DOI: 10.1016/j.jacr.2023.06.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Revised: 06/19/2023] [Accepted: 06/29/2023] [Indexed: 07/11/2023]
Abstract
PURPOSE The quality of radiology referrals influences patient management and imaging interpretation by radiologists. The aim of this study was to evaluate ChatGPT-4 as a decision support tool for selecting imaging examinations and generating radiology referrals in the emergency department (ED). METHODS Five consecutive clinical notes from the ED were retrospectively extracted for each of the following pathologies: pulmonary embolism, obstructing kidney stones, acute appendicitis, diverticulitis, small bowel obstruction, acute cholecystitis, acute hip fracture, and testicular torsion. A total of 40 cases were included. These notes were entered into ChatGPT-4, requesting recommendations on the most appropriate imaging examinations and protocols. The chatbot was also asked to generate radiology referrals. Two independent radiologists graded the referral on a scale ranging from 1 to 5 for clarity, clinical relevance, and differential diagnosis. The chatbot's imaging recommendations were compared with the ACR Appropriateness Criteria (AC) and with the examinations performed in the ED. Agreement between readers was assessed using linear weighted Cohen's κ coefficient. RESULTS ChatGPT-4's imaging recommendations aligned with the ACR AC and ED examinations in all cases. Protocol discrepancies between ChatGPT and the ACR AC were observed in two cases (5%). ChatGPT-4-generated referrals received mean scores of 4.6 and 4.8 for clarity, 4.5 and 4.4 for clinical relevance, and 4.9 from both reviewers for differential diagnosis. Agreement between readers was moderate for clinical relevance and clarity and substantial for differential diagnosis grading. CONCLUSIONS ChatGPT-4 has shown potential in aiding imaging study selection for select clinical cases. As a complementary tool, large language models may improve radiology referral quality. Radiologists should stay informed about this technology and be mindful of potential challenges and risks.
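The linear weighted Cohen's κ used for inter-reader agreement weights disagreements in proportion to their distance on the 1-5 scale, so a 4-versus-5 split is penalized less than a 2-versus-5 split. A minimal sketch using scikit-learn is below; the reader grades are illustrative, not the study's data.

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative 1-5 grades from two readers for the same referrals (not the study's data).
reader_1 = [5, 4, 5, 3, 4, 5, 4, 2, 5, 4]
reader_2 = [5, 5, 4, 3, 4, 4, 4, 3, 5, 5]

# weights="linear" penalizes disagreements proportionally to their distance on the scale.
kappa = cohen_kappa_score(reader_1, reader_2, weights="linear")
print(f"linear weighted Cohen's kappa = {kappa:.2f}")
```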
Collapse
Affiliation(s)
- Yiftach Barash
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Tel Hashomer, Israel; Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Israel.
| | - Eyal Klang
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Tel Hashomer, Israel; Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; Head, Sami Sagol AI Hub, ARC, Chaim Sheba Medical Center, Tel Hashomer, Israel; DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Israel
| | - Eli Konen
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; Head, Department of Diagnostic Imaging, Chaim Sheba Medical Center, Tel Hashomer, Israel
| | - Vera Sorin
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Tel Hashomer, Israel; Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Israel
| |
Collapse
|
24
|
Hart SN, Hoffman NG, Gershkovich P, Christenson C, McClintock DS, Miller LJ, Jackups R, Azimi V, Spies N, Brodsky V. Organizational preparedness for the use of large language models in pathology informatics. J Pathol Inform 2023; 14:100338. [PMID: 37860713 PMCID: PMC10582733 DOI: 10.1016/j.jpi.2023.100338] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Revised: 09/25/2023] [Accepted: 09/28/2023] [Indexed: 10/21/2023] Open
Abstract
In this paper, we consider the current and potential role of the latest generation of Large Language Models (LLMs) in medical informatics, particularly within the realms of clinical and anatomic pathology. We aim to provide a thorough understanding of the considerations that arise when employing LLMs in healthcare settings, such as determining appropriate use cases and evaluating the advantages and limitations of these models. Furthermore, this paper will consider the infrastructural and organizational requirements necessary for the successful implementation and utilization of LLMs in healthcare environments. We will discuss the importance of addressing education, security, bias, and privacy concerns associated with LLMs in clinical informatics, as well as the need for a robust framework to overcome regulatory, compliance, and legal challenges.
Collapse
Affiliation(s)
- Steven N. Hart
- Division of Computational Pathology and AI, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
| | - Noah G. Hoffman
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, United States
| | - Peter Gershkovich
- Yale Medical School Department of Pathology, New Haven, CT, United States
| | - Chancey Christenson
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Jacksonville, FL, United States
| | - David S. McClintock
- Division of Computational Pathology and AI, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
| | - Lauren J. Miller
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States
| | - Ronald Jackups
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
| | - Vahid Azimi
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
| | - Nicholas Spies
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
| | - Victor Brodsky
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
| |
Collapse
|
25
|
Ge J, Li M, Delk MB, Lai JC. A comparison of large language model versus manual chart review for extraction of data elements from the electronic health record. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.08.31.23294924. [PMID: 37693398 PMCID: PMC10491368 DOI: 10.1101/2023.08.31.23294924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/12/2023]
Abstract
Importance Large language models (LLMs) have proven useful for extracting data from publicly available sources, but their uses in clinical settings and with clinical data remain unknown. Objective To determine the accuracy of data extraction using "Versa Chat," a chat implementation of the general-purpose OpenAI gpt-35-turbo LLM, versus manual chart review for hepatocellular carcinoma (HCC) imaging reports. Design We engineered a prompt for the data extraction task of six distinct data elements and input 182 abdominal imaging reports that were also manually tagged. We evaluated performance by calculating accuracy, precision, recall, and F1 scores. Setting/Participants Cross-sectional abdominal imaging reports of patients diagnosed with hepatocellular carcinoma enrolled in the Functional Assessment in Liver Transplantation (FrAILT) study.
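Scoring the LLM's extractions against the manual tags reduces, per data element, to standard classification metrics. Below is a hedged sketch for one binary element; the label vectors are illustrative placeholders, not the study's data.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative binary labels across reports for one data element (not the study's data):
# 1 = element present per manual chart review (truth) / per the LLM's extraction (prediction).
manual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
llm    = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

precision, recall, f1, _ = precision_recall_fscore_support(manual, llm, average="binary")
print(f"accuracy={accuracy_score(manual, llm):.2f} "
      f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```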
Collapse
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Michael Li
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Molly B. Delk
- Section of Gastroenterology and Hepatology, Department of Medicine, Tulane University School of Medicine, New Orleans, LA
| | - Jennifer C. Lai
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| |
Collapse
|
26
|
Lim ZW, Pushpanathan K, Yew SME, Lai Y, Sun CH, Lam JSH, Chen DZ, Goh JHL, Tan MCJ, Sheng B, Cheng CY, Koh VTC, Tham YC. Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine 2023; 95:104770. [PMID: 37625267 PMCID: PMC10470220 DOI: 10.1016/j.ebiom.2023.104770] [Citation(s) in RCA: 78] [Impact Index Per Article: 78.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 07/21/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023] Open
Abstract
BACKGROUND Large language models (LLMs) are garnering wide interest due to their human-like and contextually relevant responses. However, LLMs' accuracy across specific medical domains has not yet been thoroughly evaluated. Myopia is a frequent topic on which patients and parents commonly seek information online. Our study evaluated the performance of three LLMs, namely ChatGPT-3.5, ChatGPT-4.0, and Google Bard, in delivering accurate responses to common myopia-related queries. METHODS We curated thirty-one commonly asked myopia care-related questions, which were categorised into six domains: pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis. Each question was posed to the LLMs, and their responses were independently graded by three consultant-level paediatric ophthalmologists on a three-point accuracy scale (poor, borderline, good). A majority consensus approach was used to determine the final rating for each response. 'Good' rated responses were further evaluated for comprehensiveness on a five-point scale. Conversely, 'poor' rated responses were further prompted for self-correction and then re-evaluated for accuracy. FINDINGS ChatGPT-4.0 demonstrated superior accuracy, with 80.6% of responses rated as 'good', compared to 61.3% in ChatGPT-3.5 and 54.8% in Google Bard (Pearson's chi-squared test, all p ≤ 0.009). All three LLM-Chatbots showed high mean comprehensiveness scores (Google Bard: 4.35; ChatGPT-4.0: 4.23; ChatGPT-3.5: 4.11, out of a maximum score of 5). All LLM-Chatbots also demonstrated substantial self-correction capabilities: 66.7% (2 in 3) of ChatGPT-4.0's, 40% (2 in 5) of ChatGPT-3.5's, and 60% (3 in 5) of Google Bard's responses improved after self-correction. The LLM-Chatbots performed consistently across domains, except for 'treatment and prevention'. However, ChatGPT-4.0 still performed superiorly in this domain, receiving 70% 'good' ratings, compared to 40% in ChatGPT-3.5 and 45% in Google Bard (Pearson's chi-squared test, all p ≤ 0.001). INTERPRETATION Our findings underscore the potential of LLMs, particularly ChatGPT-4.0, for delivering accurate and comprehensive responses to myopia-related queries. Continuous strategies and evaluations to improve LLMs' accuracy remain crucial. FUNDING Dr Yih-Chung Tham was supported by the National Medical Research Council of Singapore (NMRC/MOH/HCSAINV21nov-0001).
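Two methodological steps here, majority-consensus grading across three raters and Pearson's chi-squared comparison of 'good' rates between models, are easy to sketch. The counts below are illustrative, not the study's data, and the exact pairwise testing scheme is an assumption.

```python
from collections import Counter
from scipy.stats import chi2_contingency

def consensus(grades: tuple[str, str, str]) -> str:
    """Final rating = the grade assigned by at least two of the three graders."""
    label, count = Counter(grades).most_common(1)[0]
    return label if count >= 2 else "no consensus"

print(consensus(("good", "good", "borderline")))  # -> good

# Illustrative 'good' vs. not-'good' counts for two models (not the study's data).
table = [[25, 6],   # model A
         [19, 12]]  # model B
# correction=False gives the plain Pearson chi-squared statistic for a 2x2 table.
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```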
Collapse
Affiliation(s)
- Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
| | - Samantha Min Er Yew
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
| | - Yien Lai
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Chen-Hsin Sun
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Janice Sing Harn Lam
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - David Ziyou Chen
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | | | - Marcus Chun Jin Tan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Bin Sheng
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China; MoE Key Lab of Artificial Intelligence, Artificial Intelligence Institute, Shanghai Jiao Tong University, Shanghai, China
| | - Ching-Yu Cheng
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore
| | - Victor Teck Chang Koh
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Yih-Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore.
| |
Collapse
|
27
|
Li W, Zhang Y, Chen F. ChatGPT in Colorectal Surgery: A Promising Tool or a Passing Fad? Ann Biomed Eng 2023; 51:1892-1897. [PMID: 37162695 DOI: 10.1007/s10439-023-03232-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Accepted: 05/03/2023] [Indexed: 05/11/2023]
Abstract
Colorectal surgery is a specialized branch of surgery that involves the diagnosis and treatment of conditions affecting the colon, rectum, and anus. In recent years, the use of artificial intelligence (AI) has gained considerable interest in various medical specialties, including surgery. Chat Generative Pre-Trained Transformer (ChatGPT), an AI-based chatbot developed by OpenAI, has shown great potential in improving the quality of healthcare delivery by providing accurate and timely information to both patients and healthcare professionals. In this paper, we investigate the potential application of ChatGPT in colorectal surgery. We also discuss the potential advantages and challenges associated with the implementation of ChatGPT in the surgical setting. Furthermore, we address the socio-ethical implications of utilizing ChatGPT in healthcare. This includes concerns over patient privacy, liability, and the potential impact on the doctor-patient relationship. Our findings suggest that ChatGPT has the potential to revolutionize the field of colorectal surgery by providing personalized and precise medical information, reducing errors and complications, and improving patient outcomes.
Collapse
Affiliation(s)
- Wenbo Li
- Department of Nursing, Jinzhou Medical University, Jinzhou, China
| | - Yinxu Zhang
- Department of Colorectal Surgery, The First Affiliated Hospital, Jinzhou Medical University, Jinzhou, 121001, China
| | - Fengmin Chen
- Department of Colorectal Surgery, The First Affiliated Hospital, Jinzhou Medical University, Jinzhou, 121001, China.
| |
Collapse
|
28
|
Ge J, Fontil V, Ackerman S, Pletcher MJ, Lai JC. Clinical decision support and electronic interventions to improve care quality in chronic liver diseases and cirrhosis. Hepatology 2023:01515467-990000000-00546. [PMID: 37611253 PMCID: PMC10998693 DOI: 10.1097/hep.0000000000000583] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 07/17/2023] [Indexed: 08/25/2023]
Abstract
Significant quality gaps exist in the management of chronic liver diseases and cirrhosis. Clinical decision support systems, information-driven tools based in and launched from the electronic health record, are attractive and potentially scalable prospective interventions that could help standardize clinical care in hepatology. Yet, clinical decision support systems have had a mixed record in clinical medicine due to issues with interoperability and compatibility with clinical workflows. In this review, we discuss the conceptual origins of clinical decision support systems, existing applications in liver diseases, issues and challenges with implementation, and emerging strategies to improve their integration in hepatology care.
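As a concrete illustration of the kind of EHR-launched rule such systems encode, here is a hypothetical sketch of a surveillance reminder: flag patients with cirrhosis who lack HCC surveillance imaging within the guideline-recommended six-month interval. The data model and alert text are assumptions for illustration, not any described system.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Patient:
    has_cirrhosis: bool
    last_hcc_surveillance_imaging: date | None  # None = never imaged

def hcc_surveillance_alert(p: Patient, today: date) -> str | None:
    """Fire a reminder if a patient with cirrhosis has no surveillance imaging
    within roughly six months (hypothetical rule for illustration)."""
    if not p.has_cirrhosis:
        return None
    last = p.last_hcc_surveillance_imaging
    if last is None or today - last > timedelta(days=182):
        return "Reminder: order HCC surveillance imaging."
    return None

print(hcc_surveillance_alert(Patient(True, date(2023, 1, 5)), date(2023, 9, 1)))
```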
Collapse
Affiliation(s)
- Jin Ge
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California – San Francisco, San Francisco, California, USA
| | - Valy Fontil
- Department of Medicine, NYU Grossman School of Medicine and Family Health Centers at NYU-Langone Medical Center, Brooklyn, New York, USA
| | - Sara Ackerman
- Department of Social and Behavioral Sciences, University of California – San Francisco, San Francisco, California, USA
| | - Mark J. Pletcher
- Department of Epidemiology and Biostatistics, University of California – San Francisco, San Francisco, California, USA
| | - Jennifer C. Lai
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California – San Francisco, San Francisco, California, USA
| |
Collapse
|
29
|
Samaan JS, Yeo YH, Ng WH, Ting PS, Trivedi H, Vipani A, Yang JD, Liran O, Spiegel B, Kuo A, Ayoub WS. ChatGPT's ability to comprehend and answer cirrhosis related questions in Arabic. Arab J Gastroenterol 2023; 24:145-148. [PMID: 37673708 DOI: 10.1016/j.ajg.2023.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 07/24/2023] [Accepted: 08/18/2023] [Indexed: 09/08/2023]
Abstract
BACKGROUND AND STUDY AIMS Cirrhosis is a chronic progressive disease which requires complex care. Its incidence is rising in the Arab countries, making it the 7th leading cause of death in the Arab League in 2010. ChatGPT is a large language model with a growing body of literature demonstrating its ability to answer clinical questions. We examined ChatGPT's accuracy in responding to cirrhosis related questions in Arabic and compared its performance to English. MATERIALS AND METHODS ChatGPT's responses to 91 questions in Arabic and English were graded by a transplant hepatologist fluent in both languages. Accuracy of responses was assessed using the scale: 1. Comprehensive, 2. Correct but inadequate, 3. Mixed with correct and incorrect/outdated data, and 4. Completely incorrect. Accuracy of Arabic compared to English responses was assessed using the scale: 1. Arabic response is more accurate, 2. Similar accuracy, 3. Arabic response is less accurate. RESULTS The model provided 22 (24.2%) comprehensive, 44 (48.4%) correct but inadequate, 13 (14.3%) mixed with correct and incorrect/outdated data and 12 (13.2%) completely incorrect Arabic responses. When comparing the accuracy of Arabic and English responses, 9 (9.9%) of the Arabic responses were graded as more accurate, 52 (57.1%) similar in accuracy and 30 (33.0%) as less accurate compared to English. CONCLUSION ChatGPT has the potential to serve as an adjunct source of information for Arabic speaking patients with cirrhosis. The model provided correct responses in Arabic to 72.5% of questions, although its performance in Arabic was less accurate than in English. The model produced completely incorrect responses to 13.2% of questions, reinforcing its potential role as an adjunct to, and not a replacement for, care by licensed healthcare professionals. Future studies to refine this technology are needed to help Arabic speaking patients with cirrhosis across the globe understand their disease and improve their outcomes.
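The reported percentages follow directly from the per-grade tallies (22, 44, 13, and 12 of 91 questions, with "correct" = comprehensive + correct but inadequate = 66/91 = 72.5%). A small sketch reproducing that arithmetic is below; the per-question grade list is reconstructed from the abstract's counts, not the underlying dataset.

```python
from collections import Counter

SCALE = ["comprehensive", "correct but inadequate", "mixed", "completely incorrect"]

# Per-question Arabic grades reconstructed from the abstract's reported counts.
arabic_grades = (["comprehensive"] * 22 + ["correct but inadequate"] * 44
                 + ["mixed"] * 13 + ["completely incorrect"] * 12)

counts = Counter(arabic_grades)
n = len(arabic_grades)
for grade in SCALE:
    print(f"{grade}: {counts[grade]} ({counts[grade] / n:.1%})")

# 'Correct' responses = comprehensive + correct but inadequate.
correct = counts["comprehensive"] + counts["correct but inadequate"]
print(f"correct overall: {correct / n:.1%}")  # -> 72.5%
```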
Collapse
Affiliation(s)
- Jamil S Samaan
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Yee Hui Yeo
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Wee Han Ng
- Bristol Medical School, University of Bristol, Bristol, UK
| | | | - Hirsh Trivedi
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Comprehensive Transplant Center, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Aarshi Vipani
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Ju Dong Yang
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Comprehensive Transplant Center, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Omer Liran
- Department of Psychiatry and Behavioral Sciences, Cedars-Sinai, Los Angeles, CA, USA; Division of Health Services Research, Department of Medicine, Cedars-Sinai, Los Angeles, CA, USA
| | - Brennan Spiegel
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Division of Health Services Research, Department of Medicine, Cedars-Sinai, Los Angeles, CA, USA
| | - Alexander Kuo
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Comprehensive Transplant Center, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Walid S Ayoub
- Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Comprehensive Transplant Center, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
| |
Collapse
|
30
|
Lahat A, Shachar E, Avidan B, Glicksberg B, Klang E. Evaluating the Utility of a Large Language Model in Answering Common Patients' Gastrointestinal Health-Related Questions: Are We There Yet? Diagnostics (Basel) 2023; 13:diagnostics13111950. [PMID: 37296802 DOI: 10.3390/diagnostics13111950] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Revised: 05/28/2023] [Accepted: 06/01/2023] [Indexed: 06/12/2023] Open
Abstract
BACKGROUND AND AIMS Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI's ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim is to evaluate the performance of ChatGPT in answering patients' questions regarding gastrointestinal health. METHODS To evaluate the performance of ChatGPT in answering patients' questions, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists. The accuracy, clarity, and efficacy of the answers provided by ChatGPT were assessed. RESULTS ChatGPT was able to provide accurate and clear answers to patients' questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For symptoms questions, the average accuracy, clarity, and efficacy scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For diagnostic test questions, the average accuracy, clarity, and efficacy scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively. CONCLUSIONS While ChatGPT has potential as a source of information, further development is needed. The quality of information is contingent upon the quality of the online information provided. These findings may be useful for healthcare providers and patients alike in understanding the capabilities and limitations of ChatGPT.
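The mean ± SD figures per question category reduce to simple summary statistics over the 1-5 ratings, as in this brief sketch; the rating values shown are illustrative placeholders, not the study's data.

```python
from statistics import mean, stdev

# Illustrative 1-5 consensus ratings for one question category (not the study's data).
ratings = {
    "accuracy": [4, 5, 4, 3, 4, 4, 3, 5],
    "clarity":  [4, 4, 5, 3, 4, 4, 4, 5],
    "efficacy": [3, 4, 3, 3, 4, 3, 3, 4],
}
for dimension, scores in ratings.items():
    print(f"{dimension}: {mean(scores):.1f} ± {stdev(scores):.1f}")
```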
Collapse
Affiliation(s)
- Adi Lahat
- Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel
| | - Eyal Shachar
- Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel
| | - Benjamin Avidan
- Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel
| | - Benjamin Glicksberg
- Mount Sinai Clinical Intelligence Center, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Eyal Klang
- The Sami Sagol AI Hub, ARC Innovation Center, Chaim Sheba Medical Center, Affiliated to Tel-Aviv University, Tel Aviv 69978, Israel
| |
Collapse
|
31
|
Javan R, Kim T, Mostaghni N, Sarin S. ChatGPT's Potential Role in Interventional Radiology. Cardiovasc Intervent Radiol 2023:10.1007/s00270-023-03448-4. [PMID: 37127733 DOI: 10.1007/s00270-023-03448-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/02/2023] [Accepted: 04/12/2023] [Indexed: 05/03/2023]
Affiliation(s)
- Ramin Javan
- Department of Radiology, George Washington University Hospital, 900 23Rd St NW, Suite G2092, Washington, DC, 20037, USA.
| | - Theodore Kim
- George Washington University School of Medicine and Health Sciences, Washington, DC, 20037, USA
| | - Navid Mostaghni
- School of Medicine, California University of Science and Medicine, Colton, CA, 92324, USA
| | - Shawn Sarin
- Department of Interventional Radiology, George Washington University Hospital, Washington, DC, 20037, USA
| |
Collapse
|
32
|
Oanh NK, Na BK, Yoo WG. The Potential Breakthroughs with ChatGPT in Parasitology. IRANIAN JOURNAL OF PARASITOLOGY 2023; 18:275-278. [PMID: 37583636 PMCID: PMC10423911 DOI: 10.18502/ijpa.v18i2.13197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/14/2023] [Accepted: 05/26/2023] [Indexed: 08/17/2023]
Affiliation(s)
- Nguyen Kim Oanh
- Department of Parasitology and Tropical Medicine, Gyeongsang National University College of Medicine, Jinju 52727, Republic of Korea
- Department of Convergence Medical Science, Gyeongsang National University, Jinju 52727, Republic of Korea
| | - Byoung-Kuk Na
- Department of Parasitology and Tropical Medicine, Gyeongsang National University College of Medicine, Jinju 52727, Republic of Korea
- Department of Convergence Medical Science, Gyeongsang National University, Jinju 52727, Republic of Korea
| | - Won Gi Yoo
- Department of Parasitology and Tropical Medicine, Gyeongsang National University College of Medicine, Jinju 52727, Republic of Korea
- Department of Convergence Medical Science, Gyeongsang National University, Jinju 52727, Republic of Korea
| |
Collapse
|