1
Song Y, Xu T. Letter to the editor for the article "Performance of ChatGPT-3.5 and ChatGPT-4 on the European Board of Urology (EBU) exams: a comparative analysis". World J Urol 2024; 42:555. [PMID: 39361038 DOI: 10.1007/s00345-024-05256-y]
Affiliation(s)
- Yuxuan Song
- Department of Urology, Peking University People's Hospital, Beijing, 100044, China
- Tao Xu
- Department of Urology, Peking University People's Hospital, Beijing, 100044, China.
2
Wu Z, Gan W, Xue Z, Ni Z, Zheng X, Zhang Y. Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study. JMIR Med Educ 2024; 10:e52746. [PMID: 39363539 PMCID: PMC11466054 DOI: 10.2196/52746]
Abstract
Background The creation of large language models (LLMs) such as ChatGPT is an important step in the development of artificial intelligence, which shows great potential in medical education due to its powerful language understanding and generative capabilities. The purpose of this study was to quantitatively evaluate and comprehensively analyze ChatGPT's performance in handling questions from nursing licensure examinations in the United States and China, namely the National Council Licensure Examination for Registered Nurses (NCLEX-RN) and the National Nursing Licensure Examination (NNLE). Objective This study aims to examine how well LLMs answer NCLEX-RN and NNLE multiple-choice questions (MCQs) across different language inputs, to evaluate whether LLMs can serve as multilingual learning aids for nursing, and to assess whether they possess a repository of professional knowledge applicable to clinical nursing practice. Methods First, we compiled 150 NCLEX-RN Practical MCQs, 240 NNLE Theoretical MCQs, and 240 NNLE Practical MCQs. Then, the translation function of ChatGPT 3.5 was used to translate NCLEX-RN questions from English to Chinese and NNLE questions from Chinese to English. Finally, the original and translated versions of the MCQs were input into ChatGPT 4.0, ChatGPT 3.5, and Google Bard. The LLMs were compared by accuracy rate, as were the different language inputs. Results The accuracy rates of ChatGPT 4.0 for NCLEX-RN practical questions and Chinese-translated NCLEX-RN practical questions were 88.7% (133/150) and 79.3% (119/150), respectively. Despite the statistical significance of the difference (P=.03), the correct rate was generally satisfactory. Around 71.9% (169/235) of NNLE Theoretical MCQs and 69.1% (161/233) of NNLE Practical MCQs were correctly answered by ChatGPT 4.0. The accuracy of ChatGPT 4.0 in processing NNLE Theoretical MCQs and NNLE Practical MCQs translated into English was 71.5% (168/235; P=.92) and 67.8% (158/233; P=.77), respectively, and there was no statistically significant difference between the results of text input in different languages. ChatGPT 3.5 (NCLEX-RN P=.003, NNLE Theoretical P<.001, NNLE Practical P=.12) and Google Bard (NCLEX-RN P<.001, NNLE Theoretical P<.001, NNLE Practical P<.001) had lower accuracy rates for nursing-related MCQs than ChatGPT 4.0 with English input. For ChatGPT 3.5, accuracy with English input was higher than with Chinese input, and the difference was statistically significant (NCLEX-RN P=.02, NNLE Practical P=.02). Whether the NCLEX-RN and NNLE MCQs were submitted in Chinese or English, ChatGPT 4.0 had the highest number of unique correct responses and the lowest number of unique incorrect responses among the 3 LLMs. Conclusions This study, focusing on 618 nursing MCQs from the NCLEX-RN and NNLE, found that ChatGPT 4.0 outperformed ChatGPT 3.5 and Google Bard in accuracy. It excelled in processing both English and Chinese inputs, underscoring its potential as a valuable tool in nursing education and clinical decision-making.
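The language-by-language accuracy comparisons reported above (for example, 133/150 correct with English input versus 119/150 with the Chinese translation) reduce to tests between two proportions. The sketch below shows one way such a comparison could be reproduced in Python; the counts come from the abstract, but the choice of a chi-square test is an assumption, since the abstract does not state which statistic produced P=.03.

```python
from scipy.stats import chi2_contingency

# ChatGPT 4.0 on NCLEX-RN practical MCQs (counts taken from the abstract)
english_correct, english_total = 133, 150   # 88.7%
chinese_correct, chinese_total = 119, 150   # 79.3%

# 2x2 contingency table: rows = input language, columns = correct / incorrect
table = [
    [english_correct, english_total - english_correct],
    [chinese_correct, chinese_total - chinese_correct],
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"accuracy EN {english_correct / english_total:.1%}, ZH {chinese_correct / chinese_total:.1%}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```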
Affiliation(s)
- Zelin Wu
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, Guangzhou, China
- Wenyi Gan
- Department of Joint Surgery and Sports Medicine, Zhuhai People’s Hospital, Zhuhai City, China
- Zhaowen Xue
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, Guangzhou, China
- Zhengxin Ni
- School of Nursing, Yangzhou University, Yangzhou, China
- Xiaofei Zheng
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, Guangzhou, China
- Yiyi Zhang
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, Guangzhou, China
3
Demirci A. A Comparison of ChatGPT and Human Questionnaire Evaluations of the Urological Cancer Videos Most Watched on YouTube. Clin Genitourin Cancer 2024; 22:102145. [PMID: 39033711 DOI: 10.1016/j.clgc.2024.102145]
Abstract
AIM To examine the reliability of ChatGPT in evaluating the quality of the medical content of the most-watched videos related to urological cancers on YouTube. MATERIALS AND METHODS In March 2024, a playlist was created of the 20 most-watched YouTube videos for each type of urological cancer. The video texts were evaluated by ChatGPT and by a urology specialist using the DISCERN-5 and Global Quality Scale (GQS) questionnaires. The results obtained were compared using the Kruskal-Wallis test. RESULTS For the prostate, bladder, renal, and testicular cancer videos, the median (IQR) DISCERN-5 scores given by the human evaluator and ChatGPT were (Human: 4 [1], 3 [0], 3 [2], 3 [1], P = .11; ChatGPT: 3 [1.75], 3 [1], 3 [2], 3 [0], P = .4, respectively) and the GQS scores were (Human: 4 [1.75], 3 [0.75], 3.5 [2], 3.5 [1], P = .12; ChatGPT: 4 [1], 3 [0.75], 3 [1], 3.5 [1], P = .1, respectively), with no significant difference determined between the scores. The repeatability of the ChatGPT responses was similar across cancer types, at 25% for prostate cancer, 30% for bladder cancer, 30% for renal cancer, and 35% for testicular cancer (P = .92). No statistically significant difference was determined between the median (IQR) DISCERN-5 and GQS scores given by humans and ChatGPT for the content of videos about prostate, bladder, renal, and testicular cancer (P > .05). CONCLUSION Although ChatGPT is successful in evaluating the medical quality of video texts, the results should be interpreted with caution because the repeatability of its responses is low.
Affiliation(s)
- Aykut Demirci
- Department of Urology, Dr. Abdurrahman Yurtaslan Ankara Oncology Training and Research Hospital, University of Health Sciences, Ankara, Turkey.
4
Halawani A, Almehmadi SG, Alhubaishy BA, Alnefaie ZA, Hasan MN. Empowering patients: how accurate and readable are large language models in renal cancer education. Front Oncol 2024; 14:1457516. [PMID: 39391252 PMCID: PMC11464325 DOI: 10.3389/fonc.2024.1457516]
Abstract
Background The incorporation of Artificial Intelligence (AI) into the healthcare sector has fundamentally transformed patient care paradigms, particularly through the creation of patient education materials (PEMs) tailored to individual needs. This study aims to assess the accuracy and readability of AI-generated information on kidney cancer produced by ChatGPT-4.0, Gemini AI, and Perplexity AI, comparing these outputs to PEMs provided by the American Urological Association (AUA) and the European Association of Urology (EAU). The objective is to guide physicians in directing patients to accurate and understandable resources. Methods PEMs published by the AUA and EAU were collected and categorized. Kidney cancer-related queries, identified via Google Trends (GT), were input into ChatGPT-4.0, Gemini AI, and Perplexity AI. Four independent reviewers assessed the AI outputs for accuracy across five distinct categories, employing a 5-point Likert scale. A readability evaluation was conducted utilizing established formulas, including the Gunning Fog Index (GFI), Simple Measure of Gobbledygook (SMOG), and Flesch-Kincaid Grade Level (FKGL). The AI chatbots were then tasked with simplifying their outputs to achieve a sixth-grade reading level. Results The PEM published by the AUA was the most readable, with a mean readability score of 9.84 ± 1.2, in contrast to the EAU (11.88 ± 1.11), ChatGPT-4.0 (11.03 ± 1.76), Perplexity AI (12.66 ± 1.83), and Gemini AI (10.83 ± 2.31). The chatbots demonstrated the capability to simplify text to lower grade levels upon request, with ChatGPT-4.0 achieving a readability grade level ranging from 5.76 to 9.19, Perplexity AI from 7.33 to 8.45, and Gemini AI from 6.43 to 8.43. While the official PEMs were considered accurate, the LLM-generated outputs exhibited an overall high level of accuracy with minor detail omission and some information inaccuracies. Information related to kidney cancer treatment was found to be the least accurate among the evaluated categories. Conclusion Although the PEM published by the AUA was the most readable, both the authoritative PEMs and the large language model (LLM)-generated outputs exceeded the recommended readability threshold for the general population. AI chatbots can simplify their outputs when explicitly instructed. However, notwithstanding their accuracy, LLM-generated outputs are susceptible to detail omission and inaccuracies. The variability in AI performance necessitates cautious use as an adjunctive tool in patient education.
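The readability grades cited above are produced by standard formulas that combine sentence length, word length, and syllable counts. The following is a minimal sketch of the three named indices using their published formulas; the counts in the example are hypothetical, and the study may have relied on a dedicated calculator rather than hand-computed values.

```python
import math

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKGL: grade level from average sentence length and syllables per word."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """GFI: complex words are those with three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def smog(sentences: int, polysyllables: int) -> float:
    """SMOG: polysyllabic word count scaled to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

# Illustrative counts for a short patient-education passage (hypothetical numbers)
words, sentences, syllables, complex_words = 220, 14, 340, 28
print(f"FKGL {flesch_kincaid_grade(words, sentences, syllables):.1f}")
print(f"GFI  {gunning_fog(words, sentences, complex_words):.1f}")
print(f"SMOG {smog(sentences, complex_words):.1f}")
```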
Affiliation(s)
- Ziyad A. Alnefaie
- Department of Urology, King Abdulaziz University, Jeddah, Saudi Arabia
- Mudhar N. Hasan
- Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
- Department of Urology, Mediclinic City Hospital, Dubai, United Arab Emirates
5
Hashemi S, Karbalaei M, Keikha M. Comments on "Performance of ChatGPT in Answering Clinical Questions on the Practical Guideline of Blepharoptosis". Aesthetic Plast Surg 2024:10.1007/s00266-024-04320-7. [PMID: 39120728 DOI: 10.1007/s00266-024-04320-7]
Affiliation(s)
- Saleh Hashemi
- Department of Medical Genetics and molecular genomics, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mohsen Karbalaei
- Department of Microbiology and Virology, School of Medicine, Jiroft University of Medical Sciences, Jiroft, Iran
- Bio Environmental Health Hazards Research Center, Jiroft University of Medical Sciences, Jiroft, Iran
- Masoud Keikha
- Department of Microbiology and Virology, School of Medicine, Iranshahr University of Medical Sciences, Iranshahr, Iran.
6
Şahin B, Emre Genç Y, Doğan K, Emre Şener T, Şekerci ÇA, Tanıdır Y, Yücel S, Tarcan T, Çam HK. Evaluating the Performance of ChatGPT in Urology: A Comparative Study of Knowledge Interpretation and Patient Guidance. J Endourol 2024; 38:799-808. [PMID: 38815140 DOI: 10.1089/end.2023.0413]
Abstract
Background/Aim: To evaluate the performance of Chat Generative Pre-trained Transformer (ChatGPT), a large language model developed by OpenAI. Materials and Methods: This study comprised three main steps to evaluate the effectiveness of ChatGPT in the urologic field. The first step involved 35 questions from our institution's experts, who have at least 10 years of experience in their fields. The responses of the ChatGPT versions were qualitatively compared with the responses of urology residents to the same questions. The second step assessed the reliability of the ChatGPT versions in answering current debate topics. The third step assessed the reliability of the ChatGPT versions in providing medical recommendations and directives in response to questions commonly asked by patients in the outpatient and inpatient settings. Results: In the first step, version 4 provided correct answers to 25 of the 35 questions, while version 3.5 provided only 19 (71.4% vs 54%). Residents in their last year of training in our clinic also provided a mean of 25 correct answers, and 4th-year residents provided a mean of 19.3 correct responses. The second step, which evaluated the responses of both versions to debate situations in urology, found that both versions provided variable and inappropriate results. In the last step, both versions had a similar success rate in providing recommendations and guidance to patients based on expert ratings. Conclusion: The difference between the two versions on the 35 questions in the first step of the study was attributed to the improvement of ChatGPT's literature and data synthesis abilities. It may be reasonable to use ChatGPT to provide quick and safe answers to questions from people who are not healthcare providers, but it should not be used as a diagnostic tool or to choose among different treatment modalities.
Affiliation(s)
- Bahadır Şahin
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Yunus Emre Genç
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Kader Doğan
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Tarık Emre Şener
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Çağrı Akın Şekerci
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Yılören Tanıdır
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Selçuk Yücel
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Tufan Tarcan
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Haydar Kamil Çam
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
7
Pompili D, Richa Y, Collins P, Richards H, Hennessey DB. Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models. World J Urol 2024; 42:455. [PMID: 39073590 PMCID: PMC11286728 DOI: 10.1007/s00345-024-05146-3]
Abstract
PURPOSE Large language models (LLMs) are a form of artificial intelligence (AI) that uses deep learning techniques to understand, summarize and generate content. The potential benefits of LLMs in healthcare are predicted to be immense. The objective of this study was to examine the quality of patient information leaflets (PILs) produced by 3 LLMs on urological topics. METHODS Prompts were created to generate PILs from 3 LLMs: ChatGPT-4, PaLM 2 (Google Bard) and Llama 2 (Meta) across four urology topics (circumcision, nephrectomy, overactive bladder syndrome, and transurethral resection of the prostate [TURP]). PILs were evaluated using a quality assessment checklist. PIL readability was assessed by the Average Reading Level Consensus Calculator. RESULTS PILs generated by PaLM 2 had the highest overall average quality score (3.58), followed by Llama 2 (3.34) and ChatGPT-4 (3.08). PaLM 2-generated PILs were of the highest quality for all topics except TURP, and PaLM 2 was the only LLM to include images. Medical inaccuracies were present in all generated content, including instances of significant error. Readability analysis identified PaLM 2-generated PILs as the simplest (age 14-15 average reading level). Llama 2 PILs were the most difficult (age 16-17 average). CONCLUSION While LLMs can generate PILs that may help reduce healthcare professional workload, generated content requires clinician input for accuracy and for the inclusion of health literacy aids, such as images. LLM-generated PILs were above the average reading level for adults, necessitating improvement in LLM algorithms and/or prompt design. How satisfied patients are with LLM-generated PILs remains to be evaluated.
Affiliation(s)
- David Pompili
- School of Medicine, University College Cork, Cork, Ireland
- Yasmina Richa
- School of Medicine, University College Cork, Cork, Ireland
- Patrick Collins
- Department of Urology, Mercy University Hospital, Cork, Ireland
- Helen Richards
- School of Medicine, University College Cork, Cork, Ireland
- Department of Clinical Psychology, Mercy University Hospital, Cork, Ireland
- Derek B Hennessey
- School of Medicine, University College Cork, Cork, Ireland.
- Department of Urology, Mercy University Hospital, Cork, Ireland.
8
Schoch J, Schmelz HU, Strauch A, Borgmann H, Nestler T. Performance of ChatGPT-3.5 and ChatGPT-4 on the European Board of Urology (EBU) exams: a comparative analysis. World J Urol 2024; 42:445. [PMID: 39060792 DOI: 10.1007/s00345-024-05137-4]
Abstract
BACKGROUND AND OBJECTIVE In the transformative era of artificial intelligence, its integration into various spheres, especially healthcare, has been promising. The objective of this study was to analyze the performance of different versions of ChatGPT, a large language model (LLM), on recent European Board of Urology (EBU) in-service assessment questions. DESIGN AND SETTING We presented multiple-choice questions from the official EBU test books to ChatGPT-3.5 and ChatGPT-4 for the following exams: exam 1 (2017-2018), exam 2 (2019-2020) and exam 3 (2021-2022). Exams were passed with ≥60% correct answers. RESULTS ChatGPT-4 provided significantly more correct answers in all exams compared with the prior version 3.5 (exam 1: ChatGPT-3.5 64.3% vs. ChatGPT-4 81.6%; exam 2: 64.5% vs. 80.5%; exam 3: 56% vs. 77%, p < 0.001, respectively). Test exam 3 was the only exam ChatGPT-3.5 did not pass. Within the different subtopics, there were no significant differences in the proportion of correct answers provided by ChatGPT-3.5. For ChatGPT-4, the percentage of correct answers in test exam 3 was significantly decreased in the subtopics Incontinence (exam 1: 81.6% vs. exam 3: 53.6%; p = 0.026) and Transplantation (exam 1: 77.8% vs. exam 3: 0%; p = 0.020). CONCLUSION Our findings indicate that ChatGPT, especially ChatGPT-4, has the general ability to answer complex medical questions and might pass FEBU exams. Nevertheless, there remains an indispensable need for human validation of LLM answers, especially concerning healthcare issues.
Affiliation(s)
- Justine Schoch
- Department of Urology, Federal Armed Services Hospital Koblenz, Ruebenacherstrasse 170, 56072, Koblenz, Germany
- H-U Schmelz
- Department of Urology, Federal Armed Services Hospital Koblenz, Ruebenacherstrasse 170, 56072, Koblenz, Germany
- Angelina Strauch
- Department of Urology, Federal Armed Services Hospital Koblenz, Ruebenacherstrasse 170, 56072, Koblenz, Germany
- Hendrik Borgmann
- Department of Urology, Faculty of Health Sciences Brandenburg, Brandenburg a.d. Havel, Germany
- Tim Nestler
- Department of Urology, Federal Armed Services Hospital Koblenz, Ruebenacherstrasse 170, 56072, Koblenz, Germany.
- Department of Urology, University Hospital of Cologne, Cologne, Germany.
9
Wu H, Sun Z, Guo Q, Li C. The rapid growth of ChatGPT-related publications: A call for international guidelines. Asian J Surg 2024:S1015-9584(24)01246-6. [PMID: 38942627 DOI: 10.1016/j.asjsur.2024.06.029]
Affiliation(s)
- Haiyang Wu
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Department of Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, China.
- Zaijie Sun
- Department of Orthopaedics, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, China
- Qiang Guo
- Department of Spine and Joint Surgery, Tianjin Baodi Hospital, Baodi Clinical College of Tianjin Medical University, Tianjin, China
- Cheng Li
- Department of Spine Surgery, Wangjing Hospital, China Academy of Chinese Medical Sciences, Beijing, China; Center for Musculoskeletal Surgery (CMSC), Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt University of Berlin, and Berlin Institute of Health, Berlin, Germany.
10
Zhu L, Rong Y, McGee LA, Rwigema JCM, Patel SH. Testing and Validation of a Custom Retrained Large Language Model for the Supportive Care of HN Patients with External Knowledge Base. Cancers (Basel) 2024; 16:2311. [PMID: 39001375 PMCID: PMC11240646 DOI: 10.3390/cancers16132311]
Abstract
PURPOSE This study aimed to develop a retrained large language model (LLM) tailored to the needs of head and neck (HN) cancer patients treated with radiotherapy, with emphasis on symptom management and survivorship care. METHODS A comprehensive external database was curated for training ChatGPT-4, integrating expert-identified consensus guidelines on supportive care for HN patients and correspondence from physicians and nurses within our institution's electronic medical records for 90 HN patients. The performance of our model was evaluated using 20 post-treatment patient inquiries, with the responses assessed by three board-certified radiation oncologists (RadOncs). Responses were rated on a scale of 1 (strongly disagree) to 5 (strongly agree) for accuracy, clarity of response, completeness, and relevance. RESULTS The average scores for the 20 tested questions were 4.25 for accuracy, 4.35 for clarity, 4.22 for completeness, and 4.32 for relevance, on a 5-point scale. Overall, 91.67% (220 out of 240) of assessments received scores of 3 or higher, and 83.33% (200 out of 240) received scores of 4 or higher. CONCLUSION The custom-trained model demonstrates high accuracy in providing support to HN patients, offering evidence-based information and guidance on symptom management and survivorship care.
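The summary percentages above follow directly from the 240 individual ratings (20 questions × 4 criteria × 3 raters), each on a 1-5 Likert scale. A minimal sketch of that aggregation is shown below; the ratings are randomly generated placeholders, not the study data.

```python
import random

random.seed(0)
# Stand-in for the 240 ratings (20 questions x 4 criteria x 3 raters),
# each on a 1 (strongly disagree) to 5 (strongly agree) Likert scale.
ratings = [random.choice([3, 4, 4, 5, 5]) for _ in range(240)]

mean_score = sum(ratings) / len(ratings)
at_least_3 = sum(r >= 3 for r in ratings) / len(ratings)
at_least_4 = sum(r >= 4 for r in ratings) / len(ratings)

print(f"mean rating: {mean_score:.2f}")
print(f">=3: {at_least_3:.1%}, >=4: {at_least_4:.1%}")
```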
Affiliation(s)
- Yi Rong
- Correspondence: (Y.R.); (S.H.P.)
- Samir H. Patel
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ 85054, USA; (L.Z.); (L.A.M.); (J.-C.M.R.)
11
Cakir H, Caglar U, Sekkeli S, Zerdali E, Sarilar O, Yildiz O, Ozgor F. Evaluating ChatGPT ability to answer urinary tract infection-related questions. Infect Dis Now 2024; 54:104884. [PMID: 38460761 DOI: 10.1016/j.idnow.2024.104884]
Abstract
INTRODUCTION For the first time, the accuracy and proficiency of ChatGPT answers on urogenital tract infections (UTIs) were evaluated. METHODS Two lists of questions were created: frequently asked questions (FAQs, public-based inquiries) on relevant topics, and questions based on guideline information (guideline-based inquiries). ChatGPT responses to FAQs and scientific questions were scored by two urologists and an infectious disease specialist. The quality and reliability of all ChatGPT answers were checked using the Global Quality Score (GQS). The reproducibility of ChatGPT answers was analyzed by asking each question twice. RESULTS All in all, 96.2% of FAQs (75/78 inquiries) related to UTIs were correctly and adequately answered by ChatGPT and scored GQS 5. None of the ChatGPT answers were classified as GQS 2 or GQS 1. Moreover, FAQs about cystitis, urethritis, and epididymo-orchitis were answered by ChatGPT with 100% accuracy (GQS 5). For questions based on the EAU urological infections guidelines, 61 (89.7%), 5 (7.4%), and 2 (2.9%) ChatGPT responses were scored GQS 5, GQS 4, and GQS 3, respectively. None of the ChatGPT responses to the EAU urological infections guideline questions were categorized as GQS 2 or GQS 1. Comparison of the mean GQS values of ChatGPT answers for FAQs and EAU urological guideline questions showed that ChatGPT was similarly able to respond to both question groups (p = 0.168). The ChatGPT response reproducibility rate was highest for the FAQ subgroups of cystitis, urethritis, and epididymo-orchitis (100% for each subgroup). CONCLUSION The present study showed that ChatGPT gave accurate and satisfactory answers both to public-based inquiries and to EAU urological infection guideline-based questions. The reproducibility of ChatGPT answers exceeded 90% for both FAQs and scientific questions.
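Reproducibility in this design means that the two answers ChatGPT gives to the same question fall into the same Global Quality Score (GQS) category. A minimal sketch of that bookkeeping is shown below, assuming each question has already been asked twice and scored on the 1-5 GQS scale; the questions and scores are illustrative placeholders, not the study data.

```python
# Paired GQS scores (first ask, second ask) for a few illustrative FAQs
paired_scores = {
    "How is cystitis treated?": (5, 5),
    "Is urethritis contagious?": (5, 5),
    "Can epididymo-orchitis cause infertility?": (5, 4),
    "When are antibiotics indicated for asymptomatic bacteriuria?": (4, 4),
}

same_category = [first == second for first, second in paired_scores.values()]
reproducibility = sum(same_category) / len(same_category)

for question, (first, second) in paired_scores.items():
    flag = "reproducible" if first == second else "changed"
    print(f"{flag:12s} GQS {first}->{second}  {question}")
print(f"reproducibility rate: {reproducibility:.0%}")
```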
Affiliation(s)
- Hakan Cakir
- Department of Urology, Fulya Acibadem Hospital, Istanbul, Turkey
- Ufuk Caglar
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Sami Sekkeli
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey.
- Esra Zerdali
- Department of Infectious Diseases and Clinical Microbiology, Haseki Training and Research Hospital, Istanbul, Turkey
- Omer Sarilar
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Oguzhan Yildiz
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Faruk Ozgor
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
12
Perrot O, Schirmann A, Vidart A, Guillot-Tantay C, Izard V, Lebret T, Boillot B, Mesnard B, Lebacle C, Madec FX. Chatbots vs andrologists: Testing 25 clinical cases. The French Journal of Urology 2024; 34:102636. [PMID: 38599321 DOI: 10.1016/j.fjurol.2024.102636]
Abstract
OBJECTIVE AI-derived language models are booming, and their place in medicine is undefined. The aim of our study was to compare responses to andrology clinical cases between chatbots and andrologists, to assess the reliability of these technologies. MATERIALS AND METHODS We analyzed the responses of 32 experts, 18 residents and three chatbots (ChatGPT v3.5, v4 and Bard) to 25 andrology clinical cases. Responses were assessed on a Likert scale ranging from 0 to 2 for each question (0: false or no response; 1: partially correct response; 2: correct response), on the basis of the latest national or, in the absence of such, international recommendations. We compared the averages obtained for all cases by the different groups. RESULTS Experts obtained a higher mean score (m=11.0/12.4, σ=1.4) than ChatGPT v4 (m=10.7/12.4, σ=2.2, p=0.6475), ChatGPT v3.5 (m=9.5/12.4, σ=2.1, p=0.0062) and Bard (m=7.2/12.4, σ=3.3, p<0.0001). Residents obtained a mean score (m=9.4/12.4, σ=1.7) higher than Bard (m=7.2/12.4, σ=3.3, p=0.0053) but lower than ChatGPT v3.5 (m=9.5/12.4, σ=2.1, p=0.8393), ChatGPT v4 (m=10.7/12.4, σ=2.2, p=0.0183) and the experts (m=11.0/12.4, σ=1.4, p=0.0009). ChatGPT v4 performance (m=10.7, σ=2.2) was better than that of ChatGPT v3.5 (m=9.5, σ=2.1, p=0.0476) and Bard (m=7.2, σ=3.3, p<0.0001). CONCLUSION The use of chatbots in medicine could be relevant. More studies are needed to integrate them into clinical practice. LEVEL OF EVIDENCE: 4
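Each pairwise p-value above compares per-case scores between two rater groups. The sketch below shows one such comparison in Python; the use of a Mann-Whitney U test is an assumption (the abstract does not name the test), and the score vectors are illustrative placeholders rather than the study data.

```python
from scipy.stats import mannwhitneyu

# Illustrative per-case mean scores (out of a maximum of 12.4), not the study data
experts = [11.2, 10.8, 12.0, 11.5, 9.8, 11.7, 10.9, 11.3]
chatgpt_v4 = [10.9, 10.2, 11.8, 9.5, 8.7, 11.0, 10.4, 12.1]

stat, p_value = mannwhitneyu(experts, chatgpt_v4, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```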
Affiliation(s)
- Cedric Lebacle
- Kremlin-Bicetre Hospital, urology department, Kremlin-Bicetre, France
13
Shah YB, Goldberg ZN, Harness ED, Nash DB. Charting a Path to the Quintuple Aim: Harnessing AI to Address Social Determinants of Health. Int J Environ Res Public Health 2024; 21:718. [PMID: 38928964 PMCID: PMC11203467 DOI: 10.3390/ijerph21060718]
Abstract
The Quintuple Aim seeks to improve healthcare by addressing social determinants of health (SDOHs), which are responsible for 70-80% of medical outcomes. SDOH-related concerns have traditionally been addressed through referrals to social workers and community-based organizations (CBOs), but these pathways have had limited success in connecting patients with resources. Given that health inequity is expected to cost the United States nearly USD 300 billion by 2050, new artificial intelligence (AI) technology may aid providers in addressing SDOH. In this commentary, we present our experience with using ChatGPT to obtain SDOH management recommendations for archetypal patients in Philadelphia, PA. ChatGPT identified relevant SDOH resources and provided contact information for local organizations. Future exploration could improve AI prompts and integrate AI into electronic medical records to provide healthcare providers with real-time SDOH recommendations during appointments.
Affiliation(s)
- Yash B. Shah
- Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA (Z.N.G.)
- Zachary N. Goldberg
- Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA (Z.N.G.)
- Erika D. Harness
- Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA (Z.N.G.)
- David B. Nash
- Jefferson College of Population Health, Philadelphia, PA 19107, USA
14
Gwon YN, Kim JH, Chung HS, Jung EJ, Chun J, Lee S, Shim SR. The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation. JMIR Med Inform 2024; 12:e51187. [PMID: 38771247 PMCID: PMC11107769 DOI: 10.2196/51187]
Abstract
Background A large language model is a type of artificial intelligence (AI) model that opens up great possibilities for health care practice, research, and education, although scholars have emphasized the need to proactively address the issue of unvalidated and inaccurate information regarding its use. One of the best-known large language models is ChatGPT (OpenAI). It is believed to be of great help to medical research, as it facilitates more efficient data set analysis, code generation, and literature review, allowing researchers to focus on experimental design as well as drug discovery and development. Objective This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems, to enhance their efficiency and accuracy in health care settings. Methods The search results of a published systematic review by human experts on the treatment of Peyronie disease were selected as a benchmark, and the literature search formula of the study was applied to ChatGPT and Microsoft Bing AI as a comparison to human researchers. Peyronie disease typically presents with discomfort, curvature, or deformity of the penis in association with palpable plaques and erectile dysfunction. To evaluate the quality of individual studies derived from AI answers, we created a structured rating system based on bibliographic information related to the publications. We classified its answers into 4 grades if the title existed: A, B, C, and F. No grade was given for a fake title or no answer. Results From ChatGPT, 7 (0.5%) out of 1287 identified studies were directly relevant, whereas Bing AI resulted in 19 (40%) relevant studies out of 48, compared to the human benchmark of 24 studies. In the qualitative evaluation, ChatGPT had 7 grade A, 18 grade B, 167 grade C, and 211 grade F studies, and Bing AI had 19 grade A and 28 grade C studies. Conclusions This is the first study to compare AI and conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that the use of ChatGPT as a tool for real-time evidence generation is not yet accurate and feasible. Therefore, researchers should be cautious about using such AI. The limitations of this study using the generative pre-trained transformer model are that the search for research topics was not diverse and that it did not prevent the hallucination of generative AI. However, this study will serve as a standard for future studies by providing an index to verify the reliability and consistency of generative AI from a user's point of view. If the reliability and consistency of AI literature search services are verified, then the use of these technologies will help medical research greatly.
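The A-to-F grading described above hinges on whether each title returned by the chatbot corresponds to a real, findable publication. The sketch below shows one way such a first screening step could be automated against PubMed using the NCBI E-utilities esearch endpoint; the helper function, example titles, and found/not-found threshold are assumptions for illustration, and the study itself graded records manually against their bibliographic information.

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_hit_count(title: str) -> int:
    """Return how many PubMed records match the title as a [Title] query."""
    params = {"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"}
    reply = requests.get(ESEARCH, params=params, timeout=10).json()
    return int(reply["esearchresult"]["count"])

# Titles returned by a chatbot for a literature-search prompt (illustrative)
candidate_titles = [
    "Peyronie's disease: a review of etiology, diagnosis, and management",
    "A completely fabricated trial of collagenase in curved penises",  # likely a hallucination
]

for title in candidate_titles:
    found = pubmed_hit_count(title) > 0
    print(f"{'found' if found else 'not found':9s} {title}")
```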
Affiliation(s)
- Yong Nam Gwon
- Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Jae Heon Kim
- Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Hyun Soo Chung
- College of Medicine, Soonchunhyang University, Cheonan, Republic of Korea
- Eun Jee Jung
- College of Medicine, Soonchunhyang University, Cheonan, Republic of Korea
- Joey Chun
- Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Cranbrook Kingswood Upper School, Bloomfield Hills, MI, United States
- Serin Lee
- Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States
- Sung Ryul Shim
- Department of Biomedical Informatics, Konyang University College of Medicine, Daejeon, Republic of Korea
- Konyang Medical Data Research Group-KYMERA, Konyang University Hospital, Daejeon, Republic of Korea
15
Hershenhouse JS, Mokhtar D, Eppler MB, Rodler S, Storino Ramacciotti L, Ganjavi C, Hom B, Davis RJ, Tran J, Russo GI, Cocci A, Abreu A, Gill I, Desai M, Cacciamani GE. Accuracy, readability, and understandability of large language models for prostate cancer information to the public. Prostate Cancer Prostatic Dis 2024:10.1038/s41391-024-00826-y. [PMID: 38744934 DOI: 10.1038/s41391-024-00826-y]
Abstract
BACKGROUND Generative Pretrained Model (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer-related questions from both the physician and public perspective while optimizing outputs for patient consumption. METHODS Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity using a 5-point Likert scale. Furthermore, two independent reviewers evaluated the layperson summaries on a correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk). Participants rated the clarity and demonstrated their understanding through a multiple-choice question. RESULTS GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Readability of the layperson summaries was better than that of the original GPT outputs ([original ChatGPT vs. simplified ChatGPT, mean (SD), p-value] Flesch Reading Ease: 36.5 (9.1) vs. 70.2 (11.2), p < 0.0001; Gunning Fog: 15.8 (1.7) vs. 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level: 12.8 (1.2) vs. 7.4 (1.7), p < 0.0001; Coleman-Liau: 13.7 (2.1) vs. 8.6 (2.4), p = 0.0002; SMOG Index: 11.8 (1.2) vs. 6.7 (1.8), p < 0.0001; Automated Readability Index: 13.1 (1.4) vs. 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%). CONCLUSION GPT shows promise for accurate patient education on prostate cancer-related content, but the technology is not designed for delivering information to patients. Prompting the model to respond with accuracy, completeness, clarity and readability may enhance its utility when used in GPT-powered medical chatbots.
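The simplification step described above is a second pass through the model: each original answer is fed back with an instruction to rewrite it at a sixth-grade reading level, and both versions are then scored with standard readability formulas. A minimal sketch of that loop is shown below, assuming the OpenAI Python SDK and the textstat package; the model name, prompt wording, and metric choice are illustrative assumptions, not necessarily the study's exact protocol.

```python
import textstat
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def simplify(answer: str) -> str:
    """Ask the model to rewrite an answer at roughly a sixth-grade reading level."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative choice of model
        messages=[
            {"role": "user",
             "content": "Rewrite the following for a sixth-grade reading level, "
                        "keeping all medical facts unchanged:\n\n" + answer},
        ],
    )
    return response.choices[0].message.content

original = "Radical prostatectomy entails complete surgical excision of the prostate gland..."
summary = simplify(original)

for label, text in [("original", original), ("simplified", summary)]:
    print(label,
          "FRE", textstat.flesch_reading_ease(text),
          "FKGL", textstat.flesch_kincaid_grade(text))
```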
Affiliation(s)
- Jacob S Hershenhouse
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Daniel Mokhtar
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Michael B Eppler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Severin Rodler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Lorenzo Storino Ramacciotti
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Conner Ganjavi
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Brian Hom
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Ryan J Davis
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- John Tran
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Andrea Cocci
- Urology Section, University of Florence, Florence, Italy
- Andre Abreu
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Inderbir Gill
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Mihir Desai
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA.
16
Ozgor BY, Simavi MA. Accuracy and reproducibility of ChatGPT's free version answers about endometriosis. Int J Gynaecol Obstet 2024; 165:691-695. [PMID: 38108232 DOI: 10.1002/ijgo.15309]
Abstract
OBJECTIVE To evaluate, for the first time, the accuracy and reproducibility of the answers of ChatGPT's free version about endometriosis. METHODS Detailed internet searches were performed to identify frequently asked questions (FAQs) about endometriosis. Scientific questions were prepared in accordance with the European Society of Human Reproduction and Embryology (ESHRE) endometriosis guidelines. An experienced gynecologist gave a score of 1-4 for each ChatGPT answer. The repeatability of ChatGPT answers about endometriosis was analyzed by asking each question twice, and an answer was considered reproducible when both responses to the same question fell into the same score category. RESULTS A total of 91.4% (n = 71) of all FAQs were answered completely, accurately, and sufficiently. ChatGPT had the highest accuracy in the symptom and diagnosis category (94.1%, 16/17 questions) and the lowest accuracy in the treatment category (81.3%, 13/16 questions). Furthermore, of the 40 questions based on the ESHRE endometriosis guidelines, 27 (67.5%) were classified as grade 1, seven (17.5%) as grade 2, and six (15.0%) as grade 3. The reproducibility rate of FAQs in the prevention, symptoms and diagnosis, and complications categories was the highest (100% for all categories). The reproducibility rate was the lowest for questions based on the ESHRE endometriosis guidelines (70.0%). CONCLUSION ChatGPT accurately and satisfactorily responded to more than 90% of the questions about endometriosis, but to only 67.5% of the questions based on the ESHRE endometriosis guidelines.
Affiliation(s)
- Bahar Yuksel Ozgor
- Department of Obstetrics and Gynecology, Biruni University, Istanbul, Turkey
- Endometriosis Research and Support Organization (Endo Türkiye), Istanbul, Turkey
- Melek Azade Simavi
- Endometriosis Research and Support Organization (Endo Türkiye), Istanbul, Turkey
17
Cocci A, Pezzoli M, Lo Re M, Russo GI, Asmundo MG, Fode M, Cacciamani G, Cimino S, Minervini A, Durukan E. Quality of information and appropriateness of ChatGPT outputs for urology patients. Prostate Cancer Prostatic Dis 2024; 27:103-108. [PMID: 37516804 DOI: 10.1038/s41391-023-00705-y]
Abstract
BACKGROUND The proportion of health-related searches on the internet is continuously growing. ChatGPT, a natural language processing (NLP) tool created by OpenAI, has been gaining increasing user attention and can potentially be used as a source for obtaining information related to health concerns. This study aims to analyze the quality and appropriateness of ChatGPT's responses to urology case studies compared to those of a urologist. METHODS Data from 100 patient case studies, comprising patient demographics, medical history, and urologic complaints, were sequentially input into ChatGPT. A question was posed to determine the most likely diagnosis, suggested examinations, and treatment options. The responses generated by ChatGPT were then compared to those provided by a board-certified urologist who was blinded to ChatGPT's responses, and graded on a 5-point Likert scale based on accuracy, comprehensiveness, and clarity as criteria for appropriateness. The quality of information was graded based on section 2 of the DISCERN tool, and readability assessments were performed using the Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKGL) formulas. RESULTS 52% of all responses were deemed appropriate. ChatGPT provided more appropriate responses for non-oncology conditions (58.5%) compared to oncology (52.6%) and emergency urology cases (11.1%) (p = 0.03). The median score of the DISCERN tool was 15 (IQR = 5.3), corresponding to a quality score of poor. The ChatGPT responses demonstrated a college graduate reading level, as indicated by the median FRE score of 18 (IQR = 21) and the median FKGL score of 15.8 (IQR = 3). CONCLUSIONS ChatGPT serves as an interactive tool for providing medical information online, offering the possibility of enhancing health outcomes and patient satisfaction. Nevertheless, the insufficient appropriateness and poor quality of the responses on urology cases emphasize the importance of careful evaluation and use of NLP-generated outputs when addressing health-related concerns.
Affiliation(s)
- Andrea Cocci
- Urology Section, University of Florence, Florence, Italy.
- Marta Pezzoli
- Urology Section, University of Florence, Florence, Italy
- Mattia Lo Re
- Urology Section, University of Florence, Florence, Italy
- Mikkel Fode
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Department of Urology, Copenhagen University Hospital, Herlev and Gentofte Hospital, Copenhagen, Denmark
- Giovanni Cacciamani
- Institute of Urology, Keck School of Medicine, University of Southern California (USC), Los Angeles, CA, USA
- Emil Durukan
- Department of Urology, Copenhagen University Hospital, Herlev and Gentofte Hospital, Copenhagen, Denmark
18
Peng S, Wang D, Liang Y, Xiao W, Zhang Y, Liu L. AI-ChatGPT/GPT-4: An Booster for the Development of Physical Medicine and Rehabilitation in the New Era! Ann Biomed Eng 2024; 52:462-466. [PMID: 37500980 PMCID: PMC10859338 DOI: 10.1007/s10439-023-03314-x]
Abstract
Artificial intelligence (AI) has been driving the continuous development of the Physical Medicine and Rehabilitation (PM&R) field. The latest release of ChatGPT/GPT-4 has shown us that AI can potentially transform the healthcare industry. In this study, we propose various ways in which ChatGPT/GPT-4 could be applied in the field of PM&R in the future. ChatGPT/GPT-4 is an essential tool for physiatrists in the new era.
Affiliation(s)
- Shengxin Peng
- School of Rehabilitation Medicine of Binzhou Medical University, Yantai, China
- Deqiang Wang
- School of Rehabilitation Medicine of Binzhou Medical University, Yantai, China
- Lei Liu
- Department of Painology, The First Affiliated Hospital of Shandong First Medical University (Shandong Provincial Qianfoshan Hospital), Jinan, 250014, China.
19
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260]
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24.0%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
20
Weng SX, Zheng HH, Yu QX, Wang JC. A commentary on 'Re: Is ChatGPT a qualified thoracic surgeon assistant?'. Int J Surg 2024; 110:1287-1288. [PMID: 38016293 PMCID: PMC10871630 DOI: 10.1097/js9.0000000000000880]
Affiliation(s)
- Shou-xiang Weng
- Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai
- Hai-hong Zheng
- Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai
- Qing-xin Yu
- Department of Pathology, Ningbo Clinical Pathology Diagnosis Center, Ningbo City, Zhejiang Province, People’s Republic of China
- Jiao-chen Wang
- Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai
21
Khene ZE, Bigot P, Mathieu R, Rouprêt M, Bensalah K. Development of a Personalized Chat Model Based on the European Association of Urology Oncology Guidelines: Harnessing the Power of Generative Artificial Intelligence in Clinical Practice. Eur Urol Oncol 2024; 7:160-162. [PMID: 37474402 DOI: 10.1016/j.euo.2023.06.009]
Affiliation(s)
- Pierre Bigot
- Department of Urology, University of Angers, Angers, France
- Romain Mathieu
- Department of Urology, Rennes University Hospital, Rennes, France
- Morgan Rouprêt
- Department of Urology, La Pitié-Salpétrière Hospital, Paris, France
- Karim Bensalah
- Department of Urology, Rennes University Hospital, Rennes, France.
22
May M, Körner-Riffard K, Marszalek M, Eredics K. Would Uro_Chat, a Newly Developed Generative Artificial Intelligence Large Language Model, Have Successfully Passed the In-Service Assessment Questions of the European Board of Urology in 2022? Eur Urol Oncol 2024; 7:155-156. [PMID: 37716835 DOI: 10.1016/j.euo.2023.08.013]
Affiliation(s)
- Matthias May
- Department of Urology, St. Elisabeth Hospital Straubing, Brothers of Mercy Hospital, Straubing, Germany.
- Katharina Körner-Riffard
- Department of Urology, Caritas St. Josef Medical Centre, University of Regensburg, Regensburg, Germany
- Martin Marszalek
- Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria
- Klaus Eredics
- Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria
23
Choi J, Kim JW, Lee YS, Tae JH, Choi SY, Chang IH, Kim JH. Availability of ChatGPT to provide medical information for patients with kidney cancer. Sci Rep 2024; 14:1542. [PMID: 38233511 PMCID: PMC10794224 DOI: 10.1038/s41598-024-51531-8]
Abstract
ChatGPT is an advanced natural language processing technology that closely resembles human language. We evaluated whether ChatGPT could help patients understand kidney cancer and replace consultations with urologists. Two urologists developed ten questions commonly asked by patients with kidney cancer. The answers to these questions were produced using ChatGPT. The five-dimension SERVQUAL model was used to assess the service quality of ChatGPT. The survey was distributed to 103 urologists via email, and twenty-four urological oncologists specializing in kidney cancer, each seeing more than 20 kidney cancer cases in clinic per month, were included as experts. All respondents were physicians. We received 24 responses to the email survey (response rate: 23.3%). The appropriateness rate for all ten answers exceeded 60%. The answer to Q2 received the highest agreement (91.7%, etiology of kidney cancer), whereas the answer to Q8 had the lowest (62.5%, comparison with other cancers). The experts gave low assessment ratings (44.4% vs. 93.3%, p = 0.028) in the SERVQUAL assurance (certainty of total answers) dimension. Positive scores for the overall understandability of ChatGPT answers were assigned by 54.2% of responders, and 70.8% said that ChatGPT could not replace explanations provided by urologists. Our findings affirm that although ChatGPT answers to kidney cancer questions are generally accessible, they should not supplant the counseling of a urologist.
Affiliation(s)
- Joongwon Choi
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Jin Wook Kim
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Yong Seong Lee
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Jong Hyun Tae
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Se Young Choi
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- In Ho Chang
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Jung Hoon Kim
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Seoul, South Korea.
- Chung-Ang University Gwangmyeong Hospital, 110 Deokan-Ro, Gwangmyeong-Si, Gyeonggi-Do, 14353, South Korea.
|
24
|
Ferreira RM. New evidence-based practice: Artificial intelligence as a barrier breaker. World J Methodol 2023; 13:384-389. [PMID: 38229944 PMCID: PMC10789101 DOI: 10.5662/wjm.v13.i5.384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Revised: 10/24/2023] [Accepted: 11/08/2023] [Indexed: 12/20/2023] Open
Abstract
The concept of evidence-based practice has persisted over many years and remains a cornerstone of clinical practice, representing the gold standard for optimal patient care. However, despite widespread recognition of its significance, its practical application faces various challenges and barriers, including a lack of skills in interpreting studies, limited resources, time constraints, limited linguistic competence, and more. Recently, we have witnessed the emergence of a groundbreaking technological revolution known as artificial intelligence. Although artificial intelligence has become increasingly integrated into our daily lives, some reluctance persists among certain segments of the public. This article explores the potential of artificial intelligence as a solution to some of the main barriers encountered in the application of evidence-based practice. It highlights how artificial intelligence can assist in staying updated with the latest evidence, enhancing clinical decision-making, addressing patient misinformation, and mitigating time constraints in clinical practice. The integration of artificial intelligence into evidence-based practice has the potential to revolutionize healthcare, leading to more precise diagnoses, personalized treatment plans, and improved doctor-patient interactions. This proposed synergy between evidence-based practice and artificial intelligence may necessitate adjustments to the core concept of evidence-based practice, heralding a new era in healthcare.
Affiliation(s)
- Ricardo Maia Ferreira
- Department of Sports and Exercise, Polytechnic Institute of Maia (N2i), Maia 4475-690, Porto, Portugal
- Department of Physiotherapy, Polytechnic Institute of Coimbra, Coimbra Health School, Coimbra 3046-854, Coimbra, Portugal
- Department of Physiotherapy, Polytechnic Institute of Castelo Branco, Dr. Lopes Dias Health School, Castelo Branco 6000-767, Castelo Branco, Portugal
- Sport Physical Activity and Health Research & Innovation Center, Polytechnic Institute of Viana do Castelo, Melgaço 4960-320, Viana do Castelo, Portugal
|
25
|
Huo B, Cacciamani GE, Collins GS, McKechnie T, Lee Y, Guyatt G. Reporting standards for the use of large language model-linked chatbots for health advice. Nat Med 2023; 29:2988. [PMID: 37957381 DOI: 10.1038/s41591-023-02656-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Affiliation(s)
- Bright Huo
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada.
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- AI Center at USC Urology, University of Southern California, Los Angeles, CA, USA
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK
- UK EQUATOR Centre, University of Oxford, Oxford, UK
- Tyler McKechnie
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Yung Lee
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Gordon Guyatt
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
|
26
|
Yu QX, Wu RC, Feng DC, Li DX. Re: ChatGPT encounters multiple opportunities and challenges in neurosurgery. Int J Surg 2023; 109:4393-4394. [PMID: 37720947 PMCID: PMC10720816 DOI: 10.1097/js9.0000000000000749] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 08/25/2023] [Indexed: 09/19/2023]
Affiliation(s)
- Qing-xin Yu
- Department of Pathology, Ningbo Clinical Pathology Diagnosis Center, Ningbo City, Zhejiang Province, People’s Republic of China
- Rui-cheng Wu
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People’s Republic of China
- De-chao Feng
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People’s Republic of China
- Deng-xiong Li
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People’s Republic of China
|
27
|
Rodler S, Kopliku R, Ulrich D, Kaltenhauser A, Casuscelli J, Eismann L, Waidelich R, Buchner A, Butz A, Cacciamani GE, Stief CG, Westhofen T. Patients' Trust in Artificial Intelligence-based Decision-making for Localized Prostate Cancer: Results from a Prospective Trial. Eur Urol Focus 2023:S2405-4569(23)00237-7. [PMID: 37923632 DOI: 10.1016/j.euf.2023.10.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Revised: 10/04/2023] [Accepted: 10/21/2023] [Indexed: 11/07/2023]
Abstract
BACKGROUND Artificial intelligence (AI) has the potential to enhance diagnostic accuracy and improve treatment outcomes. However, how AI will be integrated into clinical workflows and how patients perceive it remain unclear. OBJECTIVE To determine patients' trust in AI, their perception of urologists relying on AI, and their views on future diagnostic and therapeutic AI applications. DESIGN, SETTING, AND PARTICIPANTS A prospective trial was conducted involving patients who received diagnostic or therapeutic interventions for prostate cancer (PC). INTERVENTION Patients were asked to complete a survey before magnetic resonance imaging, prostate biopsy, or radical prostatectomy. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS The primary outcome was patient trust in AI. Secondary outcomes were the choice of AI in treatment settings and traits attributed to AI and urologists. RESULTS AND LIMITATIONS Data for 466 patients were analyzed. Cumulative affinity for technology was positively correlated with trust in AI (correlation coefficient 0.094; p = 0.04), whereas patient age, level of education, and subjective perception of illness were not (p > 0.05). The mean score (± standard deviation) for trust in capability was higher for physicians than for AI for responding in an individualized way when communicating a diagnosis (4.51 ± 0.76 vs 3.38 ± 1.07; mean difference [MD] 1.130, 95% confidence interval [CI] 1.010-1.250; t924 = 18.52, p < 0.001; Cohen's d = 1.040) and for explaining information in an understandable way (4.57 ± vs 3.18 ± 1.09; MD 1.392, 95% CI 1.275-1.509; t921 = 27.27, p < 0.001; Cohen's d = 1.216). Patients reported higher trust in a diagnosis made by AI controlled by a physician than in one made by AI not controlled by a physician (4.31 ± 0.88 vs 1.75 ± 0.93; MD 2.561, 95% CI 2.444-2.678; t925 = 42.89, p < 0.001; Cohen's d = 2.818). AI-assisted physicians (66.74%) were preferred over physicians alone (29.61%), physicians controlled by AI (2.36%), and AI alone (0.64%) for treatment in the current clinical scenario. CONCLUSIONS Trust in future diagnostic and therapeutic AI-based treatment relies on optimal integration with urologists as the human-machine interface to leverage both human and AI capabilities. PATIENT SUMMARY Artificial intelligence (AI) will play a role in diagnostic decisions for prostate cancer in the future. At present, patients prefer AI-assisted urologists over urologists alone, AI alone, and AI-controlled urologists. Specific traits of AI and of urologists could be used to optimize diagnosis and treatment for patients with prostate cancer.
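As a minimal sketch of how mean ratings can be compared together with an effect size, the example below runs an independent-samples t-test and computes Cohen's d on simulated 1-5 trust ratings; the data, sample sizes, and choice of test are assumptions for illustration and do not reproduce the study's actual analysis.

# Minimal sketch (not the study's analysis): comparing two sets of 1-5 trust
# ratings with an independent-samples t-test and a pooled-SD Cohen's d.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Hypothetical ratings; in the study these would be patient questionnaire scores.
ratings_physician = rng.normal(4.5, 0.8, size=466).clip(1, 5)
ratings_ai = rng.normal(3.4, 1.1, size=466).clip(1, 5)

t_stat, p_value = ttest_ind(ratings_physician, ratings_ai)

# Cohen's d with a pooled standard deviation
n1, n2 = len(ratings_physician), len(ratings_ai)
pooled_sd = np.sqrt(((n1 - 1) * ratings_physician.var(ddof=1)
                     + (n2 - 1) * ratings_ai.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (ratings_physician.mean() - ratings_ai.mean()) / pooled_sd
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, d = {cohens_d:.2f}")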
Affiliation(s)
- Severin Rodler
- Department of Urology, LMU University Hospital, Munich, Germany; USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Rega Kopliku
- Department of Urology, LMU University Hospital, Munich, Germany
- Daniel Ulrich
- Department of Informatics, Ludwig-Maximilian-Universität München, Munich, Germany
- Annika Kaltenhauser
- Department of Informatics, Ludwig-Maximilian-Universität München, Munich, Germany
- Lennert Eismann
- Department of Urology, LMU University Hospital, Munich, Germany
- Andreas Butz
- Department of Informatics, Ludwig-Maximilian-Universität München, Munich, Germany
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Thilo Westhofen
- Department of Urology, LMU University Hospital, Munich, Germany
|
28
|
Liu J, Zheng J, Cai X, Wu D, Yin C. A descriptive study based on the comparison of ChatGPT and evidence-based neurosurgeons. iScience 2023; 26:107590. [PMID: 37705958 PMCID: PMC10495632 DOI: 10.1016/j.isci.2023.107590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Revised: 06/21/2023] [Accepted: 08/04/2023] [Indexed: 09/15/2023] Open
Abstract
ChatGPT is an artificial intelligence product developed by OpenAI. This study investigated whether ChatGPT can respond in accordance with evidence-based medicine in neurosurgery. We generated 50 questions covering a range of neurosurgical diseases, and each question was posed three times to GPT-3.5 and GPT-4.0. We also recruited three neurosurgeons of high, middle, and low seniority to answer the same questions. The results were analyzed in terms of ChatGPT's overall performance score and mean scores by the items' specialty classification and by question type. In conclusion, GPT-3.5's ability to respond in accordance with evidence-based medicine was comparable to that of the neurosurgeon with low seniority, and GPT-4.0's ability was comparable to that of the neurosurgeon with high seniority. Although ChatGPT is not yet fully comparable to a neurosurgeon with high seniority, future upgrades could enhance its performance and abilities.
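A brief sketch of the kind of aggregation described above (overall scores plus mean scores by specialty classification and question type) is shown below using pandas; the column names and example values are assumptions for illustration, not study data.

# Illustrative aggregation of per-question ratings by model, specialty,
# and question type. Labels and scores are assumed, not study data.
import pandas as pd

scores = pd.DataFrame({
    "model": ["GPT-3.5", "GPT-3.5", "GPT-4.0", "GPT-4.0"],
    "specialty": ["cerebrovascular", "tumor", "cerebrovascular", "tumor"],
    "question_type": ["diagnosis", "treatment", "diagnosis", "treatment"],
    "score": [3.7, 4.1, 4.4, 4.6],
})

overall = scores.groupby("model")["score"].mean()
by_specialty = scores.groupby(["model", "specialty"])["score"].mean()
by_type = scores.groupby(["model", "question_type"])["score"].mean()
print(overall, by_specialty, by_type, sep="\n\n")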
Affiliation(s)
- Jiayu Liu
- Department of Neurosurgery, the First Medical Centre, Chinese PLA General Hospital, Beijing 100853, China
- Jiqi Zheng
- School of Health Humanities, Peking University, Beijing 100191, China
- Xintian Cai
- Department of Graduate School, Xinjiang Medical University, Urumqi 830001, China
- Dongdong Wu
- Department of Information, Daping Hospital, Army Medical University, Chongqing 400042, China
- Chengliang Yin
- Faculty of Medicine, Macau University of Science and Technology, Macau 999078, China
|
29
|
Zhang L, Tashiro S, Mukaino M, Yamada S. Use of artificial intelligence large language models as a clinical tool in rehabilitation medicine: a comparative test case. J Rehabil Med 2023; 55:jrm13373. [PMID: 37691497 PMCID: PMC10501385 DOI: 10.2340/jrm.v55.13373] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2023] [Accepted: 07/05/2023] [Indexed: 09/12/2023] Open
Abstract
OBJECTIVE To explore the potential use of artificial intelligence language models in formulating rehabilitation prescriptions and International Classification of Functioning, Disability and Health (ICF) codes. DESIGN Comparative study based on a single case report, compared with standard answers from a textbook. SUBJECTS A stroke case from a textbook. METHODS Chat Generative Pre-Trained Transformer-4 (ChatGPT-4) was used to generate comprehensive medical and rehabilitation prescription information and ICF codes for the stroke case. This information was compared with the standard answers from the textbook, and 2 licensed Physical Medicine and Rehabilitation (PMR) clinicians reviewed the artificial intelligence recommendations for further discussion. RESULTS ChatGPT-4 effectively formulated rehabilitation prescriptions and ICF codes for a typical stroke case, together with a rationale to support its recommendations, and generated this information in seconds. Compared with the standard answers, the large language model produced broader and more general prescriptions in terms of medical problems and management plans, rehabilitation problems and management plans, and rehabilitation goals. It also demonstrated the ability to propose specific approaches for each rehabilitation therapy. The language model made an error regarding the ICF category for the stroke case, but no mistakes were identified in the ICF codes assigned. CONCLUSION This test case suggests that artificial intelligence language models have potential use in facilitating clinical practice and education in the field of rehabilitation medicine.
Affiliation(s)
- Liang Zhang
- Department of Rehabilitation Medicine, Kyorin University School of Medicine, Japan
- Syoichi Tashiro
- Department of Rehabilitation Medicine, Kyorin University School of Medicine, Japan; Department of Rehabilitation Medicine, Keio University School of Medicine, Japan
- Masahiko Mukaino
- Department of Rehabilitation Medicine, Hokkaido University Hospital, Japan
- Shin Yamada
- Department of Rehabilitation Medicine, Kyorin University School of Medicine, Japan.
|
30
|
Peng L, Liang R, Yi F, Zhang S, Wu S. Re: Zhonghan Zhou, Xuesheng Wang, Xunhua Li, Limin Liao. Is ChatGPT an Evidence-based Doctor? Eur Urol. 2023;84:355-6. Eur Urol 2023; 84:e76. [PMID: 37271634 DOI: 10.1016/j.eururo.2023.04.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Accepted: 04/27/2023] [Indexed: 06/06/2023]
Affiliation(s)
- Lei Peng
- Department of Urology, Lanzhou University Second Hospital, Lanzhou, China; Motor Robotics Institute (MRI), South China Hospital, Health Science Center, Shenzhen University, Shenzhen, China
- Rui Liang
- Motor Robotics Institute (MRI), South China Hospital, Health Science Center, Shenzhen University, Shenzhen, China; Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Fulin Yi
- North Sichuan Medical College (University), Nanchong, China
- Shaohua Zhang
- Motor Robotics Institute (MRI), South China Hospital, Health Science Center, Shenzhen University, Shenzhen, China.
- Song Wu
- Department of Urology, Lanzhou University Second Hospital, Lanzhou, China; Motor Robotics Institute (MRI), South China Hospital, Health Science Center, Shenzhen University, Shenzhen, China.
|
31
|
Abstract
INTRODUCTION This study evaluated the knowledge of ChatGPT about osteoporosis. METHODS Osteoporosis-related frequently asked questions (FAQs) were compiled by examining websites frequently visited by patients, official hospital websites, and social media. Additional questions based on scientific data were prepared in accordance with the National Osteoporosis Guideline Group guidelines. A rater scored every ChatGPT answer from 1 to 4 (1 = the information was completely correct, 2 = the information was correct but insufficient, 3 = the answer contained both correct and incorrect information, and 4 = the answer consisted of completely incorrect information). The reproducibility of ChatGPT responses on osteoporosis was assessed by asking each question twice; an answer was considered reproducible if it received the same score both times. RESULTS ChatGPT responded to 72 FAQs with an accuracy rate of 80.6%. Accuracy was highest in the prevention category (91.7%) and the general knowledge category (85.8%). Only 19 of the 31 (61.3%) questions prepared according to the National Osteoporosis Guideline Group guidelines were answered correctly by ChatGPT, and two answers (6.4%) were categorized as grade 4. The reproducibility rate of the ChatGPT answers was 86.1% for the 72 FAQs and 83.9% for the National Osteoporosis Guideline Group questions. CONCLUSION The present study showed for the first time that ChatGPT provided adequate answers to more than 80% of FAQs about osteoporosis. However, its accuracy decreased to 61.3% for questions based on the National Osteoporosis Guideline Group guidelines.
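A minimal sketch of the accuracy and reproducibility calculations described above is given below; it assumes that grades 1 and 2 count as accurate and that reproducibility means both runs of a question receive the same grade. The grades themselves are invented for illustration.

# Hedged sketch of the grading scheme: answers are graded 1-4
# (1 = completely correct), each question is asked twice, and
# reproducibility means both runs receive the same grade.
# Assumption: grades 1 and 2 are counted as "accurate". Example data only.
run1 = [1, 2, 1, 3, 1, 4, 1, 2]
run2 = [1, 2, 2, 3, 1, 4, 1, 1]

accuracy = sum(grade in (1, 2) for grade in run1) / len(run1)
reproducibility = sum(a == b for a, b in zip(run1, run2)) / len(run1)
print(f"accuracy: {accuracy:.1%}, reproducibility: {reproducibility:.1%}")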
Affiliation(s)
- Cigdem Cinar
- Department of Interventional Physiatry, Biruni University, Istanbul, TUR
|
32
|
Kim SH, Tae JH, Chang IH, Kim TH, Myung SC, Nguyen TT, Choi J, Kim JH, Kim JW, Lee YS, Choi SY. Changes in patient perceptions regarding ChatGPT-written explanations on lifestyle modifications for preventing urolithiasis recurrence. Digit Health 2023; 9:20552076231203940. [PMID: 37780059 PMCID: PMC10540569 DOI: 10.1177/20552076231203940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Accepted: 09/09/2023] [Indexed: 10/03/2023] Open
Abstract
Purpose Artificial intelligence (AI) that imitates human language, such as ChatGPT, has had an impact across many multidisciplinary fields. Despite these innovations, however, it is unclear how well its implementation will assist patients in clinical situations. We evaluated changes in patient perceptions of AI before and after reading a ChatGPT-written explanation. Materials and methods In total, 24 South Korean patients receiving treatment for urolithiasis were surveyed with questionnaires. A ChatGPT-written explanatory note detailing lifestyle modifications for preventing urolithiasis recurrence was provided between the first and second questionnaires. The questionnaire was the Korean version of the General Attitudes toward Artificial Intelligence Scale, which includes positive- and negative-attitude items. Wilcoxon signed-rank tests were performed to compare questionnaire scores before and after the explanatory note. Linear regression with stepwise elimination was used to assess how well demographic variables predicted the outcomes. Results Total negative-attitude scores differed significantly between the pre- and post-surveys, whereas positive-attitude scores did not. Among the demographic variables, only education level significantly influenced the mean change in negative-attitude scores. Conclusions A shift toward more negative perceptions of AI was observed among patients with urolithiasis after they read the explanatory note provided by the AI chatbot, and patients with lower education levels expressed more negative responses. An explanatory note provided by an AI chatbot can therefore provoke an adverse change in perceptions of AI. Such negative responses must be considered when introducing and adapting new technology in health care; only by addressing patient perspectives will upgraded AI technology be integrated into medical care.
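For illustration, the sketch below shows how paired pre- and post-intervention attitude scores can be compared with a Wilcoxon signed-rank test; the scores are hypothetical and do not come from the study.

# Minimal sketch, assuming paired pre/post negative-attitude scores for the
# same patients. Values are hypothetical, not study data.
from scipy.stats import wilcoxon

pre_negative = [18, 20, 15, 22, 19, 17, 21, 16, 23, 18]
post_negative = [16, 18, 14, 19, 17, 16, 20, 14, 21, 17]

statistic, p_value = wilcoxon(pre_negative, post_negative)
print(f"Wilcoxon signed-rank: W = {statistic}, p = {p_value:.3f}")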
Affiliation(s)
- Seong Hwan Kim
- Department of Orthopedic Surgery, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- Jong Hyun Tae
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- In Ho Chang
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- Tae-Hyoung Kim
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- Soon Chul Myung
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- Tuan Thanh Nguyen
- Department of Urology, Cho Ray Hospital, University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam
- Joongwon Choi
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Gyeonggi-do, Republic of Korea
- Jung Hoon Kim
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Gyeonggi-do, Republic of Korea
- Jin Wook Kim
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Gyeonggi-do, Republic of Korea
- Yong Seong Lee
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Gyeonggi-do, Republic of Korea
- Se Young Choi
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
|