1
Baumgärtner K, Byczkowski M, Schmid T, Muschko M, Woessner P, Gerlach A, Bonekamp D, Schlemmer HP, Hohenfellner M, Görtz M. Effectiveness of the Medical Chatbot PROSCA to Inform Patients About Prostate Cancer: Results of a Randomized Controlled Trial. Eur Urol Open Sci 2024; 69:80-88. [PMID: 39329071] [PMCID: PMC11424957] [DOI: 10.1016/j.euros.2024.08.022]
Abstract
Background and objective Artificial intelligence (AI)-powered conversational agents are increasingly finding application in health care, as they can provide patient education at any time. However, their effectiveness in medical settings remains largely unexplored. This study aimed to assess the impact of the chatbot "PROState cancer Conversational Agent" (PROSCA), which was trained to provide validated support, from diagnostic tests to treatment options, for men facing a prostate cancer (PC) diagnosis. Methods The chatbot PROSCA, developed by urologists at Heidelberg University Hospital and SAP SE, was evaluated in a randomized controlled trial (RCT). Patients were assigned (1:1) to either the chatbot group, which received access to PROSCA in addition to standard information from urologists, or the control group, which received standard information alone. A total of 112 men were included, of whom 103 gave feedback at study completion. Key findings and limitations Over time, patients' information needs decreased significantly more in the chatbot group than in the control group (p = 0.035). In the chatbot group, 43/54 men (79.6%) used PROSCA, and all of them found it easy to use. Of the men, 71.4% agreed that the chatbot improved their informedness about PC, and 90.7% would like to use PROSCA again. Limitations include the sample size, the single-center design, and the specific clinical application. Conclusions and clinical implications With the introduction of the PROSCA chatbot, we created and evaluated an innovative, evidence-based AI health information tool as an additional source of information on PC. Our RCT results showed significant benefits of the chatbot in reducing patients' information needs and enhancing their understanding of PC. This easy-to-use AI tool provides accurate, timely, and accessible support, demonstrating its value in the PC diagnosis process. Future steps include further customization of the chatbot's responses and integration with existing health care systems to maximize its impact on patient outcomes. Patient summary This study evaluated PROSCA, an artificial intelligence-powered chatbot designed to support men facing a prostate cancer diagnosis by providing validated information from diagnosis to treatment. Patients who used the chatbot as an additional tool felt better informed than those who received only standard information from urologists. Most users appreciated the chatbot's ease of use and expressed a desire to use it again, suggesting that PROSCA could be a valuable resource for improving patient understanding in prostate cancer diagnosis.
Affiliation(s)
- Kilian Baumgärtner
- Medical Faculty, Ruprecht-Karls University of Heidelberg, Heidelberg, Germany
- David Bonekamp
- Department of Radiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Magdalena Görtz
- Department of Urology, Heidelberg University Hospital, Heidelberg, Germany
- Junior Clinical Cooperation Unit ‘Multiparametric Methods for Early Detection of Prostate Cancer’, German Cancer Research Center (DKFZ), Heidelberg, Germany
2
Song Y, Xu T. Letter to the editor for the article "Performance of ChatGPT-3.5 and ChatGPT-4 on the European Board of Urology (EBU) exams: a comparative analysis". World J Urol 2024; 42:555. [PMID: 39361038] [DOI: 10.1007/s00345-024-05256-y]
Affiliation(s)
- Yuxuan Song
- Department of Urology, Peking University People's Hospital, Beijing, 100044, China
- Tao Xu
- Department of Urology, Peking University People's Hospital, Beijing, 100044, China
3
Pozzi E, Velasquez DA, Varnum AA, Kava BR, Ramasamy R. Artificial Intelligence Modeling and Priapism. Curr Urol Rep 2024; 25:261-265. [PMID: 38886246] [DOI: 10.1007/s11934-024-01221-9]
Abstract
PURPOSE OF REVIEW This narrative review outlines the currently available evidence, challenges, and future perspectives of artificial intelligence (AI) in the diagnosis and management of priapism, a condition marked by prolonged and often painful erections that presents unique diagnostic and therapeutic challenges. RECENT FINDINGS Recent advancements in AI offer promising solutions to the challenges of diagnosing and treating priapism. AI models have demonstrated the potential to predict the need for surgical intervention and to improve diagnostic accuracy. Integrating AI models into medical decision-making for priapism could also help predict long-term consequences. AI is currently being implemented in urology to enhance diagnostics and treatment work-up for various conditions, including priapism. Traditional diagnostic approaches rely heavily on history-based assessments, leading to potential delays in treatment with possible long-term sequelae. To date, the role of AI in the management of priapism remains understudied, and dependable, effective models that can reliably assist physicians in both diagnostic and treatment decisions have yet to be achieved.
Affiliation(s)
- Edoardo Pozzi
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- University Vita-Salute San Raffaele, Milan, Italy
- Division of Experimental Oncology, Unit of Urology, URI, IRCCS Ospedale San Raffaele, Milan, Italy
- David A Velasquez
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- Alexandra Aponte Varnum
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- Bruce R Kava
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- Ranjith Ramasamy
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
4
Gurbuz T, Gokmen O, Devranoglu B, Yurci A, Madenli AA. Artificial intelligence in reproductive endocrinology: an in-depth longitudinal analysis of ChatGPTv4's month-by-month interpretation and adherence to clinical guidelines for diminished ovarian reserve. Endocrine 2024. [PMID: 39341951] [DOI: 10.1007/s12020-024-04031-8]
Abstract
OBJECTIVE To quantitatively assess the performance of ChatGPTv4, an artificial intelligence language model, in adhering to clinical guidelines for diminished ovarian reserve (DOR) over two months, evaluating the model's consistency in providing guideline-based responses. DESIGN A longitudinal study design was employed to evaluate ChatGPTv4's response accuracy and completeness using a structured questionnaire at baseline and at a two-month follow-up. SETTING ChatGPTv4 was tasked with interpreting DOR questionnaires based on standardized clinical guidelines. PARTICIPANTS The study did not involve human participants; the questionnaire was administered exclusively to the ChatGPT model to generate responses about DOR. METHODS A guideline-based questionnaire with 176 open-ended, 166 multiple-choice, and 153 true/false questions was deployed to rigorously assess ChatGPTv4's ability to provide accurate medical advice aligned with current DOR clinical guidelines. AI-generated responses were rated on a 6-point Likert scale for accuracy and a 3-point scale for completeness. The two-phase design assessed the stability and consistency of AI-generated answers over two months. RESULTS ChatGPTv4 achieved near-perfect scores across all question types, with true/false questions consistently answered with 100% accuracy. In multiple-choice queries, accuracy improved from 98.2% to 100% at the two-month follow-up. Open-ended question responses showed significant improvements, with accuracy scores increasing from an average of 5.38 ± 0.71 to 5.74 ± 0.51 (max: 6.0) and completeness scores from 2.57 ± 0.52 to 2.85 ± 0.36 (max: 3.0). These improvements were statistically significant (p < 0.001), with positive correlations between initial and follow-up accuracy (r = 0.597) and completeness (r = 0.381) scores. LIMITATIONS The study was limited by its reliance on a controlled, simulated setting that may not perfectly mirror real-world clinical interactions. CONCLUSION ChatGPTv4 demonstrated exceptional and improving accuracy and completeness in handling DOR-related guideline queries over the study period. These findings highlight ChatGPTv4's potential as a reliable, adaptable AI tool in reproductive endocrinology, capable of augmenting clinical decision-making and guideline development.
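The abstract reports a significant paired improvement and baseline-to-follow-up correlations without naming the statistical tests used. As a hedged illustration only, the sketch below pairs a Wilcoxon signed-rank test (a common choice for paired Likert ratings) with a Pearson correlation on synthetic stand-in data; neither the test choices nor the values are taken from the paper.

```python
import numpy as np
from scipy.stats import pearsonr, wilcoxon

rng = np.random.default_rng(7)

# Synthetic stand-in data: 6-point Likert accuracy ratings for 176
# open-ended questions at baseline and at the two-month follow-up.
baseline = rng.choice([4, 5, 6], size=176, p=[0.1, 0.4, 0.5]).astype(float)
followup = np.clip(baseline + rng.choice([0, 1], size=176, p=[0.6, 0.4]), 1, 6)

stat, p = wilcoxon(baseline, followup)  # paired, non-parametric test
r, _ = pearsonr(baseline, followup)     # consistency across the two time points
print(f"Wilcoxon signed-rank p = {p:.3g}; baseline-followup Pearson r = {r:.2f}")
```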
Affiliation(s)
- Tugba Gurbuz
- Department of Gynecology and Obstetrics Clinic, Vocational School of Health Services, Operating Room Services (Turkish-English) Medical Imaging Techniques (Turkish-English), Medistate Hospital, Istanbul Nişantaşı University, Istanbul, Turkey
- Oya Gokmen
- Department of Gynecology, Obstetrics and In Vitro Fertilization Clinic, Medistate Hospital, Istanbul, Turkey
- Belgin Devranoglu
- Department of Obstetrics and Gynecology, Zeynep Kamil Maternity/Children, Education and Training Hospital, Istanbul, Turkey
- Arzu Yurci
- IVF Department, Department of Gynecology and Obstetrics, Memorial Bahçelievler Hospital, Istanbul Arel University, Istanbul, Turkey
- Asena Ayar Madenli
- Department of Obstetrics and Gynecology, Liv Hospital Vadistanbul, Istanbul, Turkey
- Department of Obstetrics and Gynecology, Faculty of Medicine, Istinye University, Istanbul, Turkey
5
Wang L, Mao Y, Wang L, Sun Y, Song J, Zhang Y. Suitability of GPT-4o as an Evaluator of Cardiopulmonary Resuscitation Skills Examinations. Resuscitation 2024:110404. [PMID: 39343124] [DOI: 10.1016/j.resuscitation.2024.110404]
Abstract
AIM To assess the accuracy and reliability of GPT-4o in scoring examinees' performance on cardiopulmonary resuscitation (CPR) skills tests. METHODS This study included six experts certified to supervise the national medical licensing examination (three junior and three senior), who reviewed the CPR skills test videos of 103 examinees. All videos reviewed by the experts were also subjected to automated assessment by GPT-4o. Both the experts and GPT-4o scored the videos across four sections: patient assessment, chest compressions, rescue breathing, and repeated operations. The experts subsequently rated GPT-4o's reliability on a 5-point Likert scale (1, completely unreliable; 5, completely reliable). GPT-4o's accuracy was evaluated using the intraclass correlation coefficient (for the first three sections) and Fleiss' kappa (for the last section) to assess the agreement between its scores and those of the experts. RESULTS The mean accuracy scores for the patient assessment, chest compressions, rescue breathing, and repeated operations sections were 0.65, 0.58, 0.60, and 0.31, respectively, when comparing GPT-4o's scores with those of the junior experts, and 0.75, 0.65, 0.72, and 0.41, respectively, when comparing GPT-4o's scores with those of the senior experts. For reliability, the median Likert scale scores were 4.00 (interquartile range [IQR] = 3.66-4.33; mean [standard deviation] = 3.95 [0.55]) and 4.33 (IQR = 4.00-4.67; mean = 4.29 [0.50]) for the junior and senior experts, respectively. CONCLUSIONS GPT-4o demonstrated a level of accuracy similar to that of senior experts in assessing CPR skills examination videos. These results demonstrate the potential for deploying this large language model in medical examination settings.
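Fleiss' kappa, which the study applies to the repeated-operations section, measures chance-corrected agreement among a fixed number of raters per subject. The following is a minimal self-contained sketch of the statistic with an invented toy table for illustration; it is not the authors' analysis code.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for a (subjects x categories) table of rating counts.

    counts[i, j] = number of raters assigning subject i to category j;
    every subject must be rated by the same number of raters.
    """
    n_subjects = counts.shape[0]
    n_raters = counts.sum(axis=1)[0]
    p_cat = counts.sum(axis=0) / (n_subjects * n_raters)   # category shares
    p_subj = ((counts**2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_obs = p_subj.mean()                                  # observed agreement
    p_exp = (p_cat**2).sum()                               # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Toy table: 5 videos rated fail/pass by 3 raters each
table = np.array([[0, 3], [1, 2], [3, 0], [2, 1], [0, 3]])
print(round(fleiss_kappa(table), 3))  # 0.444
```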
Affiliation(s)
- Lu Wang
- Shengjing Hospital of China Medical University, Shenyang, Liaoning, 110004, China
- School of Health Management, China Medical University, Shenyang, Liaoning, 110122, China
- Yuqiang Mao
- Department of Thoracic Surgery, Shengjing Hospital of China Medical University, Shenyang, Liaoning, 110004, China
- Lin Wang
- Department of Emergency Medicine, Shengjing Hospital of China Medical University, Shenyang, Liaoning, 110022, China
- Yujie Sun
- Center for Clinical Skills Practice and Teaching, China Medical University, Shenyang, Liaoning, 110122, China
- Jiangdian Song
- School of Health Management, China Medical University, Shenyang, Liaoning, 110122, China
- Yang Zhang
- Center for Clinical Skills Practice and Teaching, China Medical University, Shenyang, Liaoning, 110122, China
6
Ahn S. The transformative impact of large language models on medical writing and publishing: current applications, challenges and future directions. Korean J Physiol Pharmacol 2024; 28:393-401. [PMID: 39198220] [PMCID: PMC11362003] [DOI: 10.4196/kjpp.2024.28.5.393]
Abstract
Large language models (LLMs) are rapidly transforming medical writing and publishing. This review focuses on experimental evidence to provide a comprehensive overview of the current applications, challenges, and future implications of LLMs at various stages of the academic research and publishing process. Global surveys reveal a high prevalence of LLM usage in scientific writing, with both potential benefits and challenges associated with adoption. LLMs have been successfully applied to literature search, research design, writing assistance, quality assessment, citation generation, and data analysis. LLMs have also been used in peer review and publication processes, including manuscript screening, generating review comments, and identifying potential biases. To ensure the integrity and quality of scholarly work in the era of LLM-assisted research, responsible artificial intelligence (AI) use is crucial. Researchers should prioritize verifying the accuracy and reliability of AI-generated content, maintain transparency in the use of LLMs, and develop collaborative human-AI workflows. Reviewers should focus on higher-order reviewing skills and be aware of the potential use of LLMs in manuscripts. Editorial offices should develop clear policies and guidelines on AI use and foster open dialogue within the academic community. Future directions include addressing the limitations and biases of current LLMs, exploring innovative applications, and continuously updating policies and practices in response to technological advancements. Collaborative efforts among stakeholders are necessary to harness the transformative potential of LLMs while maintaining the integrity of medical writing and publishing.
Affiliation(s)
- Sangzin Ahn
- Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan 47392, Korea
- Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 47392, Korea
7
Ganjavi C, Eppler M, O'Brien D, Ramacciotti LS, Ghauri MS, Anderson I, Choi J, Dwyer D, Stephens C, Shi V, Ebert M, Derby M, Yazdi B, Cacciamani GE. ChatGPT and large language models (LLMs) awareness and use. A prospective cross-sectional survey of U.S. medical students. PLOS Digit Health 2024; 3:e0000596. [PMID: 39236008] [PMCID: PMC11376538] [DOI: 10.1371/journal.pdig.0000596]
Abstract
Generative AI (GAI) models like ChatGPT are becoming widely discussed and utilized tools in medical education. For example, ChatGPT can assist with studying for exams and has been shown capable of passing the USMLE board exams. However, concerns have been expressed regarding its fair and ethical use. In May 2023, we designed an electronic survey for students across North American medical colleges to gauge their views on and current use of ChatGPT and similar technologies. Overall, 415 students from at least 28 medical schools completed the questionnaire; 96% of respondents had heard of ChatGPT and 52% had used it for medical school coursework. The most common uses in the pre-clerkship and clerkship phases were asking for explanations of medical concepts and assistance with diagnosis/treatment plans, respectively. The most common use in academic research was proofreading and grammar edits. Respondents recognized the potential limitations of ChatGPT, including inaccurate responses, patient privacy concerns, and plagiarism. Students recognized the importance of regulations to ensure proper use of this novel technology. Understanding the views of students is essential to crafting workable instructional courses, guidelines, and regulations that ensure the safe, productive use of generative AI in medical school.
Affiliation(s)
- Conner Ganjavi
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, California, United States of America
- Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- Michael Eppler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, California, United States of America
- Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- Devon O'Brien
- Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- Lorenzo Storino Ramacciotti
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, California, United States of America
- Issac Anderson
- Wayne State University School of Medicine, Detroit, Michigan, United States of America
- Jae Choi
- UT Southwestern Medical School, Dallas, Texas, United States of America
- Darby Dwyer
- Texas A&M School of Medicine, Bryan, Texas, United States of America
- Claudia Stephens
- Frederick P. Whiddon College of Medicine, University of South Alabama, Mobile, Alabama, United States of America
- Victoria Shi
- University of Missouri-Kansas City School of Medicine, Kansas City, Missouri, United States of America
- Madeline Ebert
- Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Michaela Derby
- Sanford School of Medicine, University of South Dakota, Vermillion, South Dakota, United States of America
- Bayan Yazdi
- Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, United States of America
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, California, United States of America
- Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
8
Zhang P, Wang H, Li P, Fu X, Yuan H, Ji H, Niu H. Assessing state-of-the-art online large language models for patient education regarding prostatitis. Prostate 2024; 84:1173-1175. [PMID: 38751201] [DOI: 10.1002/pros.24746]
Affiliation(s)
- Pengfei Zhang
- Department of Urology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Key Laboratory, Department of Urology and Andrology, Medical Research Center, The Affiliated Hospital of Qingdao University, Qingdao, China
- Department of Urology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Hui Wang
- Department of Urology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Anhui Province Key Laboratory of Genitourinary Diseases, Anhui Medical University, Hefei, China
- Pengfei Li
- Department of General Practice, The Affiliated Hospital of Qingdao University, Qingdao, China
- Xianchun Fu
- Department of Urology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Key Laboratory, Department of Urology and Andrology, Medical Research Center, The Affiliated Hospital of Qingdao University, Qingdao, China
- Hang Yuan
- Department of Urology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Key Laboratory, Department of Urology and Andrology, Medical Research Center, The Affiliated Hospital of Qingdao University, Qingdao, China
- Hongwei Ji
- Tsinghua Medicine, Tsinghua University, Beijing, China
- Haitao Niu
- Department of Urology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Key Laboratory, Department of Urology and Andrology, Medical Research Center, The Affiliated Hospital of Qingdao University, Qingdao, China
9
Luo MJ, Pang J, Bi S, Lai Y, Zhao J, Shang Y, Cui T, Yang Y, Lin Z, Zhao L, Wu X, Lin D, Chen J, Lin H. Development and Evaluation of a Retrieval-Augmented Large Language Model Framework for Ophthalmology. JAMA Ophthalmol 2024; 142:798-805. [PMID: 39023885] [PMCID: PMC11258636] [DOI: 10.1001/jamaophthalmol.2024.2513]
Abstract
Importance Although augmenting large language models (LLMs) with knowledge bases may improve medical domain-specific performance, practical methods are needed for local implementation of LLMs that address privacy concerns and enhance accessibility for health care professionals. Objective To develop an accurate, cost-effective local implementation of an LLM to mitigate privacy concerns and support practical deployment in health care settings. Design, Setting, and Participants ChatZOC (Sun Yat-Sen University Zhongshan Ophthalmology Center), a retrieval-augmented LLM framework, was developed by enhancing a baseline LLM with a comprehensive ophthalmic dataset and evaluation framework (CODE), which includes over 30 000 pieces of ophthalmic knowledge. This LLM was benchmarked against 10 representative LLMs, including GPT-4 and GPT-3.5 Turbo (OpenAI), across 300 clinical questions in ophthalmology. The evaluation, involving a panel of medical experts and biomedical researchers, focused on accuracy, utility, and safety. A double-masked approach was used to minimize assessment bias across all models. The study used a comprehensive knowledge base derived from ophthalmic clinical practice, without directly involving clinical patients. Exposures LLM responses to clinical questions. Main Outcomes and Measures Accuracy, utility, and safety of LLMs in responding to clinical questions. Results The baseline model achieved a human ranking score of 0.48. The retrieval-augmented LLM had a score of 0.60, a difference of 0.12 (95% CI, 0.02-0.22; P = .02) from baseline and not different from GPT-4 with a score of 0.61 (difference = 0.01; 95% CI, -0.11 to 0.13; P = .89). For scientific consensus, the retrieval-augmented LLM scored 84.0% compared with 46.5% for the baseline model (difference = 37.5%; 95% CI, 29.0%-46.0%; P < .001) and was not different from GPT-4 at 79.2% (difference = 4.8%; 95% CI, -0.3% to 10.0%; P = .06). Conclusions and Relevance Results of this quality improvement study suggest that the integration of high-quality knowledge bases improved the LLM's performance in medical domains. This study highlights the transformative potential of augmented LLMs in clinical practice by providing reliable, safe, and practical clinical information. Further research is needed to explore the broader application of such frameworks in the real world.
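The retrieval-augmented pattern described here (ground the model's answer in passages retrieved from a curated knowledge base) can be illustrated generically. The sketch below is not the ChatZOC implementation; the embed and llm_generate functions are toy stand-ins for a real sentence encoder and chat model.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy hashed bag-of-words embedding (stand-in for a real encoder)."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def llm_generate(prompt: str) -> str:
    """Placeholder for the underlying chat-model call."""
    return f"[model answer grounded in]\n{prompt}"

def answer_with_retrieval(question: str, kb: list[str], k: int = 2) -> str:
    """Retrieve the k most similar passages and prepend them to the prompt."""
    kb_vecs = np.stack([embed(p) for p in kb])
    scores = kb_vecs @ embed(question)   # cosine similarity for unit vectors
    top = [kb[i] for i in np.argsort(scores)[::-1][:k]]
    context = "\n".join(f"- {p}" for p in top)
    return llm_generate(
        f"Answer using only this context:\n{context}\nQuestion: {question}"
    )

kb = [
    "Primary open-angle glaucoma is managed by lowering intraocular pressure.",
    "Cataract is treated surgically with phacoemulsification and an IOL.",
    "Diabetic retinopathy screening uses dilated fundus examination.",
]
print(answer_with_retrieval("How is glaucoma treated?", kb))
```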
Affiliation(s)
- Ming-Jie Luo
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jianyu Pang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Shaowei Bi
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Yunxi Lai
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jiaman Zhao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Yuanrui Shang
- The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
- Tingxin Cui
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Yahan Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Zhenzhe Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Lanqin Zhao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Xiaohang Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Duoru Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jingjing Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Haotian Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Center for Precision Medicine and Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
- Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, China
10
Kowalewski KF, Rodler S. [Large language models in science]. Urologie 2024; 63:860-866. [PMID: 39048694] [DOI: 10.1007/s00120-024-02396-2]
Abstract
OBJECTIVE Large language models (LLMs) are gaining popularity due to their ability to communicate in a human-like manner. Their potential for science, including urology, is increasingly recognized. However, concerns regarding transparency, accountability, and the accuracy of LLM results remain unresolved. RESEARCH QUESTION This review examines the ethical, technical, and practical challenges as well as the potential applications of LLMs in urology and science. MATERIALS AND METHODS A selective literature review was conducted to analyze current findings and developments in the field of LLMs. The review considered studies on technical aspects, ethical considerations, and practical applications in research and practice. RESULTS LLMs, such as GPT from OpenAI and Gemini from Google, show great potential for processing and analyzing text data. Applications in urology include creating patient information and supporting administrative tasks. For purely clinical and scientific questions, however, the methods do not yet appear mature. Concerns about ethical issues and the accuracy of results persist. CONCLUSION LLMs have the potential to support research and practice through efficient data processing and information provision. Despite their advantages, ethical concerns and technical challenges must be addressed to ensure responsible and trustworthy use. Increased implementation could reduce the workload of urologists and improve communication with patients.
Affiliation(s)
- Karl-Friedrich Kowalewski
- Klinik für Urologie und Urochirurgie, Universitätsmedizin Mannheim, Universität Heidelberg, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany
- Severin Rodler
- Klinik für Urologie, Universitätsklinikum Schleswig-Holstein, Campus Kiel, Arnold-Heller-Straße 3, 24105 Kiel, Germany
11
Chao-Yang, Bao YY, Yang YY, Mao CK. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. Eur Urol 2024; 86:e46-e47. [PMID: 38644140] [DOI: 10.1016/j.eururo.2024.02.029]
Affiliation(s)
- Chao-Yang
- Department of Urology, Anhui Provincial Children's Hospital, Hefei, China
- Yuan-Yuan Bao
- Department of Electrocardiography, Anhui Maternal and Child Health Hospital, Hefei, China
- Yuan-Yuan Yang
- Department of Electrocardiography, Anhui Maternal and Child Health Hospital, Hefei, China
- Chang-Kun Mao
- Department of Urology, Anhui Provincial Children's Hospital, Hefei, China
12
Laohawetwanit T, Pinto DG, Bychkov A. A survey analysis of the adoption of large language models among pathologists. Am J Clin Pathol 2024:aqae093. [PMID: 39076014] [DOI: 10.1093/ajcp/aqae093]
Abstract
OBJECTIVES We sought to investigate the adoption and perception of large language model (LLM) applications among pathologists. METHODS A cross-sectional survey was conducted, gathering data from pathologists on their usage of and views concerning LLM tools. The survey, distributed globally through various digital platforms, included quantitative and qualitative questions. Patterns in the respondents' adoption of and perspectives on these artificial intelligence tools were analyzed. RESULTS Of 215 respondents, 100 (46.5%) reported using LLMs, particularly ChatGPT (OpenAI), for professional purposes, predominantly for information retrieval, proofreading, academic writing, and drafting pathology reports, highlighting a significant time-saving benefit. Academic pathologists demonstrated a better understanding of LLMs than their peers. Although chatbots sometimes provided incorrect general-domain information, they were considered moderately proficient concerning pathology-specific knowledge. The technology was mainly used for drafting educational materials and programming tasks. The most sought-after feature in LLMs was image analysis capability. Participants expressed concerns about information accuracy, privacy, and the need for regulatory approval. CONCLUSIONS Large language model applications are gaining notable acceptance among pathologists, with nearly half of respondents indicating adoption less than a year after the tools' introduction to the market. They see the benefits but are also worried about these tools' reliability, ethical implications, and security.
Affiliation(s)
- Thiyaphat Laohawetwanit
- Division of Pathology, Chulabhorn International College of Medicine, Thammasat University, Pathum Thani, Thailand
- Division of Pathology, Thammasat University Hospital, Pathum Thani, Thailand
- Daniel Gomes Pinto
- Department of Pathology, Hospital Garcia de Orta, Almada, Portugal
- Nova Medical School, Lisbon, Portugal
- Andrey Bychkov
- Department of Pathology, Kameda Medical Center, Kamogawa, Japan
13
Lareyre F, D'Oria M, Caradu C, Jongkind V, Di Lorenzo G, Smeds MR, Nasr B, Raffort J. Open E-survey on the Use and Perception of Chatbots in Vascular Surgery. EJVES Vasc Forum 2024; 62:57-63. [PMID: 39346798] [PMCID: PMC11437816] [DOI: 10.1016/j.ejvsvf.2024.07.037]
Abstract
Objective Large language models and artificial intelligence (AI)-based chatbots have brought new insights to health care, but they also raise major concerns. Their applications in vascular surgery have scarcely been investigated to date. This international survey aimed to evaluate the perceptions of and feedback from vascular surgeons on the use of AI chatbots in vascular surgery. Methods This international open e-survey comprised 50 items covering participant characteristics, perceptions of the use of AI chatbots in vascular surgery, and user experience. The study was designed in accordance with the Checklist for Reporting Results of Internet E-Surveys and was critically reviewed and approved by international members of the European Vascular Research Collaborative (EVRC) prior to distribution. Participation was open to self-reported health professionals specialised (or specialising) in vascular surgery, including residents or fellows. Results Of the 342 individuals who visited the survey page, 318 (93%) agreed to participate; 262 (82.4%) finished the survey and were included in the analysis. Most were consultants or attending physicians (64.1%), most declared not having any training or education related to AI in healthcare (221; 84.4%), and 198 (75.6%) rated their knowledge of the abilities of AI chatbots as average to very poor. Interestingly, 95 participants (36.3%) found AI chatbots very or somewhat useful in clinical practice at this stage, and 229 (87.4%) agreed that they should be systematically validated prior to use. Eighty participants (30.5%) had specifically tested an AI chatbot on questions related to clinical practice, and 59 of them (73.8%) experienced issues or limitations. Conclusion This international survey provides an overview of vascular surgeons' perceptions of AI chatbots and highlights the need to improve the knowledge and training of health professionals to better evaluate, define, and implement their use in vascular surgery.
Affiliation(s)
- Fabien Lareyre
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, Antibes, France
- Université Côte d'Azur, CNRS, UMR7370, LP2M, Nice, France
- Fédération Hospitalo-Universitaire FHU Plan&Go, Nice, France
- Mario D'Oria
- Division of Vascular and Endovascular Surgery, Cardiovascular Department, University Hospital of Trieste, Trieste, Italy
- Caroline Caradu
- Bordeaux University Hospital, Department of Vascular Surgery, Bordeaux, France
- Vincent Jongkind
- Department of Surgery, Amsterdam UMC, location Vrije Universiteit, University of Amsterdam, Amsterdam, the Netherlands
- Gilles Di Lorenzo
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, Antibes, France
- Matthew R. Smeds
- Division of Vascular and Endovascular Surgery, Department of Surgery, Saint Louis University, Saint Louis, MO, USA
- Bahaa Nasr
- Department of Vascular and Endovascular Surgery, Brest University Hospital, Brest, France
- Juliette Raffort
- Université Côte d'Azur, CNRS, UMR7370, LP2M, Nice, France
- Fédération Hospitalo-Universitaire FHU Plan&Go, Nice, France
- Institute 3IA Côte d'Azur, Université Côte d'Azur, France
- Clinical Chemistry Laboratory, University Hospital of Nice, Nice, France
14
Lin HL, Liao LL, Wang YN, Chang LC. Attitude and utilization of ChatGPT among registered nurses: A cross-sectional study. Int Nurs Rev 2024. [PMID: 38979771] [DOI: 10.1111/inr.13012]
Abstract
AIM This study explores the factors influencing attitudes and behaviors toward the use of ChatGPT, based on the Technology Acceptance Model, among registered nurses in Taiwan. BACKGROUND The complexity of medical services and nursing shortages increase workloads. ChatGPT swiftly answers medical questions, provides clinical guidelines, and assists with patient information management, thereby improving nursing efficiency. INTRODUCTION To facilitate the development of effective ChatGPT training programs, it is essential to examine registered nurses' attitudes toward and utilization of ChatGPT across diverse workplace settings. METHODS An anonymous online survey was used to collect data from over 1000 registered nurses recruited through social media platforms between November 2023 and January 2024. Descriptive statistics and multiple linear regression analyses were conducted for data analysis. RESULTS Among respondents, some were unfamiliar with ChatGPT, while others had used it before; usage was higher among males, more highly educated individuals, experienced nurses, and supervisors. Gender and work settings influenced perceived risks, and those familiar with ChatGPT recognized its social impact. Perceived risk and usefulness significantly influenced its adoption. DISCUSSION Nurses' attitudes toward ChatGPT vary by gender, education, experience, and role. Positive perceptions emphasize its usefulness, while risk concerns affect adoption. The insignificant role of perceived ease of use highlights ChatGPT's user-friendly nature. CONCLUSION Over half of the surveyed nurses had used or were familiar with ChatGPT and showed positive attitudes toward its use. Establishing rigorous guidelines to enhance their interaction with ChatGPT is crucial for future training. IMPLICATIONS FOR NURSING AND HEALTH POLICY Nurse managers should understand registered nurses' attitudes toward ChatGPT and integrate it into in-service education with tailored support and training, including appropriate prompt formulation and advanced decision-making, to prevent misuse.
Affiliation(s)
- Hui-Ling Lin
- Department of Nursing, Linkou Branch, Chang Gung Memorial Hospital, Taoyuan, Taiwan, ROC
- School of Nursing, College of Medicine, Chang Gung University, Taoyuan, Taiwan, ROC
- School of Nursing, Chang Gung University of Science and Technology, Gui-Shan Town, Taoyuan, Taiwan, ROC
- Taipei Medical University, Taipei, Taiwan
- Li-Ling Liao
- Department of Public Health, College of Health Science, Kaohsiung Medical University, Kaohsiung City, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung City, Taiwan
- Ya-Ni Wang
- School of Nursing, College of Medicine, Chang Gung University, Taoyuan, Taiwan, ROC
- Li-Chun Chang
- Department of Nursing, Linkou Branch, Chang Gung Memorial Hospital, Taoyuan, Taiwan, ROC
- School of Nursing, College of Medicine, Chang Gung University, Taoyuan, Taiwan, ROC
- School of Nursing, Chang Gung University of Science and Technology, Gui-Shan Town, Taoyuan, Taiwan, ROC
15
Li J, Bao Y, Yang Y, Mao C. ChatGPT promotes healthcare: current applications and potential challenges - correspondence. Int J Surg 2024; 110:4459-4460. [PMID: 39042075] [PMCID: PMC11254293] [DOI: 10.1097/js9.0000000000001354]
Affiliation(s)
- Junting Li
- Department of Urology, Anhui Provincial Children’s Hospital
- Yuanyuan Bao
- Department of Electrocardiogram, Anhui Maternal and Child Health Hospital, Hefei, Anhui, People’s Republic of China
- Yuanyuan Yang
- Department of Electrocardiogram, Anhui Maternal and Child Health Hospital, Hefei, Anhui, People’s Republic of China
- Changkun Mao
- Department of Urology, Anhui Provincial Children’s Hospital
16
Cardona Ortegón JD, Serrano S, Romero Cortes D. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. Eur Urol 2024; 86:e22. [PMID: 38644147] [DOI: 10.1016/j.eururo.2024.02.033]
Affiliation(s)
- José David Cardona Ortegón
- Department of Diagnostic Imaging, Fundación Santa Fe de Bogotá, Bogotá, Colombia
- School of Medicine, El Bosque University, Bogotá, Colombia
- Samuel Serrano
- School of Medicine, El Bosque University, Bogotá, Colombia
- Department of Urology, El Bosque University, Bogotá, Colombia
- Daniel Romero Cortes
- School of Medicine, El Bosque University, Bogotá, Colombia
- Department of Urology, El Bosque University, Bogotá, Colombia
17
Puerto Nino AK, Garcia Perez V, Secco S, De Nunzio C, Lombardo R, Tikkinen KAO, Elterman DS. Can ChatGPT provide high-quality patient information on male lower urinary tract symptoms suggestive of benign prostate enlargement? Prostate Cancer Prostatic Dis 2024. [PMID: 38871841] [DOI: 10.1038/s41391-024-00847-7]
Abstract
BACKGROUND ChatGPT has recently emerged as a novel resource for patients' disease-specific inquiries. There is, however, limited evidence assessing the quality of the information. We evaluated the accuracy and quality of ChatGPT's responses on male lower urinary tract symptoms (LUTS) suggestive of benign prostate enlargement (BPE) compared with two reference resources. METHODS Using patient information websites from the European Association of Urology and the American Urological Association as reference material, we formulated 88 BPE-centric questions for ChatGPT 4.0+. Independently and in duplicate, we compared ChatGPT's responses with the reference material, calculating accuracy through F1 score, precision, and recall metrics. We used a 5-point Likert scale for quality rating. We evaluated examiner agreement using the interclass correlation coefficient and assessed the difference in quality scores with the Wilcoxon signed-rank test. RESULTS ChatGPT addressed all (88/88) LUTS/BPE-related questions. For the 88 questions, the recorded F1 score was 0.79 (range: 0-1), precision 0.66 (range: 0-1), recall 0.97 (range: 0-1), and the quality score had a median of 4 (range: 1-5). Examiners had a good level of agreement (ICC = 0.86). We found no statistically significant difference between the scores given by the examiners and the overall quality of the responses (p = 0.72). DISCUSSION ChatGPT demonstrated potential utility in educating patients about BPE/LUTS, its prognosis, and treatment, which can aid the decision-making process. One must exercise prudence when recommending it as the sole information outlet. Additional studies are needed to fully understand the extent of AI's efficacy in delivering patient education in urology.
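Precision, recall, and F1 against reference text are commonly computed as bag-of-tokens overlap (SQuAD-style). The sketch below illustrates that calculation under this assumption; the paper does not specify its exact matching procedure, and the example strings are invented.

```python
from collections import Counter

def token_f1(response: str, reference: str) -> dict:
    """Bag-of-tokens precision, recall, and F1 between two texts."""
    resp = response.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(resp) & Counter(ref)).values())  # shared tokens
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(resp)  # share of the response supported by the reference
    recall = overlap / len(ref)      # share of the reference covered by the response
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy example
print(token_f1("bpe is benign prostate enlargement",
               "benign prostate enlargement is a non-cancerous growth"))
```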
Affiliation(s)
- Angie K Puerto Nino
- Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
- Silvia Secco
- Department of Urology, Niguarda Hospital, Milan, Italy
- Cosimo De Nunzio
- Urology Unit, Ospedale Sant'Andrea, La Sapienza University of Rome, Rome, Italy
- Riccardo Lombardo
- Urology Unit, Ospedale Sant'Andrea, La Sapienza University of Rome, Rome, Italy
- Kari A O Tikkinen
- Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Department of Urology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Department of Surgery, South Karelian Central Hospital, Lappeenranta, Finland
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada
- Dean S Elterman
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
18
Kıyak YS, Emekli E. ChatGPT prompts for generating multiple-choice questions in medical education and evidence on their validity: a literature review. Postgrad Med J 2024:qgae065. [PMID: 38840505] [DOI: 10.1093/postmj/qgae065]
Abstract
ChatGPT's role in creating multiple-choice questions (MCQs) is growing, but the validity of these artificial-intelligence-generated questions is unclear. This literature review was conducted to address the urgent need for understanding the application of ChatGPT in generating MCQs for medical education. Following the database search and screening of 1920 studies, we found 23 relevant studies. We extracted the prompts used for MCQ generation and assessed the validity evidence of the MCQs. The findings showed that prompts varied, including referencing specific exam styles and adopting specific personas, which align with recommended prompt engineering tactics. The validity evidence covered various domains, showing mixed accuracy rates, with some studies indicating quality comparable to human-written questions and others highlighting differences in difficulty and discrimination levels, alongside a significant reduction in question creation time. Despite its efficiency, we highlight the necessity of careful review and suggest a need for further research to optimize the use of ChatGPT in question generation. Main messages: (1) Ensure high-quality outputs by utilizing well-designed prompts; medical educators should prioritize the use of detailed, clear ChatGPT prompts when generating MCQs. (2) Avoid using ChatGPT-generated MCQs directly in examinations without thorough review, to prevent inaccuracies and ensure relevance. (3) Leverage ChatGPT's potential to streamline the test development process, enhancing efficiency without compromising quality.
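The persona and exam-style prompting tactics the review describes can be reproduced with any chat-model API. A minimal sketch using the OpenAI Python client follows; the model name, prompt wording, and topic are illustrative assumptions, not examples drawn from the reviewed studies.

```python
from openai import OpenAI  # pip install openai; requires OPENAI_API_KEY to be set

client = OpenAI()

# Persona + exam-style prompt, mirroring the tactics the review identifies
prompt = (
    "You are an experienced medical educator who writes USMLE-style items. "
    "Write one single-best-answer multiple-choice question on diabetic "
    "ketoacidosis with five options (A-E). Mark the correct answer and give "
    "a one-paragraph rationale."
)
response = client.chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```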
Affiliation(s)
- Yavuz Selim Kıyak
- Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara 06500, Turkey
- Emre Emekli
- Department of Radiology, Faculty of Medicine, Eskişehir Osmangazi University, Eskişehir 26040, Turkey
19
Hershenhouse JS, Mokhtar D, Eppler MB, Rodler S, Storino Ramacciotti L, Ganjavi C, Hom B, Davis RJ, Tran J, Russo GI, Cocci A, Abreu A, Gill I, Desai M, Cacciamani GE. Accuracy, readability, and understandability of large language models for prostate cancer information to the public. Prostate Cancer Prostatic Dis 2024. [PMID: 38744934] [DOI: 10.1038/s41391-024-00826-y]
Abstract
BACKGROUND Generative pretrained transformer (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer-related questions from both the physician and public perspectives while optimizing outputs for patient consumption. METHODS Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity using a 5-point Likert scale. Furthermore, two independent reviewers evaluated the layperson summaries on a correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk). Participants rated the clarity and demonstrated their understanding through a multiple-choice question. RESULTS GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Mean readability of layperson summaries was higher than that of original GPT outputs ([original ChatGPT vs. simplified ChatGPT, mean (SD), p-value] Flesch Reading Ease: 36.5 (9.1) vs. 70.2 (11.2), p < 0.0001; Gunning Fog: 15.8 (1.7) vs. 9.5 (2.0), p < 0.0001; Flesch Grade Level: 12.8 (1.2) vs. 7.4 (1.7), p < 0.0001; Coleman-Liau: 13.7 (2.1) vs. 8.6 (2.4), p = 0.0002; SMOG index: 11.8 (1.2) vs. 6.7 (1.8), p < 0.0001; Automated Readability Index: 13.1 (1.4) vs. 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%). CONCLUSION GPT shows promise for correct patient education on prostate cancer-related content, but the technology is not designed for delivering patient information. Prompting the model to respond with accuracy, completeness, clarity, and readability may enhance its utility in GPT-powered medical chatbots.
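The six readability indices reported here are all computable with standard tooling. A minimal sketch follows, assuming the open-source textstat package (an assumed tooling choice; the paper says only that validated readability tools were used), with invented example strings.

```python
import textstat  # pip install textstat

def readability_profile(text: str) -> dict:
    """The six indices compared in the study (higher Flesch Reading Ease =
    easier; the remaining scores approximate US grade level, lower = easier)."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "gunning_fog": textstat.gunning_fog(text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "coleman_liau": textstat.coleman_liau_index(text),
        "smog": textstat.smog_index(text),
        "automated_readability_index": textstat.automated_readability_index(text),
    }

original = ("Radical prostatectomy entails surgical extirpation of the "
            "prostate gland with vesicourethral anastomosis.")
simplified = ("Surgery removes the whole prostate. The bladder is then "
              "joined back to the urine tube.")
print(readability_profile(original))
print(readability_profile(simplified))
```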
Collapse
Affiliation(s)
- Jacob S Hershenhouse
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Daniel Mokhtar
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Michael B Eppler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Severin Rodler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Lorenzo Storino Ramacciotti
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Conner Ganjavi
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Brian Hom
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Ryan J Davis
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - John Tran
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | | | - Andrea Cocci
- Urology Section, University of Florence, Florence, Italy
| | - Andre Abreu
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Inderbir Gill
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Mihir Desai
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA.
| |
20
Tsai CY, Hsieh SJ, Huang HH, Deng JH, Huang YY, Cheng PY. Performance of ChatGPT on the Taiwan urology board examination: insights into current strengths and shortcomings. World J Urol 2024; 42:250. [PMID: 38652322 DOI: 10.1007/s00345-024-04957-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Accepted: 03/25/2024] [Indexed: 04/25/2024] Open
Abstract
PURPOSE To compare the performance of ChatGPT-4 and ChatGPT-3.5 on the Taiwan urology board examination (TUBE), focusing on answer accuracy, explanation consistency, and uncertainty-management tactics to minimize score penalties from incorrect responses across 12 urology domains. METHODS A total of 450 multiple-choice questions from the TUBE (2020-2022) were presented to the two models. Three urologists assessed the correctness and consistency of each response. Accuracy quantifies the proportion of correct answers; consistency assesses the logic and coherence of explanations across all responses. A penalty-reduction experiment with prompt variations was also conducted. Univariate logistic regression was applied for subgroup comparisons. RESULTS ChatGPT-4 showed strengths in urology and achieved an overall accuracy of 57.8%, with annual accuracies of 64.7% (2020), 58.0% (2021), and 50.7% (2022), significantly surpassing ChatGPT-3.5 (33.8%; OR = 2.68, 95% CI 2.05-3.52). It could have passed the TUBE written exams on accuracy alone but failed on the final score because of penalties. ChatGPT-4 displayed a declining accuracy trend over time. Accuracy varied across the 12 urological domains, with more frequently updated knowledge domains showing lower accuracy (53.2% vs. 62.2%, OR = 0.69, p = 0.05). A high consistency rate of 91.6% in explanations across all domains indicates reliable delivery of coherent and logical information. The simple prompt outperformed strategy-based prompts in accuracy (60% vs. 40%, p = 0.016), highlighting ChatGPT's inability to accurately self-assess uncertainty and its tendency toward overconfidence, which may hinder medical decision-making. CONCLUSIONS ChatGPT-4's high accuracy and consistent explanations on a urology board examination demonstrate its potential in medical information processing. However, its limitations in self-assessment and its overconfidence necessitate caution in its application, especially for inexperienced users. These insights call for ongoing advancement of urology-specific AI tools.
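The pass-on-accuracy-but-fail-on-score result turns on negative marking. The abstract does not state the exact TUBE deduction, so the sketch below assumes a hypothetical penalty of 0.25 point per wrong answer purely to illustrate the mechanism.

# Hedged sketch: how raw accuracy can clear a passing threshold while the
# penalized score does not. The 0.25-point deduction is an assumption for
# illustration; the actual TUBE penalty is not given in the abstract.
def penalized_score(n_correct, n_wrong, n_total, penalty=0.25):
    # Percentage score after subtracting a fraction of a point per error.
    return 100 * (n_correct - penalty * n_wrong) / n_total

n_total = 450
n_correct = round(0.578 * n_total)   # ChatGPT-4's 57.8% overall accuracy
n_wrong = n_total - n_correct        # assumes every question was answered
print(penalized_score(n_correct, n_wrong, n_total))  # ~47.2, well below 57.8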
Affiliation(s)
- Chung-You Tsai
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Shang-Ju Hsieh
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
| | - Hung-Hsiang Huang
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
| | - Juinn-Horng Deng
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Yi-You Huang
- Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan
| | - Pai-Yu Cheng
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan.
- Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan.
| |
21
Ni Z, Peng R, Zheng X, Xie P. Embracing the future: Integrating ChatGPT into China's nursing education system. Int J Nurs Sci 2024; 11:295-299. [PMID: 38707690 PMCID: PMC11064564 DOI: 10.1016/j.ijnss.2024.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Revised: 02/13/2024] [Accepted: 03/06/2024] [Indexed: 05/07/2024] Open
Abstract
This article delves into the role of ChatGPT within the rapidly evolving field of artificial intelligence, highlighting its significant potential in nursing education. The paper first presents the notable advances ChatGPT has achieved in facilitating interactive learning and providing real-time feedback, along with the academic community's growing interest in this technology. It then summarizes research on ChatGPT's applications in nursing education across various clinical disciplines and scenarios, showcasing its enormous potential for multidisciplinary education and for addressing clinical issues. Comparing the performance of several large language models (LLMs) on China's National Nursing Licensure Examination, we observed that ChatGPT demonstrated a higher accuracy rate than its counterparts, providing a solid theoretical foundation for its application in Chinese nursing education and clinical settings. Educational institutions should establish a targeted and effective regulatory framework to leverage ChatGPT in localized nursing education while assuming the corresponding responsibilities. Through standardized training for users and adjustments to existing educational assessment methods aimed at preventing potential misuse and abuse, the full potential of ChatGPT as an innovative auxiliary tool in China's nursing education system can be realized, in line with the developmental needs of modern teaching methodologies.
Affiliation(s)
- Zhengxin Ni
- School of Nursing, Yangzhou University, Yangzhou, China
| | - Rui Peng
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
| | - Xiaofei Zheng
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
| | - Ping Xie
- Department of External Cooperation, Northern Jiangsu People’s Hospital, Nanjing, China
| |
22
Pinto VBP, de Azevedo MF, Wroclawski ML, Gentile G, Jesus VLM, de Bessa Junior J, Nahas WC, Sacomani CAR, Sandhu JS, Gomes CM. Conformity of ChatGPT recommendations with the AUA/SUFU guideline on postprostatectomy urinary incontinence. Neurourol Urodyn 2024; 43:935-941. [PMID: 38451040 DOI: 10.1002/nau.25442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2023] [Revised: 02/24/2024] [Accepted: 02/27/2024] [Indexed: 03/08/2024]
Abstract
INTRODUCTION Artificial intelligence (AI) shows immense potential in medicine, and the Chat Generative Pretrained Transformer (ChatGPT) has been used for different purposes in the field. However, it may not match the complexity and nuance of certain medical scenarios. This study evaluates the accuracy of ChatGPT 3.5 and 4 in providing recommendations for the management of postprostatectomy urinary incontinence (PPUI), considering the Incontinence After Prostate Treatment: AUA/SUFU Guideline as the best-practice benchmark. MATERIALS AND METHODS A set of questions based on the AUA/SUFU Guideline was prepared. Queries included 10 conceptual questions and 10 case-based questions. All questions were open-ended and entered into ChatGPT with a recommendation to limit each answer to 200 words, for greater objectivity. Responses were graded as correct (1 point), partially correct (0.5 point), or incorrect (0 points). The performance of ChatGPT versions 3.5 and 4 was analyzed overall and separately for the conceptual and case-based questions. RESULTS ChatGPT 3.5 scored 11.5 of 20 points (57.5% accuracy), while ChatGPT 4 scored 18 (90.0%; p = 0.031). On the conceptual questions, ChatGPT 3.5 provided accurate answers to six questions, one partially correct response, and three incorrect answers, for a final score of 6.5. In contrast, ChatGPT 4 provided correct answers to eight questions and partially correct answers to two, scoring 9.0. On the case-based questions, ChatGPT 3.5 scored 5.0, while ChatGPT 4 scored 9.0. The domains where ChatGPT performed worst were evaluation, treatment options, surgical complications, and special situations. CONCLUSION ChatGPT 4 demonstrated superior performance to ChatGPT 3.5 in providing recommendations for the management of PPUI, using the AUA/SUFU Guideline as a benchmark. Continuous monitoring is essential for evaluating the development and precision of AI-generated medical information.
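The grading scheme is simple enough to express directly. The sketch below applies it to an invented grade mix that happens to reproduce ChatGPT 3.5's reported 11.5/20 total; the per-question grades themselves are hypothetical, not the study's data.

# Hedged sketch of the 1 / 0.5 / 0 grading described above.
GRADE_POINTS = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}

def accuracy_pct(grades):
    # Total graded points as a percentage of one point per question.
    points = sum(GRADE_POINTS[g] for g in grades)
    return 100 * points / len(grades)

# Hypothetical mix of 20 grades that sums to the reported 11.5 points.
gpt35_grades = ["correct"] * 9 + ["partial"] * 5 + ["incorrect"] * 6
print(accuracy_pct(gpt35_grades))  # 57.5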
Affiliation(s)
- Vicktor B P Pinto
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| | - Matheus F de Azevedo
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| | - Marcelo L Wroclawski
- Division of Urology, ABC Medical School, Sao Paulo, Brazil
- Department of Urology, Albert Einstein Jewish Hospital, Sao Paulo, Brazil
- Department of Urologic Oncology, BP-a Beneficência Portuguesa de São Paulo, Sao Paulo, Brazil
| | - Guilherme Gentile
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| | - Vinicius L M Jesus
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| | | | - William C Nahas
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| | - Carlos A R Sacomani
- Innovation and Information Technology Sector, AC Camargo Cancer Hospital, Sao Paulo, Brazil
| | - Jaspreet S Sandhu
- Department of Surgery/Urology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
| | - Cristiano M Gomes
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| |
23
May M, Körner-Riffard K, Kollitsch L, Burger M, Brookman-May SD, Rauchenwald M, Marszalek M, Eredics K. Evaluating the Efficacy of AI Chatbots as Tutors in Urology: A Comparative Analysis of Responses to the 2022 In-Service Assessment of the European Board of Urology. Urol Int 2024; 108:359-366. [PMID: 38555637 PMCID: PMC11305516 DOI: 10.1159/000537854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2023] [Accepted: 01/17/2024] [Indexed: 04/02/2024]
Abstract
INTRODUCTION This study assessed the potential of large language models (LLMs) as educational tools by evaluating their accuracy in answering questions across urological subtopics. METHODS Three LLMs (ChatGPT-3.5, ChatGPT-4, and Bing AI) were examined in two testing rounds, separated by 48 h, using 100 multiple-choice questions (MCQs) from the 2022 European Board of Urology (EBU) In-Service Assessment (ISA), covering five subtopics. A correct answer was defined as "formal accuracy" (FA), representing the designated single best answer (SBA) among four options. Alternative answers selected by the LLMs that were not the SBA but were still deemed correct were labeled "extended accuracy" (EA). Their capacity to enhance the overall accuracy rate when combined with FA was examined. RESULTS Across the two testing rounds, the FA scores were: ChatGPT-3.5, 58% and 62%; ChatGPT-4, 63% and 77%; and Bing AI, 81% and 73%. Incorporating EA did not yield a significant enhancement in overall performance: the resulting gains for ChatGPT-3.5, ChatGPT-4, and Bing AI were 7% and 5%, 5% and 2%, and 3% and 1%, respectively (p > 0.3). Within urological subtopics, the LLMs performed best in Pediatrics/Congenital and comparatively worst in Functional/BPS/Incontinence. CONCLUSION The LLMs exhibited suboptimal urology knowledge and unsatisfactory proficiency for educational purposes. Overall accuracy did not significantly improve when EA was combined with FA, and error rates remained high, ranging from 16% to 35%. Proficiency varied substantially across subtopics. Further development of medicine-specific LLMs is required before integration into urological training programs.
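The FA/EA bookkeeping amounts to crediting acceptable non-SBA answers on top of the formal score. A minimal sketch follows, using ChatGPT-4's round-1 figures from the abstract (63% FA plus a 5-point EA gain) as the worked numbers.

# Sketch of combining extended accuracy (EA) with formal accuracy (FA)
# over a 100-question MCQ set, as described above.
def combined_accuracy(fa_correct, ea_correct, n_questions=100):
    # FA counts designated single best answers; EA adds alternative
    # answers the reviewers still judged correct.
    return 100 * (fa_correct + ea_correct) / n_questions

print(combined_accuracy(63, 5))  # 68.0 for ChatGPT-4, round 1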
Affiliation(s)
- Matthias May
- Department of Urology, St. Elisabeth Hospital Straubing, Brothers of Mercy Hospital, Straubing, Germany
| | - Katharina Körner-Riffard
- Department of Urology, Caritas St. Josef Medical Centre, University of Regensburg, Regensburg, Germany
| | - Lisa Kollitsch
- Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria
| | - Maximilian Burger
- Department of Urology, Caritas St. Josef Medical Centre, University of Regensburg, Regensburg, Germany
| | - Sabine D Brookman-May
- Department of Urology, University of Munich, LMU, Munich, Germany
- Johnson and Johnson Innovative Medicine, Research and Development, Spring House, Pennsylvania, USA
| | - Michael Rauchenwald
- Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria
- European Board of Urology, Arnhem, The Netherlands
| | - Martin Marszalek
- Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria
| | - Klaus Eredics
- Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria
- Department of Urology, Paracelsus Medical University, Salzburg, Austria
| |
24
Huang Y, Wu R, He J, Xiang Y. Evaluating ChatGPT-4.0's data analytic proficiency in epidemiological studies: A comparative analysis with SAS, SPSS, and R. J Glob Health 2024; 14:04070. [PMID: 38547497 PMCID: PMC10978058 DOI: 10.7189/jogh.14.04070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/02/2024] Open
Abstract
Background OpenAI's Chat Generative Pre-trained Transformer 4.0 (ChatGPT-4), an emerging artificial intelligence (AI)-based large language model (LLM), has been receiving increasing attention from the medical research community for its innovative 'Data Analyst' feature. We aimed to compare the capabilities of ChatGPT-4 against traditional biostatistical software (i.e., SAS, SPSS, and R) in statistically analyzing epidemiological research data. Methods We used a data set from the China Health and Nutrition Survey comprising 9317 participants and 29 variables (e.g., gender, age, educational level, marital status, income, occupation, weekly working hours, survival status). Two researchers independently evaluated the data analysis capabilities of ChatGPT-4's 'Data Analyst' feature against SAS, SPSS, and R across three commonly used types of epidemiological analysis: descriptive statistics, intergroup analysis, and correlation analysis. We used an internally developed evaluation scale to assess and compare the consistency of results, the analytical efficiency of coding or operations, user-friendliness, and overall performance between ChatGPT-4, SAS, SPSS, and R. Results In descriptive statistics, ChatGPT-4 showed high consistency of results, greater analytical efficiency of code or operations, and more intuitive user-friendliness compared with SAS, SPSS, and R. In intergroup comparisons and correlational analyses, despite minor discrepancies with SAS, SPSS, and R in the statistical outcomes of certain tasks, ChatGPT-4 maintained high analytical efficiency and exceptional user-friendliness. Employing ChatGPT-4 can thus significantly lower the operational threshold for epidemiological data analysis while remaining consistent with the output of traditional biostatistical software, requiring only specific, clear analysis instructions without any additional operations or code writing. Conclusions We found ChatGPT-4 to be a powerful auxiliary tool for statistical analysis in epidemiological research. However, it showed limitations in result consistency and in applying more advanced statistical methods. We therefore advocate the use of ChatGPT-4 to support researchers with intermediate experience in data analysis. With AI technologies such as LLMs advancing rapidly, their integration with data analysis platforms promises to lower operational barriers, enabling researchers to devote greater focus to the nuanced interpretation of analysis results. This development is likely to significantly advance epidemiological and medical research.
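For readers unfamiliar with the three analysis types being benchmarked, the sketch below shows what each looks like when coded by hand, here in Python with pandas/SciPy as a stand-in for SAS, SPSS, or R; the toy data frame and column names are hypothetical, not the CHNS variables.

# Hedged sketch of the three epidemiological analysis types compared above,
# on a tiny invented data set (not the CHNS data).
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M"],
    "age":    [34, 51, 29, 62, 45, 38],
    "income": [3200, 4100, 2800, 5200, 3900, 3500],
})

# 1. Descriptive statistics
print(df[["age", "income"]].describe())

# 2. Intergroup analysis: income by gender (Welch's t test)
men   = df.loc[df["gender"] == "M", "income"]
women = df.loc[df["gender"] == "F", "income"]
print(stats.ttest_ind(men, women, equal_var=False))

# 3. Correlation analysis: age vs. income (Pearson)
print(stats.pearsonr(df["age"], df["income"]))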
Affiliation(s)
- Yeen Huang
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Ruipeng Wu
- Key Laboratory for Molecular Genetic Mechanisms and Intervention Research on High Altitude Disease of Tibet Autonomous Region, School of Medicine, Xizang Minzu University, Xianyang, Xizang, China
- Key Laboratory of High Altitude Hypoxia Environment and Life Health, School of Medicine, Xizang Minzu University, Xianyang, Xizang, China
- Key Laboratory of Environmental Medicine and Engineering of Ministry of Education, Department of Nutrition and Food Hygiene, School of Public Health, Southeast University, Nanjing, Jiangsu, China
| | - Juntao He
- Physical and Chemical Testing Institute, Shenzhen Prevention and Treatment Center for Occupational Diseases, Shenzhen, Guangdong, China
| | - Yingping Xiang
- Occupational Hazard Assessment Institute, Shenzhen Prevention and Treatment Center for Occupational Diseases, Shenzhen, Guangdong, China
| |
25
Wu RC, Li DX, Feng DC. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol. 2024;85:146-53. Eur Urol 2024; 85:e87-e88. [PMID: 38151444 DOI: 10.1016/j.eururo.2023.11.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2023] [Accepted: 11/23/2023] [Indexed: 12/29/2023]
Affiliation(s)
- Rui-Cheng Wu
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China
| | - Deng-Xiong Li
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China
| | - De-Chao Feng
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China.
| |
26
Zhao Z, Li Z, Yu N. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol. 2024;85:146-53. Eur Urol 2024; 85:e83-e84. [PMID: 38143217 DOI: 10.1016/j.eururo.2023.12.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Accepted: 12/01/2023] [Indexed: 12/26/2023]
Affiliation(s)
- Zhongwei Zhao
- Department of Urology, Qilu Hospital of Shandong University, Jinan, China
| | - Zhenye Li
- Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Nengwang Yu
- Department of Urology, Qilu Hospital of Shandong University, Jinan, China.
| |
27
Eppler M, Ganjavi C, Abreu A, Gill I, Cacciamani GE. Reply to Rui-Cheng Wu, Deng-Xiong Li, and De-Chao Feng's Letter to the Editor re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol. 2024;85:146-53. Eur Urol 2024; 85:e85-e86. [PMID: 38182492 DOI: 10.1016/j.eururo.2023.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Accepted: 12/13/2023] [Indexed: 01/07/2024]
Affiliation(s)
- Michael Eppler
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA.
| | - Conner Ganjavi
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Andre Abreu
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Inderbir Gill
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
| | - Giovanni E Cacciamani
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA.
| |
28
Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Qureshi F, Cheungpasitporn W. Ethical Dilemmas in Using AI for Academic Writing and an Example Framework for Peer Review in Nephrology Academia: A Narrative Review. Clin Pract 2023; 14:89-105. [PMID: 38248432 PMCID: PMC10801601 DOI: 10.3390/clinpract14010008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 12/23/2023] [Accepted: 12/28/2023] [Indexed: 01/23/2024] Open
Abstract
The emergence of artificial intelligence (AI) has greatly propelled progress across various sectors, including nephrology academia. However, this advancement has also given rise to ethical challenges, notably in scholarly writing. AI's capacity to automate labor-intensive tasks like literature reviews and data analysis has created opportunities for unethical practices, with scholars incorporating AI-generated text into their manuscripts and potentially undermining academic integrity. This situation gives rise to a range of ethical dilemmas that not only question the authenticity of contemporary academic endeavors but also challenge the credibility of the peer-review process and the integrity of editorial oversight. Instances of this misconduct are highlighted, spanning from lesser-known journals to reputable ones, and even infiltrating graduate theses and grant applications. This subtle AI intrusion hints at a systemic vulnerability within the academic publishing domain, exacerbated by the publish-or-perish mentality. Solutions aimed at mitigating the unethical employment of AI in academia include the adoption of sophisticated AI-driven plagiarism detection systems, a robust augmentation of the peer-review process with an "AI scrutiny" phase, comprehensive training for academics on ethical AI usage, and the promotion of a culture of transparency that acknowledges AI's role in research. This review underscores the pressing need for collaborative efforts among academic nephrology institutions to foster an environment of ethical AI application, thus preserving academic integrity in the face of rapid technological advancement. It also makes a plea for rigorous research to assess the extent of AI's involvement in the academic literature, to evaluate the effectiveness of AI-enhanced plagiarism detection tools, and to understand the long-term consequences of AI utilization for academic integrity. An example framework is proposed to outline a comprehensive approach to integrating AI into nephrology academic writing and peer review. Through proactive initiatives and rigorous evaluations, a harmonious environment that harnesses AI's capabilities while upholding stringent academic standards can be envisioned.
Affiliation(s)
- Jing Miao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (S.S.); (O.A.G.V.); (F.Q.); (W.C.)
| | - Charat Thongprayoon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (S.S.); (O.A.G.V.); (F.Q.); (W.C.)
| | - Supawadee Suppadungsuk
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (S.S.); (O.A.G.V.); (F.Q.); (W.C.)
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bang Phli 10540, Samut Prakan, Thailand
| | - Oscar A. Garcia Valencia
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (S.S.); (O.A.G.V.); (F.Q.); (W.C.)
| | - Fawad Qureshi
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (S.S.); (O.A.G.V.); (F.Q.); (W.C.)
| | - Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (S.S.); (O.A.G.V.); (F.Q.); (W.C.)
| |