1
Song Y, Xu T. Letter to the editor for the article "Performance of ChatGPT-3.5 and ChatGPT-4 on the European Board of Urology (EBU) exams: a comparative analysis". World J Urol 2024; 42:555. [PMID: 39361038 DOI: 10.1007/s00345-024-05256-y]
Affiliation(s)
- Yuxuan Song
- Department of Urology, Peking University People's Hospital, Beijing, 100044, China
- Tao Xu
- Department of Urology, Peking University People's Hospital, Beijing, 100044, China.
2
Wu Z, Gan W, Xue Z, Ni Z, Zheng X, Zhang Y. Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study. JMIR Med Educ 2024; 10:e52746. [PMID: 39363539 PMCID: PMC11466054 DOI: 10.2196/52746]
Abstract
Background The creation of large language models (LLMs) such as ChatGPT is an important step in the development of artificial intelligence, which shows great potential in medical education due to its powerful language understanding and generative capabilities. The purpose of this study was to quantitatively evaluate and comprehensively analyze ChatGPT's performance in handling questions from nursing licensure examinations in the United States and China, namely the National Council Licensure Examination for Registered Nurses (NCLEX-RN) and the National Nursing Licensure Examination (NNLE). Objective This study aims to examine how well LLMs answer NCLEX-RN and NNLE multiple-choice questions (MCQs) across different language inputs, to evaluate whether LLMs can serve as multilingual learning aids for nursing, and to assess whether they possess a repository of professional knowledge applicable to clinical nursing practice. Methods First, we compiled 150 NCLEX-RN Practical MCQs, 240 NNLE Theoretical MCQs, and 240 NNLE Practical MCQs. Then, the translation function of ChatGPT 3.5 was used to translate NCLEX-RN questions from English to Chinese and NNLE questions from Chinese to English. Finally, the original and translated versions of the MCQs were input into ChatGPT 4.0, ChatGPT 3.5, and Google Bard. The LLMs were compared by accuracy rate, as were the different language inputs. Results The accuracy rates of ChatGPT 4.0 for NCLEX-RN practical questions and Chinese-translated NCLEX-RN practical questions were 88.7% (133/150) and 79.3% (119/150), respectively. Despite the statistical significance of the difference (P=.03), the correct rate was generally satisfactory. Around 71.9% (169/235) of NNLE Theoretical MCQs and 69.1% (161/233) of NNLE Practical MCQs were correctly answered by ChatGPT 4.0. The accuracy of ChatGPT 4.0 in processing NNLE Theoretical MCQs and NNLE Practical MCQs translated into English was 71.5% (168/235; P=.92) and 67.8% (158/233; P=.77), respectively, and there was no statistically significant difference between the results of text input in different languages. ChatGPT 3.5 (NCLEX-RN P=.003, NNLE Theoretical P<.001, NNLE Practical P=.12) and Google Bard (NCLEX-RN P<.001, NNLE Theoretical P<.001, NNLE Practical P<.001) had lower accuracy rates for nursing-related MCQs than ChatGPT 4.0 with English input. For ChatGPT 3.5, accuracy with English input was higher than with Chinese input, and the difference was statistically significant (NCLEX-RN P=.02, NNLE Practical P=.02). Whether the NCLEX-RN and NNLE MCQs were submitted in Chinese or English, ChatGPT 4.0 had the highest number of unique correct responses and the lowest number of unique incorrect responses among the 3 LLMs. Conclusions This study, focusing on 618 nursing MCQs from the NCLEX-RN and NNLE, found that ChatGPT 4.0 outperformed ChatGPT 3.5 and Google Bard in accuracy. It excelled in processing both English and Chinese inputs, underscoring its potential as a valuable tool in nursing education and clinical decision-making.
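The language-by-language accuracy comparisons reported above (for example, 133/150 correct with English input versus 119/150 with the Chinese translation) reduce to tests between two proportions. The sketch below shows one way such a comparison could be reproduced in Python; the counts come from the abstract, but the choice of a chi-square test is an assumption, since the abstract does not state which statistic produced P=.03.

```python
from scipy.stats import chi2_contingency

# ChatGPT 4.0 on NCLEX-RN practical MCQs (counts taken from the abstract)
english_correct, english_total = 133, 150   # 88.7%
chinese_correct, chinese_total = 119, 150   # 79.3%

# 2x2 contingency table: rows = input language, columns = correct / incorrect
table = [
    [english_correct, english_total - english_correct],
    [chinese_correct, chinese_total - chinese_correct],
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"accuracy EN {english_correct / english_total:.1%}, ZH {chinese_correct / chinese_total:.1%}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```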
Affiliation(s)
- Zelin Wu
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, Guangzhou, China
- Wenyi Gan
- Department of Joint Surgery and Sports Medicine, Zhuhai People’s Hospital, Zhuhai City, China
- Zhaowen Xue
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, Guangzhou, China
- Zhengxin Ni
- School of Nursing, Yangzhou University, Yangzhou, China
- Xiaofei Zheng
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, Guangzhou, China
- Yiyi Zhang
- Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, Guangzhou, China
3
Demirci A. A Comparison of ChatGPT and Human Questionnaire Evaluations of the Urological Cancer Videos Most Watched on YouTube. Clin Genitourin Cancer 2024; 22:102145. [PMID: 39033711 DOI: 10.1016/j.clgc.2024.102145]
Abstract
AIM To examine the reliability of ChatGPT in evaluating the quality of the medical content of the most-watched videos related to urological cancers on YouTube. MATERIALS AND METHODS In March 2024, a playlist was created of the 20 most-watched YouTube videos for each type of urological cancer. The video texts were evaluated by ChatGPT and by a urology specialist using the DISCERN-5 and Global Quality Scale (GQS) questionnaires. The results obtained were compared using the Kruskal-Wallis test. RESULTS For the prostate, bladder, renal, and testicular cancer videos, the median (IQR) DISCERN-5 scores given by the human evaluator and ChatGPT were (Human: 4 [1], 3 [0], 3 [2], 3 [1], P = .11; ChatGPT: 3 [1.75], 3 [1], 3 [2], 3 [0], P = .4, respectively) and the GQS scores were (Human: 4 [1.75], 3 [0.75], 3.5 [2], 3.5 [1], P = .12; ChatGPT: 4 [1], 3 [0.75], 3 [1], 3.5 [1], P = .1, respectively), with no significant difference determined between the scores. The repeatability of the ChatGPT responses was similar across cancer types, at 25% for prostate cancer, 30% for bladder cancer, 30% for renal cancer, and 35% for testicular cancer (P = .92). No statistically significant difference was determined between the median (IQR) DISCERN-5 and GQS scores given by humans and ChatGPT for the content of videos about prostate, bladder, renal, and testicular cancer (P > .05). CONCLUSION Although ChatGPT is successful in evaluating the medical quality of video texts, the results should be interpreted with caution because the repeatability of its responses is low.
Affiliation(s)
- Aykut Demirci
- Department of Urology, Dr. Abdurrahman Yurtaslan Ankara Oncology Training and Research Hospital, University of Health Sciences, Ankara, Turkey.
4
Halawani A, Almehmadi SG, Alhubaishy BA, Alnefaie ZA, Hasan MN. Empowering patients: how accurate and readable are large language models in renal cancer education. Front Oncol 2024; 14:1457516. [PMID: 39391252 PMCID: PMC11464325 DOI: 10.3389/fonc.2024.1457516]
Abstract
Background The incorporation of Artificial Intelligence (AI) into the healthcare sector has fundamentally transformed patient care paradigms, particularly through the creation of patient education materials (PEMs) tailored to individual needs. This study aims to assess the accuracy and readability of AI-generated information on kidney cancer produced by ChatGPT-4.0, Gemini AI, and Perplexity AI, comparing these outputs to PEMs provided by the American Urological Association (AUA) and the European Association of Urology (EAU). The objective is to guide physicians in directing patients to accurate and understandable resources. Methods PEMs published by the AUA and EAU were collected and categorized. Kidney cancer-related queries, identified via Google Trends (GT), were input into ChatGPT-4.0, Gemini AI, and Perplexity AI. Four independent reviewers assessed the AI outputs for accuracy across five distinct categories, employing a 5-point Likert scale. A readability evaluation was conducted utilizing established formulas, including the Gunning Fog Index (GFI), Simple Measure of Gobbledygook (SMOG), and Flesch-Kincaid Grade Level (FKGL). The AI chatbots were then tasked with simplifying their outputs to achieve a sixth-grade reading level. Results The PEM published by the AUA was the most readable, with a mean readability score of 9.84 ± 1.2, in contrast to the EAU (11.88 ± 1.11), ChatGPT-4.0 (11.03 ± 1.76), Perplexity AI (12.66 ± 1.83), and Gemini AI (10.83 ± 2.31). The chatbots demonstrated the capability to simplify text to lower grade levels upon request, with ChatGPT-4.0 achieving a readability grade level ranging from 5.76 to 9.19, Perplexity AI from 7.33 to 8.45, and Gemini AI from 6.43 to 8.43. While the official PEMs were considered accurate, the LLM-generated outputs exhibited an overall high level of accuracy with minor detail omission and some information inaccuracies. Information related to kidney cancer treatment was found to be the least accurate among the evaluated categories. Conclusion Although the PEM published by the AUA was the most readable, both the authoritative PEMs and the large language model (LLM)-generated outputs exceeded the recommended readability threshold for the general population. AI chatbots can simplify their outputs when explicitly instructed. However, notwithstanding their accuracy, LLM-generated outputs are susceptible to detail omission and inaccuracies. The variability in AI performance necessitates cautious use as an adjunctive tool in patient education.
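The readability grades cited above are produced by standard formulas that combine sentence length, word length, and syllable counts. The following is a minimal sketch of the three named indices using their published formulas; the counts in the example are hypothetical, and the study may have relied on a dedicated calculator rather than hand-computed values.

```python
import math

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKGL: grade level from average sentence length and syllables per word."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """GFI: complex words are those with three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def smog(sentences: int, polysyllables: int) -> float:
    """SMOG: polysyllabic word count scaled to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

# Illustrative counts for a short patient-education passage (hypothetical numbers)
words, sentences, syllables, complex_words = 220, 14, 340, 28
print(f"FKGL {flesch_kincaid_grade(words, sentences, syllables):.1f}")
print(f"GFI  {gunning_fog(words, sentences, complex_words):.1f}")
print(f"SMOG {smog(sentences, complex_words):.1f}")
```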
Affiliation(s)
- Ziyad A. Alnefaie
- Department of Urology, King Abdulaziz University, Jeddah, Saudi Arabia
- Mudhar N. Hasan
- Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
- Department of Urology, Mediclinic City Hospital, Dubai, United Arab Emirates
5
Hashemi S, Karbalaei M, Keikha M. Comments on "Performance of ChatGPT in Answering Clinical Questions on the Practical Guideline of Blepharoptosis". Aesthetic Plast Surg 2024:10.1007/s00266-024-04320-7. [PMID: 39120728 DOI: 10.1007/s00266-024-04320-7]
Affiliation(s)
- Saleh Hashemi
- Department of Medical Genetics and molecular genomics, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mohsen Karbalaei
- Department of Microbiology and Virology, School of Medicine, Jiroft University of Medical Sciences, Jiroft, Iran
- Bio Environmental Health Hazards Research Center, Jiroft University of Medical Sciences, Jiroft, Iran
- Masoud Keikha
- Department of Microbiology and Virology, School of Medicine, Iranshahr University of Medical Sciences, Iranshahr, Iran.
6
Şahin B, Emre Genç Y, Doğan K, Emre Şener T, Şekerci ÇA, Tanıdır Y, Yücel S, Tarcan T, Çam HK. Evaluating the Performance of ChatGPT in Urology: A Comparative Study of Knowledge Interpretation and Patient Guidance. J Endourol 2024; 38:799-808. [PMID: 38815140 DOI: 10.1089/end.2023.0413]
Abstract
Background/Aim: To evaluate the performance of Chat Generative Pre-trained Transformer (ChatGPT), a large language model developed by OpenAI. Materials and Methods: This study comprised three main steps to evaluate the effectiveness of ChatGPT in the urologic field. The first step involved 35 questions from our institution's experts, who have at least 10 years of experience in their fields. The responses of the ChatGPT versions were qualitatively compared with the responses of urology residents to the same questions. The second step assessed the reliability of the ChatGPT versions in answering current debate topics. The third step assessed the reliability of the ChatGPT versions in providing medical recommendations and directives in response to questions commonly asked by patients in the outpatient and inpatient settings. Results: In the first step, version 4 provided correct answers to 25 of the 35 questions, while version 3.5 provided only 19 (71.4% vs 54%). Residents in their last year of training in our clinic also provided a mean of 25 correct answers, and 4th-year residents provided a mean of 19.3 correct responses. The second step, which evaluated the responses of both versions to debate situations in urology, found that both versions provided variable and inappropriate results. In the last step, both versions had a similar success rate in providing recommendations and guidance to patients based on expert ratings. Conclusion: The difference between the two versions on the 35 questions in the first step of the study was attributed to the improvement of ChatGPT's literature and data synthesis abilities. It may be reasonable to use ChatGPT to provide quick and safe answers to questions from people who are not healthcare providers, but it should not be used as a diagnostic tool or to choose among different treatment modalities.
Affiliation(s)
- Bahadır Şahin
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Yunus Emre Genç
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Kader Doğan
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Tarık Emre Şener
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Çağrı Akın Şekerci
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Yılören Tanıdır
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Selçuk Yücel
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Tufan Tarcan
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
- Haydar Kamil Çam
- Department of Urology, Marmara University School of Medicine, Istanbul, Turkey
7
Pompili D, Richa Y, Collins P, Richards H, Hennessey DB. Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models. World J Urol 2024; 42:455. [PMID: 39073590 PMCID: PMC11286728 DOI: 10.1007/s00345-024-05146-3]
Abstract
PURPOSE Large language models (LLMs) are a form of artificial intelligence (AI) that uses deep learning techniques to understand, summarize and generate content. The potential benefits of LLMs in healthcare are predicted to be immense. The objective of this study was to examine the quality of patient information leaflets (PILs) produced by 3 LLMs on urological topics. METHODS Prompts were created to generate PILs from 3 LLMs: ChatGPT-4, PaLM 2 (Google Bard) and Llama 2 (Meta) across four urology topics (circumcision, nephrectomy, overactive bladder syndrome, and transurethral resection of the prostate [TURP]). PILs were evaluated using a quality assessment checklist. PIL readability was assessed by the Average Reading Level Consensus Calculator. RESULTS PILs generated by PaLM 2 had the highest overall average quality score (3.58), followed by Llama 2 (3.34) and ChatGPT-4 (3.08). PaLM 2-generated PILs were of the highest quality for all topics except TURP, and PaLM 2 was the only LLM to include images. Medical inaccuracies were present in all generated content, including instances of significant error. Readability analysis identified PaLM 2-generated PILs as the simplest (age 14-15 average reading level). Llama 2 PILs were the most difficult (age 16-17 average). CONCLUSION While LLMs can generate PILs that may help reduce healthcare professional workload, generated content requires clinician input for accuracy and for the inclusion of health literacy aids, such as images. LLM-generated PILs were above the average reading level for adults, necessitating improvement in LLM algorithms and/or prompt design. How satisfied patients are with LLM-generated PILs remains to be evaluated.
Affiliation(s)
- David Pompili
- School of Medicine, University College Cork, Cork, Ireland
- Yasmina Richa
- School of Medicine, University College Cork, Cork, Ireland
- Patrick Collins
- Department of Urology, Mercy University Hospital, Cork, Ireland
- Helen Richards
- School of Medicine, University College Cork, Cork, Ireland
- Department of Clinical Psychology, Mercy University Hospital, Cork, Ireland
- Derek B Hennessey
- School of Medicine, University College Cork, Cork, Ireland.
- Department of Urology, Mercy University Hospital, Cork, Ireland.
8
Schoch J, Schmelz HU, Strauch A, Borgmann H, Nestler T. Performance of ChatGPT-3.5 and ChatGPT-4 on the European Board of Urology (EBU) exams: a comparative analysis. World J Urol 2024; 42:445. [PMID: 39060792 DOI: 10.1007/s00345-024-05137-4]
Abstract
BACKGROUND AND OBJECTIVE In the transformative era of artificial intelligence, its integration into various spheres, especially healthcare, has been promising. The objective of this study was to analyze the performance of different versions of ChatGPT, a large language model (LLM), on recent European Board of Urology (EBU) in-service assessment questions. DESIGN AND SETTING We presented multiple-choice questions from the official EBU test books to ChatGPT-3.5 and ChatGPT-4 for the following exams: exam 1 (2017-2018), exam 2 (2019-2020) and exam 3 (2021-2022). Exams were passed with ≥60% correct answers. RESULTS ChatGPT-4 provided significantly more correct answers in all exams compared with the prior version 3.5 (exam 1: ChatGPT-3.5 64.3% vs. ChatGPT-4 81.6%; exam 2: 64.5% vs. 80.5%; exam 3: 56% vs. 77%, p < 0.001, respectively). Test exam 3 was the only exam ChatGPT-3.5 did not pass. Within the different subtopics, there were no significant differences in the proportion of correct answers provided by ChatGPT-3.5. For ChatGPT-4, the percentage of correct answers in test exam 3 was significantly decreased in the subtopics Incontinence (exam 1: 81.6% vs. exam 3: 53.6%; p = 0.026) and Transplantation (exam 1: 77.8% vs. exam 3: 0%; p = 0.020). CONCLUSION Our findings indicate that ChatGPT, especially ChatGPT-4, has the general ability to answer complex medical questions and might pass FEBU exams. Nevertheless, there remains an indispensable need for human validation of LLM answers, especially concerning healthcare issues.
Affiliation(s)
- Justine Schoch
- Department of Urology, Federal Armed Services Hospital Koblenz, Ruebenacherstrasse 170, 56072, Koblenz, Germany
- H-U Schmelz
- Department of Urology, Federal Armed Services Hospital Koblenz, Ruebenacherstrasse 170, 56072, Koblenz, Germany
- Angelina Strauch
- Department of Urology, Federal Armed Services Hospital Koblenz, Ruebenacherstrasse 170, 56072, Koblenz, Germany
- Hendrik Borgmann
- Department of Urology, Faculty of Health Sciences Brandenburg, Brandenburg a.d. Havel, Germany
- Tim Nestler
- Department of Urology, Federal Armed Services Hospital Koblenz, Ruebenacherstrasse 170, 56072, Koblenz, Germany.
- Department of Urology, University Hospital of Cologne, Cologne, Germany.
9
Wu H, Sun Z, Guo Q, Li C. The rapid growth of ChatGPT-related publications: A call for international guidelines. Asian J Surg 2024:S1015-9584(24)01246-6. [PMID: 38942627 DOI: 10.1016/j.asjsur.2024.06.029]
Affiliation(s)
- Haiyang Wu
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Department of Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, China.
- Zaijie Sun
- Department of Orthopaedics, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, China
- Qiang Guo
- Department of Spine and Joint Surgery, Tianjin Baodi Hospital, Baodi Clinical College of Tianjin Medical University, Tianjin, China
- Cheng Li
- Department of Spine Surgery, Wangjing Hospital, China Academy of Chinese Medical Sciences, Beijing, China; Center for Musculoskeletal Surgery (CMSC), Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt University of Berlin, and Berlin Institute of Health, Berlin, Germany.
10
Zhu L, Rong Y, McGee LA, Rwigema JCM, Patel SH. Testing and Validation of a Custom Retrained Large Language Model for the Supportive Care of HN Patients with External Knowledge Base. Cancers (Basel) 2024; 16:2311. [PMID: 39001375 PMCID: PMC11240646 DOI: 10.3390/cancers16132311]
Abstract
PURPOSE This study aimed to develop a retrained large language model (LLM) tailored to the needs of head and neck (HN) cancer patients treated with radiotherapy, with emphasis on symptom management and survivorship care. METHODS A comprehensive external database was curated for training ChatGPT-4, integrating expert-identified consensus guidelines on supportive care for HN patients and correspondence from physicians and nurses within our institution's electronic medical records for 90 HN patients. The performance of our model was evaluated using 20 post-treatment patient inquiries, with the responses assessed by three board-certified radiation oncologists (RadOncs). Responses were rated on a scale of 1 (strongly disagree) to 5 (strongly agree) for accuracy, clarity of response, completeness, and relevance. RESULTS The average scores for the 20 tested questions were 4.25 for accuracy, 4.35 for clarity, 4.22 for completeness, and 4.32 for relevance, on a 5-point scale. Overall, 91.67% (220 out of 240) of assessments received scores of 3 or higher, and 83.33% (200 out of 240) received scores of 4 or higher. CONCLUSION The custom-trained model demonstrates high accuracy in providing support to HN patients, offering evidence-based information and guidance on symptom management and survivorship care.
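The summary percentages above follow directly from the 240 individual ratings (20 questions × 4 criteria × 3 raters), each on a 1-5 Likert scale. A minimal sketch of that aggregation is shown below; the ratings are randomly generated placeholders, not the study data.

```python
import random

random.seed(0)
# Stand-in for the 240 ratings (20 questions x 4 criteria x 3 raters),
# each on a 1 (strongly disagree) to 5 (strongly agree) Likert scale.
ratings = [random.choice([3, 4, 4, 5, 5]) for _ in range(240)]

mean_score = sum(ratings) / len(ratings)
at_least_3 = sum(r >= 3 for r in ratings) / len(ratings)
at_least_4 = sum(r >= 4 for r in ratings) / len(ratings)

print(f"mean rating: {mean_score:.2f}")
print(f">=3: {at_least_3:.1%}, >=4: {at_least_4:.1%}")
```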
Affiliation(s)
- Yi Rong
- Correspondence: (Y.R.); (S.H.P.)
- Samir H. Patel
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ 85054, USA; (L.Z.); (L.A.M.); (J.-C.M.R.)
11
Cakir H, Caglar U, Sekkeli S, Zerdali E, Sarilar O, Yildiz O, Ozgor F. Evaluating ChatGPT ability to answer urinary tract infection-related questions. Infect Dis Now 2024; 54:104884. [PMID: 38460761 DOI: 10.1016/j.idnow.2024.104884]
Abstract
INTRODUCTION For the first time, the accuracy and proficiency of ChatGPT answers on urogenital tract infections (UTIs) were evaluated. METHODS Two lists of questions were created: frequently asked questions (FAQs, public-based inquiries) on relevant topics, and questions based on guideline information (guideline-based inquiries). ChatGPT responses to FAQs and scientific questions were scored by two urologists and an infectious disease specialist. The quality and reliability of all ChatGPT answers were checked using the Global Quality Score (GQS). The reproducibility of ChatGPT answers was analyzed by asking each question twice. RESULTS All in all, 96.2% of FAQs (75/78 inquiries) related to UTIs were correctly and adequately answered by ChatGPT and scored GQS 5. None of the ChatGPT answers were classified as GQS 2 or GQS 1. Moreover, FAQs about cystitis, urethritis, and epididymo-orchitis were answered by ChatGPT with 100% accuracy (GQS 5). For questions based on the EAU urological infections guidelines, 61 (89.7%), 5 (7.4%), and 2 (2.9%) ChatGPT responses were scored GQS 5, GQS 4, and GQS 3, respectively. None of the ChatGPT responses to the EAU urological infections guideline questions were categorized as GQS 2 or GQS 1. Comparison of the mean GQS values of ChatGPT answers for FAQs and EAU urological guideline questions showed that ChatGPT was similarly able to respond to both question groups (p = 0.168). The ChatGPT response reproducibility rate was highest for the FAQ subgroups of cystitis, urethritis, and epididymo-orchitis (100% for each subgroup). CONCLUSION The present study showed that ChatGPT gave accurate and satisfactory answers both to public-based inquiries and to EAU urological infection guideline-based questions. The reproducibility of ChatGPT answers exceeded 90% for both FAQs and scientific questions.
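Reproducibility in this design means that the two answers ChatGPT gives to the same question fall into the same Global Quality Score (GQS) category. A minimal sketch of that bookkeeping is shown below, assuming each question has already been asked twice and scored on the 1-5 GQS scale; the questions and scores are illustrative placeholders, not the study data.

```python
# Paired GQS scores (first ask, second ask) for a few illustrative FAQs
paired_scores = {
    "How is cystitis treated?": (5, 5),
    "Is urethritis contagious?": (5, 5),
    "Can epididymo-orchitis cause infertility?": (5, 4),
    "When are antibiotics indicated for asymptomatic bacteriuria?": (4, 4),
}

same_category = [first == second for first, second in paired_scores.values()]
reproducibility = sum(same_category) / len(same_category)

for question, (first, second) in paired_scores.items():
    flag = "reproducible" if first == second else "changed"
    print(f"{flag:12s} GQS {first}->{second}  {question}")
print(f"reproducibility rate: {reproducibility:.0%}")
```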
Affiliation(s)
- Hakan Cakir
- Department of Urology, Fulya Acibadem Hospital, Istanbul, Turkey
- Ufuk Caglar
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Sami Sekkeli
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey.
- Esra Zerdali
- Department of Infectious Diseases and Clinical Microbiology, Haseki Training and Research Hospital, Istanbul, Turkey
- Omer Sarilar
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Oguzhan Yildiz
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Faruk Ozgor
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
12
Perrot O, Schirmann A, Vidart A, Guillot-Tantay C, Izard V, Lebret T, Boillot B, Mesnard B, Lebacle C, Madec FX. Chatbots vs andrologists: Testing 25 clinical cases. The French Journal of Urology 2024; 34:102636. [PMID: 38599321 DOI: 10.1016/j.fjurol.2024.102636]
Abstract
OBJECTIVE AI-derived language models are booming, and their place in medicine is undefined. The aim of our study was to compare responses to andrology clinical cases between chatbots and andrologists, to assess the reliability of these technologies. MATERIALS AND METHODS We analyzed the responses of 32 experts, 18 residents and three chatbots (ChatGPT v3.5, v4 and Bard) to 25 andrology clinical cases. Responses were assessed on a Likert scale ranging from 0 to 2 for each question (0: false or no response; 1: partially correct response; 2: correct response), on the basis of the latest national or, in the absence of such, international recommendations. We compared the averages obtained for all cases by the different groups. RESULTS Experts obtained a higher mean score (m=11.0/12.4, σ=1.4) than ChatGPT v4 (m=10.7/12.4, σ=2.2, p=0.6475), ChatGPT v3.5 (m=9.5/12.4, σ=2.1, p=0.0062) and Bard (m=7.2/12.4, σ=3.3, p<0.0001). Residents obtained a mean score (m=9.4/12.4, σ=1.7) higher than Bard (m=7.2/12.4, σ=3.3, p=0.0053) but lower than ChatGPT v3.5 (m=9.5/12.4, σ=2.1, p=0.8393), ChatGPT v4 (m=10.7/12.4, σ=2.2, p=0.0183) and the experts (m=11.0/12.4, σ=1.4, p=0.0009). ChatGPT v4 performance (m=10.7, σ=2.2) was better than that of ChatGPT v3.5 (m=9.5, σ=2.1, p=0.0476) and Bard (m=7.2, σ=3.3, p<0.0001). CONCLUSION The use of chatbots in medicine could be relevant. More studies are needed to integrate them into clinical practice. LEVEL OF EVIDENCE: 4
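Each pairwise p-value above compares per-case scores between two rater groups. The sketch below shows one such comparison in Python; the use of a Mann-Whitney U test is an assumption (the abstract does not name the test), and the score vectors are illustrative placeholders rather than the study data.

```python
from scipy.stats import mannwhitneyu

# Illustrative per-case mean scores (out of a maximum of 12.4), not the study data
experts = [11.2, 10.8, 12.0, 11.5, 9.8, 11.7, 10.9, 11.3]
chatgpt_v4 = [10.9, 10.2, 11.8, 9.5, 8.7, 11.0, 10.4, 12.1]

stat, p_value = mannwhitneyu(experts, chatgpt_v4, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```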
Affiliation(s)
- Cedric Lebacle
- Kremlin-Bicetre Hospital, urology department, Kremlin-Bicetre, France
13
Shah YB, Goldberg ZN, Harness ED, Nash DB. Charting a Path to the Quintuple Aim: Harnessing AI to Address Social Determinants of Health. Int J Environ Res Public Health 2024; 21:718. [PMID: 38928964 PMCID: PMC11203467 DOI: 10.3390/ijerph21060718]
Abstract
The Quintuple Aim seeks to improve healthcare by addressing social determinants of health (SDOHs), which are responsible for 70-80% of medical outcomes. SDOH-related concerns have traditionally been addressed through referrals to social workers and community-based organizations (CBOs), but these pathways have had limited success in connecting patients with resources. Given that health inequity is expected to cost the United States nearly USD 300 billion by 2050, new artificial intelligence (AI) technology may aid providers in addressing SDOH. In this commentary, we present our experience with using ChatGPT to obtain SDOH management recommendations for archetypal patients in Philadelphia, PA. ChatGPT identified relevant SDOH resources and provided contact information for local organizations. Future exploration could improve AI prompts and integrate AI into electronic medical records to provide healthcare providers with real-time SDOH recommendations during appointments.
Affiliation(s)
- Yash B. Shah
- Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA (Z.N.G.)
- Zachary N. Goldberg
- Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA (Z.N.G.)
- Erika D. Harness
- Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA (Z.N.G.)
- David B. Nash
- Jefferson College of Population Health, Philadelphia, PA 19107, USA
14
Gwon YN, Kim JH, Chung HS, Jung EJ, Chun J, Lee S, Shim SR. The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation. JMIR Med Inform 2024; 12:e51187. [PMID: 38771247 PMCID: PMC11107769 DOI: 10.2196/51187]
Abstract
Background A large language model is a type of artificial intelligence (AI) model that opens up great possibilities for health care practice, research, and education, although scholars have emphasized the need to proactively address the issue of unvalidated and inaccurate information regarding its use. One of the best-known large language models is ChatGPT (OpenAI). It is believed to be of great help to medical research, as it facilitates more efficient data set analysis, code generation, and literature review, allowing researchers to focus on experimental design as well as drug discovery and development. Objective This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems, to enhance their efficiency and accuracy in health care settings. Methods The search results of a published systematic review by human experts on the treatment of Peyronie disease were selected as a benchmark, and the literature search formula of the study was applied to ChatGPT and Microsoft Bing AI as a comparison to human researchers. Peyronie disease typically presents with discomfort, curvature, or deformity of the penis in association with palpable plaques and erectile dysfunction. To evaluate the quality of individual studies derived from AI answers, we created a structured rating system based on bibliographic information related to the publications. We classified its answers into 4 grades if the title existed: A, B, C, and F. No grade was given for a fake title or no answer. Results From ChatGPT, 7 (0.5%) out of 1287 identified studies were directly relevant, whereas Bing AI resulted in 19 (40%) relevant studies out of 48, compared to the human benchmark of 24 studies. In the qualitative evaluation, ChatGPT had 7 grade A, 18 grade B, 167 grade C, and 211 grade F studies, and Bing AI had 19 grade A and 28 grade C studies. Conclusions This is the first study to compare AI and conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that the use of ChatGPT as a tool for real-time evidence generation is not yet accurate and feasible. Therefore, researchers should be cautious about using such AI. The limitations of this study using the generative pre-trained transformer model are that the search for research topics was not diverse and that it did not prevent the hallucination of generative AI. However, this study will serve as a standard for future studies by providing an index to verify the reliability and consistency of generative AI from a user's point of view. If the reliability and consistency of AI literature search services are verified, then the use of these technologies will help medical research greatly.
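The A-to-F grading described above hinges on whether each title returned by the chatbot corresponds to a real, findable publication. The sketch below shows one way such a first screening step could be automated against PubMed using the NCBI E-utilities esearch endpoint; the helper function, example titles, and found/not-found threshold are assumptions for illustration, and the study itself graded records manually against their bibliographic information.

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_hit_count(title: str) -> int:
    """Return how many PubMed records match the title as a [Title] query."""
    params = {"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"}
    reply = requests.get(ESEARCH, params=params, timeout=10).json()
    return int(reply["esearchresult"]["count"])

# Titles returned by a chatbot for a literature-search prompt (illustrative)
candidate_titles = [
    "Peyronie's disease: a review of etiology, diagnosis, and management",
    "A completely fabricated trial of collagenase in curved penises",  # likely a hallucination
]

for title in candidate_titles:
    found = pubmed_hit_count(title) > 0
    print(f"{'found' if found else 'not found':9s} {title}")
```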
Affiliation(s)
- Yong Nam Gwon
- Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Jae Heon Kim
- Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Hyun Soo Chung
- College of Medicine, Soonchunhyang University, Cheonan, Republic of Korea
- Eun Jee Jung
- College of Medicine, Soonchunhyang University, Cheonan, Republic of Korea
- Joey Chun
- Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Cranbrook Kingswood Upper School, Bloomfield Hills, MI, United States
- Serin Lee
- Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States
- Sung Ryul Shim
- Department of Biomedical Informatics, Konyang University College of Medicine, Daejeon, Republic of Korea
- Konyang Medical Data Research Group-KYMERA, Konyang University Hospital, Daejeon, Republic of Korea
15
Hershenhouse JS, Mokhtar D, Eppler MB, Rodler S, Storino Ramacciotti L, Ganjavi C, Hom B, Davis RJ, Tran J, Russo GI, Cocci A, Abreu A, Gill I, Desai M, Cacciamani GE. Accuracy, readability, and understandability of large language models for prostate cancer information to the public. Prostate Cancer Prostatic Dis 2024:10.1038/s41391-024-00826-y. [PMID: 38744934 DOI: 10.1038/s41391-024-00826-y]
Abstract
BACKGROUND Generative Pretrained Model (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer-related questions from both the physician and public perspective while optimizing outputs for patient consumption. METHODS Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity using a 5-point Likert scale. Furthermore, two independent reviewers evaluated the layperson summaries on a correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk). Participants rated the clarity and demonstrated their understanding through a multiple-choice question. RESULTS GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Readability of the layperson summaries was better than that of the original GPT outputs ([original ChatGPT vs. simplified ChatGPT, mean (SD), p-value] Flesch Reading Ease: 36.5 (9.1) vs. 70.2 (11.2), p < 0.0001; Gunning Fog: 15.8 (1.7) vs. 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level: 12.8 (1.2) vs. 7.4 (1.7), p < 0.0001; Coleman-Liau: 13.7 (2.1) vs. 8.6 (2.4), p = 0.0002; SMOG Index: 11.8 (1.2) vs. 6.7 (1.8), p < 0.0001; Automated Readability Index: 13.1 (1.4) vs. 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%). CONCLUSION GPT shows promise for accurate patient education on prostate cancer-related content, but the technology is not designed for delivering information to patients. Prompting the model to respond with accuracy, completeness, clarity and readability may enhance its utility when used in GPT-powered medical chatbots.
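The simplification step described above is a second pass through the model: each original answer is fed back with an instruction to rewrite it at a sixth-grade reading level, and both versions are then scored with standard readability formulas. A minimal sketch of that loop is shown below, assuming the OpenAI Python SDK and the textstat package; the model name, prompt wording, and metric choice are illustrative assumptions, not necessarily the study's exact protocol.

```python
import textstat
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def simplify(answer: str) -> str:
    """Ask the model to rewrite an answer at roughly a sixth-grade reading level."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative choice of model
        messages=[
            {"role": "user",
             "content": "Rewrite the following for a sixth-grade reading level, "
                        "keeping all medical facts unchanged:\n\n" + answer},
        ],
    )
    return response.choices[0].message.content

original = "Radical prostatectomy entails complete surgical excision of the prostate gland..."
summary = simplify(original)

for label, text in [("original", original), ("simplified", summary)]:
    print(label,
          "FRE", textstat.flesch_reading_ease(text),
          "FKGL", textstat.flesch_kincaid_grade(text))
```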
Affiliation(s)
- Jacob S Hershenhouse
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Daniel Mokhtar
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Michael B Eppler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Severin Rodler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Lorenzo Storino Ramacciotti
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Conner Ganjavi
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Brian Hom
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Ryan J Davis
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- John Tran
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Andrea Cocci
- Urology Section, University of Florence, Florence, Italy
- Andre Abreu
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Inderbir Gill
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Mihir Desai
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA.
16
Ozgor BY, Simavi MA. Accuracy and reproducibility of ChatGPT's free version answers about endometriosis. Int J Gynaecol Obstet 2024; 165:691-695. [PMID: 38108232 DOI: 10.1002/ijgo.15309]
Abstract
OBJECTIVE To evaluate, for the first time, the accuracy and reproducibility of the answers of ChatGPT's free version about endometriosis. METHODS Detailed internet searches were performed to identify frequently asked questions (FAQs) about endometriosis. Scientific questions were prepared in accordance with the European Society of Human Reproduction and Embryology (ESHRE) endometriosis guidelines. An experienced gynecologist gave a score of 1-4 for each ChatGPT answer. The repeatability of ChatGPT answers about endometriosis was analyzed by asking each question twice, and an answer was considered reproducible when both responses to the same question fell into the same score category. RESULTS A total of 91.4% (n = 71) of all FAQs were answered completely, accurately, and sufficiently. ChatGPT had the highest accuracy in the symptom and diagnosis category (94.1%, 16/17 questions) and the lowest accuracy in the treatment category (81.3%, 13/16 questions). Furthermore, of the 40 questions based on the ESHRE endometriosis guidelines, 27 (67.5%) were classified as grade 1, seven (17.5%) as grade 2, and six (15.0%) as grade 3. The reproducibility rate of FAQs in the prevention, symptoms and diagnosis, and complications categories was the highest (100% for all categories). The reproducibility rate was the lowest for questions based on the ESHRE endometriosis guidelines (70.0%). CONCLUSION ChatGPT accurately and satisfactorily responded to more than 90% of the questions about endometriosis, but to only 67.5% of the questions based on the ESHRE endometriosis guidelines.
Affiliation(s)
- Bahar Yuksel Ozgor
- Department of Obstetrics and Gynecology, Biruni University, Istanbul, Turkey
- Endometriosis Research and Support Organization (Endo Türkiye), Istanbul, Turkey
- Melek Azade Simavi
- Endometriosis Research and Support Organization (Endo Türkiye), Istanbul, Turkey
17
Cocci A, Pezzoli M, Lo Re M, Russo GI, Asmundo MG, Fode M, Cacciamani G, Cimino S, Minervini A, Durukan E. Quality of information and appropriateness of ChatGPT outputs for urology patients. Prostate Cancer Prostatic Dis 2024; 27:103-108. [PMID: 37516804 DOI: 10.1038/s41391-023-00705-y]
Abstract
BACKGROUND The proportion of health-related searches on the internet is continuously growing. ChatGPT, a natural language processing (NLP) tool created by OpenAI, has been gaining increasing user attention and can potentially be used as a source for obtaining information related to health concerns. This study aims to analyze the quality and appropriateness of ChatGPT's responses to urology case studies compared to those of a urologist. METHODS Data from 100 patient case studies, comprising patient demographics, medical history, and urologic complaints, were sequentially input into ChatGPT. A question was posed to determine the most likely diagnosis, suggested examinations, and treatment options. The responses generated by ChatGPT were then compared to those provided by a board-certified urologist who was blinded to ChatGPT's responses, and graded on a 5-point Likert scale based on accuracy, comprehensiveness, and clarity as criteria for appropriateness. The quality of information was graded based on section 2 of the DISCERN tool, and readability assessments were performed using the Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKGL) formulas. RESULTS 52% of all responses were deemed appropriate. ChatGPT provided more appropriate responses for non-oncology conditions (58.5%) compared to oncology (52.6%) and emergency urology cases (11.1%) (p = 0.03). The median score of the DISCERN tool was 15 (IQR = 5.3), corresponding to a quality score of poor. The ChatGPT responses demonstrated a college graduate reading level, as indicated by the median FRE score of 18 (IQR = 21) and the median FKGL score of 15.8 (IQR = 3). CONCLUSIONS ChatGPT serves as an interactive tool for providing medical information online, offering the possibility of enhancing health outcomes and patient satisfaction. Nevertheless, the insufficient appropriateness and poor quality of the responses on urology cases emphasize the importance of careful evaluation and use of NLP-generated outputs when addressing health-related concerns.
Affiliation(s)
- Andrea Cocci
- Urology Section, University of Florence, Florence, Italy.
- Marta Pezzoli
- Urology Section, University of Florence, Florence, Italy
- Mattia Lo Re
- Urology Section, University of Florence, Florence, Italy
- Mikkel Fode
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Department of Urology, Copenhagen University Hospital, Herlev and Gentofte Hospital, Copenhagen, Denmark
- Giovanni Cacciamani
- Institute of Urology, Keck School of Medicine, University of Southern California (USC), Los Angeles, CA, USA
- Emil Durukan
- Department of Urology, Copenhagen University Hospital, Herlev and Gentofte Hospital, Copenhagen, Denmark
18
Peng S, Wang D, Liang Y, Xiao W, Zhang Y, Liu L. AI-ChatGPT/GPT-4: An Booster for the Development of Physical Medicine and Rehabilitation in the New Era! Ann Biomed Eng 2024; 52:462-466. [PMID: 37500980 PMCID: PMC10859338 DOI: 10.1007/s10439-023-03314-x]
Abstract
Artificial intelligence (AI) has been driving the continuous development of the Physical Medicine and Rehabilitation (PM&R) field. The latest release of ChatGPT/GPT-4 has shown us that AI can potentially transform the healthcare industry. In this study, we propose various ways in which ChatGPT/GPT-4 could be applied in the field of PM&R in the future. ChatGPT/GPT-4 is an essential tool for physiatrists in the new era.
Affiliation(s)
- Shengxin Peng
- School of Rehabilitation Medicine of Binzhou Medical University, Yantai, China
- Deqiang Wang
- School of Rehabilitation Medicine of Binzhou Medical University, Yantai, China
- Lei Liu
- Department of Painology, The First Affiliated Hospital of Shandong First Medical University (Shandong Provincial Qianfoshan Hospital), Jinan, 250014, China.
19
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260]
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24.0%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
20
Weng SX, Zheng HH, Yu QX, Wang JC. A commentary on 'Re: Is ChatGPT a qualified thoracic surgeon assistant?'. Int J Surg 2024; 110:1287-1288. [PMID: 38016293 PMCID: PMC10871630 DOI: 10.1097/js9.0000000000000880]
Affiliation(s)
- Shou-xiang Weng
- Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai
- Hai-hong Zheng
- Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai
- Qing-xin Yu
- Department of Pathology, Ningbo Clinical Pathology Diagnosis Center, Ningbo City, Zhejiang Province, People’s Republic of China
- Jiao-chen Wang
- Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai
21
Khene ZE, Bigot P, Mathieu R, Rouprêt M, Bensalah K. Development of a Personalized Chat Model Based on the European Association of Urology Oncology Guidelines: Harnessing the Power of Generative Artificial Intelligence in Clinical Practice. Eur Urol Oncol 2024; 7:160-162. [PMID: 37474402 DOI: 10.1016/j.euo.2023.06.009]
Affiliation(s)
- Pierre Bigot
- Department of Urology, University of Angers, Angers, France
- Romain Mathieu
- Department of Urology, Rennes University Hospital, Rennes, France
- Morgan Rouprêt
- Department of Urology, La Pitié-Salpétrière Hospital, Paris, France
- Karim Bensalah
- Department of Urology, Rennes University Hospital, Rennes, France.
22
May M, Körner-Riffard K, Marszalek M, Eredics K. Would Uro_Chat, a Newly Developed Generative Artificial Intelligence Large Language Model, Have Successfully Passed the In-Service Assessment Questions of the European Board of Urology in 2022? Eur Urol Oncol 2024; 7:155-156. [PMID: 37716835 DOI: 10.1016/j.euo.2023.08.013]
Affiliation(s)
- Matthias May
- Department of Urology, St. Elisabeth Hospital Straubing, Brothers of Mercy Hospital, Straubing, Germany.
- Katharina Körner-Riffard
- Department of Urology, Caritas St. Josef Medical Centre, University of Regensburg, Regensburg, Germany
- Martin Marszalek
- Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria
- Klaus Eredics
- Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria
23
Choi J, Kim JW, Lee YS, Tae JH, Choi SY, Chang IH, Kim JH. Availability of ChatGPT to provide medical information for patients with kidney cancer. Sci Rep 2024; 14:1542. [PMID: 38233511 PMCID: PMC10794224 DOI: 10.1038/s41598-024-51531-8]
Abstract
ChatGPT is an advanced natural language processing technology that closely resembles human language. We evaluated whether ChatGPT could help patients understand kidney cancer and replace consultations with urologists. Two urologists developed ten questions commonly asked by patients with kidney cancer. The answers to these questions were produced using ChatGPT. The five-dimension SERVQUAL model was used to assess the service quality of ChatGPT. The survey was distributed to 103 urologists via email, and twenty-four urological oncologists specializing in kidney cancer, each seeing more than 20 kidney cancer cases in clinic per month, were included as experts. All respondents were physicians. We received 24 responses to the email survey (response rate: 23.3%). The appropriateness rate for all ten answers exceeded 60%. The answer to Q2 received the highest agreement (91.7%, etiology of kidney cancer), whereas the answer to Q8 had the lowest (62.5%, comparison with other cancers). The experts gave low assessment ratings (44.4% vs. 93.3%, p = 0.028) in the SERVQUAL assurance (certainty of total answers) dimension. Positive scores for the overall understandability of ChatGPT answers were assigned by 54.2% of responders, and 70.8% said that ChatGPT could not replace explanations provided by urologists. Our findings affirm that although ChatGPT answers to kidney cancer questions are generally accessible, they should not supplant the counseling of a urologist.
Affiliation(s)
- Joongwon Choi
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Jin Wook Kim
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Yong Seong Lee
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Jong Hyun Tae
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Se Young Choi
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- In Ho Chang
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, South Korea
- Jung Hoon Kim
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Seoul, South Korea.
- Chung-Ang University Gwangmyeong Hospital, 110 Deokan-Ro, Gwangmyeong-Si, Gyeonggi-Do, 14353, South Korea.
|
24
|
Ferreira RM. New evidence-based practice: Artificial intelligence as a barrier breaker. World J Methodol 2023; 13:384-389. [PMID: 38229944 PMCID: PMC10789101 DOI: 10.5662/wjm.v13.i5.384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Revised: 10/24/2023] [Accepted: 11/08/2023] [Indexed: 12/20/2023] Open
Abstract
The concept of evidence-based practice has persisted over many years and remains a cornerstone of clinical practice, representing the gold standard for optimal patient care. However, despite widespread recognition of its significance, its practical application faces various challenges and barriers, including a lack of skills in interpreting studies, limited resources, time constraints, limited linguistic competence, and more. Recently, we have witnessed the emergence of a groundbreaking technological revolution known as artificial intelligence. Although artificial intelligence has become increasingly integrated into our daily lives, some reluctance persists among certain segments of the public. This article explores the potential of artificial intelligence as a solution to some of the main barriers encountered in the application of evidence-based practice. It highlights how artificial intelligence can assist in staying updated with the latest evidence, enhancing clinical decision-making, addressing patient misinformation, and mitigating time constraints in clinical practice. The integration of artificial intelligence into evidence-based practice has the potential to revolutionize healthcare, leading to more precise diagnoses, personalized treatment plans, and improved doctor-patient interactions. This proposed synergy between evidence-based practice and artificial intelligence may necessitate adjustments to the core concept of evidence-based practice, heralding a new era in healthcare.
Affiliation(s)
- Ricardo Maia Ferreira
- Department of Sports and Exercise, Polytechnic Institute of Maia (N2i), Maia 4475-690, Porto, Portugal
- Department of Physiotherapy, Polytechnic Institute of Coimbra, Coimbra Health School, Coimbra 3046-854, Coimbra, Portugal
- Department of Physiotherapy, Polytechnic Institute of Castelo Branco, Dr. Lopes Dias Health School, Castelo Branco 6000-767, Castelo Branco, Portugal
- Sport Physical Activity and Health Research & Innovation Center, Polytechnic Institute of Viana do Castelo, Melgaço 4960-320, Viana do Castelo, Portugal
|
25
|
Huo B, Cacciamani GE, Collins GS, McKechnie T, Lee Y, Guyatt G. Reporting standards for the use of large language model-linked chatbots for health advice. Nat Med 2023; 29:2988. [PMID: 37957381 DOI: 10.1038/s41591-023-02656-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Affiliation(s)
- Bright Huo
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada.
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- AI Center at USC Urology, University of Southern California, Los Angeles, CA, USA
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK
- UK EQUATOR Centre, University of Oxford, Oxford, UK
- Tyler McKechnie
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Yung Lee
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Gordon Guyatt
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
|
26
|
Yu QX, Wu RC, Feng DC, Li DX. Re: ChatGPT encounters multiple opportunities and challenges in neurosurgery. Int J Surg 2023; 109:4393-4394. [PMID: 37720947 PMCID: PMC10720816 DOI: 10.1097/js9.0000000000000749] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 08/25/2023] [Indexed: 09/19/2023]
Affiliation(s)
- Qing-xin Yu
- Department of Pathology, Ningbo Clinical Pathology Diagnosis Center, Ningbo City, Zhejiang Province, People’s Republic of China
- Rui-cheng Wu
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People’s Republic of China
- De-chao Feng
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People’s Republic of China
- Deng-xiong Li
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People’s Republic of China
|
27
|
Rodler S, Kopliku R, Ulrich D, Kaltenhauser A, Casuscelli J, Eismann L, Waidelich R, Buchner A, Butz A, Cacciamani GE, Stief CG, Westhofen T. Patients' Trust in Artificial Intelligence-based Decision-making for Localized Prostate Cancer: Results from a Prospective Trial. Eur Urol Focus 2023:S2405-4569(23)00237-7. [PMID: 37923632 DOI: 10.1016/j.euf.2023.10.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Revised: 10/04/2023] [Accepted: 10/21/2023] [Indexed: 11/07/2023]
Abstract
BACKGROUND Artificial intelligence (AI) has the potential to enhance diagnostic accuracy and improve treatment outcomes. However, how AI will be integrated into clinical workflows and how patients perceive it remain unclear. OBJECTIVE To determine patients' trust in AI, their perception of urologists relying on AI, and their views on future diagnostic and therapeutic AI applications. DESIGN, SETTING, AND PARTICIPANTS A prospective trial was conducted involving patients who received diagnostic or therapeutic interventions for prostate cancer (PC). INTERVENTION Patients were asked to complete a survey before magnetic resonance imaging, prostate biopsy, or radical prostatectomy. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS The primary outcome was patient trust in AI. Secondary outcomes were the choice of AI in treatment settings and traits attributed to AI and urologists. RESULTS AND LIMITATIONS Data for 466 patients were analyzed. Cumulative affinity for technology was positively correlated with trust in AI (correlation coefficient 0.094; p = 0.04), whereas patient age, level of education, and subjective perception of illness were not (p > 0.05). The mean score (± standard deviation) for trust in capability was higher for physicians than for AI for responding in an individualized way when communicating a diagnosis (4.51 ± 0.76 vs 3.38 ± 1.07; mean difference [MD] 1.130, 95% confidence interval [CI] 1.010-1.250; t924 = 18.52, p < 0.001; Cohen's d = 1.040) and for explaining information in an understandable way (4.57 ± vs 3.18 ± 1.09; MD 1.392, 95% CI 1.275-1.509; t921 = 27.27, p < 0.001; Cohen's d = 1.216). Patients reported higher trust in a diagnosis made by AI controlled by a physician than in one made by AI not controlled by a physician (4.31 ± 0.88 vs 1.75 ± 0.93; MD 2.561, 95% CI 2.444-2.678; t925 = 42.89, p < 0.001; Cohen's d = 2.818). AI-assisted physicians (66.74%) were preferred over physicians alone (29.61%), physicians controlled by AI (2.36%), and AI alone (0.64%) for treatment in the current clinical scenario. CONCLUSIONS Trust in future diagnostic and therapeutic AI-based treatment relies on optimal integration with urologists as the human-machine interface to leverage both human and AI capabilities. PATIENT SUMMARY Artificial intelligence (AI) will play a role in diagnostic decisions for prostate cancer in the future. At present, patients prefer AI-assisted urologists over urologists alone, AI alone, and AI-controlled urologists. Specific traits of AI and of urologists could be used to optimize diagnosis and treatment for patients with prostate cancer.
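As a minimal sketch of how mean ratings can be compared together with an effect size, the example below runs an independent-samples t-test and computes Cohen's d on simulated 1-5 trust ratings; the data, sample sizes, and choice of test are assumptions for illustration and do not reproduce the study's actual analysis.

# Minimal sketch (not the study's analysis): comparing two sets of 1-5 trust
# ratings with an independent-samples t-test and a pooled-SD Cohen's d.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Hypothetical ratings; in the study these would be patient questionnaire scores.
ratings_physician = rng.normal(4.5, 0.8, size=466).clip(1, 5)
ratings_ai = rng.normal(3.4, 1.1, size=466).clip(1, 5)

t_stat, p_value = ttest_ind(ratings_physician, ratings_ai)

# Cohen's d with a pooled standard deviation
n1, n2 = len(ratings_physician), len(ratings_ai)
pooled_sd = np.sqrt(((n1 - 1) * ratings_physician.var(ddof=1)
                     + (n2 - 1) * ratings_ai.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (ratings_physician.mean() - ratings_ai.mean()) / pooled_sd
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, d = {cohens_d:.2f}")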
Affiliation(s)
- Severin Rodler
- Department of Urology, LMU University Hospital, Munich, Germany; USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Rega Kopliku
- Department of Urology, LMU University Hospital, Munich, Germany
- Daniel Ulrich
- Department of Informatics, Ludwig-Maximilian-Universität München, Munich, Germany
- Annika Kaltenhauser
- Department of Informatics, Ludwig-Maximilian-Universität München, Munich, Germany
- Lennert Eismann
- Department of Urology, LMU University Hospital, Munich, Germany
- Andreas Butz
- Department of Informatics, Ludwig-Maximilian-Universität München, Munich, Germany
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Thilo Westhofen
- Department of Urology, LMU University Hospital, Munich, Germany
|
28
|
Liu J, Zheng J, Cai X, Wu D, Yin C. A descriptive study based on the comparison of ChatGPT and evidence-based neurosurgeons. iScience 2023; 26:107590. [PMID: 37705958 PMCID: PMC10495632 DOI: 10.1016/j.isci.2023.107590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Revised: 06/21/2023] [Accepted: 08/04/2023] [Indexed: 09/15/2023] Open
Abstract
ChatGPT is an artificial intelligence product developed by OpenAI. This study investigated whether ChatGPT can respond in accordance with evidence-based medicine in neurosurgery. We generated 50 questions covering a range of neurosurgical diseases, and each question was posed three times to GPT-3.5 and GPT-4.0. We also recruited three neurosurgeons of high, middle, and low seniority to answer the same questions. The results were analyzed in terms of ChatGPT's overall performance score and mean scores by the items' specialty classification and by question type. In conclusion, GPT-3.5's ability to respond in accordance with evidence-based medicine was comparable to that of the neurosurgeon with low seniority, and GPT-4.0's ability was comparable to that of the neurosurgeon with high seniority. Although ChatGPT is not yet fully comparable to a neurosurgeon with high seniority, future upgrades could enhance its performance and abilities.
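A brief sketch of the kind of aggregation described above (overall scores plus mean scores by specialty classification and question type) is shown below using pandas; the column names and example values are assumptions for illustration, not study data.

# Illustrative aggregation of per-question ratings by model, specialty,
# and question type. Labels and scores are assumed, not study data.
import pandas as pd

scores = pd.DataFrame({
    "model": ["GPT-3.5", "GPT-3.5", "GPT-4.0", "GPT-4.0"],
    "specialty": ["cerebrovascular", "tumor", "cerebrovascular", "tumor"],
    "question_type": ["diagnosis", "treatment", "diagnosis", "treatment"],
    "score": [3.7, 4.1, 4.4, 4.6],
})

overall = scores.groupby("model")["score"].mean()
by_specialty = scores.groupby(["model", "specialty"])["score"].mean()
by_type = scores.groupby(["model", "question_type"])["score"].mean()
print(overall, by_specialty, by_type, sep="\n\n")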
Affiliation(s)
- Jiayu Liu
- Department of Neurosurgery, the First Medical Centre, Chinese PLA General Hospital, Beijing 100853, China
- Jiqi Zheng
- School of Health Humanities, Peking University, Beijing 100191, China
- Xintian Cai
- Department of Graduate School, Xinjiang Medical University, Urumqi 830001, China
- Dongdong Wu
- Department of Information, Daping Hospital, Army Medical University, Chongqing 400042, China
- Chengliang Yin
- Faculty of Medicine, Macau University of Science and Technology, Macau 999078, China
|
29
|
Zhang L, Tashiro S, Mukaino M, Yamada S. Use of artificial intelligence large language models as a clinical tool in rehabilitation medicine: a comparative test case. J Rehabil Med 2023; 55:jrm13373. [PMID: 37691497 PMCID: PMC10501385 DOI: 10.2340/jrm.v55.13373] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2023] [Accepted: 07/05/2023] [Indexed: 09/12/2023] Open
Abstract
OBJECTIVE To explore the potential use of artificial intelligence language models in formulating rehabilitation prescriptions and International Classification of Functioning, Disability and Health (ICF) codes. DESIGN Comparative study based on a single case report, compared with standard answers from a textbook. SUBJECTS A stroke case from a textbook. METHODS Chat Generative Pre-Trained Transformer-4 (ChatGPT-4) was used to generate comprehensive medical and rehabilitation prescription information and ICF codes for the stroke case. This information was compared with the standard answers from the textbook, and 2 licensed Physical Medicine and Rehabilitation (PMR) clinicians reviewed the artificial intelligence recommendations for further discussion. RESULTS ChatGPT-4 effectively formulated rehabilitation prescriptions and ICF codes for a typical stroke case, together with a rationale to support its recommendations, and generated this information in seconds. Compared with the standard answers, the large language model produced broader and more general prescriptions in terms of medical problems and management plans, rehabilitation problems and management plans, and rehabilitation goals. It also demonstrated the ability to propose specific approaches for each rehabilitation therapy. The language model made an error regarding the ICF category for the stroke case, but no mistakes were identified in the ICF codes assigned. CONCLUSION This test case suggests that artificial intelligence language models have potential use in facilitating clinical practice and education in the field of rehabilitation medicine.
Affiliation(s)
- Liang Zhang
- Department of Rehabilitation Medicine, Kyorin University School of Medicine, Japan
- Syoichi Tashiro
- Department of Rehabilitation Medicine, Kyorin University School of Medicine, Japan; Department of Rehabilitation Medicine, Keio University School of Medicine, Japan
- Masahiko Mukaino
- Department of Rehabilitation Medicine, Hokkaido University Hospital, Japan
- Shin Yamada
- Department of Rehabilitation Medicine, Kyorin University School of Medicine, Japan.
|
30
|
Peng L, Liang R, Yi F, Zhang S, Wu S. Re: Zhonghan Zhou, Xuesheng Wang, Xunhua Li, Limin Liao. Is ChatGPT an Evidence-based Doctor? Eur Urol. 2023;84:355-6. Eur Urol 2023; 84:e76. [PMID: 37271634 DOI: 10.1016/j.eururo.2023.04.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Accepted: 04/27/2023] [Indexed: 06/06/2023]
Affiliation(s)
- Lei Peng
- Department of Urology, Lanzhou University Second Hospital, Lanzhou, China; Motor Robotics Institute (MRI), South China Hospital, Health Science Center, Shenzhen University, Shenzhen, China
- Rui Liang
- Motor Robotics Institute (MRI), South China Hospital, Health Science Center, Shenzhen University, Shenzhen, China; Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Fulin Yi
- North Sichuan Medical College (University), Nanchong, China
- Shaohua Zhang
- Motor Robotics Institute (MRI), South China Hospital, Health Science Center, Shenzhen University, Shenzhen, China.
- Song Wu
- Department of Urology, Lanzhou University Second Hospital, Lanzhou, China; Motor Robotics Institute (MRI), South China Hospital, Health Science Center, Shenzhen University, Shenzhen, China.
|
31
|
Abstract
INTRODUCTION This study evaluated the knowledge of ChatGPT about osteoporosis. METHODS Osteoporosis-related frequently asked questions (FAQs) were compiled by examining websites frequently visited by patients, official hospital websites, and social media. Additional questions based on scientific data were prepared in accordance with the National Osteoporosis Guideline Group guidelines. A rater scored every ChatGPT answer from 1 to 4 (1 = the information was completely correct, 2 = the information was correct but insufficient, 3 = the answer contained both correct and incorrect information, and 4 = the answer consisted of completely incorrect information). The reproducibility of ChatGPT responses on osteoporosis was assessed by asking each question twice; an answer was considered reproducible if it received the same score both times. RESULTS ChatGPT responded to 72 FAQs with an accuracy rate of 80.6%. Accuracy was highest in the prevention category (91.7%) and the general knowledge category (85.8%). Only 19 of the 31 (61.3%) questions prepared according to the National Osteoporosis Guideline Group guidelines were answered correctly by ChatGPT, and two answers (6.4%) were categorized as grade 4. The reproducibility rate of the ChatGPT answers was 86.1% for the 72 FAQs and 83.9% for the National Osteoporosis Guideline Group questions. CONCLUSION The present study showed for the first time that ChatGPT provided adequate answers to more than 80% of FAQs about osteoporosis. However, its accuracy decreased to 61.3% for questions based on the National Osteoporosis Guideline Group guidelines.
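A minimal sketch of the accuracy and reproducibility calculations described above is given below; it assumes that grades 1 and 2 count as accurate and that reproducibility means both runs of a question receive the same grade. The grades themselves are invented for illustration.

# Hedged sketch of the grading scheme: answers are graded 1-4
# (1 = completely correct), each question is asked twice, and
# reproducibility means both runs receive the same grade.
# Assumption: grades 1 and 2 are counted as "accurate". Example data only.
run1 = [1, 2, 1, 3, 1, 4, 1, 2]
run2 = [1, 2, 2, 3, 1, 4, 1, 1]

accuracy = sum(grade in (1, 2) for grade in run1) / len(run1)
reproducibility = sum(a == b for a, b in zip(run1, run2)) / len(run1)
print(f"accuracy: {accuracy:.1%}, reproducibility: {reproducibility:.1%}")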
Affiliation(s)
- Cigdem Cinar
- Department of Interventional Physiatry, Biruni University, Istanbul, TUR
|
32
|
Kim SH, Tae JH, Chang IH, Kim TH, Myung SC, Nguyen TT, Choi J, Kim JH, Kim JW, Lee YS, Choi SY. Changes in patient perceptions regarding ChatGPT-written explanations on lifestyle modifications for preventing urolithiasis recurrence. Digit Health 2023; 9:20552076231203940. [PMID: 37780059 PMCID: PMC10540569 DOI: 10.1177/20552076231203940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Accepted: 09/09/2023] [Indexed: 10/03/2023] Open
Abstract
Purpose Artificial intelligence (AI) that imitates human language, such as ChatGPT, has had an impact across many multidisciplinary fields. Despite these innovations, however, it is unclear how well its implementation will assist patients in clinical situations. We evaluated changes in patient perceptions of AI before and after reading a ChatGPT-written explanation. Materials and methods In total, 24 South Korean patients receiving treatment for urolithiasis were surveyed with questionnaires. A ChatGPT-written explanatory note detailing lifestyle modifications for preventing urolithiasis recurrence was provided between the first and second questionnaires. The questionnaire was the Korean version of the General Attitudes toward Artificial Intelligence Scale, which includes positive- and negative-attitude items. Wilcoxon signed-rank tests were performed to compare questionnaire scores before and after the explanatory note. Linear regression with stepwise elimination was used to assess how well demographic variables predicted the outcomes. Results Total negative-attitude scores differed significantly between the pre- and post-surveys, whereas positive-attitude scores did not. Among the demographic variables, only education level significantly influenced the mean change in negative-attitude scores. Conclusions A shift toward more negative perceptions of AI was observed among patients with urolithiasis after they read the explanatory note provided by the AI chatbot, and patients with lower education levels expressed more negative responses. An explanatory note provided by an AI chatbot can therefore provoke an adverse change in perceptions of AI. Such negative responses must be considered when introducing and adapting new technology in health care; only by addressing patient perspectives will upgraded AI technology be integrated into medical care.
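For illustration, the sketch below shows how paired pre- and post-intervention attitude scores can be compared with a Wilcoxon signed-rank test; the scores are hypothetical and do not come from the study.

# Minimal sketch, assuming paired pre/post negative-attitude scores for the
# same patients. Values are hypothetical, not study data.
from scipy.stats import wilcoxon

pre_negative = [18, 20, 15, 22, 19, 17, 21, 16, 23, 18]
post_negative = [16, 18, 14, 19, 17, 16, 20, 14, 21, 17]

statistic, p_value = wilcoxon(pre_negative, post_negative)
print(f"Wilcoxon signed-rank: W = {statistic}, p = {p_value:.3f}")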
Affiliation(s)
- Seong Hwan Kim
- Department of Orthopedic Surgery, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- Jong Hyun Tae
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- In Ho Chang
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- Tae-Hyoung Kim
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- Soon Chul Myung
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
- Tuan Thanh Nguyen
- Department of Urology, Cho Ray Hospital, University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam
- Joongwon Choi
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Gyeonggi-do, Republic of Korea
- Jung Hoon Kim
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Gyeonggi-do, Republic of Korea
- Jin Wook Kim
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Gyeonggi-do, Republic of Korea
- Yong Seong Lee
- Department of Urology, Chung-Ang University Gwangmyeong Hospital, Chung-Ang University College of Medicine, Gyeonggi-do, Republic of Korea
- Se Young Choi
- Department of Urology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Republic of Korea
|