1
Halawani A, Almehmadi SG, Alhubaishy BA, Alnefaie ZA, Hasan MN. Empowering patients: how accurate and readable are large language models in renal cancer education. Front Oncol 2024; 14:1457516. [PMID: 39391252 PMCID: PMC11464325 DOI: 10.3389/fonc.2024.1457516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Accepted: 09/09/2024] [Indexed: 10/12/2024] Open
Abstract
Background The incorporation of Artificial Intelligence (AI) into the healthcare sector has fundamentally transformed patient care paradigms, particularly through the creation of patient education materials (PEMs) tailored to individual needs. This study aims to assess the accuracy and readability of AI-generated information on kidney cancer using ChatGPT-4.0, Gemini AI, and Perplexity AI, comparing these outputs to PEMs provided by the American Urological Association (AUA) and the European Association of Urology (EAU). The objective is to guide physicians in directing patients to accurate and understandable resources. Methods PEMs published by the AUA and EAU were collected and categorized. Kidney cancer-related queries, identified via Google Trends (GT), were input into ChatGPT-4.0, Gemini AI, and Perplexity AI. Four independent reviewers assessed the AI outputs for accuracy based on five distinct categories, employing a 5-point Likert scale. A readability evaluation was conducted utilizing established formulas, including the Gunning Fog Index (GFI), Simple Measure of Gobbledygook (SMOG), and Flesch-Kincaid Grade Level (FKGL). AI chatbots were then tasked with simplifying their outputs to achieve a sixth-grade reading level. Results The PEM published by the AUA was the most readable with a mean readability score of 9.84 ± 1.2, in contrast to EAU (11.88 ± 1.11), ChatGPT-4.0 (11.03 ± 1.76), Perplexity AI (12.66 ± 1.83), and Gemini AI (10.83 ± 2.31). The chatbots demonstrated the capability to simplify text to lower grade levels upon request, with ChatGPT-4.0 achieving a readability grade level ranging from 5.76 to 9.19, Perplexity AI from 7.33 to 8.45, and Gemini AI from 6.43 to 8.43. While official PEMs were considered accurate, the LLM-generated outputs exhibited an overall high level of accuracy with minor detail omissions and some information inaccuracies. Information related to kidney cancer treatment was found to be the least accurate among the evaluated categories.
Conclusion Although the PEM published by the AUA was the most readable, both authoritative PEMs and large language model (LLM)-generated outputs exceeded the recommended readability threshold for the general population. AI chatbots can simplify their outputs when explicitly instructed. However, notwithstanding their accuracy, LLM-generated outputs are susceptible to detail omission and inaccuracies. The variability in AI performance necessitates cautious use as an adjunctive tool in patient education.
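For readers unfamiliar with the three readability formulas named in this abstract, the following is a minimal sketch of their standard published forms. It is not the authors' tooling: the function names are hypothetical, and the inputs are assumed to be pre-computed counts (words, sentences, syllables, and words of three or more syllables), since the abstract does not say how those counts were obtained.

```python
import math


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (hypothetical helper)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade; polysyllables = words with three or more syllables."""
    # The count is normalized to a 30-sentence sample, as in the original formula.
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def gunning_fog_index(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index; complex_words = words with three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    print(round(flesch_kincaid_grade(words=100, sentences=5, syllables=150), 2))
    print(round(smog_grade(polysyllables=30, sentences=30), 2))
    print(round(gunning_fog_index(words=100, sentences=5, complex_words=10), 2))
```

All three return an approximate United States school grade level, so a sixth-grade target corresponds to a score near 6.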
Affiliation(s)
- Ziyad A. Alnefaie
  - Department of Urology, King Abdulaziz University, Jeddah, Saudi Arabia
- Mudhar N. Hasan
  - Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
  - Department of Urology, Mediclinic City Hospital, Dubai, United Arab Emirates
2
Halawani A, Mitchell A, Saffarzadeh M, Wong V, Chew BH, Forbes CM. Accuracy and Readability of Kidney Stone Patient Information Materials Generated by a Large Language Model Compared to Official Urologic Organizations. Urology 2024; 186:107-113. [PMID: 38395071 DOI: 10.1016/j.urology.2023.11.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 10/29/2023] [Accepted: 11/07/2023] [Indexed: 02/25/2024]
Abstract
OBJECTIVE To compare the readability and accuracy of large language model-generated patient information materials (PIMs) to those supplied by the American Urological Association (AUA), Canadian Urological Association (CUA), and European Association of Urology (EAU) for kidney stones. METHODS PIMs from the AUA, CUA, and EAU related to nephrolithiasis were obtained and categorized. The most frequent patient questions related to kidney stones were identified from an internet query and input into GPT-3.5 and GPT-4. PIMs and ChatGPT outputs were assessed for accuracy and readability using previously published indexes. We also assessed changes in ChatGPT outputs when a reading level was specified (grade 6). RESULTS Readability scores were better for PIMs from the CUA (grade level 10-12), AUA (8-10), or EAU (9-11) compared to the chatbot. GPT-3.5 had the worst readability scores at grade 13-14, and GPT-4 was likewise less readable than urologic organization PIMs with scores of 11-13. While organizational PIMs were deemed to be accurate, the chatbot had high accuracy with minor details omitted. GPT-4 was more accurate than GPT-3.5 on general stone information and on dietary and medical management of kidney stones, while both models had the same accuracy on surgical management of nephrolithiasis. CONCLUSION Current PIMs from major urologic organizations for kidney stones remain more readable than publicly available GPT outputs, but their reading level still exceeds the reading ability of the general population. Of the available PIMs for kidney stones, those from the AUA are the most readable. Although chatbot outputs for common kidney stone patient queries have a high degree of accuracy with minor omitted details, it is important for clinicians to understand their strengths and limitations.
Affiliation(s)
- Abdulghafour Halawani
  - Department of Urology, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Urological Sciences, University of British Columbia, Stone Centre at Vancouver General Hospital, Vancouver, British Columbia, Canada
- Alec Mitchell
  - Department of Urological Sciences, University of British Columbia, Stone Centre at Vancouver General Hospital, Vancouver, British Columbia, Canada
- Mohammadali Saffarzadeh
  - Department of Urological Sciences, University of British Columbia, Stone Centre at Vancouver General Hospital, Vancouver, British Columbia, Canada
- Victor Wong
  - Department of Urological Sciences, University of British Columbia, Stone Centre at Vancouver General Hospital, Vancouver, British Columbia, Canada
- Ben H Chew
  - Department of Urological Sciences, University of British Columbia, Stone Centre at Vancouver General Hospital, Vancouver, British Columbia, Canada
- Connor M Forbes
  - Department of Urological Sciences, University of British Columbia, Stone Centre at Vancouver General Hospital, Vancouver, British Columbia, Canada; Vancouver Prostate Centre, Vancouver, British Columbia, Canada.
3
Nguyen DD, Li T, Ferreira R, Baker Berjaoui M, Nguyen ALV, Chughtai B, Zorn KC, Bhojani N, Elterman D. Ablative minimally invasive surgical therapies for benign prostatic hyperplasia: A review of Aquablation, Rezum, and transperineal laser prostate ablation. Prostate Cancer Prostatic Dis 2024; 27:22-28. [PMID: 37081044 DOI: 10.1038/s41391-023-00669-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 01/27/2023] [Revised: 03/23/2023] [Accepted: 04/04/2023] [Indexed: 04/22/2023]
Abstract
INTRODUCTION Benign prostatic hyperplasia (BPH) is one of the most common diseases affecting men and can present with bothersome lower urinary tract symptoms (LUTS). Historically, transurethral resection of the prostate (TURP) has been considered the gold standard in the treatment of LUTS due to BPH. However, TURP and other traditional options for the surgical management of LUTS secondary to BPH are associated with high rates of sexual dysfunction. In the past decade, several novel technologies, including Aquablation therapy, convective water vapor therapy (Rezum), and transperineal prostate laser ablation (TPLA), have shown promising evidence of safety and efficacy while preserving sexual function. METHODS In this review, we discuss three ablative minimally invasive surgeries: Aquablation, Rezum, and TPLA. We review their techniques and safety, as well as perioperative and functional outcomes. We go into further detail regarding sexual function after these ablative minimally invasive surgical therapies. RESULTS Aquablation is a surgeon-guided, robot-executed, heat-free ablative waterjet procedure with sustained functional outcomes at 5 years while having no effect on sexual activity. Rezum is an innovative office-based, minimally invasive surgical option for BPH that delivers convective water vapor energy into prostate adenoma to ablate obstructing tissue. Rezum leads to significant improvements in Qmax and IPSS while preserving sexual function. TPLA is another office-based technology which uses a diode laser source to produce thermoablation. It leads to improvement in Qmax, IPSS, and QoL while preserving ejaculatory function. CONCLUSIONS Overall, ablative minimally invasive surgical therapies have demonstrated excellent safety and efficacy profiles while preserving sexual function. These modalities should be discussed with patients to ensure informed and shared decision-making.
Ablative minimally invasive surgical therapies may be particularly interesting to patients who value the preservation of their sexual function.
Affiliation(s)
- David-Dan Nguyen
  - Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
- Tiange Li
  - Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
- Roseanne Ferreira
  - Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada
- Anna-Lisa V Nguyen
  - Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Bilal Chughtai
  - Department of Urology, Weill Cornell Medical College, New York, NY, USA
- Kevin C Zorn
  - Division of Urology, Centre Hospitalier de l'Université de Montréal (CHUM), Montréal, QC, Canada
- Naeem Bhojani
  - Division of Urology, Centre Hospitalier de l'Université de Montréal (CHUM), Montréal, QC, Canada
- Dean Elterman
  - Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada.
4
Hotz T, Zhou MT, Reissmann ME, Apoj M, Wason SEL, Wang DS. Assessing the readability and quality of online information about benign prostatic hyperplasia. World J Urol 2023; 41:257-262. [PMID: 36416925 DOI: 10.1007/s00345-022-04223-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022] Open
Abstract
PURPOSE Benign prostatic hyperplasia (BPH) affects nearly half of men in their fifties. Patients often search the Internet to better understand their diagnosis, but online health information is not well regulated and can be difficult for patients to comprehend. This study aims to evaluate not only the readability but also the quality of online information about BPH, as well as the effect of commercial bias on readability and quality. METHODS Three search engines (Google, Bing, and DuckDuckGo) were used with broad search terms including "BPH," "BPH treatment," and "BPH surgery," to mimic a patient diagnosed with BPH seeking further information. A total of 204 websites were identified, of which 62 were unique. Among those unique websites, 23 were advertisements. Three readability formulas (Flesch-Kincaid Grade Level, Flesch-Kincaid Reading Ease, SMOG) were used to generate readability scores. The DISCERN standardized questionnaire was used to evaluate website quality. RESULTS The average reading level of online information about BPH was significantly higher than the level recommended by the American Medical Association (AMA) and United States Department of Health and Human Services (USDHHS). Advertisements had significantly easier readability than nonadvertisements. Average website quality was "excellent" for nonadvertisements, but only "fair" for advertisements. CONCLUSION Although advertisements may hold optimal search result positions and have better readability than nonadvertisements, they have biased and lower quality information. It is important to guide patients to high quality online information of appropriate reading level. Continued efforts should be made to create and share with patients high quality resources with improved readability to facilitate comprehension and minimize misinformation.
Affiliation(s)
- Tremearne Hotz
  - Boston University School of Medicine, Boston, MA, 02118, USA
- Maya T Zhou
  - Boston University School of Medicine, Boston, MA, 02118, USA
- Molly E Reissmann
  - Department of Urology, Boston University School of Medicine/Boston Medical Center, 725 Albany Street, Shapiro 3B, Boston, MA, 02118, USA
- Michel Apoj
  - Department of Urology, Boston University School of Medicine/Boston Medical Center, 725 Albany Street, Shapiro 3B, Boston, MA, 02118, USA
- Shaun E L Wason
  - Department of Urology, Boston University School of Medicine/Boston Medical Center, 725 Albany Street, Shapiro 3B, Boston, MA, 02118, USA
- David S Wang
  - Department of Urology, Boston University School of Medicine/Boston Medical Center, 725 Albany Street, Shapiro 3B, Boston, MA, 02118, USA.