1. de Hond A, Leeuwenberg T, Bartels R, van Buchem M, Kant I, Moons KG, van Smeden M. From text to treatment: the crucial role of validation for generative large language models in health care. Lancet Digit Health 2024; 6:e441-e443. [PMID: 38906607] [DOI: 10.1016/s2589-7500(24)00111-0]
Affiliation(s)
- Anne de Hond
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, Netherlands
- Tuur Leeuwenberg
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, Netherlands
- Richard Bartels
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, Netherlands; Department of Digital Health, University Medical Centre Utrecht, Utrecht, Netherlands
- Marieke van Buchem
- Clinical AI Implementation and Research Lab, Leiden University Medical Center, Leiden, Netherlands; Department of Information Technology and Digital Innovation, Leiden University Medical Center, Leiden, Netherlands
- Ilse Kant
- Department of Digital Health, University Medical Centre Utrecht, Utrecht, Netherlands
- Karel GM Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, Netherlands
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, Netherlands
2. Pozzi E, Velasquez DA, Varnum AA, Kava BR, Ramasamy R. Artificial Intelligence Modeling and Priapism. Curr Urol Rep 2024. [PMID: 38886246] [DOI: 10.1007/s11934-024-01221-9]
Abstract
PURPOSE OF REVIEW: This narrative review outlines the current evidence, challenges, and future perspectives of artificial intelligence (AI) in the diagnosis and management of priapism, a condition marked by prolonged and often painful erections that presents unique diagnostic and therapeutic challenges.
RECENT FINDINGS: Recent advances in AI offer promising solutions to the challenges of diagnosing and treating priapism. AI models have demonstrated the potential to predict the need for surgical intervention and to improve diagnostic accuracy, and integrating them into medical decision-making may also help predict long-term consequences. AI is already being applied in urology to enhance the diagnostic and treatment work-up of various conditions, including priapism. Traditional diagnostic approaches rely heavily on history-based assessment, which can delay treatment and risk long-term sequelae. To date, the role of AI in the management of priapism remains understudied: dependable, effective models that can reliably assist physicians in both diagnostic and treatment decisions have yet to be developed.
Affiliation(s)
- Edoardo Pozzi
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- University Vita-Salute San Raffaele, Milan, Italy
- Division of Experimental Oncology, Unit of Urology, URI, IRCCS Ospedale San Raffaele, Milan, Italy
- David A Velasquez
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- Alexandra Aponte Varnum
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- Bruce R Kava
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- Ranjith Ramasamy
- Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
3. Mao C, Zhang T. A commentary on can ChatGPT assist urologists manage overactive bladder? Int J Surg 2024; 110:3970-3971. [PMID: 38446864] [PMCID: PMC11175744] [DOI: 10.1097/js9.0000000000001261]
Affiliation(s)
- Changkun Mao
- Department of Urology, Anhui Provincial Children’s Hospital, Hefei, Anhui, People’s Republic of China
4. Hershenhouse JS, Mokhtar D, Eppler MB, Rodler S, Storino Ramacciotti L, Ganjavi C, Hom B, Davis RJ, Tran J, Russo GI, Cocci A, Abreu A, Gill I, Desai M, Cacciamani GE. Accuracy, readability, and understandability of large language models for prostate cancer information to the public. Prostate Cancer Prostatic Dis 2024. [PMID: 38744934] [DOI: 10.1038/s41391-024-00826-y]
Abstract
BACKGROUND: Generative Pretrained Transformer (GPT) chatbots have gained popularity since the public release of ChatGPT, and studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer-related questions from both the physician and public perspective while optimizing outputs for patient consumption.
METHODS: Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity on a 5-point Likert scale. Furthermore, two independent reviewers evaluated the layperson summaries on a correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk), where participants rated clarity and demonstrated their understanding through a multiple-choice question.
RESULTS: GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across the 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Mean readability of the layperson summaries was higher than that of the original GPT outputs (original ChatGPT vs simplified ChatGPT, mean (SD): Flesch Reading Ease 36.5 (9.1) vs 70.2 (11.2), p < 0.0001; Gunning Fog 15.8 (1.7) vs 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level 12.8 (1.2) vs 7.4 (1.7), p < 0.0001; Coleman-Liau 13.7 (2.1) vs 8.6 (2.4), p = 0.0002; SMOG Index 11.8 (1.2) vs 6.7 (1.8), p < 0.0001; Automated Readability Index 13.1 (1.4) vs 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%).
CONCLUSION: GPT shows promise for delivering correct patient education on prostate cancer-related content, but the technology is not designed for delivering medical information to patients. Prompting the model to respond with accuracy, completeness, clarity, and readability may enhance its utility in GPT-powered medical chatbots.
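The readability indices reported above are standard formula-based metrics that can be reproduced computationally. Below is a minimal sketch, not the authors' pipeline, assuming the open-source `textstat` Python package; the two sample texts are placeholders, not study material.

```python
# Sketch: comparing the readability of an original chatbot response with its
# simplified summary, using the indices named in the abstract.
# Assumes the `textstat` package (pip install textstat); texts are placeholders.
import textstat

original = "Prostate cancer is a malignancy arising from the prostatic epithelium."
simplified = "Prostate cancer is a cancer that starts in the prostate gland."

METRICS = {
    "Flesch Reading Ease": textstat.flesch_reading_ease,  # higher = easier
    "Gunning Fog": textstat.gunning_fog,                  # grade levels: lower = easier
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade,
    "Coleman-Liau": textstat.coleman_liau_index,
    "SMOG Index": textstat.smog_index,
    "Automated Readability Index": textstat.automated_readability_index,
}

for name, fn in METRICS.items():
    print(f"{name}: original={fn(original):.1f}  simplified={fn(simplified):.1f}")
```

Averaging each index over the nine scenario responses and comparing the two columns would reproduce the kind of paired comparison the abstract reports.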
Affiliation(s)
- Jacob S Hershenhouse
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Daniel Mokhtar
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Michael B Eppler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Severin Rodler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Lorenzo Storino Ramacciotti
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Conner Ganjavi
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Brian Hom
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Ryan J Davis
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- John Tran
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Andrea Cocci
- Urology Section, University of Florence, Florence, Italy
- Andre Abreu
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Inderbir Gill
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Mihir Desai
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
5. Deng QF, Bao YY, Yang YY, Mao CK. Re: David Musheyev, Alexander Pan, Stacy Loeb, Abdo E. Kabarriti. How Well Do Artificial Intelligence Chatbots Respond to the Top Search Queries About Urological Malignancies? Eur Urol 2023;85:13-6. Eur Urol 2024:S0302-2838(24)02312-1. [PMID: 38644145] [DOI: 10.1016/j.eururo.2024.02.031]
Affiliation(s)
- Qi-Fei Deng
- Department of Urology, Anhui Provincial Children's Hospital, Hefei, China
- Yuan-Yuan Bao
- Department of Electrocardiography, Anhui Maternal and Child Health Hospital, Hefei, China
- Yuan-Yuan Yang
- Department of Electrocardiography, Anhui Maternal and Child Health Hospital, Hefei, China
- Chang-Kun Mao
- Department of Urology, Anhui Provincial Children's Hospital, Hefei, China
6. Hershenhouse JS, Cacciamani GE. Comment on: Assessing ChatGPT's ability to answer questions pertaining to erectile dysfunction. Int J Impot Res 2024. [PMID: 38467775] [DOI: 10.1038/s41443-023-00821-2]
Affiliation(s)
- Jacob S Hershenhouse
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
7. Eppler M, Ganjavi C, Abreu A, Gill I, Cacciamani GE. Reply to Rui-Cheng Wu, Deng-Xiong Li, and De-Chao Feng's Letter to the Editor re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol. 2024;85:146-53. Eur Urol 2024; 85:e85-e86. [PMID: 38182492] [DOI: 10.1016/j.eururo.2023.12.007]
Affiliation(s)
- Michael Eppler
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Conner Ganjavi
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Andre Abreu
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Inderbir Gill
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Giovanni E Cacciamani
- USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
8. Ćirković A, Katz T. Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study. JMIR Form Res 2023; 7:e51798. [PMID: 38153777] [PMCID: PMC10784977] [DOI: 10.2196/51798]
Abstract
BACKGROUND: Refractive surgery research aims to optimally precategorize patients by their suitability for various types of surgery. Recent advances have led to the development of artificial intelligence-powered algorithms, including machine learning approaches, to assess risks and enhance workflow. Large language models (LLMs) like ChatGPT-4 (OpenAI LP) have emerged as potential general artificial intelligence tools that can assist across various disciplines, possibly including refractive surgery decision-making. However, their actual capabilities in precategorizing refractive surgery patients based on real-world parameters remain unexplored.
OBJECTIVE: This exploratory study aimed to validate ChatGPT-4's capabilities in precategorizing refractive surgery patients based on commonly used clinical parameters. The goal was to assess whether ChatGPT-4's performance when categorizing batch inputs is comparable to that of a refractive surgeon. A simple binary set of categories (patient suitable for laser refractive surgery or not) as well as a more detailed 6-category set were compared.
METHODS: Data from 100 consecutive patients from a refractive clinic were anonymized and analyzed. Parameters included age, sex, manifest refraction, visual acuity, and various corneal measurements and indices from Scheimpflug imaging. The study compared ChatGPT-4's performance with a clinician's categorizations using the Cohen κ coefficient, a chi-square test, a confusion matrix, accuracy, precision, recall, the F1-score, and the area under the receiver operating characteristic curve.
RESULTS: A statistically significant, noncoincidental accordance was found between ChatGPT-4's and the clinician's categorizations, with a Cohen κ coefficient of 0.399 for 6 categories (95% CI 0.256-0.537) and 0.610 for binary categorization (95% CI 0.372-0.792). The model showed temporal instability and response variability, however. The chi-square test on 6 categories indicated an association between the 2 raters' distributions (χ²(5) = 94.7, P < .001). Here, the accuracy was 0.68, precision 0.75, recall 0.68, and F1-score 0.70. For 2 categories, the accuracy was 0.88, precision 0.88, recall 0.88, F1-score 0.88, and area under the curve 0.79.
CONCLUSIONS: This study revealed that ChatGPT-4 exhibits potential as a precategorization tool in refractive surgery, showing promising agreement with clinician categorizations. Its main limitations include reliance on a single human rater, the small sample size, the instability and variability of ChatGPT's (OpenAI LP) output between iterations, and the nontransparency of the underlying models. The results encourage further exploration of LLMs like ChatGPT-4 in health care, particularly in decision-making processes that require understanding vast clinical data. Future research should focus on defining the model's accuracy with prompt and vignette standardization, detecting confounding factors, and comparing against other versions of ChatGPT-4 and other LLMs to pave the way for larger-scale validation and real-world implementation.
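The agreement and classification statistics named in this abstract are all standard and easy to compute once the two raters' labels are available. Below is a minimal sketch, not the study's actual analysis code, assuming scikit-learn and SciPy; the synthetic label arrays are placeholders for the clinician's and the model's binary categorizations.

```python
# Sketch: inter-rater agreement and classification metrics for two raters
# (clinician vs. LLM), as named in the abstract. Assumes scikit-learn and
# SciPy; the labels below are randomly generated placeholders, not study data.
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

rng = np.random.default_rng(0)
clinician = rng.integers(0, 2, size=100)             # binary: suitable / not suitable
model = np.where(rng.random(100) < 0.85,             # model agrees ~85% of the time
                 clinician, 1 - clinician)

cm = confusion_matrix(clinician, model)              # 2x2 contingency table
chi2, p, dof, _ = chi2_contingency(cm)               # association between raters

print("Cohen kappa:", cohen_kappa_score(clinician, model))
print(f"chi-square({dof}) = {chi2:.1f}, p = {p:.4f}")
print("accuracy:", accuracy_score(clinician, model))
print("precision:", precision_score(clinician, model))
print("recall:", recall_score(clinician, model))
print("F1:", f1_score(clinician, model))
print("AUC:", roc_auc_score(clinician, model))       # AUC from hard 0/1 predictions
```

For the 6-category comparison, the same calls apply with multiclass labels and `average="weighted"` passed to the precision, recall, and F1 functions; κ and the chi-square test work on the larger contingency table unchanged.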
Affiliation(s)
- Toam Katz
- Department of Ophthalmology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany