1
Dihan QA, Brown AD, Zaldivar AT, Chauhan MZ, Eleiwa TK, Hassan AK, Solyman O, Gise R, Phillips PH, Sallam AB, Elhusseiny AM. Advancing Patient Education in Idiopathic Intracranial Hypertension: The Promise of Large Language Models. Neurol Clin Pract 2025; 15:e200366. PMID: 39399571; PMCID: PMC11464234; DOI: 10.1212/cpj.0000000000200366.
Abstract
Background and Objectives: We evaluated the performance of 3 large language models (LLMs) in generating patient education materials (PEMs) and enhancing the readability of prewritten PEMs on idiopathic intracranial hypertension (IIH).

Methods: This cross-sectional comparative study compared 3 LLMs, ChatGPT-3.5, ChatGPT-4, and Google Bard, on their ability to generate PEMs on IIH using 3 prompts. Prompt A (control prompt): "Can you write a patient-targeted health information handout on idiopathic intracranial hypertension that is easily understandable by the average American?"; Prompt B (modifier statement + control prompt): "Given patient education materials are recommended to be written at a 6th-grade reading level, using the SMOG readability formula, can you write a patient-targeted health information handout on idiopathic intracranial hypertension that is easily understandable by the average American?"; and Prompt C: "Given patient education materials are recommended to be written at a 6th-grade reading level, using the SMOG readability formula, can you rewrite the following text to a 6th-grade reading level: [insert text]." We compared generated and rewritten PEMs, along with the first 20 eligible PEMs on IIH retrieved via Google search, on readability (Simple Measure of Gobbledygook [SMOG] and Flesch-Kincaid Grade Level [FKGL]), quality (DISCERN and Patient Education Materials Assessment Tool [PEMAT]), and accuracy (Likert misinformation scale).

Results: Generated PEMs were of high quality, understandability, and accuracy (median DISCERN score ≥4, PEMAT understandability ≥70%, Likert misinformation scale = 1). Only ChatGPT-4 generated PEMs at the specified 6th-grade reading level (SMOG: 5.5 ± 0.6; FKGL: 5.6 ± 0.7). With Prompt C, only ChatGPT-4 rewrote original published PEMs to below a 6th-grade reading level without a decrease in quality, understandability, or accuracy (SMOG: 5.6 ± 0.6; FKGL: 5.7 ± 0.8; p < 0.001; DISCERN ≥4; Likert misinformation = 1).

Discussion: LLMs, particularly ChatGPT-4, can produce high-quality, readable PEMs on IIH. They can also serve as supplementary tools to improve the readability of prewritten PEMs while maintaining quality and accuracy.
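The SMOG and FKGL metrics cited throughout this entry (and several below) are simple closed-form functions of sentence, word, and syllable counts. As a minimal illustration, the Python sketch below computes both grades from raw text; the formulas use the standard published coefficients, while the vowel-group syllable counter and the helper names are our own simplifications rather than the tooling used in the study (validated packages such as textstat implement more careful heuristics).

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per run of consecutive vowels.
    # (Assumption for illustration; real readability tools use more
    # careful syllable rules and exception lists.)
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability_grades(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)  # words with >= 3 syllables

    # SMOG grade = 3.1291 + 1.0430 * sqrt(30 * polysyllables / sentences)
    smog = 3.1291 + 1.0430 * (30 * polysyllables / len(sentences)) ** 0.5

    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    fkgl = (0.39 * len(words) / len(sentences)
            + 11.8 * sum(syllables) / len(words)
            - 15.59)
    return {"SMOG": round(smog, 1), "FKGL": round(fkgl, 1)}

# Example: a short patient-friendly passage scores well below grade 7.
print(readability_grades(
    "The doctor checks the pressure in your head. "
    "High pressure can hurt your eyes."
))
```

A grade below 7 on either metric corresponds to the "highly readable" threshold these studies target.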
Affiliation(s)
- Qais A Dihan, Andrew D Brown, Ana T Zaldivar, Muhammad Z Chauhan, Taher K Eleiwa, Amr K Hassan, Omar Solyman, Ryan Gise, Paul H Phillips, Ahmed B Sallam, and Abdelrahman M Elhusseiny
- Chicago Medical School (QAD), Rosalind Franklin University of Medicine and Science, North Chicago, IL; Department of Ophthalmology (QAD, MZC, PHP, ABS, AME), Harvey and Bernice Jones Eye Institute; UAMS College of Medicine (ADB), University of Arkansas for Medical Sciences, Little Rock, AR; Herbert Wertheim College of Medicine (ATZ), Florida International University; Mary & Edward Norton Library of Ophthalmology (ATZ), Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, FL; Department of Ophthalmology (TKE), Benha Faculty of Medicine, Benha University; Department of Ophthalmology (AKH), Faculty of Medicine, South Valley University, Qena; Department of Ophthalmology (OS), Research Institute of Ophthalmology, Giza, Egypt; Department of Ophthalmology (OS), Qassim University Medical City, Al-Qassim, Saudi Arabia; Department of Ophthalmology (RG, AME), Boston Children's Hospital, Harvard Medical School, MA; and Department of Ophthalmology (ABS), Faculty of Medicine, Ain Shams University, Cairo, Egypt
2
Gupta M, Gupta P, Ho C, Wood J, Guleria S, Virostko J. Can generative AI improve the readability of patient education materials at a radiology practice? Clin Radiol 2024; 79:e1366-e1371. PMID: 39266371; DOI: 10.1016/j.crad.2024.08.019.
Abstract
AIM: This study evaluated the readability of existing patient education materials and explored the potential of generative AI tools, such as ChatGPT-4 and Google Gemini, to simplify these materials to a sixth-grade reading level, in accordance with guidelines.

MATERIALS AND METHODS: Seven patient education documents were selected from a major radiology group. ChatGPT-4 and Gemini were given the documents and asked to reformulate them to target a sixth-grade reading level. Average reading level (ARL) and proportional word count (PWC) change were calculated, and a one-sample t-test was conducted (significance threshold p = 0.05). Three radiologists assessed the materials on a Likert scale for appropriateness, relevance, clarity, and information retention.

RESULTS: The original materials had an ARL of 11.72. ChatGPT ARL was 7.32 ± 0.76 (6/7 significant) and Gemini ARL was 6.55 ± 0.51 (7/7 significant). ChatGPT reduced word count by 15% ± 7%, with 95% judged to retain at least 75% of the information. Gemini reduced word count by 33% ± 7%, with 68% judged to retain at least 75% of the information. ChatGPT outputs were rated more appropriate (95% vs. 57%), clear (92% vs. 67%), and relevant (95% vs. 76%) than Gemini's. Interrater agreement differed significantly between ChatGPT (0.91) and Gemini (0.46).

CONCLUSION: Generative AI substantially enhanced the readability of the patient education materials, which in their original form did not meet the recommended sixth-grade ARL. Radiologist evaluations confirmed the appropriateness and relevance of the AI-simplified texts. This study highlights the capabilities of generative AI tools and the necessity of ongoing expert review to maintain content accuracy and suitability.
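The per-document significance testing described above can be reproduced in outline with scipy. A hedged sketch follows, assuming each rewritten document yields several grade-level estimates (one per readability index) that are tested against the original materials' average reading level; the abstract does not spell out the exact test construction, and all sample values below are invented for illustration.

```python
# A minimal sketch of a one-sample t-test like the one described above.
# Assumption: each simplified document yields several grade-level estimates
# (one per readability index), tested against the originals' average
# reading level (ARL). The grade values below are illustrative only.
from scipy import stats

original_arl = 11.72                            # reported ARL of the originals
simplified_grades = [7.1, 6.8, 7.9, 7.4, 6.9]   # hypothetical index scores

t_stat, p_value = stats.ttest_1samp(simplified_grades, popmean=original_arl)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:  # the study's significance threshold
    print("Reading level differs significantly from the original ARL.")
```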
Affiliation(s)
- M Gupta, C Ho, J Wood, and S Guleria
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA
- P Gupta
- The University of Texas at Austin, Austin, TX, USA
- J Virostko
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA; The University of Texas at Austin, Dell Medical School, Livestrong Cancer Institutes, USA; The University of Texas at Austin, Dell Medical School, Department of Oncology, USA; The University of Texas at Austin, Oden Institute for Computational Engineering and Sciences, USA
3
Kalaw FGP, Baxter SL. Ethical considerations for large language models in ophthalmology. Curr Opin Ophthalmol 2024; 35:438-446. PMID: 39259616; PMCID: PMC11427135; DOI: 10.1097/icu.0000000000001083.
Abstract
PURPOSE OF REVIEW: This review aims to summarize and discuss the ethical considerations regarding large language model (LLM) use in the field of ophthalmology.

RECENT FINDINGS: This review of 47 articles on LLM applications in ophthalmology highlights their diverse potential uses, including education, research, clinical decision support, and surgical assistance (as an aid in operative notes). We also review ethical considerations such as the inability of LLMs to interpret data accurately, the risk of promoting controversial or harmful recommendations, and breaches of data privacy. These concerns imply the need for cautious integration of artificial intelligence in healthcare, emphasizing human oversight, transparency, and accountability to mitigate risks and uphold ethical standards.

SUMMARY: The integration of LLMs in ophthalmology offers potential advantages, such as aiding clinical decision support and facilitating medical education through their ability to process queries and analyze ophthalmic imaging and clinical cases. However, their use also raises ethical concerns regarding data privacy, potential misinformation, and biases inherent in the training datasets. These concerns should be addressed to optimize LLM utility in the healthcare setting; above all, responsible and careful use by consumers should be promoted.
Affiliation(s)
- Fritz Gerald P Kalaw and Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute
- Department of Biomedical Informatics, University of California San Diego Health System, University of California San Diego, La Jolla, California, USA
4
Busch F, Hoffmann L, Dos Santos DP, Makowski MR, Saba L, Prucker P, Hadamitzky M, Navab N, Kather JN, Truhn D, Cuocolo R, Adams LC, Bressem KK. Large language models for structured reporting in radiology: past, present, and future. Eur Radiol 2024. PMID: 39438330; DOI: 10.1007/s00330-024-11107-6.
Abstract
Structured reporting (SR) has long been a goal in radiology to standardize and improve the quality of radiology reports. Despite evidence that SR reduces errors, enhances comprehensiveness, and increases adherence to guidelines, its widespread adoption has been limited. Recently, large language models (LLMs) have emerged as a promising solution for automating and facilitating SR. This narrative review therefore provides an overview of LLMs for SR in radiology and beyond. We found that the current literature on LLMs for SR is limited, comprising ten studies on the generative pre-trained transformer (GPT)-3.5 (n = 5) and/or GPT-4 (n = 8); two studies additionally examined the performance of Perplexity and Bing Chat or IT5. All studies reported promising results and acknowledged the potential of LLMs for SR, with six of the ten demonstrating the feasibility of multilingual applications. Building on these findings, we discuss limitations, regulatory challenges, and further applications of LLMs in radiology report processing, encompassing four main areas: documentation, translation and summarization, clinical evaluation, and data mining. In conclusion, this review underscores the transformative potential of LLMs to improve efficiency and accuracy in SR and radiology report processing.

KEY POINTS: Question: How can LLMs help make SR in radiology more ubiquitous? Findings: The current literature leveraging LLMs for SR is sparse but shows promising results, including the feasibility of multilingual applications. Clinical relevance: LLMs have the potential to transform radiology report processing and enable the widespread adoption of SR. However, their future role in clinical practice depends on overcoming current limitations and regulatory challenges, including opaque algorithms and training data.
Affiliation(s)
- Felix Busch
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Lena Hoffmann
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Daniel Pinto Dos Santos
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Institute of Diagnostic and Interventional Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- Marcus R Makowski
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Luca Saba
- Department of Radiology, Azienda Ospedaliero Universitaria (A.O.U.), Cagliari, Italy
- Philipp Prucker
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Martin Hadamitzky
- School of Medicine and Health, Institute for Cardiovascular Radiology and Nuclear Medicine, German Heart Center Munich, TUM University Hospital, Technical University of Munich, Munich, Germany
- Nassir Navab
- Chair for Computer Aided Medical Procedures & Augmented Reality, TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Jakob Nikolas Kather
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany
- Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany
- Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
- Renato Cuocolo
- Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy
- Lisa C Adams
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Keno K Bressem
- School of Medicine and Health, Institute for Cardiovascular Radiology and Nuclear Medicine, German Heart Center Munich, TUM University Hospital, Technical University of Munich, Munich, Germany
5
Bellanda VCF, Santos MLD, Ferraz DA, Jorge R, Melo GB. Applications of ChatGPT in the diagnosis, management, education, and research of retinal diseases: a scoping review. Int J Retina Vitreous 2024; 10:79. PMID: 39420407; PMCID: PMC11487877; DOI: 10.1186/s40942-024-00595-9.
Abstract
PURPOSE: This scoping review aims to explore the current applications of ChatGPT in the retina field, highlighting its potential, challenges, and limitations.

METHODS: A comprehensive literature search was conducted across multiple databases, including PubMed, Scopus, MEDLINE, and Embase, to identify relevant articles published from 2022 onwards. The inclusion criteria focused on studies evaluating the use of ChatGPT in retinal healthcare. Data were extracted and synthesized to map the scope of ChatGPT's applications in retinal care, categorizing articles into practical application areas such as academic research, charting, coding, diagnosis, disease management, and patient counseling.

RESULTS: A total of 68 articles were included in the review, distributed across several categories: 8 related to academics and research, 5 to charting, 1 to coding and billing, 44 to diagnosis, 49 to disease management, 2 to literature consulting, 23 to medical education, and 33 to patient counseling; many articles fell into multiple categories because of overlapping topics. The findings indicate that while ChatGPT shows significant promise in areas such as medical education and diagnostic support, concerns regarding accuracy, reliability, and the potential for misinformation remain prevalent.

CONCLUSION: ChatGPT offers substantial potential for advancing retinal healthcare by supporting clinical decision-making, enhancing patient education, and automating administrative tasks. However, its current limitations, particularly in clinical accuracy and the risk of generating misinformation, necessitate cautious integration into practice with continuous oversight from healthcare professionals. Future developments should focus on improving accuracy, incorporating up-to-date medical guidelines, and minimizing the risks associated with AI-driven healthcare tools.
Affiliation(s)
- Victor C F Bellanda
- Ribeirão Preto Medical School, University of São Paulo, 3900 Bandeirantes Ave, Ribeirão Preto, SP, 14049-900, Brazil
- Rodrigo Jorge
- Ribeirão Preto Medical School, University of São Paulo, 3900 Bandeirantes Ave, Ribeirão Preto, SP, 14049-900, Brazil
- Gustavo Barreto Melo
- Sergipe Eye Hospital, Aracaju, SE, Brazil
- Paulista School of Medicine, Federal University of São Paulo, São Paulo, SP, Brazil
6
Dihan Q, Chauhan MZ, Eleiwa TK, Brown AD, Hassan AK, Khodeiry MM, Elsheikh RH, Oke I, Nihalani BR, VanderVeen DK, Sallam AB, Elhusseiny AM. Large language models: a new frontier in paediatric cataract patient education. Br J Ophthalmol 2024; 108:1470-1476. PMID: 39174290; DOI: 10.1136/bjo-2024-325252.
Abstract
BACKGROUND/AIMS: In this cross-sectional comparative study, we evaluated the ability of three large language models (LLMs) (ChatGPT-3.5, ChatGPT-4, and Google Bard) to generate novel patient education materials (PEMs) and improve the readability of existing PEMs on paediatric cataract.

METHODS: We compared the LLMs' responses to three prompts. Prompt A requested they write a handout on paediatric cataract that was 'easily understandable by an average American.' Prompt B modified prompt A and requested the handout be written at a 'sixth-grade reading level, using the Simple Measure of Gobbledygook (SMOG) readability formula.' Prompt C asked them to rewrite existing PEMs on paediatric cataract 'to a sixth-grade reading level using the SMOG readability formula.' Responses were compared on quality (DISCERN; 1 (low quality) to 5 (high quality)), understandability and actionability (Patient Education Materials Assessment Tool; ≥70%: understandable, ≥70%: actionable), accuracy (Likert misinformation scale; 1 (no misinformation) to 5 (high misinformation)), and readability (SMOG and Flesch-Kincaid Grade Level (FKGL); grade level <7: highly readable).

RESULTS: All LLM-generated responses were of high quality (median DISCERN ≥4), understandability (≥70%), and accuracy (Likert = 1). No LLM-generated responses were actionable (<70%). ChatGPT-3.5 and ChatGPT-4 prompt B responses were more readable than prompt A responses (p<0.001). ChatGPT-4 generated more readable responses (lower SMOG and FKGL scores; 5.59 ± 0.5 and 4.31 ± 0.7, respectively) than the other two LLMs (p<0.001) and consistently rewrote existing PEMs to or below the specified sixth-grade reading level (SMOG: 5.14 ± 0.3).

CONCLUSION: LLMs, particularly ChatGPT-4, proved valuable in generating high-quality, readable, accurate PEMs and in improving the readability of existing materials on paediatric cataract.
Affiliation(s)
- Qais Dihan
- Rosalind Franklin University of Medicine and Science Chicago Medical School, North Chicago, Illinois, USA
- Department of Ophthalmology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Muhammad Z Chauhan
- Department of Ophthalmology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Taher K Eleiwa
- Department of Ophthalmology, Benha University, Benha, Egypt
- Andrew D Brown
- University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Amr K Hassan
- Department of Ophthalmology, South Valley University, Qena, Egypt
- Mohamed M Khodeiry
- Department of Ophthalmology, University of Kentucky, Lexington, Kentucky, USA
- Reem H Elsheikh
- Department of Ophthalmology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Isdin Oke
- Department of Ophthalmology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Bharti R Nihalani
- Department of Ophthalmology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Deborah K VanderVeen
- Department of Ophthalmology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Ahmed B Sallam
- Department of Ophthalmology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Abdelrahman M Elhusseiny
- Department of Ophthalmology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Department of Ophthalmology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
7
Tailor PD, D'Souza HS, Li H, Starr MR. Vision of the future: large language models in ophthalmology. Curr Opin Ophthalmol 2024; 35:391-402. PMID: 38814572; DOI: 10.1097/icu.0000000000001062.
Abstract
PURPOSE OF REVIEW: Large language models (LLMs) are rapidly entering the landscape of medicine, in areas from patient interaction to clinical decision-making. This review discusses the evolving role of LLMs in ophthalmology, focusing on their current applications and future potential in enhancing ophthalmic care.

RECENT FINDINGS: LLMs in ophthalmology have demonstrated potential for improving patient communication and aiding preliminary diagnostics because of their ability to process complex language and generate human-like, domain-specific interactions. However, some studies have shown potential for harm, and there have been no prospective real-world studies evaluating the safety and efficacy of LLMs in practice.

SUMMARY: While current applications are largely theoretical and require rigorous safety testing before implementation, LLMs show promise for augmenting the quality and efficiency of patient care. Challenges such as data privacy and user acceptance must be overcome before LLMs can be fully integrated into clinical practice.
Affiliation(s)
- Haley S D'Souza
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Hanzhou Li
- Department of Radiology, Emory University, Atlanta, Georgia, USA
- Matthew R Starr
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
8
Carlà MM, Gambini G, Baldascino A, Boselli F, Giannuzzi F, Margollicci F, Rizzo S. Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison. Graefes Arch Clin Exp Ophthalmol 2024; 262:2945-2959. PMID: 38573349; PMCID: PMC11377518; DOI: 10.1007/s00417-024-06470-5.
Abstract
PURPOSE: The aim of this study was to assess the capability of ChatGPT-4 and Google Gemini to analyze detailed glaucoma case descriptions and suggest an accurate surgical plan.

METHODS: Retrospective analysis of 60 medical records of surgical glaucoma cases, divided into "ordinary" (n = 40) and "challenging" (n = 20) scenarios. Case descriptions were entered into the ChatGPT and Bard (Gemini) interfaces with the question "What kind of surgery would you perform?" and repeated three times to analyze the consistency of the answers. After collecting the answers, we assessed the level of agreement with the unified opinion of three glaucoma surgeons. We also graded the quality of the responses with scores from 1 (poor quality) to 5 (excellent quality) according to the Global Quality Score (GQS) and compared the results.

RESULTS: ChatGPT's surgical choice was consistent with that of the glaucoma specialists in 35/60 cases (58%), compared with 19/60 (32%) for Gemini (p = 0.0001). Gemini was unable to complete the task in 16 cases (27%). Trabeculectomy was the most frequent choice for both chatbots (53% and 50% for ChatGPT and Gemini, respectively). In "challenging" cases, ChatGPT agreed with the specialists in 9/20 choices (45%), outperforming Google Gemini (4/20, 20%). Overall, GQS scores were 3.5 ± 1.2 for ChatGPT and 2.1 ± 1.5 for Gemini (p = 0.002); the difference was even more marked in "challenging" cases alone (3.0 ± 1.5 vs. 1.5 ± 1.4, p = 0.001).

CONCLUSION: ChatGPT-4 showed good analytical performance on glaucoma surgical cases, both ordinary and challenging. Google Gemini, by contrast, showed strong limitations in this setting, with high rates of imprecise or missed answers.
Affiliation(s)
- Matteo Mario Carlà, Gloria Gambini, Antonio Baldascino, Francesco Boselli, Federico Giannuzzi, Fabio Margollicci, and Stanislao Rizzo
- Ophthalmology Department, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy
- Ophthalmology Department, Catholic University "Sacro Cuore", Largo A. Gemelli 8, Rome, Italy
9
Dihan Q, Chauhan MZ, Eleiwa TK, Hassan AK, Sallam AB, Khouri AS, Chang TC, Elhusseiny AM. Using Large Language Models to Generate Educational Materials on Childhood Glaucoma. Am J Ophthalmol 2024; 265:28-38. PMID: 38614196; DOI: 10.1016/j.ajo.2024.04.004.
Abstract
PURPOSE: To evaluate the quality, readability, and accuracy of large language model (LLM)-generated patient education materials (PEMs) on childhood glaucoma, and the models' ability to improve the readability of existing online information.

DESIGN: Cross-sectional comparative study.

METHODS: We evaluated responses of ChatGPT-3.5, ChatGPT-4, and Bard to 3 separate prompts requesting that they write PEMs on "childhood glaucoma." Prompt A required PEMs be "easily understandable by the average American." Prompt B required that PEMs be written "at a 6th-grade level using Simple Measure of Gobbledygook (SMOG) readability formula." We then compared responses' quality (DISCERN questionnaire, Patient Education Materials Assessment Tool [PEMAT]), readability (SMOG, Flesch-Kincaid Grade Level [FKGL]), and accuracy (Likert Misinformation scale). To assess improvement of the readability of existing online information, Prompt C requested that each LLM rewrite 20 resources from a Google search of the keyword "childhood glaucoma" to the American Medical Association-recommended "6th-grade level." Rewrites were compared on key metrics such as readability, complex words (≥3 syllables), and sentence count.

RESULTS: All 3 LLMs generated PEMs that were of high quality, understandability, and accuracy (DISCERN ≥4, ≥70% PEMAT understandability, Misinformation score = 1). Prompt B responses were more readable than Prompt A responses for all 3 LLMs (P ≤ .001). ChatGPT-4 generated the most readable PEMs compared with ChatGPT-3.5 and Bard (P ≤ .001). Although Prompt C responses showed consistent reductions in mean SMOG and FKGL scores, only ChatGPT-4 achieved the specified 6th-grade reading level (4.8 ± 0.8 and 3.7 ± 1.9, respectively).

CONCLUSIONS: LLMs can serve as strong supplemental tools for generating high-quality, accurate, and novel PEMs, and for improving the readability of existing PEMs on childhood glaucoma.
Affiliation(s)
- Qais Dihan
- Chicago Medical School (Q.D.), Rosalind Franklin University of Medicine and Science, North Chicago, Illinois, USA; Department of Ophthalmology (Q.D., M.Z.C., A.B.S., A.M.E.), Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Muhammad Z Chauhan
- Department of Ophthalmology (Q.D., M.Z.C., A.B.S., A.M.E.), Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Taher K Eleiwa
- Department of Ophthalmology (T.K.E.), Benha Faculty of Medicine, Benha University, Benha, Egypt
- Amr K Hassan
- Department of Ophthalmology (A.K.H.), Faculty of Medicine, South Valley University, Qena, Egypt
- Ahmed B Sallam
- Department of Ophthalmology (Q.D., M.Z.C., A.B.S., A.M.E.), Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA; Department of Ophthalmology (A.B.S.), Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Albert S Khouri
- Institute of Ophthalmology & Visual Science (A.S.K.), Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Ta C Chang
- Department of Ophthalmology (T.C.C.), Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida, USA
- Abdelrahman M Elhusseiny
- Department of Ophthalmology (Q.D., M.Z.C., A.B.S., A.M.E.), Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA; Department of Ophthalmology (A.M.E.), Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
10
Eleiwa TK, Dihan QA, Brown AD, Zaldivar AT, Abdelnaem SE, Sallam AB, Phillips PH, Elnahry AG, Elhusseiny AM. Quality, Reliability, Readability, and Accountability of Online Information on Leukocoria. J Pediatr Ophthalmol Strabismus 2024; 61:332-338. PMID: 38815099; DOI: 10.3928/01913913-20240425-02.
Abstract
PURPOSE: To evaluate the quality, reliability, and readability of online patient educational materials on leukocoria.

METHODS: In this cross-sectional study, the Google search engine was queried for the terms "leukocoria" and "white pupil." The first 50 search outcomes for each term were evaluated against predefined inclusion criteria, excluding duplicates, peer-reviewed papers, forum posts, paywalled content, and multimedia links. Sources were categorized as "institutional" or "private." Three independent raters assessed each website for quality and reliability using the DISCERN, Health on the Net Code of Conduct (HONcode), and JAMA criteria. Readability was evaluated using seven formulas: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG) Index, Automated Readability Index (ARI), Linsear Write (LW), Gunning Fog Index (GFI), and Coleman-Liau Index (CLI).

RESULTS: A total of 51 websites were included. Quality, assessed by the DISCERN tool, showed a median score of 4, denoting moderate to high quality, with no significant differences between institutional and private sites or between search terms. HONcode scores indicated variable reliability and trustworthiness (median: 10, range: 3 to 16), with institutional sites excelling in financial disclosure and ad differentiation. Both institutional and private sites performed well in reliability and accountability as measured by the JAMA benchmark criteria (median: 3; range: 1 to 4). Readability, averaging an 11.3 ± 3.7 grade level, did not differ significantly between site types or search terms, consistently falling short of the recommended sixth-grade level for patient educational materials.

CONCLUSIONS: The patient educational materials on leukocoria demonstrated moderate to high quality, commendable reliability, and accountability. However, the readability scores were above the recommended level for the layperson. [J Pediatr Ophthalmol Strabismus. 2024;61(5):332-338.]
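Beyond SMOG and FKGL (sketched under entry 1), the other indices named above are likewise closed-form functions of per-text counts. The sketch below gives the standard published coefficients for four of them; the function and argument names are our own for illustration, and Linsear Write is omitted because it requires a word-difficulty split rather than simple counts.

```python
# Standard formulas for four of the indices used above, as functions of
# per-text counts: letters/characters, words, sentences, syllables, and
# complex words (words with >= 3 syllables). Coefficients are the
# published ones; names are ours for illustration.

def flesch_reading_ease(words, sentences, syllables):
    # Higher = easier; roughly 60-70 corresponds to "plain English."
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def gunning_fog(words, sentences, complex_words):
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def coleman_liau(letters, words, sentences):
    L = 100 * letters / words      # average letters per 100 words
    S = 100 * sentences / words    # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

def automated_readability_index(characters, words, sentences):
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

# Example with made-up counts for a short handout:
print(f"FRE: {flesch_reading_ease(120, 10, 160):.1f}, "
      f"GFI: {gunning_fog(120, 10, 8):.1f}")
```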
11
Geantă M, Bădescu D, Chirca N, Nechita OC, Radu CG, Rascu S, Rădăvoi D, Sima C, Toma C, Jinga V. The Potential Impact of Large Language Models on Doctor-Patient Communication: A Case Study in Prostate Cancer. Healthcare (Basel) 2024; 12:1548. PMID: 39120251; PMCID: PMC11311818; DOI: 10.3390/healthcare12151548.
Abstract
BACKGROUND: In recent years, the integration of large language models (LLMs) into healthcare has emerged as a revolutionary approach to enhancing doctor-patient communication, particularly in the management of diseases such as prostate cancer.

METHODS: We evaluated the effectiveness of three prominent LLMs, ChatGPT (3.5), Gemini (Pro), and Co-Pilot (the free version), against the official Romanian Patient's Guide on prostate cancer. Employing a randomized and blinded method, the study engaged eight medical professionals to assess the responses of these models for accuracy, timeliness, comprehensiveness, and user-friendliness. The primary objective was to explore whether LLMs, when operating in Romanian, offer comparable or superior performance to the Patient's Guide, given their potential to personalize communication and improve informational accessibility for patients.

RESULTS: The LLMs, particularly ChatGPT, generally provided more accurate and user-friendly information than the Guide.

CONCLUSIONS: The findings suggest significant potential for LLMs to enhance healthcare communication by providing accurate and accessible information. However, variability in performance across models underscores the need for tailored implementation strategies. We highlight the importance of integrating LLMs with a nuanced understanding of their capabilities and limitations to optimize their use in clinical settings.
Affiliation(s)
- Marius Geantă
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Center for Innovation in Medicine, 42J Theodor Pallady Bvd., 032266 Bucharest, Romania
- United Nations University—Maastricht Economic and Social Research Institute on Innovation and Technology, Boschstraat 24, 6211 AX Maastricht, The Netherlands
- Daniel Bădescu, Narcis Chirca, Ovidiu Cătălin Nechita, Stefan Rascu, Daniel Rădăvoi, Cristian Sima, and Cristian Toma
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Cosmin George Radu
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Viorel Jinga
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Academy of Romanian Scientists, 3 Ilfov, 050085 Bucharest, Romania
12
Cohen SA, Brant A, Fisher AC, Pershing S, Do D, Pan C. Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery. Semin Ophthalmol 2024; 39:472-479. PMID: 38516983; DOI: 10.1080/08820538.2024.2326058.
Abstract
PURPOSE: Patients are using online search modalities to learn about their eye health. While Google remains the most popular search engine, the use of large language models (LLMs) like ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, and there are limited data on the quality of the online information that surfaces in searches related to cataract surgery on search engines such as Google and LLM platforms such as ChatGPT. We identified the most common patient frequently asked questions (FAQs) about cataracts and cataract surgery and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We also demonstrated the utility of ChatGPT in writing notes and creating patient education materials.

METHODS: The top 20 FAQs related to cataracts and cataract surgery were recorded from Google. Responses to the questions provided by Google and ChatGPT were evaluated by a panel of ophthalmologists for accuracy and safety. Evaluators were also asked to distinguish between Google and LLM chatbot answers. Five validated readability indices were used to assess the readability of responses. ChatGPT was instructed to generate operative notes, post-operative instructions, and customizable patient education materials according to specific readability criteria.

RESULTS: Responses to the 20 patient FAQs generated by ChatGPT were significantly longer and written at a higher reading level than responses provided by Google (p < .001), with an average grade level of 14.8 (college level). Expert reviewers correctly distinguished between a human-reviewed and a chatbot-generated response an average of 31% of the time. Google answers contained incorrect or inappropriate material 27% of the time, compared with 6% of LLM-generated answers (p < .001). When expert reviewers were asked to compare the responses directly, chatbot responses were favored (66%).

CONCLUSIONS: When comparing the responses to patients' cataract FAQs provided by ChatGPT and Google, practicing ophthalmologists overwhelmingly preferred the ChatGPT responses. LLM chatbot responses were also less likely to contain inaccurate information. ChatGPT represents a viable eye-health information source for patients with higher health literacy, and it may also be used by ophthalmologists to create customizable patient education materials for patients with varying health literacy.
Affiliation(s)
- Samuel A Cohen, Arthur Brant, Ann Caroline Fisher, Suzann Pershing, Diana Do, and Carolyn Pan
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
13
Rojas-Carabali W, Cifuentes-González C, Gutierrez-Sinisterra L, Heng LY, Tsui E, Gangaputra S, Sadda S, Nguyen QD, Kempen JH, Pavesio CE, Gupta V, Raman R, Miao C, Lee B, de-la-Torre A, Agrawal R. Managing a patient with uveitis in the era of artificial intelligence: Current approaches, emerging trends, and future perspectives. Asia Pac J Ophthalmol (Phila) 2024; 13:100082. PMID: 39019261; DOI: 10.1016/j.apjo.2024.100082.
Abstract
The integration of artificial intelligence (AI) with healthcare has opened new avenues for diagnosing, treating, and managing medical conditions with remarkable precision. Uveitis, a diverse group of rare eye conditions characterized by inflammation of the uveal tract, exemplifies the complexities in ophthalmology due to its varied causes, clinical presentations, and responses to treatments. Uveitis, if not managed promptly and effectively, can lead to significant visual impairment. However, its management requires specialized knowledge, which is often lacking, particularly in regions with limited access to health services. AI's capabilities in pattern recognition, data analysis, and predictive modelling offer significant potential to revolutionize uveitis management. AI can classify disease etiologies, analyze multimodal imaging data, predict outcomes, and identify new therapeutic targets. However, transforming these AI models into clinical applications and meeting patient expectations involves overcoming challenges like acquiring extensive, annotated datasets, ensuring algorithmic transparency, and validating these models in real-world settings. This review delves into the complexities of uveitis and the current AI landscape, discussing the development, opportunities, and challenges of AI from theoretical models to bedside application. It also examines the epidemiology of uveitis, the global shortage of uveitis specialists, and the disease's socioeconomic impacts, underlining the critical need for AI-driven approaches. Furthermore, it explores the integration of AI in diagnostic imaging and future directions in ophthalmology, aiming to highlight emerging trends that could transform management of a patient with uveitis and suggesting collaborative efforts to enhance AI applications in clinical practice.
Affiliation(s)
- William Rojas-Carabali
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; Department of Ophthalmology, Tan Tock Seng Hospital, National Healthcare Group Eye Institute, Singapore.
| | - Carlos Cifuentes-González
- Department of Ophthalmology, Tan Tock Seng Hospital, National Healthcare Group Eye Institute, Singapore.
| | - Laura Gutierrez-Sinisterra
- Department of Ophthalmology, Tan Tock Seng Hospital, National Healthcare Group Eye Institute, Singapore.
| | - Lim Yuan Heng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore.
| | - Edmund Tsui
- Stein Eye Institute, David Geffen of Medicine at UCLA, Los Angeles, CA, USA.
| | - Sapna Gangaputra
- Vanderbilt Eye Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
| | - Srinivas Sadda
- Doheny Eye Institute, David Geffen of Medicine at UCLA, Los Angeles, CA, USA.
| | | | - John H Kempen
- Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School; and Schepens Eye Research Institute; Boston, MA, USA; Department of Ophthalmology, Myungsung Medical College/MCM Comprehensive Specialized Hospital, Addis Abeba, Ethiopia; Sight for Souls, Bellevue, WA, USA.
- Vishali Gupta
- Advanced Eye Centre, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India.
- Rajiv Raman
- Department of Ophthalmology, Sankara Nethralaya, Chennai, India.
- Chunyan Miao
- School of Computer Science and Engineering at Nanyang Technological University, Singapore.
- Bernett Lee
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore.
- Alejandra de-la-Torre
- Neuroscience Research Group (NEUROS), Neurovitae Center for Neuroscience, Institute of Translational Medicine (IMT), Escuela de Medicina y Ciencias de la Salud, Universidad del Rosario, Bogotá, Colombia.
- Rupesh Agrawal
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; Department of Ophthalmology, Tan Tock Seng Hospital, National Healthcare Group Eye Institute, Singapore; Singapore Eye Research Institute, Singapore; Duke NUS Medical School, Singapore.
14
Kianian R, Sun D, Crowell EL, Tsui E. Reply. Ophthalmol Retina 2024; 8:e15-e16. [PMID: 38363242 DOI: 10.1016/j.oret.2024.01.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2023] [Revised: 01/03/2024] [Accepted: 01/08/2024] [Indexed: 02/17/2024]
Affiliation(s)
- Reza Kianian
- Stein Eye Institute, Department of Ophthalmology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California
- Deyu Sun
- Stein Eye Institute, Department of Ophthalmology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California
- Eric L Crowell
- Mitchel and Shannon Wong Eye Institute, Dell Medical School at the University of Texas at Austin, Austin, Texas
- Edmund Tsui
- Stein Eye Institute, Department of Ophthalmology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California.
15
Eleiwa TK, Elhusseiny AM. Re: Kianian et al.: Enhancing the assessment of large language models in medical information generation (Ophthalmol Retina. 2024;8:195-201). Ophthalmol Retina 2024; 8:e15. [PMID: 38363243 DOI: 10.1016/j.oret.2024.01.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2023] [Revised: 01/05/2024] [Accepted: 01/08/2024] [Indexed: 02/17/2024]
Affiliation(s)
- Taher K Eleiwa
- Department of Ophthalmology, Benha Faculty of Medicine, Benha University, Benha, Egypt
- Abdelrahman M Elhusseiny
- Department of Ophthalmology, Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas; Department of Ophthalmology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts.
16
Biswas S, Davies LN, Sheppard AL, Logan NS, Wolffsohn JS. Utility of artificial intelligence-based large language models in ophthalmic care. Ophthalmic Physiol Opt 2024; 44:641-671. [PMID: 38404172 DOI: 10.1111/opo.13284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2023] [Revised: 01/23/2024] [Accepted: 01/25/2024] [Indexed: 02/27/2024]
Abstract
PURPOSE With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly gaining popularity within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs in ophthalmic care, and comparisons of different LLMs' abilities with those of their human counterparts, remain under-reported. RECENT FINDINGS To date, studies in eye care have demonstrated the utility of ChatGPT in generating patient information, supporting clinical diagnosis and passing ophthalmology question-based examinations, among other tasks. LLMs' performance (median accuracy, %) is influenced by factors such as the model iteration, the prompts utilised and the domain. Human experts (86%) demonstrated the highest proficiency in disease diagnosis, while ChatGPT-4 outperformed the other models in ophthalmology examinations (75.9%), symptom triaging (98%) and providing information and answering questions (84.6%). LLMs exhibited superior performance in general ophthalmology but reduced accuracy in ophthalmic subspecialties. Although AI-based LLMs like ChatGPT are deemed more efficient than their human counterparts, these models are constrained by nonspecific and outdated training, lack of access to current knowledge, generation of plausible-sounding 'fake' responses (hallucinations), inability to process images, lack of critical literature analysis, and ethical and copyright issues. A comprehensive evaluation of recently published studies is crucial to deepen understanding of these AI-based LLMs and their potential. SUMMARY Ophthalmic care professionals should adopt a conservative approach when using AI, as human judgement remains essential for clinical decision-making and for monitoring the accuracy of AI-generated information. This review identifies the ophthalmic applications and potential uses that need further exploration. As LLMs advance, setting standards for benchmarking and promoting best practices will be crucial. Potential clinical deployment requires evaluating these LLMs beyond artificial settings, in clinical trials that determine their usefulness in the real world.
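The benchmarking summarized above reduces to scoring model answers against a key and aggregating accuracy per domain. A minimal Python sketch of that metric follows; the domains, answer records, and the `accuracy_by_domain` helper are our own hypothetical illustration, not anything defined by the review:

```python
# Hypothetical illustration of the review's headline metric: per-domain
# accuracy (%) of an LLM's multiple-choice answers, plus the median.
from collections import defaultdict
from statistics import median

# (domain, model_answer, correct_answer) -- invented placeholder records.
RESPONSES = [
    ("general ophthalmology", "B", "B"),
    ("general ophthalmology", "A", "A"),
    ("general ophthalmology", "C", "B"),
    ("retina subspecialty",   "C", "D"),
    ("retina subspecialty",   "A", "A"),
    ("uveitis subspecialty",  "D", "B"),
]

def accuracy_by_domain(responses):
    """Percentage of correct answers per domain."""
    hits, totals = defaultdict(int), defaultdict(int)
    for domain, given, correct in responses:
        totals[domain] += 1
        hits[domain] += int(given == correct)
    return {d: 100.0 * hits[d] / totals[d] for d in totals}

per_domain = accuracy_by_domain(RESPONSES)
for domain, acc in per_domain.items():
    print(f"{domain:25s} {acc:5.1f}%")
print(f"median across domains: {median(per_domain.values()):.1f}%")
```

The studies surveyed differ mainly in question banks, prompts and model iterations; the accuracy metric itself stays this simple.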
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
17
Roster K, Kann RB, Farabi B, Gronbeck C, Brownstone N, Lipner SR. Readability and Health Literacy Scores for ChatGPT-Generated Dermatology Public Education Materials: Cross-Sectional Analysis of Sunscreen and Melanoma Questions. JMIR DERMATOLOGY 2024; 7:e50163. [PMID: 38446502 PMCID: PMC10955394 DOI: 10.2196/50163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 01/02/2024] [Accepted: 02/06/2024] [Indexed: 03/07/2024] Open
Affiliation(s)
- Katie Roster
- New York Medical College, New York, NY, United States
- Banu Farabi
- Department of Dermatology, NYC Health + Hospitals/Metropolitan, New York, NY, United States
- Christian Gronbeck
- Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
- Nicholas Brownstone
- Department of Dermatology, Temple University Hospital, Philadelphia, PA, United States
- Shari R Lipner
- Department of Dermatology, Weill Cornell Medicine, New York, NY, United States
18
Ferro Desideri L, Roth J, Zinkernagel M, Anguita R. "Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration". Int J Retina Vitreous 2023; 9:71. [PMID: 37980501 PMCID: PMC10657493 DOI: 10.1186/s40942-023-00511-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Accepted: 11/11/2023] [Indexed: 11/20/2023] Open
Abstract
INTRODUCTION Age-related macular degeneration (AMD) affects millions of people globally, leading to a surge in online research of putative diagnoses and causing potential misinformation and anxiety in patients and their caregivers. This study explores the efficacy of artificial intelligence-derived large language models (LLMs) in addressing AMD patients' questions. METHODS ChatGPT 3.5 (2023), Bing AI (2023), and Google Bard (2023) were the LLMs evaluated. Patients' questions were subdivided into two categories, (a) general medical advice and (b) pre- and post-intravitreal injection advice, and each response was classified as (1) accurate and sufficient, (2) partially accurate but sufficient, or (3) inaccurate and not sufficient. A non-parametric test was used to compare mean scores between the 3 LLMs, and an analysis of variance and reliability tests were performed among the 3 groups. RESULTS In category (a), the average score was 1.20 (± 0.41) with ChatGPT 3.5, 1.60 (± 0.63) with Bing AI and 1.60 (± 0.73) with Google Bard, with no significant difference among the 3 groups (p = 0.129). In category (b), the average score was 1.07 (± 0.27) with ChatGPT 3.5, 1.69 (± 0.63) with Bing AI and 1.38 (± 0.63) with Google Bard, a significant difference among the 3 groups (p = 0.0042). Reliability statistics showed a Cronbach's α of 0.237 (range 0.448, 0.096-0.544). CONCLUSION ChatGPT 3.5 consistently offered the most accurate and satisfactory responses, particularly to technical queries. While LLMs show promise in providing precise information about AMD, further improvements are needed, especially for more technical questions.
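The comparison in METHODS is compact enough to sketch. In the Python sketch below, the Kruskal-Wallis test is an assumption standing in for the unnamed non-parametric test, and the 1-3 ratings are invented placeholders rather than the study's data:

```python
# Sketch of the study's statistical comparison under stated assumptions:
# Kruskal-Wallis for the unnamed non-parametric test; ratings invented.
import numpy as np
from scipy import stats

chatgpt = [1, 1, 2, 1, 1, 1, 2, 1]   # hypothetical 1-3 accuracy ratings
bing    = [2, 1, 2, 2, 1, 2, 1, 2]
bard    = [1, 2, 1, 2, 2, 1, 1, 2]

# Non-parametric comparison of the three LLMs' rating distributions.
h_stat, p_value = stats.kruskal(chatgpt, bing, bard)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.4f}")

def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a (questions x models) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # models act as "items"
    item_var = scores.var(axis=0, ddof=1).sum() # sum of per-model variances
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of row totals
    return k / (k - 1) * (1 - item_var / total_var)

matrix = np.column_stack([chatgpt, bing, bard])  # questions x models
print(f"Cronbach's alpha = {cronbach_alpha(matrix):.3f}")
```

The analysis of variance mentioned alongside it would run on the same three lists via scipy.stats.f_oneway.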
Affiliation(s)
- Lorenzo Ferro Desideri
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland.
- Bern Photographic Reading Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
- Janice Roth
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland
| | - Martin Zinkernagel
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland
- Bern Photographic Reading Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Rodrigo Anguita
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland
- Moorfields Eye Hospital NHS Foundation Trust, City Road, London, EC1V 2PD, UK