1. Cohen SA, Yadlapalli N, Tijerina JD, Alabiad CR, Chang JR, Kinde B, Mahoney NR, Roelofs KA, Woodward JA, Kossler AL. Comparing the Ability of Google and ChatGPT to Accurately Respond to Oculoplastics-Related Patient Questions and Generate Customized Oculoplastics Patient Education Materials. Clin Ophthalmol 2024; 18:2647-2655. PMID: 39323727; PMCID: PMC11423829; DOI: 10.2147/opth.s480222.
Abstract
Purpose: To compare the accuracy and readability of responses to oculoplastics patient questions provided by Google and ChatGPT, and to assess the ability of ChatGPT to create customized patient education materials.
Methods: We executed a Google search to identify the 3 most frequently asked patient questions (FAQs) related to 10 oculoplastics conditions. FAQs were entered into both the Google search engine and the ChatGPT tool, and responses were recorded. Responses were graded for readability using five validated readability indices and for accuracy by six oculoplastic surgeons. ChatGPT was then instructed to create patient education materials at various reading levels for 8 oculoplastics procedures, and the accuracy and readability of the ChatGPT-generated procedural explanations were assessed.
Results: ChatGPT responses to patient FAQs were written at a significantly higher average grade level than Google responses (grade 15.6 vs 10.0, p < 0.001). ChatGPT responses (93% accuracy) were significantly more accurate than Google responses (78% accuracy; p < 0.001) and were preferred by expert panelists (79%). ChatGPT accurately explained oculoplastics procedures at an above-average reading level. When instructed to rewrite patient education materials at a lower reading level, ChatGPT reduced the grade level by approximately 4 grades (15.7 vs 11.7, p < 0.001) without sacrificing accuracy.
Conclusion: ChatGPT has the potential to provide patients with accurate information regarding their oculoplastics conditions. It may also be used by oculoplastic surgeons as an accurate tool to provide customizable patient education for patients with varying health literacy. A better understanding of oculoplastics conditions and procedures among patients can lead to informed eye care decisions.
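The grade-level figures above come from standard readability indices. The abstract does not name the five indices used, so treat the following as illustrative: the Flesch-Kincaid Grade Level (FKGL) is one of the most widely used such indices and shows how these scores are derived from average sentence length and syllable density.

```latex
% Flesch-Kincaid Grade Level: maps average sentence length and
% average syllables per word onto a US school-grade scale.
\[
\mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}}
              + 11.8\,\frac{\text{total syllables}}{\text{total words}}
              - 15.59
\]
```

On this scale, the grade 15.6 reported for ChatGPT corresponds to college-level text, while the roughly 10th-grade Google responses still sit well above the sixth-grade level commonly recommended for patient education materials.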
Affiliation(s)
- Samuel A Cohen: Department of Ophthalmology, Stein Eye Institute at University of California Los Angeles David Geffen School of Medicine, Los Angeles, CA, USA
- Nikhita Yadlapalli: Department of Ophthalmology, FIU Herbert Wertheim College of Medicine, Miami, FL, USA
- Jonathan D Tijerina: Department of Ophthalmology, Bascom Palmer Eye Institute at University of Miami Miller School of Medicine, Miami, FL, USA
- Chrisfouad R Alabiad: Department of Ophthalmology, Bascom Palmer Eye Institute at University of Miami Miller School of Medicine, Miami, FL, USA
- Jessica R Chang: Department of Ophthalmology, USC Roski Eye Institute at University of Southern California Keck School of Medicine, Los Angeles, CA, USA
- Benyam Kinde: Department of Ophthalmology, Byers Eye Institute at Stanford University School of Medicine, Palo Alto, CA, USA
- Nicholas R Mahoney: Department of Ophthalmology, Wilmer Eye Institute at Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kelsey A Roelofs: Department of Ophthalmology, Stein Eye Institute at University of California Los Angeles David Geffen School of Medicine, Los Angeles, CA, USA
- Julie A Woodward: Department of Ophthalmology, Duke Eye Center at Duke University School of Medicine, Durham, NC, USA
- Andrea L Kossler: Department of Ophthalmology, Byers Eye Institute at Stanford University School of Medicine, Palo Alto, CA, USA
2. Cohen SA, Brant A, Fisher AC, Pershing S, Do D, Pan C. Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery. Semin Ophthalmol 2024; 39:472-479. PMID: 38516983; DOI: 10.1080/08820538.2024.2326058.
Abstract
PURPOSE: Patients are using online search modalities to learn about their eye health. While Google remains the most popular search engine, use of large language models (LLMs) such as ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, yet there are limited data on the quality of the online information returned for cataract surgery-related searches by search engines such as Google and by LLM platforms such as ChatGPT. We identified the most common patient frequently asked questions (FAQs) about cataracts and cataract surgery and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We also demonstrated the utility of ChatGPT in writing notes and creating patient education materials.
METHODS: The top 20 FAQs related to cataracts and cataract surgery were recorded from Google. Responses to the questions provided by Google and ChatGPT were evaluated by a panel of ophthalmologists for accuracy and safety, and evaluators were asked to distinguish between Google and LLM chatbot answers. Five validated readability indices were used to assess the readability of responses. ChatGPT was then instructed to generate operative notes, postoperative instructions, and customizable patient education materials according to specific readability criteria.
RESULTS: Responses to the 20 patient FAQs generated by ChatGPT were significantly longer and written at a higher reading level than responses provided by Google (p < .001), with an average grade level of 14.8 (college level). Expert reviewers correctly distinguished between a human-reviewed and a chatbot-generated response an average of 31% of the time. Google answers contained incorrect or inappropriate material 27% of the time, compared with 6% of LLM-generated answers (p < .001). When expert reviewers were asked to compare the responses directly, chatbot responses were favored (66%).
CONCLUSIONS: When comparing the responses to patients' cataract FAQs provided by ChatGPT and Google, practicing ophthalmologists overwhelmingly preferred ChatGPT responses. LLM chatbot responses were less likely to contain inaccurate information. ChatGPT represents a viable source of eye health information for patients with higher health literacy, and it may also be used by ophthalmologists to create customizable patient education materials for patients with varying health literacy.
Affiliation(s)
- Samuel A Cohen: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Arthur Brant: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Ann Caroline Fisher: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Suzann Pershing: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Diana Do: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Carolyn Pan: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
3. Cohen SA, Brant A, Rayess N, Rahimy E, Pan C, Fisher AC, Pershing S, Do D. Google Trends-Assisted Analysis of the Readability, Accountability, and Accessibility of Online Patient Education Materials for the Treatment of AMD After US FDA Approval of Pegcetacoplan. J Vitreoretin Dis 2024; 8:421-427. PMID: 39148568; PMCID: PMC11323505; DOI: 10.1177/24741264241250156.
Abstract
Purpose: To evaluate the readability, accountability, accessibility, and source of online patient education materials for treatment of age-related macular degeneration (AMD) and to quantify public interest in Syfovre and geographic atrophy after US Food and Drug Administration (FDA) approval.
Methods: Websites were classified into 4 categories by information source. Readability was assessed using 5 validated readability indices. Accountability was assessed using 4 benchmarks of the Journal of the American Medical Association (JAMA). Accessibility was evaluated using 3 established criteria. The Google Trends tool was used to evaluate temporal trends in public interest in "Syfovre" and "geographic atrophy" in the months after FDA approval.
Results: Of 100 websites analyzed, 22% were written below the recommended sixth-grade reading level. The mean (±SD) grade level of analyzed articles was 9.76 ± 3.35. Websites averaged 1.40 ± 1.39 (of 4) JAMA accountability metrics. The majority of articles (67%) were from private practice/independent organizations. A significant increase in public interest in the terms "Syfovre" and "geographic atrophy" after FDA approval was found with the Google Trends tool (P < .001).
Conclusions: Patient education materials related to AMD treatment are often written at inappropriate reading levels and lack established accountability and accessibility metrics. Articles from national organizations ranked highest on accessibility metrics but were less visible on a Google search, suggesting the need for visibility-enhancing measures. Patient education materials related to the term "Syfovre" had the highest average reading level and low accountability, suggesting the need to modify resources to best address the needs of an increasingly curious public.
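The temporal-interest analysis above used the Google Trends web tool. As a rough sketch only (not the authors' workflow), a similar relative-interest series can be pulled programmatically with the unofficial pytrends wrapper; the keywords, region, and date window below are illustrative assumptions chosen to bracket the 2023 FDA approval of pegcetacoplan (Syfovre).

```python
# Sketch: pull weekly relative search interest with pytrends, an
# unofficial Google Trends wrapper. Keywords and timeframe are
# illustrative assumptions, not the study's exact query.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(
    kw_list=["Syfovre", "geographic atrophy"],
    timeframe="2022-08-01 2023-08-01",  # window spanning the approval
    geo="US",
)
interest = pytrends.interest_over_time()  # DataFrame, values scaled 0-100
print(interest.drop(columns=["isPartial"]).tail())
```

Note that Google Trends reports normalized relative interest (0-100) rather than absolute search counts, so pre- versus post-approval comparisons like the one reported above are inherently relative.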
Affiliation(s)
- Samuel A. Cohen: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Arthur Brant: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Ehsan Rahimy: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Carolyn Pan: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Ann Caroline Fisher: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Suzann Pershing: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Diana Do: Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
4. Cohen SA, Fisher AC, Xu BY, Song BJ. Comparing the Accuracy and Readability of Glaucoma-related Question Responses and Educational Materials by Google and ChatGPT. J Curr Glaucoma Pract 2024; 18:110-116. PMID: 39575130; PMCID: PMC11576343; DOI: 10.5005/jp-journals-10078-1448.
Abstract
Aim and background: Patients are increasingly turning to the internet to learn more about their ocular disease. In this study, we sought (1) to compare the accuracy and readability of Google and ChatGPT responses to patients' glaucoma-related frequently asked questions (FAQs) and (2) to evaluate ChatGPT's capacity to improve glaucoma patient education materials by accurately reducing the grade level at which they are written.
Materials and methods: We executed a Google search to identify the three most common FAQs related to 10 search terms associated with glaucoma diagnosis and treatment. Each of the 30 FAQs was inputted into both Google and ChatGPT, and responses were recorded. The accuracy of responses was evaluated by three glaucoma specialists, while readability was assessed using five validated readability indices. Subsequently, ChatGPT was instructed to generate patient education materials at specific reading levels to explain seven glaucoma procedures, and the accuracy and readability of the procedural explanations were measured.
Results: ChatGPT responses to glaucoma FAQs were significantly more accurate than Google responses (97 vs 77% accuracy, p < 0.001). ChatGPT responses were also written at a significantly higher reading level (grade 14.3 vs 9.4, p < 0.001). When instructed to revise glaucoma procedural explanations to improve understandability, ChatGPT reduced the average reading level of educational materials from grade 16.6 (college level) to grade 9.4 (high school level) (p < 0.001) without reducing the accuracy of the procedural explanations.
Conclusion: ChatGPT is more accurate than Google search when responding to glaucoma patient FAQs. ChatGPT successfully reduced the reading level of glaucoma procedural explanations without sacrificing accuracy, with implications for the future of customized patient education for patients with varying health literacy.
Clinical significance: Our study demonstrates the utility of ChatGPT for patients seeking information about glaucoma and for physicians creating patient education materials at reading levels that optimize patient understanding. An enhanced patient understanding of glaucoma may lead to informed decision-making and improved treatment compliance.
Affiliation(s)
- Samuel A Cohen: Department of Ophthalmology, UCLA Stein Eye Institute, Los Angeles, California, United States
- Ann C Fisher: Department of Ophthalmology, Byers Eye Institute at Stanford, Stanford, California, United States
- Benjamin Y Xu: Department of Ophthalmology, USC Roski Eye Institute, Los Angeles, California, United States
- Brian J Song: Department of Ophthalmology, USC Roski Eye Institute, Los Angeles, California, United States
5. Lim B, Chai A, Shaalan M. A Cross-Sectional Analysis of the Readability of Online Information Regarding Hip Osteoarthritis. Cureus 2024; 16:e60536. PMID: 38887325; PMCID: PMC11181007; DOI: 10.7759/cureus.60536.
Abstract
Introduction: Osteoarthritis (OA) is an age-related degenerative joint disease. There is a 25% risk of symptomatic hip OA in patients who live to 85 years of age. It can impair a person's daily activities and increase their reliance on healthcare services. It is primarily managed with education, weight loss, and exercise, supplemented with pharmacological interventions. Poor health literacy is associated with negative treatment outcomes and patient dissatisfaction. A literature search found no previously published studies examining the readability of online information about hip OA.
Objectives: To assess the readability of healthcare websites regarding hip OA.
Methods: The terms "hip pain", "hip osteoarthritis", "hip arthritis", and "hip OA" were searched on Google and Bing. Of 240 websites initially considered, 74 unique websites underwent evaluation using the WebFX online readability software (WebFX®, Harrisburg, USA). Readability was determined using the Flesch Reading Ease Score (FRES), Flesch-Kincaid Reading Grade Level (FKGL), Gunning Fog Index (GFI), Simple Measure of Gobbledygook (SMOG), Coleman-Liau Index (CLI), and Automated Readability Index (ARI). In line with recommended guidelines and previous studies, a FRES >65 or a grade-level score of sixth grade and under was considered acceptable.
Results: The average FRES was 56.74±8.18 (range 29.5-79.4); only nine (12.16%) websites had a FRES >65. The average FKGL score was 7.62±1.69 (range 4.2-12.9); only seven (9.46%) websites were written at or below a sixth-grade level by this measure. The average GFI score was 9.20±2.09 (range 5.6-16.5); only one (1.35%) website was written at or below a sixth-grade level. The average SMOG score was 7.29±1.41 (range 5.4-12.0); only eight (10.81%) websites were written at or below a sixth-grade level. The average CLI score was 13.86±1.75 (range 9.6-19.7); all 36 websites were written above a sixth-grade level. The average ARI score was 6.91±2.06 (range 3.1-14.0); twenty-eight (37.84%) websites were written at or below a sixth-grade level. One-sample t-tests showed that FRES (p<0.001, CI -10.2 to -6.37), FKGL (p<0.001, CI 1.23 to 2.01), GFI (p<0.001, CI 2.72 to 3.69), SMOG (p<0.001, CI 0.97 to 1.62), CLI (p<0.001, CI 7.46 to 8.27), and ARI (p<0.001, CI 0.43 to 1.39) scores differed significantly from the accepted standard. One-way analysis of variance (ANOVA) of FRES scores (p=0.009) and CLI scores (p=0.009) showed a significant difference between website categories, and post hoc testing showed a significant difference between the academic and non-profit categories for FRES scores (p=0.010, CI -15.17 to -1.47) and CLI scores (p=0.008, CI 0.35 to 3.29).
Conclusions: Most websites regarding hip OA are written above recommended reading levels and therefore exceed the comprehension level of the average patient. The readability of these resources must be improved to give patients better access to online healthcare information, which can lead to improved understanding of their own condition and better treatment outcomes.
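The six indices named above all reduce to simple functions of sentence, word, syllable, and letter counts. The sketch below applies the standard published formulas with a crude vowel-group syllable heuristic; it is an assumption-laden approximation, and the WebFX tool used in the study tokenizes and counts differently, so exact scores will not match.

```python
# Minimal readability sketch using the standard published formulas.
# Syllable counting is a rough heuristic; production tools use
# dictionaries or better segmentation, so scores are approximate.
import re

def count_syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    letters = sum(len(w) for w in words)
    return {
        "FRES": 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n),
        "FKGL": 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59,
        "GFI": 0.4 * ((n / sentences) + 100 * (complex_words / n)),
        # SMOG is defined for 30-sentence samples; scaled here.
        "SMOG": 1.0430 * (complex_words * 30 / sentences) ** 0.5 + 3.1291,
        "CLI": 0.0588 * (letters / n * 100) - 0.296 * (sentences / n * 100) - 15.8,
        "ARI": 4.71 * (letters / n) + 0.5 * (n / sentences) - 21.43,
    }

sample = ("Hip osteoarthritis is wear of the hip joint. "
          "Gentle exercise and weight loss can ease the pain.")
scores = readability(sample)
print({k: round(v, 1) for k, v in scores.items()})
# Study benchmark: FRES > 65 or a grade-level score of 6 or below.
print("Meets benchmark:", scores["FRES"] > 65 or scores["FKGL"] <= 6)
```

Under that benchmark, the reported means (for example, FKGL 7.62 and CLI 13.86) fall short for most sites, which is the basis for the conclusion above.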
Affiliation(s)
- Brandon Lim: Department of Medicine, School of Medicine, Trinity College Dublin, Dublin, IRL
- Ariel Chai: Department of Medicine, School of Medicine, Trinity College Dublin, Dublin, IRL
- Mohamed Shaalan: Department of Orthopaedics and Traumatology, The Mater Misericordiae University Hospital, Dublin, IRL; Department of Trauma and Orthopaedics, St James's Hospital, Dublin, IRL
6. Lin MX, Li G, Cui D, Mathews PM, Akpek EK. Usability of Patient Education-Oriented Cataract Surgery Websites. Ophthalmology 2024; 131:499-506. PMID: 37852419; DOI: 10.1016/j.ophtha.2023.10.019.
Abstract
PURPOSE: To assess the web accessibility and readability of patient-oriented educational websites for cataract surgery.
DESIGN: Cross-sectional electronic survey.
PARTICIPANTS: Websites with information dedicated to educating patients about cataract surgery.
METHODS: An incognito search for "cataract surgery" was performed using a popular search engine. The top 100 patient-oriented cataract surgery websites returned were included and categorized as institutional, private practice, or medical organization according to authorship. Each site was assessed for readability using 4 standardized reading grade-level formulas. Accessibility was assessed through multilingual availability, accessibility menu availability, complementary educational video availability, and conformance and adherence to the Web Content Accessibility Guidelines (WCAG) 2.0. A standard t test and chi-square analysis were performed to assess the significance of differences in readability and accessibility among the 3 authorship categories.
MAIN OUTCOME MEASURES: The website's average reading grade level, number of accessibility violations, multilingual availability, accessibility menu availability, complementary educational video availability, accessibility conformance level, and violation of the perceivable, operable, understandable, and robust (POUR) principles according to the WCAG 2.0.
RESULTS: A total of 32, 55, and 13 sites were affiliated with institutions, private practice, and other medical organizations, respectively. The overall mean reading grade was 11.8 ± 1.6, with higher reading levels observed in private practice websites compared with institutions and medical organizations combined (12.1 vs. 11.4; P = 0.03). Fewer private practice websites had multiple language options compared with institutional and medical organization websites combined (5.5% vs. 20.0%; P = 0.03). More private practice websites had accessibility menus than institutions and medical organizations combined (27.3% vs. 8.9%; P = 0.038). The overall mean number of WCAG 2.0 POUR principle violations was 17.1 ± 23.1, with no significant difference among groups. Eighty-five percent of websites violated the perceivable principle.
CONCLUSIONS: Available patient-oriented online information for cataract surgery may not be comprehensible to the general public. Readability and accessibility should be considered when designing these resources.
FINANCIAL DISCLOSURE(S): The author(s) have no proprietary or commercial interest in any materials discussed in this article.
Affiliation(s)
- Michael X Lin: The Ocular Surface Disease Clinic, The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Gavin Li: The Ocular Surface Disease Clinic, The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland; Icahn School of Medicine at Mount Sinai, New York, New York
- David Cui: The Ocular Surface Disease Clinic, The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland; Krieger Eye Institute, Sinai Hospital of Baltimore, Baltimore, Maryland
- Priya M Mathews: The Ocular Surface Disease Clinic, The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland; Center for Sight, Sarasota, Florida
- Esen K Akpek: The Ocular Surface Disease Clinic, The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland