1. Cohen SA, Brant A, Fisher AC, Pershing S, Do D, Pan C. Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery. Semin Ophthalmol 2024;39:472-479. PMID: 38516983. DOI: 10.1080/08820538.2024.2326058.
Abstract
PURPOSE Patients are using online search tools to learn about their eye health. While Google remains the most popular search engine, use of large language models (LLMs) such as ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, yet there are limited data on the quality of the online information returned by cataract surgery-related searches on search engines such as Google and on LLM platforms such as ChatGPT. We identified the most common patient frequently asked questions (FAQs) about cataracts and cataract surgery and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We also demonstrated the utility of ChatGPT in writing notes and creating patient education materials.
METHODS The top 20 FAQs related to cataracts and cataract surgery were recorded from Google. Responses to the questions provided by Google and ChatGPT were evaluated by a panel of ophthalmologists for accuracy and safety. Evaluators were also asked to distinguish between Google and LLM chatbot answers. Five validated readability indices were used to assess the readability of responses. ChatGPT was then instructed to generate operative notes, post-operative instructions, and customizable patient education materials according to specific readability criteria.
RESULTS Responses to the 20 patient FAQs generated by ChatGPT were significantly longer and written at a higher reading level than responses provided by Google (p < .001), with an average grade level of 14.8 (college level). Expert reviewers correctly distinguished between a human-reviewed and a chatbot-generated response an average of 31% of the time. Google answers contained incorrect or inappropriate material 27% of the time, compared with 6% of LLM-generated answers (p < .001). When expert reviewers were asked to compare the responses directly, chatbot responses were favored (66%).
CONCLUSIONS When comparing responses to patients' cataract FAQs provided by ChatGPT and Google, practicing ophthalmologists overwhelmingly preferred the ChatGPT responses. LLM chatbot responses were less likely to contain inaccurate information. ChatGPT represents a viable source of eye health information for patients with higher health literacy, and ophthalmologists may also use it to create customizable patient education materials for patients with varying health literacy.
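The study prompted ChatGPT through its consumer interface and then had ophthalmologists and readability indices grade the replies. As a rough illustration of that workflow in code (not the authors' actual pipeline), the sketch below assumes the openai Python client (v1+, with OPENAI_API_KEY set) and the textstat package; the prompt wording and the 6th-grade target are hypothetical.

# Illustrative sketch only: the study used the ChatGPT web interface, not the API.
import textstat
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TARGET_GRADE = 6  # hypothetical readability target
prompt = (
    "Explain what happens during cataract surgery for a patient, "
    f"writing at roughly a grade {TARGET_GRADE} reading level."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
answer = response.choices[0].message.content

# Check whether the reply actually meets the requested reading level.
grade = textstat.flesch_kincaid_grade(answer)
print(f"Flesch-Kincaid grade level: {grade:.1f} (target <= {TARGET_GRADE})")

Scoring the reply after generation matters because, as the results above show, unprompted chatbot answers tend to land at a college reading level rather than the widely recommended 6th-grade level.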
Affiliation(s)
- Samuel A Cohen, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Arthur Brant, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Ann Caroline Fisher, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Suzann Pershing, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Diana Do, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Carolyn Pan, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
2. Eid K, Eid A, Wang D, Raiker RS, Chen S, Nguyen J. Optimizing Ophthalmology Patient Education via ChatBot-Generated Materials: Readability Analysis of AI-Generated Patient Education Materials and The American Society of Ophthalmic Plastic and Reconstructive Surgery Patient Brochures. Ophthalmic Plast Reconstr Surg 2024;40:212-216. PMID: 37972974. DOI: 10.1097/iop.0000000000002549.
Abstract
PURPOSE This study compares the readability of patient education materials (PEMs) from the American Society of Ophthalmic Plastic and Reconstructive Surgery with that of PEMs generated by the AI chatbots ChatGPT and Google Bard.
METHODS PEMs on 16 common American Society of Ophthalmic Plastic and Reconstructive Surgery topics were generated by 2 AI models, ChatGPT 4.0 and Google Bard, with and without a 6th-grade reading level prompt modifier. The PEMs were analyzed using 7 readability metrics: Flesch Reading Ease Score, Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman-Liau Index, Simple Measure of Gobbledygook (SMOG) Index Score, Automated Readability Index, and Linsear Write Readability Score. Each AI-generated PEM was compared with the equivalent American Society of Ophthalmic Plastic and Reconstructive Surgery PEM.
RESULTS Across all readability indices, PEMs generated by ChatGPT 4.0 consistently scored at the most difficult reading levels, indicating that material generated by this chatbot may be the hardest to read in its unprompted form (Flesch Reading Ease Score: 36.5; SMOG: 14.7). Google Bard generated content that was easier to read than both the American Society of Ophthalmic Plastic and Reconstructive Surgery materials and ChatGPT 4.0 (Flesch Reading Ease Score: 52.3; SMOG: 12.7). When prompted to produce PEMs at a 6th-grade reading level, both ChatGPT 4.0 and Bard significantly improved their readability scores, with prompted ChatGPT 4.0 consistently generating the easiest-to-read content (Flesch Reading Ease Score: 67.9; SMOG: 10.2).
CONCLUSION This study suggests that AI tools, when guided by appropriate prompts, can generate accessible and comprehensible PEMs in the field of ophthalmic plastic and reconstructive surgery, balancing readability with the complexity of the necessary information.
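The seven indices above are all computed from surface features of the text (sentence length, word length, syllable counts). A minimal sketch, assuming the textstat Python package (whose built-in implementations may differ slightly in detail from the scoring used in the paper) and a made-up sample paragraph:

import textstat

# Hypothetical patient education snippet (not from the paper or from ASOPRS materials).
pem = (
    "Blepharoplasty is surgery to remove extra skin from the eyelids. "
    "Your surgeon makes small cuts in the natural folds of the lid. "
    "The loose skin is removed and the cuts are closed with fine stitches."
)

# Map each reported index to the corresponding textstat scorer.
metrics = {
    "Flesch Reading Ease": textstat.flesch_reading_ease,
    "Gunning Fog Index": textstat.gunning_fog,
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade,
    "Coleman-Liau Index": textstat.coleman_liau_index,
    "SMOG Index": textstat.smog_index,
    "Automated Readability Index": textstat.automated_readability_index,
    "Linsear Write": textstat.linsear_write_formula,
}

for name, score_fn in metrics.items():
    print(f"{name:28s} {score_fn(pem):6.1f}")

Note that Flesch Reading Ease runs in the opposite direction from the grade-level indices: higher scores mean easier text, which is why the prompted ChatGPT output (67.9) reads more easily than the unprompted output (36.5).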
Affiliation(s)
- Kevin Eid, Department of Ophthalmology, Moran Eye Center, University of Utah, Salt Lake City, Utah, USA
- Alen Eid, Department of Ophthalmology and Visual Sciences, West Virginia University, Morgantown, West Virginia, USA
- Diane Wang, Department of Ophthalmology and Visual Sciences, West Virginia University, Morgantown, West Virginia, USA
- Rahul S Raiker, Department of Medical Education, West Virginia University, Morgantown, West Virginia, USA
- Stephen Chen, Department of Medical Education, West Virginia University, Morgantown, West Virginia, USA
- John Nguyen, Department of Ophthalmology and Visual Sciences and Department of Otolaryngology and Head and Neck Surgery, West Virginia University, Morgantown, West Virginia, USA
3. Kianian R, Sun D, Crowell EL, Tsui E. The Use of Large Language Models to Generate Education Materials about Uveitis. Ophthalmol Retina 2024;8:195-201. PMID: 37716431. DOI: 10.1016/j.oret.2023.09.008.
Abstract
OBJECTIVE To assess large language models in generating readable uveitis information and in improving the readability of online health information.
DESIGN Evaluation of technology.
SUBJECTS Not applicable.
METHODS ChatGPT and Bard were given the following prompts: (prompt A) "considering that the average American reads at a 6th grade level, using the Flesch-Kincaid Grade Level (FKGL) formula, can you write patient-targeted health information on uveitis of around 6th grade level?" and (prompt B) "can you write patient-targeted health information on uveitis that is easy to understand by an average American?" Additionally, ChatGPT and Bard were given the following prompt for text taken from the first-page Google results for the search term "uveitis": "Considering that the average American reads at a 6th grade level, using the FKGL formula, can you rewrite the following text to 6th grade level: [insert text]." The readability of each response was analyzed and compared using the metrics described below.
MAIN OUTCOME MEASURES The FKGL, a highly validated readability assessment tool that assigns a grade level to a given text, along with the total number of words, sentences, syllables, and complex words. Complex words were defined as those with > 2 syllables.
RESULTS ChatGPT and Bard generated responses with lower FKGL scores (i.e., easier to understand) in response to prompt A than to prompt B; the difference was significant only for ChatGPT (P < 0.0001). The mean FKGL of ChatGPT responses (6.3 ± 1.2) was significantly lower (P < 0.0001) than that of Bard (10.5 ± 0.8). ChatGPT responses also contained fewer complex words than Bard responses (P < 0.0001). Online health information on uveitis had a mean grade level of 11.0 ± 1.4. ChatGPT lowered the FKGL to 8.0 ± 1.0 (P < 0.0001) when asked to rewrite the content; Bard was not able to do so (mean FKGL of 11.1 ± 1.6).
CONCLUSIONS ChatGPT can aid clinicians in producing easier-to-understand health information on uveitis for patients compared with already-existing content. It can also help reduce the difficulty of the language used in uveitis health information targeted at patients.
FINANCIAL DISCLOSURE(S) Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
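For reference, the FKGL formula named in prompt A maps average sentence length and syllable density onto a US school grade:

\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59

As a worked example with assumed (not study) numbers: text averaging 15 words per sentence and 1.4 syllables per word scores 0.39(15) + 11.8(1.4) - 15.59 ≈ 6.8, i.e., roughly the 6th-to-7th grade target requested in prompt A.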
Affiliation(s)
- Reza Kianian, Stein Eye Institute, Department of Ophthalmology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California
- Deyu Sun, Stein Eye Institute, Department of Ophthalmology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California
- Eric L Crowell, Mitchel and Shannon Wong Eye Institute, Dell Medical School at the University of Texas at Austin, Austin, Texas
- Edmund Tsui, Stein Eye Institute, Department of Ophthalmology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California
4. Hung YC, Chaker SC, Sigel M, Saad M, Yu-Hsuan Chang M, Slater ED. Addressing Current Deficits in Patient Education Materials Through Crowdsourcing. Ann Plast Surg 2024;92:148-155. PMID: 38198625. DOI: 10.1097/sap.0000000000003777.
Abstract
BACKGROUND Patient education materials are commonly reported to be difficult to understand.
OBJECTIVES We aimed to use crowdsourcing to improve patient education materials at our institution.
METHODS This was a department-wide quality improvement project to increase organizational health literacy. The pilot study had 6 phases: (1) evaluating preexisting patient education materials; (2) evaluating online patient education materials at the society level (the American Society of Plastic Surgeons) and the government level (MedlinePlus); (3) redesigning our patient education material and reevaluating it; (4) crowdsourcing to evaluate the understandability of the new patient education material; (5) data analysis; and (6) incorporating crowdsourcing suggestions into the patient education material.
RESULTS Breast-related patient education materials are not easy to read at the institution, society, or government level. Our new implant-based breast reconstruction patient education material is easy to read, as demonstrated by the crowdsourcing evaluation: more than 90% of participants reported that the material was "very easy to understand" or "easy to understand." The crowdsourcing process took 1.5 days, with 700 workers responding to the survey, at a total cost of $9. After incorporating participants' feedback into the finalized material, its readability was at the recommended reading level and its length was within the recommended range (between 400 and 800 words).
DISCUSSION Our study demonstrates a pathway for clinicians to efficiently obtain a large amount of feedback to improve patient education materials. Crowdsourcing is an effective tool to improve organizational health literacy.
Affiliation(s)
- Ya-Ching Hung, Department of General Surgery, Sinai Hospital of Baltimore, Baltimore, MD
- Sara C Chaker, Department of Plastic Surgery, Vanderbilt University Medical Center
- Mariam Saad, Department of General Surgery, Sinai Hospital of Baltimore, Baltimore, MD
- Elizabeth D Slater, Department of General Surgery, Sinai Hospital of Baltimore, Baltimore, MD
5. Ong J, Hariprasad SM, Chhablani J. ChatGPT and GPT-4 in Ophthalmology: Applications of Large Language Model Artificial Intelligence in Retina. Ophthalmic Surg Lasers Imaging Retina 2023;54:557-562. PMID: 37847163. DOI: 10.3928/23258160-20230926-01.
6. Cohen SA, Fisher AC, Pershing S. Analysis of the Readability and Accountability of Online Patient Education Materials Related to Glaucoma Diagnosis and Treatment. Clin Ophthalmol 2023;17:779-788. PMID: 36923248. PMCID: PMC10008728. DOI: 10.2147/opth.s401492.
Abstract
Purpose To assess the readability and accountability of online patient education materials related to glaucoma diagnosis and treatment.
Methods We conducted a Google search for 10 search terms related to glaucoma diagnosis and 10 search terms related to glaucoma treatment. For each search term, the first 10 patient education websites returned by the Google search were assessed for readability and accountability. Readability was assessed using five validated measures: Flesch Reading Ease (FRE), Gunning Fog Index (GFI), Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), and New Dale-Chall (NDC). Accountability was assessed using the Journal of the American Medical Association (JAMA) benchmarks. The source of information for each article analyzed was recorded.
Results Of the 200 total websites analyzed, only 11% were written at or below the recommended 6th grade reading level. The average FRE and grade level for the 100 glaucoma diagnosis-related articles were 42.02 ± 1.08 and 10.53 ± 1.30, respectively. The average FRE and grade level for the 100 glaucoma treatment-related articles were 43.86 ± 1.01 and 11.29 ± 1.54, respectively. Crowdsourced articles were written at the highest average grade level (12.32 ± 0.78), followed by articles written by private practices/independent users (11.22 ± 1.74), national organizations (10.92 ± 1.24), and educational institutions (10.33 ± 1.35). Websites met an average of 1.12 ± 1.15 of the 4 JAMA accountability benchmarks.
Conclusion Despite wide variation in the readability and accountability of online patient education materials related to glaucoma diagnosis and treatment, these materials are consistently written above the recommended reading level and often lack accountability. Articles from educational institutions and national organizations were written at lower reading levels but are encountered less frequently after a Google search. There is a need for accurate and understandable online information that glaucoma patients can use to inform decisions about their eye health.
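Of the five readability measures, FRE is the only one that is not a grade level; it is scored on a 0-100 scale on which higher means easier:

\mathrm{FRE} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)

Scores in the low 40s, like the 42.02 and 43.86 reported here, fall in the band conventionally interpreted as difficult reading, consistent with the 10th-to-11th grade averages from the grade-level indices and well above the recommended 6th grade level.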
Affiliation(s)
- Samuel A Cohen, Department of Ophthalmology, Stanford University School of Medicine, Stanford, CA, USA; Byers Eye Institute at Stanford, Stanford, CA, USA
- Ann Caroline Fisher, Department of Ophthalmology, Stanford University School of Medicine, Stanford, CA, USA; Byers Eye Institute at Stanford, Stanford, CA, USA
- Suzann Pershing, Department of Ophthalmology, Stanford University School of Medicine, Stanford, CA, USA; Byers Eye Institute at Stanford, Stanford, CA, USA; VA Palo Alto Health Care System, Palo Alto, CA, USA
7. Mahajan J, Zhu A, Aftab OM, Henry RK, Agi NYB, Bhagat N. Educational quality and content of YouTube videos on diabetic macular edema. Int Ophthalmol 2022;43:1093-1102. PMID: 36057009. DOI: 10.1007/s10792-022-02504-1.
Abstract
PURPOSE Diabetic macular edema (DME) is a vision-threatening complication of diabetes mellitus caused by increased vascular permeability. Patients are increasingly using YouTube videos to educate themselves about DME. This study analyzes the content and quality of YouTube videos about DME.
METHODS YouTube was searched in December 2021 for "diabetic macular edema." The first 100 videos sorted by relevance and the first 100 sorted by view count were reviewed (n = 200). Quantitative metrics and content were collected. Two reviewers assessed videos using the JAMA benchmarks (0-4), the modified DISCERN instrument (1-5), and the Global Quality Scale (GQS, 1-5). Videos were sorted into author groups: 1 (academic institutions/organizations), 2 (private practices/organizations), and 3 (independent users; ophthalmologist users noted). Statistical analyses were deemed significant at α = 0.05.
RESULTS One hundred four videos were included after applying exclusion criteria. Overall mean ± standard deviation scores were 2.25 ± 0.83 (JAMA), 3.47 ± 0.55 (DISCERN), and 3.95 ± 0.95 (GQS). 51.9% of videos stated a definition of DME, 32.7% mentioned screening, and 50% mentioned any DME risk factor. Videos targeted at healthcare professionals had higher JAMA and DISCERN scores than patient-targeted videos (p < 0.05). Videos featuring ophthalmologists had higher JAMA and DISCERN scores than those without them (p < 0.05). JAMA scores varied significantly between author groups; within group 3, ophthalmologist-authored videos had higher DISCERN scores (p < 0.05).
CONCLUSION Videos without ophthalmologists or targeted toward patients had poorer quality and content coverage. The rising prevalence of diabetes, coupled with increased internet use for acquiring medical information, creates a strong need for high-quality information about DME.
Affiliation(s)
- Jasmine Mahajan, Department of Ophthalmology and Visual Science, Rutgers New Jersey Medical School, Newark, NJ, USA
- Aretha Zhu, Department of Ophthalmology and Visual Science, Rutgers New Jersey Medical School, Newark, NJ, USA
- Owais M Aftab, Department of Ophthalmology and Visual Science, Rutgers New Jersey Medical School, Newark, NJ, USA
- Roger K Henry, Department of Ophthalmology and Visual Science, Rutgers New Jersey Medical School, Newark, NJ, USA
- Nathan Y B Agi, Department of Ophthalmology and Visual Science, Rutgers New Jersey Medical School, Newark, NJ, USA
- Neelakshi Bhagat, Department of Ophthalmology and Visual Science, Rutgers New Jersey Medical School, Newark, NJ, USA; Institute of Ophthalmology and Visual Science, Doctor's Office Center Suite 6100, Newark, NJ, USA