1. Swisher AR, Wu AW, Liu GC, Lee MK, Carle TR, Tang DM. Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model. Otolaryngol Head Neck Surg 2024. [PMID: 39105460] [DOI: 10.1002/ohn.927]
Abstract
OBJECTIVE To use an artificial intelligence (AI)-powered large language model (LLM) to improve readability of patient handouts. STUDY DESIGN Review of online material modified by AI. SETTING Academic center. METHODS Five handout materials obtained from the American Rhinologic Society (ARS) and the American Academy of Facial Plastic and Reconstructive Surgery websites were assessed using validated readability metrics. The handouts were input into OpenAI's ChatGPT-4 after prompting: "Rewrite the following at a 6th-grade reading level." The understandability and actionability of both native and LLM-revised versions were evaluated using the Patient Education Materials Assessment Tool (PEMAT). Results were compared using Wilcoxon rank-sum tests. RESULTS The mean readability scores of the standard (ARS, American Academy of Facial Plastic and Reconstructive Surgery) materials corresponded to "difficult," with reading categories ranging between high school and university grade levels. Conversely, the LLM-revised handouts had an average seventh-grade reading level. LLM-revised handouts had better readability in nearly all metrics tested: Flesch Reading Ease (70.8 vs 43.9; P < .05), Gunning Fog Score (10.2 vs 14.42; P < .05), Simple Measure of Gobbledygook (9.9 vs 13.1; P < .05), Coleman-Liau (8.8 vs 12.6; P < .05), and Automated Readability Index (8.2 vs 10.7; P = .06). PEMAT scores were significantly higher in the LLM-revised handouts for understandability (91% vs 74%; P < .05), with similar actionability (42% vs 34%; P = .15), when compared to the standard materials. CONCLUSION Patient-facing handouts can be augmented by ChatGPT with simple prompting to tailor information with improved readability. This study demonstrates the utility of LLMs to aid in rewriting patient handouts and may serve as a tool to help optimize education materials. LEVEL OF EVIDENCE Level VI.
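The revise-and-rescore workflow described here is straightforward to reproduce in outline. Below is a minimal Python sketch, assuming the open-source textstat package for the readability formulas and SciPy's Wilcoxon rank-sum test; the handout texts are short placeholders, not ARS materials, and the LLM call itself is omitted (only the prompt quoted in the abstract is shown).

```python
# Minimal sketch: score original vs LLM-revised handouts on the five
# readability metrics named in the abstract, then compare with a
# Wilcoxon rank-sum test. Assumes `pip install textstat scipy`.
from statistics import mean

import textstat
from scipy.stats import ranksums

PROMPT = "Rewrite the following at a 6th-grade reading level."  # prompt quoted in the abstract

METRICS = {
    "Flesch Reading Ease": textstat.flesch_reading_ease,
    "Gunning Fog": textstat.gunning_fog,
    "SMOG": textstat.smog_index,
    "Coleman-Liau": textstat.coleman_liau_index,
    "Automated Readability Index": textstat.automated_readability_index,
}

# Placeholder texts standing in for the original and revised handouts.
original = [
    "Septoplasty is a surgical procedure performed to correct a deviated nasal septum.",
    "Functional endoscopic sinus surgery restores ventilation and mucociliary clearance.",
]
revised = [
    "Septoplasty is surgery to fix a crooked wall inside your nose.",
    "Sinus surgery helps air move through your nose so it can drain.",
]

for name, fn in METRICS.items():
    before = [fn(t) for t in original]
    after = [fn(t) for t in revised]
    stat, p = ranksums(before, after)  # Wilcoxon rank-sum, as in the study
    print(f"{name}: {mean(before):.1f} -> {mean(after):.1f} (P = {p:.3f})")
```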
Affiliation(s)
- Austin R Swisher: Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona, USA
- Arthur W Wu: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Gene C Liu: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Matthew K Lee: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Taylor R Carle: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Dennis M Tang: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
2. Armache M, Assi S, Wu R, Jain A, Lu J, Gordon L, Jacobs LM, Fundakowski CE, Rising KL, Leader AE, Fakhry C, Mady LJ. Readability of Patient Education Materials in Head and Neck Cancer: A Systematic Review. JAMA Otolaryngol Head Neck Surg 2024; 150:713-724. [PMID: 38900443] [DOI: 10.1001/jamaoto.2024.1569]
Abstract
Importance Patient education materials (PEMs) can promote patient engagement, satisfaction, and treatment adherence. The American Medical Association recommends that PEMs be developed for a sixth-grade or lower reading level. Health literacy (HL) refers to an individual's ability to seek, understand, and use health information to make appropriate decisions regarding their health. Patients with suboptimal HL may not be able to understand or act on health information and are at risk for adverse health outcomes. Objective To assess the readability of PEMs on head and neck cancer (HNC) and to evaluate HL among patients with HNC. Evidence Review A systematic review of the literature was performed by searching Cochrane, PubMed, and Scopus for peer-reviewed studies published from 1995 to 2024 using the keywords head and neck cancer, readability, health literacy, and related synonyms. Full-text studies in English that evaluated readability and/or HL measures were included. Readability assessments included the Flesch-Kincaid Grade Level (FKGL grade, 0-20, with higher grades indicating greater reading difficulty) and Flesch Reading Ease (FRE score, 1-100, with higher scores indicating easier readability), among others. Reviews, conference materials, opinion letters, and guidelines were excluded. Study quality was assessed using the Appraisal Tool for Cross-Sectional Studies. Findings Of the 3235 studies identified, 17 studies assessing the readability of 1124 HNC PEMs produced by professional societies, hospitals, and others were included. The mean FKGL grade ranged from 8.8 to 14.8; none of the studies reported a mean FKGL of grade 6 or lower. Eight studies assessed HL and found that the prevalence of inadequate HL ranged from 11.9% to 47.0%. Conclusions and Relevance These findings indicate that more than one-third of patients with HNC demonstrate inadequate HL, yet none of the PEMs assessed were developed for a sixth-grade or lower reading level, as recommended by the American Medical Association. This incongruence highlights the need to address the readability of HNC PEMs to improve patient understanding of the disease and to mitigate potential barriers to shared decision-making for patients with HNC. Health care professionals have a responsibility to produce and promote more effective PEMs to dismantle potentially preventable literacy barriers.
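For context, the two headline metrics in this review have simple closed-form definitions. The sketch below implements the standard published FKGL and FRE formulas from raw word, sentence, and syllable counts; the vowel-group syllable counter is a crude illustrative heuristic, not what any of the reviewed studies used.

```python
import re

def syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels (illustration only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl_and_fre(text: str) -> tuple[float, float]:
    """Standard Flesch-Kincaid Grade Level and Flesch Reading Ease formulas."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_syllables = sum(syllables(w) for w in words)
    wps = len(words) / n_sentences   # words per sentence
    spw = n_syllables / len(words)   # syllables per word
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    return fkgl, fre

grade, ease = fkgl_and_fre("The doctor removes the growth. You go home the same day.")
print(f"FKGL grade {grade:.1f}, FRE score {ease:.1f}")
```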
Affiliation(s)
- Maria Armache: Department of Otolaryngology-Head & Neck Surgery, The Johns Hopkins School of Medicine, Baltimore, Maryland
- Sahar Assi: Cochlear Center for Hearing and Public Health, Johns Hopkins University, Baltimore, Maryland; Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Richard Wu: Head and Neck Institute, Cleveland Clinic, Cleveland, Ohio
- Amiti Jain: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania
- Joseph Lu: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania
- Larissa Gordon: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania
- Lisa M Jacobs: Mixed Methods Research Lab, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Christopher E Fundakowski: Department of Otolaryngology-Head and Neck Surgery, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania
- Kristin L Rising: Jefferson Center for Connected Care, Thomas Jefferson University, Philadelphia, Pennsylvania; Department of Emergency Medicine, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania
- Amy E Leader: Department of Population Health, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania; Department of Medical Oncology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania; Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, Pennsylvania
- Carole Fakhry: Department of Otolaryngology-Head & Neck Surgery, The Johns Hopkins School of Medicine, Baltimore, Maryland
- Leila J Mady: Department of Otolaryngology-Head & Neck Surgery, The Johns Hopkins School of Medicine, Baltimore, Maryland
3. Shen SA, Perez-Heydrich CA, Xie DX, Nellis JC. ChatGPT vs. web search for patient questions: what does ChatGPT do better? Eur Arch Otorhinolaryngol 2024; 281:3219-3225. [PMID: 38416195] [PMCID: PMC11410109] [DOI: 10.1007/s00405-024-08524-0]
Abstract
PURPOSE Chat Generative Pretrained Transformer (ChatGPT) has the potential to significantly impact how patients acquire medical information online. Here, we characterize the readability and appropriateness of ChatGPT responses to a range of patient questions compared to results from traditional web searches. METHODS Patient questions related to the published Clinical Practice Guidelines of the American Academy of Otolaryngology-Head and Neck Surgery were sourced from existing online posts. Questions were categorized using a modified Rothwell classification system into (1) fact, (2) policy, and (3) diagnosis and recommendations. These were queried using ChatGPT and traditional web search. All results were evaluated on readability (Flesch Reading Ease and Flesch-Kincaid Grade Level) and understandability (Patient Education Materials Assessment Tool). Accuracy was assessed by two blinded clinical evaluators using a three-point ordinal scale. RESULTS 54 questions were organized into fact (37.0%), policy (37.0%), and diagnosis (25.8%). The average readability of ChatGPT responses was lower than that of traditional web search results (FRE: 42.3 ± 13.1 vs. 55.6 ± 10.5, p < 0.001), while PEMAT understandability was equivalent (93.8% vs. 93.5%, p = 0.17). ChatGPT scored higher than web search for questions in the 'Diagnosis' category (p < 0.01); there was no difference for questions categorized as 'Fact' (p = 0.15) or 'Policy' (p = 0.22). Additional prompting improved ChatGPT response readability (FRE 55.6 ± 13.6, p < 0.01). CONCLUSIONS ChatGPT outperforms web search in answering patient questions related to symptom-based diagnoses and is equivalent in providing medical facts and established policy. Appropriate prompting can further improve readability while maintaining accuracy. Further patient education is needed to relay the benefits and limitations of this technology as a source of medical information.
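PEMAT understandability and actionability, used here and in several other entries, are each reported as a percentage: points earned over applicable points, with not-applicable items excluded. A minimal sketch of that scoring rule follows; the ratings shown are hypothetical, not data from this study.

```python
def pemat_score(ratings: list[int | None]) -> float:
    """PEMAT domain score: 1 = agree, 0 = disagree, None = not applicable.
    The score is points earned over applicable points, as a percentage."""
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("no applicable items")
    return 100.0 * sum(applicable) / len(applicable)

# Hypothetical understandability ratings for a single response;
# None marks an item that does not apply (e.g., no visual aids).
print(pemat_score([1, 1, 0, 1, None, 1]))  # -> 80.0
```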
Affiliation(s)
- Sarek A Shen: Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA
- Deborah X Xie: Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA
- Jason C Nellis: Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA
4. Bralić N, Mijatović A, Marušić A, Buljan I. Conclusiveness, readability and textual characteristics of plain language summaries from medical and non-medical organizations: a cross-sectional study. Sci Rep 2024; 14:6016. [PMID: 38472285] [DOI: 10.1038/s41598-024-56727-6]
Abstract
This cross-sectional study compared plain language summaries (PLSs) from medical and non-medical organizations regarding conclusiveness, readability and textual characteristics. All PLSs from the latest versions of systematic reviews published up to 10 November 2022 were analysed: Cochrane (medical PLSs, n = 8638) and Campbell Collaboration and International Initiative for Impact Evaluation (non-medical PLSs, n = 163). PLSs were classified into three conclusiveness categories (conclusive, inconclusive and unclear), using a machine learning tool for medical PLSs and two expert raters for non-medical PLSs. A higher proportion of non-medical PLSs were conclusive (17.79% vs 8.40%, P < 0.0001); non-medical PLSs also had higher readability (median number of years of education needed to read the text with ease 15.23 (interquartile range (IQR) 14.35 to 15.96) vs 15.51 (IQR 14.31 to 16.77), P = 0.010) and used more words (median 603 (IQR 539.50 to 658.50) vs 345 (IQR 202 to 476), P < 0.001). Language analysis showed that medical PLSs scored higher for disgust and fear, whereas non-medical PLSs scored higher for positive emotions. The observed differences between the medical and non-medical fields may be attributable to differences in publication methodology or discipline. This kind of analysis is crucial for enhancing the overall quality of PLSs and improving knowledge translation to the general public.
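The medians with IQRs and P values reported above suggest a standard nonparametric group comparison. The sketch below shows one plausible version using a Mann-Whitney U test on word counts; the placeholder data and the exact test choice are assumptions for illustration, not the authors' code.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def summary(counts: list[int]) -> str:
    """Report median and interquartile range, as in the abstract."""
    q1, med, q3 = np.percentile(counts, [25, 50, 75])
    return f"median {med:.0f} (IQR {q1:.0f} to {q3:.0f})"

non_medical = [539, 580, 603, 655, 660]  # placeholder PLS word counts
medical = [188, 202, 345, 402, 476]      # placeholder PLS word counts

stat, p = mannwhitneyu(non_medical, medical, alternative="two-sided")
print(f"{summary(non_medical)} vs {summary(medical)}, P = {p:.3f}")
```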
Affiliation(s)
- Nensi Bralić: Department of Research in Biomedicine and Health, University of Split School of Medicine, Šoltanska 2A, 21000, Split, Croatia
- Antonija Mijatović: Department of Research in Biomedicine and Health, University of Split School of Medicine, Šoltanska 2A, 21000, Split, Croatia
- Ana Marušić: Department of Research in Biomedicine and Health, University of Split School of Medicine, Šoltanska 2A, 21000, Split, Croatia
- Ivan Buljan: Department of Psychology, Faculty of Humanities and Social Sciences, University of Split, Split, Croatia
5. Odigie E, Andreadis K, Chandra I, Mocchetti V, Rives H, Cox S, Rameau A. Are Mobile Applications in Laryngology Designed for All Patients? Laryngoscope 2023; 133:1540-1549. [PMID: 36317789] [PMCID: PMC10149562] [DOI: 10.1002/lary.30465]
Abstract
OBJECTIVES Mobile applications (apps) are multiplying in laryngology, with little standardization of content, functionality, or accessibility. The purpose of this study is to evaluate the quality, functionality, health literacy, readability, accessibility, and inclusivity of laryngology mobile applications. METHODS Of the 3230 apps identified from the Apple and Google Play stores, 28 patient-facing apps met inclusion criteria. Apps were evaluated using validated scales assessing quality and functionality: the Mobile App Rating Scale (MARS) and the Institute for Healthcare Informatics App Functionality Scale. The CDC Clear Communication Index, the Institute of Medicine Strategies for Creating Health Literate Mobile Applications, and the Patient Education Materials Assessment Tool (PEMAT) were used to evaluate each app's health literacy level. Readability was assessed using established readability formulas. Apps were evaluated for language, accessibility features, and representation of a diverse population. RESULTS Twenty-six apps (92%) had adequate quality (MARS score > 3). The mean PEMAT score was 89% for actionability and 86% for understandability. On average, apps utilized 25 of 33 health-literate strategies. Twenty-two apps (79%) did not pass the CDC index threshold of 90% for health literacy. Twenty-four app descriptions (86%) were above an 8th-grade reading level. Only 4 apps (14%) showed diverse representation, 3 (11%) had non-English language functions, and 2 (7%) offered subtitles. Inter-rater reliability for MARS was adequate (CA-ICC = 0.715). CONCLUSION While most apps scored well in quality and functionality, many laryngology apps did not meet standards for health literacy. Most apps were written at a reading level above the national average, lacked accessibility features, and did not represent diverse populations.
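The MARS quality judgment used here reduces to a simple rule: items are rated 1 to 5 across the objective subscales, the overall quality score is the mean, and this study treats a score above 3 as adequate. A minimal sketch with placeholder ratings, not data from the paper:

```python
from statistics import mean

def mars_quality(subscales: dict[str, float]) -> tuple[float, bool]:
    """Overall MARS quality: mean of subscale means; > 3 counts as adequate
    in this study. Subscale names follow the published MARS instrument."""
    overall = mean(subscales.values())
    return overall, overall > 3.0

# Placeholder subscale means for one hypothetical app.
score, adequate = mars_quality({
    "engagement": 3.2,
    "functionality": 4.1,
    "aesthetics": 3.5,
    "information": 3.8,
})
print(f"MARS = {score:.2f}, adequate quality: {adequate}")  # MARS = 3.65, True
```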
Affiliation(s)
- Eseosa Odigie: Sean Parker Institute for the Voice, Department of Otolaryngology, Weill Cornell Medical College, New York, USA
- Katerina Andreadis: Sean Parker Institute for the Voice, Department of Otolaryngology, Weill Cornell Medical College, New York, USA
- Iyra Chandra: Sean Parker Institute for the Voice, Department of Otolaryngology, Weill Cornell Medical College, New York, USA
- Valentina Mocchetti: Sean Parker Institute for the Voice, Department of Otolaryngology, Weill Cornell Medical College, New York, USA
- Hal Rives: Sean Parker Institute for the Voice, Department of Otolaryngology, Weill Cornell Medical College, New York, USA
- Steven Cox: Department of Communication Sciences and Disorders, Adelphi University, Garden City, USA
- Anaïs Rameau: Sean Parker Institute for the Voice, Department of Otolaryngology, Weill Cornell Medical College, New York, USA
6. Magrath WJ, Shneyderman M, Bauer TK, Neira P, Best S, Akst LM. Readability Analysis and Accessibility of Online Materials About Transgender Voice Care. Otolaryngol Head Neck Surg 2022; 167:952-958. [PMID: 35671144] [DOI: 10.1177/01945998221103466]
Abstract
OBJECTIVE To determine readability, understandability, and actionability of online health information related to transgender voice care. STUDY DESIGN Review of online materials. SETTING Academic medical center. METHODS A Google search of "transgender voice care" was performed, with the first 50 websites meeting inclusion criteria included. Readability was assessed using the Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), and the Simple Measure of Gobbledygook (SMOG). Understandability and actionability were measured by 2 independent reviewers using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P). Unpaired t tests were used to compare clinician- and patient-oriented sites, surgical and nonsurgical sites, and sites that discuss nonbinary indications for voice care. Analysis of variance was used to compare sites that discuss voice feminization, masculinization, both, or neither. RESULTS Average scores across the cohort for FRES, FKGL, and SMOG were 43.77 ± 13.52, 12.14 ± 2.66, and 11.30 ± 1.93, respectively, indicating materials were above a 12th-grade reading level. PEMAT-P scores for understandability and actionability were 64.95% ± 15.78% and 40.55% ± 23.86%, respectively. Patient-oriented sites were significantly more understandable and actionable than clinician-oriented sites (P < .02). Websites that discussed only voice feminization were significantly more readable according to objective metrics (FKGL, SMOG) than websites that discussed both feminization and masculinization or those that did not differentiate care types (P < .05). CONCLUSION Online information about transgender voice care is written at a level above what is recommended for patient education materials. Providers may improve the accessibility of transgender voice care by enhancing the readability of online materials.
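Of the three readability measures reported here, SMOG has the most compact formula: it depends only on the count of polysyllabic words per 30 sentences. A minimal sketch of the standard formula, again with a crude vowel-group syllable heuristic as an illustrative assumption:

```python
import math
import re

def syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels (illustration only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog(text: str) -> float:
    """Standard SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    polysyllables = sum(
        1 for w in re.findall(r"[A-Za-z']+", text) if syllables(w) >= 3
    )
    return 1.0430 * math.sqrt(polysyllables * 30 / n_sentences) + 3.1291

print(f"SMOG grade: {smog('Voice feminization surgery changes vocal pitch permanently.'):.1f}")
```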
Affiliation(s)
- Walker J Magrath: Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA
- Matthew Shneyderman: Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA
- Tom K Bauer: Johns Hopkins Health System, Baltimore, Maryland, USA
- Paula Neira: Johns Hopkins Center for Transgender Health, Johns Hopkins Medical Institutions, Baltimore, Maryland, USA
- Simon Best: Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA; Johns Hopkins Center for Transgender Health, Johns Hopkins Medical Institutions, Baltimore, Maryland, USA
- Lee M Akst: Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA; Johns Hopkins Center for Transgender Health, Johns Hopkins Medical Institutions, Baltimore, Maryland, USA
7. Kaya E, Görmez S. Quality and readability of online information on plantar fasciitis and calcaneal spur. Rheumatol Int 2022; 42:1965-1972. [PMID: 35763090] [DOI: 10.1007/s00296-022-05165-6]
Abstract
Plantar fasciitis and calcaneal spur are common causes of heel pain in the community, and people use the Internet to obtain medical information about these conditions. We reviewed Internet information sources on plantar fasciitis and calcaneal spur for quality and readability. The first 50 websites for each search term ("calcaneal spur", "heel spur", and "plantar fasciitis") were retrieved from www.google.com. Six validated tools were used to assess information quality and readability, and included websites were checked for HONCode (Health On the Net Foundation Code) stamps. The mean total DISCERN score was 50.52 ± 14.62, and the mean total JAMA (Journal of the American Medical Association) benchmark score was 2.42 ± 1.26. In total, 25.72% of the 97 websites had HONCode stamps. The mean readability scores were Flesch-Kincaid Grade Level (FKGL): 7.27 ± 1.71, Gunning Fog: 8.46 ± 2.17, Simple Measure of Gobbledygook (SMOG): 6.89 ± 1.24, and Coleman-Liau Index: 15.56 ± 1.85. For-profit websites were the most common source, and overall website quality and readability were moderate. A significant proportion of the websites had a financial bias and provided low-quality information. A mechanism for monitoring the quality and readability of online information must be established and managed systematically.
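The Gunning Fog and Coleman-Liau indices reported above also have standard closed forms; the sketch below implements both (the syllable heuristic is again a crude illustrative assumption, and real tools count syllables more carefully).

```python
import re

def syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels (illustration only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text: str) -> float:
    """Fog = 0.4 * (words per sentence + 100 * complex_words / words)."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = sum(1 for w in words if syllables(w) >= 3)
    return 0.4 * (len(words) / n_sentences + 100 * complex_words / len(words))

def coleman_liau(text: str) -> float:
    """CLI = 0.0588*L - 0.296*S - 15.8, with L letters and S sentences per 100 words."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    letters = sum(len(w) for w in words)
    L = 100 * letters / len(words)
    S = 100 * n_sentences / len(words)
    return 0.0588 * L - 0.296 * S - 15.8

sample = "Plantar fasciitis causes heel pain. Rest and stretching usually help."
print(f"Fog {gunning_fog(sample):.1f}, Coleman-Liau {coleman_liau(sample):.1f}")
```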
Affiliation(s)
- Erhan Kaya: Department of Public Health, Faculty of Medicine, Kahramanmaras Sutcu Imam University, Kahramanmaras, Turkey
- Sinan Görmez: Department of Orthopedics and Traumatology, Bulancak State Hospital, Giresun, Turkey