1
Zowalla R, Pfeifer D, Wetter T. Readability and topics of the German Health Web: Exploratory study and text analysis. PLoS One 2023; 18:e0281582. [PMID: 36763573 PMCID: PMC9916670 DOI: 10.1371/journal.pone.0281582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Accepted: 01/27/2023] [Indexed: 02/11/2023]
Abstract
BACKGROUND The internet has become an increasingly important resource for health information, especially for lay people. However, the information found does not necessarily match the user's health literacy level. Therefore, it is vital to (1) identify prominent information providers, (2) quantify the readability of written health information, and (3) analyze how well different types of information sources suit people with differing health literacy levels. OBJECTIVE In previous work, we showed how a focused crawler can be used to "capture" and describe a large sample of the "German Health Web", which we call the "Sampled German Health Web" (sGHW). It includes health-related web content from the three predominantly German-speaking countries Germany, Austria, and Switzerland, i.e., the country-code top-level domains (ccTLDs) ".de", ".at", and ".ch". Based on the crawled data, we now provide a fully automated readability and vocabulary analysis of a subsample of the sGHW, as well as an analysis of the sGHW's graph structure covering its size, its content providers, and the ratio of public to private stakeholders. In addition, we apply Latent Dirichlet Allocation (LDA) to identify topics and themes within the sGHW. METHODS Important web sites were identified by applying PageRank to the sGHW's graph representation. LDA was used to discover topics within the top-ranked web sites. Next, a computer-based readability and vocabulary analysis was performed on each health-related web page. Flesch Reading Ease (FRE) and the 4th Vienna formula (WSTF) were used to assess readability. Vocabulary was assessed by a specifically trained Support Vector Machine classifier. RESULTS In total, n = 14,193,743 health-related web pages were collected during the study period of 370 days. The resulting host-aggregated web graph comprises 231,733 nodes connected via 429,530 edges (network diameter = 25; average path length = 6.804; average degree = 1.854; modularity = 0.723).
Among the 3000 top-ranked pages (1000 per ccTLD according to PageRank), 18.50% (555/3000) belong to web sites of governmental or public institutions, 18.03% (541/3000) to nonprofit organizations, 54.03% (1621/3000) to private organizations, 4.07% (122/3000) to news agencies, 3.87% (116/3000) to pharmaceutical companies, 0.90% (27/3000) to private bloggers, and 0.60% (18/3000) to others. LDA identified 50 topics, which we grouped into 11 themes: "Research & Science", "Illness & Injury", "The State", "Healthcare structures", "Diet & Food", "Medical Specialities", "Economy", "Food production", "Health communication", "Family" and "Other". The most prevalent themes were "Research & Science" and "Illness & Injury", accounting for 21.04% and 17.92%, respectively, of all topics across all ccTLDs and provider types. Our readability analysis reveals that the majority of the collected web sites are structurally difficult or very difficult to read: 84.63% (2539/3000) scored WSTF ≥ 12, and 89.70% (2691/3000) scored FRE ≤ 49. Moreover, our vocabulary analysis shows that 44.00% (1320/3000) of the web sites use vocabulary that is well suited for a lay audience. CONCLUSIONS We were able to identify major information hubs as well as topics and themes within the sGHW. Results indicate that readability within the sGHW is low. As a consequence, patients may face barriers, even though the vocabulary used seems appropriate from a medical perspective. In future work, the authors intend to extend their analyses to identify trustworthy health information web sites.
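Both readability scores used above are closed-form formulas over simple text counts. The sketch below assumes the common German adaptation of FRE by Amstad (FRE = 180 − ASL − 58.5 × ASW) and the 4th Vienna formula (WSTF = 0.2656 × SL + 0.2744 × MS − 1.693, with MS the percentage of words with three or more syllables). The abstract does not state which FRE variant or syllable counter the authors used, so the `TextCounts` type and function names are illustrative assumptions; syllable counting is language-specific and is left to the caller. The cut-offs are the ones reported in the abstract.

```python
from dataclasses import dataclass


@dataclass
class TextCounts:
    """Pre-computed counts for one text (hypothetical input type)."""
    words: int
    sentences: int
    syllables: int
    polysyllabic_words: int  # words with three or more syllables


def flesch_reading_ease_de(c: TextCounts) -> float:
    """Assumed German FRE variant (Amstad): 180 - ASL - 58.5 * ASW."""
    asl = c.words / c.sentences  # average sentence length in words
    asw = c.syllables / c.words  # average number of syllables per word
    return 180.0 - asl - 58.5 * asw


def wstf4(c: TextCounts) -> float:
    """4th Vienna formula: 0.2656 * SL + 0.2744 * MS - 1.693."""
    sl = c.words / c.sentences                   # mean sentence length in words
    ms = 100.0 * c.polysyllabic_words / c.words  # % of words with >= 3 syllables
    return 0.2656 * sl + 0.2744 * ms - 1.693


def is_hard_to_read(c: TextCounts) -> tuple[bool, bool]:
    """Apply the abstract's cut-offs: (FRE <= 49, WSTF >= 12)."""
    return flesch_reading_ease_de(c) <= 49, wstf4(c) >= 12


if __name__ == "__main__":
    easy = TextCounts(words=100, sentences=10, syllables=150, polysyllabic_words=10)
    print(round(flesch_reading_ease_de(easy), 2), round(wstf4(easy), 3))
    # Re-derive two of the reported shares from their counts.
    print(f"{2539 / 3000:.2%}", f"{2691 / 3000:.2%}")
```

Note that the two scales run in opposite directions: a lower FRE and a higher WSTF both indicate harder text, which is why the cut-offs use `<=` and `>=` respectively.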
Affiliation(s)
- Richard Zowalla
- Department of Medical Informatics, Heilbronn University, Heilbronn, Germany
- Center for Machine Learning, Heilbronn University, Heilbronn, Germany
- Daniel Pfeifer
- Department of Medical Informatics, Heilbronn University, Heilbronn, Germany
- Center for Machine Learning, Heilbronn University, Heilbronn, Germany
- Thomas Wetter
- Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
2
de Carvalho VDH, Nepomuceno TCC, Poleto T, Costa APCS. The COVID-19 Infodemic on Twitter: A Space and Time Topic Analysis of the Brazilian Immunization Program and Public Trust. Trop Med Infect Dis 2022; 7:425. [PMID: 36548680 PMCID: PMC9783210 DOI: 10.3390/tropicalmed7120425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 11/24/2022] [Accepted: 12/03/2022] [Indexed: 12/14/2022]
Abstract
The COVID-19 pandemic has brought to light the infodemic phenomenon and the problem of misinformation. Agencies involved in managing COVID-19 immunization programs are also looking for ways to combat this problem and need analytical tools specialized in identifying patterns of misinformation and in understanding how these patterns have evolved in time and space, in order to demonstrate their effects on public trust. The aim of this article is to present the results of a study applying topic analysis in space and time to public opinion on the Brazilian COVID-19 immunization program. The analytical process involves applying topic discovery to tweets with geographic information on the COVID-19 vaccination theme. The extracted topics were then manually annotated with the polarity labels pro, anti, and neutral, based on support for and trust in COVID-19 vaccination. A space and time analysis was carried out using the topic and polarity distributions, making it possible to identify the periods in which the largest numbers of posts occurred and the cities that generated the most tweets. The analytical process describes a framework capable of meeting agencies' need for tools: it indicates how misinformation has evolved and where its dissemination is concentrated, and it lets managers define the granularity of this information as they see fit. The following research outcomes can be highlighted. (1) We identified a specific date with a peak that stands out from all other dates, indicating an event that mobilized public opinion about COVID-19 vaccination. (2) We extracted 23 topics, enabling manual polarity annotation of each topic and an understanding of which polarities were associated with tweets.
(3) Based on the associations between polarities, topics, and tweets, it was possible to identify the Brazilian cities that produced the majority of tweets for each polarity, as well as the distribution of tweet counts relative to city populations.
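Once every tweet carries a topic and every topic carries a polarity label, the space and time step reduces to counting. The sketch below assumes one topic id per tweet, a manually built topic-to-polarity map, and a normalization per 100,000 inhabitants; the record layout, the function names, the city names ("City A", "City B") and the populations are hypothetical toy values, not the authors' data or code.

```python
from collections import Counter, defaultdict
from datetime import date


def peak_date(tweets):
    """Return (date, count) for the day with the most tweets."""
    per_day = Counter(t["date"] for t in tweets)
    return per_day.most_common(1)[0]


def city_polarity_rates(tweets, topic_polarity, population, per=100_000):
    """Count tweets per city and polarity, also expressed per `per` inhabitants.

    topic_polarity maps a topic id to its annotated label ("pro", "anti",
    "neutral"); population maps a city name to its number of inhabitants.
    Returns {city: {polarity: (count, rate)}}.
    """
    counts = defaultdict(Counter)
    for t in tweets:
        counts[t["city"]][topic_polarity[t["topic"]]] += 1
    return {
        city: {pol: (n, n * per / population[city]) for pol, n in by_pol.items()}
        for city, by_pol in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy records: one topic id per tweet, as assigned by the topic model.
    tweets = [
        {"date": date(2021, 3, 1), "city": "City A", "topic": 0},
        {"date": date(2021, 3, 1), "city": "City A", "topic": 1},
        {"date": date(2021, 3, 1), "city": "City B", "topic": 0},
        {"date": date(2021, 3, 2), "city": "City B", "topic": 2},
    ]
    print(peak_date(tweets))
    print(city_polarity_rates(tweets, {0: "pro", 1: "anti", 2: "neutral"},
                              {"City A": 200_000, "City B": 50_000}))
```

Reporting both the raw count and the population-relative rate matters because a large city can dominate raw tweet counts while a smaller city shows a higher rate per inhabitant.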
Affiliation(s)
- Thiago Poleto
- Departamento de Administração, Instituto de Ciências Sociais Aplicadas, Universidade Federal do Pará, Belém 66075-110, Brazil
- Ana Paula Cabral Seixas Costa
- Departamento de Engenharia de Produção, Centro de Tecnologia e Geociências, Universidade Federal de Pernambuco, Recife 50740-550, Brazil
3
Liu Q, Liang Y, Wang S, Huang Z, Wang Q, Jia M, Li Z, Ming WK. Health Communication through Chinese Media on E-Cigarette: A Topic Modeling Approach. Int J Environ Res Public Health 2022; 19:7591. [PMID: 35805245 PMCID: PMC9265508 DOI: 10.3390/ijerph19137591] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/22/2022] [Revised: 06/13/2022] [Accepted: 06/15/2022] [Indexed: 02/01/2023]
Abstract
Background: Electronic cigarettes (e-cigarettes) have been a newsworthy topic in China. E-cigarettes are receiving greater consumer attention due to the rise of the Chinese e-cigarette industry. In the past decade, e-cigarettes have been widely debated across the media, particularly regarding their identity and their health effects. Objective: This study aims to (1) find the key topics in e-cigarette news and (2) provide suggestions for future media strategies to improve health communication. Method: We collected Chinese news on e-cigarettes published from 1 November 2015 to 31 October 2020 from the Huike (WiseSearch) database, using “e-cigarettes” (Chinese: “电子烟”) as the keyword. We used the Jieba package in Python for data cleaning and the latent Dirichlet allocation (LDA) topic modeling method to generate the major themes of health communication in the news content. Main finding: Through an analysis of 1584 news articles on e-cigarettes, this paper identifies 26 topics grouped into 4 themes: regulations and control (n = 475, 30%), minor protection (n = 436, 27.5%), industry activities (n = 404, 25.5%), and health effects (n = 269, 17%). The peaks and declines in the number of news articles are affected by time and related regulations. Conclusion: The main themes of Chinese news content on e-cigarettes are regulations and control, and minor protection. Newspapers should shoulder their responsibilities and play an important role in health communication through balanced coverage.
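Latent Dirichlet allocation recurs throughout the studies listed here. As a rough illustration, the following is a minimal, dependency-free sketch of LDA fitted by collapsed Gibbs sampling, which is one common inference method. The abstract does not say which LDA implementation or hyperparameters the authors used, so the number of topics, `alpha`, `beta`, the iteration count, and the toy corpus (already segmented into tokens, as a segmenter such as Jieba would do for Chinese text) are all illustrative assumptions. Real analyses would normally rely on an established library.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab): smoothed estimates of the
    per-document topic proportions and of the per-topic word distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens of topic k in document d
    nkw = [[0] * V for _ in range(n_topics)]  # times word v is assigned to topic k
    nk = [0] * n_topics                       # tokens assigned to topic k
    z = []                                    # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
        for k in range(n_topics)
    ]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """The n most probable words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    # Hypothetical toy corpus of pre-segmented tokens.
    docs = [
        ["ban", "regulation", "law", "ban"],
        ["law", "regulation", "ban"],
        ["nicotine", "lung", "health"],
        ["health", "nicotine", "lung", "lung"],
    ]
    doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2, iters=100, seed=1)
    print(top_words(topic_word, vocab))
```

The topics LDA returns are unlabeled word distributions. Naming them, and grouping them into broader themes as several of the studies here do, remains a manual interpretation step.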
Affiliation(s)
- Qian Liu
- School of Journalism and Communication, National Media Experimental Teaching Demonstration Center (Jinan University), Jinan University, No. 601, West Huangpu Avenue, Guangzhou 510632, China
- Yu Liang
- School of Journalism and Communication, National Media Experimental Teaching Demonstration Center (Jinan University), Jinan University, No. 601, West Huangpu Avenue, Guangzhou 510632, China
- Siyi Wang
- Department of Public Health and Preventive Medicine, School of Medicine, Jinan University, No. 601, West Huangpu Avenue, Guangzhou 510632, China
- Zhongguo Huang
- Department of Public Health and Preventive Medicine, School of Medicine, Jinan University, No. 601, West Huangpu Avenue, Guangzhou 510632, China
- Qing Wang
- School of Journalism and Communication, National Media Experimental Teaching Demonstration Center (Jinan University), Jinan University, No. 601, West Huangpu Avenue, Guangzhou 510632, China
- Miaoyutian Jia
- School of Journalism and Communication, National Media Experimental Teaching Demonstration Center (Jinan University), Jinan University, No. 601, West Huangpu Avenue, Guangzhou 510632, China
- Zihang Li
- School of Journalism and Communication, National Media Experimental Teaching Demonstration Center (Jinan University), Jinan University, No. 601, West Huangpu Avenue, Guangzhou 510632, China
- Wai-Kit Ming
- Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, To Yuen Building, 31 To Yuen Street, Hong Kong, China
4
Fairie P, Zhang Z, D'Souza AG, Walsh T, Quan H, Santana MJ. Categorising patient concerns using natural language processing techniques. BMJ Health Care Inform 2021; 28:e100274. [PMID: 34193519 PMCID: PMC8246286 DOI: 10.1136/bmjhci-2020-100274] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/30/2020] [Accepted: 05/20/2021] [Indexed: 11/23/2022]
Abstract
OBJECTIVES Patient feedback is critical for identifying and resolving patient safety and experience issues in healthcare systems. However, large volumes of unstructured text data can be difficult to analyse manually. This study reports the results of using a semiautomated, computational topic-modelling approach to analyse a corpus of patient feedback. METHODS Patient concerns (n=76 163) were received by Alberta Health Services between 2011 and 2018 regarding 806 care facilities in 163 municipalities, including hospitals, clinics, community care centres and retirement homes, in a province of 4.4 million people. The existing framework requires manual labelling with predefined categories. We applied an automated latent Dirichlet allocation (LDA)-based topic modelling algorithm to identify the topics present in these concerns and thereby produce a framework-free categorisation. RESULTS The LDA model produced 40 topics, which, following manual interpretation by researchers, were reduced to 28 coherent topics. The most frequent topics were communication issues causing delays (frequency: 10.58%), community care for elderly patients (8.82%), interactions with nurses (8.80%) and emergency department care (7.52%). Many patient concerns were categorised into multiple topics. Some topics were more specific versions of categories from the existing framework (eg, communication issues causing delays), while others were novel (eg, smoking in inappropriate settings). DISCUSSION LDA-generated topics were more nuanced than the manually labelled categories. For example, LDA found that concerns about community care were related to concerns about nursing for seniors, providing opportunities for insight and action. CONCLUSION Our findings outline the range of concerns patients share in a large health system and demonstrate the usefulness of LDA for identifying categories of patient concerns.
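The abstract reports per-topic frequencies and notes that many concerns fell into several topics. One simple way to derive such figures from an LDA document-topic matrix is to count a document under every topic whose proportion reaches a cut-off, optionally after mapping raw topics onto interpreted ones (as in the reduction from 40 to 28 topics). The assignment rule, the threshold of 0.2, the `merge` map and the toy proportions below are assumptions for illustration; the abstract does not describe the authors' actual rule.

```python
from collections import Counter


def topic_frequencies(doc_topic, threshold=0.2, merge=None):
    """Share of documents assigned to each (optionally merged) topic.

    doc_topic: per-document topic proportions (each row sums to 1).
    A document counts towards every topic whose proportion >= threshold,
    so one document can fall into several topics and the shares can sum
    to more than 1.
    merge: optional hypothetical map from raw topic index to an
    interpreted topic label; unmapped topics keep their index.
    """
    merge = merge or {}
    counts = Counter()
    for row in doc_topic:
        # A set, so two raw topics merged into one label count once per document.
        labels = {merge.get(k, k) for k, p in enumerate(row) if p >= threshold}
        counts.update(labels)
    n_docs = len(doc_topic)
    return {label: c / n_docs for label, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical proportions for four concerns over three raw topics.
    rows = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.45, 0.45, 0.1], [0.05, 0.9, 0.05]]
    print(topic_frequencies(rows))
    print(topic_frequencies(rows, merge={0: "delays", 1: "delays", 2: "nursing"}))
```

The threshold is the main design lever here: a lower value assigns more concerns to multiple topics, while taking only each document's single most probable topic would force every concern into exactly one category.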
Affiliation(s)
- Paul Fairie
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Alberta Strategy for Patient-Oriented Research Patient Engagement Platform, Calgary, Alberta, Canada
- Zilong Zhang
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Adam G D'Souza
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Alberta Health Services, Calgary, Alberta, Canada
- Tara Walsh
- Alberta Health Services, Calgary, Alberta, Canada
- Hude Quan
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Maria J Santana
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Alberta Strategy for Patient-Oriented Research Patient Engagement Platform, Calgary, Alberta, Canada
- Department of Pediatrics, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
5
Feldhege J, Moessner M, Bauer S. Who says what? Content and participation characteristics in an online depression community. J Affect Disord 2020; 263:521-527. [PMID: 31780138 DOI: 10.1016/j.jad.2019.11.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/02/2019] [Revised: 10/14/2019] [Accepted: 11/02/2019] [Indexed: 11/26/2022]
Abstract
BACKGROUND Online depression communities are an increasingly important source of informal help for people with depression. This study investigates the prevailing topics in an online depression community and how they relate to participation styles. METHODS A topic model with 26 topics was created using Latent Dirichlet allocation (LDA) from N = 16,291 posts and N = 71,543 comments by N = 20,037 users in a depression forum on Reddit. The topics' proportions in the corpus were correlated with five participation measures, i.e., sum of scores, number of comments, posts-to-comments ratio, posting frequency, and word count. RESULTS The most common topics were Feelings, Motivation, The Community on Reddit, and Time. There were many significant, small to moderate correlations between topic proportions and participation style measures. The topics Feelings, Offering Support, and Small Talk generated a bigger response in the form of scores and comments. Talking about the past and relationships was more common in longer posts, whereas small talk, offering emotional support, and employing cognitive strategies were more readily found in short comments. Lower posting frequency was related to talking about feelings and romantic relationships. LIMITATIONS No information on users' demographics or mental health status was available. Topic modeling cannot capture elements of style and tone in text. CONCLUSIONS The topic modeling uncovered a wide spectrum of topics. Patterns in the correlations indicate that users with different participation styles prefer different topics. Results of this study can aid the development of online interventions for depression.
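The METHODS step correlates topic proportions with participation measures. The sketch below computes a Pearson coefficient for every (topic, measure) pair across users. The abstract does not say whether a Pearson or a rank correlation was used, nor how topic proportions were aggregated per user, so the per-user dictionaries, the function names and the toy numbers are illustrative assumptions. On Python 3.10+, `statistics.correlation` computes the same Pearson coefficient.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def topic_measure_correlations(user_topics, user_measures):
    """Correlate each topic's proportion with each participation measure across users.

    user_topics:   {user: [proportion of topic 0, proportion of topic 1, ...]}
    user_measures: {user: {measure name: value}}
    Returns {(topic index, measure name): r}.
    """
    users = sorted(user_topics)
    n_topics = len(user_topics[users[0]])
    measures = list(user_measures[users[0]])
    return {
        (k, m): pearson([user_topics[u][k] for u in users],
                        [user_measures[u][m] for u in users])
        for k in range(n_topics)
        for m in measures
    }


if __name__ == "__main__":
    # Hypothetical toy data for three users and two topics.
    topics = {"u1": [0.6, 0.4], "u2": [0.3, 0.7], "u3": [0.1, 0.9]}
    measures = {"u1": {"word_count": 300}, "u2": {"word_count": 150},
                "u3": {"word_count": 50}}
    print(topic_measure_correlations(topics, measures))
```

With 26 topics and five measures, this yields 130 coefficients. Testing that many correlations for significance raises multiple-comparison concerns, which is worth keeping in mind when reading "many significant" correlations.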
Affiliation(s)
- Johannes Feldhege
- Center for Psychotherapy Research, University Hospital Heidelberg, Bergheimer Str. 54, 69115 Heidelberg, Germany
- Markus Moessner
- Center for Psychotherapy Research, University Hospital Heidelberg, Bergheimer Str. 54, 69115 Heidelberg, Germany
- Stephanie Bauer
- Center for Psychotherapy Research, University Hospital Heidelberg, Bergheimer Str. 54, 69115 Heidelberg, Germany
6
Tran BX, Latkin CA, Sharafeldin N, Nguyen K, Vu GT, Tam WWS, Cheung NM, Nguyen HLT, Ho CSH, Ho RCM. Characterizing Artificial Intelligence Applications in Cancer Research: A Latent Dirichlet Allocation Analysis. JMIR Med Inform 2019; 7:e14401. [PMID: 31573929 PMCID: PMC6774235 DOI: 10.2196/14401] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/16/2019] [Revised: 06/27/2019] [Accepted: 07/19/2019] [Indexed: 01/08/2023]
Abstract
Background Artificial intelligence (AI)–based therapeutics, devices, and systems are vital innovations in cancer control; in particular, they enable diagnosis, screening, precise estimation of survival, informed therapy selection, and the timely scaling up of treatment services. Objective The aim of this study was to analyze the global trends, patterns, and development of interdisciplinary landscapes in AI and cancer research. Methods An exploratory factor analysis was conducted to identify research domains emerging from abstract contents. The Jaccard similarity index was used to identify the most frequently co-occurring terms. Latent Dirichlet Allocation was used to classify papers into corresponding topics. Results From 1991 to 2018, the number of studies examining the application of AI in cancer care grew to 3555 papers covering therapeutics, capacities, and factors associated with outcomes. Topics with the highest volume of publications include (1) machine learning, (2) comparative effectiveness evaluation of AI-assisted medical therapies, and (3) AI-based prediction. Notably, this classification revealed topics examining the incremental effectiveness of AI applications and the quality of life and functioning of patients receiving these innovations. The growing research productivity and expansion of multidisciplinary approaches are largely driven by machine learning, artificial neural networks, and AI in various clinical practices. Conclusions The research landscapes show that the development of AI in cancer care focuses not only on improving prediction in cancer screening and AI-assisted therapeutics but also on improving related areas such as precision and personalized medicine and patient-reported outcomes.
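For term co-occurrence, the Jaccard similarity index compares the sets of documents in which two terms appear, |A ∩ B| / |A ∪ B|, so that two terms always found together score 1.0. The sketch below assumes each abstract has already been reduced to a list of terms and ranks all term pairs by this index. The tokenization, the toy abstracts and the function names are illustrative assumptions, not the authors' pipeline.

```python
from itertools import combinations


def term_documents(docs):
    """Map each term to the set of document indices that contain it."""
    index = {}
    for i, terms in enumerate(docs):
        for t in set(terms):
            index.setdefault(t, set()).add(i)
    return index


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_cooccurring_pairs(docs, n=3):
    """Rank term pairs by the Jaccard index of their document sets.

    Ties are broken alphabetically so that the output is deterministic.
    """
    index = term_documents(docs)
    scored = [((s, t), jaccard(index[s], index[t]))
              for s, t in combinations(sorted(index), 2)]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:n]


if __name__ == "__main__":
    # Hypothetical abstracts already reduced to sets of key terms.
    docs = [
        ["machine learning", "screening", "prediction"],
        ["machine learning", "prediction"],
        ["neural network", "screening"],
        ["machine learning", "prediction", "survival"],
    ]
    for pair, score in top_cooccurring_pairs(docs):
        print(pair, round(score, 3))
```

Unlike raw co-occurrence counts, the Jaccard index normalizes by how often each term appears at all. As a result, a rare pair that always appears together can outrank a frequent pair that only sometimes co-occurs.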
Affiliation(s)
- Bach Xuan Tran
- Institute for Preventive Medicine and Public Health, Hanoi Medical University, Hanoi, Vietnam
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
- Carl A Latkin
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
- Noha Sharafeldin
- Division of Hematology & Oncology, Department of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Institute for Cancer Outcomes and Survivorship, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Katherina Nguyen
- Department of Science, Technology, and Society, Stanford University, Palo Alto, CA, United States
- Giang Thu Vu
- Center of Excellence in Evidence-based Medicine, Nguyen Tat Thanh University, Ho Chi Minh, Vietnam
- Wilson W S Tam
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Ngai-Man Cheung
- Center of Excellence in Artificial Intelligence in Medicine, Nguyen Tat Thanh University, Ho Chi Minh, Vietnam
- Information Systems Technology and Design, Singapore University of Technology and Design, Singapore, Singapore
- Cyrus S H Ho
- Department of Psychological Medicine, National University Hospital, Singapore, Singapore
- Roger C M Ho
- Center of Excellence in Behavior Medicine, Nguyen Tat Thanh University, Ho Chi Minh, Vietnam
- Institute for Health Innovation and Technology, National University of Singapore, Singapore, Singapore
- Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore