Reference Citation Analysis

For: Liu Q, Chen Q, Shen J, Wu H, Sun Y, Ming WK. Data Analysis and Visualization of Newspaper Articles on Thirdhand Smoke: A Topic Modeling Approach. JMIR Med Inform 2019;7:e12414. [PMID: 30694199 PMCID: PMC6371067 DOI: 10.2196/12414] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/06/2018] [Revised: 12/14/2018] [Accepted: 01/05/2019] [Indexed: 11/13/2022]

Cited by Other Article(s)

Zowalla R, Pfeifer D, Wetter T. Readability and topics of the German Health Web: Exploratory study and text analysis. PLoS One 2023;18:e0281582. [PMID: 36763573 PMCID: PMC9916670 DOI: 10.1371/journal.pone.0281582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Accepted: 01/27/2023] [Indexed: 02/11/2023]

Abstract

BACKGROUND

The internet has become an increasingly important resource for health information, especially for lay people. However, the information found does not necessarily match the user's health literacy level. It is therefore vital to (1) identify prominent information providers, (2) quantify the readability of written health information, and (3) analyze how well different types of information sources suit people with differing levels of health literacy.

OBJECTIVE

In previous work, we showed how a focused crawler can be used to "capture" and describe a large sample of the "German Health Web", which we call the "Sampled German Health Web" (sGHW). It includes health-related web content from the three predominantly German-speaking countries Germany, Austria, and Switzerland, i.e., the country-code top-level domains (ccTLDs) ".de", ".at" and ".ch". Based on the crawled data, we now provide a fully automated readability and vocabulary analysis of a subsample of the sGHW, as well as an analysis of the sGHW's graph structure covering its size, its content providers, and the ratio of public to private stakeholders. In addition, we apply Latent Dirichlet Allocation (LDA) to identify topics and themes within the sGHW.

METHODS

Important web sites were identified by applying PageRank to the sGHW's graph representation. LDA was then used to discover topics within the top-ranked web sites. Next, a computer-based readability and vocabulary analysis was performed on each health-related web page. Readability was assessed with the Flesch Reading Ease (FRE) and the 4th Vienna formula (WSTF); vocabulary was assessed by a specifically trained Support Vector Machine classifier.
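
PageRank itself is standard, but the abstract does not say which implementation, damping factor, or convergence criterion the authors used. A minimal power-iteration sketch in Python, assuming an unweighted directed host graph, the common damping factor of 0.85, and made-up hostnames, shows how hosts can be ranked and the top entries kept per ccTLD:

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed, unweighted host graph.

    edges: iterable of (source_host, target_host) pairs. Duplicate edges
    collapse into one link; hosts without out-links ("dangling" hosts)
    spread their rank uniformly over all hosts.
    """
    out_links = {}
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:  # a self-link carries no endorsement
            out_links.setdefault(src, set()).add(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # L1 change between iterations is negligible
            break
    return rank


def top_hosts_per_tld(rank, k, tlds=(".de", ".at", ".ch")):
    """Keep the k highest-ranked hosts for each country-code TLD."""
    result = {}
    for tld in tlds:
        hosts = [h for h in rank if h.endswith(tld)]
        hosts.sort(key=lambda h: rank[h], reverse=True)
        result[tld] = hosts[:k]
    return result


# Hypothetical toy graph; hostnames are placeholders, not sGHW data
edges = [("a.de", "b.de"), ("c.de", "b.de"), ("b.de", "a.de"),
         ("x.at", "b.de"), ("x.at", "y.at"), ("y.at", "x.at")]
rank = pagerank(edges)
top = top_hosts_per_tld(rank, k=2)
```

Ranking per ccTLD mirrors the "1000 per ccTLD" selection described in the Results; on the real graph `k` would be 1000. For a graph with hundreds of thousands of hosts, a sparse-matrix implementation or a library such as NetworkX would be the more practical choice.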

RESULTS

In total, n = 14,193,743 health-related web pages were collected during the study period of 370 days. The resulting host-aggregated web graph comprises 231,733 nodes connected via 429,530 edges (network diameter = 25; average path length = 6.804; average degree = 1.854; modularity = 0.723). Among the 3000 top-ranked pages (1000 per ccTLD according to PageRank), 18.50% (555/3000) belong to web sites of governmental or public institutions, 18.03% (541/3000) to nonprofit organizations, 54.03% (1621/3000) to private organizations, 4.07% (122/3000) to news agencies, 3.87% (116/3000) to pharmaceutical companies, 0.90% (27/3000) to private bloggers, and 0.60% (18/3000) to others. LDA identified 50 topics, which we grouped into 11 themes: "Research & Science", "Illness & Injury", "The State", "Healthcare structures", "Diet & Food", "Medical Specialities", "Economy", "Food production", "Health communication", "Family" and "Other". The most prevalent themes were "Research & Science" and "Illness & Injury", accounting for 21.04% and 17.92%, respectively, of all topics across all ccTLDs and provider types. Our readability analysis reveals that the majority of the collected web sites are structurally difficult or very difficult to read: 84.63% (2539/3000) scored a WSTF ≥ 12 and 89.70% (2691/3000) scored an FRE ≤ 49. Moreover, our vocabulary analysis shows that 44.00% (1320/3000) of the web sites use vocabulary that is well suited for a lay audience.
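
The readability cut-offs and the graph statistics above can be checked with a few lines of arithmetic. The sketch below assumes Amstad's German adaptation of the Flesch formula (FRE = 180 - ASL - 58.5 × ASW), since the abstract does not state which FRE variant was used, and the usual form of the 4th Vienna formula (WSTF = 0.2656 × SL + 0.2744 × MS - 1.693, where MS is the percentage of words with three or more syllables). Word, sentence, and syllable counts are taken as given, and the example texts are hypothetical.

```python
def flesch_reading_ease_de(words, sentences, syllables):
    """German Flesch Reading Ease (assumed Amstad variant); lower = harder."""
    asl = words / sentences   # average sentence length in words
    asw = syllables / words   # average syllables per word
    return 180.0 - asl - 58.5 * asw


def wstf4(words, sentences, polysyllabic_words):
    """4th Vienna formula; the score approximates a school grade level."""
    sl = words / sentences                   # mean sentence length in words
    ms = 100.0 * polysyllabic_words / words  # % of words with >= 3 syllables
    return 0.2656 * sl + 0.2744 * ms - 1.693


# Cut-offs quoted in the Results for "difficult or very difficult" text
def hard_by_wstf(score):
    return score >= 12


def hard_by_fre(score):
    return score <= 49


def pct(count, total):
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100.0 * count / total, 2)


# Hypothetical texts: (words, sentences, syllables, words with >= 3 syllables)
easy_fre = flesch_reading_ease_de(100, 5, 180)   # about 54.7
easy_wstf = wstf4(100, 5, 20)                    # about 9.107
hard_fre = flesch_reading_ease_de(200, 5, 360)   # about 34.7
hard_wstf = wstf4(200, 5, 60)                    # about 17.163

# Graph statistics from the Results: edges / nodes vs. 2 * edges / nodes
avg_degree = 429_530 / 231_733
undirected_avg_degree = 2 * 429_530 / 231_733
```

The average-degree check suggests the reported value of 1.854 is edges divided by nodes, i.e. the mean out-degree (equivalently in-degree) of a directed graph; counting each edge at both endpoints would give roughly 3.707 instead.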

CONCLUSIONS

We were able to identify major information hubs as well as topics and themes within the sGHW. Results indicate that the readability within the sGHW is low. As a consequence, patients may face barriers, even though the vocabulary used seems appropriate from a medical perspective. In future work, the authors intend to extend their analyses to identify trustworthy health information web sites.

de Carvalho VDH, Nepomuceno TCC, Poleto T, Costa APCS. The COVID-19 Infodemic on Twitter: A Space and Time Topic Analysis of the Brazilian Immunization Program and Public Trust. Trop Med Infect Dis 2022;7:425. [PMID: 36548680 PMCID: PMC9783210 DOI: 10.3390/tropicalmed7120425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 11/24/2022] [Accepted: 12/03/2022] [Indexed: 12/14/2022]

Abstract

The COVID-19 pandemic has brought to light the infodemic phenomenon and the problem of misinformation. Agencies managing COVID-19 immunization programs are also looking for ways to combat this problem and need analytical tools that can identify patterns of misinformation, show how these patterns have evolved in time and space, and demonstrate their effects on public trust. The aim of this article is to present the results of a study applying space-and-time topic analysis to public opinion on the Brazilian COVID-19 immunization program. The analytical process applies topic discovery to geolocated tweets on COVID-19 vaccination. The extracted topics were then manually annotated with the polarity labels pro, anti, and neutral, based on support for and trust in COVID-19 vaccination. A space-and-time analysis was carried out using the topic and polarity distributions, making it possible to identify the moments when the largest volumes of posts occurred and the cities that generated the most tweets. The analytical process describes a framework that can meet agencies' need for such tools, indicating how misinformation has evolved and where its dissemination is concentrated, while letting managers set the granularity of this information to whatever level they consider adequate. The main research outcomes are as follows. (1) We identified a specific date with a peak that stands out from all other dates, indicating an event that mobilized public opinion about COVID-19 vaccination. (2) We extracted 23 topics, which enabled manual polarity annotation of each topic and an understanding of which polarities were associated with tweets. (3) Based on the association between polarities, topics, and tweets, it was possible to identify the Brazilian cities that produced the majority of tweets for each polarity and the distribution of tweet counts relative to city populations.
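
The abstract does not say how tweet counts were related to city populations. One simple reading, sketched below as an assumption, is to count tweets per (city, polarity) pair, pick the leading city for each polarity, and express counts per 100,000 inhabitants. City names and populations are hypothetical placeholders, and each tweet is assumed to already carry the polarity label of its annotated topic.

```python
from collections import Counter


def count_by_city_polarity(tweets):
    """tweets: iterable of (city, polarity) pairs, polarity in {"pro", "anti", "neutral"}."""
    return Counter(tweets)


def top_city_per_polarity(counts):
    """City with the most tweets for each polarity (the first one seen wins ties)."""
    best = {}
    for (city, polarity), n in counts.items():
        if polarity not in best or n > counts[(best[polarity], polarity)]:
            best[polarity] = city
    return best


def per_100k_inhabitants(counts, populations):
    """Normalise tweet counts by city population (tweets per 100,000 inhabitants)."""
    return {(city, pol): 100_000 * n / populations[city]
            for (city, pol), n in counts.items()}


# Hypothetical toy input: cities and populations are made up for illustration
tweets = ([("city_a", "pro")] * 3 + [("city_a", "anti")]
          + [("city_b", "anti")] * 2 + [("city_b", "neutral")])
populations = {"city_a": 1_500_000, "city_b": 400_000}

counts = count_by_city_polarity(tweets)
leaders = top_city_per_polarity(counts)            # city_a leads "pro", city_b leads "anti"
rates = per_100k_inhabitants(counts, populations)  # ("city_b", "anti") -> 0.5
```

Normalising by population keeps large cities from dominating simply because they have more users; which denominator the authors actually used is not stated in the abstract.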

Liu Q, Liang Y, Wang S, Huang Z, Wang Q, Jia M, Li Z, Ming WK. Health Communication through Chinese Media on E-Cigarette: A Topic Modeling Approach. Int J Environ Res Public Health 2022;19:7591. [PMID: 35805245 PMCID: PMC9265508 DOI: 10.3390/ijerph19137591] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/22/2022] [Revised: 06/13/2022] [Accepted: 06/15/2022] [Indexed: 02/01/2023]

Fairie P, Zhang Z, D'Souza AG, Walsh T, Quan H, Santana MJ. Categorising patient concerns using natural language processing techniques. BMJ Health Care Inform 2021;28:e100274. [PMID: 34193519 PMCID: PMC8246286 DOI: 10.1136/bmjhci-2020-100274] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/30/2020] [Accepted: 05/20/2021] [Indexed: 11/23/2022]

Abstract

OBJECTIVES

Patient feedback is critical to identify and resolve patient safety and experience issues in healthcare systems. However, large volumes of unstructured text data can pose problems for manual (human) analysis. This study reports the results of using a semiautomated, computational topic-modelling approach to analyse a corpus of patient feedback.

METHODS

Patient concerns were received by Alberta Health Services (AHS) between 2011 and 2018 (n=76 163), regarding 806 care facilities in 163 municipalities, including hospitals, clinics, community care centres and retirement homes, in a province of 4.4 million people. The existing AHS framework requires manual labelling with predefined categories. We applied an automated latent Dirichlet allocation (LDA)-based topic modelling algorithm to identify the topics present in these concerns, thereby producing a framework-free categorisation.
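
The abstract names LDA but not the inference method or hyperparameters. For readers unfamiliar with how LDA assigns topics, here is a minimal collapsed Gibbs sampler in Python; it is a textbook sketch, not the authors' pipeline, and the priors (alpha = 0.1, beta = 0.01), iteration count, and toy corpus are assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in range(V).
    Returns (ndk, nkw): document-topic and topic-word count matrices.
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    ndk = [[0] * n_topics for _ in docs]               # tokens of topic k in doc d
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned topic k
    nk = [0] * n_topics                                # total tokens assigned to topic k
    z = []                                             # current topic of every token

    # Random initial topic assignment for every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the token out of the counts...
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...and resample its topic from p(z = t | all other assignments)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw


def doc_topic_proportions(ndk, alpha=0.1):
    """Point estimate of each document's topic mixture (theta) from the counts."""
    n_topics = len(ndk[0])
    return [[(c + alpha) / (sum(row) + n_topics * alpha) for c in row] for row in ndk]


# Hypothetical toy corpus: word ids 0-2 and 3-5 form two separate "themes"
docs = [[0, 1, 2, 0, 1, 2, 0, 1]] * 5 + [[3, 4, 5, 3, 4, 5, 3, 4]] * 5
ndk, nkw = lda_gibbs(docs, n_topics=2)
theta = doc_topic_proportions(ndk)
```

Each token's topic is resampled in proportion to how common that topic already is in its document times how common the word already is in that topic, which is what gradually groups co-occurring words into topics. Real analyses usually rely on an established library implementation rather than a hand-written sampler.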

RESULTS

The LDA model produced 40 topics which, following manual interpretation by researchers, were reduced to 28 coherent topics. The most frequent topics identified were communication issues causing delays (frequency: 10.58%), community care for elderly patients (8.82%), interactions with nurses (8.80%) and emergency department care (7.52%). Many patient concerns were categorised into multiple topics. Some were more specific versions of categories from the existing framework (eg, communication issues causing delays), while others were novel (eg, smoking in inappropriate settings).
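
The abstract reports topic frequencies and notes that many concerns fell into several topics, but it does not describe the assignment rule. A minimal sketch, assuming a concern is counted under every topic whose LDA proportion reaches a hypothetical threshold of 0.25 and that a topic's frequency is the share of concerns counted under it:

```python
def topic_frequencies(theta, threshold=0.25):
    """Percentage of documents counted under each topic.

    theta: per-document topic proportions (each row sums to 1).
    A document counts toward every topic whose proportion is >= threshold,
    so one concern can fall into several topics.
    """
    n_docs = len(theta)
    hits = [0] * len(theta[0])
    for row in theta:
        for k, p in enumerate(row):
            if p >= threshold:
                hits[k] += 1
    return [100.0 * h / n_docs for h in hits]


# Hypothetical proportions for four concerns over three topics
theta = [
    [0.7, 0.2, 0.1],
    [0.4, 0.4, 0.2],
    [0.1, 0.1, 0.8],
    [0.3, 0.3, 0.4],
]
freqs = topic_frequencies(theta)  # [75.0, 50.0, 50.0]
```

Because assignment is multi-label under this rule, the percentages are not required to sum to 100.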

DISCUSSION

LDA-generated topics were more nuanced than the manually labelled categories. For example, LDA found that concerns with community care were related to concerns about nursing for seniors, providing opportunities for insight and action.

CONCLUSION

Our findings outline the range of concerns patients share in a large health system and demonstrate the usefulness of LDA for identifying categories of patient concerns.

Feldhege J, Moessner M, Bauer S. Who says what? Content and participation characteristics in an online depression community. J Affect Disord 2020;263:521-527. [PMID: 31780138 DOI: 10.1016/j.jad.2019.11.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/02/2019] [Revised: 10/14/2019] [Accepted: 11/02/2019] [Indexed: 11/26/2022]

Tran BX, Latkin CA, Sharafeldin N, Nguyen K, Vu GT, Tam WWS, Cheung NM, Nguyen HLT, Ho CSH, Ho RCM. Characterizing Artificial Intelligence Applications in Cancer Research: A Latent Dirichlet Allocation Analysis. JMIR Med Inform 2019;7:e14401. [PMID: 31573929 PMCID: PMC6774235 DOI: 10.2196/14401] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/16/2019] [Revised: 06/27/2019] [Accepted: 07/19/2019] [Indexed: 01/08/2023]