Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Mohammadhassanzadeh H, Sketris I, Traynor R, Alexander S, Winquist B, Stewart SA. Using Natural Language Processing to Examine the Uptake, Content, and Readability of Media Coverage of a Pan-Canadian Drug Safety Research Project: Cross-Sectional Observational Study. JMIR Form Res 2020;4:e13296. [PMID: 31934872 PMCID: PMC6996767 DOI: 10.2196/13296] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/03/2019] [Revised: 07/11/2019] [Accepted: 09/26/2019] [Indexed: 11/18/2022] Open

Cited by Other Article(s)

Shakeri Hossein Abad Z, Butler GP, Thompson W, Lee J. Physical activity, sedentary behaviour, and sleep on Twitter: A multicountry and fully labelled dataset for public health surveillance research (Preprint). JMIR Public Health Surveill 2021;8:e32355. [PMID: 35156938 PMCID: PMC8887637 DOI: 10.2196/32355] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/23/2021] [Revised: 10/02/2021] [Accepted: 10/13/2021] [Indexed: 01/30/2023] Open

Abstract

Background

Advances in automated data processing and machine learning (ML) models, together with the unprecedented growth in the number of social media users who publicly share and discuss health-related information, have made public health surveillance (PHS) one of the long-lasting social media applications. However, existing PHS systems that feed on social media data have not been widely deployed in national surveillance systems, which appears to stem from practitioners' and the public's lack of trust in social media data. Building more robust data sets on which supervised ML models can be trained and tested reliably is a significant step toward overcoming this hurdle. The health implications of daily behaviors (physical activity, sedentary behavior, and sleep [PASS]), an evergreen topic in PHS, are widely studied through traditional data sources such as surveillance surveys and administrative databases. These sources are often several months out of date by the time they are used and are costly to collect, and are thus limited in quantity and coverage.

Objective

The main objective of this study is to present a large-scale, multicountry, longitudinal, and fully labeled data set to enable and support digital PASS surveillance research in PHS. To support high-quality surveillance research using our data set, we have conducted further analysis on the data set to supplement it with additional PHS-related metadata.

Methods

We collected the data for this study from Twitter using the Twitter livestream application programming interface between November 28, 2018, and June 19, 2020. To obtain PASS-related tweets for manual annotation, we iteratively used regular expressions, unsupervised natural language processing, domain-specific ontologies, and linguistic analysis. We used Amazon Mechanical Turk to label the collected data with self-reported PASS categories and implemented a quality control pipeline to monitor and manage the validity of crowd-generated labels. Moreover, we used ML, latent semantic analysis, linguistic analysis, and label inference analysis to validate the different components of the data set.
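As a rough illustration of the regular-expression step described above, the following minimal sketch flags tweets that might mention a PASS behavior. The patterns, the `PASS_PATTERNS` name, and the `candidate_categories` helper are hypothetical and chosen only for illustration; the abstract does not list the expressions the authors used.

```python
import re

# Hypothetical keyword patterns, one per PASS category.
# These are illustrative assumptions, not the study's actual expressions.
PASS_PATTERNS = {
    "physical_activity": re.compile(
        r"\b(jogging|gym|workout|went for a run)\b", re.IGNORECASE
    ),
    "sedentary_behavior": re.compile(
        r"\b(binge[- ]watching|sitting all day|on the couch)\b", re.IGNORECASE
    ),
    "sleep": re.compile(
        r"\b(insomnia|can't sleep|slept|nap)\b", re.IGNORECASE
    ),
}


def candidate_categories(text):
    """Return the PASS categories whose pattern matches the tweet text."""
    return [name for name, pattern in PASS_PATTERNS.items() if pattern.search(text)]
```

A filter like this would only produce candidates for manual annotation; it cannot tell whether a tweet is a genuine self-report, which is why the study pairs it with crowd labeling.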

Results

LPHEADA (Labelled Digital Public Health Dataset) contains 366,405 crowd-generated labels (3 labels per tweet) for 122,135 PASS-related tweets that originated in Australia, Canada, the United Kingdom, or the United States, labeled by 708 unique annotators on Amazon Mechanical Turk. In addition to crowd-generated labels, LPHEADA provides details about the three critical components of any PHS system: place, time, and demographics (ie, gender and age range) associated with each tweet.
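The reported counts are internally consistent: 3 labels for each of 122,135 tweets gives 366,405 labels. The sketch below checks that arithmetic and shows one common way to collapse three crowd labels into a single label, simple majority voting. Majority voting is an assumption made here for illustration; the abstract mentions "label inference analysis" without naming a specific method, and `majority_label` is a hypothetical helper.

```python
from collections import Counter

# Counts reported in the abstract.
TWEETS = 122_135
LABELS_PER_TWEET = 3
total_labels = TWEETS * LABELS_PER_TWEET  # expected to equal 366,405


def majority_label(labels):
    """Return the label chosen by more than half of the annotators, else None.

    Majority voting is a common baseline for aggregating crowd labels;
    it is not necessarily the inference method used for LPHEADA.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None
```

With three annotators per tweet, this returns a label whenever at least two agree and `None` when all three disagree, which flags the tweet for further review.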

Conclusions

Publicly available data sets for digital PASS surveillance are usually isolated and only provide labels for small subsets of the data. We believe that the novelty and comprehensiveness of the data set provided in this study will help develop, evaluate, and deploy digital PASS surveillance systems. LPHEADA will be an invaluable resource for both public health researchers and practitioners.

Accuracy of Algorithms to Identify People with Atopic Dermatitis in Ontario Routinely Collected Health Databases. J Invest Dermatol 2021;141:1840-1843. [PMID: 33571528 DOI: 10.1016/j.jid.2021.01.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/10/2020] [Revised: 12/31/2020] [Accepted: 01/04/2021] [Indexed: 11/20/2022]

Baclic O, Tunis M, Young K, Doan C, Swerdfeger H, Schonfeld J. Challenges and opportunities for public health made possible by advances in natural language processing. Canada Communicable Disease Report = Relevé des maladies transmissibles au Canada 2020;46:161-168. [PMID: 32673380 PMCID: PMC7343054 DOI: 10.14745/ccdr.v46i06a02] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Indexed: 12/13/2022]