1
Carpenter KA, Nguyen AT, Smith DA, Samori IA, Humphreys K, Lembke A, Kiang MV, Eichstaedt JC, Altman RB. Which social media platforms facilitate monitoring the opioid crisis? MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.07.06.24310035. [PMID: 39006412 PMCID: PMC11245080 DOI: 10.1101/2024.07.06.24310035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Social media can provide real-time insight into trends in substance use, addiction, and recovery. Prior studies have used platforms such as Reddit and X (formerly Twitter), but evolving policies around data access have threatened these platforms' usability in research. We evaluate the potential of a broad set of platforms to detect emerging trends in the opioid epidemic. From these, we created a shortlist of 11 platforms, for which we documented official policies regulating drug-related discussion, data accessibility, geolocatability, and prior use in opioid-related studies. We quantified their volumes of opioid discussion, capturing informal language by including slang generated using a large language model. Beyond the most commonly used Reddit and X, the platforms with high potential for use in opioid-related surveillance are TikTok, YouTube, and Facebook. Leveraging many different social platforms, instead of a single platform, safeguards against sudden changes to data access and may better capture all populations that use opioids than any single platform.
2
Lyons RA, Gabbe BJ, Vallmuur K. Potential for advances in data linkage and data science to support injury prevention research. Inj Prev 2024:ip-2024-045367. [PMID: 39362751 DOI: 10.1136/ip-2024-045367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Accepted: 09/14/2024] [Indexed: 10/05/2024]
Abstract
The recent COVID-19 pandemic stimulated unprecedented linkage of datasets worldwide. While injury is endemic rather than pandemic, the injury prevention community has much to learn from the data science approaches taken in response to the pandemic, which can support research into the primary, secondary and tertiary prevention of injuries. The use of routinely collected data to produce real-world evidence, as an alternative to clinical trials, has been gaining in popularity as the availability and quality of digital health platforms grow, and the linkage landscape and the analytics required to make best use of linked and unstructured data are rapidly evolving. Capitalising on existing data sources, innovative linkage and advanced analytic approaches provides the opportunity to undertake novel injury prevention research and generate new knowledge, while avoiding data waste and additional burden to participants. We provide a tangible, but not exhaustive, list of examples showing the breadth and value of data linkage, along with the emerging capabilities of natural language processing techniques to enhance injury research. To optimise data science approaches to injury prevention, injury researchers in this area need to share methods, code, models and tools to improve consistency and efficiency in this field. Increased collaboration between injury prevention researchers and data scientists working on population data linkage systems has much to offer this field of research.
Affiliation(s)
- Ronan A Lyons
  - Population Data Science, Swansea University, Swansea, Swansea, UK
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
  - Administrative Data Research Wales, Swansea University Medical School, Swansea University, Swansea, UK
- Belinda J Gabbe
  - Population Data Science, Swansea University, Swansea, Swansea, UK
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- Kirsten Vallmuur
  - Australian Centre for Health Services Innovation (AusHSI), Queensland University of Technology (QUT), Brisbane, Queensland, Australia
  - Jamieson Trauma Institute, Royal Brisbane & Women's Hospital (RBWH), Brisbane, Queensland, Australia
3
Almeida A, Patton T, Conway M, Gupta A, Strathdee SA, Bórquez A. The Use of Natural Language Processing Methods in Reddit to Investigate Opioid Use: Scoping Review. JMIR INFODEMIOLOGY 2024; 4:e51156. [PMID: 39269743 PMCID: PMC11437337 DOI: 10.2196/51156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 06/01/2024] [Accepted: 06/18/2024] [Indexed: 09/15/2024]
Abstract
BACKGROUND The growing availability of big data spontaneously generated by social media platforms allows us to leverage natural language processing (NLP) methods as valuable tools to understand the opioid crisis. OBJECTIVE We aimed to understand how NLP has been applied to Reddit (Reddit Inc) data to study opioid use. METHODS We systematically searched for peer-reviewed studies and conference abstracts in PubMed, Scopus, PsycINFO, ACL Anthology, IEEE Xplore, and Association for Computing Machinery data repositories up to July 19, 2022. Inclusion criteria were studies investigating opioid use, using NLP techniques to analyze the textual corpora, and using Reddit as the social media data source. We were specifically interested in mapping studies' overarching goals and findings, methodologies and software used, and main limitations. RESULTS In total, 30 studies were included, which were classified into 4 nonmutually exclusive overarching goal categories: methodological (n=6, 20% studies), infodemiology (n=22, 73% studies), infoveillance (n=7, 23% studies), and pharmacovigilance (n=3, 10% studies). NLP methods were used to identify content relevant to opioid use among vast quantities of textual data, to establish potential relationships between opioid use patterns or profiles and contextual factors or comorbidities, and to anticipate individuals' transitions between different opioid-related subreddits, likely revealing progression through opioid use stages. Most studies used an embedding technique (12/30, 40%), prediction or classification approach (12/30, 40%), topic modeling (9/30, 30%), and sentiment analysis (6/30, 20%). The most frequently used programming languages were Python (20/30, 67%) and R (2/30, 7%). Among the studies that reported limitations (20/30, 67%), the most cited was the uncertainty regarding whether redditors participating in these forums were representative of people who use opioids (8/20, 40%). 
The papers were very recent (28/30, 93%), from 2019 to 2022, with authors from a range of disciplines. CONCLUSIONS This scoping review identified a wide variety of NLP techniques and applications used to support surveillance and social media interventions addressing the opioid crisis. Despite the clear potential of these methods to enable the identification of opioid-relevant content in Reddit and its analysis, there are limits to the degree of interpretive meaning that they can provide. Moreover, we identified the need for standardized ethical guidelines to govern the use of Reddit data to safeguard the anonymity and privacy of people using these forums.
Affiliation(s)
- Alexandra Almeida
  - Scientific Computing Program, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
  - San Diego State University, School of Social Work, San Diego, CA, United States
  - Department of Medicine, University of California San Diego, San Diego, CA, United States
- Thomas Patton
  - Department of Medicine, University of California San Diego, San Diego, CA, United States
- Mike Conway
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Amarnath Gupta
  - San Diego Supercomputer Center, University of California San Diego, San Diego, CA, United States
- Steffanie A Strathdee
  - Department of Medicine, University of California San Diego, San Diego, CA, United States
- Annick Bórquez
  - Department of Medicine, University of California San Diego, San Diego, CA, United States
4
Jung S, Murthy D, Bateineh BS, Loukas A, Wilkinson AV. The Normalization of Vaping on TikTok Using Computer Vision, Natural Language Processing, and Qualitative Thematic Analysis: Mixed Methods Study. J Med Internet Res 2024; 26:e55591. [PMID: 39259963 PMCID: PMC11425021 DOI: 10.2196/55591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 04/07/2024] [Accepted: 05/20/2024] [Indexed: 09/13/2024]
Abstract
BACKGROUND Social media posts that portray vaping in positive social contexts shape people's perceptions and serve to normalize vaping. Despite restrictions on depicting or promoting controlled substances, vape-related content is easily accessible on TikTok. There is a need to understand strategies used in promoting vaping on TikTok, especially among susceptible youth audiences. OBJECTIVE This study seeks to comprehensively describe direct (ie, explicit promotional efforts) and indirect (ie, subtler strategies) themes promoting vaping on TikTok using a mixture of computational and qualitative thematic analyses of social media posts. In addition, we aim to describe how these themes might play a role in normalizing vaping behavior on TikTok for youth audiences, thereby informing public health communication and regulatory policies regarding vaping endorsements on TikTok. METHODS We collected 14,002 unique TikTok posts using 50 vape-related hashtags (eg, #vapetok and #boxmod). Using the k-means unsupervised machine learning algorithm, we identified clusters and then categorized posts qualitatively based on themes. Next, we organized all videos from the posts thematically and extracted the visual features of each theme using 3 machine learning-based model architectures: residual network (ResNet) with 50 layers (ResNet50), Visual Geometry Group model with 16 layers, and vision transformer. We chose the best-performing model, ResNet50, to thoroughly analyze the image clustering output. To assess clustering accuracy, we examined 4.01% (441/10,990) of the samples from each video cluster. Finally, we randomly selected 50 videos (5% of the total videos) from each theme, which were qualitatively coded and compared with the machine-derived classification for validation. RESULTS We successfully identified 5 major themes from the TikTok posts. 
Vape product marketing (1160/14,002, 8.28%) reflected direct marketing, while the other 4 themes reflected indirect marketing: TikTok influencer (3775/14,002, 26.96%), general vape (2741/14,002, 19.58%), vape brands (2042/14,002, 14.58%), and vaping cessation (1272/14,002, 9.08%). The ResNet50 model successfully classified clusters based on image features, achieving an average F1-score of 0.97, the highest among the 3 models. Qualitative content analyses indicated that vaping was depicted as a normal, routine part of daily life, with TikTok influencers subtly incorporating vaping into popular culture (eg, gaming, skateboarding, and tattooing) and social practices (eg, shopping sprees, driving, and grocery shopping). CONCLUSIONS The results from both computational and qualitative analyses of text and visual data reveal that vaping is normalized on TikTok. Our identified themes underscore how everyday conversations, promotional content, and the influence of popular figures collectively contribute to depicting vaping as a normal and accepted aspect of daily life on TikTok. Our study provides valuable insights for regulatory policies and public health initiatives aimed at tackling the normalization of vaping on social media platforms.
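The theme percentages reported in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are copied from the abstract, the variable and function names are hypothetical, and it assumes every share is taken over the 14,002 collected posts rather than over the posts assigned to a theme.

```python
# Theme counts as reported in the abstract (Jung et al. 2024).
# Assumption: each percentage is a share of all 14,002 collected posts.
TOTAL_POSTS = 14_002

theme_counts = {
    "vape product marketing": 1160,
    "TikTok influencer": 3775,
    "general vape": 2741,
    "vape brands": 2042,
    "vaping cessation": 1272,
}


def theme_shares(counts, total):
    """Return each theme's share of `total` as a percentage rounded to 2 places."""
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


shares = theme_shares(theme_counts, TOTAL_POSTS)
themed_total = sum(theme_counts.values())

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
print(f"posts assigned to a theme: {themed_total}")
```

Under this assumption the five themes together cover 10,990 posts, which matches the denominator the abstract uses for its clustering accuracy check (441/10,990).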
Affiliation(s)
- Sungwon Jung
  - School of Journalism and Media, University of Texas at Austin, Austin, TX, United States
- Dhiraj Murthy
  - School of Journalism and Media, University of Texas at Austin, Austin, TX, United States
- Bara S Bateineh
  - University of Texas Health Science Center at Houston School of Public Health, Houston, TX, United States
- Alexandra Loukas
  - Department of Kinesiology and Health Education, University of Texas at Austin, Austin, TX, United States
- Anna V Wilkinson
  - University of Texas Health Science Center at Houston School of Public Health, Houston, TX, United States
5
Canfell OJ, Woods L, Meshkat Y, Krivit J, Gunashanhar B, Slade C, Burton-Jones A, Sullivan C. The Impact of Digital Hospitals on Patient and Clinician Experience: Systematic Review and Qualitative Evidence Synthesis. J Med Internet Res 2024; 26:e47715. [PMID: 38466978 PMCID: PMC10964148 DOI: 10.2196/47715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 11/08/2023] [Accepted: 01/31/2024] [Indexed: 03/13/2024]
Abstract
BACKGROUND The digital transformation of health care is advancing rapidly. A well-accepted framework for health care improvement is the Quadruple Aim: improved clinician experience, improved patient experience, improved population health, and reduced health care costs. Hospitals are attempting to improve care by using digital technologies, but the effectiveness of these technologies is often only measured against cost and quality indicators, and less is known about the clinician and patient experience. OBJECTIVE This study aims to conduct a systematic review and qualitative evidence synthesis to assess the clinician and patient experience of digital hospitals. METHODS The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and ENTREQ (Enhancing the Transparency in Reporting the Synthesis of Qualitative Research) guidelines were followed. The PubMed, Embase, Scopus, CINAHL, and PsycINFO databases were searched from January 2010 to June 2022. Studies that explored multidisciplinary clinician or adult inpatient experiences of digital hospitals (with a full electronic medical record) were included. Study quality was assessed using the Mixed Methods Appraisal Tool. Data synthesis was performed narratively for quantitative studies. Qualitative evidence synthesis was performed via (1) automated machine learning text analytics using Leximancer (Leximancer Pty Ltd) and (2) researcher-led inductive synthesis to generate themes. RESULTS A total of 61 studies (n=39, 64% quantitative; n=15, 25% qualitative; and n=7, 11% mixed methods) were included. Most studies (55/61, 90%) investigated clinician experiences, whereas few (10/61, 16%) investigated patient experiences. The study populations ranged from 8 to 3610 clinicians, 11 to 34,425 patients, and 5 to 2836 hospitals. 
Quantitative outcomes indicated that clinicians had a positive overall satisfaction (17/24, 71% of the studies) with digital hospitals, and most studies (11/19, 58%) reported a positive sentiment toward usability. Data accessibility was reported positively, whereas adaptation, clinician-patient interaction, and workload burnout were reported negatively. The effects of digital hospitals on patient safety and clinicians' ability to deliver patient care were mixed. The qualitative evidence synthesis of clinician experience studies (18/61, 30%) generated 7 themes: inefficient digital documentation, inconsistent data quality, disruptions to conventional health care relationships, acceptance, safety versus risk, reliance on hybrid (digital and paper) workflows, and patient data privacy. There was weak evidence of a positive association between digital hospitals and patient satisfaction scores. CONCLUSIONS Clinicians' experience of digital hospitals appears positive according to high-level indicators (eg, overall satisfaction and data accessibility), but the qualitative evidence synthesis revealed substantive tensions. There is insufficient evidence to draw a definitive conclusion on the patient experience within digital hospitals, but indications appear positive or agnostic. Future research must prioritize equitable investigation and definition of the digital clinician and patient experience to achieve the Quadruple Aim of health care.
Affiliation(s)
- Oliver J Canfell
  - Centre for Health Services Research, Faculty of Medicine, The University of Queensland, Brisbane, Australia
  - Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Brisbane, Australia
  - Digital Health Cooperative Research Centre, Australian Government, Sydney, Australia
  - UQ Business School, Faculty of Business, Economics and Law, The University of Queensland, Brisbane, Australia
- Leanna Woods
  - Centre for Health Services Research, Faculty of Medicine, The University of Queensland, Brisbane, Australia
  - Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Yasaman Meshkat
  - School of Clinical Medicine, Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Jenna Krivit
  - School of Clinical Medicine, Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Brinda Gunashanhar
  - School of Clinical Medicine, Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Christine Slade
  - Institute for Teaching and Learning Innovation, The University of Queensland, Brisbane, Australia
- Andrew Burton-Jones
  - UQ Business School, Faculty of Business, Economics and Law, The University of Queensland, Brisbane, Australia
- Clair Sullivan
  - Centre for Health Services Research, Faculty of Medicine, The University of Queensland, Brisbane, Australia
  - Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Brisbane, Australia
  - Metro North Hospital and Health Service, Department of Health, Queensland Government, Brisbane, Australia
6
Walker AL, LoParco C, Rossheim ME, Livingston MD. #Delta8: a retailer-driven increase in Delta-8 THC discussions on Twitter from 2020 to 2021. THE AMERICAN JOURNAL OF DRUG AND ALCOHOL ABUSE 2023; 49:491-499. [PMID: 37433117 PMCID: PMC11022156 DOI: 10.1080/00952990.2023.2222433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 06/02/2023] [Accepted: 06/04/2023] [Indexed: 07/13/2023]
Abstract
Background: Delta-8 tetrahydrocannabinol (THC) has experienced significant cultivation, use, and online marketing growth in recent years. Objectives: This study utilized natural language processing on Twitter data to examine trends in public discussions regarding this novel psychoactive substance. Methods: This study analyzed the frequency of #Delta8 tweets over time, most commonly used words, sentiment classification of words in tweets, and a qualitative analysis of a random sample of tweets containing the hashtag "Delta8" from January 1, 2020 to September 26, 2021. Results: A total of 41,828 tweets were collected, with 30,826 unique tweets (73.7%) and 11,002 quotes, retweets, or replies (26.3%). Tweet activity increased from 2020 to 2021, with daily original tweets rising from 8.55 to 149. This increase followed a high-engagement retailer promotion in June 2021. Commonly used terms included "cbd," "cannabis," "edibles," and "cbdoil." Sentiment classification revealed a predominance of "positive" (30.93%) and "trust" (14.26%) categorizations, with 8.42% classified as "negative." Qualitative analysis identified 20 codes, encompassing substance type, retailers, links, and other characteristics. Conclusion: Twitter discussions on Delta-8 THC exhibited a sustained increase in prevalence from 2020 to 2021, with online retailers playing a dominant role. The content also demonstrated significant overlap with cannabidiol and various cannabis products. Given the growing presence of retailer marketing and sales on social media, it is crucial for public health researchers to monitor and promote relevant Delta-8 health recommendations on these platforms to ensure a balanced conversation.
Affiliation(s)
- Andrew L. Walker
  - Department of Behavioral, Social, and Health Education Sciences, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Cassidy LoParco
  - Department of Health Behavior and Health Systems, School of Public Health, University of North Texas Health Science Center, Fort Worth, TX, USA
- Matthew E. Rossheim
  - Department of Health Behavior and Health Systems, School of Public Health, University of North Texas Health Science Center, Fort Worth, TX, USA
- Melvin D. Livingston
  - Department of Behavioral, Social, and Health Education Sciences, Rollins School of Public Health, Emory University, Atlanta, GA, USA
7
Sarker A, Lakamana S, Guo Y, Ge Y, Leslie A, Okunromade O, Gonzalez-Polledo E, Perrone J, McKenzie-Brown AM. #ChronicPain: Automated Building of a Chronic Pain Cohort from Twitter Using Machine Learning. HEALTH DATA SCIENCE 2023; 3:0078. [PMID: 38333075 PMCID: PMC10852024 DOI: 10.34133/hds.0078] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/03/2023] [Accepted: 06/12/2023] [Indexed: 02/10/2024]
Abstract
Background Due to the high burden of chronic pain, and the detrimental public health consequences of its treatment with opioids, there is a high-priority need to identify effective alternative therapies. Social media is a potentially valuable resource for knowledge about self-reported therapies by chronic pain sufferers. Methods We attempted to (a) verify the presence of large-scale chronic pain-related chatter on Twitter, (b) develop natural language processing and machine learning methods for automatically detecting self-disclosures, (c) collect longitudinal data posted by the users making these self-disclosures, and (d) semiautomatically analyze the types of chronic pain-related information they reported. We collected data using chronic pain-related hashtags and keywords and manually annotated 4,998 posts to indicate if they were self-reports of chronic pain experiences. We trained and evaluated several state-of-the-art supervised text classification models and deployed the best-performing classifier. We collected all publicly available posts from detected cohort members and conducted manual and natural language processing-driven descriptive analyses. Results Interannotator agreement for the binary annotation was 0.82 (Cohen's kappa). The RoBERTa model performed best (F1 score: 0.84; 95% confidence interval: 0.80 to 0.89), and we used this model to classify all collected unlabeled posts. We discovered 22,795 self-reported chronic pain sufferers and collected over 3 million of their past posts. Further analyses revealed information about, but not limited to, alternative treatments, patient sentiments about treatments, side effects, and self-management strategies. Conclusion Our social media based approach will result in an automatically growing large cohort over time, and the data can be leveraged to identify effective opioid-alternative therapies for diverse chronic pain types.
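For readers unfamiliar with the agreement statistic quoted in this abstract, the following is a minimal sketch of Cohen's kappa for two annotators. The labels are made-up toy data, not the study's annotations, and the function name is hypothetical; the formula itself is the standard one, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy binary annotations (1 = self-report of chronic pain, 0 = not); illustrative only.
annotator_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"Cohen's kappa on toy data: {kappa:.2f}")
```

In the toy data the annotators agree on 8 of 10 items while chance agreement is 0.5, so kappa is lower than raw agreement; this correction for chance is why kappa is reported instead of a plain percentage.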
Affiliation(s)
- Abeed Sarker
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA
- Sahithi Lakamana
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA
- Yuting Guo
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA
- Yao Ge
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA
- Abimbola Leslie
  - Department of Radiology, Robert Larner College of Medicine, University of Vermont, Burlington, VT, USA
- Omolola Okunromade
  - Department of Health Policy and Community Health, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA, USA
- Jeanmarie Perrone
  - Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
8
Surveillance of communicable diseases using social media: A systematic review. PLoS One 2023; 18:e0282101. [PMID: 36827297 PMCID: PMC9956027 DOI: 10.1371/journal.pone.0282101] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/04/2022] [Accepted: 02/07/2023] [Indexed: 02/25/2023]
Abstract
BACKGROUND Communicable diseases pose a severe threat to public health and economic growth. The traditional methods used for public health surveillance, however, have many drawbacks, such as being labor intensive to operate and introducing a lag between data collection and reporting. To effectively address the limitations of these traditional methods and to mitigate the adverse effects of these diseases, a proactive and real-time public health surveillance system is needed. Previous studies have indicated the usefulness of performing text mining on social media. OBJECTIVE To conduct a systematic review of the literature that used textual content published to social media for the purpose of the surveillance and prediction of communicable diseases. METHODOLOGY Broad search queries were formulated and performed in four databases. Both journal articles and conference materials were included. The quality of the studies, operationalized as reliability and validity, was assessed. This qualitative systematic review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULTS Twenty-three publications were included in this systematic review. All studies reported positive results for using textual social media content to surveil communicable diseases. Most studies used Twitter as a source for these data. Influenza was studied most frequently, while other communicable diseases received far less attention. Journal articles had a higher quality (reliability and validity) than conference papers. However, studies often failed to provide important information about procedures and implementation. CONCLUSION Text mining of health-related content published on social media can serve as a novel and powerful tool for the automated, real-time, and remote monitoring of public health and for the surveillance and prediction of communicable diseases in particular. 
This tool can address limitations related to traditional surveillance methods, and it has the potential to supplement traditional methods for public health surveillance.
9
Fang C, Markuzon N, Patel N, Rueda JD. Natural Language Processing for Automated Classification of Qualitative Data From Interviews of Patients With Cancer. VALUE IN HEALTH : THE JOURNAL OF THE INTERNATIONAL SOCIETY FOR PHARMACOECONOMICS AND OUTCOMES RESEARCH 2022; 25:1995-2002. [PMID: 35840523 DOI: 10.1016/j.jval.2022.06.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 05/19/2022] [Accepted: 06/12/2022] [Indexed: 06/15/2023]
Abstract
OBJECTIVES This study sought to explore the use of novel natural language processing (NLP) methods for classifying unstructured, qualitative textual data from interviews of patients with cancer to identify patient-reported symptoms and impacts on quality of life. METHODS We tested the ability of 4 NLP models to accurately classify text from interview transcripts as "symptom," "quality of life impact," and "other." Interview data sets from patients with hepatocellular carcinoma (HCC) (n = 25), biliary tract cancer (BTC) (n = 23), and gastric cancer (n = 24) were used. Models were cross-validated with transcript subsets designated for training, validation, and testing. Multiclass classification performance of the 4 models was evaluated at paragraph and sentence level using the HCC testing data set and analyzed by the one-versus-rest technique quantified by the receiver operating characteristic area under the curve (ROC AUC) score. RESULTS NLP models accurately classified multiclass text from patient interviews. The Bidirectional Encoder Representations from Transformers model generally outperformed all other models at paragraph and sentence level. The highest predictive performance of the Bidirectional Encoder Representations from Transformers model was observed using the HCC data set to train and BTC data set to test (mean ROC AUC, 0.940 [SD 0.028]), with similarly high predictive performance using balanced and imbalanced training data sets from BTC and gastric cancer populations. CONCLUSIONS NLP models were accurate in predicting multiclass classification of text from interviews of patients with cancer, with most surpassing 0.9 ROC AUC at paragraph level. NLP may be a useful tool for scaling up processing of patient interviews in clinical studies and, thus, could serve to facilitate patient input into drug development and improving patient care.
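The one-versus-rest ROC AUC evaluation named in this abstract can be illustrated with a small, dependency-free sketch. Everything here is an assumption for illustration: the probabilities are invented toy values, the function names are hypothetical, and the AUC is computed from pairwise comparisons (the probability that a positive example outranks a negative one, with ties counted as half), which is equivalent to the area under the ROC curve but is not necessarily how the authors computed it.

```python
def binary_auc(scores, is_positive):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, true_labels, classes):
    """Per-class AUC, treating each class in turn as positive and the rest as negative."""
    return {
        c: binary_auc([row[i] for row in prob_rows], [y == c for y in true_labels])
        for i, c in enumerate(classes)
    }


# Class names follow the abstract; the predicted probabilities are toy values.
classes = ["symptom", "quality of life impact", "other"]
prob_rows = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.2, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
true_labels = [
    "symptom", "symptom",
    "quality of life impact", "quality of life impact",
    "other", "other",
]

aucs = one_vs_rest_auc(prob_rows, true_labels, classes)
mean_auc = sum(aucs.values()) / len(aucs)
for name, auc in aucs.items():
    print(f"{name}: AUC = {auc:.3f}")
print(f"macro-averaged one-vs-rest AUC = {mean_auc:.3f}")
```

Averaging the per-class values gives a single macro score, which is one common way to summarize multiclass performance; the abstract does not state which averaging the study used.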
Affiliation(s)
- Chao Fang
  - Oncology Biometrics ML/AI, AstraZeneca, Waltham, MA, USA
- Nikunj Patel
  - US Medical Affairs, AstraZeneca, Gaithersburg, MD, USA
- Juan-David Rueda
  - Oncology Market Access and Pricing, AstraZeneca, Gaithersburg, MD, USA
10
Discussions About COVID-19 Vaccination on Twitter in Turkey: Sentiment Analysis. Disaster Med Public Health Prep 2022; 17:e266. [PMID: 36226686 DOI: 10.1017/dmp.2022.229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
OBJECTIVES The present study aims to examine coronavirus disease 2019 (COVID-19) vaccination discussions on Twitter in Turkey and conduct sentiment analysis. METHODS The current study performed sentiment analysis of Twitter data with an artificial intelligence (AI) natural language processing (NLP) method. The tweets were retrieved retrospectively from March 10, 2020, when the first COVID-19 case was seen in Turkey, to April 18, 2022. A total of 10,308 tweets were accessed. The data were filtered before analysis due to excessive noise. First, the text was tokenized. Several steps were then applied to normalize the texts. Tweets about the COVID-19 vaccines were classified according to basic emotion categories using sentiment analysis. The resulting dataset was used for training and testing machine learning (ML) classifiers. RESULTS It was determined that 7.50% of the tweeters had positive, 0.59% negative, and 91.91% neutral opinions about the COVID-19 vaccination. When the accuracy values of the ML algorithms used in this study were examined, it was seen that the XGBoost (XGB) algorithm had higher scores. CONCLUSIONS Three of 4 tweets consist of negative and neutral emotions. The responsibility of professional chambers and the public is essential in transforming these neutral and negative feelings into positive ones.
11
Kang YB, McCosker A, Kamstra P, Farmer J. Resilience in Web-Based Mental Health Communities: Building a Resilience Dictionary With Semiautomatic Text Analysis. JMIR Form Res 2022; 6:e39013. [PMID: 36136394 PMCID: PMC9539645 DOI: 10.2196/39013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/25/2022] [Revised: 06/06/2022] [Accepted: 08/18/2022] [Indexed: 11/13/2022]
Abstract
Background Resilience is an accepted strengths-based concept that responds to change, adversity, and crises. This concept underpins both personal and community-based preventive approaches to mental health issues and shapes digital interventions. Online mental health peer-support forums have played a prominent role in enhancing resilience by providing accessible places for sharing lived experiences of mental issues and finding support. However, little research has been conducted on whether and how resilience is realized, hindering service providers’ ability to optimize resilience outcomes. Objective This study aimed to create a resilience dictionary that reflects the characteristics and realization of resilience within online mental health peer-support forums. The findings can be used to guide further analysis and improve resilience outcomes in mental health forums through targeted moderation and management. Methods A semiautomatic approach to creating a resilience dictionary was proposed using topic modeling and qualitative content analysis. We present a systematic 4-phase analysis pipeline that preprocesses raw forum posts, discovers core themes, conceptualizes resilience indicators, and generates a resilience dictionary. Our approach was applied to a mental health forum run by SANE (Schizophrenia: A National Emergency) Australia, analyzing 70,179 forum posts made by 2357 users between 2018 and 2020. Results The resilience dictionary and taxonomy developed in this study reveal how resilience indicators (ie, “social capital,” “belonging,” “learning,” “adaptive capacity,” and “self-efficacy”) are characterized by themes commonly discussed in the forums; each theme’s top 10 most relevant descriptive terms and their synonyms; and the relatedness of resilience, reflecting a taxonomy of indicators that are more comprehensive (or compound) and more likely to facilitate the realization of others. 
The study showed that the resilience indicators “learning,” “belonging,” and “social capital” were more commonly realized, and “belonging” and “learning” served as foundations for “social capital” and “adaptive capacity” across the 2-year study period. Conclusions This study presents a resilience dictionary that improves our understanding of how aspects of resilience are realized in web-based mental health forums. The dictionary provides novel guidance on how to improve training to support and enhance automated systems for moderating mental health forum discussions.
Affiliation(s)
- Yong-Bin Kang
  - Australian Research Council (ARC) Centre of Excellence for Automated Decision-Making and Society (ADM+S), Swinburne University of Technology, Victoria, Australia
- Anthony McCosker
  - Australian Research Council (ARC) Centre of Excellence for Automated Decision-Making and Society (ADM+S), Swinburne University of Technology, Victoria, Australia
  - Social Innovation Research Institute, Swinburne University of Technology, Victoria, Australia
- Peter Kamstra
  - Social Innovation Research Institute, Swinburne University of Technology, Victoria, Australia
- Jane Farmer
  - Social Innovation Research Institute, Swinburne University of Technology, Victoria, Australia
|
12
|
Guo Y, Ge Y, Yang YC, Al-Garadi MA, Sarker A. Comparison of Pretraining Models and Strategies for Health-Related Social Media Text Classification. Healthcare (Basel) 2022; 10:healthcare10081478. [PMID: 36011135 PMCID: PMC9408372 DOI: 10.3390/healthcare10081478] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/05/2022] [Revised: 07/29/2022] [Accepted: 08/02/2022] [Indexed: 11/24/2022] Open
Abstract
Pretrained contextual language models proposed in the recent past have been reported to achieve state-of-the-art performance in many natural language processing (NLP) tasks, including those involving health-related social media data. We sought to evaluate the effectiveness of different pretrained transformer-based models for social media-based health-related text classification tasks. An additional objective was to explore and propose effective pretraining strategies to improve machine learning performance on such datasets and tasks. We benchmarked six transformer-based models that were pretrained with texts from different domains and sources (BERT, RoBERTa, BERTweet, TwitterBERT, BioClinical_BERT, and BioBERT) on 22 social media-based health-related text classification tasks. For the top-performing models, we explored the possibility of further boosting performance by comparing several pretraining strategies: domain-adaptive pretraining (DAPT), source-adaptive pretraining (SAPT), and a novel approach called topic-specific pretraining (TSPT). We also attempted to interpret the impacts of distinct pretraining strategies by visualizing document-level embeddings at different stages of the training process. RoBERTa outperformed BERTweet on most tasks, and both performed better than the other models. BERT, TwitterBERT, BioClinical_BERT and BioBERT consistently underperformed. For pretraining strategies, SAPT performed better than or comparably to the off-the-shelf models, and significantly outperformed DAPT. SAPT + TSPT showed consistently high performance, with statistically significant improvement in three tasks. Our findings demonstrate that RoBERTa and BERTweet are excellent off-the-shelf models for health-related social media text classification, and extended pretraining using SAPT and TSPT can further improve performance.
Affiliation(s)
- Yuting Guo
  - Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, USA
- Yao Ge
  - Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, USA
- Yuan-Chi Yang
  - Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, USA
- Mohammed Ali Al-Garadi
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37240, USA
- Abeed Sarker
  - Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, USA
|
13
|
Lushington GH, Zgurzynski MI. Can the Written Word Fuel Pharmaceutical Innovation? Part 1. An Emerging Vista from von Economo to COVID-19. Comb Chem High Throughput Screen 2022; 25:1237-1238. [PMID: 35466871 DOI: 10.2174/1386207325666220422135755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Revised: 03/12/2022] [Accepted: 03/12/2022] [Indexed: 11/22/2022]
Affiliation(s)
- Gerald H Lushington
  - Qnapsyn Biosciences, Inc. 16 Dekalb Pike, Suite 248, Blue Bell, PA 19422, USA
- Mary I Zgurzynski
  - Boston College, Communication Dept., 140 Commonwealth Ave. Chestnut Hill, MA 02467, USA
|
14
|
Valdez D, Jozkowski KN, Haus K, Ten Thij M, Crawford BL, Montenegro MS, Lo WJ, Turner RC, Bollen J. Assessing rigid modes of thinking in self-declared abortion ideology: natural language processing insights from an online pilot qualitative study on abortion attitudes. Pilot Feasibility Stud 2022; 8:127. [PMID: 35710466 PMCID: PMC9200936 DOI: 10.1186/s40814-022-01078-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Accepted: 05/26/2022] [Indexed: 11/21/2022] Open
Abstract
Introduction Although much work has been done on US abortion ideology, less is known about the psychological processes that distinguish personal abortion beliefs or how those beliefs are communicated to others. As part of a forthcoming study on US abortion climate that uses a probability-based sampling design, we piloted a study with a controlled sample to determine whether psychological indicators guiding abortion beliefs can be meaningfully extracted from qualitative interviews using natural language processing (NLP) substring matching. Of particular interest to this study is the presence of cognitive distortions (markers of rigid thinking) spoken during interviews and how cognitive distortion frequency may be tied to rigid, or firm, abortion beliefs. Methods We ran qualitative interview transcripts against two lexicons. The first lexicon, the cognitive distortion schemata (CDS), was applied to identify cognitive distortion n-grams (a series of words) embedded within the qualitative interviews. The second lexicon, the Linguistic Inquiry Word Count (LIWC), was applied to extract other psychological indicators, including the degrees of (1) analytic thinking, (2) emotional reasoning, (3) authenticity, and (4) clout. Results People with polarized abortion views (i.e., strongly supportive of or opposed to abortion) had the highest observed usage of CDS n-grams, scored highest on authenticity, and lowest on analytic thinking. By contrast, people with moderate or uncertain abortion views (i.e., people holding more complex or nuanced views of abortion) used the fewest CDS n-grams and scored slightly higher on analytic thinking. Discussion and conclusion Our findings suggest people communicate about abortion differently depending on their personal abortion ideology. Those with strong abortion views may be more likely to communicate with authoritative words and patterns of words indicative of cognitive distortions, or limited complexity in belief systems. 
Those with moderate views are more likely to speak in conflicting terms and patterns of words that are flexible and open to change—or high complexity in belief systems. These findings suggest it is possible to extract psychological indicators with NLP from qualitative interviews about abortion. Findings from this study will help refine our protocol ahead of full-study launch.
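The lexicon-based substring matching described in this abstract can be illustrated with a minimal sketch. The phrases in `LEXICON` are hypothetical stand-ins (the actual CDS and LIWC lexicons are not reproduced here), and the function names are assumptions made for illustration, not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon of n-grams; NOT the real CDS lexicon.
LEXICON = ["always", "never", "everyone knows", "no one will"]

def count_lexicon_ngrams(text, lexicon):
    """Count whole-word, case-insensitive occurrences of each lexicon n-gram."""
    # Normalize to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z']+", text.lower()))
    counts = Counter()
    for phrase in lexicon:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, normalized))
        if hits:
            counts[phrase] = hits
    return counts

def rate_per_100_words(text, lexicon):
    """Lexicon hits per 100 words, so transcripts of different lengths compare."""
    words = re.findall(r"[a-z']+", text.lower())
    total = sum(count_lexicon_ngrams(text, lexicon).values())
    return 100.0 * total / len(words) if words else 0.0

transcript = "I will never change my mind. Everyone knows it is always wrong."
print(count_lexicon_ngrams(transcript, LEXICON))
print(rate_per_100_words(transcript, LEXICON))
```

Normalizing by transcript length is one plausible design choice for comparing speakers; the abstract does not state how n-gram usage was normalized.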
Affiliation(s)
- Danny Valdez
  - Indiana University School of Public Health, 1025 E 7th Street, Bloomington, IN, 47405, USA
- Kristen N Jozkowski
  - Indiana University School of Public Health, 1025 E 7th Street, Bloomington, IN, 47405, USA
- Katherine Haus
  - Indiana University School of Public Health, 1025 E 7th Street, Bloomington, IN, 47405, USA
- Marijn Ten Thij
  - Department of Data Science and Knowledge Engineering, Universiteit Maastricht, P.O. Box 616, 6200 MD, Maastricht, Netherlands
- Brandon L Crawford
  - Indiana University School of Public Health, 1025 E 7th Street, Bloomington, IN, 47405, USA
- María S Montenegro
  - Indiana University College of Arts and Sciences, 107 S Indiana Ave, Bloomington, IN, 47405, USA
- Wen-Juo Lo
  - University of Arkansas, 1 University of Arkansas, Fayetteville, AR, 72701, USA
- Ronna C Turner
  - University of Arkansas, 1 University of Arkansas, Fayetteville, AR, 72701, USA
- Johan Bollen
  - Luddy School of Informatics, Computing and Engineering, 919 E. 10th St., Bloomington, IN, 47408, USA
|
15
|
Bogdanowicz A, Guan C. Dynamic topic modeling of Twitter data during the COVID-19 pandemic. PLoS One 2022; 17:e0268669. [PMID: 35622866 PMCID: PMC9140268 DOI: 10.1371/journal.pone.0268669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2021] [Accepted: 05/04/2022] [Indexed: 11/24/2022] Open
Abstract
In an effort to gauge the global pandemic's impact on social thoughts and behavior, it is important to answer the following questions: (1) What kinds of topics are individuals and groups vocalizing in relation to the pandemic? (2) Are there any noticeable topic trends and, if so, how do these topics change over time and in response to major events? In this paper, using the Sequential Latent Dirichlet Allocation model, we identified twelve of the most popular topics present in a Twitter dataset collected in the United States from April 3rd to April 13th, 2020, and discussed their growth and changes over time. These topics were both robust, in that they covered specific domains, not simply events, and dynamic, in that they were able to change over time in response to rising trends in our dataset. They spanned politics, healthcare, community, and the economy, and experienced macro-level growth over time, while also exhibiting micro-level changes in topic composition. Our approach is distinguished by its scope and by studying emerging COVID-19 topics at a scale that few works have been able to achieve. We contribute to the intersection of urban studies and big data. While we are optimistic about the future, we also understand that this is an unprecedented time that will have lasting impacts on individuals and society at large, affecting not only the economy and geopolitics but also human behavior and psychology. Therefore, in more ways than one, this research is just beginning to scratch the surface of what will be a concerted research effort into studying the history and repercussions of COVID-19.
Affiliation(s)
- ChengHe Guan
  - New York University Shanghai, Shanghai, China
  - Shanghai Key Laboratory of Urban Design and Urban Science, NYU Shanghai, Shanghai, China
|
16
|
Abstract
In our increasingly digital world, aspects of our lives are encoded in the routine interactions we have with technology. Over the past few years, psychologists and technologists have been exploring what possibilities these digital life data might hold for improving mental health and well-being. Here I examine some of the recent advances in this field, particularly in the use of language data; consider the ethical and pragmatic implications of this technology; and examine a few areas where I believe these advances could significantly alter the way in which mental health and well-being are approached. This technology holds special promise for providing information about a patient’s life in between clinical encounters, in the clinical whitespace.
|
17
|
Lanyi K, Green R, Craig D, Marshall C. COVID-19 Vaccine Hesitancy: Analysing Twitter to Identify Barriers to Vaccination in a Low Uptake Region of the UK. Front Digit Health 2022; 3:804855. [PMID: 35141699 PMCID: PMC8818664 DOI: 10.3389/fdgth.2021.804855] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/29/2021] [Accepted: 12/30/2021] [Indexed: 11/13/2022] Open
Abstract
To facilitate effective targeted COVID-19 vaccination strategies, it is important to understand reasons for vaccine hesitancy where uptake is low. Artificial intelligence (AI) techniques offer an opportunity for real-time analysis of public attitudes, sentiments, and key discussion topics from sources of soft-intelligence, including social media data. In this work, we explore the value of soft-intelligence, leveraged using AI, as an evidence source to support public health research. As a case study, we deployed a natural language processing (NLP) platform to rapidly identify and analyse key barriers to vaccine uptake from a collection of geo-located tweets from London, UK. We developed a search strategy to capture COVID-19 vaccine related tweets, identifying 91,473 tweets between 30 November 2020 and 15 August 2021. The platform's algorithm clustered tweets according to their topic and sentiment, from which we extracted 913 tweets from the top 12 negative sentiment topic clusters. These tweets were extracted for further qualitative analysis. We identified safety concerns; mistrust of government and pharmaceutical companies; and accessibility issues as key barriers limiting vaccine uptake. Our analysis also revealed widespread sharing of vaccine misinformation amongst Twitter users. This study further demonstrates that there is promising utility for using off-the-shelf NLP tools to leverage insights from social media data to support public health research. Future work to examine where this type of work might be integrated as part of a mixed-methods research approach to support local and national decision making is suggested.
Affiliation(s)
- Katherine Lanyi
  - National Institute for Health Research (NIHR) Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle, United Kingdom
|
18
|
Boettcher N. Studies of Depression and Anxiety Using Reddit as a Data Source: Scoping Review. JMIR Ment Health 2021; 8:e29487. [PMID: 34842560 PMCID: PMC8663609 DOI: 10.2196/29487] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/13/2021] [Revised: 07/20/2021] [Accepted: 08/15/2021] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND The study of depression and anxiety using publicly available social media data is a research activity that has grown considerably over the past decade. The discussion platform Reddit has become a popular social media data source in this nascent area of study, in part because of the unique ways in which the platform is facilitative of research. To date, no work has been done to synthesize existing studies on depression and anxiety using Reddit. OBJECTIVE The objective of this review is to understand the scope and nature of research using Reddit as a primary data source for studying depression and anxiety. METHODS A scoping review was conducted using the Arksey and O'Malley framework. MEDLINE, Embase, CINAHL, PsycINFO, PsycARTICLES, Scopus, ScienceDirect, IEEE Xplore, and ACM academic databases were searched. Inclusion criteria were developed using the participants, concept, and context framework outlined by the Joanna Briggs Institute Scoping Review Methodology Group. Eligible studies featured an analytic focus on depression or anxiety and used naturalistic written expressions from Reddit users as a primary data source. RESULTS A total of 54 studies were included in the review. Tables and corresponding analyses delineate the key methodological features, including a comparatively larger focus on depression versus anxiety, an even split of original and premade data sets, a widespread analytic focus on classifying the mental health states of Reddit users, and practical implications that often recommend new methods of professionally delivered monitoring and outreach for Reddit users. CONCLUSIONS Studies of depression and anxiety using Reddit data are currently driven by a prevailing methodology that favors a technical, solution-based orientation. Researchers interested in advancing this research area will benefit from further consideration of conceptual issues surrounding the interpretation of Reddit data with the medical model of mental health. 
Further efforts are also needed to locate accountability and autonomy within practice implications, suggesting new forms of engagement with Reddit users.
Affiliation(s)
- Nick Boettcher
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
|
19
|
Wang J, Wang X, Wang L, Peng Y. Health Information Needs of Young Chinese People Based on an Online Health Community: Topic and Statistical Analysis. JMIR Med Inform 2021; 9:e30356. [PMID: 34747707 PMCID: PMC8663605 DOI: 10.2196/30356] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/12/2021] [Revised: 08/30/2021] [Accepted: 09/25/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND The internet has been widely accessible and well accepted by young people; however, there is a limited understanding of the internet usage patterns and characteristics on issues related to health problems. The contents posted on online health communities (OHCs) are valuable resources to learn about youth's health information needs. OBJECTIVE In this study, we concurrently exploited statistical analysis and topic analysis of online health information needs to explore the distribution, impact factors, and topics of interest relevant to Chinese young people. METHODS We collected 60,478 health-related data sets posted by young people from a well-known Chinese OHC named xywy.com. Descriptive statistical analysis and correlation analysis were applied to find the distribution and influence factors of the information needs of Chinese young people. Furthermore, a general 4-step topic mining strategy was presented for sparse short texts, which included sentence vectorization, dimension reduction, clustering, and keyword generation. RESULTS In the Chinese OHC, Chinese young people had a high demand for information in the areas of gynecology and obstetrics, internal medicine, dermatology, plastic surgery, and surgery, and they focused on topics such as treatment, symptoms, causes, pathology, and diet. Females accounted for 69.67% (42,136/60,478) and young adults accounted for 87.44% (52,882/60,478) of all data. Gender, age, and disease type all had a significant effect on young people's information needs and topic preferences (P<.001). CONCLUSIONS We conducted comprehensive analyses to discover the online health information needs of Chinese young people. The research findings are of great practical value to carry out health education and health knowledge dissemination inside and outside of schools according to the interests of youth, enable the innovation of information services in OHCs, and improve the health literacy of young people.
Affiliation(s)
- Jie Wang
  - School of Management, Capital Normal University, Beijing, China
  - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Xin Wang
  - Department of Electrical and Computer Engineering, The State University of New York at Stony Brook, Stony Brook, NY, United States
- Lei Wang
  - School of Management, Capital Normal University, Beijing, China
- Yan Peng
  - School of Management, Capital Normal University, Beijing, China
|
20
|
Hu M, Benson R, Chen AT, Zhu SH, Conway M. Determining the prevalence of cannabis, tobacco, and vaping device mentions in online communities using natural language processing. Drug Alcohol Depend 2021; 228:109016. [PMID: 34560332 PMCID: PMC8801036 DOI: 10.1016/j.drugalcdep.2021.109016] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/13/2020] [Revised: 07/17/2021] [Accepted: 07/23/2021] [Indexed: 01/10/2023]
Abstract
INTRODUCTION The relationship between cannabis, tobacco, and vaping devices is both rapidly changing and poorly understood, with consumers rapidly shifting between use of all three product types. Given this dynamic and evolving landscape, there is an urgent need to monitor and better understand co-use, dual-use, and transition patterns between these products. This study describes work that utilizes social media - in this case, Reddit - in conjunction with automated Natural Language Processing (NLP) methods to better understand cannabis, tobacco, and vaping device product usage patterns. METHODS We collected Reddit data from the period 2013-2018, sourced from eight popular, high-volume Reddit communities (subreddits) related to the three product categories. We then manually annotated (coded) a set of 2640 Reddit posts and trained a machine learning-based NLP algorithm to automatically identify and disambiguate between cannabis or tobacco mentions (both smoking and vaping) in Reddit posts. This classifier was then applied to all data derived from the eight subreddits, 767,788 posts in total. RESULTS The NLP algorithm achieved an overall moderate performance (overall F-score of 0.77). When applied to our large corpus of Reddit posts, we discovered that over 10% of posts in the smoking cessation subreddit r/stopsmoking were classified as referring to vaping nicotine, and that only 2% of posts from the subreddits r/electronic_cigarette and r/vaping were classified as referring to smoking (tobacco) cessation. CONCLUSIONS This study presents the results of applying an NLP algorithm designed to identify and distinguish between cannabis and tobacco mentions (both smoking and vaping) in Reddit posts, hence contributing to our currently limited understanding of co-use, dual-use, and transition patterns between these products.
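For readers unfamiliar with the F-score reported in this abstract, the following is a minimal sketch of how the metric is computed from a classifier's confusion counts. The counts used in the example are hypothetical and chosen only so the result equals 0.77; they are not taken from the study, and the function name `f_score` is an assumption for illustration.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, and false negatives.

    With beta=1 this is the harmonic mean of precision and recall (F1).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: precision = recall = 0.77, so F1 = 0.77.
print(round(f_score(tp=77, fp=23, fn=23), 2))
```

Because F1 is a harmonic mean, a classifier cannot reach a high score by trading away either precision or recall, which is why it is commonly reported for imbalanced text classification tasks.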
Affiliation(s)
- Mengke Hu
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, United States
- Ryzen Benson
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, United States
- Annie T Chen
  - Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA, United States
- Shu-Hong Zhu
  - Herbert Wertheim School of Public Health, University of California San Diego, La Jolla, CA, United States
- Mike Conway
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, United States
|
21
|
Ainley E, Witwicki C, Tallett A, Graham C. Using Twitter Comments to Understand People's Experiences of UK Health Care During the COVID-19 Pandemic: Thematic and Sentiment Analysis. J Med Internet Res 2021; 23:e31101. [PMID: 34469327 PMCID: PMC8547412 DOI: 10.2196/31101] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/09/2021] [Revised: 08/12/2021] [Accepted: 08/30/2021] [Indexed: 12/26/2022] Open
Abstract
Background The COVID-19 pandemic has led to changes in health service utilization patterns and a rapid rise in care being delivered remotely. However, there has been little published research examining patients’ experiences of accessing remote consultations since COVID-19. Such research is important as remote methods for delivering some care may be maintained in the future. Objective The aim of this study was to use content from Twitter to understand discourse around health and care delivery in the United Kingdom as a result of COVID-19, focusing on Twitter users’ views on and attitudes toward care being delivered remotely. Methods Tweets posted from the United Kingdom between January 2018 and October 2020 were extracted using the Twitter application programming interface. A total of 1408 tweets across three search terms were extracted into Excel; 161 tweets were removed following deduplication and 610 were identified as irrelevant to the research question. The remaining relevant tweets (N=637) were coded into categories using NVivo software, and assigned a positive, neutral, or negative sentiment. To examine views of remote care over time, the coded data were imported back into Excel so that each tweet was associated with both a theme and sentiment. Results The volume of tweets on remote care delivery increased markedly following the COVID-19 outbreak. Five main themes were identified in the tweets: access to remote care (n=267), quality of remote care (n=130), anticipation of remote care (n=39), online booking and asynchronous communication (n=85), and publicizing changes to services or care delivery (n=160). Mixed public attitudes and experiences to the changes in service delivery were found. 
The proportion of positive tweets regarding access to, and quality of, remote care was higher in the immediate period following the COVID-19 outbreak (March-May 2020) when compared to the time before COVID-19 onset and the time when restrictions from the first lockdown eased (June-October 2020). Conclusions Using Twitter data to address our research questions proved beneficial for providing rapid access to Twitter users’ attitudes to remote care delivery at a time when it would have been difficult to conduct primary research due to COVID-19. This approach allowed us to examine the discourse on remote care over a relatively long period and to explore shifting attitudes of Twitter users at a time of rapid changes in care delivery. The mixed attitudes toward remote care highlight the importance for patients to have a choice over the type of consultation that best suits their needs, and to ensure that the increased use of technology for delivering care does not become a barrier for some. The finding that overall sentiment about remote care was more positive in the early stages of the pandemic but has since declined emphasizes the need for a continued examination of people’s preference, particularly if remote appointments are likely to remain central to health care delivery.
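The screening counts reported in this abstract can be checked with simple arithmetic. The sketch below uses only the numbers stated above; the variable and function names are illustrative assumptions. It also shows that the five theme counts sum to more than the number of relevant tweets, which would be consistent with some tweets being coded under more than one theme, although the abstract does not state this.

```python
# Counts reported in the abstract.
extracted = 1408
duplicates_removed = 161
irrelevant_removed = 610

theme_counts = {
    "access to remote care": 267,
    "quality of remote care": 130,
    "anticipation of remote care": 39,
    "online booking and asynchronous communication": 85,
    "publicizing changes to services or care delivery": 160,
}

def remaining_after_screening(total, *removed):
    """Tweets left after subtracting each exclusion step."""
    return total - sum(removed)

relevant = remaining_after_screening(extracted, duplicates_removed, irrelevant_removed)
theme_total = sum(theme_counts.values())

print(relevant)                # 637, matching the reported N
print(theme_total)             # 681
print(theme_total - relevant)  # 44 theme assignments beyond one per tweet
```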
Affiliation(s)
- Amy Tallett
  - Picker Institute Europe, Oxford, United Kingdom
|
22
|
Khademi Habibabadi S, Delir Haghighi P, Burstein F, Buttery J. Vaccine adverse event mentions in social media: Mining the language of Twitter conversations (Preprint). JMIR Med Inform 2021; 10:e34305. [PMID: 35708760 PMCID: PMC9247809 DOI: 10.2196/34305] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/16/2021] [Revised: 02/22/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022] Open
Abstract
Background Traditional monitoring for adverse events following immunization (AEFI) relies on various established reporting systems, where there is inevitable lag between an AEFI occurring and its potential reporting and subsequent processing of reports. AEFI safety signal detection strives to detect AEFI as early as possible, ideally close to real time. Monitoring social media data holds promise as a resource for this. Objective The primary aim of this study is to investigate the utility of monitoring social media for gaining early insights into vaccine safety issues, by extracting vaccine adverse event mentions (VAEMs) from Twitter, using natural language processing techniques. The secondary aims are to document the natural language processing techniques used and identify the most effective of them for identifying tweets that contain VAEM, with a view to define an approach that might be applicable to other similar social media surveillance tasks. Methods A VAEM-Mine method was developed that combines topic modeling with classification techniques to extract maximal VAEM posts from a vaccine-related Twitter stream, with high degree of confidence. The approach does not require a targeted search for specific vaccine reaction–indicative words, but instead, identifies VAEM posts according to their language structure. Results The VAEM-Mine method isolated 8992 VAEMs from 811,010 vaccine-related Twitter posts and achieved an F1 score of 0.91 in the classification phase. Conclusions Social media can assist with the detection of vaccine safety signals as a valuable complementary source for monitoring mentions of vaccine adverse events. A social media–based VAEM data stream can be assessed for changes to detect possible emerging vaccine safety signals, helping to address the well-recognized limitations of passive reporting systems, including lack of timeliness and underreporting.
Affiliation(s)
- Sedigheh Khademi Habibabadi
  - Centre for Health Analytics, Melbourne Children's Campus, Melbourne, Australia
  - Department of General Practice, University of Melbourne, Melbourne, Australia
- Pari Delir Haghighi
  - Department of Human-Centred Computing, Faculty of Information Technology, Monash University, Melbourne, Australia
- Frada Burstein
  - Department of Human-Centred Computing, Faculty of Information Technology, Monash University, Melbourne, Australia
- Jim Buttery
  - Centre for Health Analytics, Melbourne Children's Campus, Melbourne, Australia
  - Department of Paediatrics, University of Melbourne, Melbourne, Australia
|
23
|
Tagde P, Tagde S, Bhattacharya T, Tagde P, Chopra H, Akter R, Kaushik D, Rahman MH. Blockchain and artificial intelligence technology in e-Health. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2021; 28:52810-52831. [PMID: 34476701 PMCID: PMC8412875 DOI: 10.1007/s11356-021-16223-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 05/27/2021] [Accepted: 08/24/2021] [Indexed: 05/21/2023]
Abstract
Blockchain and artificial intelligence (AI) technologies are recent innovations in the healthcare sector. Data on healthcare indices were collected from material published on Web of Science and from other surveys by various governing bodies found through Google. In this review, we focus on various aspects of blockchain and AI and discuss integrating the two technologies to make a significant difference in healthcare by promoting a generalizable analytical technology that can be incorporated into a more comprehensive risk management approach. The article shows the various possibilities for creating reliable AI models in e-Health using blockchain, which is an open network for the sharing and authorization of information. Healthcare professionals will have access to the blockchain to display the patient's medical records, and AI uses a variety of proposed algorithms and decision-making capabilities, as well as large quantities of data. Thus, by integrating the latest advances in these technologies, the medical system will gain improved service efficiency, reduced costs, and democratized healthcare. Blockchain enables the storage of cryptographic records, which AI needs.
Affiliation(s)
- Priti Tagde
  - Bhabha Pharmacy Research Institute, Bhabha University Bhopal, Bhopal M.P, India
  - PRISAL Foundation (Pharmaceutical Royal International Society), New Delhi, India
- Sandeep Tagde
  - PRISAL Foundation (Pharmaceutical Royal International Society), New Delhi, India
- Tanima Bhattacharya
  - School of Chemistry & Chemical Engineering, Hubei University, Wuhan, China
  - Department of Science & Engineering, Novel Global Community Education Foundation, Hebersham, Australia
- Pooja Tagde
  - Practice of Medicine Department, Govt. Homeopathy College, Bhopal, M.P, India
- Hitesh Chopra
  - Chitkara College of Pharmacy, Rajpura, Punjab, 140401, India
- Rokeya Akter
  - Department of Pharmacy, Jagannath University, Sadarghat, Dhaka, 1100, Bangladesh
- Deepak Kaushik
  - Department of Pharmaceutical Sciences, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Md Habibur Rahman
  - Department of Pharmacy, Southeast University, Banani, Dhaka, 1213, Bangladesh
|
24
|
Dreyfus B, Chaudhary A, Bhardwaj P, Shree VK. Application of natural language processing techniques to identify off-label drug usage from various online health communities. J Am Med Inform Assoc 2021; 28:2147-2154. [PMID: 34333625 PMCID: PMC8449611 DOI: 10.1093/jamia/ocab124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2021] [Revised: 05/17/2021] [Accepted: 06/04/2021] [Indexed: 11/29/2022] Open
Abstract
Objective: Outcomes mentioned by patients on online health communities (OHCs) can serve as a source of evidence for off-label drug usage evaluation, but identifying these outcomes manually is tedious work. We have built a natural language processing model to identify off-label usage of drugs mentioned in these patient posts. Materials and Methods: Single patient posts from 4 major OHCs were considered for this study. A text classification model was built to classify the posts as either relevant or not relevant based on patient experience. The relevant posts were passed through a spelling correction tool, CSpell, and then medications and indications from these posts were identified using cTAKES (clinical Text Analysis and Knowledge Extraction System), a named entity recognition tool. Drug and indication pairs were identified using a dependency parser. Finally, if the paired indication was not mentioned on the label of the drug approved by the U.S. Food and Drug Administration, it was tagged as off-label use of that drug. Results: Using this algorithm, we identified 289 off-label indications, achieving a recall of 76%. Conclusions: The method designed in this study identifies and extracts the semantic relationship between drugs and indications from demotic posts in OHCs. The results demonstrate the feasibility of using natural language processing techniques in identifying off-label drug usage across online health forums for a variety of drugs. Understanding patients’ off-label use of drugs may be able to help manufacturers innovate to better address patients’ needs and assist doctors’ prescribing decisions.
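The final tagging step of the pipeline described above, checking each extracted drug and indication pair against the approved label, can be sketched as follows. This is an illustration only: the label table, drug names, and condition names are hypothetical placeholders, and the upstream stages (relevance classification, CSpell, cTAKES, dependency parsing) are assumed to have already produced the pairs.

```python
# Hypothetical approved-label table; a real one would be built from FDA labeling data.
APPROVED_INDICATIONS = {
    "drug_a": {"condition_x"},
    "drug_b": {"condition_x", "condition_y"},
}


def tag_off_label(pairs):
    """Return the (drug, indication) pairs whose indication is not on the drug's label."""
    flagged = []
    for drug, indication in pairs:
        approved = APPROVED_INDICATIONS.get(drug.lower())
        if approved is None:
            continue  # drug missing from the table: skip rather than guess
        if indication.lower() not in approved:
            flagged.append((drug, indication))
    return flagged


# Pairs as they might arrive from the (assumed) extraction stages.
extracted = [("drug_a", "condition_x"), ("drug_b", "condition_z")]
off_label = tag_off_label(extracted)
print(off_label)
```

Skipping unknown drugs instead of flagging them keeps the sketch conservative; whether the study handled such cases this way is not stated in the abstract.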
Affiliation(s)
- Brian Dreyfus
  - Epidemiology, Bristol Myers Squibb, Princeton, New Jersey, USA
  - Corresponding Author: Brian Dreyfus, MPH, Bristol Myers Squibb, Route 206 & Province Line Road, Princeton, NJ, USA
|
25
|
Liu Y, Whitfield C, Zhang T, Hauser A, Reynolds T, Anwar M. Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning. Health Inf Sci Syst 2021; 9:25. [PMID: 34188896 PMCID: PMC8226148 DOI: 10.1007/s13755-021-00158-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/09/2021] [Accepted: 06/15/2021] [Indexed: 12/04/2022] Open
Abstract
Purpose: It has been over a year since the first known case of coronavirus disease (COVID-19) emerged, yet the pandemic is far from over. To date, the coronavirus pandemic has infected over eighty million people and has killed more than 1.78 million worldwide. This study aims to explore “how useful is the Reddit social media platform for surveilling the COVID-19 pandemic?” and “how do people’s concerns/behaviors change over the course of the COVID-19 pandemic in North Carolina?”. The purpose of this study was to compare people’s thoughts, behavior changes, discussion topics, and the number of confirmed cases and deaths by applying natural language processing (NLP) to COVID-19 related data. Methods: In this study, we collected COVID-19 related data from 18 subreddits of North Carolina from March to August 2020. Next, we applied methods from natural language processing and machine learning to analyze the collected Reddit posts using feature engineering, topic modeling, custom named-entity recognition (NER), and BERT-based (Bidirectional Encoder Representations from Transformers) sentence clustering. Using these methods, we were able to glean people’s responses and their concerns about the COVID-19 pandemic in North Carolina. Results: We observed a positive change in attitudes towards masks for residents in North Carolina. The high-frequency words in all subreddit corpora for each of the COVID-19 mitigation strategy categories are: Distancing (DIST): “social distance/distancing”, “lockdown”, and “work from home”; Disinfection (DIT): “(hand) sanitizer/soap”, “hygiene”, and “wipe”; Personal Protective Equipment (PPE): “mask/facemask(s)/face shield”, “n95(s)/kn95”, and “cloth/gown”; Symptoms (SYM): “death”, “flu/influenza”, and “cough/coughed”; Testing (TEST): “cases”, “(antibody) test”, and “test results (positive/negative)”. Conclusion: The findings in our study show that the use of Reddit data to monitor the COVID-19 pandemic in North Carolina (NC) was effective.
The study shows the utility of NLP methods (e.g. cosine similarity, Latent Dirichlet Allocation (LDA) topic modeling, custom NER and BERT-based sentence clustering) in discovering the change of the public's concerns/behaviors over the course of COVID-19 pandemic in NC using Reddit data. Moreover, the results show that social media data can be utilized to surveil the epidemic situation in a specific community.
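Of the NLP methods listed, cosine similarity is the simplest to illustrate. The sketch below compares two posts using bag-of-words counts; the whitespace tokenization and the example strings are assumptions made for illustration, not the study's preprocessing.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two bag-of-words count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical posts, for illustration only.
similar = cosine_similarity("wear a mask indoors", "please wear a mask")
unrelated = cosine_similarity("wear a mask", "lockdown extended again")
print(round(similar, 3), unrelated)
```

A score near 1 means the posts share most of their vocabulary and a score of 0 means they share none; tracking such scores over time is one way a shift in discussion could be detected.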
Affiliation(s)
- Yang Liu
  - Human-Centered AI (HC-AI) Lab, North Carolina A&T State University, Greensboro, NC 27411 USA
- Christopher Whitfield
  - Human-Centered AI (HC-AI) Lab, North Carolina A&T State University, Greensboro, NC 27411 USA
- Tianyang Zhang
  - Human-Centered AI (HC-AI) Lab, North Carolina A&T State University, Greensboro, NC 27411 USA
  - University of Massachusetts Amherst, Amherst, MA 01003 USA
- Amanda Hauser
  - North Carolina State University, Raleigh, NC 27695 USA
- Mohd Anwar
  - Human-Centered AI (HC-AI) Lab, North Carolina A&T State University, Greensboro, NC 27411 USA
|
26
|
Bour C, Ahne A, Schmitz S, Perchoux C, Dessenne C, Fagherazzi G. The Use of Social Media for Health Research Purposes: Scoping Review. J Med Internet Res 2021; 23:e25736. [PMID: 34042593 PMCID: PMC8193478 DOI: 10.2196/25736] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/16/2020] [Revised: 01/15/2021] [Accepted: 03/18/2021] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND As social media are increasingly used worldwide, more and more scientists are relying on them for their health-related projects. However, social media features, methodologies, and ethical issues are unclear so far because, to our knowledge, there has been no overview of this relatively young field of research. OBJECTIVE This scoping review aimed to provide an evidence map of the different uses of social media for health research purposes, their fields of application, and their analysis methods. METHODS We followed the scoping review methodologies developed by Arksey and O'Malley and the Joanna Briggs Institute. After developing search strategies based on keywords (eg, social media, health research), comprehensive searches were conducted in the PubMed/MEDLINE and Web of Science databases. We limited the search strategies to documents written in English and published between January 1, 2005, and April 9, 2020. After removing duplicates, articles were screened at the title and abstract level and at the full text level by two independent reviewers. One reviewer extracted data, which were descriptively analyzed to map the available evidence. RESULTS After screening 1237 titles and abstracts and 407 full texts, 268 unique papers were included, dating from 2009 to 2020 with an average annual growth rate of 32.71% for the 2009-2019 period. Studies mainly came from the Americas (173/268, 64.6%, including 151 from the United States). Articles used machine learning or data mining techniques (60/268) to analyze the data, discussed opportunities and limitations of the use of social media for research (59/268), assessed the feasibility of recruitment strategies (45/268), or discussed ethical issues (16/268). Communicable (eg, influenza, 40/268) and then chronic (eg, cancer, 24/268) diseases were the two main areas of interest. 
CONCLUSIONS Since their early days, social media have been recognized as resources with high potential for health research purposes, yet the field is still suffering from strong heterogeneity in the methodologies used, which prevents the research from being compared and generalized. For the field to be fully recognized as a valid, complementary approach to more traditional health research study designs, there is now a need for more guidance by types of applications of social media for health research, both from a methodological and an ethical perspective. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.1136/bmjopen-2020-040671.
Affiliation(s)
- Charline Bour
  - Department of Population Health, Luxembourg Institute of Health, Strassen, Luxembourg
- Adrian Ahne
  - Inserm U1018, Center for Research in Epidemiology and Population Health (CESP), Paris Saclay University, Villejuif, France
  - Epiconcept, Paris, France
- Susanne Schmitz
  - Competence Centre for Methodology and Statistics, Luxembourg Institute of Health, Strassen, Luxembourg
- Camille Perchoux
  - Luxembourg Institute of Socio-Economic Research, Esch/Alzette, Luxembourg
- Coralie Dessenne
  - Department of Population Health, Luxembourg Institute of Health, Strassen, Luxembourg
- Guy Fagherazzi
  - Department of Population Health, Luxembourg Institute of Health, Strassen, Luxembourg
|
27
|
Scaccia JP, Scott VC. 5335 days of Implementation Science: using natural language processing to examine publication trends and topics. Implement Sci 2021; 16:47. [PMID: 33902657 PMCID: PMC8077727 DOI: 10.1186/s13012-021-01120-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/17/2020] [Accepted: 04/14/2021] [Indexed: 02/07/2023] Open
Abstract
INTRODUCTION Moving evidence-based practices into the hands of practitioners requires the synthesis and translation of research literature. However, the growing pace of scientific publications across disciplines makes it increasingly difficult to stay abreast of research literature. Natural language processing (NLP) methods are emerging as a valuable strategy for conducting content analyses of academic literature. We sought to apply NLP to identify publication trends in the journal Implementation Science, including key topic clusters and the distribution of topics over time. A parallel study objective was to demonstrate how NLP can be used in research synthesis. METHODS We examined 1711 Implementation Science abstracts published from February 22, 2006, to October 1, 2020. We retrieved the study data using PubMed's Application Programming Interface (API) to assemble a database. Following standard preprocessing steps, we use topic modeling with Latent Dirichlet allocation (LDA) to cluster the abstracts following a minimization algorithm. RESULTS We examined 30 topics and computed topic model statistics of quality. Analyses revealed that published articles largely reflect (i) characteristics of research, or (ii) domains of practice. Emergent topic clusters encompassed key terms both salient and common to implementation science. HIV and stroke represent the most commonly published clinical areas. Systematic reviews have grown in topic prominence and coherence, whereas articles pertaining to knowledge translation (KT) have dropped in prominence since 2013. Articles on HIV and implementation effectiveness have increased in topic exclusivity over time. DISCUSSION We demonstrated how NLP can be used as a synthesis and translation method to identify trends and topics across a large number of (over 1700) articles. With applicability to a variety of research domains, NLP is a promising approach to accelerate the dissemination and uptake of research literature. 
For future research in implementation science, we encourage the inclusion of more equity-focused studies to expand the impact of implementation science on disadvantaged communities.
Affiliation(s)
- Victoria C Scott
  - Department of Psychological Science, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC, 28223, USA
|
28
|
Oyebode O, Ndulue C, Adib A, Mulchandani D, Suruliraj B, Orji FA, Chambers CT, Meier S, Orji R. Health, Psychosocial, and Social Issues Emanating From the COVID-19 Pandemic Based on Social Media Comments: Text Mining and Thematic Analysis Approach. JMIR Med Inform 2021; 9:e22734. [PMID: 33684052 PMCID: PMC8025920 DOI: 10.2196/22734] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/21/2020] [Revised: 10/22/2020] [Accepted: 02/25/2021] [Indexed: 12/14/2022] Open
Abstract
Background: The COVID-19 pandemic has caused a global health crisis that affects many aspects of human lives. In the absence of vaccines and antivirals, several behavioral change and policy initiatives such as physical distancing have been implemented to control the spread of COVID-19. Social media data can reveal public perceptions toward how governments and health agencies worldwide are handling the pandemic, and the impact of the disease on people regardless of their geographic locations in line with various factors that hinder or facilitate the efforts to control the spread of the pandemic globally. Objective: This paper aims to investigate the impact of the COVID-19 pandemic on people worldwide using social media data. Methods: We applied natural language processing (NLP) and thematic analysis to understand public opinions, experiences, and issues with respect to the COVID-19 pandemic using social media data. First, we collected over 47 million COVID-19–related comments from Twitter, Facebook, YouTube, and three online discussion forums. Second, we performed data preprocessing, which involved applying NLP techniques to clean and prepare the data for automated key phrase extraction. Third, we applied the NLP approach to extract meaningful key phrases from over 1 million randomly selected comments, computed a sentiment score for each key phrase, and assigned a sentiment polarity (ie, positive, negative, or neutral) based on the score using a lexicon-based technique. Fourth, we grouped related negative and positive key phrases into categories or broad themes. Results: A total of 34 negative themes emerged, out of which 15 were health-related issues, psychosocial issues, and social issues related to the COVID-19 pandemic from the public perspective.
Some of the health-related issues were increased mortality, health concerns, struggling health systems, and fitness issues; while some of the psychosocial issues were frustrations due to life disruptions, panic shopping, and expression of fear. Social issues were harassment, domestic violence, and wrong societal attitude. In addition, 20 positive themes emerged from our results. Some of the positive themes were public awareness, encouragement, gratitude, cleaner environment, online learning, charity, spiritual support, and innovative research. Conclusions We uncovered various negative and positive themes representing public perceptions toward the COVID-19 pandemic and recommended interventions that can help address the health, psychosocial, and social issues based on the positive themes and other research evidence. These interventions will help governments, health professionals and agencies, institutions, and individuals in their efforts to curb the spread of COVID-19 and minimize its impact, and in reacting to any future pandemics.
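The third step in the methods, scoring each key phrase against a lexicon and mapping the score to a polarity, can be sketched as below. The lexicon entries, their weights, and the zero threshold are hypothetical; the abstract does not name the lexicon or the cut-offs the authors used.

```python
# Hypothetical sentiment lexicon: word -> score in [-1, 1].
LEXICON = {
    "fear": -1.0,
    "panic": -1.0,
    "violence": -1.0,
    "gratitude": 1.0,
    "support": 0.5,
    "charity": 0.5,
}


def phrase_score(phrase: str) -> float:
    """Sum the lexicon scores of the words in a key phrase (unknown words score 0)."""
    return sum(LEXICON.get(token, 0.0) for token in phrase.lower().split())


def polarity(score: float, threshold: float = 0.0) -> str:
    """Map a numeric score to positive, negative, or neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for phrase in ("panic shopping", "spiritual support", "online learning"):
    print(phrase, "->", polarity(phrase_score(phrase)))
```

In this toy lexicon "online learning" comes out neutral because neither word is listed, which illustrates how strongly a lexicon-based result depends on lexicon coverage.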
Affiliation(s)
- Oladapo Oyebode
  - Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
- Chinenye Ndulue
  - Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
- Ashfaq Adib
  - Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
- Fidelia Anulika Orji
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Christine T Chambers
  - Department of Psychology and Neuroscience, Dalhousie University, Halifax, NS, Canada
  - Department of Pediatrics, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
- Sandra Meier
  - Department of Psychiatry, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
- Rita Orji
  - Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
|
29
|
Fowler JC, Madan A, Bruce CR, Frueh BC, Kash B, Jones SL, Sasangohar F. Improving Psychiatric Care Through Integrated Digital Technologies. J Psychiatr Pract 2021; 27:92-100. [PMID: 33656814 DOI: 10.1097/pra.0000000000000535] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
Abstract
This manuscript provides an overview of our efforts to implement an integrated electronic monitoring and feedback platform to increase patient engagement, improve care delivery and outcome of treatment, and alert care teams to deterioration in functioning. Patients First utilizes CareSense, a digital care navigation and data collection system, to integrate traditional patient-reported outcomes monitoring with novel biological monitoring between visits to provide patients and caregivers with real-time feedback on changes in symptoms such as stress, anxiety, and depression. The next stage of project development incorporates digital therapeutics (computerized therapeutic interventions) for patients, and video resources for primary care physicians and nurse practitioners who serve as the de facto front line for psychiatric care. Integration of the patient-reported outcomes monitoring with continuous biological monitoring, and digital supports is a novel application of existing technologies. Video resources pushed to care providers whose patients trigger a symptom severity alert is, to our knowledge, an industry first.
|
30
|
Ford E, Shepherd S, Jones K, Hassan L. Toward an Ethical Framework for the Text Mining of Social Media for Health Research: A Systematic Review. Front Digit Health 2021; 2:592237. [PMID: 34713062 PMCID: PMC8521805 DOI: 10.3389/fdgth.2020.592237] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/06/2020] [Accepted: 12/18/2020] [Indexed: 11/13/2022] Open
Abstract
Background: Text-mining techniques are advancing all the time and vast corpora of social media text can be analyzed for users' views and experiences related to their health. There is great promise for new insights into health issues such as drug side effects and spread of disease, as well as patient experiences of health conditions and health care. However, this emerging field lacks ethical consensus and guidance. We aimed to bring together a comprehensive body of opinion, views, and recommendations in this area so that academic researchers new to the field can understand relevant ethical issues. Methods: After registration of a protocol in PROSPERO, three parallel systematic searches were conducted, to identify academic articles comprising commentaries, opinion, and recommendations on ethical practice in social media text mining for health research and gray literature guidelines and recommendations. These were integrated with social media users' views from qualitative studies. Papers and reports that met the inclusion criteria were analyzed thematically to identify key themes, and an overarching set of themes was deduced. Results: A total of 47 reports and articles were reviewed, and eight themes were identified. Commentators suggested that publicly posted social media data could be used without consent and formal research ethics approval, provided that the anonymity of users is ensured, although we note that privacy settings are difficult for users to navigate on some sites. Even without the need for formal approvals, we note ethical issues: to actively identify and minimize possible harms, to conduct research for public benefit rather than private gain, to ensure transparency and quality of data access and analysis methods, and to abide by the law and terms and conditions of social media sites. 
Conclusion: Although social media text mining can often legally and reasonably proceed without formal ethics approvals, we recommend improving ethical standards in health-related research by increasing transparency of the purpose of research, data access, and analysis methods; consultation with social media users and target groups to identify and mitigate against potential harms that could arise; and ensuring the anonymity of social media users.
Affiliation(s)
- Elizabeth Ford
  - Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, United Kingdom
- Scarlett Shepherd
  - Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, United Kingdom
- Kerina Jones
  - Population Data Science, Medical School, Swansea University, Swansea, United Kingdom
- Lamiece Hassan
  - Division of Informatics, Imaging & Data Sciences, University of Manchester, Manchester, United Kingdom
|
31
|
Straw I, Callison-Burch C. Artificial Intelligence in mental health and the biases of language based models. PLoS One 2020; 15:e0240376. [PMID: 33332380 PMCID: PMC7745984 DOI: 10.1371/journal.pone.0240376] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 03/01/2020] [Accepted: 09/07/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND The rapid integration of Artificial Intelligence (AI) into the healthcare field has occurred with little communication between computer scientists and doctors. The impact of AI on health outcomes and inequalities calls for health professionals and data scientists to make a collaborative effort to ensure historic health disparities are not encoded into the future. We present a study that evaluates bias in existing Natural Language Processing (NLP) models used in psychiatry and discuss how these biases may widen health inequalities. Our approach systematically evaluates each stage of model development to explore how biases arise from a clinical, data science and linguistic perspective. DESIGN/METHODS A literature review of the uses of NLP in mental health was carried out across multiple disciplinary databases with defined Mesh terms and keywords. Our primary analysis evaluated biases within 'GloVe' and 'Word2Vec' word embeddings. Euclidean distances were measured to assess relationships between psychiatric terms and demographic labels, and vector similarity functions were used to solve analogy questions relating to mental health. RESULTS Our primary analysis of mental health terminology in GloVe and Word2Vec embeddings demonstrated significant biases with respect to religion, race, gender, nationality, sexuality and age. Our literature review returned 52 papers, of which none addressed all the areas of possible bias that we identify in model development. In addition, only one article existed on more than one research database, demonstrating the isolation of research within disciplinary silos and inhibiting cross-disciplinary collaboration or communication. CONCLUSION Our findings are relevant to professionals who wish to minimize the health inequalities that may arise as a result of AI and data-driven algorithms. We offer primary research identifying biases within these technologies and provide recommendations for avoiding these harms in the future.
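The two measurements named in the methods, Euclidean distance between word vectors and analogy solving by vector similarity, can be sketched with toy data. The two-dimensional vectors and the vocabulary below are hypothetical stand-ins; real GloVe or Word2Vec embeddings have hundreds of dimensions and would be loaded from pretrained files.

```python
import math

# Toy 2-D "embeddings", invented for illustration only.
EMBEDDINGS = {
    "man": (0.1, 0.8),
    "woman": (0.1, 0.2),
    "king": (0.9, 0.8),
    "queen": (0.9, 0.2),
    "apple": (0.5, 0.5),
}


def euclidean(word_a: str, word_b: str) -> float:
    """Straight-line distance between two word vectors."""
    return math.dist(EMBEDDINGS[word_a], EMBEDDINGS[word_b])


def cosine(u, v) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def solve_analogy(a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    )
    candidates = (w for w in EMBEDDINGS if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))


print(euclidean("man", "woman"))
print(solve_analogy("man", "king", "woman"))
```

In a bias audit of the kind described, the same distance would be measured between a psychiatric term and each demographic label; systematically unequal distances across labels are the signal of interest.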
Affiliation(s)
- Isabel Straw
  - Department of Public Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Chris Callison-Burch
  - Computer and Information Science Department, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
32
|
Aggarwal N, Ahmed M, Basu S, Curtin JJ, Evans BJ, Matheny ME, Nundy S, Sendak MP, Shachar C, Shah RU, Thadaney-Israni S. Advancing Artificial Intelligence in Health Settings Outside the Hospital and Clinic. NAM Perspect 2020; 2020:202011f. [PMID: 35291747 PMCID: PMC8916812 DOI: 10.31478/202011f] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 08/29/2023]
Affiliation(s)
- Michael E Matheny
  - Vanderbilt University Medical Center and Tennessee Valley Healthcare System VA
|
33
|
Nguyen H, Nguyen T, Nguyen DT. A graph-based approach for population health analysis using Geo-tagged tweets. MULTIMEDIA TOOLS AND APPLICATIONS 2020; 80:7187-7204. [PMID: 33132740 PMCID: PMC7585996 DOI: 10.1007/s11042-020-10034-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2020] [Revised: 08/13/2020] [Accepted: 10/06/2020] [Indexed: 06/11/2023]
Abstract
We propose in this work a graph-based approach for automatic public health analysis using social media. In our approach, graphs are created to model the interactions between features and between tweets in social media. We investigated different graph properties and methods in constructing graph-based representations for population health analysis. The proposed approach is applied in two case studies: (1) estimating health indices, and (2) classifying health situation of counties in the US. We evaluate our approach on a dataset including more than one billion tweets collected in three years 2014, 2015, and 2016, and the health surveys from the Behavioral Risk Factor Surveillance System. We conducted realistic and large-scale experiments on various textual features and graph-based representations. Experimental results verified the robustness of the proposed approach and its superiority over existing ones in both case studies, confirming the potential of graph-based approach for modeling interactions in social networks for population health analysis.
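The abstract does not say how the graphs are constructed. One common way to model interactions between textual features, shown here purely as an assumption and not as the authors' method, is a co-occurrence graph in which nodes are tokens and an edge weight counts the tweets containing both tokens.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(tweets):
    """Build weighted edges: (token_u, token_v) -> number of tweets containing both."""
    edges = Counter()
    for tweet in tweets:
        # De-duplicate and sort so each unordered pair has one canonical key.
        tokens = sorted(set(tweet.lower().split()))
        for u, v in combinations(tokens, 2):
            edges[(u, v)] += 1
    return edges


# Hypothetical tweets, for illustration only.
sample_tweets = ["flu shot today", "flu season", "shot flu"]
graph = cooccurrence_graph(sample_tweets)
print(graph.most_common(3))
```

Graph properties such as node degree or edge weight could then serve as features for county-level estimates, although which properties the study actually used is not given in the abstract.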
Affiliation(s)
- Hung Nguyen
  - Faculty of IT, Nha Trang University, Nha Trang, Vietnam
- Thin Nguyen
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, VIC 3220 Australia
- Duc Thanh Nguyen
  - School of Information Technology, Deakin University, Geelong, VIC 3220 Australia
|
34
|
Abstract
OBJECTIVES: We survey recent developments in medical Information Extraction (IE) as reported in the literature from the past three years. Our focus is on the fundamental methodological paradigm shift from standard Machine Learning (ML) techniques to Deep Neural Networks (DNNs). We describe applications of this new paradigm concentrating on two basic IE tasks, named entity recognition and relation extraction, for two selected semantic classes, diseases and drugs (or medications), and the relations between them. METHODS: For the time period from 2017 to early 2020, we searched for relevant publications from three major scientific communities: medicine and medical informatics, natural language processing, as well as neural networks and artificial intelligence. RESULTS: In the past decade, the field of Natural Language Processing (NLP) has undergone a profound methodological shift from symbolic to distributed representations based on the paradigm of Deep Learning (DL). Meanwhile, this trend is, although with some delay, also reflected in the medical NLP community. In the reporting period, overwhelming experimental evidence has been gathered, as illustrated in this survey for medical IE, that DL-based approaches outperform non-DL ones by often large margins. Still, small-sized and access-limited corpora create intrinsic problems for data-greedy DL, as do special linguistic phenomena of medical sublanguages that have to be overcome by adaptive learning strategies. CONCLUSIONS: The paradigm shift from (feature-engineered) ML to DNNs changes the fundamental methodological rules of the game for medical NLP. This change is by no means restricted to medical IE but should also deeply influence other areas of medical informatics, either NLP- or non-NLP-based.
Collapse
Affiliation(s)
- Udo Hahn
- Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany
- Michel Oleynik
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria
|
35
|
Baclic O, Tunis M, Young K, Doan C, Swerdfeger H, Schonfeld J. Challenges and opportunities for public health made possible by advances in natural language processing. CANADA COMMUNICABLE DISEASE REPORT = RELEVE DES MALADIES TRANSMISSIBLES AU CANADA 2020; 46:161-168. [PMID: 32673380 PMCID: PMC7343054 DOI: 10.14745/ccdr.v46i06a02] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Indexed: 12/13/2022]
Abstract
Natural language processing (NLP) is a subfield of artificial intelligence devoted to understanding and generation of language. The recent advances in NLP technologies are enabling rapid analysis of vast amounts of text, thereby creating opportunities for health research and evidence-informed decision making. The analysis and data extraction from scientific literature, technical reports, health records, social media, surveys, registries and other documents can support core public health functions including the enhancement of existing surveillance systems (e.g. through faster identification of diseases and risk factors/at-risk populations), disease prevention strategies (e.g. through more efficient evaluation of the safety and effectiveness of interventions) and health promotion efforts (e.g. by providing the ability to obtain expert-level answers to any health related question). NLP is emerging as an important tool that can assist public health authorities in decreasing the burden of health inequality/inequity in the population. The purpose of this paper is to provide some notable examples of both the potential applications and challenges of NLP use in public health.
Affiliation(s)
- Oliver Baclic
- Centre for Immunization and Respiratory Infectious Disease, Public Health Agency of Canada, Ottawa, ON
- Matthew Tunis
- Centre for Immunization and Respiratory Infectious Disease, Public Health Agency of Canada, Ottawa, ON
- Kelsey Young
- Centre for Immunization and Respiratory Infectious Disease, Public Health Agency of Canada, Ottawa, ON
- Coraline Doan
- Data, Partnerships and Innovation Hub, Public Health Agency of Canada, Ottawa, ON
- Howard Swerdfeger
- Data, Partnerships and Innovation Hub, Public Health Agency of Canada, Ottawa, ON
- Justin Schonfeld
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB
|
36
|
A Recommendation Mechanism for Under-Emphasized Tourist Spots Using Topic Modeling and Sentiment Analysis. SUSTAINABILITY 2019. [DOI: 10.3390/su12010320] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/19/2022]
Abstract
With rapid advancements in internet applications, the growth rate of recommendation systems for tourists has skyrocketed. This has generated an enormous amount of travel-based data in the form of reviews, blogs, and ratings. However, most recommendation systems only recommend the top-rated places. Along with the top-ranked places, we aim to discover places that are often ignored by tourists owing to lack of promotion or effective advertising, referred to as under-emphasized locations. In this study, we use all relevant data, such as travel blogs, ratings, and reviews, in order to obtain optimal recommendations. We also aim to discover the latent factors that need to be addressed, such as food, cleanliness, and opening hours, and recommend a tourist place based on user history data. In this study, we propose a cross mapping table approach based on the location’s popularity, ratings, latent topics, and sentiments. An objective function for recommendation optimization is formulated based on these mappings. The baseline algorithms are latent Dirichlet allocation (LDA) and support vector machine (SVM). Our results show that the combined features of LDA, SVM, ratings, and cross mappings are conducive to enhanced performance. The main motivation of this study was to help tourist industries to direct more attention towards designing effective promotional activities for under-emphasized locations.
|