1
Haupt MR, Chiu M, Chang J, Li Z, Cuomo R, Mackey TK. Detecting nuance in conspiracy discourse: Advancing methods in infodemiology and communication science with machine learning and qualitative content coding. PLoS One 2023; 18:e0295414. [PMID: 38117843; PMCID: PMC10732406; DOI: 10.1371/journal.pone.0295414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 11/21/2023] [Indexed: 12/22/2023] Open
Abstract
The spread of misinformation and conspiracies has been an ongoing issue since the early stages of the internet era, resulting in the emergence of the field of infodemiology (i.e., information epidemiology), which investigates the transmission of health-related information. Due to the high volume of online misinformation in recent years, there is a need to continue advancing methodologies in order to effectively identify narratives and themes. While machine learning models can be used to detect misinformation and conspiracies, these models are limited in their generalizability to other datasets and misinformation phenomena, and are often unable to detect implicit meanings in text that require contextual knowledge. To rapidly detect evolving conspiracist narratives within high volume online discourse while identifying nuanced themes requiring the comprehension of subtext, this study describes a hybrid methodology that combines natural language processing (i.e., topic modeling and sentiment analysis) with qualitative content coding approaches to characterize conspiracy discourse related to 5G wireless technology and COVID-19 on Twitter (currently known as 'X'). Discourse that focused on correcting 5G conspiracies was also analyzed for comparison. Sentiment analysis shows that conspiracy-related discourse was more likely to use language that was analytic, combative, past-oriented, referenced social status, and expressed negative emotions. Corrections discourse was more likely to use words reflecting cognitive processes, prosocial relations, health-related consequences, and future-oriented language. Inductive coding characterized conspiracist narratives related to global elites, anti-vax sentiment, medical authorities, religious figures, and false correlations between technology advancements and disease outbreaks. Further, the corrections discourse did not address many of the narratives prevalent in conspiracy conversations. This paper aims to further bridge the gap between computational and qualitative methodologies by demonstrating how both approaches can be used in tandem to emphasize the positive aspects of each methodology while minimizing their respective drawbacks.
Affiliation(s)
- Michael Robert Haupt
  - Department of Cognitive Science, University of California San Diego, La Jolla, California, United States of America
  - Global Health Policy & Data Institute, San Diego, California, United States of America
- Michelle Chiu
  - Department of Psychology, Temple University, Philadelphia, Pennsylvania, United States of America
- Joseline Chang
  - Rady School of Management, University of California San Diego, La Jolla, California, United States of America
- Zoe Li
  - Global Health Policy & Data Institute, San Diego, California, United States of America
  - S-3 Research, San Diego, California, United States of America
- Raphael Cuomo
  - Department of Anesthesiology, University of California, San Diego School of Medicine, San Diego, California, United States of America
- Tim K. Mackey
  - S-3 Research, San Diego, California, United States of America
  - Global Health Program, Department of Anthropology, University of California, San Diego, California, United States of America
2
Long KK, Kwok SWH, Kotz J, Wang G. A deep multi-view imbalanced learning approach for identifying informative COVID-19 tweets from social media. Comput Biol Med 2023; 164:107232. [PMID: 37531859; DOI: 10.1016/j.compbiomed.2023.107232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 06/02/2023] [Accepted: 07/01/2023] [Indexed: 08/04/2023]
Abstract
Social media platforms such as Twitter are a hub for rapid sharing of COVID-19-related information over the Internet, making them a favorable data resource for many downstream applications. Because of the massive volume of COVID-19 tweets generated every day, it is important that machine-learning-supported downstream applications can effectively skip uninformative tweets and pick up only informative ones for further use. However, existing solutions do not specifically consider the negative effect of the imbalanced ratio between informative and uninformative tweets in training data. In particular, most existing solutions are dominated by single-view learning and neglect the rich information that different views can provide to facilitate learning. In this study, a novel deep imbalanced multi-view learning approach called D-SVM-2K is proposed to identify informative COVID-19 tweets from social media. This approach is built upon the well-known multi-view learning method SVM-2K to incorporate different views generated from different feature extraction techniques. To combat the class imbalance problem and enhance its learning ability, D-SVM-2K stacks multiple SVM-2K base classifiers in a deep structure where the base classifiers can learn from either the original training dataset or the shifted critical regions identified using the well-known k-nearest neighbors algorithm. D-SVM-2K also realises a global and local deep ensemble learning on the multiple views' data. Our empirical experiments on a real-world labeled tweet dataset demonstrate the effectiveness of D-SVM-2K in dealing with real-world multi-view class imbalance issues.
Affiliation(s)
- Kok Kiang Long
  - School of Information Technology, Murdoch University, Perth, Australia
- Jayne Kotz
  - Ngangk Yira Institute for Change, Murdoch University, Perth, Australia
- Guanjin Wang
  - School of Information Technology, Murdoch University, Perth, Australia
3
Lane JM, Habib D, Curtis B. Linguistic Methodologies to Surveil the Leading Causes of Mortality: Scoping Review of Twitter for Public Health Data. J Med Internet Res 2023; 25:e39484. [PMID: 37307062; PMCID: PMC10337472; DOI: 10.2196/39484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 01/26/2023] [Accepted: 02/07/2023] [Indexed: 02/10/2023] Open
Abstract
BACKGROUND: Twitter has become a dominant source of public health data and a widely used method to investigate and understand public health-related issues internationally. By leveraging big data methodologies to mine Twitter for health-related data at the individual and community levels, scientists can use the data as a rapid and less expensive source for both epidemiological surveillance and studies on human behavior. However, limited reviews have focused on novel applications of language analyses that examine human health and behavior and the surveillance of several emerging diseases, chronic conditions, and risky behaviors.
OBJECTIVE: The primary focus of this scoping review was to provide a comprehensive overview of relevant studies that have used Twitter as a data source in public health research to analyze users' tweets to identify and understand physical and mental health conditions and remotely monitor the leading causes of mortality related to emerging disease epidemics, chronic diseases, and risk behaviors.
METHODS: A literature search strategy following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) extended guidelines for scoping reviews was used to search specific keywords on Twitter and public health on 5 databases: Web of Science, PubMed, CINAHL, PsycINFO, and Google Scholar. We reviewed the literature comprising peer-reviewed empirical research articles that included original research published in English-language journals between 2008 and 2021. Key information on Twitter data being leveraged for analyzing user language to study physical and mental health and public health surveillance was extracted.
RESULTS: A total of 38 articles that focused primarily on Twitter as a data source met the inclusion criteria for review. In total, two themes emerged from the literature: (1) language analysis to identify health threats and physical and mental health understandings about people and societies and (2) public health surveillance related to leading causes of mortality, primarily representing 3 categories (ie, respiratory infections, cardiovascular disease, and COVID-19). The findings suggest that Twitter language data can be mined to detect mental health conditions, disease surveillance, and death rates; identify heart-related content; show how health-related information is shared and discussed; and provide access to users' opinions and feelings.
CONCLUSIONS: Twitter analysis shows promise in the field of public health communication and surveillance. It may be essential to use Twitter to supplement more conventional public health surveillance approaches. Twitter can potentially fortify researchers' ability to collect data in a timely way and improve the early identification of potential health threats. Twitter can also help identify subtle signals in language for understanding physical and mental health conditions.
Affiliation(s)
- Jamil M Lane
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Daniel Habib
  - Technology and Translational Research Unit, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
- Brenda Curtis
  - Technology and Translational Research Unit, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
4
Honcharov V, Li J, Sierra M, Rivadeneira NA, Olazo K, Nguyen TT, Mackey TK, Sarkar U. Public Figure Vaccination Rhetoric and Vaccine Hesitancy: Retrospective Twitter Analysis. JMIR Infodemiology 2023; 3:e40575. [PMID: 37113377; PMCID: PMC10039410; DOI: 10.2196/40575] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/27/2022] [Revised: 12/19/2022] [Accepted: 12/27/2022] [Indexed: 04/29/2023]
Abstract
Background: Social media has emerged as a critical mass communication tool, with both health information and misinformation now spread widely on the web. Prior to the COVID-19 pandemic, some public figures promulgated anti-vaccine attitudes, which spread widely on social media platforms. Although anti-vaccine sentiment has pervaded social media throughout the COVID-19 pandemic, it is unclear to what extent interest in public figures is generating anti-vaccine discourse.
Objective: We examined Twitter messages that included anti-vaccination hashtags and mentions of public figures to assess the connection between interest in these individuals and the possible spread of anti-vaccine messages.
Methods: We used a data set of COVID-19-related Twitter posts collected from the public streaming application programming interface from March to October 2020 and filtered it for anti-vaccination hashtags "antivaxxing," "antivaxx," "antivaxxers," "antivax," "anti-vaxxer," "discredit," "undermine," "confidence," and "immune." Next, we applied the Biterm Topic model (BTM) to output topic clusters associated with the entire corpus. Topic clusters were manually screened by examining the top 10 posts most highly correlated in each of the 20 clusters, from which we identified 5 clusters most relevant to public figures and vaccination attitudes. We extracted all messages from these clusters and conducted inductive content analysis to characterize the discourse.
Results: Our keyword search yielded 118,971 Twitter posts after duplicates were removed, and subsequently, we applied BTM to parse these data into 20 clusters. After removing retweets, we manually screened the top 10 tweets associated with each cluster (200 messages) to identify clusters associated with public figures. Extraction of these clusters yielded 768 posts for inductive analysis. Most messages were either pro-vaccination (n=329, 43%) or neutral about vaccination (n=425, 55%), with only 2% (14/768) including anti-vaccination messages. Three main themes emerged: (1) anti-vaccination accusation, in which the message accused the public figure of holding anti-vaccination beliefs; (2) using "anti-vax" as an epithet; and (3) stating or implying the negative public health impact of anti-vaccination discourse.
Conclusions: Most discussions surrounding public figures in common hashtags labelled as "anti-vax" did not reflect anti-vaccination beliefs. We observed that public figures with known anti-vaccination beliefs face scorn and ridicule on Twitter. Accusing public figures of anti-vaccination attitudes is a means of insulting and discrediting the public figure rather than discrediting vaccines. The majority of posts in our sample condemned public figures expressing anti-vax beliefs by undermining their influence, insulting them, or expressing concerns over public health ramifications. This points to a complex information ecosystem, where anti-vax sentiment may not reside in common anti-vax-related keywords or hashtags, necessitating further assessment of the influence that public figures have on this discourse.
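As a quick arithmetic check on the stance breakdown reported in the Results, the sketch below recomputes the shares from the counts given in the abstract (329, 425, and 14 of 768 posts). The variable names and the rounding to whole percentages are assumptions made for illustration; they are not taken from the study.

```python
# Counts reported in the abstract for the 768 posts coded by stance.
counts = {"pro-vaccination": 329, "neutral": 425, "anti-vaccination": 14}
total = sum(counts.values())


def share(n, total):
    """Return n as a whole-number percentage of total (assumed rounding)."""
    return round(100 * n / total)


# Recompute each category's share of the coded posts.
shares = {label: share(n, total) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{total} = {pct}%")
```

The three counts sum to the 768 posts stated in the abstract, and the rounded shares agree with the reported 43%, 55%, and 2%.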
Affiliation(s)
- Vlad Honcharov
  - Division of General Internal Medicine at Zuckerberg San Francisco General Hospital and Trauma Center, University of California San Francisco, San Francisco, CA, United States
  - Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
- Jiawei Li
  - S-3 Research LLC, San Diego, CA, United States
  - Global Health Policy and Data Institute, San Diego, CA, United States
- Maribel Sierra
  - Division of General Internal Medicine at Zuckerberg San Francisco General Hospital and Trauma Center, University of California San Francisco, San Francisco, CA, United States
  - Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
- Natalie A Rivadeneira
  - Division of General Internal Medicine at Zuckerberg San Francisco General Hospital and Trauma Center, University of California San Francisco, San Francisco, CA, United States
  - Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
- Kristan Olazo
  - Division of General Internal Medicine at Zuckerberg San Francisco General Hospital and Trauma Center, University of California San Francisco, San Francisco, CA, United States
  - Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
- Thu T Nguyen
  - Department of Family and Community Medicine, University of California San Francisco, San Francisco, CA, United States
  - Department of Epidemiology & Biostatistics, University of Maryland School of Public Health, College Park, MD, United States
- Tim K Mackey
  - S-3 Research LLC, San Diego, CA, United States
  - Global Health Policy and Data Institute, San Diego, CA, United States
  - Global Health Program, Department of Anthropology, University of California San Diego, La Jolla, CA, United States
- Urmimala Sarkar
  - Division of General Internal Medicine at Zuckerberg San Francisco General Hospital and Trauma Center, University of California San Francisco, San Francisco, CA, United States
  - Center for Vulnerable Populations, University of California San Francisco, San Francisco, CA, United States
5
Kahanek A, Yu X, Hong L, Cleveland A, Philbrick J. Temporal Variations and Spatial Disparities in Public Sentiment Toward COVID-19 and Preventive Practices in the United States: Infodemiology Study of Tweets. JMIR Infodemiology 2021; 1:e31671. [PMID: 35013722; PMCID: PMC8722524; DOI: 10.2196/31671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 09/12/2021] [Accepted: 11/18/2021] [Indexed: 11/27/2022]
Abstract
Background: During the COVID-19 pandemic, US public health authorities and county, state, and federal governments recommended or ordered certain preventive practices, such as wearing masks, to reduce the spread of the disease. However, individuals had divergent reactions to these preventive practices.
Objective: The purpose of this study was to understand the variations in public sentiment toward COVID-19 and the recommended or ordered preventive practices from the temporal and spatial perspectives, as well as how the variations in public sentiment are related to geographical and socioeconomic factors.
Methods: The authors leveraged machine learning methods to investigate public sentiment polarity in COVID-19–related tweets from January 21, 2020 to June 12, 2020. The study measured the temporal variations and spatial disparities in public sentiment toward both general COVID-19 topics and preventive practices in the United States.
Results: In the temporal analysis, we found a 4-stage pattern from high negative sentiment in the initial stage to decreasing and low negative sentiment in the second and third stages, to the rebound and increase in negative sentiment in the last stage. We also identified that public sentiment to preventive practices was significantly different in urban and rural areas, while poverty rate and unemployment rate were positively associated with negative sentiment to COVID-19 issues.
Conclusions: The differences between public sentiment toward COVID-19 and the preventive practices imply that actions need to be taken to manage the initial and rebound stages in future pandemics. The urban and rural differences should be considered in terms of the communication strategies and decision making during a pandemic. This research also presents a framework to investigate time-sensitive public sentiment at the county and state levels, which could guide local and state governments and regional communities in making decisions and developing policies in crises.
Affiliation(s)
- Alexander Kahanek
  - College of Information, University of North Texas, Denton, TX, United States
- Xinchen Yu
  - College of Information, University of North Texas, Denton, TX, United States
- Lingzi Hong
  - College of Information, University of North Texas, Denton, TX, United States
- Ana Cleveland
  - College of Information, University of North Texas, Denton, TX, United States
- Jodi Philbrick
  - College of Information, University of North Texas, Denton, TX, United States
6
Stier AJ, Schertz KE, Rim NW, Cardenas-Iniguez C, Lahey BB, Bettencourt LMA, Berman MG. Evidence and theory for lower rates of depression in larger US urban areas. Proc Natl Acad Sci U S A 2021; 118:e2022472118. [PMID: 34315817; PMCID: PMC8346882; DOI: 10.1073/pnas.2022472118] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/11/2022] Open
Abstract
It is commonly assumed that cities are detrimental to mental health. However, the evidence remains inconsistent and at most, makes the case for differences between rural and urban environments as a whole. Here, we propose a model of depression driven by an individual's accumulated experience mediated by social networks. The connection between observed systematic variations in socioeconomic networks and built environments with city size provides a link between urbanization and mental health. Surprisingly, this model predicts lower depression rates in larger cities. We confirm this prediction for US cities using four independent datasets. These results are consistent with other behaviors associated with denser socioeconomic networks and suggest that larger cities provide a buffer against depression. This approach introduces a systematic framework for conceptualizing and modeling mental health in complex physical and social networks, producing testable predictions for environmental and social determinants of mental health also applicable to other psychopathologies.
Affiliation(s)
- Andrew J Stier
  - Department of Psychology, University of Chicago, Chicago, IL 60637
- Nak Won Rim
  - Division of Social Sciences, University of Chicago, Chicago, IL 60637
- Benjamin B Lahey
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637
- Luís M A Bettencourt
  - Department of Ecology & Evolution, University of Chicago, Chicago, IL 60637
  - Mansueto Institute for Urban Innovation, University of Chicago, Chicago, IL 60637
- Marc G Berman
  - The University of Chicago Neuroscience Institute, University of Chicago, Chicago, IL 60637