1
Mangalik S, Eichstaedt JC, Giorgi S, Mun J, Ahmed F, Gill G, V Ganesan A, Subrahmanya S, Soni N, Clouston SAP, Schwartz HA. Robust language-based mental health assessments in time and space through social media. NPJ Digit Med 2024; 7:109. PMID: 38698174; PMCID: PMC11065872; DOI: 10.1038/s41746-024-01100-0.
Abstract
In the most comprehensive population surveys, mental health is only broadly captured through questionnaires asking about "mentally unhealthy days" or feelings of "sadness." Further, population mental health estimates are predominantly consolidated to yearly estimates at the state level, which is considerably coarser than the best estimates of physical health. Through the large-scale analysis of social media, robust estimation of population mental health is feasible at finer resolutions. In this study, we created a pipeline that used ~1 billion Tweets from 2 million geo-located users to estimate mental health levels and changes for depression and anxiety, the two leading mental health conditions. Language-based mental health assessments (LBMHAs) had substantially higher levels of reliability across space and time than available survey measures. This work presents reliable assessments of depression and anxiety down to the county-week level. Where surveys were available, we found moderate to strong associations between the LBMHAs and survey scores for multiple levels of granularity, from the national level down to weekly county measurements (fixed effects β = 0.34 to 1.82; p < 0.001). LBMHAs demonstrated temporal validity, showing clear absolute increases following major societal events (+23% absolute change for depression assessments). LBMHAs showed improved external validity, evidenced by stronger correlations with measures of health and socioeconomic status than population surveys. This study shows that the careful aggregation of social media data yields spatiotemporal estimates of population mental health that exceed the granularity achievable by existing population surveys, and does so with generally greater reliability and validity.
Affiliation(s)
- Siddharth Mangalik
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Johannes C Eichstaedt
  - Department of Psychology, Stanford University, Stanford, CA, USA
  - Institute for Human-Centered A.I., Stanford University, Stanford, CA, USA
- Salvatore Giorgi
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Jihu Mun
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Farhan Ahmed
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Gilvir Gill
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Adithya V Ganesan
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Nikita Soni
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Sean A P Clouston
  - Department of Family, Population, and Preventive Medicine, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY, USA
- H Andrew Schwartz
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
2
Stade EC, Stirman SW, Ungar LH, Boland CL, Schwartz HA, Yaden DB, Sedoc J, DeRubeis RJ, Willer R, Eichstaedt JC. Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. Npj Ment Health Res 2024; 3:12. PMID: 38609507; PMCID: PMC10987499; DOI: 10.1038/s44184-024-00056-z.
Abstract
Large language models (LLMs) such as OpenAI's GPT-4 (which powers ChatGPT) and Google's Gemini, built on artificial intelligence, hold immense potential to support, augment, or even eventually automate psychotherapy. Enthusiasm about such applications is mounting in the field as well as in industry. These developments promise to address insufficient mental healthcare system capacity and scale individual access to personalized treatments. However, clinical psychology is an uncommonly high stakes application domain for AI systems, as responsible and evidence-based therapy requires nuanced expertise. This paper provides a roadmap for the ambitious yet responsible application of clinical LLMs in psychotherapy. First, a technical overview of clinical LLMs is presented. Second, the stages of integration of LLMs into psychotherapy are discussed while highlighting parallels to the development of autonomous vehicle technology. Third, potential applications of LLMs in clinical care, training, and research are discussed, highlighting areas of risk given the complex nature of psychotherapy. Fourth, recommendations for the responsible development and evaluation of clinical LLMs are provided, which include centering clinical science, involving robust interdisciplinary collaboration, and attending to issues like assessment, risk detection, transparency, and bias. Lastly, a vision is outlined for how LLMs might enable a new generation of studies of evidence-based interventions at scale, and how these studies may challenge assumptions about psychotherapy.
Affiliation(s)
- Elizabeth C Stade
  - Dissemination and Training Division, National Center for PTSD, VA Palo Alto Health Care System, Palo Alto, CA, USA
  - Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
  - Institute for Human-Centered Artificial Intelligence & Department of Psychology, Stanford University, Stanford, CA, USA
- Shannon Wiltsey Stirman
  - Dissemination and Training Division, National Center for PTSD, VA Palo Alto Health Care System, Palo Alto, CA, USA
  - Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Lyle H Ungar
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Cody L Boland
  - Dissemination and Training Division, National Center for PTSD, VA Palo Alto Health Care System, Palo Alto, CA, USA
- H Andrew Schwartz
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- David B Yaden
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- João Sedoc
  - Department of Technology, Operations, and Statistics, New York University, New York, NY, USA
- Robert J DeRubeis
  - Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Robb Willer
  - Department of Sociology, Stanford University, Stanford, CA, USA
- Johannes C Eichstaedt
  - Institute for Human-Centered Artificial Intelligence & Department of Psychology, Stanford University, Stanford, CA, USA
3
Nilsson AH, Eichstaedt JC, Lomas T, Schwartz A, Kjell O. The Cantril Ladder elicits thoughts about power and wealth. Sci Rep 2024; 14:2642. PMID: 38302578; PMCID: PMC10834405; DOI: 10.1038/s41598-024-52939-y.
Abstract
The Cantril Ladder is among the most widely administered subjective well-being measures; every year, it is collected in 140+ countries in the Gallup World Poll and reported in the World Happiness Report. The measure asks respondents to evaluate their lives on a ladder from worst (bottom) to best (top). Prior work found Cantril Ladder scores sensitive to social comparison and to reflect one's relative position in the income distribution. To understand this, we explored how respondents interpret the Cantril Ladder. We analyzed word responses from 1581 UK adults and tested the impact of (a) the ladder imagery, (b) the scale anchors of worst to best possible life, and (c) the anchors of bottom to top. Using three language analysis techniques (dictionary, topic, and word embeddings), we found that the Cantril Ladder framing emphasizes power and wealth over broader well-being and relationship concepts in comparison to the other study conditions. Further, altering the framings increased preferred scale levels from 8.4 to 8.9 (Cohen's d = 0.36). Introducing harmony as an anchor yielded the strongest divergence from the Cantril Ladder, reducing mentions of power and wealth topics the most (Cohen's d = -0.76). Our findings refine the understanding of historical Cantril Ladder data and may help guide the future evolution of well-being metrics and guidelines.
Affiliation(s)
- August Håkan Nilsson
  - Department of Psychology, Lund University, Lund, Sweden
  - Oslo Business School, Oslo Metropolitan University, Oslo, Norway
- Johannes C Eichstaedt
  - Department of Psychology, Institute for Human-Centered A.I., Stanford University, Stanford, CA, USA
- Tim Lomas
  - Department of Epidemiology, Harvard University, Cambridge, MA, USA
- Andrew Schwartz
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Oscar Kjell
  - Department of Psychology, Lund University, Lund, Sweden
4
Yaden DB, Giorgi S, Jordan M, Buffone A, Eichstaedt JC, Schwartz HA, Ungar L, Bloom P. Characterizing empathy and compassion using computational linguistic analysis. Emotion 2024; 24:106-115. PMID: 37199938; DOI: 10.1037/emo0001205.
Abstract
Many scholars have proposed that feeling what we believe others are feeling (often known as "empathy") is essential for other-regarding sentiments and plays an important role in our moral lives. Caring for and about others, without necessarily sharing their feelings (often known as "compassion"), is also frequently discussed as a relevant force for prosocial motivation and action. Here, we explore the relationship between empathy and compassion using the methods of computational linguistics. Analyses of 2,356,916 Facebook posts suggest that individuals (N = 2,781) high in empathy use different language than those high in compassion, after accounting for shared variance between these constructs. Empathic people, controlling for compassion, often use self-focused language and write about negative feelings, social isolation, and feeling overwhelmed. Compassionate people, controlling for empathy, often use other-focused language and write about positive feelings and social connections. In addition, high empathy without compassion is related to negative health outcomes, while high compassion without empathy is related to positive health outcomes, positive lifestyle choices, and charitable giving. Such findings favor an approach to moral motivation that is grounded in compassion rather than empathy.
Affiliation(s)
- David B Yaden
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine
- Salvatore Giorgi
  - Department of Computer and Information Science, University of Pennsylvania
- Lyle Ungar
  - Department of Computer and Information Science, University of Pennsylvania
- Paul Bloom
  - Department of Psychology, Yale University
5
Sametoğlu S, Pelt DHM, Eichstaedt JC, Ungar LH, Bartels M. Comparison of wellbeing structures based on survey responses and social media language: A network analysis. Appl Psychol Health Well Being 2023; 15:1555-1582. PMID: 37161901; DOI: 10.1111/aphw.12451.
Abstract
Wellbeing is predominantly measured through surveys but is increasingly measured by analysing individuals' language on social media platforms using social media text mining (SMTM). To investigate whether the structure of wellbeing is similar across both data collection methods, we compared networks derived from survey items and social media language features collected from the same participants. The dataset was split into an independent exploration (n = 1169) and a final subset (n = 1000). After estimating exploration networks, redundant survey items and language topics were eliminated. Final networks were then estimated using exploratory graph analysis (EGA). The networks of survey items and those from language topics were similar, both consisting of five wellbeing dimensions. The dimensions in the survey- and SMTM-based assessment of wellbeing showed convergent structures congruent with theories of wellbeing. Specific dimensions found in each network reflected the unique aspects of each type of data (survey and social media language). Networks derived from both language features and survey items show similar structures. Survey and SMTM methods may provide complementary methods to understand differences in human wellbeing.
Affiliation(s)
- Selim Sametoğlu
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Dirk H M Pelt
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Johannes C Eichstaedt
  - Department of Psychology, Stanford University, Stanford, California, USA
  - Institute for Human-Centered AI, Stanford University, Stanford, California, USA
- Lyle H Ungar
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Meike Bartels
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
6
Stade EC, Ungar L, Eichstaedt JC, Sherman G, Ruscio AM. Depression and anxiety have distinct and overlapping language patterns: Results from a clinical interview. J Psychopathol Clin Sci 2023; 132:972-983. PMID: 37471025; PMCID: PMC10799169; DOI: 10.1037/abn0000850.
Abstract
Depression has been associated with heightened first-person singular pronoun use (I-usage; e.g., "I," "my") and negative emotion words. However, past research has relied on nonclinical samples and nonspecific depression measures, raising the question of whether these features are unique to depression vis-à-vis frequently co-occurring conditions, especially anxiety. Using structured questions about recent life changes or difficulties, we interviewed a sample of individuals with varying levels of depression and anxiety (N = 486), including individuals in a major depressive episode (n = 228) and/or diagnosed with generalized anxiety disorder (n = 273). Interviews were transcribed to provide a natural language sample. Analyses isolated language features associated with gold-standard, clinician-rated measures of depression and anxiety. Many language features associated with depression were in fact shared between depression and anxiety. Language markers with relative specificity to depression included I-usage, sadness, and decreased positive emotion, while negations (e.g., "not," "no"), negative emotion, and several emotional language markers (e.g., anxiety, stress, depression) were relatively specific to anxiety. Several of these results were replicated using a self-report measure designed to disentangle components of depression and anxiety. We next built machine learning models to detect severity of common and specific depression and anxiety using only interview language. Individuals' speech characteristics during this brief interview predicted their depression and anxiety severity, beyond other clinical and demographic variables. Depression and anxiety have partially distinct patterns of expression in spoken language. Monitoring of depression and anxiety severity via language can augment traditional assessment modalities and aid in early detection.
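As a rough illustration of the surface-level features such analyses start from, the sketch below computes relative frequencies of I-usage and negation words in a text sample. The word lists and tokenizer here are illustrative placeholders, not the study's actual lexica or models:

```python
import re
from collections import Counter

# Hypothetical mini-lexica for illustration only; the paper's
# lexica and machine learning models are far richer than this.
I_WORDS = {"i", "me", "my", "mine", "myself"}
NEGATIONS = {"not", "no", "never", "nothing"}

def marker_rates(text):
    """Relative frequency of I-usage and negation tokens in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        "i_usage": sum(counts[w] for w in I_WORDS) / total,
        "negations": sum(counts[w] for w in NEGATIONS) / total,
    }

sample = "I feel like my life is not what I wanted. No one understands me."
rates = marker_rates(sample)  # e.g. {'i_usage': 0.2857..., 'negations': 0.1428...}
```

In a real pipeline, per-speaker rates like these would be entered as predictors of clinician-rated symptom severity, alongside topic and embedding features.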
Affiliation(s)
- Lyle Ungar
  - Department of Computer and Information Science, University of Pennsylvania
- Johannes C. Eichstaedt
  - Department of Psychology and Institute for Human-Centered Artificial Intelligence, Stanford University
- Garrick Sherman
  - National Institute on Drug Abuse, Intramural Research Program
7
Giorgi S, Eichstaedt JC, Preoţiuc-Pietro D, Gardner JR, Schwartz HA, Ungar LH. Filling in the white space: Spatial interpolation with Gaussian processes and social media data. Curr Res Ecol Soc Psychol 2023; 5:100159. PMID: 38125747; PMCID: PMC10732585; DOI: 10.1016/j.cresp.2023.100159.
Abstract
Full national coverage below the state level is difficult to attain through survey-based data collection. Even the largest survey-based data collections, such as the CDC's Behavioral Risk Factor Surveillance System or the Gallup-Healthways Well-being Index (both with more than 300,000 responses p.a.) only allow for the estimation of annual averages for about 260 of the roughly 3,000 U.S. counties when a threshold of 300 responses per county is used. Using a relatively high threshold of 300 responses gives substantially higher convergent validity (higher correlations with health variables) than lower thresholds, but covers a reduced and biased sample of the population. We present principled methods to interpolate spatial estimates and show that including large-scale geotagged social media data can increase interpolation accuracy. In this work, we focus on Gallup-reported life satisfaction, a widely-used measure of subjective well-being. We use Gaussian Processes (GP), a formal Bayesian model, to interpolate life satisfaction, which we optimally combine with estimates from low-count data. We interpolate over several spaces (geographic and socioeconomic) and extend these evaluations to the space created by variables encoding language frequencies of approximately 6 million geotagged Twitter users. We find that Twitter language use can serve as a rough aggregate measure of socioeconomic and cultural similarity, and improves upon estimates derived from a wide variety of socioeconomic, demographic, and geographic similarity measures. We show that applying Gaussian Processes to the limited Gallup data allows us to generate estimates for a much larger number of counties while maintaining the same level of convergent validity with external criteria (N = 2,954 vs. 1,133 counties).
This work suggests that spatial coverage of psychological variables can be reliably extended through Bayesian techniques while maintaining out-of-sample prediction accuracy and that Twitter language adds important information about cultural similarity over and above traditional socio-demographic and geographic similarity measures. Finally, to facilitate the adoption of these methods, we have also open-sourced an online tool that researchers can freely use to interpolate their data across geographies.
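A minimal sketch of the core idea, Gaussian-process posterior-mean interpolation with an RBF kernel, in plain NumPy. The coordinates, scores, and hyperparameters below are toy values for illustration, not the paper's data, kernel, or fitted model:

```python
import numpy as np

def gp_interpolate(X_train, y_train, X_new, length_scale=1.0, noise=1e-2):
    """GP posterior-mean interpolation (kriging-style) with an RBF kernel.

    X_train: (n, d) coordinates of units with observed scores
    y_train: (n,) observed scores
    X_new:   (m, d) coordinates where estimates are wanted
    """
    def rbf(A, B):
        # squared Euclidean distance between every row pair
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale**2)

    mu = y_train.mean()  # constant prior mean
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf(X_new, X_train)
    # posterior mean: mu + K* K^{-1} (y - mu)
    alpha = np.linalg.solve(K, y_train - mu)
    return K_star @ alpha + mu

# toy example: three "counties" on a line with known life-satisfaction scores
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([3.0, 4.0, 3.5])
est = gp_interpolate(X, y, np.array([[0.0], [1.5]]))
```

With a small noise term, the posterior mean nearly reproduces observed scores at surveyed locations while smoothing estimates at unsurveyed ones; the paper extends this idea to kernels over socioeconomic and Twitter-language similarity spaces rather than raw geography.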
Affiliation(s)
- Salvatore Giorgi
  - Department of Computer and Information Science, University of Pennsylvania, United States of America
- Johannes C. Eichstaedt
  - Department of Psychology & Institute for Human-Centered AI, Stanford University, United States of America
- Jacob R. Gardner
  - Department of Computer and Information Science, University of Pennsylvania, United States of America
- H. Andrew Schwartz
  - Department of Computer Science, Stony Brook University, United States of America
- Lyle H. Ungar
  - Department of Computer and Information Science, University of Pennsylvania, United States of America
8
Giorgi S, Yaden DB, Eichstaedt JC, Ungar LH, Schwartz HA, Kwarteng A, Curtis B. Predicting U.S. county opioid poisoning mortality from multi-modal social media and psychological self-report data. Sci Rep 2023; 13:9027. PMID: 37270657; DOI: 10.1038/s41598-023-34468-2.
Abstract
Opioid poisoning mortality is a substantial public health crisis in the United States, with opioids involved in approximately 75% of the nearly 1 million drug-related deaths since 1999. Research suggests that the epidemic is driven by both over-prescribing and social and psychological determinants such as economic stability, hopelessness, and isolation. Hindering this research is a lack of measurements of these social and psychological constructs at fine-grained spatial and temporal resolutions. To address this issue, we use a multi-modal data set consisting of natural language from Twitter, psychometric self-reports of depression and well-being, and traditional area-based measures of socio-demographics and health-related risk factors. Unlike previous work using social media data, we do not rely on opioid or substance related keywords to track community poisonings. Instead, we leverage a large, open vocabulary of thousands of words in order to fully characterize communities suffering from opioid poisoning, using a sample of 1.5 billion tweets from 6 million U.S. county-mapped Twitter users. Results show that Twitter language predicted opioid poisoning mortality better than factors relating to socio-demographics, access to healthcare, physical pain, and psychological well-being. Additionally, risk factors revealed by the Twitter language analysis included negative emotions, discussions of long work hours, and boredom, whereas protective factors included resilience, travel/leisure, and positive emotions, dovetailing with results from the psychometric self-report data. The results show that natural language from public social media can be used as a surveillance tool for both predicting community opioid poisonings and understanding the dynamic social and psychological nature of the epidemic.
Affiliation(s)
- Salvatore Giorgi
  - National Institute on Drug Abuse, Intramural Research Program, Baltimore, MD, USA
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- David B Yaden
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Johannes C Eichstaedt
  - Department of Psychology, Stanford University, Stanford, CA, USA
  - Institute for Human-Centered AI, Stanford University, Stanford, CA, USA
- Lyle H Ungar
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- H Andrew Schwartz
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Amy Kwarteng
  - National Institute on Drug Abuse, Intramural Research Program, Baltimore, MD, USA
- Brenda Curtis
  - National Institute on Drug Abuse, Intramural Research Program, Baltimore, MD, USA
9
Levanti D, Monastero RN, Zamani M, Eichstaedt JC, Giorgi S, Schwartz HA, Meliker JR. Depression and Anxiety on Twitter During the COVID-19 Stay-At-Home Period in 7 Major U.S. Cities. AJPM Focus 2023; 2:100062. PMID: 36573174; PMCID: PMC9773738; DOI: 10.1016/j.focus.2022.100062.
Abstract
Introduction: Although surveys are a well-established instrument to capture the population prevalence of mental health at a moment in time, public Twitter is a continuously available data source that can provide a broader window into population mental health. We characterized the relationship between COVID-19 case counts, stay-at-home orders because of COVID-19, and anxiety and depression in 7 major U.S. cities utilizing Twitter data.
Methods: We collected 18 million Tweets from January to September 2019 (baseline) and 2020 from 7 U.S. cities with large populations and varied COVID-19 response protocols: Atlanta, Chicago, Houston, Los Angeles, Miami, New York, and Phoenix. We applied machine learning-based language prediction models for depression and anxiety validated in previous work with Twitter data. As an alternative public big data source, we explored Google Trends data using search query frequencies. A qualitative evaluation of trends is presented.
Results: Twitter depression and anxiety scores were consistently elevated above their 2019 baselines across all 7 locations. Twitter depression scores increased during the early phase of the pandemic, with a peak in early summer and a subsequent decline in late summer. The pattern of depression trends was aligned with national COVID-19 case trends rather than with trends in individual states. Anxiety was consistently and steadily elevated throughout the pandemic. Google search trends data showed noisy and inconsistent results.
Conclusions: Our study shows the feasibility of using Twitter to capture trends of depression and anxiety during the COVID-19 public health crisis and suggests that social media data can supplement survey data to monitor long-term mental health trends.
Affiliation(s)
- Mohammadzaman Zamani
  - Department of Computer Science, College of Engineering and Applied Sciences, Stony Brook University, Stony Brook, New York
- Johannes C. Eichstaedt
  - Department of Psychology, School of Humanities and Sciences, Stanford University, Palo Alto, California
- Salvatore Giorgi
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania
- H. Andrew Schwartz
  - Department of Computer Science, College of Engineering and Applied Sciences, Stony Brook University, Stony Brook, New York
- Jaymie R. Meliker
  - Program in Public Health, Department of Family, Population, & Preventive Medicine, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York
10
Lou S, Giorgi S, Liu T, Eichstaedt JC, Curtis B. Measuring disadvantage: A systematic comparison of United States small-area disadvantage indices. Health Place 2023; 80:102997. PMID: 36867991; PMCID: PMC10038931; DOI: 10.1016/j.healthplace.2023.102997.
Abstract
Extensive evidence demonstrates the effects of area-based disadvantage on a variety of life outcomes, such as increased mortality and low economic mobility. Despite these well-established patterns, disadvantage, often measured using composite indices, is inconsistently operationalized across studies. To address this issue, we systematically compared five U.S. disadvantage indices at the county level on their relationships to 24 diverse life outcomes related to mortality, physical health, mental health, subjective well-being, and social capital from heterogeneous data sources. We further examined which domains of disadvantage are most important when creating these indices. Of the five indices examined, the Area Deprivation Index (ADI) and Child Opportunity Index 2.0 (COI) were most related to a diverse set of life outcomes, particularly physical health. Within each index, variables from the domains of education and employment were most important in relationships with life outcomes. Disadvantage indices are being used in real-world policy and resource allocation decisions; an index's generalizability across diverse life outcomes, and the domains of disadvantage which constitute the index, should be considered when guiding such decisions.
Affiliation(s)
- Sophia Lou
  - Technology and Translational Research Unit, National Institute on Drug Abuse, 251 Bayview Blvd., Baltimore, MD, 21224, USA
- Salvatore Giorgi
  - Technology and Translational Research Unit, National Institute on Drug Abuse, 251 Bayview Blvd., Baltimore, MD, 21224, USA
  - Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut St, Philadelphia, PA, 19104, USA
- Tingting Liu
  - Technology and Translational Research Unit, National Institute on Drug Abuse, 251 Bayview Blvd., Baltimore, MD, 21224, USA
  - Positive Psychology Center, Department of Psychology, University of Pennsylvania, 425 S. University Ave, Philadelphia, PA, 19104, USA
- Johannes C Eichstaedt
  - Department of Psychology and Institute for Human-Centered AI, Stanford University, 210 Panama St., Stanford, CA, 94305, USA
- Brenda Curtis
  - Technology and Translational Research Unit, National Institute on Drug Abuse, 251 Bayview Blvd., Baltimore, MD, 21224, USA
11
Wilhelm E, Ballalai I, Belanger ME, Benjamin P, Bertrand-Ferrandis C, Bezbaruah S, Briand S, Brooks I, Bruns R, Bucci LM, Calleja N, Chiou H, Devaria A, Dini L, D'Souza H, Dunn AG, Eichstaedt JC, Evers SMAA, Gobat N, Gissler M, Gonzales IC, Gruzd A, Hess S, Ishizumi A, John O, Joshi A, Kaluza B, Khamis N, Kosinska M, Kulkarni S, Lingri D, Ludolph R, Mackey T, Mandić-Rajčević S, Menczer F, Mudaliar V, Murthy S, Nazakat S, Nguyen T, Nilsen J, Pallari E, Pasternak Taschner N, Petelos E, Prinstein MJ, Roozenbeek J, Schneider A, Srinivasan V, Stevanović A, Strahwald B, Syed Abdul S, Varaidzo Machiri S, van der Linden S, Voegeli C, Wardle C, Wegwarth O, White BK, Willie E, Yau B, Purnat TD. Measuring the Burden of Infodemics: Summary of the Methods and Results of the Fifth WHO Infodemic Management Conference. JMIR Infodemiology 2023; 3:e44207. PMID: 37012998; PMCID: PMC9989916; DOI: 10.2196/44207.
Abstract
BACKGROUND An infodemic is an excess of information, including false or misleading information, that spreads in digital and physical environments during a public health emergency. The COVID-19 pandemic has been accompanied by an unprecedented global infodemic that has led to confusion about the benefits of medical and public health interventions, with substantial impact on risk-taking and health-seeking behaviors, eroding trust in health authorities and compromising the effectiveness of public health responses and policies. Standardized measures are needed to quantify the harmful impacts of the infodemic in a systematic and methodologically robust manner, and to harmonize the highly divergent approaches currently being explored for this purpose. This can serve as a foundation for a systematic, evidence-based approach to monitoring, identifying, and mitigating future infodemic harms in emergency preparedness and prevention. OBJECTIVE In this paper, we summarize the structure, proceedings, outcomes, and proposed actions of the Fifth World Health Organization (WHO) Infodemic Management Conference, which sought to identify the interdisciplinary approaches and frameworks needed to measure the burden of infodemics. METHODS An iterative human-centered design (HCD) approach and concept mapping were used to facilitate focused discussions and allow for the generation of actionable outcomes and recommendations. The discussions included 86 participants representing diverse scientific disciplines and health authorities from 28 countries across all WHO regions, along with observers from civil society and global public health-implementing partners. A thematic map capturing the concepts matching the key contributing factors to the public health burden of infodemics was used throughout the conference to frame and contextualize discussions. Five key areas for immediate action were identified.
RESULTS The 5 key areas for the development of metrics to assess the burden of infodemics and associated interventions were (1) developing standardized definitions and ensuring their adoption; (2) improving the map of concepts influencing the burden of infodemics; (3) conducting a review of evidence, tools, and data sources; (4) setting up a technical working group; and (5) addressing immediate priorities for postpandemic recovery and resilience building. The summary report consolidated group input toward a common vocabulary with standardized terms, concepts, study designs, measures, and tools to estimate the burden of infodemics and the effectiveness of infodemic management interventions. CONCLUSIONS Standardized measurement is the basis for documenting the burden of infodemics on health systems and population health during emergencies. Investment is needed in the development of practical, affordable, evidence-based, and systematic methods that are legally and ethically balanced for monitoring infodemics; generating diagnostics, infodemic insights, and recommendations; and developing interventions, action-oriented guidance, policies, support options, mechanisms, and tools for infodemic managers and emergency program managers.
Affiliation(s)
- Elisabeth Wilhelm
- US Centers for Disease Control and Prevention Atlanta, GA United States
- Marie-Eve Belanger
- Department of Political Science and International Relations Université de Genève Geneva Switzerland
- Supriya Bezbaruah
- Department of Epidemic and Pandemic Preparedness and Prevention World Health Organization Geneva Switzerland
- Sylvie Briand
- Department of Epidemic and Pandemic Preparedness and Prevention World Health Organization Geneva Switzerland
- Ian Brooks
- Center for Health Informatics School of Information Sciences University of Illinois Champaign, IL United States
- Richard Bruns
- Johns Hopkins Center for Health Security Baltimore, MD United States
- Lucie M Bucci
- Immunize Canada Canadian Public Health Association Ottawa, ON Canada
- Neville Calleja
- Directorate for Health Information and Research Ministry for Health Valletta Malta
- Howard Chiou
- US Centers for Disease Control and Prevention Atlanta, GA United States
- US Public Health Service Commissioned Corps Rockville, MD United States
- Lorena Dini
- Working Group Health Policy and Systems Research and Innovation Institute for General Practice Charité Universitätsmedizin Berlin Berlin Germany
- Hyjel D'Souza
- The George Institute for Global Health New Delhi India
- Adam G Dunn
- Biomedical Informatics and Digital Health Faculty of Medicine and Health University of Sydney Sydney Australia
- Johannes C Eichstaedt
- Department of Psychology Stanford University Stanford, CA United States
- Institute for Human-Centered AI Stanford University Stanford, CA United States
- Silvia M A A Evers
- Department of Health Services Research Maastricht University Maastricht Netherlands
- Nina Gobat
- Department of Country Readiness Strengthening World Health Organization Geneva Switzerland
- Mika Gissler
- Department of Knowledge Brokers THL Finnish Institute for Health and Welfare Helsinki Finland
- Ian Christian Gonzales
- Field Epidemiology Training Program Epidemiology Bureau Department of Health Manila Philippines
- Anatoliy Gruzd
- Ted Rogers School of Management Toronto Metropolitan University Toronto, ON Canada
- Sarah Hess
- Department of Epidemic and Pandemic Preparedness and Prevention World Health Organization Geneva Switzerland
- Atsuyoshi Ishizumi
- Department of Epidemic and Pandemic Preparedness and Prevention World Health Organization Geneva Switzerland
- Oommen John
- The George Institute for Global Health New Delhi India
- Ashish Joshi
- Department of Epidemiology and Biostatistics Graduate School of Public Health and Health Policy City University of New York New York, NY United States
- Benjamin Kaluza
- Department Technological Analysis and Strategic Planning Fraunhofer Institute for Technological Trend Analysis INT Euskirchen Germany
- Nagwa Khamis
- Infection Prevention and Control Department Children's Cancer Hospital Egypt-57357 Ain Shams University Specialized Hospital Cairo Egypt
- Monika Kosinska
- Department of Social Determinants World Health Organization Geneva Switzerland
- Shibani Kulkarni
- US Centers for Disease Control and Prevention Atlanta, GA United States
- Dimitra Lingri
- European Healthcare Fraud and Corruption Network Aristotle University of Thessaloniki Brussels Belgium
- Ramona Ludolph
- Department of Epidemic and Pandemic Preparedness and Prevention World Health Organization Geneva Switzerland
- Tim Mackey
- Global Health Program Department of Anthropology University of California San Diego, CA United States
- Filippo Menczer
- Observatory on Social Media Luddy School of Informatics, Computing, and Engineering Indiana University Bloomington, IN United States
- Shruti Murthy
- The George Institute for Global Health New Delhi India
- Syed Nazakat
- DataLEADS (Health Analytics Asia) New Delhi India
- Tim Nguyen
- Department of Epidemic and Pandemic Preparedness and Prevention World Health Organization Geneva Switzerland
- Jennifer Nilsen
- Technology and Social Change Project Harvard University Cambridge, MA United States
- Elena Pallari
- Health Innovation Network Guy's and St Thomas' Hospital London United Kingdom
- Natalia Pasternak Taschner
- Center of Science and Society Columbia University New York, NY United States
- Instituto Questão de Ciência São Paulo Brazil
- Elena Petelos
- Department of Health Services Research Care and Public Health Research Institute Maastricht University Maastricht Netherlands
- Clinic of Social and Family Medicine Faculty of Medicine University of Crete Heraklion Greece
- Mitchell J Prinstein
- American Psychological Association Washington, DC United States
- Department of Psychology and Neuroscience University of North Carolina at Chapel Hill Chapel Hill, NC United States
- Jon Roozenbeek
- Department of Psychology University of Cambridge Cambridge United Kingdom
- Anton Schneider
- Bureau for Global Health Office of Infectious Disease United States Agency for International Development Washington, DC United States
- Aleksandar Stevanović
- Institute of Social Medicine Faculty of Medicine University of Belgrade Belgrade Serbia
- Brigitte Strahwald
- Pettenkofer School of Public Health Ludwig-Maximilians-Universität München Munich Germany
- Shabbir Syed Abdul
- The George Institute for Global Health New Delhi India
- Graduate Institute of Biomedical Informatics Taipei Medical University Taipei Taiwan
- Christopher Voegeli
- Office of the Director National Center for Immunization and Respiratory Diseases US Centers for Disease Control and Prevention Atlanta, GA United States
- Claire Wardle
- Information Futures Lab School of Public Health Brown University Providence, RI United States
- Odette Wegwarth
- Heisenberg Chair for Medical Risk Literacy & Evidence-Based Decisions Charité - Universitätsmedizin Berlin Berlin Germany
- Becky K White
- Department of Epidemic and Pandemic Preparedness and Prevention World Health Organization Geneva Switzerland
- Estelle Willie
- Communications, Policy, Advocacy The Rockefeller Foundation New York, NY United States
- Brian Yau
- Department of Epidemic and Pandemic Preparedness and Prevention World Health Organization Geneva Switzerland
- Tina D Purnat
- Department of Epidemic and Pandemic Preparedness and Prevention World Health Organization Geneva Switzerland
12
Son Y, Clouston SAP, Kotov R, Eichstaedt JC, Bromet EJ, Luft BJ, Schwartz HA. World Trade Center responders in their own words: predicting PTSD symptom trajectories with AI-based language analyses of interviews. Psychol Med 2023; 53:918-926. [PMID: 34154682 PMCID: PMC8692489 DOI: 10.1017/s0033291721002294]
Abstract
BACKGROUND Oral histories from 9/11 responders to the World Trade Center (WTC) attacks provide rich narratives about distress and resilience. Artificial intelligence (AI) models promise to detect psychopathology in natural language, but they have been evaluated primarily in non-clinical settings using social media. This study sought to test the ability of AI-based language assessments to predict PTSD symptom trajectories among responders. METHODS Participants were 124 responders whose health was monitored at the Stony Brook WTC Health and Wellness Program and who completed oral history interviews about their initial WTC experiences. PTSD symptom severity was measured longitudinally using the PTSD Checklist (PCL) for up to 7 years post-interview. AI-based indicators were computed for depression, anxiety, neuroticism, and extraversion, along with dictionary-based measures of linguistic and interpersonal style. Linear regression and multilevel models estimated associations of AI indicators with concurrent and subsequent PTSD symptom severity (significance adjusted by false discovery rate). RESULTS Cross-sectionally, greater depressive language (β = 0.32; p = 0.049) and first-person singular usage (β = 0.31; p = 0.049) were associated with increased symptom severity. Longitudinally, anxious language predicted future worsening in PCL scores (β = 0.30; p = 0.049), whereas first-person plural usage (β = -0.36; p = 0.014) and longer word usage (β = -0.35; p = 0.014) predicted improvement. CONCLUSIONS This is the first study to demonstrate the value of AI in understanding PTSD in a vulnerable population. Future studies should extend this application to other trauma exposures and other demographic groups, especially under-represented minorities.
Affiliation(s)
- Youngseo Son
- Department of Computer Science, Stony Brook University, New York, USA
- Sean A. P. Clouston
- Program in Public Health, Stony Brook University, New York, USA
- Department of Family, Population and Preventive Medicine, Stony Brook University, New York, USA
- Roman Kotov
- Department of Psychiatry, Stony Brook University, New York, USA
- Johannes C. Eichstaedt
- Department of Psychology & Institute for Human-Centered A.I., Stanford University, Stanford, California, USA
13
Liu T, Ungar LH, Curtis B, Sherman G, Yadeta K, Tay L, Eichstaedt JC, Guntuku SC. Head versus heart: social media reveals differential language of loneliness from depression. Npj Ment Health Res 2022; 1:16. [PMID: 38609477 PMCID: PMC10955894 DOI: 10.1038/s44184-022-00014-7]
Abstract
We study the language differentially associated with loneliness and depression using 3.4 million Facebook posts from 2,986 individuals, and uncover the statistical associations of survey-based depression and loneliness with both dictionary-based (Linguistic Inquiry and Word Count 2015) and open-vocabulary linguistic features (words, phrases, and topics). Loneliness and depression were found to have highly overlapping language profiles, including sickness, pain, and negative emotions as (cross-sectional) risk factors, and social relationships and activities as protective factors. Compared to depression, the language associated with loneliness reflects a stronger cognitive focus, including more references to cognitive processes (i.e., differentiation and tentative language, thoughts, and the observation of irregularities), and cognitive activities like reading and writing. As might be expected, less lonely users were more likely to reference social relationships (e.g., friends and family, romantic relationships) and use first-person plural pronouns. Our findings suggest that the mechanisms of loneliness include self-oriented cognitive activities (i.e., reading) and an overattention to the interpretation of information in the environment. These data-driven ecological findings suggest interventions for loneliness that target maladaptive social cognitions (e.g., through reframing the perception of social environments), strengthen social relationships, and treat other affective distress (i.e., depression).
Affiliation(s)
- Tingting Liu
- National Institute on Drug Abuse (NIDA IRP), National Institutes of Health (NIH), Baltimore, MD, USA.
- Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA.
- Lyle H Ungar
- Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Brenda Curtis
- National Institute on Drug Abuse (NIDA IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Garrick Sherman
- Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Kenna Yadeta
- National Institute on Drug Abuse (NIDA IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Louis Tay
- Department of Psychological Sciences, Purdue University, West Lafayette, IN, USA
- Johannes C Eichstaedt
- Department of Psychology, Institute for Human-Centered A.I., Stanford University, Stanford, CA, USA
- Sharath Chandra Guntuku
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
14
Frimer JA, Aujla H, Feinberg M, Skitka LJ, Aquino K, Eichstaedt JC, Willer R. Incivility Is Rising Among American Politicians on Twitter. Social Psychological and Personality Science 2022. [DOI: 10.1177/19485506221083811]
Abstract
We provide the first systematic investigation of trends in the incivility of American politicians on Twitter, a dominant platform for political communication in the United States. Applying a validated artificial intelligence classifier to all 1.3 million tweets made by members of Congress since 2009, we observe a 23% increase in incivility over a decade on Twitter. Further analyses suggest that the rise was partly driven by reinforcement learning in which politicians engaged in greater incivility following positive feedback. Uncivil tweets tended to receive more approval and attention, publicly indexed by large quantities of “likes” and “retweets” on the platform. Mediational and longitudinal analyses show that the greater this feedback for uncivil tweets, the more uncivil tweets were thereafter. We conclude by discussing how the structure of social media platforms might facilitate this incivility-reinforcing dynamic between politicians and their followers.
Affiliation(s)
- Karl Aquino
- University of British Columbia, Vancouver, Canada
15
Liu T, Meyerhoff J, Eichstaedt JC, Karr CJ, Kaiser SM, Kording KP, Mohr DC, Ungar LH. The relationship between text message sentiment and self-reported depression. J Affect Disord 2022; 302:7-14. [PMID: 34963643 PMCID: PMC8912980 DOI: 10.1016/j.jad.2021.12.048]
Abstract
BACKGROUND Personal sensing has shown promise for detecting behavioral correlates of depression, but there is little work examining personal sensing of cognitive and affective states. Digital language, particularly through personal text messages, is one source that can measure these markers. METHODS We correlated privacy-preserving sentiment analysis of text messages with self-reported depression symptom severity. We enrolled 219 U.S. adults in a 16-week longitudinal observational study. Participants installed a personal sensing app on their phones, which administered self-report PHQ-8 assessments of their depression severity, collected phone sensor data, and computed anonymized language sentiment scores from their text messages. We also trained machine learning models for predicting end-of-study self-reported depression status using blocks of phone sensor and text features. RESULTS In correlation analyses, we find that the depression, emotional, and personal pronoun language categories correlate most strongly with self-reported depression, validating prior literature. Our classification models, which predict binary depression status, achieve a leave-one-out AUC of 0.72 when only considering text features and 0.76 when combining text with other networked smartphone sensors. LIMITATIONS Participants were recruited from a panel that over-represented women, Caucasians, and individuals with self-reported depression at baseline. As language use differs across demographic factors, generalizability beyond this population may be limited. The study period also coincided with the initial COVID-19 outbreak in the United States, which may have affected smartphone sensor data quality. CONCLUSIONS Effective depression prediction through text message sentiment, especially when combined with other personal sensors, could enable comprehensive mental health monitoring and intervention.
Affiliation(s)
- Tony Liu
- Department of Computer and Information Science, University of Pennsylvania, USA.
- Jonah Meyerhoff
- Center for Behavioral Intervention Technologies (CBITs), Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, USA
- Susan M Kaiser
- Center for Behavioral Intervention Technologies (CBITs), Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, USA
- Konrad P Kording
- Department of Bioengineering, Department of Neuroscience, University of Pennsylvania, USA
- David C Mohr
- Center for Behavioral Intervention Technologies (CBITs), Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, USA
- Lyle H Ungar
- Department of Computer and Information Science, University of Pennsylvania, USA
16
Eichstaedt JC, Kern ML, Yaden DB, Schwartz HA, Giorgi S, Park G, Hagan CA, Tobolsky VA, Smith LK, Buffone A, Iwry J, Seligman MEP, Ungar LH. Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations. Psychol Methods 2021; 26:398-427. [PMID: 34726465 DOI: 10.1037/met0000349]
Abstract
Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but these approaches have not been comprehensively compared. To provide guidance on best practices for automatically analyzing written text, this narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods: Linguistic Inquiry and Word Count (LIWC), the General Inquirer, DICTION, Latent Dirichlet Allocation, and Differential Language Analysis. We compare the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users. Results are fairly consistent across methods. The closed-vocabulary approaches efficiently summarize concepts and are helpful for understanding how people think, with LIWC2015 yielding the strongest, most parsimonious results. Open-vocabulary approaches reveal more specific and concrete patterns across a broad range of content domains, better address ambiguous word senses, and are less prone to misinterpretation, suggesting that they are well-suited for capturing the nuances of everyday psychological processes. We detail several errors that can occur in closed-vocabulary analyses, the impact of sample size, number of words per user and number of topics included in open-vocabulary analyses, and implications of different analytical decisions. We conclude with recommendations for researchers, advocating for a complementary approach that combines closed- and open-vocabulary methods. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Affiliation(s)
- Margaret L Kern
- Melbourne Graduate School of Education, The University of Melbourne
- David B Yaden
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins Medicine
- H A Schwartz
- Department of Computer Science, Stony Brook University
- Gregory Park
- Department of Psychology, University of Pennsylvania
- Laura K Smith
- Department of Psychology, University of Pennsylvania
- Jonathan Iwry
- Department of Psychology, University of Pennsylvania
- Lyle H Ungar
- Department of Psychology, University of Pennsylvania
17
Ressler RW, Paxton P, Velasco K, Pivnick L, Weiss I, Eichstaedt JC. Nonprofits: A Public Policy Tool for the Promotion of Community Subjective Well-being. J Public Adm Res Theory 2021; 31:822-838. [PMID: 34608375 PMCID: PMC8482971 DOI: 10.1093/jopart/muab010]
Abstract
Looking to supplement common economic indicators, politicians and policymakers are increasingly interested in how to measure and improve the subjective well-being of communities. Theories about nonprofit organizations suggest that they represent a potential policy-amenable lever to increase community subjective well-being. Using longitudinal cross-lagged panel models with IRS and Twitter data, this study explores whether communities with higher numbers of nonprofits per capita exhibit greater subjective well-being in the form of more expressions of positive emotion, engagement, and relationships. We find associations, robust to sample bias concerns, between most types of nonprofit organizations and decreases in negative emotions, negative sentiments about relationships, and disengagement. We also find an association between nonprofit presence and the proportion of words tweeted in a county that indicate engagement. These findings contribute to our theoretical understanding of why nonprofit organizations matter for community-level outcomes and how they should be considered an important public policy lever.
18
Giorgi S, Nguyen KL, Eichstaedt JC, Kern ML, Yaden DB, Kosinski M, Seligman MEP, Ungar LH, Schwartz HA, Park G. Regional personality assessment through social media language. J Pers 2021; 90:405-425. [PMID: 34536229 DOI: 10.1111/jopy.12674]
Abstract
OBJECTIVE We explore the personality of counties as assessed through linguistic patterns on social media. Such studies were previously limited by the cost and feasibility of large-scale surveys; however, language-based computational models applied to large social media datasets now allow for large-scale personality assessment. METHOD We applied a language-based assessment of the five factor model of personality to 6,064,267 U.S. Twitter users. We aggregated the Twitter-based personality scores to 2,041 counties and compared them to political, economic, social, and health outcomes measured through surveys and by government agencies. RESULTS There was significant personality variation across counties. Openness to experience was higher on the coasts, conscientiousness was uniformly spread, extraversion was higher in southern states, agreeableness was higher in western states, and emotional stability was highest in the south. Across 13 outcomes, language-based personality estimates replicated patterns that have been observed in individual-level and geographic studies, including higher Republican vote share in less agreeable counties and increased life satisfaction in more conscientious counties. CONCLUSIONS Results suggest that regions vary in their personality and that these differences can be studied through computational linguistic analysis of social media. Furthermore, these methods may be used to explore other psychological constructs across geographies.
Affiliation(s)
- Salvatore Giorgi
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Khoa Le Nguyen
- Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Johannes C Eichstaedt
- Department of Psychology, Institute for Human-Centered A.I., Stanford University, Stanford, California, USA
- Margaret L Kern
- Melbourne Graduate School of Education, University of Melbourne, Melbourne, Victoria, Australia
- David B Yaden
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Michal Kosinski
- Graduate School of Business, Stanford University, Stanford, California, USA
- Martin E P Seligman
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Lyle H Ungar
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- H Andrew Schwartz
- Department of Computer Science, Stony Brook University, Stony Brook, New York, USA
- Gregory Park
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
19
Giorgi S, Guntuku SC, Eichstaedt JC, Pajot C, Schwartz HA, Ungar LH. Well-Being Depends on Social Comparison: Hierarchical Models of Twitter Language Suggest That Richer Neighbors Make You Less Happy. Proc Int AAAI Conf Weblogs Soc Media 2021; 15:1069-1074. [PMID: 37064998 PMCID: PMC10099468 DOI: 10.1609/icwsm.v15i1.18132]
Abstract
Psychological research has shown that subjective well-being is sensitive to social comparison effects; individuals report decreased happiness when their neighbors earn more than they do. In this work, we use Twitter language to estimate the well-being of users, and model both individual and neighborhood income using hierarchical modeling across counties in the United States (US). We show that language-based estimates from a sample of 5.8 million Twitter users replicate results obtained from large-scale well-being surveys: relatively richer neighbors lead to lower well-being, even when controlling for absolute income. Furthermore, predicting individual-level happiness using hierarchical models (i.e., individuals within their communities) out-predicts standard baselines. We also explore language associated with relative income differences and find that individuals with lower income than their community tend to swear (f*ck, sh*t, b*tch), express anger (pissed, bullsh*t, wtf), hesitation (don't, anymore, idk, confused) and acts of social deviance (weed, blunt, drunk). These results suggest that social comparison robustly affects reported well-being, and that Twitter language analyses can be used to both measure these effects and shed light on their underlying psychological dynamics.
20
Eichstaedt JC, Yaden DB, Ribeiro F, Adler A, Kern ML. Supplementary analysis for lifestyle and wellbeing: Exploring behavioral and demographic covariates in a large US sample. Intnl J Wellbeing 2020. [DOI: 10.5502/ijw.v10i4.831s]
21
Eichstaedt JC, Yaden DB, Ribeiro F, Adler A, Kern ML. Lifestyle and wellbeing: Exploring behavioral and demographic covariates in a large US sample. Intnl J Wellbeing 2020. [DOI: 10.5502/ijw.v10i4.831]
22
Abstract
Personality psychologists are increasingly documenting dynamic, within-person processes. Big data methodologies can augment this endeavour by allowing for the collection of naturalistic and personality-relevant digital traces from online environments. Whereas big data methods have primarily been used to catalogue static personality dimensions, here we present a case study in how they can be used to track dynamic fluctuations in psychological states. We apply a text-based, machine learning prediction model to Facebook status updates to compute weekly trajectories of emotional valence and arousal. We train this model on 2,895 human-annotated Facebook statuses and apply the resulting model to 303,575 Facebook statuses posted by 640 US Facebook users who had previously self-reported their Big Five traits, yielding an average of 28 weekly estimates per user. We examine the correlations between model-predicted emotion and self-reported personality, providing a test of the robustness of these links when using weekly aggregated data, rather than momentary data as in prior work. We further present dynamic visualizations of weekly valence and arousal for every user, while making the final data set of 17,937 weeks openly available. We discuss the strengths and drawbacks of this method in the context of personality psychology's evolution into a dynamic science. © 2020 European Association of Personality Psychology
23.
Abstract
A rapidly growing literature has attempted to explain Donald Trump's success in the 2016 U.S. presidential election as a result of a wide variety of differences in individual characteristics, attitudes, and social processes. We propose that the economic and psychological processes previously established have in common that they generated or electorally capitalized on unhappiness in the electorate, which emerges as a powerful high-level predictor of the 2016 electoral outcome. Drawing on a large dataset covering over 2 million individual surveys, which we aggregated to the county level, we find that low levels of evaluative, experienced, and eudaemonic subjective well-being (SWB) are strongly predictive of Trump's victory, accounting for an extensive list of demographic, ideological, and socioeconomic covariates and robustness checks. County-level future life evaluation alone correlates with the Trump vote share over Republican baselines at r = -.78 in the raw data, a magnitude rarely seen in the social sciences. We show similar findings when examining the association between individual-level life satisfaction and Trump voting. Low levels of SWB also predict anti-incumbent voting at the 2012 election, both at the county and individual level. The findings suggest that SWB is a powerful high-level marker of (dis)content and that SWB should be routinely considered alongside economic explanations of electoral choice.
Affiliation(s)
- George Ward: Sloan School of Management, Massachusetts Institute of Technology
- Lyle H Ungar: Department of Computer and Information Science, University of Pennsylvania
24.
Jaidka K, Giorgi S, Schwartz HA, Kern ML, Ungar LH, Eichstaedt JC. Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods. Proc Natl Acad Sci U S A 2020; 117:10165-10171. [PMID: 32341156; PMCID: PMC7229753; DOI: 10.1073/pnas.1906364117]
Abstract
Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.
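The core validation step this abstract reports, correlating county-level language estimates against county-level survey scores, reduces to a Pearson correlation over paired county values. The sketch below is a minimal stdlib implementation; the five county values are illustrative, not the paper's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between paired county-level values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative paired values: Twitter-based well-being estimates per county
# vs. Gallup-style survey scores for the same counties.
twitter_est = [0.2, 0.5, 0.9, 1.4, 1.8]
survey = [6.1, 6.4, 6.9, 7.2, 7.8]
print(round(pearson_r(twitter_est, survey), 2))
```

The r = 0.64 figure above is this statistic computed over 1,208 real counties rather than a toy list.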
Affiliation(s)
- Kokil Jaidka: Department of Communications and New Media, National University of Singapore, Singapore 117416; Centre for Trusted Internet and Community, National University of Singapore, Singapore 117416
- Salvatore Giorgi: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104
- H Andrew Schwartz: Department of Computer Science, Stony Brook University, Stony Brook, NY 11794
- Margaret L Kern: Melbourne Graduate School of Education, The University of Melbourne, Parkville, VIC 3010, Australia
- Lyle H Ungar: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104
- Johannes C Eichstaedt: Department of Psychology, Stanford University, Stanford, CA 94305; Institute for Human-Centered Artificial Intelligence, Stanford University, Stanford, CA 94305
25.
Giorgi S, Yaden DB, Eichstaedt JC, Ashford RD, Buffone AE, Schwartz HA, Ungar LH, Curtis B. Cultural Differences in Tweeting about Drinking Across the US. Int J Environ Res Public Health 2020; 17:1125. [PMID: 32053866; PMCID: PMC7068559; DOI: 10.3390/ijerph17041125]
Abstract
Excessive alcohol use in the US contributes to over 88,000 deaths per year and costs over $250 billion annually. While previous studies have shown that excessive alcohol use can be detected from general patterns of social media engagement, we characterized how drinking-specific language varies across regions and cultures in the US. From a database of 38 billion public tweets, we selected those mentioning “drunk”, found the words and phrases distinctive of drinking posts, and then clustered these into topics and sets of semantically related words. We identified geolocated “drunk” tweets and correlated their language with the prevalence of self-reported excessive alcohol consumption (Behavioral Risk Factor Surveillance System; BRFSS). We then identified linguistic markers associated with excessive drinking in different regions and cultural communities as identified by the American Community Project. “Drunk” tweet frequency (of the 3.3 million geolocated “drunk” tweets) correlated with excessive alcohol consumption at both the county and state levels (r = 0.26 and 0.45, respectively, p < 0.01). Topic analyses revealed that excessive alcohol consumption was most correlated with references to drinking with friends (r = 0.20), family (r = 0.15), and driving under the influence (r = 0.14). Using the American Community Project classification, we found a number of cultural markers of drinking: religious communities had a high frequency of anti-drunk driving tweets, Hispanic centers discussed family members drinking, and college towns discussed sexual behavior. This study shows that Twitter can be used to explore the specific sociocultural contexts in which excessive alcohol use occurs within particular regions and communities. These findings can inform more targeted public health messaging and help to better understand cultural determinants of substance abuse.
Affiliation(s)
- Salvatore Giorgi: Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104, USA; National Institutes of Health, National Institute on Drug Abuse, Bethesda, MD 20892, USA
- David B. Yaden: Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Johannes C. Eichstaedt: Department of Psychology & Institute for Human-Centered Artificial Intelligence, Stanford University, Stanford, CA 94305, USA
- Robert D. Ashford: Substance Use Disorders Institute, University of the Sciences, Philadelphia, PA 19104, USA
- Anneke E.K. Buffone: Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- H. Andrew Schwartz: Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
- Lyle H. Ungar: Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brenda Curtis: National Institutes of Health, National Institute on Drug Abuse, Bethesda, MD 20892, USA (corresponding author)
26.
Merchant RM, Asch DA, Crutchley P, Ungar LH, Guntuku SC, Eichstaedt JC, Hill S, Padrez K, Smith RJ, Schwartz HA. Evaluating the predictability of medical conditions from social media posts. PLoS One 2019; 14:e0215476. [PMID: 31206534; PMCID: PMC6576767; DOI: 10.1371/journal.pone.0215476]
Abstract
We studied whether medical conditions across 21 broad categories were predictable from social media content across approximately 20 million words written by 999 consenting patients. Facebook language significantly improved upon the prediction accuracy of demographic variables for 18 of the 21 disease categories; it was particularly effective at predicting diabetes and mental health conditions including anxiety, depression and psychoses. Social media data are a quantifiable link into the otherwise elusive daily lives of patients, providing an avenue for study and assessment of behavioral and environmental disease risk factors. Analogous to the genome, social media data linked to medical diagnoses can be banked with patients’ consent, and an encoding of social media language can be used as markers of disease risk, serve as a screening tool, and elucidate disease epidemiology. In what we believe to be the first report linking electronic medical record data with social media data from consenting patients, we identified that patients’ Facebook status updates can predict many health conditions, suggesting opportunities to use social media data to determine disease onset or exacerbation and to conduct social media-based health interventions.
Affiliation(s)
- Raina M Merchant: Penn Medicine Center for Digital Health; Department of Emergency Medicine, Perelman School of Medicine; Penn Medicine Center for Health Care Innovation, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- David A Asch: Penn Medicine Center for Digital Health; Penn Medicine Center for Health Care Innovation; The Wharton School, University of Pennsylvania; The Center for Health Equity Research and Promotion, Philadelphia Veterans Affairs Medical Center, Philadelphia, Pennsylvania, USA
- Patrick Crutchley: Penn Medicine Center for Digital Health; Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Lyle H Ungar: Penn Medicine Center for Digital Health; The Wharton School; Positive Psychology Center; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Sharath C Guntuku: Penn Medicine Center for Digital Health; Department of Emergency Medicine, Perelman School of Medicine; Penn Medicine Center for Health Care Innovation; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Johannes C Eichstaedt: Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Shawndra Hill: Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Microsoft Research, New York, New York, USA
- Kevin Padrez: Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Robert J Smith: Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- H Andrew Schwartz: Penn Medicine Center for Digital Health; Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Department of Computer Science, Stony Brook University, Stony Brook, New York, USA
27.
Pang D, Eichstaedt JC, Buffone A, Slaff B, Ruch W, Ungar LH. The language of character strengths: Predicting morally valued traits on social media. J Pers 2019; 88:287-306. [PMID: 31107975; PMCID: PMC7065131; DOI: 10.1111/jopy.12491]
Abstract
Objective: Social media is increasingly being used to study psychological constructs. This study is the first to use Twitter language to investigate the 24 Values in Action Inventory of Character Strengths, which have been shown to predict important life domains such as well-being. Method: We use both a top-down closed-vocabulary (Linguistic Inquiry and Word Count) and a data-driven open-vocabulary (Differential Language Analysis) approach to analyze 3,937,768 tweets from 4,423 participants (64.3% female), who answered a 240-item survey on character strengths. Results: We present the language profiles of (a) a global positivity factor accounting for 36% of the variance in the strengths, and (b) each of the 24 individual strengths, for which we find largely face-valid language associations. Machine learning models trained on language data to predict character strengths reach out-of-sample prediction accuracies comparable to previous work on personality (median r = 0.28, ranging from 0.13 to 0.51). Conclusions: The findings suggest that Twitter can be used to characterize and predict character strengths. This technique could be used to measure the character strengths of large populations unobtrusively and cost-effectively.
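The open-vocabulary approach named above correlates each word's relative usage with a trait score across users. A minimal sketch of that idea follows; the tiny sample, the word lists, and the "gratitude" trait scores are all hypothetical, and real differential language analysis adds significance correction and far larger vocabularies.

```python
from collections import Counter
from math import sqrt

def relative_freqs(tokens):
    """Per-user relative frequency of each word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def corr(xs, ys):
    """Pearson correlation (0.0 when a word has no variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sqrt(sum((a - mx) ** 2 for a in xs)) * sqrt(sum((b - my) ** 2 for b in ys))
    return num / den if den else 0.0

def word_trait_correlations(users):
    """users: list of (tokens, trait_score). Returns {word: r across users}."""
    freqs = [relative_freqs(toks) for toks, _ in users]
    scores = [s for _, s in users]
    vocab = set().union(*freqs)
    return {w: corr([f.get(w, 0.0) for f in freqs], scores) for w in vocab}

# Toy sample: users scoring high on a hypothetical "gratitude" strength
# mention 'thankful' more; low scorers mention 'tired' more.
sample = [
    ("thankful blessed thankful".split(), 4.8),
    ("thankful great day".split(), 4.1),
    ("bored tired bored".split(), 2.0),
    ("tired work tired".split(), 2.3),
]
r_map = word_trait_correlations(sample)
print(r_map["thankful"] > 0 > r_map["tired"])
```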
Affiliation(s)
- Dandan Pang: Department of Work and Organizational Psychology, University of Bern, Bern, Switzerland; Personality and Assessment, Department of Psychology, University of Zurich, Zurich, Switzerland
- Anneke Buffone: Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania
- Barry Slaff: Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania
- Willibald Ruch: Personality and Assessment, Department of Psychology, University of Zurich, Zurich, Switzerland
- Lyle H Ungar: Positive Psychology Center; Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania
28.
Abstract
Fluctuations in mood states are driven by unpredictable outcomes in daily life but also appear to drive consequential behaviors such as risk-taking. However, our understanding of the relationships between unexpected outcomes, mood, and risk-taking behavior has relied primarily upon constrained and artificial laboratory settings. Here we examine, using naturalistic datasets, how real-world unexpected outcomes predict mood state changes observable at the level of a city, in turn predicting changes in gambling behavior. By analyzing day-to-day mood language extracted from 5.2 million location-specific and public Twitter posts or 'tweets', we examine how real-world 'prediction errors'-local outcomes that deviate positively from expectations-predict day-to-day mood states observable at the level of a city. These mood states in turn predicted increased per-person lottery gambling rates, revealing how interplay between prediction errors, moods, and risky decision-making unfolds in the real world. Our results underscore how social media and naturalistic datasets can uniquely allow us to understand consequential psychological phenomena.
Affiliation(s)
- A. Ross Otto: Department of Psychology, McGill University, Montréal, Québec, Canada
- Johannes C. Eichstaedt: Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
29.
Eichstaedt JC, Smith RJ, Merchant RM, Ungar LH, Crutchley P, Preoţiuc-Pietro D, Asch DA, Schwartz HA. Facebook language predicts depression in medical records. Proc Natl Acad Sci U S A 2018; 115:11203-11208. [PMID: 30322910; PMCID: PMC6217418; DOI: 10.1073/pnas.1802331115]
Abstract
Depression, the most prevalent mental illness, is underdiagnosed and undertreated, highlighting the need to extend the scope of current screening methods. Here, we use language from Facebook posts of consenting individuals to predict depression recorded in electronic medical records. We accessed the history of Facebook statuses posted by 683 patients visiting a large urban academic emergency department, 114 of whom had a diagnosis of depression in their medical records. Using only the language preceding their first documentation of a diagnosis of depression, we could identify depressed patients with fair accuracy [area under the curve (AUC) = 0.69], approximately matching the accuracy of screening surveys benchmarked against medical records. Restricting Facebook data to only the 6 months immediately preceding the first documented diagnosis of depression yielded a higher prediction accuracy (AUC = 0.72) for those users who had sufficient Facebook data. Significant prediction of future depression status was possible as far as 3 months before its first documentation. We found that language predictors of depression include emotional (sadness), interpersonal (loneliness, hostility), and cognitive (preoccupation with the self, rumination) processes. Unobtrusive depression assessment through social media of consenting individuals may become feasible as a scalable complement to existing screening and monitoring procedures.
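The AUC figures above summarize how well predicted risk scores rank patients with a recorded diagnosis above those without. As a rough illustration, AUC can be computed directly from score/label pairs as the probability that a random positive outranks a random negative; the scores and labels below are hypothetical, not the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve: fraction of positive/negative pairs where
    the positive case receives the higher score (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores and labels (1 = depression diagnosis on record).
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.4]
labels = [1, 0, 0, 1, 0, 0]
print(auc(scores, labels))  # 0.875: 7 of 8 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to chance ranking; the study's 0.69-0.72 sits in the range typical of brief screening surveys.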
Affiliation(s)
- Robert J Smith: Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA 19104
- Raina M Merchant: Penn Medicine Center for Digital Health; Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104
- Lyle H Ungar: Positive Psychology Center; Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA 19104
- Patrick Crutchley: Positive Psychology Center; Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA 19104
- David A Asch: Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA 19104; The Center for Health Equity Research and Promotion, Philadelphia Veterans Affairs Medical Center, Philadelphia, PA 19104
- H Andrew Schwartz: Computer Science Department, Stony Brook University, Stony Brook, NY 11794
30.
Yaden DB, Eichstaedt JC, Medaglia JD. The Future of Technology in Positive Psychology: Methodological Advances in the Science of Well-Being. Front Psychol 2018; 9:962. [PMID: 29967586; PMCID: PMC6016018; DOI: 10.3389/fpsyg.2018.00962]
Abstract
Advances in biotechnology and information technology are poised to transform well-being research. This article reviews the technologies that we predict will have the most impact on both measurement and intervention in the field of positive psychology over the next decade. These technologies include: psychopharmacology, non-invasive brain stimulation, virtual reality environments, and big-data methods for large-scale multivariate analysis. Some particularly relevant potential costs and benefits to individual and collective well-being are considered for each technology as well as ethical considerations. As these technologies may substantially enhance the capacity of psychologists to intervene on and measure well-being, now is the time to discuss the potential promise and pitfalls of these technologies.
Affiliation(s)
- David B. Yaden: Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States
- John D. Medaglia: Department of Neurology, University of Pennsylvania, Philadelphia, PA, United States; Department of Psychology, Drexel University, Philadelphia, PA, United States
31.
Guntuku SC, Yaden DB, Kern ML, Ungar LH, Eichstaedt JC. Detecting depression and mental illness on social media: an integrative review. Curr Opin Behav Sci 2017. [DOI: 10.1016/j.cobeha.2017.07.005]
32.
Yaden DB, Eichstaedt JC, Kern ML, Smith LK, Buffone A, Stillwell DJ, Kosinski M, Ungar LH, Seligman MEP, Schwartz HA. The Language of Religious Affiliation. Social Psychological and Personality Science 2017. [DOI: 10.1177/1948550617711228]
Abstract
Religious affiliation is an important identifying characteristic for many individuals and relates to numerous life outcomes including health, well-being, policy positions, and cognitive style. Using methods from computational linguistics, we examined language from 12,815 Facebook users in the United States and United Kingdom who indicated their religious affiliation. Religious individuals used more positive emotion words (β = .278, p < .0001) and social themes such as family (β = .242, p < .0001), while nonreligious people expressed more negative emotions like anger (β = −.427, p < .0001) and categories related to cognitive processes, like tentativeness (β = −.153, p < .0001). Nonreligious individuals also used more themes related to the body (β = −.265, p < .0001) and death (β = −.247, p < .0001). The findings offer directions for future research on religious affiliation, specifically in terms of social, emotional, and cognitive differences.
Affiliation(s)
- David B. Yaden: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Margaret L. Kern: Melbourne Graduate School of Education, University of Melbourne, Melbourne, Victoria, Australia
- Laura K. Smith: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Anneke Buffone: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- David J. Stillwell: Psychometrics Centre, University of Cambridge, Cambridge, United Kingdom
- Michal Kosinski: Stanford Graduate School of Business, Stanford University, Stanford, CA, USA
- Lyle H. Ungar: Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- H. Andrew Schwartz: Computer Science, Stony Brook University, The State University of New York, NY, USA
33.
Yaden DB, Le Nguyen KD, Kern ML, Wintering NA, Eichstaedt JC, Schwartz HA, Buffone AEK, Smith LK, Waldman MR, Hood RW, Newberg AB. The noetic quality: A multimethod exploratory study. Psychology of Consciousness: Theory, Research, and Practice 2017. [DOI: 10.1037/cns0000098]
34.
Yaden DB, Le Nguyen KD, Kern ML, Belser AB, Eichstaedt JC, Iwry J, Smith ME, Wintering NA, Hood RW, Newberg AB. Of Roots and Fruits: A Comparison of Psychedelic and Nonpsychedelic Mystical Experiences. Journal of Humanistic Psychology 2016. [DOI: 10.1177/0022167816674625]
Abstract
Experiences of profound existential or spiritual significance can be triggered reliably through psychopharmacological means using psychedelic substances. However, little is known about the benefits of religious, spiritual, or mystical experiences (RSMEs) prompted by psychedelic substances, as compared with those that occur through other means. In this study, 739 self-selected participants reported the psychological impact of their RSMEs and indicated whether they were induced by a psychedelic substance. Experiences induced by psychedelic substances were rated as more intensely mystical (d = .75, p < .001), resulted in a reduced fear of death (d = .21, p < .01), increased sense of purpose (d = .18, p < .05), and increased spirituality (d = .28, p < .001) as compared with nonpsychedelically triggered RSMEs. These results remained significant in an expanded model controlling for gender, education, socioeconomic status, and religious affiliation. These findings lend support to the growing consensus that RSMEs induced with psychedelic substances are genuinely mystical and generally positive in outcome.
Affiliation(s)
- Ralph W. Hood: University of Tennessee at Chattanooga, Chattanooga, TN, USA
35.
Kern ML, Park G, Eichstaedt JC, Schwartz HA, Sap M, Smith LK, Ungar LH. Gaining insights from social media language: Methodologies and challenges. Psychol Methods 2016; 21:507-525. [PMID: 27505683; DOI: 10.1037/met0000091]
Abstract
Language data available through social media provide opportunities to study people at an unprecedented scale. However, little guidance is available to psychologists who want to enter this area of research. Drawing on tools and techniques developed in natural language processing, we first introduce psychologists to social media language research, identifying descriptive and predictive analyses that language data allow. Second, we describe how raw language data can be accessed and quantified for inclusion in subsequent analyses, exploring personality as expressed on Facebook to illustrate. Third, we highlight challenges and issues to be considered, including accessing and processing the data, interpreting effects, and ethical issues. Social media has become a valuable part of social life, and there is much we can learn by bringing together the tools of computer science with the theories and insights of psychology.
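The "quantify raw language data" step this methods paper describes typically starts by tokenizing each user's posts and collapsing them into a relative-frequency feature vector. The sketch below uses a deliberately simple regex tokenizer as a stand-in for real NLP tooling; the example posts are invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Minimal stand-in tokenizer: lowercase runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def user_features(statuses):
    """Collapse one user's posts into a relative-frequency feature vector."""
    counts = Counter(tok for status in statuses for tok in tokenize(status))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

posts = ["Can't wait for the weekend!", "Weekend plans with friends :)"]
feats = user_features(posts)
print(feats["weekend"])  # 2 of 9 tokens
```

Vectors like `feats` are what downstream descriptive analyses (word-trait correlations) and predictive models consume.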
Affiliation(s)
- Gregory Park: Department of Psychology, University of Pennsylvania
- H Andrew Schwartz: Department of Computer & Information Science, University of Pennsylvania
- Maarten Sap: Department of Psychology, University of Pennsylvania
- Laura K Smith: Department of Psychology, University of Pennsylvania
- Lyle H Ungar: Department of Computer & Information Science, University of Pennsylvania
36.
Park G, Yaden DB, Schwartz HA, Kern ML, Eichstaedt JC, Kosinski M, Stillwell D, Ungar LH, Seligman MEP. Women are Warmer but No Less Assertive than Men: Gender and Language on Facebook. PLoS One 2016; 11:e0155885. [PMID: 27223607; PMCID: PMC4881750; DOI: 10.1371/journal.pone.0155885]
Abstract
Using a large social media dataset and open-vocabulary methods from computational linguistics, we explored differences in language use across gender, affiliation, and assertiveness. In Study 1, we analyzed topics (groups of semantically similar words) across 10 million messages from over 52,000 Facebook users. Most language differed little across gender. However, topics most associated with self-identified female participants included friends, family, and social life, whereas topics most associated with self-identified male participants included swearing, anger, discussion of objects instead of people, and the use of argumentative language. In Study 2, we plotted male- and female-linked language topics along two interpersonal dimensions prevalent in gender research: affiliation and assertiveness. In a sample of over 15,000 Facebook users, we found substantial gender differences in the use of affiliative language and slight differences in assertive language. Language used more by self-identified females was interpersonally warmer, more compassionate and polite, and, contrary to previous findings, slightly more assertive, whereas language used more by self-identified males was colder, more hostile, and more impersonal. Computational linguistic analysis combined with methods to automatically label topics offers a means for testing psychological theories unobtrusively and at large scale.
Affiliation(s)
- Gregory Park: Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- David Bryce Yaden: Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America (corresponding author)
- H. Andrew Schwartz: Computer Science Department, Stony Brook University, Stony Brook, New York, United States of America
- Margaret L. Kern: Graduate School of Education, University of Melbourne, Victoria, Australia
- Johannes C. Eichstaedt: Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Michael Kosinski: Psychometrics Centre, University of Cambridge, Cambridge, United Kingdom
- David Stillwell: Psychometrics Centre, University of Cambridge, Cambridge, United Kingdom
- Lyle H. Ungar: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Martin E. P. Seligman: Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
37
Park G, Schwartz HA, Sap M, Kern ML, Weingarten E, Eichstaedt JC, Berger J, Stillwell DJ, Kosinski M, Ungar LH, Seligman MEP. Living in the Past, Present, and Future: Measuring Temporal Orientation With Language. J Pers 2016; 85:270-280. [DOI: 10.1111/jopy.12239]
38
Schwartz HA, Sap M, Kern ML, Eichstaedt JC, Kapelner A, Agrawal M, Blanco E, Dziurzynski L, Park G, Stillwell D, Kosinski M, Seligman MEP, Ungar LH. Predicting Individual Well-Being Through the Language of Social Media. Pac Symp Biocomput 2016; 21:516-527. [PMID: 26776214]
Abstract
We present the task of predicting individual well-being, as measured by a life satisfaction scale, through the language people use on social media. Well-being, which encompasses much more than emotion and mood, is linked with good mental and physical health. The ability to quickly and accurately assess it can supplement multi-million-dollar national surveys as well as promote whole-body health. Through crowd-sourced ratings of tweets and Facebook status updates, we create message-level predictive models for multiple components of well-being. However, well-being is ultimately attributed to people, so we perform an additional evaluation at the user level, finding that a multi-level cascaded model, using both message-level predictions and user-level features, performs best and outperforms popular lexicon-based happiness models. Finally, we suggest that analyses of language go beyond prediction by identifying the language that characterizes well-being.
Affiliation(s)
- H Andrew Schwartz, Stony Brook University, Computer Science, Stony Brook, NY 11794, USA
39
Abstract
Countless studies have addressed why some individuals achieve more than others. Nevertheless, the psychology of achievement lacks a unifying conceptual framework for synthesizing these empirical insights. We propose organizing achievement-related traits by two possible mechanisms of action: Traits that determine the rate at which an individual learns a skill are talent variables and can be distinguished conceptually from traits that determine the effort an individual puts forth. This approach takes inspiration from Newtonian mechanics: achievement is akin to distance traveled, effort to time, skill to speed, and talent to acceleration. A novel prediction from this model is that individual differences in effort (but not talent) influence achievement (but not skill) more substantially over longer (rather than shorter) time intervals. Conceptualizing skill as the multiplicative product of talent and effort, and achievement as the multiplicative product of skill and effort, advances similar, but less formal, propositions by several important earlier thinkers.
40
Eichstaedt JC, Schwartz HA, Kern ML, Park G, Labarthe DR, Merchant RM, Jha S, Agrawal M, Dziurzynski LA, Sap M, Weeg C, Larson EE, Ungar LH, Seligman MEP. Psychological language on Twitter predicts county-level heart disease mortality. Psychol Sci 2015; 26:159-69. [PMID: 25605707 DOI: 10.1177/0956797614557867]
Abstract
Hostility and chronic stress are known risk factors for heart disease, but they are costly to assess on a large scale. We used language expressed on Twitter to characterize community-level psychological correlates of age-adjusted mortality from atherosclerotic heart disease (AHD). Language patterns reflecting negative social relationships, disengagement, and negative emotions, especially anger, emerged as risk factors; positive emotions and psychological engagement emerged as protective factors. Most correlations remained significant after controlling for income and education. A cross-sectional regression model based only on Twitter language predicted AHD mortality significantly better than did a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking, diabetes, hypertension, and obesity. Capturing community psychological characteristics through social media is feasible, and these characteristics are strong markers of cardiovascular mortality at the community level.
Affiliation(s)
- Hansen Andrew Schwartz, Department of Psychology and Department of Computer and Information Science, University of Pennsylvania
- Margaret L Kern, Department of Psychology, University of Pennsylvania; Graduate School of Education, University of Melbourne
- Gregory Park, Department of Psychology, University of Pennsylvania
- Sneha Jha, Department of Computer and Information Science, University of Pennsylvania
- Megha Agrawal, Department of Computer and Information Science, University of Pennsylvania
- Maarten Sap, Department of Psychology, University of Pennsylvania
- Lyle H Ungar, Department of Psychology and Department of Computer and Information Science, University of Pennsylvania
41
Park G, Schwartz HA, Eichstaedt JC, Kern ML, Kosinski M, Stillwell DJ, Ungar LH, Seligman MEP. Automatic personality assessment through social media language. J Pers Soc Psychol 2014; 108:934-52. [PMID: 25365036 DOI: 10.1037/pspp0000020]
Abstract
Language use is a psychologically rich, stable individual difference with well-established correlations to personality. We describe a method for assessing personality using an open-vocabulary analysis of language from social media. We compiled the written language from 66,732 Facebook users and their questionnaire-based self-reported Big Five personality traits, and then we built a predictive model of personality based on their language. We used this model to predict the 5 personality factors in a separate sample of 4,824 Facebook users, examining (a) convergence with self-reports of personality at the domain- and facet-level; (b) discriminant validity between predictions of distinct traits; (c) agreement with informant reports of personality; (d) patterns of correlations with external criteria (e.g., number of friends, political attitudes, impulsiveness); and (e) test-retest reliability over 6-month intervals. Results indicated that language-based assessments can constitute valid personality measures: they agreed with self-reports and informant reports of personality, added incremental validity over informant reports, adequately discriminated between traits, exhibited patterns of correlations with external criteria similar to those found with self-reported personality, and were stable over 6-month intervals. Analysis of predictive language can provide rich portraits of the mental life associated with traits. This approach can complement and extend traditional methods, providing researchers with an additional measure that can quickly and cheaply assess large groups of participants with minimal burden.
Affiliation(s)
- Lyle H Ungar, Computer & Information Science, University of Pennsylvania
42
Kern ML, Eichstaedt JC, Schwartz HA, Dziurzynski L, Ungar LH, Stillwell DJ, Kosinski M, Ramones SM, Seligman MEP. The online social self: an open vocabulary approach to personality. Assessment 2013; 21:158-69. [PMID: 24322010 DOI: 10.1177/1073191113514104]
Abstract
OBJECTIVE: We present a new open language analysis approach that identifies and visually summarizes the dominant naturally occurring words and phrases that most distinguished each Big Five personality trait. METHOD: Using millions of posts from 69,792 Facebook users, we examined the correlation of personality traits with online word usage. Our analysis method consists of feature extraction, correlational analysis, and visualization. RESULTS: The distinguishing words and phrases were face valid and provide insight into processes that underlie the Big Five traits. CONCLUSION: Open-ended, data-driven exploration of large datasets, combined with established psychological theory and measures, offers new tools to further understand the human psyche.
Affiliation(s)
- Lyle H Ungar, University of Pennsylvania, Philadelphia, PA, USA
43
Kern ML, Eichstaedt JC, Schwartz HA, Park G, Ungar LH, Stillwell DJ, Kosinski M, Dziurzynski L, Seligman MEP. From "Sooo excited!!!" to "So proud": using language to study development. Dev Psychol 2013; 50:178-88. [PMID: 24274726 DOI: 10.1037/a0035048]
Abstract
We introduce a new method, differential language analysis (DLA), for studying human development, in which computational linguistic techniques are used to analyze the big data available through online social media in light of psychological theory. Our open vocabulary DLA approach finds words, phrases, and topics that distinguish groups of people based on one or more characteristics. Using a data set of over 70,000 Facebook users, we identify how word and topic use vary as a function of age and compile cohort-specific words and phrases into visual summaries that are face valid and intuitively meaningful. We demonstrate how this methodology can be used to test developmental hypotheses, using the aging positivity effect (Carstensen & Mikels, 2005) as an example. While in this study we focused primarily on common trends across age-related cohorts, the same methodology can be used to explore heterogeneity within developmental stages or to explore other characteristics that differentiate groups of people. Our comprehensive list of words and topics is available on our website for deeper exploration by the research community.
Affiliation(s)
- H Andrew Schwartz, Department of Computer and Information Science, University of Pennsylvania
- Gregory Park, Department of Psychology, University of Pennsylvania
- Lyle H Ungar, Department of Computer and Information Science, University of Pennsylvania
44
Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, Agrawal M, Shah A, Kosinski M, Stillwell D, Seligman MEP, Ungar LH. Personality, gender, and age in the language of social media: the open-vocabulary approach. PLoS One 2013; 8:e73791. [PMID: 24086296 PMCID: PMC3783449 DOI: 10.1371/journal.pone.0073791]
Abstract
We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase 'sick of' and the word 'depressed'), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive 'my' when mentioning their 'wife' or 'girlfriend' more often than females use 'my' with 'husband' or 'boyfriend'). To date, this represents the largest study, by an order of magnitude, of language and personality.
Affiliation(s)
- H. Andrew Schwartz, Positive Psychology Center and Computer & Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Johannes C. Eichstaedt, Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Margaret L. Kern, Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Lukasz Dziurzynski, Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Stephanie M. Ramones, Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Megha Agrawal, Positive Psychology Center and Computer & Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Achal Shah, Computer & Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Michal Kosinski, The Psychometrics Centre, University of Cambridge, Cambridge, United Kingdom
- David Stillwell, The Psychometrics Centre, University of Cambridge, Cambridge, United Kingdom
- Martin E. P. Seligman, Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Lyle H. Ungar, Computer & Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America