1
Quillivic R, Gayraud F, Auxéméry Y, Vanni L, Peschanski D, Eustache F, Dayan J, Mesmoudi S. Interdisciplinary approach to identify language markers for post-traumatic stress disorder using machine learning and deep learning. Sci Rep 2024; 14:12468. PMID: 38816468; PMCID: PMC11139884; DOI: 10.1038/s41598-024-61557-7. Open access.
Abstract
Post-traumatic stress disorder (PTSD) lacks clear biomarkers in clinical practice. This study investigates language as a potential diagnostic biomarker for PTSD. We analyze an original cohort of 148 individuals exposed to the November 13, 2015, terrorist attacks in Paris. The interviews, conducted 5-11 months after the event, include individuals from similar socioeconomic backgrounds exposed to the same incident, responding to identical questions and assessed with uniform PTSD measures. Using this dataset to collect nuanced insights that might be clinically relevant, we propose a three-step interdisciplinary methodology that integrates expertise from psychiatry, linguistics, and the natural language processing (NLP) community to examine the relationship between language and PTSD. The first step assesses a clinical psychiatrist's ability to diagnose PTSD from interview transcripts alone. The second step uses statistical analysis and machine learning models to create language features based on psycholinguistic hypotheses and to evaluate their predictive strength. The third step applies a hypothesis-free deep learning approach to the classification of PTSD in our cohort. The clinical psychiatrist diagnosed PTSD with an area under the curve (AUC) of 0.72, comparable to a gold-standard questionnaire (AUC ≈ 0.80). The machine learning model achieved a diagnostic AUC of 0.69, and the deep learning approach an AUC of 0.64. An examination of model errors informs our discussion. Importantly, the study controls for confounding factors, establishes associations between language and DSM-5 subsymptoms, and integrates automated methods with qualitative analysis. This study provides a direct and methodologically robust description of the relationship between PTSD and language. Our work lays the groundwork for advancing early and accurate diagnosis and for using linguistic markers to assess the effectiveness of pharmacological treatments and psychotherapies.
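The AUC figures reported in this abstract have a simple rank-based interpretation: the probability that a randomly chosen PTSD case is scored higher than a randomly chosen control. A minimal sketch of that computation follows; the scores are illustrative, not from the study.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive case scores above a random
    negative case (ties count half) -- the ROC area under the curve."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative classifier scores for PTSD-positive and PTSD-negative cases.
ptsd = [0.9, 0.8, 0.6, 0.55]
no_ptsd = [0.7, 0.5, 0.4, 0.3]
print(auc(ptsd, no_ptsd))  # -> 0.875
```

An AUC of 0.5 corresponds to chance-level ranking, so the psychiatrist's 0.72 and the model's 0.69 both sit well above chance but below the questionnaire's ≈ 0.80.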
Affiliation(s)
- Robin Quillivic
- PSL-EPHE, Paris, France.
- ISCPIF, Institut des Systèmes Complexes, Paris île-de-France, France.
- Frédérique Gayraud
- Laboratoire Dynamique du Langage, UMR 5596, CNRS, Université Lyon-II, Lyon, France
- Yann Auxéméry
- Centre Hospitalier de Jury-les-Metz, centre de réhabilitation pour adultes, Metz, France
- UMR 1319 Inspiire, INSERM, Université de Lorraine, 9 avenue de la forêt de Haye, Nancy, France
- Laurent Vanni
- CNRS, UMR 7320 : Bases, Corpus, Langage, Nice, France
- Denis Peschanski
- Université PARIS 1 Panthéon-Sorbonne, Paris, France
- CNRS, CESSP, UMR 8209, Paris, France
- Francis Eustache
- PSL-EPHE, Paris, France
- INSERM, NIMH U1077, Caen, France
- UNICAEN, Caen, France
- Jacques Dayan
- PSL-EPHE, Paris, France
- INSERM, NIMH U1077, Caen, France
- UNICAEN, Caen, France
- CHU de Rennes, Rennes, France
- Salma Mesmoudi
- PSL-EPHE, Paris, France
- ISCPIF, Institut des Systèmes Complexes, Paris île-de-France, France
- Université PARIS 1 Panthéon-Sorbonne, Paris, France
- CNRS, CESSP, UMR 8209, Paris, France
2
Mangalik S, Eichstaedt JC, Giorgi S, Mun J, Ahmed F, Gill G, Ganesan AV, Subrahmanya S, Soni N, Clouston SAP, Schwartz HA. Robust language-based mental health assessments in time and space through social media. NPJ Digit Med 2024; 7:109. PMID: 38698174; PMCID: PMC11065872; DOI: 10.1038/s41746-024-01100-0. Open access.
Abstract
In even the most comprehensive population surveys, mental health is only broadly captured through questionnaires asking about "mentally unhealthy days" or feelings of "sadness." Further, population mental health estimates are predominantly consolidated into yearly estimates at the state level, which is considerably coarser than the best estimates of physical health. Through large-scale analysis of social media, robust estimation of population mental health is feasible at finer resolutions. In this study, we created a pipeline that used ~1 billion Tweets from 2 million geo-located users to estimate levels of, and changes in, depression and anxiety, the two leading mental health conditions. Language-based mental health assessments (LBMHAs) had substantially higher reliability across space and time than available survey measures. This work presents reliable assessments of depression and anxiety down to the county-week level. Where surveys were available, we found moderate to strong associations between LBMHAs and survey scores at multiple levels of granularity, from the national level down to weekly county measurements (fixed-effects β = 0.34 to 1.82; p < 0.001). LBMHAs demonstrated temporal validity, showing clear absolute increases after major societal events (+23% absolute change for depression assessments). LBMHAs showed improved external validity, evidenced by stronger correlations with measures of health and socioeconomic status than population surveys. This study shows that careful aggregation of social media data yields spatiotemporal estimates of population mental health that exceed the granularity achievable by existing population surveys, and does so with generally greater reliability and validity.
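The fixed-effects coefficients this abstract reports (β = 0.34 to 1.82) come from models that absorb stable between-county differences and estimate the within-county association between language-based and survey scores. A minimal sketch of that "within" estimator, via group-mean demeaning, is below; the data are illustrative, not from the study.

```python
# One-way fixed-effects slope via within-group demeaning (the "within"
# estimator): remove each county's mean before fitting the slope.
from collections import defaultdict

def within_slope(groups, x, y):
    """Slope of y on x after subtracting each group's mean of x and y."""
    gx, gy, gn = defaultdict(float), defaultdict(float), defaultdict(int)
    for g, xi, yi in zip(groups, x, y):
        gx[g] += xi; gy[g] += yi; gn[g] += 1
    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        dx = xi - gx[g] / gn[g]
        dy = yi - gy[g] / gn[g]
        num += dx * dy
        den += dx * dx
    return num / den

counties = ["A", "A", "A", "B", "B", "B"]
lbmha    = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # language-based weekly score
survey   = [2.0, 3.0, 4.0, 10.0, 11.0, 12.0]  # survey weekly score
print(within_slope(counties, lbmha, survey))  # -> 1.0
```

Note that a pooled regression on these same data would overstate the slope, because county B is higher on both variables; the fixed-effects estimate uses only within-county variation.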
Affiliation(s)
- Siddharth Mangalik
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA.
- Johannes C Eichstaedt
- Department of Psychology, Stanford University, Stanford, CA, USA.
- Institute for Human-Centered A.I., Stanford University, Stanford, CA, USA.
- Salvatore Giorgi
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
- Jihu Mun
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Farhan Ahmed
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Gilvir Gill
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Adithya V Ganesan
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Nikita Soni
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Sean A P Clouston
- Department of Family, Population, and Preventive Medicine, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY, USA
- H Andrew Schwartz
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA.
3
Weisenburger RL, Mullarkey MC, Labrada J, Labrousse D, Yang MY, MacPherson AH, Hsu KJ, Ugail H, Shumake J, Beevers CG. Conversational assessment using artificial intelligence is as clinically useful as depression scales and preferred by users. J Affect Disord 2024; 351:489-498. PMID: 38290584; DOI: 10.1016/j.jad.2024.01.212.
Abstract
BACKGROUND Depression is prevalent, chronic, and burdensome. Due to limited screening access, depression often remains undiagnosed. Artificial intelligence (AI) models based on spoken responses to interview questions may offer an effective, efficient alternative to other screening methods. OBJECTIVE The primary aim was to use a demographically diverse sample to validate an AI model, previously trained on human-administered interviews, on novel bot-administered interviews, and to check for algorithmic biases related to age, sex, race, and ethnicity. METHODS Using the Aiberry app, adults recruited via social media (N = 393) completed a brief bot-administered interview and a depression self-report form. An AI model was used to predict form scores from the interview responses alone. For every meaningful discrepancy between model inference and form score, clinicians performed a masked review to determine which they preferred. RESULTS There was strong concurrent validity between the model predictions and raw self-report scores (r = 0.73, MAE = 3.3). 90% of AI predictions either agreed with self-report or, where they contradicted self-report, agreed with clinical expert opinion. There was no differential model performance across age, sex, race, or ethnicity. LIMITATIONS Limitations include access restrictions (English-speaking ability and access to a smartphone or computer with broadband internet) and potential self-selection of participants more favorably predisposed toward AI technology. CONCLUSION The Aiberry model made accurate predictions of depression severity based on remotely collected spoken responses to a bot-administered interview. This study shows promising results for the use of AI as a mental health screening tool on par with self-report measures.
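The two concurrent-validity metrics this abstract reports, Pearson's r and mean absolute error (MAE), can be computed directly from paired model and self-report scores. A minimal sketch follows; the four score pairs are illustrative, not from the study.

```python
# Concurrent validity of model-predicted vs. self-reported scores:
# Pearson correlation (agreement in ranking) and MAE (agreement in scale).
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

predicted = [10.0, 14.0, 7.0, 21.0]  # model-inferred depression scores
reported  = [12.0, 13.0, 9.0, 18.0]  # self-report form scores
print(round(pearson_r(predicted, reported), 2))  # -> 0.99
print(mae(predicted, reported))                  # -> 2.0
```

Reporting both matters: r captures whether the model orders people correctly, while MAE (here 3.3 points in the study) captures how far predictions drift from the self-report scale itself.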
Affiliation(s)
- Rachel L Weisenburger
- Department of Psychology and Institute for Mental Health Research, The University of Texas at Austin, United States of America.
- Daniel Labrousse
- Department of Psychiatry, Georgetown University Medical Center, United States of America
- Michelle Y Yang
- Department of Psychiatry, Georgetown University Medical Center, United States of America
- Allison Huff MacPherson
- Department of Family and Community Medicine, College of Medicine, University of Arizona, United States of America
- Kean J Hsu
- Department of Psychiatry, Georgetown University Medical Center, United States of America; Department of Psychology, National University of Singapore, Singapore
- Hassan Ugail
- Centre for Visual Computing, University of Bradford, United Kingdom of Great Britain and Northern Ireland
- Christopher G Beevers
- Department of Psychology and Institute for Mental Health Research, The University of Texas at Austin, United States of America
4
Kjell ONE, Kjell K, Schwartz HA. Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment. Psychiatry Res 2024; 333:115667. PMID: 38290286; DOI: 10.1016/j.psychres.2023.115667.
Abstract
In this narrative review, we survey recent empirical evaluations of AI-based language assessments and make the case that large language models are poised to change standardized psychological assessment. Artificial intelligence has been undergoing a purported "paradigm shift" initiated by new machine learning models: large language models (e.g., BERT, LLaMA, and the model behind ChatGPT). These models have achieved unprecedented accuracy on most computerized language processing tasks, from web search to automatic machine translation and question answering, while their dialogue-based forms, like ChatGPT, have captured the interest of over a million users. The success of large language models is mostly attributed to their capability to numerically represent words in context, long a weakness of previous attempts to automate psychological assessment from language. While potential applications for automated therapy are beginning to be studied on the heels of ChatGPT's success, here we present evidence suggesting that, with thorough validation in targeted deployment scenarios, AI's newest technology can move mental health assessment away from rating scales and toward how people naturally communicate: in language.
Affiliation(s)
- Oscar N E Kjell
- Psychology Department, Lund University, Sweden; Computer Science Department, Stony Brook University, United States.
- H Andrew Schwartz
- Psychology Department, Lund University, Sweden; Computer Science Department, Stony Brook University, United States
5
Werntz A, O'Shea BA, Sjobeck G, Howell J, Lindgren KP, Teachman BA. Implicit and explicit COVID-19 associations and mental health in the United States: a large-scale examination and replication. Anxiety Stress Coping 2023; 36:690-709. PMID: 36757678; PMCID: PMC10409876; DOI: 10.1080/10615806.2023.2176486.
Abstract
BACKGROUND Given the sensitive nature of COVID-19 beliefs, evaluating them both explicitly and implicitly may provide a fuller picture of how these beliefs vary across identities and how they relate to mental health. OBJECTIVE Three novel brief implicit association tests (BIATs) were created and evaluated: two measuring COVID-19-as-dangerous (vs. safe) and one measuring COVID-19 precautions-as-necessary (vs. unnecessary). Implicit and explicit COVID-19 associations were examined by demographic characteristics. Implicit associations were hypothesized to contribute uniquely to individuals' self-reported mental health. METHODS Participants (N = 13,413 US residents; April-November 2020) were volunteers for a COVID-19 study. Participants completed one BIAT and self-report measures. This was a preregistered study with a planned internal replication. RESULTS Older age was weakly associated with stronger implicit and explicit associations of COVID-19-as-dangerous and precautions-as-necessary. Black and Asian individuals reported greater necessity of taking precautions than White individuals (with small-to-medium effects); greater education was associated with greater explicit reports of COVID-19-as-dangerous and precautions-as-necessary, with small effects. Replicated relationships between explicit COVID-19-as-dangerous associations and mental health had very small effects. CONCLUSIONS Implicit associations did not predict mental health, but stronger explicit COVID-19-as-dangerous associations were weakly associated with worse mental health.
Affiliation(s)
- Alexandra Werntz
- Department of Psychology, University of Virginia, Charlottesville, VA, USA
- Center for Evidence-Based Mentoring, University of Massachusetts Boston, Boston, MA, USA
- Brian A O'Shea
- School of Psychology, University of Nottingham, Nottingham, United Kingdom
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Gustav Sjobeck
- Department of Psychology, University of Virginia, Charlottesville, VA, USA
- Jennifer Howell
- Psychological Sciences, Health Sciences Research Institute, University of California Merced, CA, USA
- Kristen P Lindgren
- Trauma Recovery & Resilience Innovations, Department of Psychiatry & Behavioral Sciences, University of Washington, Seattle, WA, USA
- Bethany A Teachman
- Department of Psychology, University of Virginia, Charlottesville, VA, USA
6
Malgaroli M, Hull TD, Zech JM, Althoff T. Natural language processing for mental health interventions: a systematic review and research framework. Transl Psychiatry 2023; 13:309. PMID: 37798296; PMCID: PMC10556019; DOI: 10.1038/s41398-023-02592-2. Open access.
Abstract
Neuropsychiatric disorders pose a high societal cost, but their treatment is hindered by a lack of objective outcomes and fidelity metrics. AI technologies, and specifically Natural Language Processing (NLP), have emerged as tools to study mental health interventions (MHI) at the level of their constituent conversations. However, NLP's potential to address clinical and research challenges remains unclear. We therefore conducted a pre-registered systematic review of NLP-MHI studies using PRISMA guidelines (osf.io/s52jh) to evaluate their models and clinical applications and to identify biases and gaps. Candidate studies (n = 19,756), including peer-reviewed AI conference manuscripts, were collected through January 2023 from PubMed, PsycINFO, Scopus, Google Scholar, and ArXiv. A total of 102 articles were included to investigate their computational characteristics (NLP algorithms, audio features, machine learning pipelines, outcome metrics), clinical characteristics (clinical ground truths, study samples, clinical focus), and limitations. Results indicate rapid growth of NLP MHI studies since 2019, characterized by increased sample sizes and use of large language models. Digital health platforms were the largest providers of MHI data. Ground truth for supervised learning models was based on clinician ratings (n = 31), patient self-report (n = 29), and annotations by raters (n = 26). Text-based features contributed more to model accuracy than audio markers. Commonly investigated clinical categories were patients' clinical presentation (n = 34), response to intervention (n = 11), intervention monitoring (n = 20), providers' characteristics (n = 12), relational dynamics (n = 14), and data preparation (n = 4). Limitations of reviewed studies included lack of linguistic diversity, limited reproducibility, and population bias. A research framework, NLPxMHI, is developed and validated to assist computational and clinical researchers in addressing the remaining gaps in applying NLP to MHI, with the goal of improving clinical utility, data access, and fairness.
Affiliation(s)
- Matteo Malgaroli
- Department of Psychiatry, New York University, Grossman School of Medicine, New York, NY, 10016, USA.
- James M Zech
- Talkspace, New York, NY, 10025, USA
- Department of Psychology, Florida State University, Tallahassee, FL, 32306, USA
- Tim Althoff
- Department of Computer Science, University of Washington, Seattle, WA, 98195, USA
7
Printz Pereira D, Dominguez Perez S, Milan S. U.S. mothers' appraisals of the race-related events of 2020: implications for the course of maternal posttraumatic stress disorder symptoms. J Trauma Stress 2023; 36:272-284. PMID: 36593587; DOI: 10.1002/jts.22904.
Abstract
For parents of color, publicized racial violence can heighten concerns about their children's safety. The goal of this study was to test whether this form of race-related stress exacerbates maternal posttraumatic stress disorder (PTSD) symptoms over a 4-month period in families of color. Participants included 262 U.S. mothers with a lifetime mental health diagnosis (67.6% non-Hispanic White, 15.6% African-American/Black, 16.8% other family of color). Mothers completed online surveys and open-ended questions, including appraisals of the meaning of the 2020 race-related events (i.e., George Floyd's death and the subsequent protests) in relation to their children's future. Open-ended responses were quantified using LIWC2015 text analysis for emotion word frequency and thematic coding for perceived implications. In ANCOVA analyses, there were significant racial group differences in appraisals, ds = 0.09-0.57. The responses from mothers of Black children included fewer positive and more negative emotion words than those of mothers of White children; they also included more perceived negative implications than those of all other mothers but did not differ on perceived positive implications. In regression analyses, race/ethnicity significantly moderated the association between appraisals and PTSD symptom course, such that negative appraisals predicted a subsequent increase in PTSD symptoms only for mothers of Black children, βs = .26-.37. Variations in event appraisals were unrelated to PTSD symptom course for the other mothers. These findings provide longitudinal support for the link between vicarious racism exposure and PTSD symptoms and highlight one potential form of racism-related stress for parents of Black children.
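The LIWC-style quantification this abstract describes reduces each open-ended response to the share of its words matching emotion lexicons. A minimal sketch is below; the tiny word lists are illustrative stand-ins for the proprietary LIWC2015 dictionaries, and the example response is invented.

```python
# LIWC-style emotion word frequency: fraction of a response's words that
# match positive- or negative-emotion lexicons (illustrative lexicons only).
import re

POSITIVE = {"hope", "hopeful", "better", "love", "proud"}
NEGATIVE = {"afraid", "scared", "worry", "worried", "angry", "hurt"}

def emotion_rates(text):
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    n = len(words) or 1  # avoid dividing by zero on empty responses
    return {"positive": pos / n, "negative": neg / n}

response = "I am scared and worried for my son but I am hopeful things get better"
print(emotion_rates(response))
```

Real LIWC dictionaries also use wildcard stems (e.g. `worri*`) and category hierarchies; per-response rates like these are what enter the group comparisons reported above.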
Affiliation(s)
- Destiny Printz Pereira
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, USA
- Sophia Dominguez Perez
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, USA
- Stephanie Milan
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, USA
8
The where and when of COVID-19: using ecological and Twitter-based assessments to examine impacts in a temporal and community context. PLoS One 2022; 17:e0264280. PMID: 35196353; PMCID: PMC8865674; DOI: 10.1371/journal.pone.0264280. Open access.
Abstract
In March 2020, residents of the Bronx, New York experienced one of the first significant community COVID-19 outbreaks in the United States. Focusing on intensive longitudinal data from 78 Bronx-based older adults, we used a multi-method approach to (1) examine changes in the momentary psychological well-being of Einstein Aging Study (EAS) participants from 2019 to the early pandemic (February-June 2020) and (2) contextualize these changes with community distress scores derived from public Twitter posts in Bronx County. We found increases in mean loneliness from 2019 to 2020, and participants who were higher in neuroticism showed greater increases in thought unpleasantness and feeling depressed. Twitter-based Bronx community scores of anxiety, depressivity, and negatively valenced affect were elevated in 2020 weeks relative to 2019. Integrating EAS participant data with community data revealed week-to-week fluctuations across 2019 and 2020. Results highlight how community-level data can characterize a rapidly changing environment and supplement individual-level data at no additional burden to participants.
9
Sawalha J, Yousefnezhad M, Shah Z, Brown MRG, Greenshaw AJ, Greiner R. Detecting Presence of PTSD Using Sentiment Analysis From Text Data. Front Psychiatry 2022; 12:811392. PMID: 35178000; PMCID: PMC8844448; DOI: 10.3389/fpsyt.2021.811392. Open access.
Abstract
Rates of post-traumatic stress disorder (PTSD) have risen significantly due to the COVID-19 pandemic. Telehealth has emerged as a means to monitor symptoms of such disorders, partly because the pandemic isolated patients or made therapeutic intervention inaccessible. Additional screening tools may be needed to augment the identification and diagnosis of PTSD through a virtual medium. Sentiment analysis refers to the use of natural language processing (NLP) to extract emotional content from text. In our study, we train a machine learning (ML) model on text data from the Audio/Visual Emotion Challenge and Workshop (AVEC-19) corpus to identify individuals with PTSD using sentiment analysis of semi-structured interviews. Our sample included 188 individuals without PTSD and 87 with PTSD. The interview was conducted by an artificial character (Ellie) over a video-conference call. Our model achieved a balanced accuracy of 80.4% on a held-out dataset from the AVEC-19 challenge. Additionally, we implemented various partitioning techniques to assess how well our model generalizes. This shows that learned models can use sentiment analysis of speech to identify the presence of PTSD, even through a virtual medium, and can serve as an important, accessible, and inexpensive tool to detect mental health abnormalities during the COVID-19 pandemic.
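The balanced accuracy this abstract reports is the mean of sensitivity (recall on PTSD cases) and specificity (recall on controls), a sensible choice given the class imbalance in the sample (87 PTSD vs. 188 without). A minimal sketch follows; the labels are illustrative, not from the study.

```python
# Balanced accuracy: mean of per-class recalls, so the majority class
# cannot dominate the score under class imbalance.
def balanced_accuracy(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)  # PTSD cases
    neg = sum(t == 0 for t in y_true)  # controls
    return 0.5 * (tp / pos + tn / neg)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.5 * (3/4 + 4/6) ≈ 0.708
```

By contrast, plain accuracy on an imbalanced sample can look high for a classifier that simply predicts the majority class, which is why imbalanced-corpus studies like this one report the balanced form.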
Affiliation(s)
- Jeff Sawalha
- Department of Psychiatry, University of Alberta, Edmonton, AB, Canada
- Department of Computer Science, University of Alberta, Edmonton, AB, Canada
- Department of Computing Science, Alberta Machine Intelligence Institute, University of Alberta, Edmonton, AB, Canada
- Muhammad Yousefnezhad
- Department of Psychiatry, University of Alberta, Edmonton, AB, Canada
- Department of Computer Science, University of Alberta, Edmonton, AB, Canada
- Department of Computing Science, Alberta Machine Intelligence Institute, University of Alberta, Edmonton, AB, Canada
- Zehra Shah
- Department of Computer Science, University of Alberta, Edmonton, AB, Canada
- Department of Computing Science, Alberta Machine Intelligence Institute, University of Alberta, Edmonton, AB, Canada
- Matthew R. G. Brown
- Department of Psychiatry, University of Alberta, Edmonton, AB, Canada
- Department of Computer Science, University of Alberta, Edmonton, AB, Canada
- Russell Greiner
- Department of Computer Science, University of Alberta, Edmonton, AB, Canada
- Department of Computing Science, Alberta Machine Intelligence Institute, University of Alberta, Edmonton, AB, Canada