1
Seyedi S, Griner E, Corbin L, Jiang Z, Roberts K, Iacobelli L, Milloy A, Boazak M, Bahrami Rad A, Abbasi A, Cotes RO, Clifford GD. Using HIPAA (Health Insurance Portability and Accountability Act)-Compliant Transcription Services for Virtual Psychiatric Interviews: Pilot Comparison Study. JMIR Ment Health 2023; 10:e48517. [PMID: 37906217 PMCID: PMC10646674 DOI: 10.2196/48517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 08/25/2023] [Accepted: 09/12/2023] [Indexed: 11/02/2023] Open
Abstract
BACKGROUND Automatic speech recognition (ASR) technology is increasingly being used for transcription in clinical contexts. Although there are numerous transcription services using ASR, few studies have compared the word error rate (WER) between different transcription services among different diagnostic groups in a mental health setting. There has also been little research into the types of words ASR transcriptions mistakenly generate or omit. OBJECTIVE This study compared the WER of 3 ASR transcription services (Amazon Transcribe [Amazon.com, Inc], Zoom-Otter AI [Zoom Video Communications, Inc], and Whisper [OpenAI Inc]) in interviews across 2 different clinical categories (controls and participants experiencing a variety of mental health conditions). These ASR transcription services were also compared with a commercial human transcription service, Rev (Rev.Com, Inc). Words that were either included or excluded by the error in the transcripts were systematically analyzed by their Linguistic Inquiry and Word Count categories. METHODS Participants completed a 1-time research psychiatric interview, which was recorded on a secure server. Transcriptions created by the research team were used as the gold standard from which WER was calculated. The interviewees were categorized into either the control group (n=18) or the mental health condition group (n=47) using the Mini-International Neuropsychiatric Interview. The total sample included 65 participants. Brunner-Munzel tests were used for comparing independent sets, such as the diagnostic groupings, and Wilcoxon signed rank tests were used for correlated samples when comparing the total sample between different transcription services. RESULTS There were significant differences between each ASR transcription service's WER (P<.001). Amazon Transcribe's output exhibited significantly lower WERs compared with the Zoom-Otter AI's and Whisper's ASR. 
ASR performances did not significantly differ across the 2 different clinical categories within each service (P>.05). A comparison between the human transcription service output from Rev and the best-performing ASR (Amazon Transcribe) demonstrated a significant difference (P<.001), with Rev having a slightly lower median WER (7.6%, IQR 5.4%-11.35% vs 8.9%, IQR 6.9%-11.6%). Heat maps and spider plots were used to visualize the most common errors in Linguistic Inquiry and Word Count categories, which were found to be within 3 overarching categories: Conversation, Cognition, and Function. CONCLUSIONS Overall, consistent with previous literature, our results suggest that the WER between manual and automated transcription services may be narrowing as ASR services advance. These advances, coupled with decreased cost and time in receiving transcriptions, may make ASR transcriptions a more viable option within health care settings. However, more research is required to determine if errors in specific types of words impact the analysis and usability of these transcriptions, particularly for specific applications and in a variety of populations in terms of clinical diagnosis, literacy level, accent, and cultural origin.
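The abstract above reports WER against gold-standard transcripts without spelling out the computation. As a minimal sketch, assuming the conventional definition (word-level edit distance divided by the number of reference words) and a simple lowercase, whitespace-split normalization that the study itself does not specify, WER could be computed as follows. The function name `word_error_rate` is illustrative and not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is lowercased and split on whitespace; real pipelines
    usually also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, a WER above 1.0 is possible when the hypothesis is much longer than the reference.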
Affiliation(s)
- Salman Seyedi
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Emily Griner
  - Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States
- Lisette Corbin
  - Department of Psychiatry, Duke University Health, Durham, NC, United States
- Zifan Jiang
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
  - Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, United States
- Kailey Roberts
  - Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, United States
- Luca Iacobelli
  - Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States
- Aaron Milloy
  - Infection Prevention Department, Emory Healthcare, Atlanta, GA, United States
- Mina Boazak
  - Animo Sano Psychiatry, Durham, NC, United States
- Ali Bahrami Rad
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Ahmed Abbasi
  - Department of Information Technology, Analytics, and Operations, University of Notre Dame, Notre Dame, IN, United States
- Robert O Cotes
  - Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States
- Gari D Clifford
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
  - Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, United States
2
Berardi M, Brosch K, Pfarr JK, Schneider K, Sültmann A, Thomas-Odenthal F, Wroblewski A, Usemann P, Philipsen A, Dannlowski U, Nenadić I, Kircher T, Krug A, Stein F, Dietrich M. Relative importance of speech and voice features in the classification of schizophrenia and depression. Transl Psychiatry 2023; 13:298. [PMID: 37726285 PMCID: PMC10509176 DOI: 10.1038/s41398-023-02594-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/03/2023] [Revised: 08/10/2023] [Accepted: 09/08/2023] [Indexed: 09/21/2023] Open
Abstract
Speech is a promising biomarker for schizophrenia spectrum disorder (SSD) and major depressive disorder (MDD). This proof of principle study investigates previously studied speech acoustics in combination with a novel application of voice pathology features as objective and reproducible classifiers for depression, schizophrenia, and healthy controls (HC). Speech and voice features for classification were calculated from recordings of picture descriptions from 240 speech samples (20 participants with SSD, 20 with MDD, and 20 HC each with 4 samples). Binary classification support vector machine (SVM) models classified the disorder groups and HC. For each feature, the permutation feature importance was calculated, and the top 25% most important features were used to compare differences between the disorder groups and HC including correlations between the important features and symptom severity scores. Multiple kernels for SVM were tested and the pairwise models with the best performing kernel (3-degree polynomial) were highly accurate for each classification: 0.947 for HC vs. SSD, 0.920 for HC vs. MDD, and 0.932 for SSD vs. MDD. The relatively most important features were measures of articulation coordination, number of pauses per minute, and speech variability. There were moderate correlations between important features and positive symptoms for SSD. The important features suggest that speech characteristics relating to psychomotor slowing, alogia, and flat affect differ between HC, SSD, and MDD.
Affiliation(s)
- Mark Berardi
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Katharina Brosch
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Julia-Katharina Pfarr
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Katharina Schneider
  - Institute for Linguistics: General Linguistics, University of Mainz, Mainz, Germany
- Angela Sültmann
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Florian Thomas-Odenthal
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Adrian Wroblewski
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Paula Usemann
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Alexandra Philipsen
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Udo Dannlowski
  - Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Igor Nenadić
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Tilo Kircher
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Axel Krug
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Frederike Stein
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Maria Dietrich
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
3
Chuang CY, Lin YT, Liu CC, Lee LE, Chang HY, Liu AS, Hung SH, Fu LC. Multimodal Assessment of Schizophrenia Symptom Severity From Linguistic, Acoustic and Visual Cues. IEEE Trans Neural Syst Rehabil Eng 2023; 31:3469-3479. [PMID: 37607137 DOI: 10.1109/tnsre.2023.3307597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Correctly assessing the condition of each schizophrenia patient normally requires lengthy and frequent interviews with professionally trained doctors. To reduce the time and manual burden on these mental health professionals, this paper proposes a multimodal assessment model that predicts the severity level of each symptom defined in the Scale for the Assessment of Thought, Language, and Communication (TLC) and the Positive and Negative Syndrome Scale (PANSS) based on the patient's linguistic, acoustic, and visual behavior. The proposed deep-learning model consists of a multimodal fusion framework and four unimodal transformer-based backbone networks. A second-stage pre-training step is introduced so that each off-the-shelf pre-trained model learns the patterns of schizophrenia data more effectively and extracts the desired features from the view of its own modality. Next, the pre-trained parameters are frozen, and lightweight trainable unimodal modules are inserted and fine-tuned to keep the number of parameters low while maintaining performance. Finally, the four adapted unimodal modules are fused into a final multimodal assessment model through the proposed multimodal fusion framework. For validation, we train and evaluate the proposed model on schizophrenia patients recruited from National Taiwan University Hospital, where it achieves 0.534/0.685 in MAE/MSE, outperforming related works in the literature. The experimental results, ablation studies, and comparison with other multimodal assessment works demonstrate both the performance of our approach and its effectiveness in extracting and integrating information from multiple modalities.
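The MAE/MSE figures quoted above are regression errors between predicted and clinician-rated severity scores. The following is a minimal sketch of the two metrics, assuming their standard definitions (mean absolute error and mean squared error over all rated items); the function name `mae_mse` and its inputs are hypothetical and not taken from the paper.

```python
def mae_mse(true_scores, predicted_scores):
    """Return (mean absolute error, mean squared error) for paired scores."""
    if len(true_scores) != len(predicted_scores):
        raise ValueError("score lists must have the same length")
    if not true_scores:
        raise ValueError("at least one score pair is required")

    # Signed error per item: prediction minus clinician rating.
    errors = [pred - true for true, pred in zip(true_scores, predicted_scores)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse
```

MSE penalizes large misses more heavily than MAE, which is why the two are often reported together for ordinal severity ratings.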
4
Schneider K, Leinweber K, Jamalabadi H, Teutenberg L, Brosch K, Pfarr JK, Thomas-Odenthal F, Usemann P, Wroblewski A, Straube B, Alexander N, Nenadić I, Jansen A, Krug A, Dannlowski U, Kircher T, Nagels A, Stein F. Syntactic complexity and diversity of spontaneous speech production in schizophrenia spectrum and major depressive disorders. Schizophrenia (Heidelberg, Germany) 2023; 9:35. [PMID: 37248240 DOI: 10.1038/s41537-023-00359-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/18/2023] [Accepted: 04/25/2023] [Indexed: 05/31/2023]
Abstract
Syntax, the grammatical structure of sentences, is a fundamental aspect of language. It remains debated whether reduced syntactic complexity is unique to schizophrenia spectrum disorder (SSD) or whether it is also present in major depressive disorder (MDD). Furthermore, the association of syntax (including syntactic complexity and diversity) with language-related neuropsychology and psychopathological symptoms across disorders remains unclear. Thirty-four SSD patients and thirty-eight MDD patients diagnosed according to DSM-IV-TR as well as forty healthy controls (HC) were included and tasked with describing four pictures from the Thematic Apperception Test. We analyzed the syntax of the produced speech, delineating measures of syntactic complexity (the total number of main clauses embedding subordinate clauses) and diversity (number of different types of complex sentences). We performed cluster analysis to identify clusters based on syntax and used network analyses to investigate associations of syntactic measures with language-related neuropsychological measures (verbal fluency and verbal episodic memory) and psychopathological measures (positive and negative formal thought disorder). Syntax in SSD was significantly reduced in comparison to MDD and HC, whereas the comparison of HC and MDD revealed no significant differences. No associations were present between speech measures and current medication, duration and severity of illness, age or sex; the single association accounted for was education. A cluster analysis resulted in four clusters with different degrees of syntax across diagnoses. Subjects with less syntax exhibited pronounced positive and negative symptoms and displayed poorer performance in executive functioning, global functioning, and verbal episodic memory. All cluster-based networks indicated varying degrees of domain-specific and cross-domain connections. Measures of syntactic complexity were closely related while syntactic diversity appeared to be a separate node outside of the syntactic network. Cross-domain associations were more salient in more complex syntactic production.
Affiliation(s)
- Katharina Schneider
  - Department of English and Linguistics, General Linguistics, University of Mainz, Mainz, Germany
- Katrin Leinweber
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Hamidreza Jamalabadi
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Lea Teutenberg
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Katharina Brosch
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Julia-Katharina Pfarr
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Florian Thomas-Odenthal
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Paula Usemann
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Adrian Wroblewski
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Benjamin Straube
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Nina Alexander
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Igor Nenadić
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Andreas Jansen
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Axel Krug
  - Department of Psychiatry and Psychotherapy, University of Bonn, Bonn, Germany
- Udo Dannlowski
  - Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Tilo Kircher
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Arne Nagels
  - Department of English and Linguistics, General Linguistics, University of Mainz, Mainz, Germany
- Frederike Stein
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
5
Teixeira FL, Costa MRE, Abreu JP, Cabral M, Soares SP, Teixeira JP. A Narrative Review of Speech and EEG Features for Schizophrenia Detection: Progress and Challenges. Bioengineering (Basel) 2023; 10:bioengineering10040493. [PMID: 37106680 PMCID: PMC10135748 DOI: 10.3390/bioengineering10040493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 04/06/2023] [Accepted: 04/14/2023] [Indexed: 04/29/2023] Open
Abstract
Schizophrenia is a mental illness that affects an estimated 21 million people worldwide. The literature establishes that electroencephalography (EEG) is a well-implemented means of studying and diagnosing mental disorders. However, it is known that speech and language provide unique and essential information about human thought. Semantic and emotional content, semantic coherence, syntactic structure, and complexity can thus be combined in a machine learning process to detect schizophrenia. Several studies show that early identification is crucial to prevent the onset of illness or mitigate possible complications. Therefore, it is necessary to identify disease-specific biomarkers for an early diagnosis support system. This work contributes to improving our knowledge about schizophrenia and the features that can identify this mental illness via speech and EEG. The emotional state is a specific characteristic of schizophrenia that can be identified with speech emotion analysis. The most used features of speech found in the literature review are fundamental frequency (F0), intensity/loudness (I), frequency formants (F1, F2, and F3), Mel-frequency cepstral coefficients (MFCCs), the duration of pauses and sentences (SD), and the duration of silence between words. Combining at least two feature categories achieved high accuracy in the schizophrenia classification. Prosodic and spectral or temporal features achieved the highest accuracy. The work with higher accuracy used the prosodic and spectral features QEVA, SDVV, and SSDL, which were derived from the F0 and spectrogram. The emotional state can be identified with most of the features previously mentioned (F0, I, F1, F2, F3, MFCCs, and SD), linear prediction cepstral coefficients (LPCC), linear spectral features (LSF), and the pause rate. Using the event-related potentials (ERP), the most promising features found in the literature are mismatch negativity (MMN), P2, P3, P50, N1, and N2. The EEG features with higher accuracy in classifying subjects with schizophrenia are the nonlinear features, such as Cx, HFD, and Lya.
Affiliation(s)
- Felipe Lage Teixeira
  - Research Centre in Digitalization and Intelligent Robotics (CEDRI), Instituto Politécnico de Bragança, Campus de Santa Apolónia, 5300-253 Bragança, Portugal
  - Engineering Department, School of Sciences and Technology, University of Trás-os-Montes and Alto Douro (UTAD), Quinta de Prados, 5000-801 Vila Real, Portugal
- Miguel Rocha E Costa
  - Research Centre in Digitalization and Intelligent Robotics (CEDRI), Instituto Politécnico de Bragança, Campus de Santa Apolónia, 5300-253 Bragança, Portugal
- José Pio Abreu
  - Faculty of Medicine of the University of Coimbra, 3000-548 Coimbra, Portugal
  - Hospital da Universidade de Coimbra, 3004-561 Coimbra, Portugal
- Manuel Cabral
  - Engineering Department, School of Sciences and Technology, University of Trás-os-Montes and Alto Douro (UTAD), Quinta de Prados, 5000-801 Vila Real, Portugal
  - Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
- Salviano Pinto Soares
  - Engineering Department, School of Sciences and Technology, University of Trás-os-Montes and Alto Douro (UTAD), Quinta de Prados, 5000-801 Vila Real, Portugal
  - Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
  - Intelligent Systems Associate Laboratory (LASI), University of Aveiro, 3810-193 Aveiro, Portugal
- João Paulo Teixeira
  - Research Centre in Digitalization and Intelligent Robotics (CEDRI), Instituto Politécnico de Bragança, Campus de Santa Apolónia, 5300-253 Bragança, Portugal
  - Laboratório para a Sustentabilidade e Tecnologia em Regiões de Montanha (SusTEC), Instituto Politécnico de Bragança, Campus de Santa Apolónia, 5300-253 Bragança, Portugal
6
Ettore E, Müller P, Hinze J, Benoit M, Giordana B, Postin D, Lecomte A, Lindsay H, Robert P, König A. Digital Phenotyping for Differential Diagnosis of Major Depressive Episode: Narrative Review. JMIR Ment Health 2023; 10:e37225. [PMID: 36689265 PMCID: PMC9903183 DOI: 10.2196/37225] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/11/2022] [Revised: 09/02/2022] [Accepted: 09/30/2022] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Major depressive episode (MDE) is a common clinical syndrome. It can be found in different pathologies such as major depressive disorder (MDD), bipolar disorder (BD), posttraumatic stress disorder (PTSD), or even occur in the context of psychological trauma. However, only 1 syndrome is described in international classifications (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition [DSM-5]/International Classification of Diseases 11th Revision [ICD-11]), which do not take into account the underlying pathology at the origin of the MDE. Clinical interviews are currently the best source of information to obtain the etiological diagnosis of MDE. Nevertheless, it does not allow an early diagnosis and there are no objective measures of extracted clinical information. To remedy this, the use of digital tools and their correlation with clinical symptomatology could be useful. OBJECTIVE We aimed to review the current application of digital tools for MDE diagnosis while highlighting shortcomings for further research. In addition, our work was focused on digital devices easy to use during clinical interview and mental health issues where depression is common. METHODS We conducted a narrative review of the use of digital tools during clinical interviews for MDE by searching papers published in PubMed/MEDLINE, Web of Science, and Google Scholar databases since February 2010. The search was conducted from June to September 2021. Potentially relevant papers were then compared against a checklist for relevance and reviewed independently for inclusion, with focus on 4 allocated topics of (1) automated voice analysis, behavior analysis by (2) video and physiological measures, (3) heart rate variability (HRV), and (4) electrodermal activity (EDA). For this purpose, we were interested in 4 frequently found clinical conditions in which MDE can occur: (1) MDD, (2) BD, (3) PTSD, and (4) psychological trauma. 
RESULTS A total of 74 relevant papers on the subject were qualitatively analyzed and the information was synthesized. Thus, a digital phenotype of MDE seems to emerge consisting of modifications in speech features (namely, temporal, prosodic, spectral, source, and formants) and in speech content, modifications in nonverbal behavior (head, hand, body and eyes movement, facial expressivity, and gaze), and a decrease in physiological measurements (HRV and EDA). We not only found similarities but also differences when MDE occurs in MDD, BD, PTSD, or psychological trauma. However, comparative studies were rare in BD or PTSD conditions, which does not allow us to identify clear and distinct digital phenotypes. CONCLUSIONS Our search identified markers from several modalities that hold promise for helping with a more objective diagnosis of MDE. To validate their potential, further longitudinal and prospective studies are needed.
Affiliation(s)
- Eric Ettore
  - Department of Psychiatry and Memory Clinic, University Hospital of Nice, Nice, France
- Philipp Müller
  - Research Department Cognitive Assistants, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Saarbrücken, Germany
- Jonas Hinze
  - Department of Psychiatry and Psychotherapy, Saarland University Medical Center, Hombourg, Germany
- Michel Benoit
  - Department of Psychiatry, Hopital Pasteur, University Hospital of Nice, Nice, France
- Bruno Giordana
  - Department of Psychiatry, Hopital Pasteur, University Hospital of Nice, Nice, France
- Danilo Postin
  - Department of Psychiatry, School of Medicine and Health Sciences, Carl von Ossietzky University of Oldenburg, Bad Zwischenahn, Germany
- Amandine Lecomte
  - Research Department Sémagramme Team, Institut national de recherche en informatique et en automatique, Nancy, France
- Hali Lindsay
  - Research Department Cognitive Assistants, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Saarbrücken, Germany
- Philippe Robert
  - Research Department, Cognition-Behaviour-Technology Lab, University Côte d'Azur, Nice, France
- Alexandra König
  - Research Department Stars Team, Institut national de recherche en informatique et en automatique, Sophia Antipolis - Valbonne, France
7
Koops S, Brederoo SG, de Boer JN, Nadema FG, Voppel AE, Sommer IE. Speech as a Biomarker for Depression. CNS & Neurological Disorders Drug Targets 2023; 22:152-160. [PMID: 34961469 DOI: 10.2174/1871527320666211213125847] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/10/2021] [Revised: 10/10/2021] [Accepted: 10/10/2021] [Indexed: 01/01/2023]
Abstract
BACKGROUND Depression is a debilitating disorder that at present lacks a reliable biomarker to aid in diagnosis and early detection. Recent advances in computational analytic approaches have opened up new avenues in developing such a biomarker by taking advantage of the wealth of information that can be extracted from a person's speech. OBJECTIVE The current review provides an overview of the latest findings in the rapidly evolving field of computational language analysis for the detection of depression. We cover a wide range of both acoustic and content-related linguistic features, data types (i.e., spoken and written language), and data sources (i.e., lab settings, social media, and smartphone-based). We put special focus on the current methodological advances with regard to feature extraction and computational modeling techniques. Furthermore, we pay attention to potential hurdles in the implementation of automatic speech analysis. CONCLUSION Depressive speech is characterized by several anomalies, such as lower speech rate, less pitch variability and more self-referential speech. With current computational modeling techniques, such features can be used to detect depression with an accuracy of up to 91%. The performance of the models is optimized when machine learning techniques are implemented that suit the type and amount of data. Recent studies now work towards further optimization and generalizability of the computational language models to detect depression. Finally, privacy and ethical issues are of paramount importance to be addressed when automatic speech analysis techniques are further implemented in, for example, smartphones. Altogether, computational speech analysis is well underway towards becoming an effective diagnostic aid for depression.
Affiliation(s)
- Sanne Koops
  - Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
- Sanne G Brederoo
  - Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
  - University Center for Psychiatry, University Medical Center Groningen, Groningen, The Netherlands
- Janna N de Boer
  - Department of Psychiatry, University Medical Center Utrecht, Utrecht University & Brain Center Rudolf Magnus, Utrecht, The Netherlands
- Femke G Nadema
  - Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
- Alban E Voppel
  - Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
- Iris E Sommer
  - Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
8
Xu S, Yang Z, Chakraborty D, Chua YHV, Tolomeo S, Winkler S, Birnbaum M, Tan BL, Lee J, Dauwels J. Identifying psychiatric manifestations in schizophrenia and depression from audio-visual behavioural indicators through a machine-learning approach. Schizophrenia (Heidelberg, Germany) 2022; 8:92. [PMID: 36344515 PMCID: PMC9640655 DOI: 10.1038/s41537-022-00287-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/21/2022] [Accepted: 09/08/2022] [Indexed: 11/09/2022]
Abstract
Schizophrenia (SCZ) and depression (MDD) are two chronic mental disorders that seriously affect the quality of life of millions of people worldwide. We aim to develop machine-learning methods with objective linguistic, speech, facial, and motor behavioral cues to reliably predict the severity of psychopathology or cognitive function, and distinguish diagnosis groups. We collected and analyzed the speech, facial expressions, and body movement recordings of 228 participants (103 SCZ, 50 MDD, and 75 healthy controls) from two separate studies. We created an ensemble machine-learning pipeline and achieved a balanced accuracy of 75.3% for classifying the total score of negative symptoms, 75.6% for the composite score of cognitive deficits, and 73.6% for the total score of general psychiatric symptoms in the mixed sample containing all three diagnostic groups. The proposed system is also able to differentiate between MDD and SCZ with a balanced accuracy of 84.7% and differentiate patients with SCZ or MDD from healthy controls with a balanced accuracy of 82.3%. These results suggest that machine-learning models leveraging audio-visual characteristics can help diagnose, assess, and monitor patients with schizophrenia and depression.
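The results above are reported as balanced accuracy, which matters here because the diagnostic groups are unequal in size (103 SCZ, 50 MDD, 75 healthy controls). As a minimal sketch, assuming the usual definition (the unweighted mean of per-class recall), the metric can be computed as below. The function name `balanced_accuracy` is illustrative; the paper's own evaluation code is not shown in the abstract.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("at least one labelled sample is required")

    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # all samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)
```

Unlike plain accuracy, this score cannot be inflated by always predicting the majority class: with two classes, a constant prediction yields 0.5 regardless of class proportions.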
Affiliation(s)
- Shihao Xu
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
- Zixu Yang
- Institute of Mental Health, Singapore, Singapore
- Debsubhra Chakraborty
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
- Yi Han Victoria Chua
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
- School of Social Science, Nanyang Technological University, Singapore, Singapore
- Serenella Tolomeo
- Department of Psychology, National University of Singapore, Singapore, Singapore
- Stefan Winkler
- School of Computing, National University of Singapore, Singapore, Singapore
- Jimmy Lee
- Institute of Mental Health, Singapore, Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Justin Dauwels
- Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, Delft, Netherlands.
|
9
|
Huang YJ, Lin YT, Liu CC, Lee LE, Hung SH, Lo JK, Fu LC. Assessing Schizophrenia Patients through Linguistic and Acoustic Features using Deep Learning Techniques. IEEE Trans Neural Syst Rehabil Eng 2022; 30:947-956. [PMID: 35358049 DOI: 10.1109/tnsre.2022.3163777] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/07/2022]
Abstract
Thought, language, and communication disorders are among the salient characteristics of schizophrenia. Such impairments are often exhibited in patients' conversations. Research has shown that assessments of thought disorder are crucial for tracking patients' clinical conditions and for early detection of individuals at clinical high risk. Detecting such symptoms requires a trained clinician's expertise, which is prohibitive due to cost and the high patient-to-clinician ratio. In this paper, we propose a machine learning method using a Transformer-based model to help automate the assessment of the severity of thought disorder in schizophrenia. The proposed model uses both textual and acoustic information from conversations between occupational therapists or psychiatric nurses and schizophrenia patients to predict the level of the patients' thought disorder. Experimental results show that the proposed model can closely predict the results of assessments for schizophrenia patients based on the extracted semantic, syntactic, and acoustic features. Thus, we believe our model can be a helpful tool for doctors when they are assessing schizophrenia patients.
|
10
|
Birnbaum ML, Abrami A, Heisig S, Ali A, Arenare E, Agurto C, Lu N, Kane JM, Cecchi G. Acoustic and Facial Features From Clinical Interviews for Machine Learning-Based Psychiatric Diagnosis: Algorithm Development. JMIR Ment Health 2022; 9:e24699. [PMID: 35072648 PMCID: PMC8822433 DOI: 10.2196/24699] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/01/2020] [Revised: 04/29/2021] [Accepted: 12/01/2021] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND In contrast to all other areas of medicine, psychiatry is still nearly entirely reliant on subjective assessments such as patient self-report and clinical observation. The lack of objective information on which to base clinical decisions can contribute to reduced quality of care. Behavioral health clinicians need objective and reliable patient data to support effective targeted interventions. OBJECTIVE We aimed to investigate whether reliable inferences (psychiatric signs, symptoms, and diagnoses) can be extracted from audiovisual patterns in recorded evaluation interviews of participants with schizophrenia spectrum disorders and bipolar disorder. METHODS We obtained audiovisual data from 89 participants (mean age 25.3 years; male: 48/89, 53.9%; female: 41/89, 46.1%): individuals with schizophrenia spectrum disorders (n=41), individuals with bipolar disorder (n=21), and healthy volunteers (n=27). We developed machine learning models based on acoustic and facial movement features extracted from participant interviews to predict diagnoses and detect clinician-coded neuropsychiatric symptoms, and we assessed model performance using area under the receiver operating characteristic curve (AUROC) in 5-fold cross-validation. RESULTS The model successfully differentiated between schizophrenia spectrum disorders and bipolar disorder (AUROC 0.73) when aggregating face and voice features. Facial action units including cheek-raising muscle (AUROC 0.64) and chin-raising muscle (AUROC 0.74) provided the strongest signal for men. Vocal features, such as energy in the frequency band 1 to 4 kHz (AUROC 0.80) and spectral harmonicity (AUROC 0.78), provided the strongest signal for women. Lip corner-pulling muscle signal discriminated between diagnoses for both men (AUROC 0.61) and women (AUROC 0.62). Several psychiatric signs and symptoms were successfully inferred: blunted affect (AUROC 0.81), avolition (AUROC 0.72), lack of vocal inflection (AUROC 0.71), asociality (AUROC 0.63), and worthlessness (AUROC 0.61). CONCLUSIONS This study represents an advance in efforts to capitalize on digital data to improve diagnostic assessment and supports the development of a new generation of innovative clinical tools by employing acoustic and facial data analysis.
Affiliation(s)
- Michael L Birnbaum
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
- The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
- The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, United States
- Avner Abrami
- Computational Biology Center, IBM Research, Yorktown Heights, NY, United States
- Stephen Heisig
- Icahn School of Medicine at Mount Sinai, New York City, NY, United States
- Asra Ali
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
- The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
- Elizabeth Arenare
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
- The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
- Carla Agurto
- Computational Biology Center, IBM Research, Yorktown Heights, NY, United States
- Nathaniel Lu
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
- The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
- John M Kane
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
- The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
- The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, United States
- Guillermo Cecchi
- Computational Biology Center, IBM Research, Yorktown Heights, NY, United States
|
11
|
Tang SX, Kriz R, Cho S, Park SJ, Harowitz J, Gur RE, Bhati MT, Wolf DH, Sedoc J, Liberman MY. Natural language processing methods are sensitive to sub-clinical linguistic differences in schizophrenia spectrum disorders. NPJ SCHIZOPHRENIA 2021; 7:25. [PMID: 33990615 PMCID: PMC8121795 DOI: 10.1038/s41537-021-00154-3] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 09/10/2020] [Accepted: 03/26/2021] [Indexed: 01/11/2023]
Abstract
Computerized natural language processing (NLP) allows for objective and sensitive detection of speech disturbance, a hallmark of schizophrenia spectrum disorders (SSD). We explored several methods for characterizing speech changes in SSD (n = 20) compared to healthy control (HC) participants (n = 11) and approached linguistic phenotyping on three levels: individual words, parts-of-speech (POS), and sentence-level coherence. NLP features were compared with a clinical gold standard, the Scale for the Assessment of Thought, Language and Communication (TLC). We utilized Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art embedding algorithm incorporating bidirectional context. Through the POS approach, we found that participants with SSD used more pronouns but fewer adverbs, adjectives, and determiners (e.g., "the," "a"). Analysis of individual word usage was notable for more frequent use of first-person singular pronouns among individuals with SSD and first-person plural pronouns among HC. There was a striking increase in incomplete words among SSD. Sentence-level analysis using BERT reflected increased tangentiality among SSD with greater sentence embedding distances. The SSD sample had low speech disturbance on average and there was no difference in group means for TLC scores. However, NLP measures of language disturbance appear to be sensitive to these subclinical differences and showed greater ability to discriminate between HC and SSD than a model based on clinical ratings alone. These intriguing exploratory results from a small sample prompt further inquiry into NLP methods for characterizing language disturbance in SSD and suggest that NLP measures may yield clinically relevant and informative biomarkers.
Affiliation(s)
- Sunny X Tang
- Zucker Hillside Hospital, Department of Psychiatry, 75-59 263rd St., Glen Oaks, NY, USA
- University of Pennsylvania, Department of Psychiatry, 3400 Spruce St, Gates Building, Philadelphia, PA, USA
- Linguistics Data Consortium, 3600 Market St, Suite 810, Philadelphia, PA, USA
- Reno Kriz
- University of Pennsylvania, Department of Computer Science, 3330 Walnut St, Levine Hall, Philadelphia, PA, USA
- Sunghye Cho
- Linguistics Data Consortium, 3600 Market St, Suite 810, Philadelphia, PA, USA
- Suh Jung Park
- University of Pennsylvania, Department of Psychiatry, 3400 Spruce St, Gates Building, Philadelphia, PA, USA
- Jenna Harowitz
- University of Pennsylvania, Department of Psychiatry, 3400 Spruce St, Gates Building, Philadelphia, PA, USA
- Raquel E Gur
- University of Pennsylvania, Department of Psychiatry, 3400 Spruce St, Gates Building, Philadelphia, PA, USA
- Mahendra T Bhati
- University of Pennsylvania, Department of Psychiatry, 3400 Spruce St, Gates Building, Philadelphia, PA, USA
- Stanford University, Department of Psychiatry and Neurosurgery, 401 Quarry Road, Stanford, CA, USA
- Daniel H Wolf
- University of Pennsylvania, Department of Psychiatry, 3400 Spruce St, Gates Building, Philadelphia, PA, USA
- João Sedoc
- New York University, Department of Technology, Operations, and Statistics, 44 West Fourth Street, Kaufman Management Center, New York, NY, USA
- Mark Y Liberman
- Linguistics Data Consortium, 3600 Market St, Suite 810, Philadelphia, PA, USA
- University of Pennsylvania, Department of Linguistics, 3401-C Walnut St, Suite 300, C Wing, Philadelphia, PA, USA
|