1. Huang X, Wang F, Gao Y, Liao Y, Zhang W, Zhang L, Xu Z. Depression recognition using voice-based pre-training model. Sci Rep 2024; 14:12734. [PMID: 38830969] [DOI: 10.1038/s41598-024-63556-0]
Abstract
The early screening of depression is highly beneficial for patients to obtain better diagnosis and treatment. While the effectiveness of using voice data for depression detection has been demonstrated, the problem of insufficient dataset size remains unresolved. We therefore propose an artificial-intelligence method to identify depression effectively. The wav2vec 2.0 voice-based pre-training model was used as a feature extractor to automatically extract high-quality voice features from raw audio, and a small fine-tuning network was used as the classification model to output depression classification results. The proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results: in binary classification it attained an accuracy of 0.9649 and an RMSE of 0.1875 on the test set, and in multi-class classification an accuracy of 0.9481 and an RMSE of 0.3810. To the authors' knowledge, this is the first use of the wav2vec 2.0 model for depression recognition, and it showed strong generalization ability. The method is simple and practical and can assist doctors in the early screening of depression.
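A minimal sketch of the two-stage pipeline this abstract describes: a pre-trained encoder produces frame-level features, and a small fine-tuning head pools them over time and classifies the utterance. The feature dimension (768), the random stand-in features, and the untrained head are illustrative assumptions; in practice the features would come from a pre-trained wav2vec 2.0 model (e.g. via the HuggingFace transformers library), and the head would be trained on labeled data.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify_pooled_features(features, weights, bias):
    """Mean-pool frame-level encoder features over time, then apply
    a linear classification head and a softmax to get class probabilities."""
    pooled = features.mean(axis=0)            # (768,) utterance embedding
    logits = pooled @ weights + bias          # (num_classes,)
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

# Stand-in for wav2vec 2.0 output: 200 frames of 768-dim features.
features = rng.standard_normal((200, 768))
# Untrained head for a binary depressed / not-depressed decision.
W = rng.standard_normal((768, 2)) * 0.01
b = np.zeros(2)

probs = classify_pooled_features(features, W, b)
print(probs.shape, float(probs.sum()))
```

Freezing the encoder and training only a head this small is what makes the approach practical on a dataset the size of DAIC-WOZ.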
Affiliation(s)
- Xiangsheng Huang
- School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Fang Wang
- School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Yuan Gao
- School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Yilong Liao
- School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Wenjing Zhang
- School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Li Zhang
- School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Zhenrong Xu
- School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
2. A New Regression Model for Depression Severity Prediction Based on Correlation among Audio Features Using a Graph Convolutional Neural Network. Diagnostics (Basel) 2023; 13:727. [PMID: 36832211] [PMCID: PMC9955540] [DOI: 10.3390/diagnostics13040727]
Abstract
Recent studies have revealed mutually correlated audio features in the voices of depressed patients; the voices of these patients can therefore be characterized by the combinatorial relationships among those features. Many deep learning-based methods have been proposed to predict depression severity from audio data, but existing methods assume that the individual audio features are independent. In this paper, we propose a new deep learning-based regression model that predicts depression severity on the basis of the correlation among audio features. The proposed model was developed using a graph convolutional neural network: it learns voice characteristics from graph-structured data generated to express the correlation among audio features. We conducted prediction experiments on depression severity using the DAIC-WOZ dataset employed in several previous studies. The proposed model achieved a root mean square error (RMSE) of 2.15, a mean absolute error (MAE) of 1.25, and a symmetric mean absolute percentage error of 50.96%; the RMSE and MAE significantly outperformed those of existing state-of-the-art prediction methods. From these results, we conclude that the proposed model is a promising tool for depression diagnosis.
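A minimal sketch of the core idea: build a graph whose nodes are audio features, with edges where pairwise correlation is strong, and apply one graph-convolution step (normalized adjacency times node features times weights). The toy data, correlation threshold, node statistics, and weight shapes are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def gcn_layer(X, A, W):
    """One graph-convolution step: D^-1/2 (A + I) D^-1/2 X W, then ReLU."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d = A_hat.sum(axis=1)                          # node degrees (>= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)

# Toy data: 120 samples of 8 audio features (e.g. F0, jitter, MFCC stats).
samples = rng.standard_normal((120, 8))

# Edges between features whose absolute correlation exceeds a threshold.
corr = np.corrcoef(samples, rowvar=False)
A = (np.abs(corr) > 0.1).astype(float)
np.fill_diagonal(A, 0.0)

# Node features: each feature node described by simple per-feature statistics.
X = np.stack([samples.mean(axis=0), samples.std(axis=0)], axis=1)  # (8, 2)
W = rng.standard_normal((2, 4)) * 0.1

H = gcn_layer(X, A, W)
print(H.shape)
```

The point of the graph structure is that each feature's representation is updated using its correlated neighbors, which a plain fully connected layer over independent features cannot express.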
3. Ishimaru M, Okada Y, Uchiyama R, Horiguchi R, Toyoshima I. Classification of Depression and Its Severity Based on Multiple Audio Features Using a Graphical Convolutional Neural Network. Int J Environ Res Public Health 2023; 20:1588. [PMID: 36674342] [PMCID: PMC9864471] [DOI: 10.3390/ijerph20021588]
Abstract
Audio features are physical features that reflect single or complex coordinated movements of the vocal organs; in speech-based automatic depression classification, it is therefore critical to consider the relationships among audio features. Here, we propose a deep learning-based classification model that discriminates depression and its severity using the correlation among audio features. The model represents the correlation between audio features as graph structures and learns speech characteristics using a graph convolutional neural network. We conducted classification experiments in which the same subjects were allowed to appear in both the training and test data (Setting 1) and in which the subjects in the training and test data were completely separated (Setting 2). The classification accuracy in Setting 1 significantly outperformed existing state-of-the-art methods, whereas that in Setting 2, which has not been reported in existing studies, was much lower than in Setting 1. We conclude that the proposed model is an effective tool for discriminating recurring patients and their severities, but that it has difficulty detecting new depressed patients. For practical application, depression-specific speech regions that appear locally, rather than the entire speech of depressed patients, should be detected and assigned the appropriate class labels.
Affiliation(s)
- Momoko Ishimaru
- Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan
- Yoshifumi Okada
- College of Information and Systems, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan
- Ryunosuke Uchiyama
- Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan
- Ryo Horiguchi
- Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan
- Itsuki Toyoshima
- Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan
4. Kumar MR, Vekkot S, Lalitha S, Gupta D, Govindraj VJ, Shaukat K, Alotaibi YA, Zakariah M. Dementia Detection from Speech Using Machine Learning and Deep Learning Architectures. Sensors (Basel) 2022; 22:9311. [PMID: 36502013] [PMCID: PMC9740675] [DOI: 10.3390/s22239311]
Abstract
Dementia affects the patient's memory and leads to language impairment. Research has demonstrated that speech and language deterioration is often a clear indication of dementia and plays a crucial role in the recognition process. Although earlier studies have used speech features to recognize subjects suffering from dementia, those features are usually combined with linguistic features obtained from transcriptions. This study instead explores significant standalone speech features for recognizing dementia. The primary contribution of this work is to identify a compact set of speech features that aid the dementia recognition process; the secondary contribution is to leverage machine learning (ML) and deep learning (DL) models for the recognition task. Speech samples from the Pitt corpus in DementiaBank are utilized for the present study, and a critical speech feature set of prosodic, voice-quality, and cepstral features is proposed for the task. The experimental results demonstrate the superiority of machine learning (87.6%) over deep learning (85%) models for recognizing dementia using the compact speech feature combination, along with lower time and memory consumption. The results obtained with the proposed approach are promising compared with existing work on dementia recognition from speech.
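As an illustration of one feature family the abstract names (cepstral features), here is a minimal numpy sketch of a real cepstrum computed from a single speech frame. The frame length, synthetic signal, and log-floor epsilon are illustrative assumptions; production systems typically extract mel-frequency cepstral coefficients with a dedicated audio library rather than this plain cepstrum.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one windowed speech frame:
    inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)     # floor avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))

# Synthetic 25 ms frame at 16 kHz: a 150 Hz "voiced" tone plus noise.
sr, n = 16000, 400
t = np.arange(n) / sr
frame = (np.sin(2 * np.pi * 150 * t)
         + 0.05 * np.random.default_rng(2).standard_normal(n))

ceps = real_cepstrum(frame)
print(ceps.shape)
```

Low-order cepstral coefficients summarize the vocal-tract envelope, which is why they are informative for speech-based clinical screening tasks like this one.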
Affiliation(s)
- M. Rupesh Kumar
- Department of Electronics & Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India
- Susmitha Vekkot
- Department of Electronics & Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India
- S. Lalitha
- Department of Electronics & Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India
- Deepa Gupta
- Department of Computer Science & Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India
- Varasiddhi Jayasuryaa Govindraj
- Department of Electronics & Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India
- Kamran Shaukat
- School of Information and Physical Sciences, The University of Newcastle, Newcastle 2300, Australia
- Yousef Ajami Alotaibi
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia
- Mohammed Zakariah
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia
5. Pandey SK, Shekhawat HS, Prasanna SRM, Bhasin S, Jasuja R. A deep tensor-based approach for automatic depression recognition from speech utterances. PLoS One 2022; 17:e0272659. [PMID: 35951508] [PMCID: PMC9371305] [DOI: 10.1371/journal.pone.0272659]
Abstract
Depression is one of the significant mental health issues affecting all age groups globally. While it is widely recognized as one of the major disease burdens in populations, complexities in definitive diagnosis present a major challenge. Usually, trained psychologists rely on conventional methods, including individualized interview assessment and manually administered PHQ-8 scoring. However, heterogeneity in symptomatic presentations, which span somatic to affective complaints, imparts substantial subjectivity to the diagnosis. Diagnostic accuracy is further compounded by the cross-sectional nature of sporadic assessments during physician-office visits, especially since depressive symptoms and severity may evolve over time. With the widespread acceptance of smart wearable devices and smartphones, passive monitoring of depression traits using behavioral signals such as speech presents a unique opportunity for companion diagnostics that assist trained clinicians in objective assessment over time. We therefore propose a framework for automated depression classification that leverages alterations in speech patterns in the well-documented and extensively studied DAIC-WOZ depression dataset. This novel tensor-based approach requires a substantially simpler implementation architecture and extracts discriminative features for depression recognition with a high F1 score and accuracy. We posit that such algorithms, which use significantly less compute, would allow effective onboard deployment in wearables for improved diagnostic accuracy and real-time monitoring of depressive disorders.
Affiliation(s)
- Sandeep Kumar Pandey
- Electronics and Electrical Engineering Dept, Indian Institute of Technology Guwahati, Assam, India
- Hanumant Singh Shekhawat
- Electronics and Electrical Engineering Dept, Indian Institute of Technology Guwahati, Assam, India
- S. R. M. Prasanna
- Electrical Engineering Dept, Indian Institute of Technology Dharwad, Dharwad, Karnataka, India
- Shalendar Bhasin
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States of America
- Ravi Jasuja
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States of America
- Function promoting Therapies, Waltham, MA, United States of America
6. Cognitive Computing in Mental Healthcare: a Review of Methods and Technologies for Detection of Mental Disorders. Cognit Comput 2022. [DOI: 10.1007/s12559-022-10042-2]
7. A textual-based featuring approach for depression detection using machine learning classifiers and social media texts. Comput Biol Med 2021; 135:104499. [PMID: 34174760] [DOI: 10.1016/j.compbiomed.2021.104499]
Abstract
Depression is one of the leading causes of suicide worldwide. However, a large percentage of cases of depression go undiagnosed and, thus, untreated. Previous studies have found that messages posted by individuals with major depressive disorder on social media platforms can be analysed to predict whether they are suffering, or likely to suffer, from depression. This study aims to determine whether machine learning can effectively detect signs of depression in social media users by analysing their posts, especially when those messages do not explicitly contain keywords such as 'depression' or 'diagnosis'. To this end, we investigate several text-preprocessing and textual-based featuring methods along with machine learning classifiers, including single and ensemble models, to propose a generalised approach for depression detection using social media texts. We first use two public, labelled Twitter datasets to train and test the machine learning models, and then three non-Twitter depression-class-only datasets (sourced from Facebook, Reddit, and an electronic diary) to test the performance of the trained models against other social media sources. Experimental results indicate that the proposed approach can effectively detect depression via social media texts even when the training datasets do not contain those specific keywords and when unrelated datasets are used for testing.
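A minimal sketch of the textual featuring step this abstract describes: turning posts into TF-IDF vectors that a downstream classifier can consume. The tiny corpus and whitespace tokenization are illustrative assumptions; the study's actual preprocessing pipeline and classifiers are not reproduced here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute smoothed TF-IDF vectors for a list of tokenized documents."""
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    # Smoothed inverse document frequency for each vocabulary word.
    idf = {w: math.log((1 + n) / (1 + sum(w in doc for doc in docs))) + 1
           for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[w] / len(doc) * idf[w] if w in counts else 0.0
                        for w in vocab])
    return vocab, vectors

# Hypothetical posts standing in for labelled social media data.
posts = [
    "i feel so tired and empty today".split(),
    "great day at the beach with friends".split(),
    "cannot sleep again feel hopeless".split(),
]
vocab, X = tfidf_vectors(posts)
print(len(vocab), len(X), len(X[0]))
```

Weighting terms by rarity rather than raw counts is what lets a classifier pick up on indirect signals when explicit keywords like 'depression' are absent.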