1
Liu L, Liu L, Wafa HA, Tydeman F, Xie W, Wang Y. Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis. J Am Med Inform Assoc 2024:ocae189. [PMID: 39013193 DOI: 10.1093/jamia/ocae189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 06/12/2024] [Accepted: 07/05/2024] [Indexed: 07/18/2024] Open
Abstract
OBJECTIVE This study aims to conduct a systematic review and meta-analysis of the diagnostic accuracy of deep learning (DL) using speech samples in depression. MATERIALS AND METHODS This review included studies reporting diagnostic results of DL algorithms in depression using speech data, published from inception to January 31, 2024, on PubMed, Medline, Embase, PsycINFO, Scopus, IEEE, and Web of Science databases. Pooled accuracy, sensitivity, and specificity were obtained by random-effects models. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to assess the risk of bias. RESULTS A total of 25 studies met the inclusion criteria and 8 of them were used in the meta-analysis. The pooled estimates of accuracy, specificity, and sensitivity for depression detection models were 0.87 (95% CI, 0.81-0.93), 0.85 (95% CI, 0.78-0.91), and 0.82 (95% CI, 0.71-0.94), respectively. When stratified by model structure, the highest pooled diagnostic accuracy was 0.89 (95% CI, 0.81-0.97) in the handcrafted group. DISCUSSION To our knowledge, our study is the first meta-analysis on the diagnostic performance of DL for depression detection from speech samples. All studies included in the meta-analysis used convolutional neural network (CNN) models, posing problems in deciphering the performance of other DL algorithms. The handcrafted model performed better than the end-to-end model in speech depression detection. CONCLUSIONS The application of DL in speech provided a useful tool for depression detection. CNN models with handcrafted acoustic features could help to improve the diagnostic performance. PROTOCOL REGISTRATION The study protocol was registered on PROSPERO (CRD42023423603).
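As a rough illustration of the random-effects pooling mentioned in the abstract above, the following minimal sketch assumes the DerSimonian-Laird estimator applied to untransformed proportions with binomial variances. The abstract does not state which estimator or transformation the authors used, and the per-study accuracies and sample sizes below are invented for illustration, not taken from the review.

```python
import math


def pool_random_effects(estimates, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns (pooled estimate, 95% CI lower, 95% CI upper, tau^2).
    """
    k = len(estimates)
    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance tau^2, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


def proportion_variance(p, n):
    """Binomial variance of a proportion p observed in n samples."""
    return p * (1.0 - p) / n


# Hypothetical per-study accuracies and sample sizes (NOT from the review).
accuracies = [0.80, 0.90, 0.85]
sample_sizes = [100, 150, 120]
variances = [proportion_variance(p, n) for p, n in zip(accuracies, sample_sizes)]

pooled, ci_low, ci_high, tau2 = pool_random_effects(accuracies, variances)
print(f"pooled accuracy = {pooled:.3f} "
      f"(95% CI {ci_low:.3f}-{ci_high:.3f}), tau^2 = {tau2:.4f}")
```

When the studies disagree more than sampling error alone would explain, tau^2 becomes positive, the weights move closer to equal, and the confidence interval widens. Published meta-analyses of proportions often pool on a logit or arcsine scale instead, so this sketch should not be expected to reproduce the intervals reported above.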
Affiliation(s)
- Lidan Liu
  - Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Lu Liu
  - Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Hatem A Wafa
  - Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Florence Tydeman
  - Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Wanqing Xie
  - Department of Intelligent Medical Engineering, School of Biomedical Engineering, Anhui Medical University, Hefei, 230032, China
  - Department of Psychology, School of Mental Health and Psychological Sciences, Anhui Medical University, Hefei, 230032, China
  - Beth Israel Deaconess Medical Center, Harvard Medical School, Harvard University, Boston, MA, 02115, United States
- Yanzhong Wang
  - Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
2
Huang X, Wang F, Gao Y, Liao Y, Zhang W, Zhang L, Xu Z. Depression recognition using voice-based pre-training model. Sci Rep 2024; 14:12734. [PMID: 38830969 DOI: 10.1038/s41598-024-63556-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 05/30/2024] [Indexed: 06/05/2024] Open
Abstract
The early screening of depression is highly beneficial for patients to obtain better diagnosis and treatment. While the effectiveness of utilizing voice data for depression detection has been demonstrated, the issue of insufficient dataset size remains unresolved. Therefore, we propose an artificial intelligence method to effectively identify depression. The wav2vec 2.0 voice-based pre-training model was used as a feature extractor to automatically extract high-quality voice features from raw audio. Additionally, a small fine-tuning network was used as a classification model to output depression classification results. Subsequently, the proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results. Notably, the model demonstrated outstanding performance in binary classification, attaining an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. Similarly, impressive results were obtained in multi-classification, with an accuracy of 0.9481 and an RMSE of 0.3810. This was the first use of the wav2vec 2.0 model for depression recognition, and the model showed strong generalization ability. The method is simple, practical, and applicable, and it can assist doctors in the early screening of depression.
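The abstract above reports both accuracy and RMSE for a classifier. A minimal sketch of how these two metrics can be computed from labels is given below; it assumes RMSE is taken over integer class labels, which the abstract does not confirm, and the label vectors are hypothetical rather than drawn from DAIC-WOZ.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between numeric labels."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical binary labels (1 = depressed, 0 = not depressed).
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

acc = accuracy(y_true, y_pred)
err = rmse(y_true, y_pred)
print(f"accuracy = {acc:.4f}, RMSE = {err:.4f}")
```

With hard 0/1 predictions every error contributes exactly 1 to the squared sum, so RMSE equals the square root of the error rate (1 minus accuracy). For multi-class severity labels the two metrics diverge, because RMSE also penalizes how far a wrong prediction lies from the true class.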
Affiliation(s)
- Xiangsheng Huang
  - School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Fang Wang
  - School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Yuan Gao
  - School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Yilong Liao
  - School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Wenjing Zhang
  - School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Li Zhang
  - School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
- Zhenrong Xu
  - School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China
3
Yang S, Cui L, Wang L, Wang T, You J. Enhancing multimodal depression diagnosis through representation learning and knowledge transfer. Heliyon 2024; 10:e25959. [PMID: 38380046 PMCID: PMC10877283 DOI: 10.1016/j.heliyon.2024.e25959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 01/30/2024] [Accepted: 02/05/2024] [Indexed: 02/22/2024] Open
Abstract
Depression is a complex mental health disorder that presents significant challenges in diagnosis and treatment. This study proposes an innovative approach, leveraging artificial intelligence advancements, to enhance multimodal depression diagnosis. The diagnosis of depression often relies on subjective assessments and clinical interviews, leading to potential biases and inaccuracies. Additionally, integrating diverse data modalities, such as textual, imaging, and audio information, poses technical challenges due to data heterogeneity and high dimensionality. To address these challenges, this paper proposes the RLKT-MDD (Representation Learning and Knowledge Transfer for Multimodal Depression Diagnosis) model framework. Representation learning enables the model to autonomously discover meaningful patterns and features from diverse data sources, surpassing traditional feature engineering methods. Knowledge transfer facilitates the effective transfer of knowledge from related domains, improving the model's performance in depression diagnosis. Furthermore, we analyzed the interpretability of the representation learning process, enhancing the transparency and trustworthiness of the diagnostic process. We extensively experimented with the DAIC-WOZ dataset, a diverse collection of multimodal data from clinical settings, to evaluate our proposed approach. The results demonstrate promising outcomes, indicating significant improvements over conventional diagnostic methods. Our study provides valuable insights into cutting-edge techniques for depression diagnosis, enabling more effective and personalized mental health interventions.
Affiliation(s)
- Shanliang Yang
  - School of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
- Lichao Cui
  - School of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
- Lei Wang
  - School of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
- Tao Wang
  - School of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
- Jiebing You
  - Department of Neurology, Zibo Central Hospital, Zibo, 255036, China
4
Han MM, Li XY, Yi XY, Zheng YS, Xia WL, Liu YF, Wang QX. Automatic recognition of depression based on audio and video: A review. World J Psychiatry 2024; 14:225-233. [PMID: 38464777 PMCID: PMC10921287 DOI: 10.5498/wjp.v14.i2.225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Revised: 12/18/2023] [Accepted: 01/24/2024] [Indexed: 02/06/2024] Open
Abstract
Depression is a common mental health disorder. With current depression detection methods, specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment. Non-biological markers, typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression, have not been effectively utilized. Specialized physicians usually require extensive training and experience to capture changes in these features. Advancements in deep learning technology have provided technical support for capturing non-biological markers. Several researchers have proposed automatic depression estimation (ADE) systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening. This article summarizes commonly used public datasets and recent research on audio- and video-based ADE from three perspectives: datasets, deficiencies in existing research, and future development directions.
Affiliation(s)
- Meng-Meng Han
  - Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
- Xing-Yun Li
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan 250353, Shandong Province, China
- Xin-Yu Yi
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan 250353, Shandong Province, China
- Yun-Shao Zheng
  - Department of Ward Two, Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
- Wei-Li Xia
  - Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
- Ya-Fei Liu
  - Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
- Qing-Xiang Wang
  - Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China