1
Zhang X, Zhang X, Chen W, Li C, Yu C. Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments. Sci Rep 2024; 14:9543. [PMID: 38664511] [PMCID: PMC11045867] [DOI: 10.1038/s41598-024-60278-1]
Abstract
Depression, a pervasive global mental disorder, profoundly impacts daily lives. Despite numerous deep learning studies focused on depression detection through speech analysis, the shortage of annotated bulk samples hampers the development of effective models. In response to this challenge, our research introduces a transfer learning approach for detecting depression in speech, aiming to overcome constraints imposed by limited resources. In the context of feature representation, we obtain depression-related features by fine-tuning wav2vec 2.0. By integrating 1D-CNN and attention pooling structures, we generate advanced features at the segment level, thereby enhancing the model's capability to capture temporal relationships within audio frames. In the realm of prediction results, we integrate LSTM and self-attention mechanisms. This incorporation assigns greater weights to segments associated with depression, thereby augmenting the model's discernment of depression-related information. The experimental results indicate that our model has achieved impressive F1 scores, reaching 79% on the DAIC-WOZ dataset and 90.53% on the CMDC dataset. It outperforms recent baseline models in the field of speech-based depression detection. This provides a promising solution for effective depression detection in low-resource environments.
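The attention-pooling step this abstract describes (collapsing frame-level wav2vec-style features into one segment-level vector) can be illustrated with a minimal numpy sketch; the feature dimensions, random frames, and scoring vector `w` are illustrative assumptions, not the paper's trained model:

```python
import numpy as np

def attention_pool(frames, w):
    """Collapse frame-level features (T, D) into one segment vector (D,).

    Scores each frame with a scoring vector w, softmax-normalizes the
    scores, and returns the attention-weighted mean of the frames.
    """
    scores = frames @ w                      # (T,) one score per frame
    scores = scores - scores.max()           # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ frames                    # (D,) weighted average

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))   # 50 frames of 8-dim wav2vec-like features
w = rng.normal(size=8)              # illustrative (would be learned in practice)
segment = attention_pool(frames, w)
```

With `w = 0` the weights are uniform and the pooling reduces to a plain mean, which is why attention pooling is a strict generalization of average pooling.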
Affiliation(s)
- Xu Zhang
- School of Software Engineering, Xiamen University of Technology, Xiamen, 361024, China
- Xiangcheng Zhang
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, China
- Weisi Chen
- School of Software Engineering, Xiamen University of Technology, Xiamen, 361024, China
- Chenlong Li
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, China
- Chengyuan Yu
- School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 330045, China
2
Li G, Zarei MA, Alibakhshi G, Labbafi A. Teachers and educators' experiences and perceptions of artificial-powered interventions for autism groups. BMC Psychol 2024; 12:199. [PMID: 38605422] [PMCID: PMC11010416] [DOI: 10.1186/s40359-024-01664-2]
Abstract
BACKGROUND Artificial intelligence-powered interventions have emerged as promising tools to support autistic individuals. However, more research is needed to examine how teachers and educators perceive and experience these AI systems when implemented. OBJECTIVES The first objective was to investigate informants' perceptions and experiences of AI-empowered interventions for children with autism. In particular, the study explores the informants' perceived benefits and challenges of using AI-empowered interventions and their recommendations for avoiding the perceived challenges. METHODOLOGY A qualitative phenomenological approach was used. Twenty educators and parents with experience implementing AI interventions for autism were recruited through purposive sampling. Semi-structured and focus-group interviews were conducted, transcribed verbatim, and analyzed using thematic analysis. FINDINGS The analysis identified four major themes: perceived benefits of AI interventions, implementation challenges, needed support, and recommendations for improvement. Benefits included increased engagement and personalized learning. Challenges included technology issues, training needs, and data privacy concerns. CONCLUSIONS AI-powered interventions show potential to improve autism support, but significant challenges must be addressed to ensure effective implementation from an educator's perspective. The benefits of personalized learning and student engagement demonstrate the potential value of these technologies. With adequate training, technical support, and measures to ensure data privacy, educators are likely to find integrating AI systems into their daily practices easier. IMPLICATIONS To realize the full benefits of AI for autism, developers must work closely with educators to understand their needs, optimize implementation, and build trust through transparent privacy policies and procedures. With proper support, AI interventions can transform how autistic individuals are educated by tailoring instruction to each student's unique profile and needs.
Affiliation(s)
- Guang Li
- School of History, Capital Normal University, Beijing, China
- Akram Labbafi
- PhD Candidate of English Language Teaching, Maraghe Branch, Islamic Azad University, Tehran, Iran
3
Xu X, Li J, Zhu Z, Zhao L, Wang H, Song C, Chen Y, Zhao Q, Yang J, Pei Y. A Comprehensive Review on Synergy of Multi-Modal Data and AI Technologies in Medical Diagnosis. Bioengineering (Basel) 2024; 11:219. [PMID: 38534493] [DOI: 10.3390/bioengineering11030219]
Abstract
Disease diagnosis represents a critical and arduous endeavor within the medical field. Artificial intelligence (AI) techniques, spanning from machine learning and deep learning to large model paradigms, stand poised to significantly augment physicians in rendering more evidence-based decisions, thus presenting a pioneering solution for clinical practice. Traditionally, the amalgamation of diverse medical data modalities (e.g., image, text, speech, genetic data, physiological signals) is imperative to facilitate a comprehensive disease analysis, a topic of burgeoning interest among both researchers and clinicians in recent times. Hence, there exists a pressing need to synthesize the latest strides in multi-modal data and AI technologies in the realm of medical diagnosis. In this paper, we narrow our focus to five specific disorders (Alzheimer's disease, breast cancer, depression, heart disease, epilepsy), elucidating advanced endeavors in their diagnosis and treatment through the lens of artificial intelligence. Our survey not only delineates detailed diagnostic methodologies across varying modalities but also underscores commonly utilized public datasets, the intricacies of feature engineering, prevalent classification models, and envisaged challenges for future endeavors. In essence, our research endeavors to contribute to the advancement of diagnostic methodologies, furnishing invaluable insights for clinical decision making.
Affiliation(s)
- Xi Xu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jianqiang Li
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Zhichao Zhu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Linna Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Huina Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Changwei Song
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Yining Chen
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Qing Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jijiang Yang
- Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Yan Pei
- School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan
4
Han MM, Li XY, Yi XY, Zheng YS, Xia WL, Liu YF, Wang QX. Automatic recognition of depression based on audio and video: A review. World J Psychiatry 2024; 14:225-233. [PMID: 38464777] [PMCID: PMC10921287] [DOI: 10.5498/wjp.v14.i2.225]
Abstract
Depression is a common mental health disorder. With current depression detection methods, specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment. Non-biological markers-typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression-have not been effectively utilized. Specialized physicians usually require extensive training and experience to capture changes in these features. Advancements in deep learning technology have provided technical support for capturing non-biological markers. Several researchers have proposed automatic depression estimation (ADE) systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening. This article summarizes commonly used public datasets and recent research on audio- and video-based ADE based on three perspectives: Datasets, deficiencies in existing research, and future development directions.
Affiliation(s)
- Meng-Meng Han
- Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
- Xing-Yun Li
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan 250353, Shandong Province, China
- Xin-Yu Yi
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong Province, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan 250353, Shandong Province, China
- Yun-Shao Zheng
- Department of Ward Two, Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
- Wei-Li Xia
- Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
- Ya-Fei Liu
- Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
- Qing-Xiang Wang
- Shandong Mental Health Center, Shandong University, Jinan 250014, Shandong Province, China
5
Han J, Li H, Lin H, Wu P, Wang S, Tu J, Lu J. Depression prediction based on LassoNet-RNN model: A longitudinal study. Heliyon 2023; 9:e20684. [PMID: 37842633] [PMCID: PMC10570602] [DOI: 10.1016/j.heliyon.2023.e20684]
Abstract
Depression has become a widespread health concern. Understanding its influencing factors can promote mental health and provide a basis for exploring preventive measures. Combining LassoNet with a recurrent neural network (RNN), this study constructed a screening model, LassoNet-RNN, for identifying influencing factors of individual depression. Based on multi-wave surveys from the China Health and Retirement Longitudinal Study (CHARLS) dataset (11,661 observations), we analyzed the multivariate time series data and identified 27 characteristic variables selected from four perspectives: demographics, health-related risk factors, household economic status, and living environment. Additionally, we obtained importance rankings of the characteristic variables. These results offer insightful recommendations for theoretical development and practical decision making in public health.
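LassoNet itself couples a neural network with an L1-penalized linear skip connection; the plain lasso component that drives its feature selection can be sketched with cyclic coordinate descent. The synthetic data, penalty, and "true" coefficients below are illustrative assumptions, not CHARLS variables:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, y, lam, sweeps=200):
    """Lasso regression by cyclic coordinate descent.

    Minimizes (1/2n)||y - Xw||^2 + lam * ||w||_1; coefficients driven to
    zero drop their features, yielding a sparse importance ranking.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(sweeps):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]      # residual excluding feature j
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return w

# synthetic survey-style data: only features 0, 2, and 5 actually matter
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.5 * X[:, 5] + 0.1 * rng.normal(size=500)
w_hat = lasso_cd(X, y, lam=0.05)
ranking = np.argsort(-np.abs(w_hat))            # feature-importance order
```

Ranking features by the magnitude of the surviving coefficients mirrors, in spirit, the variable-importance lists the study reports.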
Affiliation(s)
- Jiatong Han
- School of Computer Science, Nanjing Audit University, China
- Hao Li
- School of Computer Science, Nanjing Audit University, China
- Han Lin
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit, Nanjing Audit University, China
- Pingping Wu
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit, Nanjing Audit University, China
- Shidan Wang
- School of Computer Science, Nanjing Audit University, China
- Juan Tu
- Key Laboratory of Modern Acoustics (MOE), School of Physics, Nanjing University, China
- Jing Lu
- Key Laboratory of Modern Acoustics (MOE), School of Physics, Nanjing University, China
6
Yang W, Liu J, Cao P, Zhu R, Wang Y, Liu JK, Wang F, Zhang X. Attention guided learnable time-domain filterbanks for speech depression detection. Neural Netw 2023; 165:135-149. [PMID: 37285730] [DOI: 10.1016/j.neunet.2023.05.041]
Abstract
Depression, as a global mental health problem, lacks effective screening methods that can support early detection and treatment. This paper aims to facilitate the large-scale screening of depression by focusing on the speech depression detection (SDD) task. Currently, direct modeling on the raw signal yields a large number of parameters, and existing deep learning-based SDD models mainly use fixed Mel-scale spectral features as input. However, these features are not designed for depression detection, and the manual settings limit the exploration of fine-grained feature representations. In this paper, we learn effective representations of the raw signals from an interpretable perspective. Specifically, we present a joint learning framework with attention-guided learnable time-domain filterbanks for depression classification (DALF), which combines a depression filterbanks feature learning (DFBL) module and a multi-scale spectral attention learning (MSSA) module. DFBL produces biologically meaningful acoustic features by employing learnable time-domain filters, and MSSA guides the learnable filters to better retain useful frequency sub-bands. We collect a new dataset, the Neutral Reading-based Audio Corpus (NRAC), to facilitate research in depression analysis, and we evaluate the performance of DALF on the NRAC and public DAIC-WOZ datasets. The experimental results demonstrate that our method outperforms state-of-the-art SDD methods with an F1 of 78.4% on the DAIC-WOZ dataset. In particular, DALF achieves F1 scores of 87.3% and 81.7% on the two parts of the NRAC dataset. By analyzing the filter coefficients, we find that the most important frequency range identified by our method is 600-700 Hz, which corresponds to the Mandarin vowels /e/ and /ê/ and can be considered an effective biomarker for the SDD task. Taken together, our DALF model provides a promising approach to depression detection.
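A fixed (non-learned) stand-in for a time-domain filterbank applied to the raw waveform can be sketched with windowed-sinc band-pass kernels; DALF's filters are learned end-to-end rather than designed like this, and the band edges, kernel width, and test tone below are illustrative assumptions. The 600-700 Hz band echoes the frequency range the abstract highlights:

```python
import numpy as np

def sinc_bandpass(f_lo, f_hi, sr, width=101):
    """FIR band-pass kernel: difference of two Hamming-windowed sinc low-passes."""
    t = np.arange(width) - (width - 1) / 2
    def lowpass(fc):
        return 2 * fc / sr * np.sinc(2 * fc / sr * t) * np.hamming(width)
    return lowpass(f_hi) - lowpass(f_lo)

def filterbank_energies(signal, bands, sr):
    """Convolve the raw waveform with each band's kernel; return log energies."""
    energies = []
    for f_lo, f_hi in bands:
        y = np.convolve(signal, sinc_bandpass(f_lo, f_hi, sr), mode="same")
        energies.append(np.log(np.mean(y ** 2) + 1e-10))
    return np.array(energies)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 650 * t)            # energy inside the 600-700 Hz band
bands = [(100, 200), (600, 700), (3000, 3500)]
energies = filterbank_energies(tone, bands, sr)
```

A learnable version would parameterize each band's cutoff frequencies and update them by backpropagation, which is the core idea behind time-domain filterbank learning.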
Affiliation(s)
- Wenju Yang
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110819, Liaoning, China
- Jiankang Liu
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110819, Liaoning, China
- Peng Cao
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110819, Liaoning, China
- Rongxin Zhu
- Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, 210096, China
- Yang Wang
- Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, 210096, China
- Jian K Liu
- School of Computing, University of Leeds, Leeds, LS2 9JT, United Kingdom
- Fei Wang
- Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, 210096, China
- Xizhe Zhang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
7
Pan W, Deng F, Wang X, Hang B, Zhou W, Zhu T. Exploring the ability of vocal biomarkers in distinguishing depression from bipolar disorder, schizophrenia, and healthy controls. Front Psychiatry 2023; 14:1079448. [PMID: 37575564] [PMCID: PMC10415910] [DOI: 10.3389/fpsyt.2023.1079448]
Abstract
Background Vocal features have been exploited to distinguish depression from healthy controls. While there have been some claims of success, the degree to which changes in vocal features are specific to depression has not been systematically studied. Hence, we examined the performance of vocal features in differentiating depression from bipolar disorder (BD), schizophrenia, and healthy controls, as well as in pairwise classifications of the three disorders. Methods We sampled 32 BD patients, 106 depression patients, 114 healthy controls, and 20 schizophrenia patients. We extracted i-vectors from Mel-frequency cepstrum coefficients (MFCCs) and built logistic regression models with ridge regularization and 5-fold cross-validation on the training set, then applied the models to the test set. There were seven classification tasks: any disorder versus healthy controls; depression versus healthy controls; BD versus healthy controls; schizophrenia versus healthy controls; depression versus BD; depression versus schizophrenia; and BD versus schizophrenia. Results The area under the curve (AUC) for classifying depression versus BD was 0.5 (F-score = 0.44). For the other comparisons, AUC scores ranged from 0.75 to 0.92, and F-scores ranged from 0.73 to 0.91. The model performance (AUC) for classifying depression versus BD was significantly worse than that for classifying BD versus schizophrenia (corrected p < 0.05); there were no significant differences among the remaining pairwise comparisons of the seven classification tasks. Conclusion Vocal features showed discriminatory potential in classifying depression against healthy controls, as well as between depression and other mental disorders. Future research should systematically examine the mechanisms by which voice features distinguish depression from other mental disorders and develop more sophisticated machine learning models so that voice analysis can better assist clinical diagnosis.
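The classification stage (ridge-regularized logistic regression on vocal features) can be sketched as follows; i-vector extraction is omitted, and the toy Gaussian clusters, learning rate, and penalty are illustrative assumptions rather than the study's actual data or hyperparameters:

```python
import numpy as np

def fit_logreg_ridge(X, y, lam=1.0, lr=0.1, steps=500):
    """Logistic regression with an L2 (ridge) penalty, fit by gradient descent.

    X: (n, d) features (e.g. i-vectors); y: (n,) labels in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # predicted probabilities
        gw = X.T @ (p - y) / len(y) + lam * w        # gradient + ridge term
        gb = np.mean(p - y)
        w -= lr * gw
        b -= lr * gb
    return w, b

def predict(X, w, b):
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) >= 0.5).astype(int)

# toy pairwise task: two Gaussian clusters standing in for two diagnostic groups
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
y = np.repeat([0, 1], 100)
w, b = fit_logreg_ridge(X, y, lam=0.01)
acc = (predict(X, w, b) == y).mean()
```

In practice each of the seven pairwise tasks would get its own model, with the ridge strength tuned by the 5-fold cross-validation the abstract describes.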
Affiliation(s)
- Wei Pan
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Fusong Deng
- Wuhan Wuchang Hospital, Wuchang Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Xianbin Wang
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Bowen Hang
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Wenwei Zhou
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Tingshao Zhu
- Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
8
Du M, Liu S, Wang T, Zhang W, Ke Y, Chen L, Ming D. Depression recognition using a proposed speech chain model fusing speech production and perception features. J Affect Disord 2023; 323:299-308. [PMID: 36462607] [DOI: 10.1016/j.jad.2022.11.060]
Abstract
BACKGROUND The growing number of patients with depression puts great pressure on clinical diagnosis. Audio-based diagnosis is a helpful auxiliary tool for early mass screening. However, current methods consider only speech perception features, ignoring patients' vocal tract changes, which may partly explain their poor recognition performance. METHODS This work proposes a novel machine speech chain model for depression recognition (MSCDR) that can capture text-independent depressive speech representations from the speaker's mouth to the listener's ear to improve recognition performance. In the proposed MSCDR, linear predictive coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features are extracted to describe the processes of speech generation and speech perception, respectively. Then, a one-dimensional convolutional neural network and a long short-term memory network sequentially capture intra- and inter-segment dynamic depressive features for classification. RESULTS We tested the MSCDR on two public datasets with different languages and paradigms, namely, the Distress Analysis Interview Corpus-Wizard of Oz and the Multi-modal Open Dataset for Mental-disorder Analysis. The accuracy of the MSCDR on the two datasets was 0.77 and 0.86, and the average F1 scores were 0.75 and 0.86, better than existing methods. This improvement reveals the complementarity of speech production and perception features in carrying depressive information. LIMITATIONS The sample size was relatively small, which may limit clinical translation to some extent. CONCLUSION These experiments demonstrate the good generalization ability and superiority of the proposed MSCDR and suggest that the vocal tract changes in patients with depression deserve attention in audio-based depression diagnosis.
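The speech-production half of the model rests on linear predictive coding, which fits an all-pole model of the vocal tract. A self-contained LPC implementation via the autocorrelation method (Levinson-Durbin recursion) can be sketched as below, checked against a synthetic AR(1) signal; the order, signal, and seed are illustrative assumptions:

```python
import numpy as np

def lpc(signal, order):
    """LPC via the autocorrelation method (Levinson-Durbin recursion).

    Returns coefficients c[1..order] such that s[n] ~ sum_k c[k] * s[n-k],
    a classic parametric model of the vocal-tract filter.
    """
    n = len(signal)
    r = np.array([signal[:n - k] @ signal[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)   # polynomial A(z) = 1 + a[1] z^-1 + ...
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # residual prediction error
    return -a[1:]                           # predictor coefficients

# sanity check on a synthetic AR(1) signal: s[n] = 0.9 s[n-1] + e[n]
rng = np.random.default_rng(0)
e = rng.normal(size=20000)
s = np.zeros_like(e)
for n in range(1, len(s)):
    s[n] = 0.9 * s[n - 1] + e[n]
coef = lpc(s, 1)
```

For an AR(1) process the order-1 LPC coefficient should recover the generating pole (about 0.9 here), which is a convenient correctness check for the recursion.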
Affiliation(s)
- Minghao Du
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Shuang Liu
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Tao Wang
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Wenquan Zhang
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Yufeng Ke
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Long Chen
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Dong Ming
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China; Lab of Neural Engineering & Rehabilitation, Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin, China
9
A New Regression Model for Depression Severity Prediction Based on Correlation among Audio Features Using a Graph Convolutional Neural Network. Diagnostics (Basel) 2023; 13:727. [PMID: 36832211] [PMCID: PMC9955540] [DOI: 10.3390/diagnostics13040727]
Abstract
Recent studies have revealed mutually correlated audio features in the voices of depressed patients; the voices of these patients can therefore be characterized by the combinatorial relationships among those features. Many deep learning-based methods have been proposed to predict depression severity from audio data, but existing methods assume that the individual audio features are independent. Hence, in this paper, we propose a new deep learning-based regression model that predicts depression severity on the basis of the correlation among audio features. The proposed model is built on a graph convolutional neural network and trains voice characteristics using graph-structured data generated to express the correlation among audio features. We conducted prediction experiments on depression severity using the DAIC-WOZ dataset employed in several previous studies. The proposed model achieved a root mean square error (RMSE) of 2.15, a mean absolute error (MAE) of 1.25, and a symmetric mean absolute percentage error of 50.96%; in terms of RMSE and MAE, it significantly outperformed existing state-of-the-art prediction methods. From these results, we conclude that the proposed model can be a promising tool for depression diagnosis.
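A single graph-convolution layer over a feature-correlation graph, the core operation such a model builds on, can be sketched in numpy; the correlation threshold, feature count, and random embeddings are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    Nodes are audio features; edges encode their pairwise correlation.
    Adding the identity (self-loops) keeps each node's own signal.
    """
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)                      # degrees (>= 1 via self-loops)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 6))            # 200 clips, 6 audio features
corr = np.corrcoef(samples, rowvar=False)      # feature-feature correlation
A = (np.abs(corr) > 0.1).astype(float)         # threshold into an adjacency
np.fill_diagonal(A, 0.0)
X = rng.normal(size=(6, 4))                    # illustrative node embeddings
W = rng.normal(size=(4, 3))                    # illustrative layer weights
H = gcn_layer(X, A, W)
```

Stacking such layers and pooling the node outputs into a scalar would yield the regression head that predicts a severity score.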
10
Eysenbach G, Jang EH, Lee SH, Choi KY, Park JG, Shin HC. Automatic Depression Detection Using Smartphone-Based Text-Dependent Speech Signals: Deep Convolutional Neural Network Approach. J Med Internet Res 2023; 25:e34474. [PMID: 36696160] [PMCID: PMC9909514] [DOI: 10.2196/34474]
Abstract
BACKGROUND Automatic diagnosis of depression based on speech can complement mental health treatment methods in the future. Previous studies have reported that acoustic properties can be used to identify depression. However, few studies have attempted a large-scale differential diagnosis of patients with depressive disorders using the acoustic characteristics of non-English speakers. OBJECTIVE This study proposes a framework for automatic depression detection using large-scale acoustic characteristics based on the Korean language. METHODS We recruited 153 patients who met the criteria for major depressive disorder and 165 healthy controls without current or past mental illness. Participants' voices were recorded on a smartphone while they read predefined text-based sentences. Three approaches were evaluated and compared for detecting depression using data sets with text-dependent read-speech tasks: conventional machine learning models based on acoustic features; a proposed model that trains and classifies log-Mel spectrograms using a deep convolutional neural network (CNN) with a relatively small number of parameters; and models that train and classify log-Mel spectrograms using well-known pretrained networks. RESULTS The proposed CNN model automatically detected depression from the acoustic characteristics of predefined text-based sentence reading, with a highest accuracy of 78.14% on the speech data. Our results show that deep-learned acoustic characteristics outperform both the conventional approach and the pretrained models. CONCLUSIONS Monitoring the mood of patients with major depressive disorder and detecting the consistency of objective descriptions are important research topics. This study suggests that analyzing speech recorded while reading text-dependent sentences could help predict depression status automatically by capturing its characteristic features. Our method is smartphone based, easily accessible, and can contribute to the automatic identification of depressive states.
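The log-Mel spectrogram input shared by all three deep approaches can be computed from scratch as follows; the FFT size, hop length, and mel-band count are common defaults chosen for illustration, not necessarily the study's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    """STFT power spectrum -> triangular mel filterbank -> log."""
    # frame, window, and FFT
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # triangular mel filterbank over FFT bins
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        for b in range(lo, c):
            fb[i, b] = (b - lo) / max(c - lo, 1)   # rising edge
        for b in range(c, hi):
            fb[i, b] = (hi - b) / max(hi - c, 1)   # falling edge
    return np.log(spec @ fb.T + 1e-10)

sr = 16000
t = np.arange(sr) / sr
mel = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr)   # 1 s test tone
```

The resulting (frames x mel-bands) matrix is what a 2D CNN then treats as an image-like input.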
Affiliation(s)
- Eun Hye Jang
- Medical Information Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
- Seung-Hwan Lee
- Clinical Emotion and Cognition Research Laboratory, Inje University, Goyang, Republic of Korea; Department of Psychiatry, Inje University, Ilsan-Paik Hospital, Goyang, Republic of Korea; Bwave Inc, Goyang, Republic of Korea
- Kwang-Yeon Choi
- Department of Psychiatry, College of Medicine, Chungnam National University, Daejeon, Republic of Korea
- Jeon Gue Park
- Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea; Tutorus Labs Inc, Seoul, Republic of Korea
- Hyun-Chool Shin
- Department of Electronics Engineering, Soongsil University, Seoul, Republic of Korea
11
Smart voice recognition based on deep learning for depression diagnosis. Artificial Life and Robotics 2023. [DOI: 10.1007/s10015-023-00852-4]
|
12
|
Chen Y, Ma S, Yang X, Liu D, Yang J. Screening Children's Intellectual Disabilities with Phonetic Features, Facial Phenotype and Craniofacial Variability Index. Brain Sci 2023; 13:brainsci13010155. [PMID: 36672135 PMCID: PMC9857173 DOI: 10.3390/brainsci13010155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2022] [Revised: 12/31/2022] [Accepted: 01/09/2023] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Intellectual Disability (ID) is a kind of developmental deficiency syndrome caused by congenital diseases or postnatal events. Efficient early screening would allow timely intervention, which may improve the condition of patients and enhance their self-care ability. Early screening of ID is typically achieved by clinical interview, which requires in-depth participation of medical professionals and related medical resources. METHODS A new method for screening ID is proposed that analyzes the facial phenotype and phonetic characteristics of young subjects. First, the geometric features of subjects' faces and phonetic features of subjects' voices are extracted from interview videos; then the craniofacial variability index (CVI) is calculated from the geometric features and the risk of ID is given with the measure of CVI. Furthermore, machine learning algorithms are utilized to establish a method for further screening of ID based on facial and phonetic features. RESULTS The proposed method was evaluated using three feature sets: geometric features, CVI features, and phonetic features. The best accuracy achieved was close to 80%. CONCLUSIONS The results using the three feature sets suggest that the proposed method may be applied in a clinical setting in the future after continuous improvement.
Collapse
Affiliation(s)
- Yuhe Chen
- School of Foreign Languages, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Simeng Ma
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
| | - Xiaoyu Yang
- Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430030, China
| | - Dujuan Liu
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Correspondence: (D.L.); (J.Y.)
| | - Jun Yang
- School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
- Correspondence: (D.L.); (J.Y.)
| |
Collapse
|
13
|
Liu Z, Yu H, Li G, Chen Q, Ding Z, Feng L, Yao Z, Hu B. Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection. Front Neurosci 2023; 17:1141621. [PMID: 37034153 PMCID: PMC10076578 DOI: 10.3389/fnins.2023.1141621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Accepted: 03/09/2023] [Indexed: 04/11/2023] Open
Abstract
Introduction As a biomarker of depression, the speech signal has attracted the interest of many researchers because it is easy to collect and non-invasive. However, subjects' speech variation under different scenes and emotional stimuli, the insufficient amount of depression speech data for deep learning, and the variable length of frame-level speech features all affect recognition performance. Methods To address the above problems, this study proposes a multi-task ensemble learning method based on speaker embeddings for depression classification. First, we extract the Mel Frequency Cepstral Coefficients (MFCC), the Perceptual Linear Predictive Coefficients (PLP), and the Filter Bank (FBANK) features from the out-domain dataset (CN-Celeb) and train a Resnet x-vector extractor, a Time Delay Neural Network (TDNN) x-vector extractor, and an i-vector extractor. Then, we extract the corresponding fixed-length speaker embeddings from the depression speech database of the Gansu Provincial Key Laboratory of Wearable Computing. Support Vector Machine (SVM) and Random Forest (RF) classifiers are used to obtain the classification results of the speaker embeddings in nine speech tasks. To make full use of the information from speech tasks with different scenes and emotions, we aggregate the classification results of the nine tasks into new features and then obtain the final classification results using a Multilayer Perceptron (MLP). To take advantage of the complementary effects of different features, Resnet x-vectors based on different acoustic features are fused in the ensemble learning method.
Results Experimental results demonstrate that (1) MFCC-based Resnet x-vectors perform best among the nine speaker embeddings for depression detection; (2) interview speech is better than picture-description speech, and the neutral stimulus is the best among the three emotional valences in the depression recognition task; (3) our multi-task ensemble learning method with MFCC-based Resnet x-vectors can effectively identify depressed patients; (4) in all cases, the combination of MFCC-based Resnet x-vectors and PLP-based Resnet x-vectors in our ensemble learning method achieves the best results, outperforming other studies using the same depression speech database. Discussion Our multi-task ensemble learning method with MFCC-based Resnet x-vectors can effectively fuse the depression-related information of different stimuli, which provides a new approach for depression detection. A limitation of this method is that the speaker embedding extractors were pre-trained on an out-domain dataset. We will consider using an augmented in-domain dataset for pre-training to further improve depression recognition performance.
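The second-stage aggregation described above (per-task classifier scores recombined into new features for a final classifier) can be illustrated with a NumPy-only sketch; the simulated task scores and the logistic-regression aggregator below are stand-ins for the paper's SVM/RF base classifiers and MLP, not its actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_tasks = 200, 9
y = rng.integers(0, 2, n_subjects)  # 0 = control, 1 = depressed

# Stand-in for the per-task base classifiers: each of the nine speech tasks
# yields a noisy, weakly informative probability of depression per subject.
task_probs = np.clip(
    0.5 + 0.2 * (2 * y[:, None] - 1) + 0.25 * rng.standard_normal((n_subjects, n_tasks)),
    0.0, 1.0,
)

# Stage 2: the nine task-level scores become one feature vector per subject;
# a logistic-regression aggregator is fitted by gradient descent.
X = np.hstack([task_probs, np.ones((n_subjects, 1))])  # append a bias column
w = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n_subjects

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
acc = (pred == y).mean()
```

Because each task score is only weakly informative, the aggregator's gain over any single task illustrates why combining tasks with different scenes and emotional stimuli helps.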
Collapse
Affiliation(s)
- Zhenyu Liu
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| | - Huimin Yu
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| | - Gang Li
- Tianshui Third People’s Hospital, Tianshui, China
| | - Qiongqiong Chen
- Second Provincial People’s Hospital of Gansu, Lanzhou, China
- Affiliated Hospital of Northwest Minzu University, Lanzhou, China
| | - Zhijie Ding
- Tianshui Third People’s Hospital, Tianshui, China
| | - Lei Feng
- Department of Psychiatry, Beijing Anding Hospital of Capital Medical University, Beijing, China
| | - Zhijun Yao
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| | - Bin Hu
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- *Correspondence: Bin Hu,
| |
Collapse
|
14
|
Alghowinem S, Gedeon T, Goecke R, Cohn JF, Parker G. Interpretation of Depression Detection Models via Feature Selection Methods. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING 2023; 14:133-152. [PMID: 36938342 PMCID: PMC10019578 DOI: 10.1109/taffc.2020.3035535] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/18/2023]
Abstract
Given the prevalence of depression worldwide and its major impact on society, several studies have employed artificial intelligence modelling to automatically detect and assess depression. However, the interpretation of these models and their cues is rarely discussed in detail in the AI community, though it has received increased attention lately. In this study, we aim to analyse the commonly selected features using a proposed framework of several feature selection methods and their effect on the classification results, which provides an interpretation of the depression detection model. The developed framework aggregates and selects the most promising features for modelling depression detection from 38 feature selection algorithms of different categories. Using three real-world depression datasets, 902 behavioural cues were extracted from speech behaviour, speech prosody, eye movement and head pose. To verify the generalisability of the proposed framework, we applied the entire process to the depression datasets individually and when combined. The results from the proposed framework showed that speech behaviour features (e.g. pauses) are the most distinctive features of the depression detection model. From the speech prosody modality, the strongest feature groups were F0, HNR, formants, and MFCC, while for the eye activity modality they were left-right eye movement and gaze direction, and for the head modality it was yaw head movement. Modelling depression detection using the selected features (even though there are only 9 features) outperformed using all features in all the individual and combined datasets. Our feature selection framework not only provided an interpretation of the model, but also produced higher depression-detection accuracy with a small number of features across varied datasets. This could help reduce the processing time needed to extract features and to create the model.
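The core idea of the framework, aggregating rankings from many feature selection methods and keeping the features with the best combined rank, can be illustrated on synthetic data; the three criteria below are simple stand-ins for the 38 algorithms used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 300, 20
X = rng.standard_normal((n, d))
# Only features 0 and 3 actually drive the (binary) label.
y = (X[:, 0] + 0.8 * X[:, 3] + 0.5 * rng.standard_normal(n) > 0).astype(int)

def rank_scores(scores):
    # Convert scores to ranks: 0 = most promising feature.
    return np.argsort(np.argsort(-scores))

# Three simple selection criteria standing in for the framework's 38 methods.
corr = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)]))
fisher = np.abs(X[y == 1].mean(0) - X[y == 0].mean(0)) / (
    X[y == 1].std(0) + X[y == 0].std(0) + 1e-12
)
mean_diff = np.abs(X[y == 1].mean(0) - X[y == 0].mean(0))

# Aggregate: average rank across criteria; lowest mean rank wins.
mean_rank = np.mean([rank_scores(s) for s in (corr, fisher, mean_diff)], axis=0)
selected = np.argsort(mean_rank)[:2]
```

Averaging ranks rather than raw scores sidesteps the fact that different selection criteria live on incomparable scales.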
Collapse
Affiliation(s)
- Sharifa Alghowinem
- Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA, with Prince Sultan University, Riyadh, Saudi Arabia and with the Australian National University, Canberra, Australia
| | - Tom Gedeon
- Australian National University, Canberra, Australia
| | | | | | | |
Collapse
|
15
|
König A, Tröger J, Mallick E, Mina M, Linz N, Wagnon C, Karbach J, Kuhn C, Peter J. Detecting subtle signs of depression with automated speech analysis in a non-clinical sample. BMC Psychiatry 2022; 22:830. [PMID: 36575442 PMCID: PMC9793349 DOI: 10.1186/s12888-022-04475-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Accepted: 12/14/2022] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Automated speech analysis has gained increasing attention as an aid to diagnosing depression. Most previous studies, however, focused on comparing speech in patients with major depressive disorder to that in healthy volunteers. An alternative may be to associate speech with depressive symptoms in a non-clinical sample, as this may help to find early and sensitive markers in those at risk of depression. METHODS We included n = 118 healthy young adults (mean age: 23.5 ± 3.7 years; 77% women) and asked them to talk about a positive and a negative event in their life. Then, we assessed the level of depressive symptoms with a self-report questionnaire, with scores ranging from 0 to 60. We transcribed the speech data and extracted acoustic as well as linguistic features. We then tested whether individuals below or above the cut-off of clinically relevant depressive symptoms differed in speech features. Next, we predicted whether someone would be below or above that cut-off, as well as the individual scores on the depression questionnaire. Since depression is associated with cognitive slowing and attentional deficits, we finally correlated depression scores with performance in the Trail Making Test. RESULTS In our sample, n = 93 individuals scored below and n = 25 scored above the cut-off for clinically relevant depressive symptoms. Most speech features did not differ significantly between the groups, but individuals above the cut-off spoke more than those below it in both the positive and the negative story. In addition, higher depression scores in that group were associated with slower completion time of the Trail Making Test. We were able to predict with 93% accuracy who would be below or above the cut-off. In addition, we were able to predict the individual depression scores with a low mean absolute error (3.90), with the best performance achieved by a support vector machine.
CONCLUSIONS Our results indicate that even in a sample without a clinical diagnosis of depression, changes in speech relate to higher depression scores. This should be investigated in more detail in the future. In a longitudinal study, it may be tested whether speech features found in our study represent early and sensitive markers for subsequent depression in individuals at risk.
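Predicting individual questionnaire scores from speech features, as above, is a plain regression problem evaluated by mean absolute error; the sketch below uses ridge regression on synthetic features as a stand-in for the study's support vector regressor, so the features, sample size pairing, and error level are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 118, 10  # n mirrors the study's sample size; the features are synthetic
X = rng.standard_normal((n, d))
# Synthetic questionnaire scores clipped to the 0-60 range of the instrument.
scores = np.clip(10 + X @ rng.uniform(-2, 2, d) + rng.standard_normal(n), 0, 60)

# Closed-form ridge regression (a stand-in for the paper's SVM regressor).
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ (scores - scores.mean()))
pred = X @ w + scores.mean()
mae = np.abs(pred - scores).mean()
```

Reporting MAE on the questionnaire's own scale, as the study does, makes the error directly interpretable against the clinical cut-off.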
Collapse
Affiliation(s)
- Alexandra König
- Institut National de Recherche en Informatique Et en Automatique (INRIA), Sophia Antipolis, Stars Team, Valbonne, France
| | | | | | | | | | - Carole Wagnon
- University Hospital of Old Age Psychiatry and Psychotherapy, University of Bern, Bolligenstrasse 111, CH-3000 Bern 60, Switzerland
| | - Julia Karbach
- Department of Psychology, University of Koblenz-Landau, Koblenz, Germany
| | - Caroline Kuhn
- Department of Psychology, Clinical Neuropsychology, University of Saarland, Saarbrücken, Germany
| | - Jessica Peter
- University Hospital of Old Age Psychiatry and Psychotherapy, University of Bern, Bolligenstrasse 111, CH-3000, Bern 60, Switzerland.
| |
Collapse
|
16
|
Francese R, Attanasio P. Emotion detection for supporting depression screening. MULTIMEDIA TOOLS AND APPLICATIONS 2022; 82:12771-12795. [PMID: 36570729 PMCID: PMC9761032 DOI: 10.1007/s11042-022-14290-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 10/14/2022] [Accepted: 12/03/2022] [Indexed: 06/17/2023]
Abstract
Depression is the most prevalent mental disorder in the world. One of the most widely adopted tools for depression screening is the Beck Depression Inventory-II (BDI-II) questionnaire. Patients may minimize or exaggerate their answers. Thus, to further examine the patient's mood while filling in the questionnaire, we propose a mobile application that captures the BDI-II patient's responses together with their images and speech. Deep learning techniques such as Convolutional Neural Networks analyze the patient's audio and image data. The application displays the correlation between the patient's emotional scores and BDI-II scores to the clinician at the end of the questionnaire, indicating the relationship between the patient's emotional state and the depression screening score. We conducted a preliminary evaluation involving clinicians and patients to assess (i) the acceptability of the proposed application for use in clinics and (ii) the patient user experience. The participants were eight clinicians who tried the tool with 21 of their patients. The results seem to confirm the acceptability of the app in clinical practice.
Collapse
Affiliation(s)
- Rita Francese
- Computer Science Department, Università degli Studi di Salerno, Via Giovanni Paolo II, 132, Fisciano, 84084 (SA) Italy
| | | |
Collapse
|
17
|
Barua PD, Vicnesh J, Lih OS, Palmer EE, Yamakawa T, Kobayashi M, Acharya UR. Artificial intelligence assisted tools for the detection of anxiety and depression leading to suicidal ideation in adolescents: a review. Cogn Neurodyn 2022:1-22. [PMID: 36467993 PMCID: PMC9684805 DOI: 10.1007/s11571-022-09904-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Revised: 09/26/2022] [Accepted: 10/17/2022] [Indexed: 11/24/2022] Open
Abstract
Epidemiological studies report high levels of anxiety and depression amongst adolescents. These psychiatric conditions and complex interplays of biological, social and environmental factors are important risk factors for suicidal behaviours and suicide, which show a peak in late adolescence and early adulthood. Although deaths by suicide have fallen globally in recent years, suicide deaths are increasing in some countries, such as the US. Suicide prevention is a challenging global public health problem. Currently, there are no validated clinical biomarkers for diagnosing suicidality, and traditional methods exhibit limitations. Artificial intelligence (AI) is burgeoning in many fields, including the diagnosis of medical conditions. This review paper summarizes recent studies (past 8 years) that employed AI tools for the automated detection of depression and/or anxiety disorder and discusses the limitations and effects of some modalities. The studies assert that AI tools produce promising results and could overcome the limitations of traditional diagnostic methods. Although using AI tools for detecting suicidal ideation has limitations, these are outweighed by the advantages. Thus, this review article also proposes, for future work, extracting a fusion of features such as facial images, speech signals, and visual and clinical history features from deep models for the automated detection of depression and/or anxiety disorder in individuals. This may pave the way for the identification of individuals with suicidal thoughts.
Collapse
Affiliation(s)
- Prabal Datta Barua
- School of Management and Enterprise, University of Southern Queensland, Springfield, Australia
| | - Jahmunah Vicnesh
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore, Singapore
| | - Oh Shu Lih
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore, Singapore
| | - Elizabeth Emma Palmer
- Discipline of Pediatric and Child Health, School of Clinical Medicine, University of New South Wales, Kensington, Australia
- Sydney Children’s Hospitals Network, Sydney, Australia
| | - Toshitaka Yamakawa
- Department of Computer Science and Electrical Engineering, Kumamoto University, Kumamoto, Japan
| | - Makiko Kobayashi
- Department of Computer Science and Electrical Engineering, Kumamoto University, Kumamoto, Japan
| | - Udyavara Rajendra Acharya
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore, Singapore
- School of Science and Technology, Singapore University of Social Sciences, Singapore, Singapore
- Department of Bioinformatics and Medical Engineering, Asia University, Taizhong, Taiwan
- International Research Organization for Advanced Science and Technology (IROAST), Kumamoto University, Kumamoto, Japan
| |
Collapse
|
18
|
Newborn Cry-Based Diagnostic System to Distinguish between Sepsis and Respiratory Distress Syndrome Using Combined Acoustic Features. Diagnostics (Basel) 2022; 12:diagnostics12112802. [PMID: 36428865 PMCID: PMC9689015 DOI: 10.3390/diagnostics12112802] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2022] [Revised: 11/05/2022] [Accepted: 11/11/2022] [Indexed: 11/18/2022] Open
Abstract
Crying is the only means of communication for a newborn baby with its surrounding environment, but it also provides significant information about the newborn's health, emotions, and needs. The cries of newborn babies have long been known as a biomarker for the diagnosis of pathologies. However, to the best of our knowledge, exploring the discrimination of two pathology groups by means of cry signals is unprecedented. Therefore, this study aimed to distinguish septic newborns from those with Neonatal Respiratory Distress Syndrome (RDS) by employing the Machine Learning (ML) methods of Multilayer Perceptron (MLP) and Support Vector Machine (SVM). Furthermore, the cry signal was analyzed from the following two different perspectives: (1) the musical perspective, by studying the spectral feature set of Harmonic Ratio (HR), and (2) the speech processing perspective, using the short-term feature set of Gammatone Frequency Cepstral Coefficients (GFCCs). In order to assess the role of employing features from both short-term and spectral modalities in distinguishing the two pathology groups, they were fused into one feature set named the combined features. The hyperparameters (HPs) of the implemented ML approaches were fine-tuned to fit each experiment. Finally, by normalizing and fusing the features originating from the two modalities, the overall performance of the proposed design was improved across all evaluation measures, achieving accuracies of 92.49% and 95.3% with the MLP and SVM classifiers, respectively. The SVM outperformed the MLP classifier on all evaluation measures presented in this study except the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), which signifies the ability of the proposed design in class separation. The achieved results highlighted the role of combining features from different levels and modalities for a more powerful analysis of the cry signals, as well as of including a neural network (NN)-based classifier.
Consequently, attaining a 95.3% accuracy for the separation of two entangled pathology groups of RDS and sepsis elucidated the promising potential for further studies with larger datasets and more pathology groups.
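The normalize-then-fuse step credited with the performance gain above can be sketched in a few lines; the feature dimensions below are illustrative placeholders, not those of the actual HR and GFCC sets.

```python
import numpy as np

def zscore(F):
    # Normalize each feature column to zero mean and unit variance.
    return (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)

rng = np.random.default_rng(2)
n = 50  # number of cry recordings (illustrative)
spectral = rng.uniform(0, 1, (n, 4))        # stand-in for Harmonic Ratio statistics
short_term = rng.uniform(-100, 100, (n, 13))  # stand-in for GFCC summary features

# Normalizing each modality before concatenation keeps the large-scale
# cepstral features from dominating the small-scale spectral ones in a
# downstream distance-based classifier such as an SVM.
combined = np.hstack([zscore(spectral), zscore(short_term)])
```

Without the per-modality z-scoring, the cepstral columns (spanning hundreds of units) would swamp the spectral columns (spanning fractions of a unit) in any kernel or distance computation.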
Collapse
|
19
|
Dhelim S, Chen L, Ning H, Nugent C. Artificial intelligence for suicide assessment using Audiovisual Cues: a review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10290-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
20
|
Malhotra A, Jindal R. Deep learning techniques for suicide and depression detection from online social media: A scoping review. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
21
|
Zlatintsi A, Filntisis PP, Garoufis C, Efthymiou N, Maragos P, Menychtas A, Maglogiannis I, Tsanakas P, Sounapoglou T, Kalisperakis E, Karantinos T, Lazaridi M, Garyfalli V, Mantas A, Mantonakis L, Smyrnis N. E-Prevention: Advanced Support System for Monitoring and Relapse Prevention in Patients with Psychotic Disorders Analyzing Long-Term Multimodal Data from Wearables and Video Captures. SENSORS (BASEL, SWITZERLAND) 2022; 22:7544. [PMID: 36236643 PMCID: PMC9572170 DOI: 10.3390/s22197544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 09/23/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
Wearable technologies and digital phenotyping foster unique opportunities for designing novel intelligent electronic services that can address various well-being issues in patients with mental disorders (i.e., schizophrenia and bipolar disorder), thus having the potential to revolutionize psychiatry and its clinical practice. In this paper, we present e-Prevention, an innovative integrated system for medical support that facilitates effective monitoring and relapse prevention in patients with mental disorders. The technologies offered through e-Prevention include: (i) long-term continuous recording of biometric and behavioral indices through a smartwatch; (ii) video recordings of patients while being interviewed by a clinician, using a tablet; (iii) automatic and systematic storage of these data in a dedicated Cloud server; and (iv) the ability of relapse detection and prediction. This paper focuses on the description of the e-Prevention system and the methodologies developed for the identification of feature representations that correlate with and can predict psychopathology and relapses in patients with mental disorders. Specifically, we tackle the problem of relapse detection and prediction using Machine and Deep Learning techniques on all collected data. The results are promising, indicating that such predictions could be made, leading eventually to the prediction of psychopathology and the prevention of relapses.
Collapse
Affiliation(s)
- Athanasia Zlatintsi
- School of ECE, National Technical University of Athens, 157 73 Athens, Greece
| | | | - Christos Garoufis
- School of ECE, National Technical University of Athens, 157 73 Athens, Greece
| | - Niki Efthymiou
- School of ECE, National Technical University of Athens, 157 73 Athens, Greece
| | - Petros Maragos
- School of ECE, National Technical University of Athens, 157 73 Athens, Greece
| | - Andreas Menychtas
- Department of Digital Systems, University of Piraeus, 185 34 Pireas, Greece
| | - Ilias Maglogiannis
- Department of Digital Systems, University of Piraeus, 185 34 Pireas, Greece
| | - Panayiotis Tsanakas
- School of ECE, National Technical University of Athens, 157 73 Athens, Greece
| | | | - Emmanouil Kalisperakis
- Laboratory of Cognitive Neuroscience and Sensorimotor Control, University Mental Health, Neurosciences and Precision Medicine Research Institute “COSTAS STEFANIS”, 115 27 Athens, Greece
- 1st Department of Psychiatry, Eginition Hospital, Medical School, National and Kapodistrian University of Athens, 115 28 Athens, Greece
| | - Thomas Karantinos
- Laboratory of Cognitive Neuroscience and Sensorimotor Control, University Mental Health, Neurosciences and Precision Medicine Research Institute “COSTAS STEFANIS”, 115 27 Athens, Greece
| | - Marina Lazaridi
- Laboratory of Cognitive Neuroscience and Sensorimotor Control, University Mental Health, Neurosciences and Precision Medicine Research Institute “COSTAS STEFANIS”, 115 27 Athens, Greece
- 1st Department of Psychiatry, Eginition Hospital, Medical School, National and Kapodistrian University of Athens, 115 28 Athens, Greece
| | - Vasiliki Garyfalli
- Laboratory of Cognitive Neuroscience and Sensorimotor Control, University Mental Health, Neurosciences and Precision Medicine Research Institute “COSTAS STEFANIS”, 115 27 Athens, Greece
- 1st Department of Psychiatry, Eginition Hospital, Medical School, National and Kapodistrian University of Athens, 115 28 Athens, Greece
| | - Asimakis Mantas
- Laboratory of Cognitive Neuroscience and Sensorimotor Control, University Mental Health, Neurosciences and Precision Medicine Research Institute “COSTAS STEFANIS”, 115 27 Athens, Greece
| | - Leonidas Mantonakis
- Laboratory of Cognitive Neuroscience and Sensorimotor Control, University Mental Health, Neurosciences and Precision Medicine Research Institute “COSTAS STEFANIS”, 115 27 Athens, Greece
- 1st Department of Psychiatry, Eginition Hospital, Medical School, National and Kapodistrian University of Athens, 115 28 Athens, Greece
| | - Nikolaos Smyrnis
- Laboratory of Cognitive Neuroscience and Sensorimotor Control, University Mental Health, Neurosciences and Precision Medicine Research Institute “COSTAS STEFANIS”, 115 27 Athens, Greece
- 2nd Department of Psychiatry, University General Hospital “ATTIKON”, Medical School, National and Kapodistrian University of Athens, 124 62 Athens, Greece
| |
Collapse
|
22
|
Depression detection based on nonlinear and linear speech features in I-vector/SVDA framework. Comput Biol Med 2022; 149:105926. [DOI: 10.1016/j.compbiomed.2022.105926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2022] [Revised: 07/07/2022] [Accepted: 07/30/2022] [Indexed: 11/18/2022]
|
23
|
Wu P, Wang R, Lin H, Zhang F, Tu J, Sun M. Automatic depression recognition by intelligent speech signal processing: A systematic survey. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2022. [DOI: 10.1049/cit2.12113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Affiliation(s)
- Pingping Wu
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit Nanjing Audit University Nanjing China
| | - Ruihao Wang
- School of Information Engineering Nanjing Audit University Nanjing China
| | - Han Lin
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit Nanjing Audit University Nanjing China
| | - Fanlong Zhang
- School of Information Engineering Nanjing Audit University Nanjing China
| | - Juan Tu
- Key Laboratory of Modern Acoustics (MOE), School of Physics Nanjing University Nanjing China
| | - Miao Sun
- Faculty of Electrical Engineering, Mathematics & Computer Science Delft University of Technology Delft The Netherlands
| |
Collapse
|
24
|
Kshirsagar PR, Manoharan H, Selvarajan S, Alterazi HA, Singh D, Lee HN. Perception Exploration on Robustness Syndromes With Pre-processing Entities Using Machine Learning Algorithm. Front Public Health 2022; 10:893989. [PMID: 35784247 PMCID: PMC9243559 DOI: 10.3389/fpubh.2022.893989] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2022] [Accepted: 04/27/2022] [Indexed: 11/13/2022] Open
Abstract
The majority of current-generation individuals all around the world are dealing with a variety of health-related issues. The most common cause of health problems has been found to be depression, which is caused by intellectual difficulties. However, most people are unable to recognize such occurrences in themselves, and no procedures for discriminating them from normal people have been created so far. Even some advanced technologies do not support distinct classes of individuals, as language writing skills vary greatly across numerous places, making the central operations cumbersome. As a result, the primary goal of the proposed research is to create a unique model that can detect a variety of diseases in humans, thereby averting a high level of depression. A machine learning method known as the Convolutional Neural Network (CNN) model has been incorporated into this evolutionary process for extracting numerous features in three distinct units. The CNN also detects early-stage problems, since it accepts input in the form of writing and sketching, both of which are turned into images. Furthermore, with this sort of image emotion analysis, ordinary reactions may be easily differentiated, resulting in more accurate prediction results. Characteristics such as reference line, tilt, length, edge, constraint, alignment, separation, and sectors are analyzed to test the usefulness of the CNN for recognizing abnormalities, and the extracted features yield an enhanced value of around 74%, higher than that of conventional models.
Collapse
Affiliation(s)
- Pravin R. Kshirsagar
- Department of Artificial Intelligence, G.H. Raisoni College of Engineering, Nagpur, India
| | - Hariprasath Manoharan
- Department of Electronics and Communication Engineering, Panimalar Institute of Technology, Chennai, India
| | - Shitharth Selvarajan
- Department of Computer Science and Engineering, Kebri Dehar University, Kebri Dehar, Ethiopia
| | - Hassan A. Alterazi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Dilbag Singh
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea
| | - Heung-No Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea
- *Correspondence: Heung-No Lee
| |
Collapse
|
25
|
He L, Tiwari P, Lv C, Wu W, Guo L. Reducing noisy annotations for depression estimation from facial images. Neural Netw 2022; 153:120-129. [DOI: 10.1016/j.neunet.2022.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2021] [Revised: 04/17/2022] [Accepted: 05/25/2022] [Indexed: 11/28/2022]
|
26
|
Abstract
BACKGROUND In this modern era, depression is one of the most prevalent mental disorders, affecting millions of individuals today. The symptoms of depression are heterogeneous and often coincide with those of other disorders such as bipolar disorder, Parkinson's disease, and schizophrenia. It is a serious mental illness that may lead to other health problems if left untreated. Currently, identifying individuals with depression rests entirely on the clinician's expertise and experience. To assist clinicians in identifying the characteristics of and classifying depressed people, researchers in this field have incorporated different types of data modalities and machine learning techniques. This study aims to answer important questions about the trends in publications, data modalities, machine learning models, dataset usage, pre-processing techniques, and feature extraction and selection techniques that are prevalent, and to guide the direction of future research on depression diagnosis. METHODS This systematic review was conducted using a broad range of articles from two major databases: IEEE Xplore and PubMed. Studies from 2011 to April 2021 were retrieved, resulting in a total of 590 articles (53 from IEEE Xplore and 537 from PubMed). Of those, the articles that satisfied the defined inclusion criteria were investigated further. RESULTS A total of 135 articles were identified and analysed for this review. High growth in the number of publications has been observed in recent years, along with significant diversity in the use of data modalities and machine learning classifiers. fMRI data with an SVM classifier was the most popular choice among researchers. In most studies, data scarcity and small sample sizes, particularly for neuroimaging data, are major concerns. Similar data modalities tend to be processed with identical pre-processing tools. This study also provides a statistical analysis of the current framework with respect to modality, machine learning classifier, sample size, and accuracy, applying one-way ANOVA and the Tukey-Kramer test. CONCLUSION The results indicate that an effective fusion of machine learning techniques with a suitable data modality holds promise for assisting clinicians in automatic depression diagnosis.
Collapse
Affiliation(s)
- Sweta Bhadra
- Department of CS & IT, Cotton University, Guwahati, India
| | | |
Collapse
|
27
|
Ravi V, Wang J, Flint J, Alwan A. FrAug: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2022; 2022:6267-6271. [PMID: 35531125 PMCID: PMC9070766 DOI: 10.1109/icassp43922.2022.9746307] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
In this paper, a data augmentation method is proposed for depression detection from speech signals. Samples for data augmentation were created by changing the frame-width and the frame-shift parameters during the feature extraction process. Unlike other data augmentation methods (such as VTLP, pitch perturbation, or speed perturbation), the proposed method does not explicitly change acoustic parameters but rather the time-frequency resolution of frame-level features. The proposed method was evaluated using two different datasets, models, and input acoustic features. For the DAIC-WOZ (English) dataset when using the DepAudioNet model and mel-Spectrograms as input, the proposed method resulted in an improvement of 5.97% (validation) and 25.13% (test) when compared to the baseline. The improvements for the CONVERGE (Mandarin) dataset when using the x-vector embeddings with CNN as the backend and MFCCs as input features were 9.32% (validation) and 12.99% (test). Baseline systems do not incorporate any data augmentation. Further, the proposed method outperformed commonly used data-augmentation methods such as noise augmentation, VTLP, Speed, and Pitch Perturbation. All improvements were statistically significant.
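The frame-rate idea can be sketched outside any toolkit: re-framing the same waveform with different frame-width and frame-shift parameters yields several time-frequency "views" of one recording, each usable as an augmented training sample. This is a minimal numpy illustration, not the authors' implementation; the (width, shift) settings below are invented for the example.

```python
import numpy as np

def frame_signal(x, frame_len, frame_shift):
    """Split waveform x into overlapping frames of frame_len samples,
    advancing frame_shift samples per frame (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    return x[idx]

sr = 16000
x = np.random.default_rng(0).normal(size=sr)  # 1 s of placeholder audio

# Each (frame width, frame shift) pair gives a different time-frequency
# resolution over identical acoustics -- one augmented "view" per setting.
views = []
for width_ms, shift_ms in [(25, 10), (32, 16), (64, 32)]:  # illustrative settings
    frames = frame_signal(x, sr * width_ms // 1000, sr * shift_ms // 1000)
    views.append(frames)
```

Frame-level features (e.g. mel filterbank energies or MFCCs) would then be computed per view, so the model sees the same utterance at several resolutions without any explicit acoustic perturbation.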
Collapse
Affiliation(s)
- Vijay Ravi
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
| | - Jinhan Wang
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
| | - Jonathan Flint
- Dept. of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, USA
| | - Abeer Alwan
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
| |
Collapse
|
28
|
Gupta S, Goel L, Singh A, Prasad A, Ullah MA. Psychological Analysis for Depression Detection from Social Networking Sites. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:4395358. [PMID: 35432513 PMCID: PMC9007657 DOI: 10.1155/2022/4395358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Revised: 02/28/2022] [Accepted: 03/24/2022] [Indexed: 11/23/2022]
Abstract
Rapid technological advancements are altering people's communication styles. With the growth of the Internet, social networks (Twitter, Facebook, Telegram, and Instagram) have become popular forums for people to share their thoughts, psychological behavior, and emotions. Psychological analysis examines text and extracts facts, features, and important information from users' opinions. Researchers working on psychological analysis rely on social networks to detect depression-related behavior and activity. Social networks provide abundant data on the mindset of a person at the onset of depression, such as low sociability and activities such as undergoing medical treatment, a primary emphasis on oneself, and a high rate of activity during the day and night. In this paper, we used five machine learning classifiers for depression detection in tweets: decision trees, K-nearest neighbors, support vector machines, logistic regression, and LSTM. The dataset was collected in two forms, balanced and imbalanced, and oversampling techniques were studied. The results show that the LSTM classification model outperforms the other baseline models in this depression detection healthcare approach for both balanced and imbalanced data.
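Balancing a dataset by oversampling, as mentioned above, can be done at its simplest by randomly duplicating minority-class rows until the classes are equal in size. A minimal numpy sketch (the feature rows and labels are toy stand-ins for vectorized tweets); libraries such as imbalanced-learn offer more elaborate schemes like SMOTE:

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows until every class matches the largest."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    keep = []
    for lbl, cnt in zip(labels, counts):
        idx = np.flatnonzero(y == lbl)
        extra = rng.choice(idx, size=n_max - cnt, replace=True)  # resample with replacement
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

X = np.arange(10).reshape(5, 2)   # 5 toy "tweets" as feature rows
y = np.array([0, 0, 0, 0, 1])     # imbalanced: 4 vs. 1
Xb, yb = random_oversample(X, y)  # both classes now size 4
```

Oversampling is applied to the training split only; duplicating rows before the train/test split would leak copies of test items into training.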
Collapse
Affiliation(s)
- Sonam Gupta
- Department of Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, India
| | - Lipika Goel
- Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India
| | - Arjun Singh
- School of Computing and Information Technology, Manipal University Jaipur, Jaipur, India
| | - Ajay Prasad
- University of Petroleum and Energy Studies, Dehradun, India
| | - Mohammad Aman Ullah
- Department of Computer Science and Engineering, International Islamic University Chittagong, Chittagong, Bangladesh
| |
Collapse
|
29
|
Machine Learning Algorithms for Depression: Diagnosis, Insights, and Research Directions. ELECTRONICS 2022. [DOI: 10.3390/electronics11071111] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
Over the years, stress, anxiety, and modern-day fast-paced lifestyles have had immense psychological effects on people’s minds worldwide. Global technological development in healthcare digitizes copious data, enabling the various forms of human biology to be mapped more accurately than with traditional measuring techniques. Machine learning (ML) has been credited as an efficient approach for analyzing the massive amounts of data in the healthcare domain. ML methodologies are being utilized in mental health to predict the probabilities of mental disorders and, therefore, potential treatment outcomes. This review paper lists the different machine learning algorithms used to detect and diagnose depression. The ML-based depression detection algorithms are categorized into three classes: classification, deep learning, and ensemble. A general model for depression diagnosis involving data extraction, pre-processing, ML classifier training, detection classification, and performance evaluation is presented. Moreover, the paper presents an overview of the objectives and limitations of different research studies in the domain of depression detection, and discusses future research possibilities in the field of depression diagnosis.
Collapse
|
30
|
ENIC: Ensemble and Nature Inclined Classification with Sparse Depiction based Deep and Transfer Learning for Biosignal Classification. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108416] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
|
31
|
Automatic Identification of Emotional Information in Spanish TV Debates and Human–Machine Interactions. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12041902] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
Automatic emotion detection is a very attractive field of research that can help build more natural human–machine interaction systems. However, several issues arise when real scenarios are considered, such as the tendency toward neutrality, which makes it difficult to obtain balanced datasets, or the lack of standards for the annotation of emotional categories. Moreover, the intrinsic subjectivity of emotional information increases the difficulty of obtaining valuable data to train machine learning-based algorithms. In this work, two different real scenarios were tackled: human–human interactions in TV debates and human–machine interactions with a virtual agent. For comparison purposes, an analysis of the emotional information was conducted in both. Thus, a profiling of the speakers associated with each task was carried out. Furthermore, different classification experiments show that deep learning approaches can be useful for detecting speakers’ emotional information, mainly for arousal, valence, and dominance levels, reaching a 0.7 F1-score.
Collapse
|
32
|
Tonn P, Seule L, Degani Y, Herzinger S, Klein A, Schulze N. Evaluation of a Digital Content-free Speech Analysis Tool to Measure Affective Distress in Mental Health (Preprint). JMIR Form Res 2022; 6:e37061. [PMID: 36040767 PMCID: PMC9472064 DOI: 10.2196/37061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 05/08/2022] [Accepted: 05/09/2022] [Indexed: 11/13/2022] Open
Affiliation(s)
- Peter Tonn
- Neuropsychiatric Center of Hamburg, Hamburg, Germany
| | - Lea Seule
- Neuropsychiatric Center of Hamburg, Hamburg, Germany
| | | | | | | | - Nina Schulze
- Neuropsychiatric Center of Hamburg, Hamburg, Germany
| |
Collapse
|
33
|
Birnbaum ML, Abrami A, Heisig S, Ali A, Arenare E, Agurto C, Lu N, Kane JM, Cecchi G. Acoustic and Facial Features From Clinical Interviews for Machine Learning-Based Psychiatric Diagnosis: Algorithm Development. JMIR Ment Health 2022; 9:e24699. [PMID: 35072648 PMCID: PMC8822433 DOI: 10.2196/24699] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/01/2020] [Revised: 04/29/2021] [Accepted: 12/01/2021] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND In contrast to all other areas of medicine, psychiatry is still nearly entirely reliant on subjective assessments such as patient self-report and clinical observation. The lack of objective information on which to base clinical decisions can contribute to reduced quality of care. Behavioral health clinicians need objective and reliable patient data to support effective targeted interventions. OBJECTIVE We aimed to investigate whether reliable inferences-psychiatric signs, symptoms, and diagnoses-can be extracted from audiovisual patterns in recorded evaluation interviews of participants with schizophrenia spectrum disorders and bipolar disorder. METHODS We obtained audiovisual data from 89 participants (mean age 25.3 years; male: 48/89, 53.9%; female: 41/89, 46.1%): individuals with schizophrenia spectrum disorders (n=41), individuals with bipolar disorder (n=21), and healthy volunteers (n=27). We developed machine learning models based on acoustic and facial movement features extracted from participant interviews to predict diagnoses and detect clinician-coded neuropsychiatric symptoms, and we assessed model performance using area under the receiver operating characteristic curve (AUROC) in 5-fold cross-validation. RESULTS The model successfully differentiated between schizophrenia spectrum disorders and bipolar disorder (AUROC 0.73) when aggregating face and voice features. Facial action units including cheek-raising muscle (AUROC 0.64) and chin-raising muscle (AUROC 0.74) provided the strongest signal for men. Vocal features, such as energy in the frequency band 1 to 4 kHz (AUROC 0.80) and spectral harmonicity (AUROC 0.78), provided the strongest signal for women. Lip corner-pulling muscle signal discriminated between diagnoses for both men (AUROC 0.61) and women (AUROC 0.62). 
Several psychiatric signs and symptoms were successfully inferred: blunted affect (AUROC 0.81), avolition (AUROC 0.72), lack of vocal inflection (AUROC 0.71), asociality (AUROC 0.63), and worthlessness (AUROC 0.61). CONCLUSIONS This study represents advancement in efforts to capitalize on digital data to improve diagnostic assessment and supports the development of a new generation of innovative clinical tools by employing acoustic and facial data analysis.
Collapse
Affiliation(s)
- Michael L Birnbaum
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States.,The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States.,The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, United States
| | - Avner Abrami
- Computational Biology Center, IBM Research, Yorktown Heights, NY, United States
| | - Stephen Heisig
- Icahn School of Medicine at Mount Sinai, New York City, NY, United States
| | - Asra Ali
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States.,The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
| | - Elizabeth Arenare
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States.,The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
| | - Carla Agurto
- Computational Biology Center, IBM Research, Yorktown Heights, NY, United States
| | - Nathaniel Lu
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States.,The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States
| | - John M Kane
- Department of Psychiatry, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States.,The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, United States.,The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, United States
| | - Guillermo Cecchi
- Computational Biology Center, IBM Research, Yorktown Heights, NY, United States
| |
Collapse
|
34
|
Artificial Intelligence Enabled Personalised Assistive Tools to Enhance Education of Children with Neurodevelopmental Disorders-A Review. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19031192. [PMID: 35162220 PMCID: PMC8835076 DOI: 10.3390/ijerph19031192] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Revised: 01/07/2022] [Accepted: 01/10/2022] [Indexed: 11/26/2022]
Abstract
Mental disorders (MDs) with onset in childhood or adolescence include neurodevelopmental disorders (NDDs) (intellectual disability and specific learning disabilities, such as dyslexia, attention deficit disorder (ADHD), and autism spectrum disorders (ASD)), as well as a broad range of mental health disorders (MHDs), including anxiety, depressive, stress-related and psychotic disorders. There is a high co-morbidity of NDDs and MHDs. Globally, there have been dramatic increases in the diagnosis of childhood-onset mental disorders, with a 2- to 3-fold rise in prevalence for several MHDs in the US over the past 20 years. Depending on the type of MD, children often grapple with social and communication deficits and difficulties adapting to changes in their environment, which can impact their ability to learn effectively. To improve outcomes for children, it is important to provide timely and effective interventions. This review summarises the range and effectiveness of AI-assisted tools, developed using machine learning models, which have been applied to address learning challenges in students with a range of NDDs. Our review summarises the evidence that AI tools can be successfully used to improve social interaction and supportive education. Based on the limitations of existing AI tools, we provide recommendations for the development of future AI tools with a focus on providing personalised learning for individuals with NDDs.
Collapse
|
35
|
Calić G, Petrović-Lazić M, Mentus T, Babac S. Acoustic features of voice in adults suffering from depression. PSIHOLOSKA ISTRAZIVANJA 2022. [DOI: 10.5937/psistra25-39224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
In order to examine the differences between people suffering from depression (EG, N=18), healthy controls (CG1, N=24), and people with a diagnosed psychogenic voice disorder (CG2, N=9), nine acoustic features of voice were assessed among a total of 51 participants using the MDVP software programme ("Kay Elemetrics" Corp., model 4300). The nine acoustic parameters were analysed on the basis of the sustained phonation of the vowel /a/. The results revealed that the mean values of all acoustic parameters differed in the EG compared to both CG1 and CG2 as follows: the parameters which indicate frequency variability (Jitt, PPQ), amplitude variability (Shim, vAm, APQ), and noise and tremor (NHR, VTI) were higher; only fundamental frequency (F0) and the soft phonation index (SPI) were lower (F0 compared to CG1, and SPI compared to CG1 and CG2). Only the PPQ parameter was not significant. vAm and APQ had the highest discriminant value for depression. The acoustic features of voice analysed in this study, with regard to the sustained phonation of a vowel, were different and discriminant in the EG compared to CG1 and CG2. In voice analysis, the parameters vAm and APQ could potentially serve as markers indicative of depression. The results of this research point to the importance of the voice, that is, its acoustic indicators, in recognizing depression. Important parameters that could help create a programme for the automatic recognition of depression are those from the domain of voice intensity variation.
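For readers unfamiliar with these MDVP-style measures, local jitter and shimmer can be approximated directly from per-cycle pitch periods and peak amplitudes. A rough numpy sketch, assuming the glottal cycles have already been segmented (the toy numbers are invented, and real analysis software applies additional smoothing and voicing checks):

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    as a percentage of the mean period (cf. MDVP Jitt)."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

def local_shimmer(amplitudes):
    """Mean absolute difference between consecutive cycle peak amplitudes,
    as a percentage of the mean amplitude (cf. MDVP Shim)."""
    a = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)

# Toy per-cycle measurements from a sustained /a/ (values invented).
periods = [0.0100, 0.0102, 0.0099, 0.0101]   # seconds per glottal cycle
amps    = [0.80, 0.78, 0.81, 0.79]           # peak amplitude per cycle
jitt, shim = local_jitter(periods), local_shimmer(amps)
```

Higher values on both measures indicate less stable phonation, which is the direction the study reports for the depressed group.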
Collapse
|
36
|
Hajduska-Dér B, Kiss G, Sztahó D, Vicsi K, Simon L. The applicability of the Beck Depression Inventory and Hamilton Depression Scale in the automatic recognition of depression based on speech signal processing. Front Psychiatry 2022; 13:879896. [PMID: 35990073 PMCID: PMC9385975 DOI: 10.3389/fpsyt.2022.879896] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/20/2022] [Accepted: 07/18/2022] [Indexed: 11/25/2022] Open
Abstract
Depression is a growing problem worldwide, impacting an increasing number of patients and also affecting health systems and the global economy. The most common diagnostic rating scales of depression are self-reported or clinician-administered, and they differ in the symptoms they sample. Speech is a promising biomarker in the diagnostic assessment of depression, due to its non-invasiveness and its cost- and time-efficiency. In our study, we aim for a more accurate, sensitive model for determining depression based on speech processing. Regression and classification models were developed using machine learning. During the research, we had access to a large speech database that includes samples from depressed and healthy subjects. The database contains the Beck Depression Inventory (BDI) score of each subject and the Hamilton Rating Scale for Depression (HAMD) score of 20% of the subjects. This provided an opportunity to compare the usefulness of BDI and HAMD for training models for the automatic recognition of depression based on speech signal processing. We found that the estimated values of the acoustic model trained on BDI scores are closer to the HAMD assessment than to the BDI scores, and that the partial use of HAMD scores instead of BDI scores in training improves the accuracy of automatic depression recognition.
Collapse
Affiliation(s)
- Bálint Hajduska-Dér
- Department of Psychiatry and Psychotherapy, Semmelweis University, Budapest, Hungary
| | - Gábor Kiss
- Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
| | - Dávid Sztahó
- Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
| | - Klára Vicsi
- Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
| | - Lajos Simon
- Department of Psychiatry and Psychotherapy, Semmelweis University, Budapest, Hungary
| |
Collapse
|
37
|
Klangpornkun N, Ruangritchai M, Munthuli A, Onsuwan C, Jaisin K, Pattanaseri K, Lortrakul J, Thanakulakkarachai P, Anansiripinyo T, Amornlaksananon A, Laohawee S, Tantibundhit C. Classification of Depression and Other Psychiatric Conditions Using Speech Features Extracted from a Thai Psychiatric and Verbal Screening Test. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:651-656. [PMID: 34891377 DOI: 10.1109/embc46164.2021.9629571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Depression is a common and serious mental illness which negatively affects daily functioning. To prevent the progression of the illness into severe or long-term consequences, early diagnosis is crucial. We developed an automated speech feature analysis application for depression and other psychiatric disorders derived from a newly developed Thai psychiatric and verbal screening test. The screening test includes the Thai versions of the Patient Health Questionnaire-9 (PHQ-9) and the Hamilton Depression Rating Scale (HAM-D), and 32 additional emotion-induced questions. A case-control study was conducted on speech features from 66 participants: 27 had depression (DP), 12 had other psychiatric disorders (OP), and 27 were normal controls (NC). Five-fold cross-validation across 6 settings of 5 classifiers, combining PHQ-9 and HAM-D scores with speech features, was examined. Results showed the highest performance from the multilayer perceptron (MLP) classifier, which yielded 83.33% sensitivity, 91.67% specificity, and 83.33% accuracy, with negative-emotional questions being most effective in classification. The automated speech feature analysis showed promising results for screening patients with depression or other psychiatric disorders. The current application is accessible through a smartphone, making it a feasible and intuitive setup for low-resource countries such as Thailand.
Collapse
|
38
|
Kwon N, Kim S. Depression Severity Detection Using Read Speech with a Divide-and-Conquer Approach. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:633-637. [PMID: 34891373 DOI: 10.1109/embc46164.2021.9629868] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
We propose a divide-and-conquer approach to detecting depression severity using speech. We divide speech features based on their attributes, i.e., acoustic, prosodic, and language features, then fuse them in a modeling stage with fully connected deep neural networks. In experiments with 76 clinically depressed patients (38 severe and 38 moderate in terms of the Montgomery-Asberg Depression Rating Scale (MADRS)), we obtain 78% accuracy, while the patients' self-reported scores classify their status with 79% accuracy.
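The divide-and-conquer fusion described here can be pictured as one small branch per feature group whose outputs are concatenated into a joint classifier. Below is a forward-pass sketch in plain numpy with random, untrained weights; the group names, feature sizes, and layer widths are invented for illustration, whereas the paper's branches are trained, fully connected deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, out_dim=16):
    """One fully connected ReLU layer with random weights (untrained sketch)."""
    w = rng.normal(size=(x.size, out_dim)) * 0.1
    return np.maximum(0.0, x @ w)

# Hypothetical per-attribute feature groups for one utterance.
groups = {
    "acoustic": rng.normal(size=40),
    "prosodic": rng.normal(size=12),
    "language": rng.normal(size=100),
}

# "Divide": each attribute group gets its own branch ...
hidden = [branch(feats) for feats in groups.values()]

# ... "conquer": concatenate branch outputs and classify jointly.
fused = np.concatenate(hidden)            # 3 branches * 16 dims = 48
w_out = rng.normal(size=fused.size) * 0.1
p_severe = 1.0 / (1.0 + np.exp(-(fused @ w_out)))  # severe vs. moderate
```

Keeping the branches separate until the fusion stage lets each feature family be normalized and sized independently before the joint decision.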
Collapse
|
39
|
Prabhu S, Mittal H, Varagani R, Jha S, Singh S. Harnessing emotions for depression detection. Pattern Anal Appl 2021. [DOI: 10.1007/s10044-021-01020-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
40
|
Niu M, Liu B, Tao J, Li Q. A time-frequency channel attention and vectorization network for automatic depression level prediction. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.056] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
41
|
Little B, Alshabrawy O, Stow D, Ferrier IN, McNaney R, Jackson DG, Ladha K, Ladha C, Ploetz T, Bacardit J, Olivier P, Gallagher P, O'Brien JT. Deep learning-based automated speech detection as a marker of social functioning in late-life depression. Psychol Med 2021; 51:1441-1450. [PMID: 31944174 PMCID: PMC8311821 DOI: 10.1017/s0033291719003994] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/06/2019] [Revised: 10/23/2019] [Accepted: 12/13/2019] [Indexed: 11/24/2022]
Abstract
BACKGROUND Late-life depression (LLD) is associated with poor social functioning. However, previous research uses bias-prone self-report scales to measure social functioning and a more objective measure is lacking. We tested a novel wearable device to measure speech that participants encounter as an indicator of social interaction. METHODS Twenty-nine participants with LLD and 29 age-matched controls wore a wrist-worn device continuously for seven days, which recorded their acoustic environment. Acoustic data were automatically analysed using deep learning models that had been developed and validated on an independent speech dataset. Total speech activity and the proportion of speech produced by the device wearer were both detected whilst maintaining participants' privacy. Participants underwent a neuropsychological test battery and clinical and self-report scales to measure severity of depression, general and social functioning. RESULTS Compared to controls, participants with LLD showed poorer self-reported social and general functioning. Total speech activity was much lower for participants with LLD than controls, with no overlap between groups. The proportion of speech produced by the participants was smaller for LLD than controls. In LLD, both speech measures correlated with attention and psychomotor speed performance but not with depression severity or self-reported social functioning. CONCLUSIONS Using this device, LLD was associated with lower levels of speech than controls and speech activity was related to psychomotor retardation. We have demonstrated that speech activity measured by wearable technology differentiated LLD from controls with high precision and, in this study, provided an objective measure of an aspect of real-world social functioning in LLD.
Collapse
Affiliation(s)
- Bethany Little
- Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
| | - Ossama Alshabrawy
- Interdisciplinary Computing and Complex BioSystems (ICOS) group, School of Computing, Newcastle University, Newcastle upon Tyne, UK
- Faculty of Science, Damietta University, New Damietta, Egypt
| | - Daniel Stow
- Institute of Health and Society, Newcastle University, Newcastle upon Tyne, UK
| | - I. Nicol Ferrier
- Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
| | | | - Daniel G. Jackson
- Open Lab, School of Computing, Newcastle University, Newcastle upon Tyne, UK
| | - Karim Ladha
- Open Lab, School of Computing, Newcastle University, Newcastle upon Tyne, UK
| | | | - Thomas Ploetz
- School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, USA
| | - Jaume Bacardit
- Interdisciplinary Computing and Complex BioSystems (ICOS) group, School of Computing, Newcastle University, Newcastle upon Tyne, UK
| | - Patrick Olivier
- Faculty of Information Technology, Monash University, Melbourne, Australia
| | - Peter Gallagher
- Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
| | - John T. O'Brien
- Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- Department of Psychiatry, University of Cambridge, Cambridge, UK
| |
Collapse
|
42
|
Dong Y, Yang X. A hierarchical depression detection model based on vocal and emotional cues. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
43
|
Amir O, Anker SD, Gork I, Abraham WT, Pinney SP, Burkhoff D, Shallom ID, Haviv R, Edelman ER, Lotan C. Feasibility of remote speech analysis in evaluation of dynamic fluid overload in heart failure patients undergoing haemodialysis treatment. ESC Heart Fail 2021; 8:2467-2472. [PMID: 33955187 PMCID: PMC8318440 DOI: 10.1002/ehf2.13367] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2020] [Revised: 03/02/2021] [Accepted: 04/01/2021] [Indexed: 12/02/2022] Open
Abstract
Aims: This study aimed to assess the ability of a voice analysis application to discriminate between wet and dry states in chronic heart failure (CHF) patients undergoing regularly scheduled haemodialysis treatment for volume overload resulting from chronic renal failure.
Methods and results: In this single-centre, observational study, five patients with CHF, peripheral oedema of grade ≥2, and pulmonary congestion-related dyspnoea, undergoing haemodialysis three times per week, recorded five sentences into a standard smartphone/tablet before and after haemodialysis. Further recordings were provided that same noon/early evening and the next morning and evening. Patient weight was measured at the hospital before and after each haemodialysis session. Recordings were analysed by a smartphone application (app) algorithm to compare speech measures (SMs) of utterances collected over time. On average, patients provided recordings throughout 25.8 ± 3.9 dialysis treatment cycles, resulting in a total of 472 recordings. Weight changes of 1.95 ± 0.64 kg were documented during cycles. Median baseline SM prior to dialysis was 0.87 ± 0.17; it rose to 1.07 ± 0.15 by noon following the end of the dialysis session (P = 0.0355) and remained at a similar level until the following morning (P = 0.007). By the evening of the day following dialysis, SMs returned to baseline levels (0.88 ± 0.19). Changes in patient weight immediately after dialysis correlated with SM changes, with the strongest correlation measured on the evening of the dialysis day [slope: −0.40 ± 0.15 (95% confidence interval: −0.71 to −0.10), P = 0.0096].
Conclusions: The fluid-controlled haemodialysis model demonstrated the ability of the app algorithm to identify cyclic changes in SMs reflecting bodily fluid levels. The voice analysis platform bears considerable potential as a harbinger of impending fluid overload in a range of clinical scenarios, which would enhance monitoring and triage efforts, ultimately optimizing remote CHF management.
Affiliation(s)
- Offer Amir: Department of Cardiology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel; Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Stefan D Anker: Department of Cardiology (CVK) and Berlin Institute of Health Center for Regenerative Therapies (BCRT), German Centre for Cardiovascular Research (DZHK) partner site Berlin, Charité-Universitätsmedizin Berlin, Augustenburger Platz, Berlin, D-13353, Germany
- Ittamar Gork: Department of Cardiology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- William T Abraham: Division of Cardiovascular Medicine, The Ohio State University, Columbus, OH, USA
- Elazer R Edelman: Institute for Medical Engineering and Science, MIT, Cambridge, MA, USA
- Chaim Lotan: Department of Cardiology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
44
Xiao Y, Wang T, Deng W, Yang L, Zeng B, Lao X, Zhang S, Liu X, Ouyang D, Liao G, Liang Y. Data mining of an acoustic biomarker in tongue cancers and its clinical validation. Cancer Med 2021; 10:3822-3835. [PMID: 33938165] [PMCID: PMC8178493] [DOI: 10.1002/cam4.3872] [Received: 09/26/2020] [Revised: 01/30/2021] [Accepted: 03/14/2021] Open
Abstract
The promise of speech disorders as biomarkers in clinical examination has been identified in a broad spectrum of neurodegenerative diseases. However, to the best of our knowledge, a validated acoustic marker with established discriminative and evaluative properties has not yet been developed for oral tongue cancers. Here we cross-sectionally collected a screening dataset that included acoustic parameters extracted from three sustained vowels /ɑ/, /i/, /u/ and binary perceptual outcomes from 12 consonant-vowel syllables. We used a support vector machine with a linear kernel on this dataset to identify the formant centralization ratio (FCR) as a dominant predictor of different perceptual outcomes across gender and syllable. The Acoustic analysis, Perceptual evaluation and Quality of Life assessment (APeQoL) was used to validate the FCR in 33 patients with primary resectable oral tongue cancers. Measurements were taken before (pre-op) and four to six weeks after (post-op) surgery. The speech handicap index (SHI), a speech-specific questionnaire, was also administered at these time points. Pre-op correlation analysis within the APeQoL revealed overall consistency and a strong correlation between FCR and SHI scores. FCRs also increased significantly with increasing T classification pre-operatively, especially for women. Longitudinally, the main effects of T classification and the extent of resection, as well as their interaction effects with time (pre-op vs. post-op), on FCRs were all significant. For pre-operative FCR, after merging the two datasets, a cut-off value of 0.970 produced an AUC of 0.861 (95% confidence interval: 0.785-0.938) for T3-4 patients. In sum, this study determined that FCR is an acoustic marker with the potential to detect disease and related speech function in oral tongue cancers. These are preliminary findings that need to be replicated in longitudinal studies and/or larger cohorts.
Affiliation(s)
- Yudong Xiao, Tao Wang, Wei Deng, Le Yang, Bin Zeng, Xiaomei Lao, Sien Zhang, Xiangqi Liu, Daiqiao Ouyang, Guiqing Liao, Yujie Liang: Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
45
Alemayehu D, Hemmings R, Natarajan K, Roychoudhury S. Perspectives on Virtual (Remote) Clinical Trials as the "New Normal" to Accelerate Drug Development. Clin Pharmacol Ther 2021; 111:373-381. [PMID: 33792920] [DOI: 10.1002/cpt.2248] [Received: 01/13/2021] [Accepted: 03/12/2021]
Abstract
Although the digital revolution has transformed many areas of human endeavor, pharmaceutical drug development has been relatively slow to embrace emerging technologies to enhance efficiency and optimize value in clinical trials. The topic has garnered even greater attention in the face of the coronavirus disease 2019 (COVID-19) outbreak, which has caused unprecedented disruption in the conduct of clinical trials and presented considerable challenges and opportunities for clinical trialists and data analysts. In this paper, we highlight virtual or digital clinical trials as viable options to enhance efficiency in drug development and, more importantly, to offer diverse patients easier and more attractive means of participating in clinical trials. Special reference is made to the implications of artificial intelligence and machine-learning tools for trial execution and data acquisition, processing, and analysis in a virtual trial setting. Issues of patient safety, measurement validity, and data integrity are reviewed, and considerations are put forth regarding the mitigation of underlying regulatory and operational barriers.
46
Mohammadi Y, Moradi MH. Prediction of Depression Severity Scores Based on Functional Connectivity and Complexity of the EEG Signal. Clin EEG Neurosci 2021; 52:52-60. [PMID: 33040603] [DOI: 10.1177/1550059420965431]
Abstract
Background: Depression is one of the most common mental disorders and the leading cause of functional disability. This study aims to determine whether functional connectivity and complexity of brain activity can predict the severity of depression (Beck Depression Inventory-II scores).
Methods: Resting-state, eyes-closed EEG data were recorded from 60 depressed patients. A phase synchronization measure was used to estimate functional connectivity between all pairs of EEG channels in the delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), and beta (13-30 Hz) frequency bands. To quantify the local value of functional connectivity, two graph-theory metrics, degree and clustering coefficient (CC), were measured. In addition, Lempel-Ziv complexity (LZC) and fuzzy entropy (FuzzyEn) were used to measure the complexity of the EEG signal.
Results: Correlation analysis revealed a significant negative relationship between the graph metrics and depression severity in the alpha band. This association was strongly positive for the complexity measures in the alpha and delta bands. A linear regression model also predicted depression severity well from EEG features of the alpha band (r = 0.839; P < .0001; root mean square error of 7.69).
Conclusion: We found that the brain activity of patients with depression was related to depression severity: abnormal brain activity reflects an increase in the severity of depression. The presented regression model provides a quantitative prediction of depression severity, which may inform EEG-based assessment and has potential application in the medical treatment of depressive disorder.
Affiliation(s)
- Yousef Mohammadi: Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Islamic Republic of Iran
- Mohammad Hassan Moradi: Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Islamic Republic of Iran
47
Analysis of gender and identity issues in depression detection on de-identified speech. Comput Speech Lang 2021. [DOI: 10.1016/j.csl.2020.101118]
48
Solomon DH, Rudin RS. Digital health technologies: opportunities and challenges in rheumatology. Nat Rev Rheumatol 2020; 16:525-535. [PMID: 32709998] [DOI: 10.1038/s41584-020-0461-x] [Accepted: 06/24/2020]
Abstract
The past decade in rheumatology has seen tremendous innovation in digital health technologies, including the electronic health record, virtual visits, mobile health, wearable technology, digital therapeutics, artificial intelligence and machine learning. The increased availability of these technologies offers opportunities for improving important aspects of rheumatology, including access, outcomes, adherence and research. However, despite its growth in some areas, particularly with non-health-care consumers, digital health technology has not substantially changed the delivery of rheumatology care. This Review discusses key barriers and opportunities to improve application of digital health technologies in rheumatology. Key topics include smart design, voice enablement and the integration of electronic patient-reported outcomes. Smart design involves active engagement with the end users of the technologies, including patients and clinicians through focus groups, user testing sessions and prototype review. Voice enablement using voice assistants could be critical for enabling patients with hand arthritis to effectively use smartphone apps and might facilitate patient engagement with many technologies. Tracking many rheumatic diseases requires frequent monitoring of patient-reported outcomes. Current practice only collects this information sporadically, and rarely between visits. Digital health technology could enable patient-reported outcomes to inform appropriate timing of face-to-face visits and enable improved application of treat-to-target strategies. However, best practice standards for digital health technologies do not yet exist. To achieve the potential of digital health technology in rheumatology, rheumatology professionals will need to be more engaged upstream in the technology design process and provide leadership to effectively incorporate the new tools into clinical care.
Affiliation(s)
- Daniel H Solomon: Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
49
Su C, Xu Z, Pathak J, Wang F. Deep learning in mental health outcome research: a scoping review. Transl Psychiatry 2020; 10:116. [PMID: 32532967] [PMCID: PMC7293215] [DOI: 10.1038/s41398-020-0780-3] [Received: 08/31/2019] [Revised: 02/17/2020] [Accepted: 02/26/2020] Open
Abstract
Mental illnesses, such as depression, are highly prevalent and have been shown to impact an individual's physical health. Recently, artificial intelligence (AI) methods have been introduced to assist mental health providers, including psychiatrists and psychologists, in decision-making based on patients' historical data (e.g., medical records, behavioral data, social media usage, etc.). Deep learning (DL), one of the most recent generations of AI technology, has demonstrated superior performance in many real-world applications ranging from computer vision to healthcare. The goal of this study is to review existing research on applications of DL algorithms in mental health outcome research. Specifically, we first briefly review the state-of-the-art DL techniques. We then survey the literature on DL applications in mental health outcomes. According to the application scenario, we categorize the relevant articles into four groups: diagnosis and prognosis based on clinical data, analysis of genetics and genomics data for understanding mental health conditions, vocal and visual expression data analysis for disease detection, and estimation of the risk of mental illness using social media data. Finally, we discuss challenges in using DL algorithms to improve our understanding of mental health conditions and suggest several promising directions for their application in improving mental health diagnosis and treatment.
Affiliation(s)
- Chang Su, Zhenxing Xu, Jyotishman Pathak, Fei Wang: Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
50