1
Zhang Z, Zhang S, Ni D, Wei Z, Yang K, Jin S, Huang G, Liang Z, Zhang L, Li L, Ding H, Zhang Z, Wang J. Multimodal Sensing for Depression Risk Detection: Integrating Audio, Video, and Text Data. Sensors (Basel) 2024; 24:3714. PMID: 38931497; PMCID: PMC11207438; DOI: 10.3390/s24123714.
Abstract
Depression is a major psychological disorder with a growing impact worldwide. Traditional methods for detecting depression risk, which rely predominantly on psychiatric evaluations and self-assessment questionnaires, are often criticized for their inefficiency and lack of objectivity. Advances in deep learning have paved the way for depression risk detection methods that fuse multimodal data. This paper introduces a novel framework, the Audio, Video, and Text Fusion-Three Branch Network (AVTF-TBN), designed to combine auditory, visual, and textual cues for a comprehensive analysis of depression risk. Our approach comprises three dedicated branches (Audio Branch, Video Branch, and Text Branch), each responsible for extracting salient features from the corresponding modality. These features are then fused through a multimodal fusion (MMF) module, yielding a robust feature vector that feeds into a predictive modeling layer. To support this research, we devised an emotion elicitation paradigm based on two distinct tasks (reading and interviewing) and used it to gather a rich, sensor-based depression risk detection dataset; the sensing equipment, such as cameras, captures the subtle facial expressions and vocal characteristics essential for our analysis. The study investigates the data generated by varying emotional stimuli and evaluates the contribution of each task to emotion evocation. In our experiments, the AVTF-TBN model performed best when data from the two tasks were used together for detection, achieving an F1 score of 0.78, a precision of 0.76, and a recall of 0.81. These results confirm the validity of the paradigm and demonstrate the efficacy of the AVTF-TBN model in detecting depression risk, showcasing the crucial role of sensor-based data in mental health detection.
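The branch-then-fuse design this abstract describes can be illustrated in a few lines. This is a stand-in sketch, not the authors' implementation: each branch is replaced by a fixed random feature vector, fusion is plain concatenation, and the predictive layer is a hypothetical linear-plus-sigmoid scorer.

```python
import math
import random

def make_branch(seed, dim=4):
    """Stand-in for a learned branch: returns a fixed pseudo-random feature vector."""
    rnd = random.Random(seed)
    return [rnd.uniform(-1.0, 1.0) for _ in range(dim)]

def fuse(*branch_outputs):
    """Simplest model-level fusion: concatenate the branch feature vectors."""
    fused = []
    for vec in branch_outputs:
        fused.extend(vec)
    return fused

def predict(fused, weights, bias=0.0):
    """Hypothetical prediction layer: linear score squashed to (0, 1) by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))

audio = make_branch(seed=1)   # placeholder for the Audio Branch output
video = make_branch(seed=2)   # placeholder for the Video Branch output
text = make_branch(seed=3)    # placeholder for the Text Branch output

fused = fuse(audio, video, text)                 # 4 + 4 + 4 = 12-dimensional vector
weights = make_branch(seed=4, dim=len(fused))    # hypothetical learned weights
score = predict(fused, weights)
print(len(fused), 0.0 < score < 1.0)
```

In the real model each branch would be a trained network and the MMF module would be learned jointly; the sketch only shows how three modality-specific feature vectors become one fused representation that a scorer consumes.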
Affiliation(s)
- Zhenwei Zhang: School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China
- Shengming Zhang: Affiliated Mental Health Center, Southern University of Science and Technology, Shenzhen 518055, China
- Dong Ni: School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China
- Zhaoguo Wei: Shenzhen Kangning Hospital, Shenzhen 518020, China; Shenzhen Mental Health Center, Shenzhen 518020, China
- Kongjun Yang: Shenzhen Kangning Hospital, Shenzhen 518020, China; Shenzhen Mental Health Center, Shenzhen 518020, China
- Shan Jin: Shenzhen Kangning Hospital, Shenzhen 518020, China; Shenzhen Mental Health Center, Shenzhen 518020, China
- Gan Huang: School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China
- Zhen Liang: School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China
- Li Zhang: School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China
- Linling Li: School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China
- Huijun Ding: School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China
- Zhiguo Zhang: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China; Peng Cheng Laboratory, Shenzhen 518055, China
- Jianhong Wang: Shenzhen Kangning Hospital, Shenzhen 518020, China; Shenzhen Mental Health Center, Shenzhen 518020, China
2
Shi H, Fan Y, Zhang Y, Li X, Shu Y, Deng X, Zhang Y, Zheng Y, Yang J. Intelligent bell facial paralysis assessment: a facial recognition model using improved SSD network. Sci Rep 2024; 14:12763. PMID: 38834661; DOI: 10.1038/s41598-024-63478-x.
Abstract
With the continuous progress of technology, the life sciences play an increasingly important role, and the application of artificial intelligence in the medical field has attracted growing attention. Bell's facial palsy, a neurological ailment characterized by facial muscle weakness or paralysis, profoundly affects patients' facial expressions and masticatory abilities, inflicting considerable distress on their overall quality of life and mental well-being. In this study, we designed a facial attribute recognition model specifically for individuals with Bell's facial palsy. The model uses an enhanced SSD network and scientific computing to perform a graded assessment of the patient's condition. By replacing the VGG network with a more efficient backbone, we improved the model's accuracy and significantly reduced its computational burden. The results show that the improved SSD network achieves an average precision of 87.9% in classifying mild, moderate, and severe facial palsy and effectively grades patients with facial palsy; the scientific calculations further increase the classification precision. This is one of the most significant contributions of this article, which provides intelligent means and objective data for future research on intelligent diagnosis and treatment as well as progressive rehabilitation.
Affiliation(s)
- Haiping Shi: The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, Anhui, China; Anhui University of Chinese Medicine, Hefei, Anhui, China
- Yinqiu Fan: The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, Anhui, China; Anhui University of Chinese Medicine, Hefei, Anhui, China
- Yu Zhang: The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, Anhui, China; Anhui University of Chinese Medicine, Hefei, Anhui, China
- Xiaowei Li: Anhui University of Chinese Medicine, Hefei, Anhui, China
- Yuling Shu: Anhui University of Chinese Medicine, Hefei, Anhui, China
- Xinyuan Deng: Anhui University of Chinese Medicine, Hefei, Anhui, China
- Yating Zhang: Anhui University of Chinese Medicine, Hefei, Anhui, China
- Yunzi Zheng: Anhui University of Chinese Medicine, Hefei, Anhui, China
- Jun Yang: The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, Anhui, China; Anhui University of Chinese Medicine, Hefei, Anhui, China
3
Xu X, Li J, Zhu Z, Zhao L, Wang H, Song C, Chen Y, Zhao Q, Yang J, Pei Y. A Comprehensive Review on Synergy of Multi-Modal Data and AI Technologies in Medical Diagnosis. Bioengineering (Basel) 2024; 11:219. PMID: 38534493; DOI: 10.3390/bioengineering11030219.
Abstract
Disease diagnosis represents a critical and arduous endeavor within the medical field. Artificial intelligence (AI) techniques, spanning from machine learning and deep learning to large model paradigms, stand poised to significantly augment physicians in rendering more evidence-based decisions, thus presenting a pioneering solution for clinical practice. Traditionally, the amalgamation of diverse medical data modalities (e.g., image, text, speech, genetic data, physiological signals) is imperative to facilitate a comprehensive disease analysis, a topic of burgeoning interest among both researchers and clinicians in recent times. Hence, there exists a pressing need to synthesize the latest strides in multi-modal data and AI technologies in the realm of medical diagnosis. In this paper, we narrow our focus to five specific disorders (Alzheimer's disease, breast cancer, depression, heart disease, epilepsy), elucidating advanced endeavors in their diagnosis and treatment through the lens of artificial intelligence. Our survey not only delineates detailed diagnostic methodologies across varying modalities but also underscores commonly utilized public datasets, the intricacies of feature engineering, prevalent classification models, and envisaged challenges for future endeavors. In essence, our research endeavors to contribute to the advancement of diagnostic methodologies, furnishing invaluable insights for clinical decision making.
Affiliation(s)
- Xi Xu: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jianqiang Li: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Zhichao Zhu: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Linna Zhao: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Huina Wang: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Changwei Song: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Yining Chen: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Qing Zhao: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jijiang Yang: Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Yan Pei: School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan
4
Khoo LS, Lim MK, Chong CY, McNaney R. Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches. Sensors (Basel) 2024; 24:348. PMID: 38257440; PMCID: PMC10820860; DOI: 10.3390/s24020348.
Abstract
As mental health (MH) disorders become increasingly prevalent, their multifaceted symptoms and comorbidities with other conditions introduce complexity to diagnosis, posing a risk of underdiagnosis. While machine learning (ML) has been explored to mitigate these challenges, we hypothesized that multiple data modalities support more comprehensive detection and that non-intrusive collection approaches better capture natural behaviors. To understand the current trends, we systematically reviewed 184 studies to assess feature extraction, feature fusion, and ML methodologies applied to detect MH disorders from passively sensed multimodal data, including audio and video recordings, social media, smartphones, and wearable devices. Our findings revealed varying correlations of modality-specific features in individualized contexts, potentially influenced by demographics and personalities. We also observed the growing adoption of neural network architectures for model-level fusion and as ML algorithms, which have demonstrated promising efficacy in handling high-dimensional features while modeling within and cross-modality relationships. This work provides future researchers with a clear taxonomy of methodological approaches to multimodal detection of MH disorders to inspire future methodological advancements. The comprehensive analysis also guides and supports future researchers in making informed decisions to select an optimal data source that aligns with specific use cases based on the MH disorder of interest.
Affiliation(s)
- Lin Sze Khoo: Department of Human-Centered Computing, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Mei Kuan Lim: School of Information Technology, Monash University Malaysia, Subang Jaya 46150, Malaysia
- Chun Yong Chong: School of Information Technology, Monash University Malaysia, Subang Jaya 46150, Malaysia
- Roisin McNaney: Department of Human-Centered Computing, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
5
Mao K, Wu Y, Chen J. A systematic review on automated clinical depression diagnosis. NPJ Mental Health Research 2023; 2:20. PMID: 38609509; PMCID: PMC10955993; DOI: 10.1038/s44184-023-00040-z.
Abstract
Assessing mental health disorders and determining treatment can be difficult for a number of reasons, including access to healthcare providers. Assessments and treatments may not be continuous and can be limited by the unpredictable nature of psychiatric symptoms. Machine-learning models using data collected in a clinical setting can improve diagnosis and treatment. Studies have used speech, text, and facial expression analysis to identify depression. Still, more research is needed to address challenges such as the need for multimodality machine-learning models for clinical use. We conducted a review of studies from the past decade that utilized speech, text, and facial expression analysis to detect depression, as defined by the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), using the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guideline. We provide information on the number of participants, techniques used to assess clinical outcomes, speech-eliciting tasks, machine-learning algorithms, metrics, and other important discoveries for each study. A total of 544 studies were examined, 264 of which satisfied the inclusion criteria. A database has been created containing the query results and a summary of how different features are used to detect depression. While machine learning shows its potential to enhance mental health disorder evaluations, some obstacles must be overcome, especially the requirement for more transparent machine-learning models for clinical purposes. Considering the variety of datasets, feature extraction techniques, and metrics used in this field, guidelines have been provided to collect data and train machine-learning models to guarantee reproducibility and generalizability across different contexts.
Affiliation(s)
- Kaining Mao: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Yuqi Wu: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Jie Chen: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2R3, Canada
6
Wang JZ, Zhao S, Wu C, Adams RB, Newman MG, Shafir T, Tsachor R. Unlocking the Emotional World of Visual Media: An Overview of the Science, Research, and Impact of Understanding Emotion: Drawing Insights From Psychology, Engineering, and the Arts, This Article Provides a Comprehensive Overview of the Field of Emotion Analysis in Visual Media and Discusses the Latest Research, Systems, Challenges, Ethical Implications, and Potential Impact of Artificial Emotional Intelligence on Society. Proceedings of the IEEE 2023; 111:1236-1286. PMID: 37859667; PMCID: PMC10586271; DOI: 10.1109/JPROC.2023.3273517.
Abstract
The emergence of artificial emotional intelligence technology is revolutionizing the fields of computers and robotics, allowing for a new level of communication and understanding of human behavior that was once thought impossible. While recent advancements in deep learning have transformed the field of computer vision, automated understanding of evoked or expressed emotions in visual media remains in its infancy. This foundering stems from the absence of a universally accepted definition of "emotion," coupled with the inherently subjective nature of emotions and their intricate nuances. In this article, we provide a comprehensive, multidisciplinary overview of the field of emotion analysis in visual media, drawing on insights from psychology, engineering, and the arts. We begin by exploring the psychological foundations of emotion and the computational principles that underpin the understanding of emotions from images and videos. We then review the latest research and systems within the field, accentuating the most promising approaches. We also discuss the current technological challenges and limitations of emotion analysis, underscoring the necessity for continued investigation and innovation. We contend that this represents a "Holy Grail" research problem in computing and delineate pivotal directions for future inquiry. Finally, we examine the ethical ramifications of emotion-understanding technologies and contemplate their potential societal impacts. Overall, this article endeavors to equip readers with a deeper understanding of the domain of emotion analysis in visual media and to inspire further research and development in this captivating and rapidly evolving field.
Affiliation(s)
- James Z Wang: College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA
- Sicheng Zhao: Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing 100084, China
- Chenyan Wu: College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA
- Reginald B Adams: Department of Psychology, The Pennsylvania State University, University Park, PA 16802, USA
- Michelle G Newman: Department of Psychology, The Pennsylvania State University, University Park, PA 16802, USA
- Tal Shafir: Emily Sagol Creative Arts Therapies Research Center, University of Haifa, Haifa 3498838, Israel
- Rachelle Tsachor: School of Theatre and Music, University of Illinois at Chicago, Chicago, IL 60607, USA
7
Ma Y, Shen J, Zhao Z, Liang H, Tan Y, Liu Z, Qian K, Yang M, Hu B. What Can Facial Movements Reveal? Depression Recognition and Analysis Based on Optical Flow Using Bayesian Networks. IEEE Trans Neural Syst Rehabil Eng 2023; 31:3459-3468. PMID: 37581961; DOI: 10.1109/TNSRE.2023.3305351.
Abstract
Recent evidence has demonstrated that facial expressions can be a valid and important cue for depression recognition. Although much has been achieved in automatic depression recognition, it remains a challenge to explore the inherent nuances of facial expressions that might reveal the underlying differences between depressed patients and healthy subjects under different stimuli. There is a lack of an unobtrusive system that monitors depressive patients' mental states in various free-living scenarios, so this paper takes a step toward building a classification model in which data collection, feature extraction, depression recognition, and facial action analysis are conducted to infer the differences in facial movements between depressive patients and healthy subjects. In this study, we first present a scheme for dividing facial regions of interest to extract optical flow features of facial expressions for depression recognition. We then propose facial movement coefficients based on the discrete wavelet transform. Specifically, Bayesian Networks constructed from Pearson correlation coefficients of the wavelet-derived coefficients are learned, which allows movements of different facial regions to be analyzed. We evaluate our method on a clinically validated dataset of 30 depressed patients and 30 healthy control subjects; the experimental results achieve an accuracy of 81.7% and a recall of 96.7%, outperforming the features used for comparison. Most importantly, the Bayesian Networks built on the coefficients under different stimuli may reveal facial action patterns of depressed subjects, which have the potential to assist the automatic diagnosis of depression.
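The correlation step this abstract relies on (relating movement coefficients of different facial regions via Pearson correlation, before learning the Bayesian Network) can be illustrated with a minimal, self-contained sketch. The region names and per-frame values below are hypothetical, and the full pipeline additionally involves optical flow extraction and the discrete wavelet transform, which are omitted here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-frame movement magnitudes for two facial regions of interest.
mouth = [0.10, 0.40, 0.35, 0.80, 0.60]
eyes = [0.20, 0.50, 0.45, 0.90, 0.70]

# A value near +1 means the two regions tend to move together; such pairwise
# coefficients would feed the structure learning of the Bayesian Network.
print(round(pearson(mouth, eyes), 3))
```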
8
Ye J, Yu Y, Fu G, Zheng Y, Liu Y, Zhu Y, Wang Q. Analysis and Recognition of Voluntary Facial Expression Mimicry Based on Depressed Patients. IEEE J Biomed Health Inform 2023; 27:3698-3709. PMID: 37030686; DOI: 10.1109/JBHI.2023.3260816.
Abstract
Many clinical studies have shown that facial expression recognition and cognitive function are impaired in depressed patients. In contrast to spontaneous facial expression mimicry (SFEM), 164 subjects (82 in a case group and 82 in a control group) participated in our voluntary facial expression mimicry (VFEM) experiment using expressions of neutrality, anger, disgust, fear, happiness, sadness, and surprise. Our research proceeds as follows. First, we collected a large amount of subject data for VFEM. Second, we extracted geometric features from the subjects' facial expression images and used Spearman correlation analysis, a random forest, and logistic regression-based recursive feature elimination (LR-RFE) to perform feature selection; the selected features reveal the differences between the case group and the control group. Third, we combined the geometric features with the original images and improved advanced deep learning facial expression recognition (FER) algorithms in different systems, proposing the E-ViT and E-ResNet models based on VFEM, whose accuracies and F1 scores exceed those of the respective baseline models. Our research shows that it is effective to use feature selection to screen geometric features and combine them with a deep learning model for depression facial expression recognition.
9
Kleine AK, Kokje E, Lermer E, Gaube S. Attitudes Toward the Adoption of 2 Artificial Intelligence-Enabled Mental Health Tools Among Prospective Psychotherapists: Cross-sectional Study. JMIR Hum Factors 2023; 10:e46859. PMID: 37436801; PMCID: PMC10372564; DOI: 10.2196/46859.
Abstract
BACKGROUND Despite growing efforts to develop user-friendly artificial intelligence (AI) applications for clinical care, their adoption remains limited because of barriers at the individual, organizational, and system levels. There is limited research on the intention to use AI systems in mental health care. OBJECTIVE This study aimed to address this gap by examining the predictors of psychology students' and early practitioners' intention to use 2 specific AI-enabled mental health tools based on the Unified Theory of Acceptance and Use of Technology. METHODS This cross-sectional study included 206 psychology students and psychotherapists in training to examine the predictors of their intention to use 2 AI-enabled mental health care tools. The first tool provides feedback to the psychotherapist on their adherence to motivational interviewing techniques. The second tool uses patient voice samples to derive mood scores that the therapists may use for treatment decisions. Participants were presented with graphic depictions of the tools' functioning mechanisms before measuring the variables of the extended Unified Theory of Acceptance and Use of Technology. In total, 2 structural equation models (1 for each tool) were specified, which included direct and mediated paths for predicting tool use intentions. RESULTS Perceived usefulness and social influence had a positive effect on the intention to use the feedback tool (P<.001) and the treatment recommendation tool (perceived usefulness, P=.01; social influence, P<.001). However, trust was unrelated to use intentions for both tools. Moreover, perceived ease of use was unrelated (feedback tool) or even negatively related (treatment recommendation tool) to use intentions when considering all predictors (P=.004). In addition, a positive relationship between cognitive technology readiness (P=.02) and the intention to use the feedback tool and a negative relationship between AI anxiety and the intention to use the feedback tool (P=.001) and the treatment recommendation tool (P<.001) were observed. CONCLUSIONS The results shed light on the general and tool-dependent drivers of AI technology adoption in mental health care. Future research may explore the technological and user group characteristics that influence the adoption of AI-enabled tools in mental health care.
Affiliation(s)
- Anne-Kathrin Kleine: Department of Psychology, Ludwig Maximilian University of Munich, Munich, Germany
- Eesha Kokje: Department of Psychology, Ludwig Maximilian University of Munich, Munich, Germany
- Eva Lermer: Department of Psychology, Ludwig Maximilian University of Munich, Munich, Germany; Technical University of Applied Sciences Augsburg, Augsburg, Germany
- Susanne Gaube: Department of Psychology, Ludwig Maximilian University of Munich, Munich, Germany
10
Li Y, Liu Z, Zhou L, Yuan X, Shangguan Z, Hu X, Hu B. A facial depression recognition method based on hybrid multi-head cross attention network. Front Neurosci 2023; 17:1188434. PMID: 37292164; PMCID: PMC10244529; DOI: 10.3389/fnins.2023.1188434.
Abstract
Introduction: Deep learning methods based on convolutional neural networks (CNNs) have demonstrated impressive performance in depression analysis. Nevertheless, some critical challenges remain: (1) because of spatial locality, it is still difficult for CNNs to learn long-range inductive biases during low-level feature extraction across different facial regions; (2) a model with only a single attention head struggles to concentrate on various parts of the face simultaneously, making it less sensitive to other important facial regions associated with depression. In facial depression recognition, many of the clues come from several areas of the face at once, e.g., the mouth and eyes. Methods: To address these issues, we present an end-to-end integrated framework called the Hybrid Multi-head Cross Attention Network (HMHN), which comprises two stages. The first stage consists of the Grid-Wise Attention block (GWA) and the Deep Feature Fusion block (DFF) for low-level visual depression feature learning. In the second stage, we obtain the global representation by encoding high-order interactions among local features with the Multi-head Cross Attention block (MAB) and the Attention Fusion block (AFB). Results: We experimented on the AVEC 2013 and AVEC 2014 depression datasets. The results on AVEC 2013 (RMSE = 7.38, MAE = 6.05) and AVEC 2014 (RMSE = 7.60, MAE = 6.01) demonstrate the efficacy of our method, which outperforms most state-of-the-art video-based depression recognition approaches. Discussion: We proposed a deep learning hybrid model for depression recognition that captures higher-order interactions between the depression features of multiple facial regions, which can effectively reduce recognition error and shows great potential for clinical experiments.
11
Liu Z, Yuan X, Li Y, Shangguan Z, Zhou L, Hu B. PRA-Net: Part-and-Relation Attention Network for depression recognition from facial expression. Comput Biol Med 2023; 157:106589. PMID: 36934531; DOI: 10.1016/j.compbiomed.2023.106589.
Abstract
Artificial intelligence methods are widely applied to depression recognition and provide an objective solution. Many effective automated methods for detecting depression use facial expressions, which are strong indicators of psychiatric disorders. However, these methods suffer from insufficient representations of depression. To this end, we propose a novel Part-and-Relation Attention Network (PRA-Net), which can enhance depression representations by accurately focusing on features that are highly correlated with depression. Specifically, we first partition the feature map instead of the original image in order to obtain part features rich in semantic information. Self-attention is then used to calculate the weight of each part feature. Next, the relationship between each part feature and the global content representation is explored by relation attention to refine the weights. Finally, all features are aggregated via these weights into a more compact, depression-informative representation for depression score prediction. Extensive experiments demonstrate the superiority of our method: compared with other end-to-end methods, it achieves state-of-the-art performance on AVEC2013 and AVEC2014.
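The weight-then-aggregate step this abstract describes can be sketched in a few lines. This is a simplified illustration, not the PRA-Net code: the part features and raw attention scores below are hypothetical, the weights are normalized with a softmax, and the parts are pooled by a weighted sum.

```python
import math

def softmax(scores):
    """Numerically stable softmax: turns raw scores into weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(part_features, scores):
    """Pool part features into one vector using softmax attention weights."""
    weights = softmax(scores)
    dim = len(part_features[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, part_features):
        for i, v in enumerate(feat):
            pooled[i] += w * v
    return weights, pooled

# Hypothetical 2-D features for three facial parts and their attention scores
# (in the real network both would be produced by learned layers).
parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.5, 1.0]

weights, pooled = aggregate(parts, scores)
print(round(sum(weights), 6), len(pooled))
```

The part with the highest score dominates the pooled representation, which is the intuition behind letting attention emphasize depression-relevant facial regions.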
Affiliation(s)
- Zhenyu Liu
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China.
- Xiaoyan Yuan
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China.
- Yutong Li
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China.
- Zixuan Shangguan
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China.
- Li Zhou
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China.
- Bin Hu
- Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China.
12
Francese R, Attanasio P. Emotion detection for supporting depression screening. MULTIMEDIA TOOLS AND APPLICATIONS 2022; 82:12771-12795. [PMID: 36570729 PMCID: PMC9761032 DOI: 10.1007/s11042-022-14290-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 10/14/2022] [Accepted: 12/03/2022] [Indexed: 06/17/2023]
Abstract
Depression is the most prevalent mental disorder in the world. One of the most widely adopted tools for depression screening is the Beck Depression Inventory-II (BDI-II) questionnaire, but patients may minimize or exaggerate their answers. Thus, to further examine the patient's mood while they fill in the questionnaire, we propose a mobile application that captures the patient's BDI-II responses together with their images and speech. Deep learning techniques such as Convolutional Neural Networks analyze the patient's audio and image data. At the end of the questionnaire, the application shows the clinician the correlation between the patient's emotional scores and BDI-II scores, indicating the relationship between the patient's emotional state and the depression screening score. We conducted a preliminary evaluation involving clinicians and patients to assess (i) the acceptability of the proposed application for use in clinics and (ii) the patient user experience. The participants were eight clinicians who tried the tool with 21 of their patients. The results seem to confirm the acceptability of the app in clinical practice.
Affiliation(s)
- Rita Francese
- Computer Science Department, Università degli Studi di Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
13
Automatic Identification of a Depressive State in Primary Care. Healthcare (Basel) 2022; 10:healthcare10122347. [PMID: 36553871 PMCID: PMC9777617 DOI: 10.3390/healthcare10122347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Revised: 11/04/2022] [Accepted: 11/19/2022] [Indexed: 11/24/2022] Open
Abstract
The Center for Epidemiologic Studies Depression Scale (CES-D) performs well in screening for depression in primary care, but alternatives are sought because it contains many items. With the popularity of social media platforms, facial movement can be recorded ecologically. Considering that nonverbal behaviors, including facial movement, are associated with a depressive state, this study aims to establish an automatic depression recognition model that can easily be used in primary healthcare. We integrated facial activities and gaze behaviors to establish a machine learning algorithm (Kernel Ridge Regression, KRR). We compared different algorithms and different features to achieve the best model. The results showed that the predictive power of combined facial and gaze features was higher than that of facial features alone. Of all the models we tried, the ridge model with a periodic kernel showed the best performance, with an R-squared (R2) value of 0.43 and a Pearson correlation coefficient (r) of 0.69 (p < 0.001). The study also identified the most relevant variables (e.g., gaze directions and facial action units).
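As an illustration of the ridge-with-periodic-kernel idea in the abstract above, the sketch below implements kernel ridge regression with an ExpSineSquared-style periodic kernel from scratch; the feature dimensions, kernel period, and regularization strength are illustrative assumptions, not the study's settings.

```python
import numpy as np

def periodic_kernel(A, B, period=1.0, length=1.0):
    """ExpSineSquared-style periodic kernel between rows of A and B."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

class PeriodicKernelRidge:
    """Minimal kernel ridge regression: solve (K + lam*I) alpha = y."""
    def __init__(self, lam=1e-2, period=1.0, length=1.0):
        self.lam, self.period, self.length = lam, period, length

    def fit(self, X, y):
        self.X_train = X
        K = periodic_kernel(X, X, self.period, self.length)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), y)
        return self

    def predict(self, X):
        return periodic_kernel(X, self.X_train, self.period, self.length) @ self.alpha

# Toy data standing in for per-subject facial + gaze feature vectors
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))                      # 40 subjects, 5 features
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)   # synthetic severity score
pred = PeriodicKernelRidge().fit(X, y).predict(X)
K = periodic_kernel(X, X)                          # symmetric, unit diagonal
```

scikit-learn's `KernelRidge` with a `gaussian_process.kernels.ExpSineSquared` kernel object is an equivalent off-the-shelf route.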
14
Zhang B, Wei D, Yan G, Lei T, Cai H, Yang Z. Feature-level fusion based on spatial-temporal of pervasive EEG for depression recognition. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022; 226:107113. [PMID: 36103735 DOI: 10.1016/j.cmpb.2022.107113] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/29/2022] [Revised: 08/23/2022] [Accepted: 09/04/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE In view of the characteristics of depression, such as its high prevalence, high disability rate, high fatality rate, and high recurrence rate, early identification and early intervention are the most effective ways to prevent irreversible damage to brain function over time. The traditional method of depression recognition based on questionnaires and interviews is time-consuming and labor-intensive, and depends heavily on the doctor's subjective experience. Therefore, accurate, convenient, and effective recognition of depression has important social value and scientific significance. METHODS This paper proposes a depression recognition framework based on feature-level fusion of spatial-temporal pervasive electroencephalography (EEG). Time-series EEG data were collected by a portable three-electrode EEG acquisition instrument and mapped to a spatial complex network called a visibility graph (VG). Then temporal EEG features and spatial VG metric features were extracted and selected. Based on the correlation between features and categories, the differences in the contribution of individual features were explored, and different contribution coefficients were assigned to different features as the basis of feature-level fusion to ensure the diversity of the data. A cascade forest model based on three different decision forests was designed to realize efficient depression recognition using the spatial-temporal feature-level fusion data. RESULTS Experimental data were obtained from 26 depressed patients and 29 healthy controls (HC). The results of multiple control experiments show that, compared with single-type features, feature-level fusion without contribution coefficients, and independent classifiers, the feature-level fusion method with spatial-temporal contribution coefficients has stronger depression recognition ability, with a highest accuracy of 92.48%.
CONCLUSION The feature-level fusion method provides an effective computer-aided tool for rapid clinical diagnosis of depression.
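The visibility graph mapping used above has a compact definition worth spelling out: time points become nodes, and samples a and b are linked when every intermediate sample lies strictly below the straight line joining them. A minimal natural-VG sketch (the toy series is illustrative, not EEG data):

```python
import numpy as np

def visibility_graph(x):
    """Adjacency matrix of the natural visibility graph of series x.

    Nodes are time points; i and j (i < j) are linked when every
    intermediate sample k lies strictly below the line through
    (i, x[i]) and (j, x[j]).
    """
    n = len(x)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            ks = np.arange(i + 1, j)
            # Height of the i-j sight line at each intermediate index k
            line = x[j] + (x[i] - x[j]) * (j - ks) / (j - i)
            if np.all(x[ks] < line):
                A[i, j] = A[j, i] = 1
    return A

x = np.array([3.0, 1.0, 2.0, 0.5, 4.0])   # toy 5-sample series
A = visibility_graph(x)
degree = A.sum(axis=0)                     # a typical spatial VG metric
```

Metrics such as node degree can then feed the feature-level fusion the abstract describes.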
Affiliation(s)
- Bingtao Zhang
- School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China.
- Dan Wei
- School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
- Guanghui Yan
- School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
- Tao Lei
- School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an 710021, China
- Haishu Cai
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Zhifei Yang
- School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
15
Cao XJ, Liu XQ. Artificial intelligence-assisted psychosis risk screening in adolescents: Practices and challenges. World J Psychiatry 2022; 12:1287-1297. [PMID: 36389087 PMCID: PMC9641379 DOI: 10.5498/wjp.v12.i10.1287] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 02/05/2023] Open
Abstract
Artificial intelligence-based technologies are gradually being applied to psychiatric research and practice. This paper reviews the primary literature concerning artificial intelligence-assisted psychosis risk screening in adolescents. In terms of the practice of psychosis risk screening, the application of two artificial intelligence-assisted screening methods, chatbot and large-scale social media data analysis, is summarized in detail. Regarding the challenges of psychiatric risk screening, ethical issues constitute the first challenge of psychiatric risk screening through artificial intelligence, which must comply with the four biomedical ethical principles of respect for autonomy, nonmaleficence, beneficence and impartiality such that the development of artificial intelligence can meet the moral and ethical requirements of human beings. By reviewing the pertinent literature concerning current artificial intelligence-assisted adolescent psychosis risk screens, we propose that, assuming they meet ethical requirements, there are three directions worth considering in the future development of artificial intelligence-assisted psychosis risk screening in adolescents: nonperceptual real-time artificial intelligence-assisted screening, further reducing the cost of artificial intelligence-assisted screening, and improving the ease of use of artificial intelligence-assisted screening techniques and tools.
Affiliation(s)
- Xiao-Jie Cao
- Graduate School of Education, Peking University, Beijing 100871, China
- Xin-Qiao Liu
- School of Education, Tianjin University, Tianjin 300350, China
16
Wu P, Wang R, Lin H, Zhang F, Tu J, Sun M. Automatic depression recognition by intelligent speech signal processing: A systematic survey. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2022. [DOI: 10.1049/cit2.12113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Affiliation(s)
- Pingping Wu
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit, Nanjing Audit University, Nanjing, China
- Ruihao Wang
- School of Information Engineering, Nanjing Audit University, Nanjing, China
- Han Lin
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit, Nanjing Audit University, Nanjing, China
- Fanlong Zhang
- School of Information Engineering, Nanjing Audit University, Nanjing, China
- Juan Tu
- Key Laboratory of Modern Acoustics (MOE), School of Physics, Nanjing University, Nanjing, China
- Miao Sun
- Faculty of Electrical Engineering, Mathematics & Computer Science, Delft University of Technology, Delft, The Netherlands
17
Wang J, Lv K, Liu C, Nie X, Gowda D, Luan S. Automatic Assessment for Severe Self-Reported Depressive Symptoms Using Speech Cues. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2020.3002512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
18
Zhou Y, Jin L, Liu H, Song E. Color Facial Expression Recognition by Quaternion Convolutional Neural Network With Gabor Attention. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2020.3041642] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
19
Muzammel M, Salam H, Othmani A. End-to-end multimodal clinical depression recognition using deep neural networks: A comparative analysis. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021; 211:106433. [PMID: 34614452 DOI: 10.1016/j.cmpb.2021.106433] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/02/2020] [Accepted: 09/15/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVE Major Depressive Disorder is a highly prevalent and disabling mental health condition. Numerous studies explored multimodal fusion systems combining visual, audio, and textual features via deep learning architectures for clinical depression recognition. Yet, no comparative analysis for multimodal depression analysis has been proposed in the literature. METHODS In this paper, an up-to-date literature overview of multimodal depression recognition is presented and an extensive comparative analysis of different deep learning architectures for depression recognition is performed. First, audio features based Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) are studied. Then, early-level and model-level fusion of deep audio features with visual and textual features through LSTM and CNN architectures are investigated. RESULTS The performance of the proposed architectures using an hold-out strategy on the DAIC-WOZ dataset (80% training, 10% validation, 10% test split) for binary and severity levels of depression recognition is tested. Using this strategy, a set of experiments have been performed and they have demonstrated: (1) LSTM-based audio features perform slightly better than CNN ones with an accuracy of 66.25% versus 65.60% for binary depression classes. (2) the model level fusion of deep audio and visual features using LSTM network performed the best with an accuracy of 77.16%, a precision of 53% for the depressed class, and a precision of 83% for the non-depressed class. The given network obtained a normalized Root Mean Square Error (RMSE) of 0.15 for depression severity level prediction. Using a Leave-One-Subject-Out strategy, this network achieved an accuracy of 95.38% for binary depression detection, and a normalized RMSE of 0.1476 for depression severity level prediction. Our best-performing architecture outperforms all state-of-the-art approaches on DAIC-WOZ dataset. 
CONCLUSIONS The obtained results show that the proposed LSTM-based architectures surpass the proposed CNN-based ones, as they learn temporal dynamic representations of multimodal features. Furthermore, model-level fusion of audio and visual features using an LSTM network leads to the best performance. Our best-performing architecture successfully detects depression using a speech segment of less than 8 seconds, with an average prediction computation time of less than 6 ms, making it suitable for real-world clinical applications.
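The early-level versus model-level fusion contrast in this abstract can be sketched in a few lines. All dimensions below are arbitrary, and fixed random projections stand in for trained CNN/LSTM branches; this is a shape-level illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-segment inputs: raw audio and visual feature vectors
audio = rng.normal(size=(8, 40))    # 8 segments, 40 audio features
visual = rng.normal(size=(8, 60))   # 8 segments, 60 visual features

# Early-level fusion: concatenate raw modality features, then feed one model
early = np.concatenate([audio, visual], axis=1)     # shape (8, 100)

# Model-level fusion: each branch first produces its own learned embedding
# (random projections here stand in for trained branch networks)
W_a = rng.normal(size=(40, 16))
W_v = rng.normal(size=(60, 16))
emb_a = np.tanh(audio @ W_a)        # audio-branch embedding, (8, 16)
emb_v = np.tanh(visual @ W_v)       # visual-branch embedding, (8, 16)
fused = np.concatenate([emb_a, emb_v], axis=1)      # shape (8, 32)
```

The fused vectors would then go to a final classifier or regressor; the abstract reports that fusing at this model level worked best.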
Affiliation(s)
- Muhammad Muzammel
- Université Paris-Est Créteil (UPEC), LISSI, Vitry sur Seine 94400, France
- Hanan Salam
- New York University, SMART Lab, Saadiyat Island, Abu Dhabi
- Alice Othmani
- Université Paris-Est Créteil (UPEC), LISSI, Vitry sur Seine 94400, France.
20
He L, Guo C, Tiwari P, Su R, Pandey HM, Dang W. DepNet: An automated industrial intelligent system using deep learning for video‐based depression analysis. INT J INTELL SYST 2021. [DOI: 10.1002/int.22704] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Affiliation(s)
- Lang He
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, Shaanxi, China
- Chenguang Guo
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi, China
- Prayag Tiwari
- Department of Computer Science, Aalto University, Espoo, Finland
- Rui Su
- School of Foreign Languages, Northwest University, Xi'an, Shaanxi, China
- Hari Mohan Pandey
- Department of Computer Science, Edge Hill University, Ormskirk, United Kingdom
- Wei Dang
- Xi'an Mental Health Center, Xi'an, Shaanxi, China
21
Zhao Y, Liang Z, Du J, Zhang L, Liu C, Zhao L. Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech. Front Neurorobot 2021; 15:684037. [PMID: 34512301 PMCID: PMC8426553 DOI: 10.3389/fnbot.2021.684037] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022] Open
Abstract
Depression is a mental disorder that threatens people's health and normal life. Hence, it is essential to provide an effective way to detect depression. However, research on depression detection has mainly focused on utilizing different parallel features from audio, video, and text for performance enhancement, without making full use of the information inherent in speech. To focus on the more emotionally salient regions of depressed speech, in this research we propose a multi-head time-dimension attention-based long short-term memory (LSTM) model. We first extract frame-level features to preserve the original temporal relationships of a speech sequence and then analyze how they differ between depressed and healthy speakers. We then study the performance of various features and use a modified feature set as the input of the LSTM layer. Instead of using the output of a traditional LSTM directly, multi-head time-dimension attention is employed to obtain the time information most relevant to depression detection by projecting the output into different subspaces. The experimental results show the proposed model yields improvements of 2.3% and 10.3% over the LSTM model on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) and the Multi-modal Open Dataset for Mental-disorder Analysis (MODMA) corpus, respectively.
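The multi-head time-dimension attention idea — each head projects the LSTM outputs into its own subspace, scores every time step, and pools with a softmax over time — can be sketched as follows. Random projections stand in for learned parameters, and the dimensions are illustrative, not the paper's.

```python
import numpy as np

def multihead_time_attention(H, heads=4, seed=0):
    """Pool a sequence of hidden states H (T, d) with per-head
    attention over the time dimension; head outputs are concatenated."""
    rng = np.random.default_rng(seed)
    T, d = H.shape
    outs = []
    for _ in range(heads):
        W = rng.normal(size=(d, d // heads))   # per-head subspace projection
        w = rng.normal(size=(d // heads,))     # per-head scoring vector
        P = H @ W                              # (T, d/heads) projected states
        s = P @ w                              # (T,) time-step scores
        a = np.exp(s - s.max())
        a /= a.sum()                           # softmax over time
        outs.append(a @ P)                     # attention-weighted pooling
    return np.concatenate(outs)                # (d,) pooled utterance vector

# 50 frames of 32-dimensional hidden states standing in for LSTM outputs
H = np.random.default_rng(2).normal(size=(50, 32))
z = multihead_time_attention(H)
```

Each head can thus attend to a different temporal region of the utterance, which is the motivation the abstract gives for using multiple heads.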
Affiliation(s)
- Yan Zhao
- Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing, China
- Zhenlin Liang
- Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing, China
- Jing Du
- Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing, China
- Li Zhang
- Computational Intelligence Group, Northumbria University, Newcastle upon Tyne, United Kingdom
- National Subsea Centre, Robert Gordon University, Aberdeen, United Kingdom
- Chengyu Liu
- School of Instrument Science and Engineering, Southeast University, Nanjing, China
- Li Zhao
- Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing, China
22
Niu M, Liu B, Tao J, Li Q. A time-frequency channel attention and vectorization network for automatic depression level prediction. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.056] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
23
Dong Y, Yang X. A hierarchical depression detection model based on vocal and emotional cues. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
24
Guo W, Yang H, Liu Z, Xu Y, Hu B. Deep Neural Networks for Depression Recognition Based on 2D and 3D Facial Expressions Under Emotional Stimulus Tasks. Front Neurosci 2021; 15:609760. [PMID: 33967675 PMCID: PMC8102822 DOI: 10.3389/fnins.2021.609760] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2020] [Accepted: 03/08/2021] [Indexed: 11/23/2022] Open
Abstract
The proportion of individuals with depression has rapidly increased along with the growth of the global population, and depression is currently the most prevalent mental health disorder. An effective depression recognition system is especially crucial for the early detection of potential depression risk. A depression-related dataset is also critical for evaluating systems for depression or potential depression risk detection. Due to the sensitive nature of clinical data, the availability and scale of such datasets are limited; to our knowledge, there are few practically usable depression datasets for the Chinese population. In this study, we first create a large-scale dataset by asking subjects to perform five mood-elicitation tasks. After each task, the subjects' audio and video are collected, including 3D (depth) information of facial expressions via a Kinect. The dataset was constructed in a real environment, i.e., several psychiatric hospitals, and is of substantial scale. We then propose a novel approach for potential depression risk recognition based on two different deep belief network (DBN) models: one extracts 2D appearance features from facial images collected by an optical camera, while the other extracts 3D dynamic features from 3D facial points collected by a Kinect. The final decision comes from the combination of the two models. Finally, we evaluate all the proposed deep models on our dataset. The experimental results demonstrate that (1) our proposed method is able to identify patients with potential depression risk; (2) the model combining 2D and 3D features outperforms models using either 2D or 3D features alone; and (3) recognition performance is higher under positive and negative emotional stimuli, and the recognition rate for females is generally higher than that for males. We also compare the performance with other methods on the same dataset. The experimental results show that our integrated 2D-and-3D-feature DBN is more reasonable and generalizable than the other methods, and that the experimental paradigm designed for depression is reasonable and practical.
Affiliation(s)
- Weitong Guo
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- School of Educational Technology, Northwest Normal University, Lanzhou, China
- Gansu Provincial Key Laboratory of Wearable Computing, Lanzhou, China
- National and Provincial Joint Engineering Laboratory of Learning Analysis Technology in Online Education, Lanzhou, China
- Hongwu Yang
- School of Educational Technology, Northwest Normal University, Lanzhou, China
- National and Provincial Joint Engineering Laboratory of Learning Analysis Technology in Online Education, Lanzhou, China
- Zhenyu Liu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- Gansu Provincial Key Laboratory of Wearable Computing, Lanzhou, China
- Yaping Xu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- School of Educational Technology, Northwest Normal University, Lanzhou, China
- Bin Hu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- Gansu Provincial Key Laboratory of Wearable Computing, Lanzhou, China
25
He L, Guo C, Tiwari P, Pandey HM, Dang W. Intelligent system for depression scale estimation with facial expressions and case study in industrial intelligence. INT J INTELL SYST 2021. [DOI: 10.1002/int.22426] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
Affiliation(s)
- Lang He
- Computer Science, School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi, China
- Chenguang Guo
- Department of Electronics and Information Engineering, School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
- Prayag Tiwari
- Department of Computer Science, Aalto University, Espoo, Finland
- Wei Dang
- Shaanxi Mental Health Center, Xi'an, Shaanxi, China
26
Belouali A, Gupta S, Sourirajan V, Yu J, Allen N, Alaoui A, Dutton MA, Reinhard MJ. Acoustic and language analysis of speech for suicidal ideation among US veterans. BioData Min 2021; 14:11. [PMID: 33531048 PMCID: PMC7856815 DOI: 10.1186/s13040-021-00245-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2020] [Accepted: 01/20/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Screening for suicidal ideation in high-risk groups such as U.S. veterans is crucial for early detection and suicide prevention. Currently, screening is based on clinical interviews or self-report measures; both approaches rely on subjects to disclose their suicidal thoughts. Innovative approaches are necessary to develop objective and clinically applicable assessments. Speech has been investigated as an objective marker for understanding various mental states, including suicidal ideation. In this work, we developed a machine learning and natural language processing classifier based on speech markers to screen for suicidal ideation in US veterans. METHODOLOGY Veterans submitted 588 narrative audio recordings via a mobile app in a real-life setting. In addition, participants completed self-report psychiatric scales and questionnaires. The recordings were analyzed to extract voice characteristics, including prosodic, phonation, and glottal features. The recordings were also transcribed to extract textual features for linguistic analysis. We evaluated the acoustic and linguistic features using both statistical significance testing and ensemble feature selection. We also examined the performance of different machine learning algorithms on multiple combinations of features to classify suicidal and non-suicidal recordings. RESULTS A combined set of 15 acoustic and linguistic speech features was identified by the ensemble feature selection. A Random Forest classifier using the selected features correctly identified suicidal ideation in veterans with 86% sensitivity, 70% specificity, and an area under the receiver operating characteristic curve (AUC) of 80%. CONCLUSIONS Speech analysis of recordings collected from veterans in everyday life settings using smartphones offers a promising approach for detecting suicidal ideation. A machine learning classifier may eventually help clinicians identify and monitor high-risk veterans.
Affiliation(s)
- Anas Belouali
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA.
- Samir Gupta
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Vaibhav Sourirajan
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Jiawei Yu
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Nathaniel Allen
- War Related Illness and Injury Study Center, Veterans Affairs Medical Center, Washington, DC, USA
- Adil Alaoui
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Mary Ann Dutton
- Department of Psychiatry, Georgetown University Medical Center, Washington, DC, USA
- Matthew J Reinhard
- War Related Illness and Injury Study Center, Veterans Affairs Medical Center, Washington, DC, USA
- Department of Psychiatry, Georgetown University Medical Center, Washington, DC, USA
27

28
Mohammadi Y, Moradi MH. Prediction of Depression Severity Scores Based on Functional Connectivity and Complexity of the EEG Signal. Clin EEG Neurosci 2021; 52:52-60. [PMID: 33040603 DOI: 10.1177/1550059420965431] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
BACKGROUND Depression is one of the most common mental disorders and a leading cause of functional disability. This study aims to determine whether the functional connectivity and complexity of brain activity can predict the severity of depression (Beck Depression Inventory-II scores). METHODS Resting-state, eyes-closed EEG data were recorded from 60 depressed patients. A phase synchronization measure was used to estimate functional connectivity between all pairs of EEG channels in the delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), and beta (13-30 Hz) frequency bands. To quantify the local value of functional connectivity, two graph theory metrics, degree and clustering coefficient (CC), were measured. Moreover, Lempel-Ziv complexity (LZC) and fuzzy entropy (FuzzyEn) were used to measure the complexity of the EEG signal. RESULTS Correlation analysis revealed a significant negative relationship between graph metrics and depression severity in the alpha band. The association was strongly positive for the complexity measures in the alpha and delta bands. A linear regression model also predicted depression severity well from alpha-band EEG features (r = 0.839; P < .0001; root mean square error of 7.69). CONCLUSION We found that the brain activity of patients with depression was related to depression severity, with abnormal brain activity reflecting greater severity. The presented regression model provides a quantitative prediction of depression severity, which may inform EEG-based assessment and shows potential for application in the medical treatment of depressive disorder.
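The pipeline above — phase synchronization between channel pairs, then per-node degree and clustering coefficient — can be sketched compactly. The phase-locking value (PLV) below is one common phase synchronization measure (the abstract does not name its exact measure), and the toy phase series are synthetic, not EEG.

```python
import numpy as np

def plv(phi1, phi2):
    """Phase locking value between two instantaneous-phase series."""
    return np.abs(np.exp(1j * (phi1 - phi2)).mean())

def graph_metrics(C, thr=0.5):
    """Binarize connectivity C at thr; return node degree and clustering."""
    A = (C > thr).astype(int)
    np.fill_diagonal(A, 0)
    k = A.sum(axis=0)
    cc = np.zeros(len(A))
    for i in range(len(A)):
        nb = np.flatnonzero(A[i])
        if len(nb) > 1:
            links = A[np.ix_(nb, nb)].sum() / 2       # edges among neighbors
            cc[i] = 2 * links / (len(nb) * (len(nb) - 1))
    return k, cc

# Toy phases for 3 channels: ch0 and ch1 phase-locked, ch2 random
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 500)
p0 = 2 * np.pi * 10 * t                    # 10 Hz phase ramp
p1 = p0 + 0.3                              # constant lag -> PLV = 1
p2 = rng.uniform(0, 2 * np.pi, 500)        # unrelated phases
phases = [p0, p1, p2]
C = np.array([[plv(a, b) for b in phases] for a in phases])
deg, cc = graph_metrics(C)
```

In practice, instantaneous phases would come from band-filtered EEG via the Hilbert transform (e.g., `scipy.signal.hilbert`), with one connectivity matrix per frequency band.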
Affiliation(s)
- Yousef Mohammadi
- Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Islamic Republic of Iran
| | - Mohammad Hassan Moradi
- Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Islamic Republic of Iran
| |
29
Karnati M, Seal A, Yazidi A, Krejcar O. LieNet: A Deep Convolution Neural Networks Framework for Detecting Deception. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2021.3086011] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
30
What reveals about depression level? The role of multimodal features at the level of interview questions. INFORMATION & MANAGEMENT 2020. [DOI: 10.1016/j.im.2020.103349] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
31
Otberdout N, Kacem A, Daoudi M, Ballihi L, Berretti S. Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:3892-3905. [PMID: 31725395 DOI: 10.1109/tnnls.2019.2947244] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, we propose a new approach for facial expression recognition (FER) using deep covariance descriptors. The solution is based on the idea of encoding local and global deep convolutional neural network (DCNN) features extracted from still images into compact local and global covariance descriptors. The space of covariance matrices has the geometry of symmetric positive definite (SPD) matrices. By classifying static facial expressions with a support vector machine (SVM) using a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than standard classification with fully connected layers and softmax. In addition, we propose a completely new and original solution that models the temporal dynamics of facial expressions as deep trajectories on the SPD manifold. As an extension of the classification pipeline for covariance descriptors, we apply SVM with valid positive definite kernels derived from global alignment to classify deep covariance trajectories. Extensive experiments on the Oulu-CASIA, CK+, static facial expression in the wild (SFEW), and acted facial expressions in the wild (AFEW) datasets show that both the proposed static and dynamic approaches achieve state-of-the-art FER performance, outperforming many recent approaches.
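A covariance descriptor and an SPD-aware distance are simple to compute. The sketch below builds a regularized covariance descriptor from local features and compares two descriptors with the log-Euclidean distance, a common surrogate for geodesic distance on the SPD manifold (the paper's Gaussian kernel construction differs in detail; the grid size and feature dimension here are illustrative).

```python
import numpy as np

def cov_descriptor(F, eps=1e-6):
    """Covariance descriptor of local features F (n_regions, d),
    regularized so the result is strictly positive definite."""
    C = np.cov(F, rowvar=False)
    return C + eps * np.eye(C.shape[0])

def spd_log(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T   # V diag(log w) V^T

def log_euclidean_dist(C1, C2):
    """Log-Euclidean distance between SPD matrices; usable inside a
    Gaussian kernel, e.g., exp(-d^2 / (2 sigma^2)), for an SVM."""
    return np.linalg.norm(spd_log(C1) - spd_log(C2), ord="fro")

rng = np.random.default_rng(4)
F1 = rng.normal(size=(49, 8))         # e.g., DCNN features from a 7x7 grid
F2 = rng.normal(size=(49, 8)) + 1.0   # another image's local features
C1, C2 = cov_descriptor(F1), cov_descriptor(F2)
d12 = log_euclidean_dist(C1, C2)
d11 = log_euclidean_dist(C1, C1)      # distance of a descriptor to itself
```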
32
Liu Y, Zhang X, Lin Y, Wang H. Facial Expression Recognition via Deep Action Units Graph Network Based on Psychological Mechanism. IEEE Trans Cogn Dev Syst 2020. [DOI: 10.1109/tcds.2019.2917711] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 11/10/2022]
33
Su C, Xu Z, Pathak J, Wang F. Deep learning in mental health outcome research: a scoping review. Transl Psychiatry 2020; 10:116. [PMID: 32532967 PMCID: PMC7293215 DOI: 10.1038/s41398-020-0780-3] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 08/31/2019] [Revised: 02/17/2020] [Accepted: 02/26/2020] [Indexed: 12/17/2022]
Abstract
Mental illnesses, such as depression, are highly prevalent and have been shown to impact an individual's physical health. Recently, artificial intelligence (AI) methods have been introduced to assist mental health providers, including psychiatrists and psychologists, for decision-making based on patients' historical data (e.g., medical records, behavioral data, social media usage, etc.). Deep learning (DL), as one of the most recent generation of AI technologies, has demonstrated superior performance in many real-world applications ranging from computer vision to healthcare. The goal of this study is to review existing research on applications of DL algorithms in mental health outcome research. Specifically, we first briefly overview the state-of-the-art DL techniques. Then we review the literature relevant to DL applications in mental health outcomes. According to the application scenarios, we categorize these relevant articles into four groups: diagnosis and prognosis based on clinical data, analysis of genetics and genomics data for understanding mental health conditions, vocal and visual expression data analysis for disease detection, and estimation of risk of mental illness using social media data. Finally, we discuss challenges in using DL algorithms to improve our understanding of mental health conditions and suggest several promising directions for their applications in improving mental health diagnosis and treatment.
Affiliation(s)
- Chang Su
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
- Zhenxing Xu
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
- Jyotishman Pathak
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
- Fei Wang
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
34
Rana R, Latif S, Gururajan R, Gray A, Mackenzie G, Humphris G, Dunn J. Automated screening for distress: A perspective for the future. Eur J Cancer Care (Engl) 2019; 28:e13033. [DOI: 10.1111/ecc.13033] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/09/2019] [Revised: 02/05/2019] [Accepted: 02/18/2019] [Indexed: 01/13/2023]
Affiliation(s)
- Rajib Rana
- University of Southern Queensland, Springfield, Queensland, Australia
- Siddique Latif
- University of Southern Queensland, Springfield, Queensland, Australia
- Raj Gururajan
- University of Southern Queensland, Springfield, Queensland, Australia
- Anthony Gray
- University of Southern Queensland, Springfield, Queensland, Australia
- Jeff Dunn
- University of Southern Queensland, Springfield, Queensland, Australia
- Griffith University, Brisbane, Queensland, Australia
- University of Technology Sydney, Sydney, New South Wales, Australia