1
Taşcı B. Multilevel hybrid handcrafted feature extraction based depression recognition method using speech. J Affect Disord 2024;364:9-19. PMID: 39127304. DOI: 10.1016/j.jad.2024.08.002.
Abstract
BACKGROUND AND PURPOSE Diagnosis of depression is based on tests performed by psychiatrists and information provided by patients or their relatives. In the field of machine learning (ML), numerous models have been devised to detect depression automatically through the analysis of speech audio signals. While deep learning approaches often achieve superior classification accuracy, they are notably resource-intensive. This research introduces an innovative, multilevel hybrid feature extraction-based classification model, specifically designed for depression detection, which exhibits reduced time complexity. MATERIALS AND METHODS The MODMA dataset, consisting of audio signals from 29 healthy controls and 23 patients with major depressive disorder, was used. The constructed model architecture integrates multilevel hybrid feature extraction, iterative feature selection, and classification processes. During the Hybrid Handcrafted Feature (HHF) generation stage, a combination of textural and statistical methods was employed to extract low-level features from speech audio signals. To enhance this process for high-level feature creation, a Multilevel Discrete Wavelet Transform (MDWT) was applied. This technique produced wavelet subbands, which were then input into the hybrid feature extractor, enabling the extraction of both high- and low-level features. For the selection of the most pertinent features from these extracted vectors, Iterative Neighborhood Component Analysis (INCA) was utilized. Finally, in the classification phase, a one-dimensional nearest neighbor classifier with ten-fold cross-validation was implemented to obtain detailed results. RESULTS The HHF-based speech audio signal classification model attained excellent performance, with a classification accuracy of 94.63%. CONCLUSIONS The findings validate the remarkable proficiency of the introduced HHF-based model in depression classification, underscoring its computational efficiency.
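The multilevel wavelet decomposition at the heart of the HHF pipeline can be sketched as follows. This is a minimal illustration using a Haar wavelet and an arbitrary four-statistic descriptor set; the paper's actual textural and statistical extractors, subband counts, and INCA selection step are not reproduced here.

```python
import numpy as np

def haar_dwt_level(x):
    # One level of the Haar discrete wavelet transform:
    # pairwise averages (approximation) and differences (detail).
    x = x[: len(x) // 2 * 2]
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def multilevel_subbands(signal, levels=3):
    # Decompose a signal into wavelet subbands; each level halves the
    # approximation and emits one detail band.
    subbands, approx = [], np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        subbands.append(detail)
    subbands.append(approx)
    return subbands

def statistical_features(band):
    # Illustrative low-level statistical descriptors of one subband.
    return np.array([band.mean(), band.std(), np.abs(band).max(),
                     np.percentile(band, 75) - np.percentile(band, 25)])

signal = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 512))
feats = np.concatenate([statistical_features(b)
                        for b in multilevel_subbands(signal, levels=3)])
print(feats.shape)  # (16,): 4 subbands x 4 statistics
```

In the paper's pipeline these per-subband vectors would be concatenated with features from the raw signal and pruned by INCA before classification.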
Affiliation(s)
- Burak Taşcı
- Vocational School of Technical Sciences, Firat University, Elazig 23119, Turkey.
2
Nwosu OI, Naunheim MR. Artificial Intelligence in Laryngology, Broncho-Esophagology, and Sleep Surgery. Otolaryngol Clin North Am 2024;57:821-829. PMID: 38719714. DOI: 10.1016/j.otc.2024.04.002.
Abstract
Technological advancements in laryngology, broncho-esophagology, and sleep surgery have enabled the collection of increasing amounts of complex data for diagnosis and treatment of voice, swallowing, and sleep disorders. Clinicians face challenges in efficiently synthesizing these data for personalized patient care. Artificial intelligence (AI), specifically machine learning and deep learning, offers innovative solutions for processing and interpreting these data, revolutionizing diagnosis and management in these fields, and making care more efficient and effective. In this study, we review recent AI-based innovations in the fields of laryngology, broncho-esophagology, and sleep surgery.
Affiliation(s)
- Obinna I Nwosu
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, MA, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, MA, USA
- Matthew R Naunheim
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, MA, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, MA, USA.
3
Verde L, Marulli F, De Fazio R, Campanile L, Marrone S. HEAR set: A ligHtwEight acoustic paRameters set to assess mental health from voice analysis. Comput Biol Med 2024;182:109021. PMID: 39236660. DOI: 10.1016/j.compbiomed.2024.109021.
Abstract
BACKGROUND Voice analysis has significant potential in aiding healthcare professionals with detecting, diagnosing, and personalising treatment. It represents an objective and non-intrusive tool for supporting the detection and monitoring of specific pathologies. By calculating various acoustic features, voice analysis extracts valuable information to assess voice quality. The choice of these parameters is crucial for an accurate assessment. METHOD In this paper, we propose a lightweight acoustic parameter set, named HEAR, able to evaluate voice quality to assess mental health. In detail, the set consists of jitter, spectral centroid, Mel-frequency cepstral coefficients, and their derivatives. The choice of parameters for the proposed set was guided by the explainable significance of each acoustic parameter in the voice production process. RESULTS The reliability of the proposed acoustic set in detecting the early symptoms of mental disorders was evaluated in an experimental phase. Voices of subjects suffering from different mental pathologies, selected from available databases, were analysed. The performance obtained with the HEAR features was compared with that obtained by analysing features selected from toolkits widely used in the literature, as well as with features obtained using learned procedures. The best performance in terms of MAE and RMSE was achieved for the detection of depression (5.32 and 6.24, respectively). For the detection of psychogenic dysphonia and anxiety, the highest accuracy rates were about 75% and 97%, respectively. CONCLUSIONS The comparative evaluation was carried out to assess the performance of the proposed approach, demonstrating a reliable capability to highlight affective physiological alterations of voice quality due to the considered mental disorders.
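Two of the HEAR parameters, jitter and spectral centroid, are simple to compute directly. The sketch below is illustrative and not the authors' implementation; the frame length, sampling rate, and demo signal are assumptions.

```python
import numpy as np

def local_jitter(periods):
    # Local jitter: mean absolute difference between consecutive
    # glottal-cycle durations, normalised by the mean duration.
    periods = np.asarray(periods, dtype=float)
    return np.abs(np.diff(periods)).mean() / periods.mean()

def spectral_centroid(frame, sr):
    # Magnitude-weighted mean frequency of one analysis frame.
    mags = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float((freqs * mags).sum() / mags.sum())

sr = 16000
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 1000 * t)        # steady 1 kHz tone
print(local_jitter([0.010, 0.011, 0.010]))  # small, non-zero jitter
print(round(spectral_centroid(tone, sr)))   # 1000
```

A perfectly periodic voice has zero jitter; pathological or depressed voice tends to show elevated perturbation measures, which is why jitter earns a place in a lightweight set.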
Affiliation(s)
- Laura Verde
- Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy.
- Fiammetta Marulli
- Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy
- Roberta De Fazio
- Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy
- Lelio Campanile
- Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy
- Stefano Marrone
- Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy
4
Rehmani F, Shaheen Q, Anwar M, Faheem M, Bhatti SS. Depression detection with machine learning of structural and non-structural dual languages. Healthc Technol Lett 2024;11:218-226. PMID: 39100503. PMCID: PMC11294929. DOI: 10.1049/htl2.12088.
Abstract
Depression is a serious mental state that negatively impacts thoughts, feelings, and actions. Social media use is rapidly growing, with people expressing themselves in their regional languages. In Pakistan and India, many people use Roman Urdu on social media, which makes Roman Urdu important for predicting depression in these regions. However, previous studies show no significant contribution to predicting depression through Roman Urdu, or through Roman Urdu in combination with structured languages like English. This study aims to create a Roman Urdu dataset to predict depression risk in dual languages [Roman Urdu (non-structural language) + English (structural language)]. Two datasets were used: Roman Urdu data manually converted from English Facebook posts, and English comments from Kaggle. These datasets were merged for the research experiments. Machine learning models, including Support Vector Machine (SVM), Support Vector Machine with Radial Basis Function kernel (SVM-RBF), Random Forest (RF), and Bidirectional Encoder Representations from Transformers (BERT), were tested. Depression risk was classified as not depressed, moderate, or severe. Experimental results show that the SVM achieved the best performance, with an accuracy of 0.84, compared to existing models. The presented study advances depression prediction in Asian countries.
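As a rough illustration of the classification setup, the sketch below trains a linear SVM by hinge-loss subgradient descent on a toy bag-of-words representation of mixed Roman Urdu and English comments. The tiny corpus, labels, and hyperparameters are all invented for illustration; the paper's actual features, dataset, and SVM implementation differ.

```python
import numpy as np

# Toy mixed Roman Urdu + English comments (illustrative, not from the
# paper's dataset); labels: 0 = not depressed, 1 = depressed.
docs = ["main bohat udaas hoon",        # Roman Urdu: "I am very sad"
        "i feel hopeless and tired",
        "aaj ka din acha tha",          # Roman Urdu: "today was a good day"
        "i am happy with my life"]
labels = np.array([1, 1, 0, 0])

vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], float)
y = 2 * labels - 1  # SVM works with labels in {-1, +1}

# Linear SVM trained by subgradient descent on the regularised hinge loss.
w, b, lr, lam = np.zeros(X.shape[1]), 0.0, 0.1, 0.01
for _ in range(200):
    margins = y * (X @ w + b)
    mask = margins < 1                   # margin-violating samples
    w -= lr * (lam * w - (y[mask][:, None] * X[mask]).sum(axis=0) / len(y))
    b += lr * y[mask].sum() / len(y)

pred = (X @ w + b > 0).astype(int)
print(pred.tolist())  # matches labels: [1, 1, 0, 0]
```

In practice a TF-IDF weighting and a much larger vocabulary would replace the raw counts, and the three-way (not depressed/moderate/severe) split would use a multi-class scheme such as one-vs-rest.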
Affiliation(s)
- Filza Rehmani
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bannu, Pakistan
- Qaisar Shaheen
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bannu, Pakistan
- Muhammad Anwar
- Department of Information Sciences, Division of Science and Technology, University of Education, Lahore, Pakistan
- Muhammad Faheem
- School of Technology and Innovations, University of Vaasa, Vaasa, Finland
5
Yang W, Liu J, Cao P, Zhu R, Wang Y, Liu JK, Wang F, Zhang X. Attention guided learnable time-domain filterbanks for speech depression detection. Neural Netw 2023;165:135-149. PMID: 37285730. DOI: 10.1016/j.neunet.2023.05.041.
Abstract
Depression, a global mental health problem, lacks effective screening methods that can help with early detection and treatment. This paper aims to facilitate the large-scale screening of depression by focusing on the speech depression detection (SDD) task. Currently, direct modeling on the raw signal yields a large number of parameters, and existing deep learning-based SDD models mainly use fixed Mel-scale spectral features as input. However, these features are not designed for depression detection, and the manual settings limit the exploration of fine-grained feature representations. In this paper, we learn effective representations of the raw signals from an interpretable perspective. Specifically, we present a joint learning framework with attention-guided learnable time-domain filterbanks for depression classification (DALF), which comprises a depression filterbanks feature learning (DFBL) module and a multi-scale spectral attention learning (MSSA) module. DFBL produces biologically meaningful acoustic features by employing learnable time-domain filters, and MSSA guides the learnable filters to better retain useful frequency sub-bands. We collect a new dataset, the Neutral Reading-based Audio Corpus (NRAC), to facilitate research in depression analysis, and we evaluate the performance of DALF on the NRAC and the public DAIC-WOZ datasets. The experimental results demonstrate that our method outperforms the state-of-the-art SDD methods with an F1 of 78.4% on the DAIC-WOZ dataset. In particular, DALF achieves F1 scores of 87.3% and 81.7% on two parts of the NRAC dataset. By analyzing the filter coefficients, we find that the most important frequency range identified by our method is 600-700 Hz, which corresponds to the Mandarin vowels /e/ and /eˆ/ and can be considered an effective biomarker for the SDD task. Taken together, our DALF model provides a promising approach to depression detection.
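A fixed windowed-sinc band-pass filter gives a feel for what a learned time-domain filter converges to; the sketch below builds one for the 600-700 Hz band the authors identify as most informative. The filter length and design method are assumptions, not the DALF architecture.

```python
import numpy as np

def bandpass_fir(low, high, sr, taps=201):
    # Windowed-sinc band-pass FIR filter: a fixed, non-learnable stand-in
    # for the learnable time-domain filters described in the abstract.
    n = np.arange(taps) - (taps - 1) / 2
    lowpass = lambda fc: 2 * fc / sr * np.sinc(2 * fc / sr * n)
    return (lowpass(high) - lowpass(low)) * np.hamming(taps)

sr = 16000
filt = bandpass_fir(600, 700, sr)   # the 600-700 Hz band the paper flags
t = np.arange(sr) / sr
in_band = np.convolve(np.sin(2 * np.pi * 650 * t), filt, mode="same")
out_band = np.convolve(np.sin(2 * np.pi * 3000 * t), filt, mode="same")
print(in_band.std() > 10 * out_band.std())  # passband energy dominates
```

In DALF the analogous filter coefficients are trainable parameters, so the passbands are shaped by the depression-classification loss rather than fixed in advance.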
Affiliation(s)
- Wenju Yang
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110819, Liaoning, China
- Jiankang Liu
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110819, Liaoning, China
- Peng Cao
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110819, Liaoning, China.
- Rongxin Zhu
- Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, 210096, China
- Yang Wang
- Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, 210096, China
- Jian K Liu
- School of Computing, University of Leeds, Leeds, LS2 9JT, United Kingdom
- Fei Wang
- Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, 210096, China.
- Xizhe Zhang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China.
6
Wang J, Ravi V, Alwan A. Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals. Interspeech 2023:2343-2347. PMID: 38045821. PMCID: PMC10691447. DOI: 10.21437/interspeech.2023-2101.
Abstract
While speech-based depression detection methods that use speaker-identity features, such as speaker embeddings, are popular, they often compromise patient privacy. To address this issue, we propose a speaker disentanglement method that utilizes a non-uniform mechanism of adversarial SID loss maximization. This is achieved by varying the adversarial weight between different layers of a model during training. We find that a greater adversarial weight for the initial layers leads to performance improvement. Our approach using the ECAPA-TDNN model achieves an F1-score of 0.7349 (a 3.7% improvement over audio-only SOTA) on the DAIC-WOZ dataset, while simultaneously reducing the speaker-identification accuracy by 50%. Our findings suggest that identifying depression through speech signals can be accomplished without placing undue reliance on a speaker's identity, paving the way for privacy-preserving approaches to depression detection.
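The non-uniform adversarial weighting reduces to a simple update rule: each encoder layer descends the depression-loss gradient while ascending a per-layer-weighted speaker-ID gradient. The sketch below illustrates the arithmetic only; the weights and toy gradients are invented, and a real implementation would use a gradient-reversal layer inside an autodiff framework.

```python
import numpy as np

# Illustrative per-layer adversarial weights: larger for the initial
# layers, which the paper reports works best (values are assumptions).
adv_weights = [0.9, 0.5, 0.1]

def combined_update(dep_grads, sid_grads, adv_weights):
    # Non-uniform gradient reversal: descend the depression loss while
    # ascending the speaker-ID (SID) loss, scaled per layer.
    return [g_dep - w * g_sid
            for g_dep, g_sid, w in zip(dep_grads, sid_grads, adv_weights)]

dep = [np.array([1.0, 2.0]), np.array([0.5]), np.array([1.0])]
sid = [np.array([2.0, 2.0]), np.array([1.0]), np.array([1.0])]
updates = combined_update(dep, sid, adv_weights)
print(updates[0])  # [-0.8  0.2]: early layer strongly reverses SID gradient
```

With a uniform weight the same scaling would apply at every depth; the paper's finding is that concentrating the reversal in early layers removes speaker identity with less damage to depression cues.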
Affiliation(s)
- Jinhan Wang
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
- Vijay Ravi
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
- Abeer Alwan
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
7
Du M, Liu S, Wang T, Zhang W, Ke Y, Chen L, Ming D. Depression recognition using a proposed speech chain model fusing speech production and perception features. J Affect Disord 2023;323:299-308. PMID: 36462607. DOI: 10.1016/j.jad.2022.11.060.
Abstract
BACKGROUND The increasing number of depression patients puts great pressure on clinical diagnosis. Audio-based diagnosis is a helpful auxiliary tool for early mass screening. However, current methods consider only speech perception features, ignoring patients' vocal tract changes, which may partly explain their poor recognition performance. METHODS This work proposes a novel machine speech chain model for depression recognition (MSCDR) that can capture text-independent depressive speech representations from the speaker's mouth to the listener's ear to improve recognition performance. In the proposed MSCDR, linear predictive coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features are extracted to describe the processes of speech production and speech perception, respectively. Then, a one-dimensional convolutional neural network and a long short-term memory network sequentially capture intra- and inter-segment dynamic depressive features for classification. RESULTS We tested the MSCDR on two public datasets with different languages and paradigms, namely, the Distress Analysis Interview Corpus-Wizard of Oz and the Multi-modal Open Dataset for Mental-disorder Analysis. The accuracy of the MSCDR on the two datasets was 0.77 and 0.86, and the average F1 scores were 0.75 and 0.86, which were better than the other existing methods. This improvement reveals the complementarity of speech production and perception features in carrying depressive information. LIMITATIONS The sample size was relatively small, which may limit clinical translation to some extent. CONCLUSION This experiment demonstrates the good generalization ability and superiority of the proposed MSCDR and suggests that vocal tract changes in patients with depression deserve attention in audio-based depression diagnosis.
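The speech-production half of the chain rests on linear predictive coding, which models the vocal-tract filter. A minimal LPC implementation via the autocorrelation method and Levinson-Durbin recursion is sketched below; the test signal and model order are illustrative, not the paper's configuration.

```python
import numpy as np

def lpc_coefficients(frame, order=8):
    # Linear predictive coding via the autocorrelation method and the
    # Levinson-Durbin recursion; a[1:] are the prediction coefficients.
    frame = np.asarray(frame, dtype=float)
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:
                                                len(frame) + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]  # update previous coefficients
        a[i] = k
        err *= 1.0 - k * k                  # residual prediction error
    return a

# A decaying exponential obeys x[n] = 0.9 x[n-1], so order-1 LPC
# should recover the pole at 0.9 (coefficient -0.9).
a = lpc_coefficients(0.9 ** np.arange(200), order=1)
print(round(a[1], 3))  # -0.9
```

In the MSCDR these LPC features (production side) are fused with MFCCs (perception side) before the 1-D CNN and LSTM stages.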
Affiliation(s)
- Minghao Du
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Shuang Liu
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China.
- Tao Wang
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Wenquan Zhang
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Yufeng Ke
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Long Chen
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Dong Ming
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China; Lab of Neural Engineering & Rehabilitation, Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin, China.
8
Hong K. Classification of emotional stress and physical stress using a multispectral based deep feature extraction model. Sci Rep 2023;13:2693. PMID: 36792679. PMCID: PMC9931761. DOI: 10.1038/s41598-023-29903-3.
Abstract
A classification model (Stress Classification-Net, SC-Net) of emotional stress and physical stress is proposed, which can extract classification features based on multispectral and tissue blood oxygen saturation (StO2) characteristics. Related features are extracted on this basis, and a learning model with frequency-domain analysis and signal amplification is proposed for the first time. Given that multispectral imaging signals are time series data, time series StO2 is extracted from the spectral signals. A proper region of interest (ROI) is obtained by a composite criterion, and the ROI source is determined by the universality and robustness of the signal. The frequency-domain signals of the ROI are further obtained by wavelet transform. To fully utilize the frequency-domain characteristics, the multi-neighbor vector of locally aggregated descriptors (MN-VLAD) model is proposed to extract useful features. The acquired time series features are finally put into a long short-term memory (LSTM) model to learn the classification characteristics. Through the SC-Net model, classification signals of emotional stress and physical stress are successfully obtained. Experiments show that the classification results are encouraging, and the accuracy of the proposed algorithm is over 90%.
Affiliation(s)
- Kan Hong
- Jiangxi University of Finance and Economics, Nanchang, China.
9
Eysenbach G, Jang EH, Lee SH, Choi KY, Park JG, Shin HC. Automatic Depression Detection Using Smartphone-Based Text-Dependent Speech Signals: Deep Convolutional Neural Network Approach. J Med Internet Res 2023;25:e34474. PMID: 36696160. PMCID: PMC9909514. DOI: 10.2196/34474.
Abstract
BACKGROUND Automatic diagnosis of depression based on speech can complement mental health treatment methods in the future. Previous studies have reported that acoustic properties can be used to identify depression. However, few studies have attempted a large-scale differential diagnosis of patients with depressive disorders using acoustic characteristics of non-English speakers. OBJECTIVE This study proposes a framework for automatic depression detection using large-scale acoustic characteristics based on the Korean language. METHODS We recruited 153 patients who met the criteria for major depressive disorder and 165 healthy controls without current or past mental illness. Participants' voices were recorded on a smartphone while they read predefined text-based sentences. Three approaches were evaluated and compared for detecting depression from the text-dependent read-speech data: conventional machine learning models based on acoustic features, a proposed model that trains and classifies log-Mel spectrograms with a deep convolutional neural network (CNN) having a relatively small number of parameters, and models that train and classify log-Mel spectrograms with well-known pretrained networks. RESULTS The proposed CNN model automatically detected depression from the acoustic characteristics of predefined text-based sentence reading. The highest accuracy achieved with the proposed CNN on the speech data was 78.14%. Our results show that the deep-learned acoustic characteristics lead to better performance than the conventional approach and pretrained models. CONCLUSIONS Checking the mood of patients with major depressive disorder and detecting the consistency of objective descriptions are very important research topics. This study suggests that the analysis of speech data recorded while reading text-dependent sentences could help predict depression status automatically by capturing the characteristics of depression. Our method is smartphone based, is easily accessible, and can contribute to the automatic identification of depressive states.
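A log-Mel spectrogram of the kind fed to such a CNN can be computed in a few lines of plain NumPy. The hop size, FFT length, and mel-band count below are common defaults, not necessarily those used in the study.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    # Frame the waveform, take magnitude-squared FFTs, pool through a
    # triangular mel filterbank, and log-compress for CNN input.
    frames = np.stack([signal[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2),
                                  n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return np.log(power @ fb.T + 1e-10)

sr = 16000
speech_like = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s of tone
S = log_mel_spectrogram(speech_like, sr)
print(S.shape)  # (61, 40): 61 frames x 40 mel bands
```

The resulting time-by-mel matrix is what a small CNN consumes as a single-channel image, which is how the proposed model keeps its parameter count low.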
Affiliation(s)
- Eun Hye Jang
- Medical Information Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
- Seung-Hwan Lee
- Clinical Emotion and Cognition Research Laboratory, Inje University, Goyang, Republic of Korea; Department of Psychiatry, Inje University, Ilsan-Paik Hospital, Goyang, Republic of Korea; Bwave Inc, Goyang, Republic of Korea
- Kwang-Yeon Choi
- Department of Psychiatry, College of Medicine, Chungnam National University, Daejeon, Republic of Korea
- Jeon Gue Park
- Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea; Tutorus Labs Inc, Seoul, Republic of Korea
- Hyun-Chool Shin
- Department of Electronics Engineering, Soongsil University, Seoul, Republic of Korea
10
Barua PD, Vicnesh J, Lih OS, Palmer EE, Yamakawa T, Kobayashi M, Acharya UR. Artificial intelligence assisted tools for the detection of anxiety and depression leading to suicidal ideation in adolescents: a review. Cogn Neurodyn 2022:1-22. PMID: 36467993. PMCID: PMC9684805. DOI: 10.1007/s11571-022-09904-0.
Abstract
Epidemiological studies report high levels of anxiety and depression amongst adolescents. These psychiatric conditions, and complex interplays of biological, social and environmental factors, are important risk factors for suicidal behaviours and suicide, which peak in late adolescence and early adulthood. Although deaths by suicide have fallen globally in recent years, suicide deaths are increasing in some countries, such as the US. Suicide prevention is a challenging global public health problem. Currently, there are no validated clinical biomarkers for diagnosing suicidality, and traditional methods exhibit limitations. Artificial intelligence (AI) is burgeoning in many fields, including the diagnosis of medical conditions. This review summarizes recent studies (from the past 8 years) that employed AI tools for the automated detection of depression and/or anxiety disorder and discusses the limitations and effects of some modalities. The studies assert that AI tools produce promising results and could overcome the limitations of traditional diagnostic methods. Although using AI tools for detecting suicidal ideation has limitations, these are outweighed by the advantages. For future work, this review therefore proposes extracting a fusion of features, such as facial images, speech signals, and visual and clinical history features, from deep models for the automated detection of depression and/or anxiety disorder in individuals. This may pave the way for the identification of individuals with suicidal thoughts.
Affiliation(s)
- Prabal Datta Barua
- School of Management and Enterprise, University of Southern Queensland, Springfield, Australia
- Jahmunah Vicnesh
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore, Singapore
- Oh Shu Lih
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore, Singapore
- Elizabeth Emma Palmer
- Discipline of Pediatric and Child Health, School of Clinical Medicine, University of New South Wales, Kensington, Australia
- Sydney Children’s Hospitals Network, Sydney, Australia
- Toshitaka Yamakawa
- Department of Computer Science and Electrical Engineering, Kumamoto University, Kumamoto, Japan
- Makiko Kobayashi
- Department of Computer Science and Electrical Engineering, Kumamoto University, Kumamoto, Japan
- Udyavara Rajendra Acharya
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore, Singapore
- School of Science and Technology, Singapore University of Social Sciences, Singapore, Singapore
- Department of Bioinformatics and Medical Engineering, Asia University, Taizhong, Taiwan
- International Research Organization for Advanced Science and Technology (IROAST), Kumamoto University, Kumamoto, Japan
11
Exploration of Despair Eccentricities Based on Scale Metrics with Feature Sampling Using a Deep Learning Algorithm. Diagnostics (Basel) 2022;12:2844. PMID: 36428903. PMCID: PMC9689169. DOI: 10.3390/diagnostics12112844.
Abstract
The majority of people in the modern world struggle with depression as a result of the coronavirus pandemic, which has adversely impacted mental health without warning. Even though the majority of individuals are now protected, it is crucial to check for post-coronavirus symptoms if someone is feeling lethargic. The recommended approach identifies the post-coronavirus symptoms and attacks that are present in the human body. When a harmful virus spreads inside a human body, the post-diagnosis symptoms are considerably more dangerous, and if they are not recognised at an early stage, the risks are increased. Additionally, if the post-symptoms are severe and go untreated, they might harm one's mental health. To prevent someone from succumbing to depression, audio-based prediction is employed to recognise the symptoms and potentially dangerous signs. Vocal characteristics are combined with machine-learning algorithms to determine each person's mental state. A separate device that detects audio attribute outputs is designed to evaluate the effectiveness of the suggested technique; compared to the previous method, the performance metric is better by roughly 67%.
12
Chen ZS, Kulkarni P, Galatzer-Levy IR, Bigio B, Nasca C, Zhang Y. Modern views of machine learning for precision psychiatry. Patterns (N Y) 2022;3:100602. PMID: 36419447. PMCID: PMC9676543. DOI: 10.1016/j.patter.2022.100602.
Abstract
In light of the National Institute of Mental Health (NIMH)'s Research Domain Criteria (RDoC) and the advent of functional neuroimaging, novel technologies and methods provide new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for a new role for ML/AI in digital phenotyping for mobile mental health. Here, we provide a comprehensive review of ML methodologies and applications that combine neuroimaging, neuromodulation, and advanced mobile technologies in psychiatric practice. We further review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We also discuss explainable AI (XAI) and neuromodulation in a closed human-in-the-loop manner and highlight the potential of ML in multi-media information extraction and multi-modal data fusion. Finally, we discuss conceptual and practical challenges in precision psychiatry and highlight ML opportunities for future research.
Affiliation(s)
- Zhe Sage Chen
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- Department of Neuroscience and Physiology, New York University Grossman School of Medicine, New York, NY 10016, USA
- The Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA
- Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY 11201, USA
- Isaac R. Galatzer-Levy
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- Meta Reality Lab, New York, NY, USA
- Benedetta Bigio
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- Carla Nasca
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- The Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA
- Yu Zhang
- Department of Bioengineering, Lehigh University, Bethlehem, PA 18015, USA
- Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA
13
Othmani A, Zeghina AO, Muzammel M. A Model of Normality Inspired Deep Learning Framework for Depression Relapse Prediction Using Audiovisual Data. Comput Methods Programs Biomed 2022; 226:107132. [PMID: 36183638] [DOI: 10.1016/j.cmpb.2022.107132]
Abstract
BACKGROUND Depression (Major Depressive Disorder) is one of the most common mental illnesses: according to the World Health Organization, more than 300 million people in the world are affected. A first depressive episode can resolve through spontaneous remission within 6 to 12 months. It has been shown that depression affects speech production and facial expressions. Although numerous studies in the literature address depression recognition using audiovisual cues, depression relapse prediction using audiovisual cues has not been studied. METHOD In this paper, we propose a deep learning-based approach for depression recognition and depression relapse prediction using audiovisual data. For more versatility and reusability, the proposed approach is based on a Model of Normality inspired framework, where depression relapse is defined by the closeness of a subject's audiovisual patterns, after a symptom-free period, to the audiovisual patterns of depressed subjects. A Model of Normality is an anomaly-detection, distance-based approach that computes a distance of normality between the deep audiovisual encoding of a test sample and a representation learned from audiovisual encodings of anomaly-free data. RESULTS The proposed approach shows very promising results, with an accuracy of 87.4% and an F1-score of 82.3% for relapse/depression prediction using a Leave-One-Subject-Out training strategy on the DAIC-WOZ dataset. CONCLUSION The proposed Model of Normality-based framework is accurate in detecting depression and in predicting depression relapse. A prospective monitoring system is proposed for assisting depressed patients. The framework is easily extensible, and other modalities will be integrated in future work.
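The distance-of-normality decision described in this abstract can be sketched in a few lines. This is a hypothetical minimal version: the centroid representation, Euclidean distance, and fixed threshold are illustrative assumptions, not the authors' exact architecture, and plain vectors stand in for the deep audiovisual encodings.

```python
import math

def centroid(embeddings):
    """Mean vector of a list of equal-length embeddings."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def normality_distance(test_embedding, normal_embeddings):
    """Euclidean distance from the centroid of anomaly-free embeddings."""
    c = centroid(normal_embeddings)
    return math.sqrt(sum((t - ci) ** 2 for t, ci in zip(test_embedding, c)))

def predict_relapse(test_embedding, normal_embeddings, threshold):
    """Flag relapse when the distance of normality exceeds a threshold."""
    return normality_distance(test_embedding, normal_embeddings) > threshold
```

In the paper the anomaly-free representation is learned rather than a simple centroid, but the decision rule is the same distance-versus-threshold comparison.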
Affiliation(s)
- Alice Othmani
- Université Paris-Est Créteil (UPEC), LISSI, Vitry sur Seine 94400, France.
- Muhammad Muzammel
- Université Paris-Est Créteil (UPEC), LISSI, Vitry sur Seine 94400, France
14
Pandey SK, Shekhawat HS, Prasanna SRM, Bhasin S, Jasuja R. A deep tensor-based approach for automatic depression recognition from speech utterances. PLoS One 2022; 17:e0272659. [PMID: 35951508] [PMCID: PMC9371305] [DOI: 10.1371/journal.pone.0272659]
Abstract
Depression is one of the significant mental health issues affecting all age groups globally. While it has been widely recognized as one of the major disease burdens in populations, complexities in definitive diagnosis present a major challenge. Usually, trained psychologists utilize conventional methods including individualized interview assessment and manually administered PHQ-8 scoring. However, heterogeneity in symptomatic presentations, which span somatic to affective complaints, imparts substantial subjectivity to diagnosis. Diagnostic accuracy is further compounded by the cross-sectional nature of sporadic assessment during physician-office visits, especially since depressive symptoms and severity may evolve over time. With widespread acceptance of smart wearable devices and smartphones, passive monitoring of depression traits using behavioral signals such as speech presents a unique opportunity as a companion diagnostic to assist trained clinicians in objective assessment over time. Therefore, we propose a framework for automated depression classification leveraging alterations in speech patterns in the well-documented and extensively studied DAIC-WOZ depression dataset. This novel tensor-based approach requires a substantially simpler implementation architecture and extracts discriminative features for depression recognition with a high F1 score and accuracy. We posit that such algorithms, which require significantly less compute, would allow effective onboard deployment in wearables for improved diagnostic accuracy and real-time monitoring of depressive disorders.
Affiliation(s)
- Sandeep Kumar Pandey
- Electronics and Electrical Engineering Dept, Indian Institute of Technology Guwahati, Assam, India
- Hanumant Singh Shekhawat
- Electronics and Electrical Engineering Dept, Indian Institute of Technology Guwahati, Assam, India
- S. R. M. Prasanna
- Electrical Engineering Dept, Indian Institute of Technology Dharwad, Dharwad, Karnataka, India
- Shalendar Bhasin
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States of America
- Ravi Jasuja
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States of America
- Function Promoting Therapies, Waltham, MA, United States of America
15
Wu P, Wang R, Lin H, Zhang F, Tu J, Sun M. Automatic depression recognition by intelligent speech signal processing: A systematic survey. CAAI Trans Intell Technol 2022. [DOI: 10.1049/cit2.12113]
Affiliation(s)
- Pingping Wu
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit, Nanjing Audit University, Nanjing, China
- Ruihao Wang
- School of Information Engineering, Nanjing Audit University, Nanjing, China
- Han Lin
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit, Nanjing Audit University, Nanjing, China
- Fanlong Zhang
- School of Information Engineering, Nanjing Audit University, Nanjing, China
- Juan Tu
- Key Laboratory of Modern Acoustics (MOE), School of Physics, Nanjing University, Nanjing, China
- Miao Sun
- Faculty of Electrical Engineering, Mathematics & Computer Science, Delft University of Technology, Delft, The Netherlands
16
Saba T, Khan AR, Abunadi I, Bahaj SA, Ali H, Alruwaythi M. Arabic Speech Analysis for Classification and Prediction of Mental Illness due to Depression Using Deep Learning. Comput Intell Neurosci 2022; 2022:8622022. [PMID: 35669665] [PMCID: PMC9166990] [DOI: 10.1155/2022/8622022]
Abstract
Depression is a prevalent mental disorder worldwide. Recognizing its early signs is critical for evaluating and preventing mental illness. With the progress of machine learning, it is possible to build intelligent systems capable of detecting depressive symptoms using speech analysis. This study presents a hybrid model to identify and predict mental illness due to depression from Arabic speech analysis. The proposed hybrid model comprises a convolutional neural network (CNN) and a support vector machine (SVM). Experiments are performed on an Arabic speech benchmark data set of 200 speeches, with 70% of the data reserved for training and 30% for testing. The hybrid model (CNN + SVM) attained accuracy rates of 90.0% and 91.60% in the training and testing stages, respectively, for predicting depression from Arabic speech. To validate these results, a recurrent neural network (RNN) and a CNN were also applied individually to the same data set and compared. The RNN achieved accuracy rates of 80.70% and 81.60% in the training and testing stages, and the CNN achieved 88.50% and 86.60%. Based on this analysis, the proposed hybrid model secured better prediction results than the individual RNN and CNN models on the same data set. Furthermore, the suggested model had a lower FPR and FNR and a higher accuracy, AUC, sensitivity, and specificity than the individual RNN and CNN models in predicting depression. These findings will be helpful for classifying depression from Arabic speech and will be beneficial to physicians, psychiatrists, and psychologists in the detection of depression.
Affiliation(s)
- Tanzila Saba
- Artificial Intelligence and Data Analytics Lab CCIS, Prince Sultan University, Riyadh, Saudi Arabia
- Amjad Rehman Khan
- Artificial Intelligence and Data Analytics Lab CCIS, Prince Sultan University, Riyadh, Saudi Arabia
- Ibrahim Abunadi
- Artificial Intelligence and Data Analytics Lab CCIS, Prince Sultan University, Riyadh, Saudi Arabia
- Saeed Ali Bahaj
- MIS Department College of Business Administration, Prince Sattam Bin Abdulaziz University, Alkharj 11942, Saudi Arabia
- Haider Ali
- Department of Statistics, University of Gujrat, Gujrat, Pakistan
- Maryam Alruwaythi
- Artificial Intelligence and Data Analytics Lab CCIS, Prince Sultan University, Riyadh, Saudi Arabia
17
Punithavathi R, Sharmila M, Avudaiappan T, Raj II, Kanchana S, Mamo SA. Empirical Investigation for Predicting Depression from Different Machine Learning Based Voice Recognition Techniques. Evid Based Complement Alternat Med 2022; 2022:6395860. [PMID: 35432567] [PMCID: PMC9010190] [DOI: 10.1155/2022/6395860]
Abstract
Over the past few decades, the rising rate of diagnosed depression and mental illness among youths of both genders has become a challenging issue for society. Many cases present otherwise unnoticed symptoms of depression that can be detected from voice recordings and messages on social media websites. Given the widespread use of mobile phones, services, and social sites, emotion prediction and analysis have become an indispensable part of caring for the quality of young people's lives. With the dynamism and popularity of mobile applications and services, it is challenging to build an emotion prediction system that can collect, analyze, and process emotional communications in real time, with high accuracy and minimal computation time. Some depression prediction researchers have found that activity on social networking sites may be linked to low self-confidence, particularly in young people and adolescents. Researchers further suggest that several objective voice acoustic measures affected by depression can be detected reliably over smartphones; in observational studies, speech samples were obtained from patients weekly by telephone using an IVR system, and voice recordings from smartphones have been processed to predict depression. Telephonic standards for obtaining voice data were identified as a crucial factor influencing the reliability and quality of speech data. Hence, this article investigates the processes applied in different machine learning algorithms for recognizing voice signals, which in turn can be used to scrutinize techniques for detecting depression levels in the future, helping to improve young people's lives and address the associated social issues.
Affiliation(s)
- R. Punithavathi
- Department of Information Technology, M.Kumarasamy College of Engineering (Autonomous), Karur, TN, India
- M. Sharmila
- Department of Information Technology, M.Kumarasamy College of Engineering (Autonomous), Karur, TN, India
- T. Avudaiappan
- Computer Science and Engineering, K. Ramakrishnan College of Technology, Trichy 621112, India
- I. Infant Raj
- Department of Computer Science and Engineering, K. Ramakrishnan College of Technology, Trichy 621112, India
- S. Kanchana
- Department of Software Systems, PSG College of Arts & Science, Coimbatore 641014, TN, India
- Samson Alemayehu Mamo
- Department of Electrical and Computer Engineering, Faculty of Electrical and Biomedical Engineering, Institute of Technology, Hawassa University, Hawassa, Ethiopia
18
Alghamdi NS, Zakariah M, Hoang VT, Elahi MM. Neurogenerative Disease Diagnosis in Cepstral Domain Using MFCC with Deep Learning. Comput Math Methods Med 2022; 2022:4364186. [PMID: 35419079] [PMCID: PMC9001083] [DOI: 10.1155/2022/4364186]
Abstract
Because underlying cognitive and neuromuscular activities regulate speech signals, biomarkers in the human voice can provide insight into neurological illnesses. Multiple motor and nonmotor aspects of neurologic voice disorders arise from an underlying neurologic condition such as Parkinson's disease, multiple sclerosis, myasthenia gravis, or ALS. Voice problems can be caused by disorders that affect the corticospinal system, cerebellum, basal ganglia, and upper or lower motoneurons. Recent work suggests that voice pathology detection technologies can successfully aid in the assessment of voice irregularities and enable the early diagnosis of voice pathology. In this paper, we offer two deep-learning-based computational models, a 1-dimensional convolutional neural network (1D CNN) and a 2-dimensional convolutional neural network (2D CNN), that detect voice pathologies caused by neurological illnesses or other causes. From the German corpus Saarbruecken Voice Database (SVD), we used voice recordings of the sustained vowel /a/ produced at normal pitch. The collected voice signals are padded and segmented to maintain homogeneity and increase the number of samples. Convolutional layers are applied to the raw data, and MFCC features are extracted. The 1D CNN achieved the highest test accuracy, 93.11%, but overfitted during training, whereas the 2D CNN generalized better, with lower training and validation losses, despite a test accuracy of 84.17%. The 2D CNN also outperforms state-of-the-art studies in the field, implying that a model trained on handcrafted features is better suited to this speech-processing task than a model that extracts features directly from raw data.
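For context, the mel-frequency mapping at the heart of the MFCC features mentioned above can be sketched as follows. The constants are the common HTK-style formula; the full MFCC pipeline (framing, FFT, filterbank energies, log, DCT) is omitted, and the filter-spacing helper is an illustrative assumption.

```python
import math

def hz_to_mel(hz):
    """Convert frequency in Hz to the perceptual mel scale (HTK-style constants)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_centers(n_filters, f_min, f_max):
    """Center frequencies (Hz) of n_filters triangular filters spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
```

The even spacing in mel (rather than Hz) is what concentrates filters at low frequencies, where speech cues relevant to pathology detection reside.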
Affiliation(s)
- Norah Saleh Alghamdi
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O.Box 84428, Riyadh 11671, Saudi Arabia
- Mohammed Zakariah
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 21574, Saudi Arabia
- Vinh Truong Hoang
- Faculty of Computer Science, Ho Chi Minh City Open University, 97 Vo Van Tan, Ward Vo Thi Sau, District 3, Ho Chi Minh City 70000, Vietnam
- Mohammad Mamun Elahi
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
19
Robust respiratory disease classification using breathing sounds (RRDCBS) multiple features and models. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-06915-0]
20
Automated diagnosis of depression from EEG signals using traditional and deep learning approaches: A comparative analysis. Biocybern Biomed Eng 2022. [DOI: 10.1016/j.bbe.2021.12.005]
21
Wang J, Ravi V, Flint J, Alwan A. Unsupervised Instance Discriminative Learning for Depression Detection from Speech Signals. Interspeech 2022; 2022:2018-2022. [PMID: 36341466] [PMCID: PMC9634944] [DOI: 10.21437/interspeech.2022-10814]
Abstract
Major Depressive Disorder (MDD) is a severe illness that affects millions of people, and it is critical to diagnose this disorder as early as possible. Detecting depression from voice signals can be of great help to physicians and can be done without any invasive procedure. Since relevant labelled data are scarce, we propose a modified Instance Discriminative Learning (IDL) method, an unsupervised pre-training technique, to extract augment-invariant and instance-spread-out embeddings. In terms of learning augment-invariant embeddings, various data augmentation methods for speech are investigated, and time-masking yields the best performance. To learn instance-spread-out embeddings, we explore methods for sampling instances for a training batch (distinct speaker-based and random sampling). It is found that distinct speaker-based sampling provides better performance than random sampling, and we hypothesize that this result is because relevant speaker information is preserved in the embedding. Additionally, we propose a novel sampling strategy, Pseudo Instance-based Sampling (PIS), based on clustering algorithms, to enhance the spread-out characteristics of the embeddings. Experiments are conducted with DepAudioNet on the DAIC-WOZ (English) and CONVERGE (Mandarin) datasets, and statistically significant improvements, with p-values of 0.0015 and 0.05, respectively, are observed using PIS in the detection of MDD relative to the baseline without pre-training.
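The time-masking augmentation that the authors found most effective for learning augment-invariant embeddings can be sketched as follows. This is a minimal illustration; the maximum mask width and the zero fill value are assumptions, not the paper's exact settings.

```python
import random

def time_mask(frames, max_mask, rng=random):
    """Return a copy of a frame sequence with one random contiguous span zeroed out.

    frames: sequence of frame values (or per-frame features collapsed to scalars here).
    max_mask: upper bound on the number of consecutive frames to mask.
    rng: random source (module or random.Random instance) for reproducibility.
    """
    frames = list(frames)
    width = rng.randint(0, min(max_mask, len(frames)))
    if width:
        start = rng.randint(0, len(frames) - width)
        for i in range(start, start + width):
            frames[i] = 0.0
    return frames
```

During pre-training, two differently masked views of the same utterance would be treated as a positive pair, pushing their embeddings together.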
Affiliation(s)
- Jinhan Wang
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
- Vijay Ravi
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
- Jonathan Flint
- Dept. of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, USA
- Abeer Alwan
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
22
Mohammed A, Kora R. An effective ensemble deep learning framework for text classification. J King Saud Univ Comput Inf Sci 2021. [DOI: 10.1016/j.jksuci.2021.11.001]
23
Martin VP, Rouas JL, Micoulaud-Franchi JA, Philip P, Krajewski J. How to Design a Relevant Corpus for Sleepiness Detection Through Voice? Front Digit Health 2021; 3:686068. [PMID: 34713156] [PMCID: PMC8521834] [DOI: 10.3389/fdgth.2021.686068]
Abstract
This article presents research on the detection of pathologies affecting speech through automatic analysis. Voice processing has indeed been used for evaluating several conditions such as Parkinson's disease, Alzheimer's disease, or depression. While some studies present results that seem sufficient for clinical applications, this is not the case for the detection of sleepiness: even two international challenges and the recent advent of deep learning techniques have not managed to change this situation. This article explores the hypothesis that the modest average performance of automatic processing stems from the design of the corpora. To this aim, we first discuss and refine the concept of sleepiness in relation to the ground-truth labels. Second, we present an in-depth study of four corpora, bringing to light the methodological choices that were made and the underlying biases they may have induced. Finally, in light of this information, we propose guidelines for the design of new corpora.
Affiliation(s)
- Vincent P. Martin
- Laboratoire Bordelais de Recherche en Informatique, University of Bordeaux, CNRS–UMR 5800, Bordeaux INP, Talence, France
- Jean-Luc Rouas
- Laboratoire Bordelais de Recherche en Informatique, University of Bordeaux, CNRS–UMR 5800, Bordeaux INP, Talence, France
- Pierre Philip
- Sommeil, Addiction et Neuropsychiatrie, University of Bordeaux, CNRS–USR 3413, CHU Pellegrin, Bordeaux, France
- Jarek Krajewski
- Engineering Psychology, Rhenish University of Applied Science, Cologne, Germany
24
An Auditory Saliency Pooling-Based LSTM Model for Speech Intelligibility Classification. Symmetry (Basel) 2021. [DOI: 10.3390/sym13091728]
Abstract
Speech intelligibility is a crucial element in oral communication that can be influenced by multiple factors, such as noise, channel characteristics, or speech disorders. In this paper, we address the task of speech intelligibility classification (SIC) in this last circumstance. Taking as a starting point our previous work, a SIC system based on an attentional long short-term memory (LSTM) network, we deal with the problem of inadequate learning of the attention weights due to training data scarcity. To overcome this issue, the main contribution of this paper is a novel type of weighted pooling (WP) mechanism, called saliency pooling, in which the WP weights are not automatically learned during the training process of the network but are obtained from an external source of information, Kalinli's auditory saliency model. In this way, we take advantage of the apparent symmetry between the human auditory attention mechanism and the attentional models integrated into deep learning networks. The developed systems are assessed on the UA-Speech dataset, which comprises speech uttered by subjects with several dysarthria levels. Results show that all the systems with saliency pooling significantly outperform a reference support vector machine (SVM)-based system and LSTM-based systems with mean pooling and attention pooling, suggesting that Kalinli's saliency can be successfully incorporated into the LSTM architecture as an external cue for the estimation of the speech intelligibility level.
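Saliency pooling as described, weighting frame vectors by an external saliency signal rather than learned attention, reduces to a normalized weighted sum. The sketch below is a hypothetical minimal version: the normalization choice is an assumption, and Kalinli's auditory saliency model itself is not reproduced.

```python
def mean_pooling(frames):
    """Plain average of frame vectors (the baseline pooling)."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def saliency_pooling(frames, saliency):
    """Weighted sum of frame vectors with externally supplied saliency weights.

    The weights are normalized to sum to 1, so salient frames dominate the
    pooled representation instead of every frame counting equally.
    """
    total = sum(saliency)
    weights = [s / total for s in saliency]
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]
```

The design point is that the weights come from a fixed external model, so nothing extra has to be learned from scarce training data.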
25
Detecting Deception from Gaze and Speech Using a Multimodal Attention LSTM-Based Framework. Appl Sci (Basel) 2021. [DOI: 10.3390/app11146393]
Abstract
The automatic detection of deceptive behaviors has recently attracted the attention of the research community due to the variety of areas where it can play a crucial role, such as security or criminology. This work is focused on the development of an automatic deception detection system based on gaze and speech features. The first contribution of our research on this topic is the use of attention Long Short-Term Memory (LSTM) networks for single-modal systems with frame-level features as input. In the second contribution, we propose a multimodal system that combines the gaze and speech modalities into the LSTM architecture using two different combination strategies: Late Fusion and Attention-Pooling Fusion. The proposed models are evaluated over the Bag-of-Lies dataset, a multimodal database recorded in real conditions. On the one hand, results show that attentional LSTM networks are able to adequately model the gaze and speech feature sequences, outperforming a reference Support Vector Machine (SVM)-based system with compact features. On the other hand, both combination strategies produce better results than the single-modal systems and the multimodal reference system, suggesting that gaze and speech modalities carry complementary information for the task of deception detection that can be effectively exploited by using LSTMs.
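The two combination strategies can be contrasted in miniature. This is a hedged sketch with hypothetical shapes and names: Late Fusion averages per-modality decision scores, while an attention-style fusion softmax-weights the modality vectors; the paper's actual LSTM architecture is not reproduced.

```python
import math

def late_fusion(gaze_score, speech_score):
    """Late Fusion: average the decision scores produced independently per modality."""
    return (gaze_score + speech_score) / 2.0

def attention_fusion(modal_vectors, modal_logits):
    """Attention-style fusion: softmax the per-modality logits and return the
    weighted sum of modality vectors (fed to a classifier downstream)."""
    m = max(modal_logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in modal_logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(modal_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, modal_vectors)) for i in range(dim)]
```

Late Fusion keeps the modalities independent until the final decision, while attention-style fusion lets the model emphasize whichever modality is more informative per sample.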
26
Pan W, Liu F. Power enterprise risk identification model based on convolutional neural network and adaptive comparison algorithm. J Intell Fuzzy Syst 2021. [DOI: 10.3233/jifs-219068]
Abstract
Combined with the practical characteristics of risk identification in electric power enterprises, a convolutional neural network (CNN) model suitable for load-sequence data prediction is determined. A Particle Swarm Optimization (PSO) algorithm is used to optimize the CNN, improving its global optimization ability and convergence speed. Simulation results show that the CNN can effectively extract sample information through its convolutional and pooling layers and, after particle swarm optimization, achieves good prediction accuracy and speed. Secondly, classical Interpretive Structural Modeling (ISM) is used to analyze the structure of the risk system of electric power enterprises, and a model of the link relationships among enterprise risks is constructed. Through structural analysis of risks and risk factors, the mutual influence relationships between them are identified, along with risk chains and risk sources. The classical interpretive structural model is then extended to fuzzy sets to build a model of risk influence intensity for electric power enterprises. This model considers influence intensity when analyzing risk relationships and yields different risk link relations for different influence intensities. Through comparative analysis, the relationship between the link-relationship model and the influence-intensity model is obtained.
The paper also puts forward a sequence similarity matching algorithm based on an adaptive search window (ADTW). A Piecewise Aggregate Approximation (PAA) strategy is used to downsample the sequences, and a low-precision alignment path is computed on the coarse sequences. The expected path deviation is then predicted from the gradient of the low-precision distance matrix and used to set the width of the path search window. The algorithm gradually increases the sequence precision, correcting the path within the search window and recomputing the window, and finally achieves a fast solution of the DTW distance and the similarity alignment path.
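The two building blocks named in this abstract, PAA downsampling and a window-constrained DTW, can be sketched as follows. A plain Sakoe-Chiba band stands in for the paper's adaptive window; the iterative refinement across precision levels is not reproduced.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each of n_segments chunks."""
    n = len(series)
    out = []
    for k in range(n_segments):
        lo, hi = k * n // n_segments, (k + 1) * n // n_segments
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def dtw_distance(a, b, window):
    """Classic DTW dynamic program restricted to the band |i - j| <= window."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Running DTW on the PAA-coarsened sequences first, then refining, is what makes the adaptive-window approach fast: the expensive full-resolution DP only ever explores a narrow band.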
Affiliation(s)
- Wei Pan
- Guangzhou Power Supply Bureau of Guangdong Power Grid Co., Ltd., Guangzhou, Guangdong, China
- Fengwei Liu
- Guangzhou Power Supply Bureau of Guangdong Power Grid Co., Ltd., Guangzhou, Guangdong, China
27
Stasak B, Huang Z, Razavi S, Joachim D, Epps J. Automatic Detection of COVID-19 Based on Short-Duration Acoustic Smartphone Speech Analysis. J Healthc Inform Res 2021; 5:201-217. [PMID: 33723525] [PMCID: PMC7948650] [DOI: 10.1007/s41666-020-00090-4]
Abstract
Currently, there is an increasing global need for COVID-19 screening to help reduce the rate of infection and at-risk patient workload at hospitals. Smartphone-based screening for COVID-19 along with other respiratory illnesses offers excellent potential due to its rapid-rollout remote platform, user convenience, symptom tracking, comparatively low cost, and prompt result processing timeframe. In particular, speech-based analysis embedded in smartphone app technology can measure physiological effects relevant to COVID-19 screening that are not yet digitally available at scale in the healthcare field. Using a selection of the Sonde Health COVID-19 2020 dataset, this study examines the speech of COVID-19-negative participants exhibiting mild and moderate COVID-19-like symptoms as well as that of COVID-19-positive participants with mild to moderate symptoms. Our study investigates the classification potential of acoustic features (e.g., glottal, prosodic, spectral) from short-duration speech segments (e.g., held vowel, pataka phrase, nasal phrase) for automatic COVID-19 classification using machine learning. Experimental results indicate that certain feature-task combinations can produce COVID-19 classification accuracy of up to 80% as compared with using the all-acoustic feature baseline (68%). Further, with brute-forced n-best feature selection and speech task fusion, automatic COVID-19 classification accuracy of upwards of 82-86% was achieved, depending on whether the COVID-19-negative participant had mild or moderate COVID-19-like symptom severity.
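The brute-forced n-best feature selection mentioned above amounts to exhaustively scoring every feature subset of a given size and keeping the best one. This is a sketch under assumptions: the scoring callback stands in for the paper's cross-validated classifier, and the function name is illustrative.

```python
from itertools import combinations

def n_best_features(feature_names, n, score_fn):
    """Exhaustively score every subset of size n and return (best_subset, best_score).

    score_fn maps a tuple of feature names to a number (in practice, a
    cross-validated classification accuracy); higher is better.
    """
    best, best_score = None, float("-inf")
    for subset in combinations(feature_names, n):
        s = score_fn(subset)
        if s > best_score:
            best, best_score = subset, s
    return best, best_score
```

Exhaustive search is only feasible because the candidate acoustic feature sets here (glottal, prosodic, spectral groups per speech task) are small; for large feature pools a greedy or heuristic search would replace it.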
Affiliation(s)
- Brian Stasak
- School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, NSW, Australia
- Zhaocheng Huang
- School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, NSW, Australia
- Julien Epps
- School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, NSW, Australia
28
Belouali A, Gupta S, Sourirajan V, Yu J, Allen N, Alaoui A, Dutton MA, Reinhard MJ. Acoustic and language analysis of speech for suicidal ideation among US veterans. BioData Min 2021; 14:11. [PMID: 33531048] [PMCID: PMC7856815] [DOI: 10.1186/s13040-021-00245-y]
Abstract
BACKGROUND Screening for suicidal ideation in high-risk groups such as U.S. veterans is crucial for early detection and suicide prevention. Currently, screening is based on clinical interviews or self-report measures. Both approaches rely on subjects to disclose their suicidal thoughts. Innovative approaches are necessary to develop objective and clinically applicable assessments. Speech has been investigated as an objective marker to understand various mental states, including suicidal ideation. In this work, we developed a machine learning and natural language processing classifier based on speech markers to screen for suicidal ideation in US veterans. METHODOLOGY Veterans submitted 588 narrative audio recordings via a mobile app in a real-life setting. In addition, participants completed self-report psychiatric scales and questionnaires. Recordings were analyzed to extract voice characteristics, including prosodic, phonation, and glottal features. The recordings were also transcribed to extract textual features for linguistic analysis. We evaluated the acoustic and linguistic features using both statistical significance and ensemble feature selection. We also examined the performance of different machine learning algorithms on multiple combinations of features to classify suicidal and non-suicidal recordings. RESULTS A combined set of 15 acoustic and linguistic speech features was identified by ensemble feature selection. A Random Forest classifier, using the selected set of features, correctly identified suicidal ideation in veterans with 86% sensitivity, 70% specificity, and an area under the receiver operating characteristic curve (AUC) of 80%. CONCLUSIONS Speech analysis of audio recordings collected from veterans in everyday life settings using smartphones offers a promising approach for suicidal ideation detection. A machine learning classifier may eventually help clinicians identify and monitor high-risk veterans.
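The evaluation setup reported above (Random Forest assessed by sensitivity, specificity, and AUC) can be sketched as follows. This is a sketch under assumptions: synthetic data stands in for the 15 selected acoustic and linguistic features, and a default scikit-learn Random Forest replaces the study's actual model.

```python
# Sketch of a Random Forest evaluated with sensitivity, specificity, and AUC
# (assumptions: synthetic data, default scikit-learn model -- NOT the study's
# actual features, recordings, or tuned classifier).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 15))        # 15 stand-in features, as in the study
# Hypothetical binary label (1 = positive class) driven by two features.
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)
clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)

# Sensitivity and specificity come from the confusion matrix; AUC from
# the predicted class-1 probabilities.
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
sensitivity = tp / (tp + fn)          # true-positive rate
specificity = tn / (tn + fp)          # true-negative rate
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} AUC={auc:.2f}")
```

Reporting sensitivity and specificity separately, as the study does, matters in screening settings where the cost of a missed positive differs from that of a false alarm.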
Affiliation(s)
- Anas Belouali
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Samir Gupta
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Vaibhav Sourirajan
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Jiawei Yu
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Nathaniel Allen
- War Related Illness and Injury Study Center, Veterans Affairs Medical Center, Washington, DC, USA
- Adil Alaoui
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Mary Ann Dutton
- Department of Psychiatry, Georgetown University Medical Center, Washington, DC, USA
- Matthew J Reinhard
- War Related Illness and Injury Study Center, Veterans Affairs Medical Center, Washington, DC, USA
- Department of Psychiatry, Georgetown University Medical Center, Washington, DC, USA