1
Muñoz-Mata BG, Dorantes-Méndez G, Piña-Ramírez O. Classification of Parkinson's disease severity using gait stance signals in a spatiotemporal deep learning classifier. Med Biol Eng Comput 2024. [PMID: 38884852] [DOI: 10.1007/s11517-024-03148-2]
Abstract
Parkinson's disease (PD) is a degenerative nervous system disorder involving motor disturbances. Motor alterations affect the gait according to the progression of PD and can be used by experts in movement disorders to rate the severity of the disease. However, this rating depends on the expertise of the clinical specialist, so the diagnosis may be inaccurate, particularly in the early stages of PD, where abnormal gait patterns can also result from normal aging or other medical conditions. Consequently, several classification systems have been developed to enhance PD diagnosis. In this paper, a PD gait severity classification algorithm was developed using vertical ground reaction force (VGRF) signals. The VGRF records used are from a public database that includes 93 PD patients and 72 healthy control adults. The work presented here focuses on modeling each foot's gait stance-phase signals using a modified convolutional long short-term memory deep neural network (CLDNN) architecture. Subsequently, the results of each model are combined to predict PD severity. The classifier performance was evaluated using ten-fold cross-validation. The best weighted accuracies obtained were 99.296 (0.128)% and 99.343 (0.182)% with the Hoehn and Yahr and UPDRS scales, respectively, outperforming previous results in the literature. The proposed classifier can effectively differentiate gait patterns of different PD severity levels based on stance-phase gait signals.
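As an illustration of the stance-phase extraction this entry implies, the sketch below segments a VGRF trace by thresholding foot-ground contact. The 20 N threshold and the toy force values are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def stance_segments(vgrf, threshold=20.0):
    """Return (start, end) index pairs where VGRF exceeds `threshold`,
    i.e. the foot is in contact with the ground (stance phase)."""
    contact = vgrf > threshold
    # Rising/falling edges of the boolean contact mask mark stance starts/ends.
    edges = np.diff(contact.astype(int))
    starts = np.where(edges == 1)[0] + 1
    ends = np.where(edges == -1)[0] + 1
    if contact[0]:
        starts = np.insert(starts, 0, 0)
    if contact[-1]:
        ends = np.append(ends, len(vgrf))
    return list(zip(starts.tolist(), ends.tolist()))

# Toy force trace in newtons: swing (~0 N), stance (~700 N), swing, stance.
force = np.array([0, 0, 650, 700, 690, 0, 0, 660, 705, 0], dtype=float)
print(stance_segments(force))  # [(2, 5), (7, 9)]
```

Each returned segment can then be resampled to a fixed length and fed to the per-foot model.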
Affiliation(s)
- Brenda G Muñoz-Mata
- Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, Av. Parque Chapultepec 1570, San Luis Potosí, 78295, San Luis Potosí, México
- Guadalupe Dorantes-Méndez
- Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, Av. Parque Chapultepec 1570, San Luis Potosí, 78295, San Luis Potosí, México
- Omar Piña-Ramírez
- Departamento de Bioinformática y Análisis Estadísticos, Instituto Nacional de Perinatología "Isidro Espinosa de los Reyes", Montes Urales 800, Ciudad de México, 11000, Ciudad de México, México
2
Akinpelu S, Viriri S, Adegun A. An enhanced speech emotion recognition using vision transformer. Sci Rep 2024; 14:13126. [PMID: 38849422] [PMCID: PMC11161461] [DOI: 10.1038/s41598-024-63776-4]
Abstract
In human-computer interaction systems, speech emotion recognition (SER) plays a crucial role because it enables computers to understand and react to users' emotions. In the past, SER has relied heavily on acoustic properties extracted from speech signals. Recent developments in deep learning and computer vision, however, have made it possible to enhance SER performance with visual representations. This work proposes a novel method for improving speech emotion recognition using a lightweight Vision Transformer (ViT) model. We leverage the ViT model's capability to capture spatial dependencies and high-level features in images, which are adequate indicators of emotional states, from mel-spectrogram inputs fed into the model. To determine the efficiency of our proposed approach, we conduct comprehensive experiments on two benchmark speech emotion datasets, the Toronto English Speech Set (TESS) and the Berlin Emotional Database (EMODB). The results demonstrate a considerable improvement in speech emotion recognition accuracy, attesting to the approach's generalizability: it achieved 98%, 91%, and 93% accuracy on TESS, EMODB, and the combined TESS-EMODB set, respectively. The comparative experiments show that the non-overlapping patch-based feature extraction method substantially improves on other state-of-the-art techniques. Our research indicates the potential of integrating vision transformer models into SER systems, opening up fresh opportunities for real-world applications requiring accurate emotion recognition from speech.
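The non-overlapping patch tokenisation credited above can be sketched as a plain reshape; the 16x16 patch size (a common ViT default) and the random spectrogram are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def to_patches(spec, patch=16):
    """Split a (n_mels, n_frames) spectrogram into flattened,
    non-overlapping patch x patch tiles (ViT-style tokenisation)."""
    h, w = spec.shape
    h, w = h - h % patch, w - w % patch   # crop to a multiple of the patch size
    tiles = (spec[:h, :w]
             .reshape(h // patch, patch, w // patch, patch)
             .transpose(0, 2, 1, 3)       # group the two patch-grid axes first
             .reshape(-1, patch * patch))
    return tiles

mel = np.random.rand(128, 130)            # e.g. 128 mel bands x 130 time frames
tokens = to_patches(mel, patch=16)
print(tokens.shape)  # (64, 256): an 8 x 8 grid of patches, each flattened to 256 values
```

Each row would then be linearly projected and position-encoded before entering the transformer encoder.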
Affiliation(s)
- Samson Akinpelu
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4001, South Africa
- Serestina Viriri
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4001, South Africa
- Adekanmi Adegun
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4001, South Africa
3
Zhang C, Su L, Li S, Fu Y. Differential Brain Activation for Four Emotions in VR-2D and VR-3D Modes. Brain Sci 2024; 14:326. [PMID: 38671977] [PMCID: PMC11048237] [DOI: 10.3390/brainsci14040326]
Abstract
Similar to traditional imaging, virtual reality (VR) imagery encompasses nonstereoscopic (VR-2D) and stereoscopic (VR-3D) modes. Russell's emotional model has been extensively studied in traditional 2D and VR-3D modes, but there is limited comparative research between VR-2D and VR-3D. In this study, we investigate whether Russell's emotional model exhibits stronger brain activation in VR-3D mode than in VR-2D mode. In an experiment covering four emotional categories (high arousal-high pleasure (HAHV), high arousal-low pleasure (HALV), low arousal-low pleasure (LALV), and low arousal-high pleasure (LAHV)), EEG signals were collected from 30 healthy undergraduate and graduate students while they watched videos in both VR modes. Power spectral density (PSD) computations revealed distinct brain activation patterns across emotional states in the two modes, with VR-3D videos inducing significantly higher brainwave energy, primarily in the frontal, temporal, and occipital regions. Differential entropy (DE) feature sets, selected via a dual ten-fold cross-validated Support Vector Machine (SVM) classifier, demonstrated satisfactory classification accuracy, which was superior in the VR-3D mode. The paper then presents a deep learning-based EEG emotion recognition framework that exploits the frequency, spatial, and temporal information of EEG data to improve recognition accuracy. The contribution of each feature to the prediction probabilities is discussed through Shapley-value-based machine-learning interpretability. The study reveals notable differences in brain activation for identical emotions between the two modes, with VR-3D showing more pronounced activation.
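The differential entropy (DE) features mentioned here are conventionally computed per frequency band under a Gaussian assumption, DE = 0.5 * ln(2 * pi * e * sigma^2). A minimal sketch, not the authors' exact pipeline:

```python
import numpy as np

def differential_entropy(x):
    """DE of a band-limited EEG segment under a Gaussian assumption:
    0.5 * ln(2 * pi * e * sigma^2), with sigma^2 the sample variance."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

rng = np.random.default_rng(0)
seg = rng.normal(0.0, 1.0, 50_000)   # unit-variance noise stands in for one filtered EEG band
print(round(differential_entropy(seg), 2))  # ~1.42, i.e. 0.5 * ln(2 * pi * e)
```

In practice one band-pass filters each channel (e.g. theta, alpha, beta, gamma) and computes DE per channel-band pair to build the feature vector.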
Affiliation(s)
- Lei Su
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; (C.Z.); (S.L.); (Y.F.)
4
Pentari A, Kafentzis G, Tsiknakis M. Speech emotion recognition via graph-based representations. Sci Rep 2024; 14:4484. [PMID: 38396002] [PMCID: PMC10891082] [DOI: 10.1038/s41598-024-52989-2]
Abstract
Speech emotion recognition (SER) has gained increased interest during the last decades as part of enriched affective computing. As a consequence, a variety of engineering approaches have been developed to address the SER problem, exploiting different features, learning algorithms, and datasets. In this paper, we propose applying graph theory to classify emotionally colored speech signals. Graph theory provides tools for extracting statistical as well as structural information from any time series; we propose using this information as a novel feature set. Furthermore, we suggest setting a unique feature-based identity for each emotion belonging to each speaker. Emotion classification is performed by a Random Forest classifier in a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches involving well-known hand-crafted features as well as deep learning architectures operating on mel spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), reveal that our proposed method outperforms the comparative methods. Specifically, we observe an average UAR increase of almost [Formula: see text], [Formula: see text], and [Formula: see text], respectively.
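The abstract does not name the time-series-to-graph mapping it uses; one common choice in this line of work is the natural visibility graph, sketched below purely as an illustrative assumption. Statistics of the resulting graph (e.g. node degrees) then serve as features.

```python
def visibility_edges(series):
    """Natural visibility graph: samples become nodes, and i, j are linked
    when every intermediate sample lies strictly below the straight line
    joining (i, series[i]) and (j, series[j])."""
    n = len(series)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            line = lambda k: series[i] + (series[j] - series[i]) * (k - i) / (j - i)
            if all(series[k] < line(k) for k in range(i + 1, j)):
                edges.add((i, j))
    return edges

e = visibility_edges([1.0, 4.0, 2.0, 3.0])   # the peak at index 1 blocks (0,2) and (0,3)
print(sorted(e))  # [(0, 1), (1, 2), (1, 3), (2, 3)]
degrees = [sum(i in edge for edge in e) for i in range(4)]  # simple structural features
```

The O(n^3) loop is for clarity; real pipelines use faster divide-and-conquer constructions.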
Affiliation(s)
- Anastasia Pentari
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, GR-700 13, Greece
- George Kafentzis
- Computer Science Department, University of Crete, Heraklion, GR-700 13, Greece
- Manolis Tsiknakis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, GR-700 13, Greece
- Department of Electrical and Computer Engineering, Hellenic Mediterranean University, Heraklion, Greece
5
Sang B, Wen H, Junek G, Neveu W, Di Francesco L, Ayazi F. An Accelerometer-Based Wearable Patch for Robust Respiratory Rate and Wheeze Detection Using Deep Learning. Biosensors 2024; 14:118. [PMID: 38534225] [DOI: 10.3390/bios14030118]
Abstract
Wheezing is a critical indicator of various respiratory conditions, including asthma and chronic obstructive pulmonary disease (COPD). Current diagnosis relies on subjective lung auscultation by physicians. Enabling this capability via a low-profile, objective wearable device for remote patient monitoring (RPM) could offer pre-emptive, accurate respiratory data to patients. To this end, we used a low-profile accelerometer-based wearable system that applies deep learning to objectively detect wheezing along with respiration rate using a single sensor. The miniature patch consists of a sensitive wideband MEMS accelerometer and low-noise CMOS interface electronics on a small board, which was placed on nine conventional lung auscultation sites on the patient's chest wall to capture pulmonary-induced vibrations (PIVs). A deep learning model was developed and compared with a deterministic time-frequency method for objectively detecting wheezing in the PIV signals, using data captured from 52 diverse patients with respiratory diseases. The wearable accelerometer patch, paired with the deep learning model, demonstrated high fidelity in capturing and detecting respiratory wheezes and patterns across diverse and pertinent settings. It achieved accuracy, sensitivity, and specificity of 95%, 96%, and 93%, respectively, with an AUC of 0.99 on the test set, outperforming the deterministic time-frequency approach. Furthermore, the accelerometer patch outperforms digital stethoscopes in sound analysis while offering immunity to ambient sounds, which not only improves data quality and computational wheeze detection performance by a significant margin but also provides a robust sensor solution that can quantify respiration patterns simultaneously.
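The reported accuracy, sensitivity, and specificity follow directly from the confusion matrix; a small self-contained sketch with made-up labels (1 = wheeze, 0 = no wheeze):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on the wheeze class) and specificity."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # invented ground-truth labels
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]   # invented model predictions
print({k: round(v, 3) for k, v in binary_metrics(truth, preds).items()})
# {'accuracy': 0.8, 'sensitivity': 0.75, 'specificity': 0.833}
```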
Affiliation(s)
- Brian Sang
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Haoran Wen
- StethX Microsystems Inc., Atlanta, GA 30308, USA
- Wendy Neveu
- Department of Medicine, Emory University School of Medicine, Atlanta, GA 30322, USA
- Lorenzo Di Francesco
- Department of Medicine, Emory University School of Medicine, Atlanta, GA 30322, USA
- Farrokh Ayazi
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- StethX Microsystems Inc., Atlanta, GA 30308, USA
6
Xu C, Liu Y, Song W, Liang Z, Chen X. A New Network Structure for Speech Emotion Recognition Research. Sensors (Basel) 2024; 24:1429. [PMID: 38474965] [DOI: 10.3390/s24051429]
Abstract
Deep learning has driven breakthroughs in emotion recognition in many fields, especially speech emotion recognition (SER). As an important part of SER, extraction of the most relevant acoustic features has long attracted researchers' attention. To address the problem that emotional information in speech signals is dispersed, and that existing models cannot comprehensively integrate local and global information, this paper presents a network model based on a gated recurrent unit (GRU) and multi-head attention. We evaluate the proposed model on the IEMOCAP and Emo-DB corpora. The experimental results show that the network model based on Bi-GRU and multi-head attention is significantly better than traditional network models on multiple evaluation metrics. We also apply the model to a speech sentiment analysis task; on the CH-SIMS and MOSI datasets, it shows excellent generalization performance.
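A minimal numpy sketch of the multi-head (self-)attention building block named above; the identity Q/K/V projections and the shapes are simplifying assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, n_heads):
    """Self-attention over a (seq_len, d_model) sequence; each head
    attends over its own slice of the feature dimension."""
    seq, d = x.shape
    dh = d // n_heads                                # per-head dimension
    heads = []
    for h in range(n_heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]        # identity projections for brevity
        scores = softmax(q @ k.T / np.sqrt(dh))      # (seq, seq) attention weights
        heads.append(scores @ v)                     # weighted sum of values
    return np.concatenate(heads, axis=1)             # back to (seq, d_model)

x = np.random.rand(5, 8)          # e.g. 5 Bi-GRU output steps, d_model = 8
out = multi_head_attention(x, n_heads=2)
print(out.shape)  # (5, 8)
```

In the described architecture this block would sit on top of the Bi-GRU outputs so that each time step can integrate global context.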
Affiliation(s)
- Chunsheng Xu
- School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Yunqing Liu
- School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Wenjun Song
- School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Zonglin Liang
- School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
- Xing Chen
- School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
7
Eguchi K, Yaguchi H, Kudo I, Kimura I, Nabekura T, Kumagai R, Fujita K, Nakashiro Y, Iida Y, Hamada S, Honma S, Takei A, Moriwaka F, Yabe I. Differentiation of speech in Parkinson's disease and spinocerebellar degeneration using deep neural networks. J Neurol 2024; 271:1004-1012. [PMID: 37989963] [DOI: 10.1007/s00415-023-12091-5]
Abstract
INTRODUCTION: Assessing dysarthria features in patients with neurodegenerative diseases helps diagnose the underlying pathologies. Although deep neural network (DNN) techniques have been widely adopted in various audio processing tasks, few studies have tested whether DNNs can help differentiate neurodegenerative diseases using patients' speech. This study evaluated whether a DNN model with a transformer architecture could differentiate patients with Parkinson's disease (PD) from patients with spinocerebellar degeneration (SCD) using speech data. METHODS: Speech data were obtained from 251 patients with PD and 101 patients with SCD while they read a passage. We fine-tuned a pre-trained DNN model using log-mel spectrograms generated from the speech data. The model was trained to predict whether an input spectrogram came from a patient with PD or SCD. We used fivefold cross-validation to evaluate predictive performance in terms of the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. RESULTS: The average ± standard deviation of the AUC, accuracy, sensitivity, and specificity over the fivefold cross-validation were 0.93 ± 0.04, 0.87 ± 0.03, 0.83 ± 0.05, and 0.89 ± 0.05, respectively. CONCLUSION: The DNN model can differentiate the speech of patients with PD from that of patients with SCD with relatively high accuracy and AUC. The proposed method can serve as a non-invasive, easy-to-perform screening tool to differentiate PD from SCD from patient speech and is expected to be applicable to telemedicine.
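The AUC reported here can be computed from raw model scores via its Mann-Whitney interpretation: the probability that a randomly chosen positive scores above a randomly chosen negative. The scores below are invented for illustration:

```python
def auc(scores_pos, scores_neg):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly by the model's scores; ties count as half a win."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

pd_scores = [0.9, 0.8, 0.6, 0.4]   # invented model outputs for PD recordings
scd_scores = [0.7, 0.3, 0.2]       # invented model outputs for SCD recordings
print(round(auc(pd_scores, scd_scores), 3))  # 0.833
```

This O(n*m) pairwise form is exact but slow; rank-statistic implementations give the same value in O(n log n).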
Affiliation(s)
- Katsuki Eguchi
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Department of Neurology, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Kita 15, Nishi 7, Kita-ku, Sapporo, 060-8638, Japan
- Hiroaki Yaguchi
- Department of Neurology, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Kita 15, Nishi 7, Kita-ku, Sapporo, 060-8638, Japan
- Ikue Kudo
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Ibuki Kimura
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Tomoko Nabekura
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Ryuto Kumagai
- Sapporo Parkinson MS Neurological Clinic, Sapporo Kita Sky Building F12, 7-6, Kita 7-Nishi 5, Kita-ku, Sapporo, Hokkaido, 060-0807, Japan
- Kenichi Fujita
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Yuichi Nakashiro
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Yuki Iida
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Shinsuke Hamada
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Sanae Honma
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Asako Takei
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Fumio Moriwaka
- Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan
- Ichiro Yabe
- Department of Neurology, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Kita 15, Nishi 7, Kita-ku, Sapporo, 060-8638, Japan
8
Başaran OT, Can YS, André E, Ersoy C. Relieving the burden of intensive labeling for stress monitoring in the wild by using semi-supervised learning. Front Psychol 2024; 14:1293513. [PMID: 38250116] [PMCID: PMC10797089] [DOI: 10.3389/fpsyg.2023.1293513]
Abstract
Stress, a natural process affecting individuals' wellbeing, has a profound impact on overall quality of life. Researchers from diverse fields employ various technologies and methodologies to investigate it and alleviate the negative effects of this phenomenon. Wearable devices, such as smart bands, capture physiological data, including heart rate variability, motions, and electrodermal activity, enabling stress level monitoring through machine learning models. However, labeling data for model accuracy assessment poses a significant challenge in stress-related research due to incomplete or inaccurate labels provided by individuals in their daily lives. To address this labeling predicament, our study proposes implementing Semi-Supervised Learning (SSL) models. Through comparisons with deep learning-based supervised models and clustering-based unsupervised models, we evaluate the performance of our SSL models. Our experiments show that our SSL models achieve 77% accuracy with a classifier trained on an augmented dataset prepared using the label propagation (LP) algorithm. Additionally, our deep autoencoder network achieves 76% accuracy. These results highlight the superiority of SSL models over unsupervised learning techniques and their comparable performance to supervised learning models, even with limited labeled data. By relieving the burden of labeling in daily life stress recognition, our study advances stress-related research, recognizing stress as a natural process rather than a disease. This facilitates the development of more efficient and accurate stress monitoring methods in the wild.
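A minimal sketch of label propagation (LP) in the spirit of the augmentation step described above, diffusing a few known labels over an RBF affinity graph; the data, gamma, and clamping scheme are illustrative assumptions rather than the study's configuration:

```python
import numpy as np

def propagate_labels(X, y, iters=50, gamma=1.0):
    """y holds 0/1 for labelled points and -1 for unlabelled ones.
    Label scores diffuse over an RBF affinity matrix; labelled rows
    are clamped back to their known values each iteration."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-gamma * d2)
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)          # row-normalised transition matrix
    f = np.where(y == 1, 1.0, 0.0)                # initial score: 1 for known positives
    labelled = y != -1
    for _ in range(iters):
        f = P @ f
        f[labelled] = (y[labelled] == 1).astype(float)  # clamp the known labels
    return (f >= 0.5).astype(int)

# Two 1-D clusters with one labelled point each; the rest start unlabelled (-1).
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])
print(propagate_labels(X, y))  # [0 0 0 1 1 1]
```

The propagated labels can then augment the training set for the downstream stress classifier, as in the LP-based pipeline described above.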
Affiliation(s)
- Osman Tugay Başaran
- Computer and Communication Systems (CCS) Labs, Telecommunication Networks Group (TKN), Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Yekta Said Can
- Faculty of Applied Computer Science, Institute of Computer Science, Universität Augsburg, Augsburg, Germany
- Elisabeth André
- Faculty of Applied Computer Science, Institute of Computer Science, Universität Augsburg, Augsburg, Germany
- Cem Ersoy
- NETLAB Research Laboratory, Department of Computer Engineering, Bogazici University, Istanbul, Turkey
9
Chugh N, Aggarwal S, Balyan A. The Hybrid Deep Learning Model for Identification of Attention-Deficit/Hyperactivity Disorder Using EEG. Clin EEG Neurosci 2024; 55:22-33. [PMID: 37682533] [DOI: 10.1177/15500594231193511]
Abstract
Attention-deficit/hyperactivity disorder (ADHD) is a common behavioral disorder that prevents children from paying attention to tasks and interacting appropriately with their surroundings. Early and timely diagnosis of this disorder remains a significant challenge in studies of children's behavior. To diagnose it, doctors often rely on the patient's description, questionnaires, psychological tests, and observed behavior, whose reliability is questionable. The convolutional neural network (CNN) is one deep learning technique that has been used for ADHD diagnosis. CNNs, however, do not account for how signals change over time, which leads to low classification performance and ambiguous findings. In this study, the authors designed a hybrid deep learning model that combines long short-term memory (LSTM) and CNN to simultaneously extract and learn the spatial features and long-term dependencies of electroencephalography (EEG) data. The effectiveness of the proposed hybrid model was assessed using two publicly available EEG datasets. The suggested model achieves classification accuracies of 98.86% on the ADHD dataset and 98.28% on the FOCUS dataset. The experimental findings show that the proposed hybrid CNN-LSTM model outperforms state-of-the-art methods for diagnosing ADHD using EEG and could therefore help with the clinical diagnosis of ADHD patients.
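The LSTM half of such a hybrid captures the temporal dependencies a plain CNN misses. One gated LSTM step, written out in numpy as a sketch over CNN-style feature vectors (random weights, not a trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias; gates stacked in the order i, f, o, g."""
    H = h.size
    z = W @ x + U @ h + b
    i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(z[3 * H:])
    c_new = f * c + i * g          # forget part of the old memory, write new content
    h_new = o * np.tanh(c_new)     # expose the gated memory as the output
    return h_new, c_new

rng = np.random.default_rng(1)
D, H = 4, 3                        # feature size (e.g. from the CNN) and hidden size
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h = c = np.zeros(H)
for x in rng.normal(size=(10, D)): # run the cell over 10 feature vectors
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (3,)
```

In the hybrid model, the final hidden state (or the sequence of them) would feed a dense softmax layer for the ADHD/control decision.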
Affiliation(s)
- Nupur Chugh
- Netaji Subhas Institute of Technology, New Delhi, India
- Swati Aggarwal
- Netaji Subhas University of Technology, New Delhi, India
- Arnav Balyan
- Netaji Subhas Institute of Technology, New Delhi, India
10
Pyo J, Pachepsky Y, Kim S, Abbas A, Kim M, Kwon YS, Ligaray M, Cho KH. Long short-term memory models of water quality in inland water environments. Water Res X 2023; 21:100207. [PMID: 38098887] [PMCID: PMC10719578] [DOI: 10.1016/j.wroa.2023.100207]
Abstract
Water quality is substantially influenced by a multitude of dynamic and interrelated variables, including climate conditions, land use, and seasonal changes. Deep learning models have demonstrated predictive power for water quality owing to their ability to automatically learn complex patterns and relationships among variables. Long short-term memory (LSTM), a type of recurrent neural network that can capture longer-term traits of time-dependent data, is the network most widely applied to predicting time series of water quality variables. We first review applications of standalone LSTM and discuss its calculation time, prediction accuracy, and robustness relative to process-driven numerical models and other machine learning methods. The review then expands to LSTM models combined with data pre-processing techniques, including the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise method and the Synchrosqueezed Wavelet Transform, and focuses on the coupling of LSTM with convolutional neural networks, attention networks, and transfer learning; these coupled networks outperformed the standalone LSTM model. We also highlight the influence of static variables in the models and of transformation methods applied to the datasets. The review concludes with the outlook and remaining challenges for research on and application of LSTM in hydrology.
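Before any of the LSTM variants reviewed here can be trained, a water quality series is typically framed into lookback windows with a prediction target. A minimal sketch; the lookback of 3 and the toy series are assumptions for illustration:

```python
import numpy as np

def make_windows(series, lookback, horizon=1):
    """Frame a 1-D series into supervised (X, y) pairs: each X row holds
    `lookback` past values, y the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(lookback, len(series) - horizon + 1):
        X.append(series[t - lookback:t])
        y.append(series[t + horizon - 1])
    return np.array(X), np.array(y)

chl_a = np.arange(10.0)            # stand-in for e.g. a chlorophyll-a record
X, y = make_windows(chl_a, lookback=3)
print(X.shape, y.shape)  # (7, 3) (7,)
print(X[0], y[0])        # [0. 1. 2.] 3.0
```

For multivariate inputs (flow, temperature, nutrients, plus static catchment variables), each window row becomes a (lookback, n_features) block instead of a vector.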
Affiliation(s)
- JongCheol Pyo
- Department for Environmental Engineering, Pusan National University, Busan 46241, Republic of Korea
- Yakov Pachepsky
- Environmental Microbial and Food Safety Laboratory, USDA-ARS, Beltsville, MD, USA
- Soobin Kim
- School of Civil, Urban, Earth, and Environmental Engineering, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulju-gun, Ulsan 44919, Republic of Korea
- Disposal Safety Evaluation R&D Division, Korea Atomic Energy Research Institute (KAERI), 111, Daedeok-daero 989 beon-gil, Yuseong-gu, Daejeon 34057, Republic of Korea
- Ather Abbas
- Physical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
- Minjeong Kim
- Disposal Safety Evaluation R&D Division, Korea Atomic Energy Research Institute (KAERI), 111, Daedeok-daero 989 beon-gil, Yuseong-gu, Daejeon 34057, Republic of Korea
- Yong Sung Kwon
- Environmental Impact Assessment Team, Division of Ecological Assessment Research, National Institute of Ecology, Seocheon, Republic of Korea
- Mayzonee Ligaray
- Institute of Environmental Science and Meteorology, College of Science, University of the Philippines Diliman, Quezon City 1101, Philippines
- Kyung Hwa Cho
- School of Civil, Environmental and Architectural Engineering, Korea University, Seoul 02841, Republic of Korea
11
Wedasingha N, Samarasinghe P, Senevirathna L, Papandrea M, Puiatti A, Rankin D. Automated anomalous child repetitive head movement identification through transformer networks. Phys Eng Sci Med 2023; 46:1427-1445. [PMID: 37814077] [DOI: 10.1007/s13246-023-01309-5]
Abstract
The increasing prevalence of behavioral disorders in children is of growing concern within the medical community. There is consensus on the pivotal role of early identification of, and intervention for, atypical behaviors in improving outcomes. Due to inadequate facilities and a shortage of medical professionals with specialized expertise, traditional diagnostic methods have been unable to effectively address the rising incidence of behavioral disorders; hence, automated approaches to diagnosis are needed. The purpose of this study is to develop an automated model capable of analyzing videos to differentiate between typical and atypical repetitive head movements in children. To mitigate problems resulting from the limited availability of child datasets, various learning methods are employed. We present a fusion of transformer networks and Non-deterministic Finite Automata (NFA) techniques that classifies a child's repetitive head movements as typical or atypical based on an analysis of gender, age, and type of repetitive head movement, along with the count, duration, and frequency of each movement. Experimentation was carried out with different transfer learning methods to enhance model performance. Experimental results on five datasets (the NIR face dataset, the Bosphorus 3D face dataset, the ASD dataset, the SSBD dataset, and the Head Movements in the Wild dataset) indicate that our proposed model outperforms many state-of-the-art frameworks at distinguishing typical and atypical repetitive head movements in children.
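The count/duration/frequency attributes fed to the automaton stage can be derived by collapsing per-frame movement predictions into runs. The state-machine-like scan below is an illustrative sketch, not the authors' NFA; the labels, frame rate, and minimum run length are assumptions:

```python
def movement_runs(frame_labels, fps=30, min_frames=3):
    """Collapse per-frame movement labels (e.g. from a frame-level
    classifier) into runs, returning (movement type, duration in s)
    for each run of at least `min_frames` frames."""
    runs, start = [], 0
    for i in range(1, len(frame_labels) + 1):
        # A run ends at the sequence end or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            if frame_labels[start] != "none" and i - start >= min_frames:
                runs.append((frame_labels[start], (i - start) / fps))
            start = i
    return runs

frames = ["none"] * 4 + ["nod"] * 9 + ["none"] * 2 + ["shake"] * 6
print(movement_runs(frames))  # [('nod', 0.3), ('shake', 0.2)]
```

Counts per movement type and run frequency over the clip length follow directly from the returned list.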
Affiliation(s)
- Nushara Wedasingha
- Faculty of Computing, Sri Lanka Institute of Information Technology, New Kandy Rd, Malabe, 10115, Colombo, Sri Lanka
- Pradeepa Samarasinghe
- Faculty of Computing, Sri Lanka Institute of Information Technology, New Kandy Rd, Malabe, 10115, Colombo, Sri Lanka
- Lasantha Senevirathna
- Faculty of Computing, Sri Lanka Institute of Information Technology, New Kandy Rd, Malabe, 10115, Colombo, Sri Lanka
- Michela Papandrea
- Information Systems and Networking Institute (ISIN), University of Applied Sciences and Arts of Southern Switzerland, Via Pobiette, Manno, 6928, Switzerland
- Alessandro Puiatti
- Institute of Digital Technologies for Personalized Healthcare (MeDiTech), University of Applied Sciences and Arts of Southern Switzerland, Via Pobiette, Manno, 6928, Switzerland
- Debbie Rankin
- School of Computing, Engineering and Intelligent Systems, Ulster University, Northland Road, Derry-Londonderry, BT48 7JL, Northern Ireland, UK
12
Xia L, Feng Y, Guo Z, Ding J, Li Y, Li Y, Ma M, Gan G, Xu Y, Luo J, Shi Z, Guan Y. MuLHiTA: A Novel Multiclass Classification Framework With Multibranch LSTM and Hierarchical Temporal Attention for Early Detection of Mental Stress. IEEE Trans Neural Netw Learn Syst 2023; 34:9657-9670. [PMID: 35385389] [DOI: 10.1109/tnnls.2022.3159573]
Abstract
Mental stress is an increasingly common psychological issue leading to diseases such as depression, addiction, and heart attack. In this study, an early detection framework based on electroencephalogram (EEG) data is developed for reducing the risk of these diseases. In existing frameworks, signals are often segmented into smaller sections prior to being input to a deep neural network. However, this approach ignores the fundamental nature of EEG signals as a carrier of valuable information (e.g., the integrity of frequency and phase, and temporal fluctuations of EEG components). As such, this type of segmenting may lead to information loss and a failure to effectively identify mental stress levels. Thus, we propose a novel multiclass classification framework termed multibranch LSTM and hierarchical temporal attention (MuLHiTA) for the early identification of mental stress levels. It specifically focuses on not only intraslice (within each slice) but also interslice (between different slices) samples in parallel. This was achieved by including two complementary branches, each of which integrated a specifically designed attention module into a bidirectional long short-term memory (BLSTM) network, enabling extraction of the most discriminative features from interslice and intraslice EEG signals simultaneously. The outputs of attention modules were then summed to obtain a feature representation that contributes to reduce overfitting and more effective multiclass classification. In addition, electrode positions were optimized using neural activity areas under high-stress conditions, thereby reducing computational costs by minimizing the number of critical electrodes. MuLHiTA was evaluated across one private [Montreal imaging stress task (MIST)] and two publicly available EEG datasets [EEG during mental arithmetic tasks (DMAT) and Simultaneous task EEG workload (STEW)]. 
These were divided into training and test sets using an 8:2 ratio, and the training data were further divided into training and validation sets using a fivefold cross-validation (CV) method, in which the model with the highest accuracy among the five was selected. The model was trained once more with the full training set, and the test data were then used to evaluate its performance. This approach achieved average classification accuracies of 93.58%, 91.80%, and 99.71% for the MIST, STEW, and DMAT datasets, respectively. Experimental results showed MuLHiTA was superior to state-of-the-art algorithms, including EEGNet, BLSTM, EEGLearn, convolutional neural network (CNN)-long short-term memory (LSTM), and convolutional recurrent attention model (CRAM), for multiclass classification. This demonstrates the viability of MuLHiTA for the early detection of mental stress.
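The evaluation protocol described above (an 8:2 train/test split, then fivefold cross-validation on the training portion) can be sketched in a few lines. The sample count and function names are illustrative, not from the paper:

```python
import random

def split_train_test(n_samples, test_ratio=0.2, seed=0):
    """Shuffle sample indices and split them with the paper's 8:2 ratio."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(n_samples * test_ratio)
    return idx[n_test:], idx[:n_test]

def five_fold(train_idx, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        yield [j for f in range(k) if f != i for j in folds[f]], folds[i]

train, test = split_train_test(100)   # 80 training / 20 test indices
splits = list(five_fold(train))       # 5 (train, validation) pairs
```

After the best of the five fold models is selected, the paper retrains on the full training set before the held-out test data are touched.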
13. Chung Y, Lee H. Joint triplet loss with semi-hard constraint for data augmentation and disease prediction using gene expression data. Sci Rep 2023; 13:18178. [PMID: 37875602 PMCID: PMC10598120 DOI: 10.1038/s41598-023-45467-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Accepted: 10/19/2023] [Indexed: 10/26/2023] Open
Abstract
The accurate prediction of patients with complex diseases, such as Alzheimer's disease (AD), as well as disease stages, including early- and late-stage cancer, is challenging owing to substantial variability among patients and limited availability of clinical data. Deep metric learning has emerged as a promising approach for addressing these challenges by improving data representation. In this study, we propose a joint triplet loss model with a semi-hard constraint (JTSC) to represent data in a small number of samples. JTSC strictly selects semi-hard samples by switching anchors and positive samples during the learning process in triplet embedding and combines a triplet loss function with an angular loss function. Our results indicate that JTSC significantly improves the number of appropriately represented samples during training when applied to the gene expression data of AD and to cancer stage prediction tasks. Furthermore, we demonstrate that using an embedding vector from JTSC as an input to the classifiers for AD and cancer stage prediction significantly improves classification performance by extracting more accurate features. In conclusion, we show that feature embedding through JTSC can aid in classification when there are a small number of samples compared to a larger number of features.
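A generic semi-hard triplet selection step can be sketched in plain Python. This is not the full JTSC (which also switches anchor and positive roles and adds an angular loss term); the vectors and margin value are illustrative:

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def semi_hard_triplet_loss(anchor, positive, negatives, margin=0.2):
    """Triplet loss averaged over semi-hard negatives: negatives that are
    farther from the anchor than the positive, but still within the margin."""
    d_ap = sq_dist(anchor, positive)
    semi_hard = [n for n in negatives
                 if d_ap < sq_dist(anchor, n) < d_ap + margin]
    if not semi_hard:  # fall back to the hardest (closest) negative
        semi_hard = [min(negatives, key=lambda n: sq_dist(anchor, n))]
    return sum(max(0.0, d_ap - sq_dist(anchor, n) + margin)
               for n in semi_hard) / len(semi_hard)

# one anchor/positive pair, one semi-hard and one easy negative
loss = semi_hard_triplet_loss([0.0, 0.0], [0.1, 0.0],
                              [[0.15, 0.0], [2.0, 2.0]])
```

Only the first negative falls inside the semi-hard band (farther than the positive, closer than positive distance plus margin), so only it contributes to the loss.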
Affiliation(s)
- Yeonwoo Chung
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea
- Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea
- Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea
14. Choi SB, Shin HS, Kim JW. Convolution Neural Networks for Motion Detection with Electrospun Reversibly-Cross-linkable Polymers and Encapsulated Ag Nanowires. ACS APPLIED MATERIALS & INTERFACES 2023; 15:47591-47603. [PMID: 37782487 DOI: 10.1021/acsami.3c11918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/03/2023]
Abstract
This paper presents the design, fabrication, and implementation of a novel composite film, a polybutadiene-based urethane (PBU)/AgNW/PBU sensor (PAPS), demonstrating remarkable mechanical stability and precision in motion detection. The sensor capitalizes on the integration of Ag nanowire (AgNW) electrodes into a neutral plane, embedded within a reversibly cross-linkable PBU polymer. This arrangement confers pore-free and interfaceless sensor formation, resulting in enhanced mechanical robustness, reproducibility, and long-term reliability. The PBU polymer is subjected to an electrospinning process, followed by sequential Diels-Alder (DA) and retro-DA reactions to produce a planarized encapsulation layer. This electrospinning-based technology allows for more precise engineering of the neutral plane than conventional film lamination or layer-by-layer spin-coating processes. The encapsulation, matching the thickness of the preformed PBU film, effectively houses the AgNW electrodes. The PAPS outperforms conventional AgNW/PBU sensors (APS) in terms of mechanical stability and bending insensitivity. When affixed to various body parts, the PAPS generates distinctive signal curves reflecting the specific body part and the degree of motion involved. The utility of the PAPS is further extended by the application of machine learning and deep learning algorithms for signal interpretation. A K-means clustering analysis confirmed the superior reproducibility and consistency of the signals derived from the PAPS over the APS. Deep learning algorithms, including a singular 1D convolutional neural network (1D CNN), a long short-term memory (LSTM) network, and dual-layered combinations of 1D CNN + LSTM and LSTM + 1D CNN, were deployed for signal classification. The singular 1D CNN model displayed a classification accuracy exceeding 98%. The PAPS signifies a pivotal development in the field of intelligent motion sensors.
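The K-means consistency check mentioned above can be illustrated with a minimal Lloyd's-algorithm sketch on toy 2-D feature points (deterministic initialization for reproducibility; the data are invented, not the sensor signals):

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with deterministic initialization:
    assign each point to its nearest centroid, then re-average."""
    centroids = [list(p) for p in points[:k]]   # first k points as seeds
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        centroids = [[sum(dim) / len(c) for dim in zip(*c)] if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

# two well-separated groups of 2-D "signal feature" points
pts = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
       [5.0, 5.1], [5.1, 5.0], [4.9, 5.05]]
centroids, clusters = kmeans(pts, k=2)
```

Signals from a reproducible sensor form tight, well-separated clusters like these; noisy or inconsistent signals blur the cluster boundaries.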
Affiliation(s)
- Su Bin Choi
- Department of Smart Fab Technology, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Hyun Sik Shin
- Department of Smart Fab Technology, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Jong-Woong Kim
- Department of Smart Fab Technology, Sungkyunkwan University, Suwon 16419, Republic of Korea
- School of Mechanical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
15. Elmezughi MK, Salih O, Afullo TJ, Duffy KJ. Path loss modeling based on neural networks and ensemble method for future wireless networks. Heliyon 2023; 9:e19685. [PMID: 37809436 PMCID: PMC10558953 DOI: 10.1016/j.heliyon.2023.e19685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2022] [Revised: 08/26/2023] [Accepted: 08/30/2023] [Indexed: 10/10/2023] Open
Abstract
In light of technological advancements that require faster data speeds, there has been an increasing demand for higher frequency bands. Consequently, numerous path loss prediction models have been developed for 5G and beyond communication networks, particularly in the millimeter-wave and subterahertz frequency ranges. Despite these efforts, there is a pressing need for more sophisticated models that offer greater flexibility and accuracy, particularly in challenging environments. Such advanced models support the deployment of wireless networks that cover communication environments with optimal quality of service. This paper presents path loss prediction models based on machine learning algorithms, namely an artificial neural network (ANN), a recurrent neural network based on long short-term memory (RNN-LSTM), and a convolutional neural network (CNN). Moreover, an ensemble-method-based neural network path loss model is proposed. Finally, an extensive performance analysis of the four models is provided regarding prediction accuracy, stability, the contribution of input features, and the time needed to run each model. The data used for training and testing were obtained from measurement campaigns conducted in an indoor corridor setting, covering both line-of-sight and non-line-of-sight communication scenarios. The main result of this study is that the ensemble-method-based model outperforms the other models (ANN, RNN-LSTM, and CNN) in terms of efficiency and prediction accuracy, and is a promising model for path loss in complex environments at high-frequency bands.
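A simple average ensemble of base regressors captures the idea of the combined model. The toy predictors below are stand-ins for the trained networks, and the paper's actual combination rule may weight the models differently:

```python
def ensemble_path_loss(models, features):
    """Average the path-loss predictions (in dB) of several base models."""
    predictions = [m(features) for m in models]
    return sum(predictions) / len(predictions)

# toy stand-ins for the trained ANN / RNN-LSTM / CNN regressors
ann  = lambda f: 60.0 + 2.0 * f["log_distance"]
lstm = lambda f: 62.0 + 1.8 * f["log_distance"]
cnn  = lambda f: 58.0 + 2.2 * f["log_distance"]

pl_db = ensemble_path_loss([ann, lstm, cnn], {"log_distance": 1.0})
```

Averaging reduces the variance of the individual predictors, which is one reason ensembles tend to be more stable than any single network.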
Affiliation(s)
- Mohamed K. Elmezughi
- The Discipline of Electrical, Electronic and Computer Engineering, University of KwaZulu-Natal, Durban, 4041, South Africa
- Omran Salih
- Institute of Systems Science, Durban University of Technology, Durban, 4000, South Africa
- Thomas J. Afullo
- The Discipline of Electrical, Electronic and Computer Engineering, University of KwaZulu-Natal, Durban, 4041, South Africa
- Kevin J. Duffy
- Institute of Systems Science, Durban University of Technology, Durban, 4000, South Africa
16. Obeso I, Yoon B, Ledbetter D, Aczon M, Laksana E, Zhou A, Eckberg RA, Mertan K, Khemani RG, Wetzel R. A Novel Application of Spectrograms with Machine Learning Can Detect Patient Ventilator Dyssynchrony. Biomed Signal Process Control 2023; 86:105251. [PMID: 37587924 PMCID: PMC10426752 DOI: 10.1016/j.bspc.2023.105251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/18/2023]
Abstract
Patients in intensive care units are frequently supported by mechanical ventilation. There is increasing awareness of patient-ventilator dyssynchrony (PVD), a mismatch between patient respiratory effort and the assistance provided by the ventilator, as a risk factor for infection, narcotic exposure, lung injury, and adverse neurocognitive effects. One of the most injurious consequences of PVD is double-cycled (DC) breaths, in which two breaths are delivered by the ventilator instead of one. Prior efforts to identify PVD have had limited efficacy. An automated method to identify PVD, independent of clinician expertise, acumen, or time, would permit early, targeted treatment to avoid further harm. We performed secondary analyses of data from a clinical trial of children with acute respiratory distress syndrome. Waveforms of ventilator flow, airway pressure, and esophageal manometry were annotated to identify DC breaths and underlying PVD subtypes. Spectrograms were generated from those waveforms to train convolutional neural network (CNN) models to detect DC breaths and the underlying PVD subtypes: reverse trigger (RT) and inadequate support (IS). The DC breath detection model yielded an AUROC of 0.980, while the multi-target detection model for underlying dyssynchrony yielded AUROCs of 0.980 (RT) and 0.976 (IS). When operating at 75% sensitivity, DC breath detection had a number needed to alert (NNA) of 1.3 (99% specificity), while underlying PVD detection had an NNA of 1.6 (98.5% specificity) for RT and an NNA of 4.0 (98.2% specificity) for IS. CNNs using spectrograms of ventilator waveforms can identify DC breaths and detect the underlying PVD for targeted clinical interventions.
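Turning a ventilator waveform into a spectrogram image for a CNN amounts to computing windowed DFT magnitudes over sliding frames. A minimal sketch, with an illustrative window length, hop, and test tone (not the study's actual preprocessing parameters):

```python
import cmath
import math

def spectrogram(signal, win=64, hop=32):
    """Magnitude spectrogram: Hann-windowed frames, one DFT column per frame
    (only the win//2 + 1 non-redundant bins of a real signal are kept)."""
    cols = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win]
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (win - 1)))
                    for n, x in enumerate(frame)]
        cols.append([abs(sum(x * cmath.exp(-2j * math.pi * k * n / win)
                             for n, x in enumerate(windowed)))
                     for k in range(win // 2 + 1)])
    return cols  # time x frequency

# 256-sample toy "ventilator flow" trace: a pure tone centred on DFT bin 8
sig = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(sig)
```

Each column becomes one time step of the image the CNN sees; the tone shows up as a bright horizontal band at its frequency bin.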
Affiliation(s)
- Ishmael Obeso, Benjamin Yoon, David Ledbetter, Melissa Aczon, Eugene Laksana, Alice Zhou, Andrew Eckberg, Keith Mertan, Robinder G. Khemani, and Randall Wetzel are with the Children’s Hospital Los Angeles, California
17. Liu K, Xie X, Yan J, Zhang S, Zhang H. An adsorption isotherm identification method based on CNN-LSTM neural network. J Mol Model 2023; 29:301. [PMID: 37651008 DOI: 10.1007/s00894-023-05704-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2023] [Accepted: 08/21/2023] [Indexed: 09/01/2023]
Abstract
CONTEXT: The morphology of adsorption isotherms embodies a wealth of information about the underlying adsorption mechanisms, making classification and identification methodologies based on isotherm shape crucial. While classification techniques have been extensively developed, traditional methods of adsorption isotherm identification suffer from inefficiency and a high margin of error. Neural-network-based methodologies for adsorption isotherm identification counter these shortcomings, as they enable fast online identification while delivering precise results. In this paper, we deploy a hybrid of convolutional neural networks (CNN) and long short-term memory (LSTM) networks for the identification of adsorption isotherms. Extensive theoretical adsorption isotherms are generated via adsorption equations, forming a comprehensive training database and circumventing the need for time-consuming and costly repeated experiments. The F1-score, receiver operating characteristic (ROC) curves, and area under the ROC curve (AUC) are introduced as criteria to evaluate the identification performance and generalization ability of the model during the testing phase. The results highlight the model's strong performance on the adsorption isotherm identification task, with accuracy rates of 100% on both the training and validation sets. The mean F1-score on the testing set reached 0.8885, with both macro-average and micro-average AUC exceeding 0.95. METHOD: PyCharm was employed as the experimental and testing platform, with Python 3.9 serving as the programming language. TensorFlow 2.11.0 and Keras 2.10.0 were used for training and testing the CNN-LSTM, while numpy 1.21.5 and scipy 1.8.1 were utilized for creating the training and validation datasets.
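The macro- and micro-averaged F1-scores used as evaluation criteria can be computed directly from a confusion matrix; the toy matrix below is illustrative:

```python
def f1_scores(conf):
    """conf[i][j] = number of class-i samples predicted as class j.
    Returns (macro_f1, micro_f1)."""
    k = len(conf)
    per_class = []
    tp_all = fp_all = fn_all = 0
    for c in range(k):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(k)) - tp
        fn = sum(conf[c]) - tp
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
        per_class.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    macro = sum(per_class) / k                            # mean over classes
    micro = 2 * tp_all / (2 * tp_all + fp_all + fn_all)   # pooled counts
    return macro, micro

# toy 3-class confusion matrix
macro, micro = f1_scores([[8, 1, 1],
                          [0, 9, 1],
                          [2, 0, 8]])
```

Macro-averaging treats every class equally, while micro-averaging pools counts and therefore weights classes by their frequency, which is why the two can diverge on imbalanced data.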
Affiliation(s)
- Kaidi Liu
- School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Xiaohan Xie
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710119, China
- Juanting Yan
- School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Sizong Zhang
- School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Hui Zhang
- School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing, 100083, China
18. Li H, Lin X, Lu Y, Wang M, Cheng H. Pilot study of contactless sleep apnea detection based on snore signals with hardware implementation. Physiol Meas 2023; 44:085003. [PMID: 37506712 DOI: 10.1088/1361-6579/acebb5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2022] [Accepted: 07/28/2023] [Indexed: 07/30/2023]
Abstract
Objective. Sleep apnea has a high incidence and is a potentially dangerous disease, and its early detection and diagnosis are challenging. Polysomnography (PSG) is considered the best approach for sleep apnea detection, but it requires cumbersome and complicated operations and thus cannot satisfy home healthcare needs. Approach. To facilitate the initial detection of sleep apnea in the home environment, we developed a sleep apnea classification model based on snore signals and a hybrid neural network, and implemented the trained model on an embedded hardware platform. We used snore signals from 32 patients at Shenzhen People's Hospital. Mel-Fbank features were extracted from the snore signals to build a sleep apnea classification model based on a Bi-LSTM with an attention mechanism. Main results. The proposed model classified snore signals into four types: hypopnea, normal condition, obstructive sleep apnea, and central sleep apnea, with 83.52% and 62.31% accuracies under subject-dependence and subject-independence validation, respectively. After pruning and model quantization, at the cost of 0.81% and 0.95% accuracy loss for the subject-dependence and subject-independence classifications, respectively, the number of model parameters and the model storage space were reduced by 32.12% and 60.37%, respectively. The compressed model exhibited accuracies of 82.71% and 61.36% under subject-dependence and subject-independence validation, respectively. When the trained model was ported to and run on an STM32 ARM embedded platform, the model accuracy was 58.85% for the four classes under leave-one-subject-out validation. Significance. The proposed sleep apnea detection model can be used in home healthcare for the initial detection of sleep apnea.
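Generic one-shot magnitude pruning and uniform quantization steps, in the spirit of the model-compression stage described above (the paper's exact pruning schedule and quantization scheme are not specified here; the weight vector is illustrative):

```python
def prune_by_magnitude(weights, fraction=0.3):
    """One-shot pruning: zero the smallest-magnitude `fraction` of weights."""
    n_prune = int(len(weights) * fraction)
    threshold = sorted(abs(w) for w in weights)[n_prune]
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize(weights, bits=8):
    """Uniform affine quantization to a bits-wide integer grid and back."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0
    return [round((w - lo) / scale) * scale + lo for w in weights]

w = [0.01, -0.5, 0.02, 0.8, -0.03, 1.2, 0.002, -0.9, 0.04, 0.6]
pruned = prune_by_magnitude(w)      # 3 of 10 weights become exactly 0.0
quantized = quantize(w)             # values snapped to a 256-level grid
```

Pruning yields sparsity (fewer effective parameters) while quantization shrinks storage per weight; together they explain the parameter-count and storage-space reductions reported above.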
Affiliation(s)
- Heng Li
- Shenzhen Key Laboratory of IoT Key Technology, Harbin Institute of Technology, Shenzhen 518055, People's Republic of China
- Xu Lin
- Shenzhen Key Laboratory of IoT Key Technology, Harbin Institute of Technology, Shenzhen 518055, People's Republic of China
- Yun Lu
- Shenzhen Key Laboratory of IoT Key Technology, Harbin Institute of Technology, Shenzhen 518055, People's Republic of China
- School of Computer Science and Engineering, Huizhou University, Huizhou, Guangdong 516007, People's Republic of China
- Mingjiang Wang
- Shenzhen Key Laboratory of IoT Key Technology, Harbin Institute of Technology, Shenzhen 518055, People's Republic of China
- Hanrong Cheng
- Department of Sleep Medicine, Shenzhen People's Hospital, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen, Guangdong, People's Republic of China
19. Ullah R, Asif M, Shah WA, Anjam F, Ullah I, Khurshaid T, Wuttisittikulkij L, Shah S, Ali SM, Alibakhshikenari M. Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer. SENSORS (BASEL, SWITZERLAND) 2023; 23:6212. [PMID: 37448062 DOI: 10.3390/s23136212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/16/2023] [Revised: 05/26/2023] [Accepted: 06/04/2023] [Indexed: 07/15/2023]
Abstract
Speech emotion recognition (SER) is a challenging task in human-computer interaction (HCI) systems. One of the key challenges in speech emotion recognition is to extract the emotional features effectively from a speech utterance. Despite the promising results of recent studies, they generally do not leverage advanced fusion algorithms for the generation of effective representations of emotional features in speech utterances. To address this problem, we describe the fusion of spatial and temporal feature representations of speech emotion by parallelizing convolutional neural networks (CNNs) and a Transformer encoder for SER. We stack two parallel CNNs for spatial feature representation in parallel to a Transformer encoder for temporal feature representation, thereby simultaneously expanding the filter depth and reducing the feature map with an expressive hierarchical feature representation at a lower computational cost. We use the RAVDESS dataset to recognize eight different speech emotions. We augment and intensify the variations in the dataset to minimize model overfitting. Additive White Gaussian Noise (AWGN) is used to augment the RAVDESS dataset. With the spatial and sequential feature representations of CNNs and the Transformer, the SER model achieves 82.31% accuracy for eight emotions on a hold-out dataset. In addition, the SER system is evaluated with the IEMOCAP dataset and achieves 79.42% recognition accuracy for five emotions. Experimental results on the RAVDESS and IEMOCAP datasets show the success of the presented SER system and demonstrate an absolute performance improvement over the state-of-the-art (SOTA) models.
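AWGN augmentation at a target SNR, as used to expand the RAVDESS data, can be sketched as follows (the SNR value and test signal are illustrative, not the paper's settings):

```python
import math
import random

def add_awgn(signal, snr_db, seed=0):
    """Add white Gaussian noise so the result has the requested
    signal-to-noise ratio (in dB) relative to the clean signal power."""
    rng = random.Random(seed)
    power = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, sigma) for x in signal]

clean = [math.sin(0.05 * n) for n in range(2000)]   # toy utterance
noisy = add_awgn(clean, snr_db=10)                  # augmented copy
```

Each noisy copy presents the same emotional content under different perturbations, which is what discourages the model from memorizing individual utterances.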
Affiliation(s)
- Rizwan Ullah
- Wireless Communication Ecosystem Research Unit, Department of Electrical Engineering, Chulalongkorn University, Bangkok 10330, Thailand
- Muhammad Asif
- Department of Electrical Engineering, Main Campus, University of Science & Technology, Bannu 28100, Pakistan
- Wahab Ali Shah
- Department of Electrical Engineering, Namal University, Mianwali 42250, Pakistan
- Fakhar Anjam
- Department of Electrical Engineering, Main Campus, University of Science & Technology, Bannu 28100, Pakistan
- Ibrar Ullah
- Department of Electrical Engineering, Kohat Campus, University of Engineering and Technology Peshawar, Kohat 25000, Pakistan
- Tahir Khurshaid
- Department of Electrical Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
- Lunchakorn Wuttisittikulkij
- Wireless Communication Ecosystem Research Unit, Department of Electrical Engineering, Chulalongkorn University, Bangkok 10330, Thailand
- Shashi Shah
- Wireless Communication Ecosystem Research Unit, Department of Electrical Engineering, Chulalongkorn University, Bangkok 10330, Thailand
- Syed Mansoor Ali
- Department of Physics and Astronomy, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
- Mohammad Alibakhshikenari
- Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés, 28911 Madrid, Spain
20. Qayyum A, Razzak I, Tanveer M, Mazher M, Alhaqbani B. High-Density Electroencephalography and Speech Signal Based Deep Framework for Clinical Depression Diagnosis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2587-2597. [PMID: 37028339 DOI: 10.1109/tcbb.2023.3257175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Depression is a mental disorder characterized by a persistent depressed mood or loss of interest in performing activities, causing significant impairment in daily routine. Possible causes include psychological, biological, and social sources of distress. Clinical depression is the more severe form of depression, also known as major depression or major depressive disorder. Recently, electroencephalography and speech signals have been used for the early diagnosis of depression; however, such work has focused on moderate or severe depression. We combined audio spectrograms and multiple frequency bands of EEG signals to improve diagnostic performance. To do so, we fused different levels of speech and EEG features to generate descriptive features and applied vision transformers and various pre-trained networks to the speech and EEG spectra. We conducted extensive experiments on the Multimodal Open Dataset for Mental-disorder Analysis (MODMA), which showed a significant improvement in depression diagnosis performance (precision 0.972, recall 0.973, and F1 score 0.973) for patients at the mild stage. In addition, we provide a web-based framework using Flask and have released the source code publicly.
21. Goumiri S, Benboudjema D, Pieczynski W. A new hybrid model of convolutional neural networks and hidden Markov chains for image classification. Neural Comput Appl 2023; 35:1-16. [PMID: 37362578 PMCID: PMC10230497 DOI: 10.1007/s00521-023-08644-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2022] [Accepted: 05/02/2023] [Indexed: 06/28/2023]
Abstract
Convolutional neural networks (CNNs) have lately proven to be extremely effective in image recognition. Besides CNNs, hidden Markov chains (HMCs) are probabilistic models widely used in image processing. This paper presents a new hybrid model composed of both CNNs and HMCs. The CNN is used for feature extraction and dimensionality reduction, and the HMC for classification. In the new model, named CNN-HMC, the convolutional and pooling layers of the CNN are applied to extract feature maps. A Peano scan is then applied to obtain several HMCs. The Expectation-Maximization (EM) algorithm is used to estimate the HMC parameters and to make the Bayesian Maximum Posterior Mode (MPM) classification method unsupervised. The objective is to enhance the performance of CNN models on the image classification task. To evaluate our proposal, it is compared to six models in two series of experiments. In the first series, we consider two CNN-HMC models and compare them to two CNNs, 4Conv and Mini AlexNet, respectively. The results show that the CNN-HMC model outperforms the classical CNN model and significantly improves the accuracy of Mini AlexNet. In the second series, CNN-HMC is compared to four models, CNN-SVM, CNN-LSTM, CNN-RF, and CNN-gcForest, which differ from CNN-HMC only in the second classification step. Based on five datasets and four metrics (recall, precision, F1-score, and accuracy), the results of these comparisons again demonstrate the value of the proposed CNN-HMC. In particular, starting from a CNN model with 71% accuracy, CNN-HMC achieves an accuracy ranging between 81.63% and 92.5%.
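The scan step that feeds the CNN feature maps to the HMC stage can be illustrated with a boustrophedon ("snake") scan, a simplified stand-in for the Peano scan used in the paper; like the Peano scan, it keeps consecutive sequence elements spatially adjacent:

```python
def snake_scan(feature_map):
    """Linearize a 2-D feature map row by row, reversing every other row,
    so consecutive sequence elements are always spatial neighbours."""
    sequence = []
    for r, row in enumerate(feature_map):
        sequence.extend(row if r % 2 == 0 else row[::-1])
    return sequence

fmap = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
chain = snake_scan(fmap)   # 1-D observation sequence for the HMC stage
```

Preserving spatial adjacency in the 1-D chain is what lets the Markov-chain transition model capture the 2-D local structure of the feature map.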
Affiliation(s)
- Soumia Goumiri
- Laboratoire des Méthodes de Conception de Systèmes (LMCS), Ecole nationale Supérieure d’Informatique (ESI), BP, 68M Oued-Smar, 16270 Alger, Algeria
- CERIST, Centre de Recherche sur l’Information Scientifique et Technique, Ben Aknoun, 16030 Algeria
- Dalila Benboudjema
- Laboratoire des Méthodes de Conception de Systèmes (LMCS), Ecole nationale Supérieure d’Informatique (ESI), BP, 68M Oued-Smar, 16270 Alger, Algeria
- Wojciech Pieczynski
- SAMOVAR, Telecom SudParis, Institut Polytechnique de Paris, 91120 Palaiseau, France
22. Lukac M, Zhambulova G, Abdiyeva K, Lewis M. Study on emotion recognition bias in different regional groups. Sci Rep 2023; 13:8414. [PMID: 37225756 PMCID: PMC10209154 DOI: 10.1038/s41598-023-34932-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 05/10/2023] [Indexed: 05/26/2023] Open
Abstract
Human-machine communication can be substantially enhanced by the inclusion of high-quality, real-time recognition of spontaneous human emotional expressions. However, successful recognition of such expressions can be negatively impacted by factors such as sudden variations in lighting or intentional obfuscation. Reliable recognition can be more substantively impeded by the fact that the presentation and meaning of emotional expressions can vary significantly based on the culture of the expressor and the environment within which the emotions are expressed. As an example, an emotion recognition model trained on a regionally specific database collected from North America might fail to recognize standard emotional expressions from another region, such as East Asia. To address the problem of regional and cultural bias in emotion recognition from facial expressions, we propose a meta-model that fuses multiple emotional cues and features. The proposed approach integrates image features, action level units, micro-expressions, and macro-expressions into a multi-cues emotion model (MCAM). Each of the facial attributes incorporated into the model represents a specific category: fine-grained content-independent features, facial muscle movements, short-term facial expressions, and high-level facial expressions. The results of the proposed meta-classifier (MCAM) approach show that a) the successful classification of regional facial expressions is based on non-sympathetic features, b) learning the emotional facial expressions of some regional groups can confound the recognition of emotional expressions of other regional groups unless it is done from scratch, and c) certain facial cues and features of the datasets preclude the design of a perfectly unbiased classifier. As a result of these observations, we posit that to learn certain regional emotional expressions, other regional expressions first have to be "forgotten".
Affiliation(s)
- Martin Lukac
- Department of Computer Science, Nazarbayev University, Kabanbay Batyr 53, Astana, 010000, Kazakhstan
- Gulnaz Zhambulova
- Department of Computer Science, Nazarbayev University, Kabanbay Batyr 53, Astana, 010000, Kazakhstan
- Kamila Abdiyeva
- Department of Computer Science, Nazarbayev University, Kabanbay Batyr 53, Astana, 010000, Kazakhstan
- Michael Lewis
- Department of Computer Science, Nazarbayev University, Kabanbay Batyr 53, Astana, 010000, Kazakhstan
23. Sun P, Wang J, Dong Z. CNN-LSTM Neural Network for Identification of Pre-Cooked Pasta Products in Different Physical States Using Infrared Spectroscopy. SENSORS (BASEL, SWITZERLAND) 2023; 23:4815. [PMID: 37430729 DOI: 10.3390/s23104815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/06/2023] [Revised: 05/02/2023] [Accepted: 05/13/2023] [Indexed: 07/12/2023]
Abstract
Infrared (IR) spectroscopy is nondestructive, fast, and straightforward. Recently, a growing number of pasta companies have been using IR spectroscopy combined with chemometrics to quickly determine sample parameters. However, few studies have used deep learning models to classify cooked wheat food products, and even fewer have applied them to Italian pasta. To address this gap, an improved CNN-LSTM neural network is proposed to identify pasta in different physical states (frozen vs. thawed) using IR spectroscopy. A one-dimensional convolutional neural network (1D-CNN) and long short-term memory (LSTM) were constructed to extract the local abstraction and sequence position information from the spectra, respectively. The results showed that the accuracy of the CNN-LSTM model reached 100% after applying principal component analysis (PCA) to the spectral data of thawed Italian pasta and 99.44% after applying PCA to the spectral data of frozen pasta, verifying that the method has high analytical accuracy and generalizes well. Therefore, the CNN-LSTM neural network combined with IR spectroscopy helps to identify different pasta products.
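The PCA step applied before the CNN-LSTM projects each spectrum onto its principal directions; the first such direction can be found by power iteration on the covariance matrix. A generic sketch on toy data, not the authors' pipeline:

```python
import math

def first_principal_component(data, iters=200):
    """Direction of the first principal component: the dominant eigenvector
    of the sample covariance matrix, found by power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# toy "spectra" that vary mostly along the (1, 1) direction
data = [[t + e, t - e] for t, e in
        [(0, 0.1), (1, -0.1), (2, 0.05), (3, -0.05), (4, 0.0)]]
pc1 = first_principal_component(data)
```

Projecting onto the leading components keeps the directions of greatest spectral variation while shrinking the input dimension the network must handle.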
Affiliation(s)
- Penghui Sun
- School of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
- Jiajia Wang
- School of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
- The Key Laboratory of Signal Detection and Processing, Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830017, China
- Post-Doctoral Workstation of Xinjiang Uygur Autonomous Region Product Quality Supervision and Inspection Institute, Urumqi 830011, China
- Zhilin Dong
- School of Information Science and Engineering, Xinjiang University, Urumqi 830017, China

24
Pandey SK, Shekhawat HS, Prasanna S. Multi-cultural speech emotion recognition using language and speaker cues. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104679]
25
Tanko D, Demir FB, Dogan S, Sahin SE, Tuncer T. Automated speech emotion polarization for a distance education system based on orbital local binary pattern and an appropriate sub-band selection technique. Multimed Tools Appl 2023:1-18. [PMID: 37362680] [PMCID: PMC10068203] [DOI: 10.1007/s11042-023-14648-y]
Abstract
The distance education system was widely adopted during the Covid-19 pandemic by many institutions of learning. To measure the effectiveness of this system, it is essential to evaluate the performance of the lecturers, and an automated speech emotion recognition model is one solution. This research aims to develop an accurate speech emotion recognition model that monitors lecturers'/instructors' emotional states during lecture presentations. To achieve this aim, a new speech emotion dataset was collected and an automated speech emotion recognition (SER) model is proposed. The presented SER model contains three main phases: (i) feature extraction using multi-level discrete wavelet transform (DWT) and a one-dimensional orbital local binary pattern (1D-OLBP), (ii) feature selection using neighborhood component analysis (NCA), and (iii) classification using a support vector machine (SVM) with ten-fold cross-validation. The proposed 1D-OLBP and NCA-based model was tested on the collected dataset, containing three emotional states with 7101 sound segments, and achieved a 93.40% classification accuracy. Moreover, the proposed architecture was tested on three publicly available speech emotion recognition datasets to highlight the general classification ability of this self-organized model. It reached over 70% classification accuracy on all three public datasets, demonstrating the success of this model.
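The paper's 1D-OLBP is a custom descriptor; as a hedged illustration of the underlying idea only, the sketch below implements the classic one-dimensional local binary pattern, which encodes each sample by thresholding its neighbours against it and histogramming the resulting codes. The orbital neighbour pattern of 1D-OLBP differs from this, and the radius and signal here are illustrative assumptions.

```python
import numpy as np

def lbp_1d(signal, radius=4):
    """Classic 1D local binary pattern: each sample is encoded by
    comparing its `radius` left and `radius` right neighbours to it.
    (The paper's orbital variant, 1D-OLBP, uses a different neighbour
    pattern; this is the standard 1D-LBP for illustration.)"""
    codes = []
    for i in range(radius, len(signal) - radius):
        neighbours = np.concatenate(
            [signal[i - radius:i], signal[i + 1:i + 1 + radius]])
        bits = (neighbours >= signal[i]).astype(int)
        codes.append(int("".join(map(str, bits)), 2))
    # The histogram of codes is the texture feature vector
    n_bins = 2 ** (2 * radius)
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist

x = np.sin(np.linspace(0, 8 * np.pi, 400))
feat = lbp_1d(x)
print(feat.shape, feat.sum())  # 256-bin histogram over 392 codes
```

Feature vectors like `feat` would then go through selection (e.g., NCA) before classification.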
Affiliation(s)
- Dahiru Tanko
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey
- Fahrettin Burak Demir
- Department of Software Engineering, Faculty of Engineering and Natural Sciences, Bandirma Onyedi Eylul University, Bandirma, Turkey
- Sengul Dogan
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey
- Sakir Engin Sahin
- Department of Computer Technologies, Arapgir Vocational School, Malatya Turgut Ozal University, Malatya, Turkey
- Turker Tuncer
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey

26
Kshirsagar S, Pendyala A, Falk TH. Task-specific speech enhancement and data augmentation for improved multimodal emotion recognition under noisy conditions. Front Comput Sci 2023. [DOI: 10.3389/fcomp.2023.1039261]
Abstract
Automatic emotion recognition (AER) systems are burgeoning, and systems based on audio, video, text, or physiological signals have emerged. Multimodal systems, in turn, have been shown to improve overall AER accuracy and to provide some robustness against artifacts and missing data. Collecting multiple signal modalities, however, can be very intrusive, time consuming, and expensive. Recent advances in deep-learning-based speech-to-text and natural language processing systems have enabled the development of reliable multimodal systems based on speech and text while only requiring the collection of audio data. Audio data, however, is extremely sensitive to environmental disturbances, such as additive noise, and thus faces challenges when deployed “in the wild.” To overcome this issue, speech enhancement algorithms have been deployed at the input signal level to improve testing accuracy in noisy conditions. Speech enhancement algorithms come in different flavors and can be optimized for different tasks (e.g., for human perception vs. machine performance). Data augmentation, in turn, has been deployed at the model level during training time to improve accuracy in noisy testing conditions. In this paper, we explore the combination of task-specific speech enhancement and data augmentation as a strategy to improve overall multimodal emotion recognition in noisy conditions. We show that AER accuracy under noisy conditions can be improved to levels close to those seen in clean conditions; compared against a system without speech enhancement or data augmentation, an increase in AER accuracy of 40% was seen in a cross-corpus test, showing promising results for “in the wild” AER.
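One common form of the training-time data augmentation mentioned above is mixing clean speech with noise at a controlled signal-to-noise ratio. The sketch below shows this step in NumPy; the synthetic signals and the 5 dB target are illustrative, not the paper's setup.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested signal-to-noise
    ratio in dB, a common training-time augmentation for robustness
    to noisy test conditions."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scaled = noise * np.sqrt(target_noise_power / noise_power)
    return clean + scaled

rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16000))  # stand-in utterance
noise = rng.normal(size=16000)
noisy = add_noise_at_snr(speech, noise, snr_db=5)
```

Sweeping `snr_db` over a range during training exposes the model to the distortion levels expected at test time.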
27
Long X, Ding X, Li J, Dong R, Su Y, Chang C. Indentation Reverse Algorithm of Mechanical Response for Elastoplastic Coatings Based on LSTM Deep Learning. Materials (Basel) 2023; 16:2617. [PMID: 37048911] [PMCID: PMC10096397] [DOI: 10.3390/ma16072617]
Abstract
The load-penetration depth (P-h) curves of different metallic coating materials can be determined by nanoindentation experiments, but it is a challenge to obtain the stress-strain response and elastoplastic properties directly from P-h curves. These problems can be solved by finite element (FE) simulation combined with reverse analysis, which, however, is typically time-consuming, in addition to the low generality of FE methodologies across different metallic materials. To eliminate these challenges, a long short-term memory (LSTM) neural network is proposed in this study and trained on the time series of P-h curves, mapping them to the corresponding stress-strain responses of elastoplastic materials. Prior to training, 1000 sets of indentation data for metallic coating materials were generated using the FE method as training and validation sets. Each dataset contains a set of P-h curves as well as the corresponding stress-strain curves, used as network inputs and training targets, respectively. LSTM networks with various numbers of hidden layers and hidden units are evaluated to determine the optimal hyperparameters by comparing their loss curves. The prediction results show that the relationship between the P-h curves of metallic coating materials and their stress-strain responses is well predicted and essentially follows the power-law equation. Furthermore, the LSTM-based deep learning method is advantageous for interpreting the elastoplastic behavior of coating materials from indentation measurements, making the prediction of stress-strain responses much more efficient than FE analysis. The established LSTM network achieves a prediction accuracy of up to 97%, reliably satisfying engineering requirements in practice.
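The abstract notes that the learned mapping essentially follows the power-law equation. As a minimal worked example, the sketch below fits the Hollomon relation sigma = K * eps**n to synthetic flow data by least squares in log-log space; the K and n values are invented for illustration.

```python
import numpy as np

def fit_power_law(strain, stress):
    """Fit the Hollomon power-law hardening relation sigma = K * eps**n
    by linear least squares in log-log space."""
    log_eps, log_sig = np.log(strain), np.log(stress)
    n, log_K = np.polyfit(log_eps, log_sig, 1)  # slope = n, intercept = ln K
    return np.exp(log_K), n

# Synthetic plastic-flow data with K = 500 MPa, n = 0.2 (illustrative)
eps = np.linspace(0.01, 0.2, 50)
sigma = 500.0 * eps ** 0.2
K, n = fit_power_law(eps, sigma)
print(round(K, 3), round(n, 3))  # recovers 500.0 and 0.2
```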
Affiliation(s)
- Xu Long
- Research & Development Institute, Northwestern Polytechnical University in Shenzhen, Shenzhen 518063, China
- Xiaoyue Ding
- School of Mechanics, Civil Engineering and Architecture, Northwestern Polytechnical University, Xi’an 710072, China
- Jiao Li
- School of Mechanics, Civil Engineering and Architecture, Northwestern Polytechnical University, Xi’an 710072, China
- Ruipeng Dong
- School of Mechanics, Civil Engineering and Architecture, Northwestern Polytechnical University, Xi’an 710072, China
- Yutai Su
- School of Mechanics, Civil Engineering and Architecture, Northwestern Polytechnical University, Xi’an 710072, China
- Chao Chang
- School of Applied Science, Taiyuan University of Science and Technology, Taiyuan 030024, China

28
Mohammed Alsumaidaee YA, Yaw CT, Koh SP, Tiong SK, Chen CP, Yusaf T, Abdalla AN, Ali K, Raj AA. Detection of Corona Faults in Switchgear by Using 1D-CNN, LSTM, and 1D-CNN-LSTM Methods. Sensors (Basel) 2023; 23:3108. [PMID: 36991819] [PMCID: PMC10059847] [DOI: 10.3390/s23063108]
Abstract
The damaging effects of corona faults have made them a major concern in metal-clad switchgear, requiring extreme caution during operation. Corona faults are also the primary cause of flashovers in medium-voltage metal-clad electrical equipment. The root cause of this issue is an electrical breakdown of the air due to electrical stress and poor air quality within the switchgear; without proper preventative measures, a flashover can occur, resulting in serious harm to workers and equipment. As a result, detecting corona faults in switchgear and preventing electrical stress buildup is critical. Recent years have seen the successful use of deep learning (DL) for corona and non-corona detection, owing to its autonomous feature-learning capability. This paper systematically analyzes three deep learning techniques, namely 1D-CNN, LSTM, and hybrid 1D-CNN-LSTM models, to identify the most effective model for detecting corona faults from the sound waves generated in switchgear, examining performance in both the time and frequency domains. In the time domain analysis (TDA), 1D-CNN achieved success rates of 98%, 98.4%, and 93.9%, while LSTM obtained success rates of 97.3%, 98.4%, and 92.4%. The best-performing model, the 1D-CNN-LSTM, achieved success rates of 99.3%, 98.4%, and 98.4% in differentiating corona and non-corona cases during training, validation, and testing. In the frequency domain analysis (FDA), 1D-CNN achieved success rates of 100%, 95.8%, and 95.8%, LSTM obtained 100%, 100%, and 100%, and the 1D-CNN-LSTM model likewise achieved 100%, 100%, and 100% during training, validation, and testing. Hence, the developed algorithms achieved high performance in identifying corona faults in switchgear, particularly the hybrid 1D-CNN-LSTM model, owing to its accuracy in both the time and frequency domains.
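The time-domain and frequency-domain analyses contrasted above start from two input representations of the same recording. A minimal sketch, assuming arbitrary frame sizes and a synthetic signal in place of a switchgear sound recording:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D sound signal into overlapping time-domain frames."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def fft_magnitudes(frames):
    """Frequency-domain representation: one-sided FFT magnitude per frame."""
    return np.abs(np.fft.rfft(frames, axis=1))

rng = np.random.default_rng(2)
sound = rng.normal(size=8000)          # stand-in for a switchgear recording
tda_input = frame_signal(sound, frame_len=256, hop=128)  # time-domain input
fda_input = fft_magnitudes(tda_input)                    # frequency-domain input
print(tda_input.shape, fda_input.shape)
```

Either matrix can be fed to a 1D-CNN, LSTM, or hybrid model, which is how the two analysis domains are compared.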
Affiliation(s)
- Yaseen Ahmed Mohammed Alsumaidaee
- College of Graduate Studies (COGS), Universiti Tenaga Nasional (The Energy University), Jalan Ikram-Uniten, Kajang 43000, Selangor, Malaysia
- Chong Tak Yaw
- Institute of Sustainable Energy, Universiti Tenaga Nasional (The Energy University), Jalan Ikram-Uniten, Kajang 43000, Selangor, Malaysia
- Siaw Paw Koh
- Institute of Sustainable Energy, Universiti Tenaga Nasional (The Energy University), Jalan Ikram-Uniten, Kajang 43000, Selangor, Malaysia
- Department of Electrical and Electronics Engineering, Universiti Tenaga Nasional (The Energy University), Jalan Ikram-Uniten, Kajang 43000, Selangor, Malaysia
- Sieh Kiong Tiong
- Institute of Sustainable Energy, Universiti Tenaga Nasional (The Energy University), Jalan Ikram-Uniten, Kajang 43000, Selangor, Malaysia
- Department of Electrical and Electronics Engineering, Universiti Tenaga Nasional (The Energy University), Jalan Ikram-Uniten, Kajang 43000, Selangor, Malaysia
- Chai Phing Chen
- Department of Electrical and Electronics Engineering, Universiti Tenaga Nasional (The Energy University), Jalan Ikram-Uniten, Kajang 43000, Selangor, Malaysia
- Talal Yusaf
- School of Engineering and Technology, Central Queensland University, Brisbane, QLD 4009, Australia
- Ahmed N Abdalla
- Faculty of Electronic Information Engineering, Huaiyin Institute of Technology, Huai’an 223003, China
- Kharudin Ali
- Faculty of Electrical and Automation Engineering Technology, UC TATI, Teluk Kalong, Kemaman 24000, Terengganu, Malaysia
- Avinash Ashwin Raj
- Tenaga Nasional Berhad Research Sdn. Bhd., No. 1, Kawasan Institusi Penyelidikan, Jln Ayer Hitam, Kajang 43000, Selangor, Malaysia

29
Singh J, Saheer LB, Faust O. Speech Emotion Recognition Using Attention Model. Int J Environ Res Public Health 2023; 20:5140. [PMID: 36982048] [PMCID: PMC10049636] [DOI: 10.3390/ijerph20065140]
Abstract
Speech emotion recognition is an important research topic that can help to maintain and improve public health and contribute towards the ongoing progress of healthcare technology. There have been several advancements in the field of speech emotion recognition systems including the use of deep learning models and new acoustic and temporal features. This paper proposes a self-attention-based deep learning model that was created by combining a two-dimensional Convolutional Neural Network (CNN) and a long short-term memory (LSTM) network. This research builds on the existing literature to identify the best-performing features for this task with extensive experiments on different combinations of spectral and rhythmic information. Mel Frequency Cepstral Coefficients (MFCCs) emerged as the best performing features for this task. The experiments were performed on a customised dataset that was developed as a combination of RAVDESS, SAVEE, and TESS datasets. Eight states of emotions (happy, sad, angry, surprise, disgust, calm, fearful, and neutral) were detected. The proposed attention-based deep learning model achieved an average test accuracy rate of 90%, which is a substantial improvement over established models. Hence, this emotion detection model has the potential to improve automated mental health monitoring.
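The self-attention component combined with the CNN and LSTM above can be illustrated by single-head scaled dot-product attention over a sequence of feature frames. The sketch below is a generic NumPy version with random projection matrices, not the paper's trained model; the sequence length and feature dimension are arbitrary.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence of
    feature frames X (time x features), as used on top of CNN/LSTM outputs."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Row-wise softmax (numerically stabilised)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(3)
T, d = 20, 16                       # 20 time frames of 16-dim features
X = rng.normal(size=(T, d))
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
out, attn = self_attention(X, *W)
print(out.shape)                    # (20, 16)
```

Each output frame is a weighted mixture of all frames, letting the classifier emphasise emotionally salient segments.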
30
Olatinwo DD, Abu-Mahfouz A, Hancke G, Myburgh H. IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients. Sensors (Basel) 2023; 23:2948. [PMID: 36991659] [PMCID: PMC10056097] [DOI: 10.3390/s23062948]
Abstract
Internet of things (IoT)-enabled wireless body area networks (WBANs) are an emerging technology that combines medical devices, wireless devices, and non-medical devices for healthcare management applications. Speech emotion recognition (SER) is an active research field in the healthcare domain and machine learning. It is a technique that can be used to automatically identify speakers' emotions from their speech. However, SER systems, especially in the healthcare domain, face several challenges, such as low prediction accuracy, high computational complexity, delays in real-time prediction, and the difficulty of identifying appropriate features from speech. Motivated by these research gaps, we propose an emotion-aware IoT-enabled WBAN system within the healthcare framework, where data processing and long-range data transmission are performed by an edge AI system for real-time prediction of patients' speech emotions and for capturing changes in emotion before and after treatment. Additionally, we investigated the effectiveness of different machine learning and deep learning algorithms in terms of classification performance, feature extraction methods, and normalization methods. We developed a hybrid deep learning model, i.e., a convolutional neural network (CNN) combined with bidirectional long short-term memory (BiLSTM), and a regularized CNN model. We combined the models with different optimization strategies and regularization techniques to improve prediction accuracy, reduce generalization error, and reduce the computational complexity of the neural networks in terms of time, power, and space. Different experiments were performed to check the efficiency and effectiveness of the proposed algorithms. The proposed models were compared with a related existing model using standard performance metrics such as prediction accuracy, precision, recall, F1 score, the confusion matrix, and the differences between actual and predicted values. The experimental results show that one of the proposed models outperformed the existing model with an accuracy of about 98%.
Affiliation(s)
- Damilola D. Olatinwo
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0001, South Africa
- Adnan Abu-Mahfouz
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0001, South Africa
- Council for Scientific and Industrial Research (CSIR), Pretoria 0184, South Africa
- Gerhard Hancke
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0001, South Africa
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Hermanus Myburgh
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0001, South Africa

31
Aspect-Based Sentiment Analysis of Customer Speech Data Using Deep Convolutional Neural Network and BiLSTM. Cognit Comput 2023. [DOI: 10.1007/s12559-023-10127-6]
32
SMDetector: Small mitotic detector in histopathology images using faster R-CNN with dilated convolutions in backbone model. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104414]
33
Jiao L, Sun C, Yan N, Yan C, Qu L, Wang Q, Zhang S, Ma L. Discrimination of Salvia miltiorrhiza from Different Geographical Origins by Laser-Induced Breakdown Spectroscopy (LIBS) with Convolutional Neural Network (CNN). Anal Lett 2023. [DOI: 10.1080/00032719.2023.2180515]
Affiliation(s)
- Long Jiao
- College of Chemistry and Chemical Engineering, Xi’an Shiyou University, Xi’an, Shaanxi, China
- Chengyu Sun
- College of Chemistry and Chemical Engineering, Xi’an Shiyou University, Xi’an, Shaanxi, China
- Naying Yan
- College of Chemistry and Chemical Engineering, Xi’an Shiyou University, Xi’an, Shaanxi, China
- Chunhua Yan
- College of Chemistry and Chemical Engineering, Xi’an Shiyou University, Xi’an, Shaanxi, China
- Le Qu
- Cooperative Innovation Center of Unconventional Oil and Gas Exploration and Development in Shaanxi Province, Xi’an, Shaanxi, China
- Qin Wang
- School of Chemistry and Environment Science, Shaanxi University of Technology, Hanzhong, Shaanxi, China
- Shengrui Zhang
- School of Chemistry and Environment Science, Shaanxi University of Technology, Hanzhong, Shaanxi, China
- Ling Ma
- College of Chemistry and Chemical Engineering, Xi’an Shiyou University, Xi’an, Shaanxi, China

34
Alsabhan W. Human-Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention. Sensors (Basel) 2023; 23:1386. [PMID: 36772427] [PMCID: PMC9921095] [DOI: 10.3390/s23031386]
Abstract
Emotions have a crucial function in the mental existence of humans and are vital for identifying a person's behaviour and mental condition. Speech Emotion Recognition (SER) is the task of extracting a speaker's emotional state from their speech signal. SER is a growing discipline in human-computer interaction and has recently attracted significant interest: because the set of universal emotions is small, any intelligent system with sufficient computational capacity can learn to recognise them. However, human speech is immensely diverse, making it difficult to create a single, standardised recipe for detecting hidden emotions. This work addresses that difficulty by combining multilingual emotional datasets to build a more generalised and effective model for recognising human emotions. A two-step process was used to develop the model: feature extraction followed by classification. The zero-crossing rate (ZCR), root-mean-square energy (RMSE), and the renowned Mel-frequency cepstral coefficients were extracted as features. Two proposed models, a 1D CNN combined with LSTM and attention and a proprietary 2D CNN architecture, were used for classification. The outcomes demonstrated that the proposed 1D CNN with LSTM and attention performed better than the 2D CNN. For the EMO-DB, SAVEE, ANAD, and BAVED datasets, the model's accuracy was 96.72%, 97.13%, 96.72%, and 88.39%, respectively. The model beat several earlier efforts on the same datasets, demonstrating the generality and efficacy of recognising multiple emotions from various languages.
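Two of the features named above, ZCR and RMS energy, are simple per-frame computations. A minimal NumPy sketch on a pure tone; the sampling rate and tone frequency are illustrative assumptions.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return np.mean(np.abs(np.diff(np.sign(frame))) > 0)

def rmse(frame):
    """Root-mean-square energy of a frame."""
    return np.sqrt(np.mean(frame ** 2))

t = np.linspace(0, 1, 8000, endpoint=False)
tone = np.sin(2 * np.pi * 100 * t + 0.5)   # 100 Hz tone at 8 kHz sampling
print(zero_crossing_rate(tone))            # 200/7999, about 0.025
print(round(rmse(tone), 3))                # 0.707
```

In a full pipeline these scalars are computed per frame and stacked with the cepstral coefficients into the feature vector.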
Affiliation(s)
- Waleed Alsabhan
- College of Engineering, Al Faisal University, P.O. Box 50927, Riyadh 11533, Saudi Arabia
35
Zhong MY, Yang QY, Liu Y, Zhen BY, Zhao FD, Xie BB. EEG emotion recognition based on TQWT-features and hybrid convolutional recurrent neural network. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104211]
36
Wang W, Tang Q. Combined model of air quality index forecasting based on the combination of complementary empirical mode decomposition and sequence reconstruction. Environ Pollut 2023; 316:120628. [PMID: 36370980] [DOI: 10.1016/j.envpol.2022.120628]
Abstract
One of the most important issues that cities face is air pollution. In this study, a novel integrated forecasting model of the air quality index (AQI) is proposed to produce reliable predictions, providing useful references for urban air pollution control, public health planning, and residents' travel planning. First, the original data is decomposed by complementary ensemble empirical mode decomposition (CEEMD), forming subsequences of different frequencies. Second, the fuzzy entropy (FE) algorithm is used to reconstruct the subsequences. Then, the combined forecasting model is established, with different prediction methods selected for different frequency subsequences: the new high-frequency sequences, low-frequency sequences, and trend sequences are predicted by the whale-optimization-algorithm-optimized long short-term memory network (WOA-LSTM) and the extreme learning machine (ELM), respectively. Empirical analyses are carried out on the examples of Beijing and Chengdu. The results indicate that: (1) the proposed CEEMD-FE-WOA-LSTM-ELM model effectively integrates the characteristics of the original sequence and has the highest prediction accuracy among all comparison models; (2) preprocessing the data is necessary and effectively extracts data features: taking Beijing as an example, compared with the non-decomposition model, adding the decomposition algorithm increases the prediction accuracy rate (PA) by 8.55% on average, decreases the RMSE by 10.36 on average, and decreases the MAPE by 6.11% on average; (3) the overall prediction level and accuracy can be effectively increased by applying different prediction methods to the recombined sequences of various frequencies. The research results can provide references for urban air quality prediction.
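The fuzzy entropy used above to group the decomposed subsequences can be sketched as follows. This is a generic FuzzyEn implementation with common default parameters (m = 2, r = 0.2 times the standard deviation), which may differ from the paper's choices.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=None, power=2):
    """Fuzzy entropy of a 1-D series: compares mean-removed embedding
    vectors with the fuzzy membership exp(-(d**power)/r) instead of a
    hard threshold. Defaults are common choices, not the paper's."""
    if r is None:
        r = 0.2 * np.std(x)

    def phi(dim):
        # Embedding vectors with their own mean removed
        emb = np.array([x[i:i + dim] for i in range(len(x) - m)])
        emb = emb - emb.mean(axis=1, keepdims=True)
        # Chebyshev distances between all pairs of vectors
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        sim = np.exp(-(d ** power) / r)
        n = len(emb)
        return (sim.sum() - n) / (n * (n - 1))   # exclude self-matches

    return np.log(phi(m)) - np.log(phi(m + 1))

rng = np.random.default_rng(4)
fe_noise = fuzzy_entropy(rng.normal(size=300))
fe_sine = fuzzy_entropy(np.sin(np.linspace(0, 20 * np.pi, 300)))
print(round(fe_sine, 3), round(fe_noise, 3))  # irregular noise scores higher
```

Subsequences with similar entropy values are merged before choosing a predictor for each group.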
Affiliation(s)
- Weijun Wang
- Department of Economics and Management, North China Electric Power University, 689 Huadian Road, Baoding 071000, China.
- Qing Tang
- Department of Economics and Management, North China Electric Power University, 689 Huadian Road, Baoding 071000, China.

37
de Lope J, Graña M. An ongoing review of speech emotion recognition. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.002]
38
Shanthi N, Stonier AA, Sherine A, Devaraju T, Abinash S, Ajay R, Arul Prasath V, Ganji V. An integrated approach for mental health assessment using emotion analysis and scales. Healthc Technol Lett 2022. [DOI: 10.1049/htl2.12040]
Affiliation(s)
- N. Shanthi
- Department of Computer Science & Engineering, Kongu Engineering College, Perundurai, Tamil Nadu, India
- Anli Sherine
- School of Computing and Creative Media, University of Technology Sarawak, Sarawak, Malaysia
- T. Devaraju
- Department of Electrical and Electronics Engineering, Sree Vidyanikethan Engineering College, Tirupati, Andhra Pradesh, India
- S. Abinash
- Department of Computer Science & Engineering, Kongu Engineering College, Perundurai, Tamil Nadu, India
- R. Ajay
- Department of Computer Science & Engineering, Kongu Engineering College, Perundurai, Tamil Nadu, India
- V. Arul Prasath
- Department of Computer Science & Engineering, Kongu Engineering College, Perundurai, Tamil Nadu, India
- Vivekananda Ganji
- Department of Electrical and Computer Engineering, Debre Tabor University, Debre Tabor, Ethiopia

39
Wang Y, Li S, Zhang H, Liu T. A lightweight CNN-based model for early warning in sow oestrus sound monitoring. Ecol Inform 2022. [DOI: 10.1016/j.ecoinf.2022.101863]
40
Uppada SK, Patel P, Sivaselvan B. An image and text-based multimodal model for detecting fake news in OSN's. J Intell Inf Syst 2022; 61:1-27. [PMID: 36465146] [PMCID: PMC9708513] [DOI: 10.1007/s10844-022-00764-y]
Abstract
Digital mass media has become the new paradigm of communication, revolving around online social networks (OSNs). The growing use of OSNs as a primary source of information, together with the growth of platforms providing such news, has increased the scope for spreading fake news. People spread fake news in multimedia formats such as images, audio, and video. Visual news is prone to have a psychological impact on users and is often misleading, so multimodal frameworks for detecting fake posts have gained demand in recent times. This paper proposes a framework that flags fake posts containing visual data embedded with text. The proposed framework works on data derived from the Fakeddit dataset, with over 1 million samples containing text, image, metadata, and comment data gathered from a wide range of sources, and tries to exploit the unique features of fake and legitimate images. The framework uses separate architectures to learn visual and linguistic models from each post. Image polarity datasets derived from Flickr are also considered for analysis, and the features extracted from these visual and text-based data help in flagging news. The proposed fusion model achieved an overall accuracy of 91.94%, precision of 93.43%, recall of 93.07%, and F1-score of 93%. The experimental results show that the proposed multimodal image-and-text model achieves better results than other state-of-the-art models working on a similar dataset.
Affiliation(s)
- Santosh Kumar Uppada
- Department of Computer Science and Engineering, IIITDM Kancheepuram, Melakottiyur, Chennai, 600127 Tamil Nadu India
- Parth Patel
- Department of Computer Science and Engineering, IIITDM Kancheepuram, Melakottiyur, Chennai, 600127 Tamil Nadu India
- Sivaselvan B.
- Department of Computer Science and Engineering, IIITDM Kancheepuram, Melakottiyur, Chennai, 600127 Tamil Nadu India

41
Akalya devi C, Karthika Renuka D, Pooventhiran G, Harish D, Yadav S, Thirunarayan K. Towards enhancing emotion recognition via multimodal framework. J Intell Fuzzy Syst 2022. [DOI: 10.3233/jifs-220280]
Abstract
Emotional AI is the next era of AI, set to play a major role in fields such as entertainment, health care, and self-paced online education by considering clues from multiple sources. In this work, we propose a multimodal emotion recognition system that extracts information from speech, motion capture, and text data. The main aim of this research is to improve the unimodal architectures to outperform the state of the art and to combine them into a robust multimodal fusion architecture. We developed 1D and 2D CNN-LSTM time-distributed models for speech, a hybrid CNN-LSTM model for motion capture data, and a BERT-based model for text data to achieve state-of-the-art results, and attempted both concatenation-based decision-level fusion and Deep CCA-based feature-level fusion schemes. The proposed speech and mocap models achieve emotion recognition accuracies of 65.08% and 67.51%, respectively, and the BERT-based text model achieves an accuracy of 72.60%. The decision-level fusion approach significantly improves the accuracy of detecting emotions on the IEMOCAP and MELD datasets: it achieves 80.20% accuracy on IEMOCAP, which is 8.61% higher than the state-of-the-art methods, and 63.52% and 61.65% in 5-class and 7-class classification on the MELD dataset, which are also higher than the state of the art.
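Decision-level fusion of the kind evaluated above can be illustrated by averaging per-modality class probabilities and taking the arg max. This is a minimal sketch with invented softmax outputs, a simple averaging variant rather than the paper's exact concatenation-based scheme.

```python
import numpy as np

def decision_level_fusion(prob_list, weights=None):
    """Combine per-modality class-probability vectors by (weighted)
    averaging and pick the arg-max class, one simple form of
    decision-level fusion."""
    probs = np.stack(prob_list)                  # (modalities, classes)
    if weights is None:
        weights = np.ones(len(prob_list)) / len(prob_list)
    fused = weights @ probs
    return fused, int(np.argmax(fused))

# Hypothetical softmax outputs from three unimodal classifiers
speech = np.array([0.2, 0.5, 0.3])
mocap  = np.array([0.1, 0.3, 0.6])
text   = np.array([0.2, 0.6, 0.2])
fused, label = decision_level_fusion([speech, mocap, text])
print(label)   # class 1 wins after averaging
```

Weighting the modalities by their validation accuracy is a common refinement of this scheme.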
Affiliation(s)
- C. Akalya devi
- Department of Information Technology, PSG College of Technology, Coimbatore, India
- D. Karthika Renuka
- Department of Information Technology, PSG College of Technology, Coimbatore, India
- Shweta Yadav
- Department of Computer Science and Engineering, Wright State University, Dayton, OH, USA
42
Sun C, Zhang Y, Huang G, Liu L, Hao X. A soft sensor model based on long&short-term memory dual pathways convolutional gated recurrent unit network for predicting cement specific surface area. ISA Transactions 2022;130:293-305. [PMID: 35367055] [DOI: 10.1016/j.isatra.2022.03.013]
Abstract
The specific surface area of cement is an important index of cement product quality, but the time-varying delays, non-linearity, and data redundancy in process-industry data make it difficult to establish an accurate online monitoring model. To solve these problems, a soft sensor model based on a long&short-term memory dual-pathway convolutional gated recurrent unit network (L/S-ConvGRU) is proposed for predicting the cement specific surface area. First, because the linear coupling constraint inside the gated recurrent unit network (GRU) hinders the flow of information, parameters L and S are introduced into the convolutional gated recurrent unit network (ConvGRU). L and S are decimals in the range (0, 1) that change the internal linear constraint relationship and enhance the feature extraction capability of the model. Then, two spatio-temporal feature extraction pathways are designed, a long-term memory enhancement pathway and a short-term dependence pathway, which capture long-term and short-term time-varying delay information from the sample data. Finally, the two pathways are applied to the L/S-ConvGRU model and the extracted spatio-temporal features are fused to achieve accurate prediction of the specific surface area of cement. The model was trained on raw data from a cement plant, and the experimental results show that L/S-ConvGRU has higher precision and better generalization capability.
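The abstract does not give the exact L/S gating equations, but one plausible reading is that the scalars relax the standard GRU convex combination h = (1-z)*h_prev + z*h_tilde. The sketch below implements that reading on a plain (non-convolutional) GRU cell with random weights; the cell structure, L/S placement, and all parameter values are assumptions, not the paper's formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ls_gru_step(x, h_prev, params, L=0.9, S=0.8):
    """One GRU step whose update is relaxed by scalars L, S in (0, 1),
    loosening the usual convex-combination constraint
    h = (1-z)*h_prev + z*h_tilde (an illustrative guess at the L/S idea)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate state
    return L * (1 - z) * h_prev + S * z * h_tilde

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
# Alternate input-to-hidden and hidden-to-hidden weight matrices.
params = [rng.standard_normal((n_hid, n_in)) if i % 2 == 0
          else rng.standard_normal((n_hid, n_hid)) for i in range(6)]
h = np.zeros(n_hid)
for t in range(5):                                  # run a short sequence
    h = ls_gru_step(rng.standard_normal(n_in), h, params)
```

With L, S < 1 the state magnitude stays bounded below max(L, S), since each term is a contraction of a quantity with absolute value below 1.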
Affiliation(s)
- Chao Sun
- School of Electrical Engineering, Yanshan University, 438 Hebei Avenue, Qinhuangdao 066004, China
- Yuxuan Zhang
- School of Electrical Engineering, Yanshan University, 438 Hebei Avenue, Qinhuangdao 066004, China
- Gaolu Huang
- School of Electrical Engineering, Yanshan University, 438 Hebei Avenue, Qinhuangdao 066004, China
- Lin Liu
- School of Electrical Engineering, Yanshan University, 438 Hebei Avenue, Qinhuangdao 066004, China
- Xiaochen Hao
- School of Electrical Engineering, Yanshan University, 438 Hebei Avenue, Qinhuangdao 066004, China
43
Deep learning for Covid-19 forecasting: State-of-the-art review. Neurocomputing 2022;511:142-154. [PMID: 36097509] [PMCID: PMC9454152] [DOI: 10.1016/j.neucom.2022.09.005]
Abstract
The Covid-19 pandemic has galvanized scientists to apply machine learning methods to help combat the crisis. Despite the significant amount of research, there exists no comprehensive survey devoted specifically to examining deep learning methods for Covid-19 forecasting. In this paper, we fill this gap in the literature by reviewing and analyzing the current studies that use deep learning for Covid-19 forecasting. Our review considered all published papers and preprints discoverable through Google Scholar, for the period from Apr 1, 2020 to Feb 20, 2022, that describe deep learning approaches to forecasting Covid-19. The search identified 152 studies, of which 53 passed the initial quality screening and were included in our survey. We propose a model-based taxonomy to categorize the literature, describe each model, and highlight its performance. Finally, the deficiencies of the existing approaches are identified and the necessary improvements for future research are elucidated. The study provides a gateway for researchers interested in forecasting Covid-19 using deep learning.
44
Xefteris VR, Tsanousa A, Georgakopoulou N, Diplaris S, Vrochidis S, Kompatsiaris I. Graph Theoretical Analysis of EEG Functional Connectivity Patterns and Fusion with Physiological Signals for Emotion Recognition. Sensors (Basel, Switzerland) 2022;22:8198. [PMID: 36365896] [PMCID: PMC9656224] [DOI: 10.3390/s22218198]
Abstract
Emotion recognition is a key attribute for realizing advances in human-computer interaction, especially when using non-intrusive physiological sensors, such as electroencephalograph (EEG) and electrocardiograph. Although functional connectivity of EEG has been utilized for emotion recognition, the graph theory analysis of EEG connectivity patterns has not been adequately explored. The exploitation of brain network characteristics could provide valuable information regarding emotions, while the combination of EEG and peripheral physiological signals can reveal correlation patterns of human internal state. In this work, a graph theoretical analysis of EEG functional connectivity patterns along with fusion between EEG and peripheral physiological signals for emotion recognition has been proposed. After extracting functional connectivity from EEG signals, both global and local graph theory features are extracted. Those features are concatenated with statistical features from peripheral physiological signals and fed to different classifiers and a Convolutional Neural Network (CNN) for emotion recognition. The average accuracy on the DEAP dataset using CNN was 55.62% and 57.38% for subject-independent valence and arousal classification, respectively, and 83.94% and 83.87% for subject-dependent classification. Those scores went up to 75.44% and 78.77% for subject-independent classification and 88.27% and 90.84% for subject-dependent classification using a feature selection algorithm, exceeding the current state-of-the-art results.
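The local graph-theory feature extraction step described above can be sketched as follows: binarize a functional-connectivity matrix with a threshold and compute per-node features. Degree and clustering coefficient are two local features commonly used in such analyses; the threshold, the feature subset, and the toy connectivity matrix are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def graph_features(conn, threshold=0.5):
    """Binarize a functional-connectivity matrix into an undirected
    adjacency matrix and extract simple local graph-theory features:
    node degree and clustering coefficient."""
    A = (np.abs(conn) >= threshold).astype(int)
    np.fill_diagonal(A, 0)                       # no self-loops
    degree = A.sum(axis=1)
    # Triangles through node i = (A^3)_ii / 2 for an undirected graph.
    triangles = np.diag(A @ A @ A) / 2.0
    possible = degree * (degree - 1) / 2.0
    clustering = np.divide(triangles, possible,
                           out=np.zeros_like(triangles),
                           where=possible > 0)
    return degree, clustering

# Toy 4-channel connectivity: channels 0, 1, 2 form a triangle,
# channel 3 is weakly connected to everything.
conn = np.array([[1.0, 0.8, 0.7, 0.1],
                 [0.8, 1.0, 0.9, 0.2],
                 [0.7, 0.9, 1.0, 0.1],
                 [0.1, 0.2, 0.1, 1.0]])
deg, clust = graph_features(conn)
```

The resulting per-node features would then be concatenated with statistics from the peripheral signals before classification, as the abstract describes.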
45
CNN-LSTM Facial Expression Recognition Method Fused with Two-Layer Attention Mechanism. Computational Intelligence and Neuroscience 2022;2022:7450637. [DOI: 10.1155/2022/7450637]
Abstract
When exploring facial expression recognition methods, we found that existing algorithms make insufficient use of information from the key facial regions that express emotion. To address this problem, building on a convolutional neural network and long short-term memory (CNN-LSTM), we propose a facial expression recognition method that incorporates an attention mechanism (CNN-ALSTM). Compared with the general CNN-LSTM algorithm, it can mine the information of important regions more effectively. Furthermore, a CNN-LSTM facial expression recognition method incorporating a two-layer attention mechanism (ACNN-ALSTM) is proposed. We conducted comparative experiments on the Fer2013 and processed CK+ datasets with CNN-ALSTM, ACNN-ALSTM, patch-based ACNN (pACNN), facial expression recognition with attention net (FERAtt), and other networks. The results show that the proposed ACNN-ALSTM hybrid neural network model is superior to related work in expression recognition.
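The attention-over-LSTM idea underlying such models can be sketched generically: score each timestep's LSTM output, normalize the scores with a softmax, and take the weighted sum as the context vector. This is a minimal single-layer temporal attention; the paper's two-layer variant and its learned parameters are not reproduced, and the scoring function here is an assumed tanh projection.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention(H, w):
    """Soft attention over per-timestep LSTM outputs H of shape (T, D):
    score each timestep, softmax the scores into weights, and return
    the attention-weighted context vector."""
    scores = np.tanh(H @ w)        # (T,) one scalar score per timestep
    alpha = softmax(scores)        # attention weights, sum to 1
    context = alpha @ H            # (D,) weighted sum over timesteps
    return context, alpha

rng = np.random.default_rng(1)
H = rng.standard_normal((6, 8))    # 6 timesteps, 8 hidden features
w = rng.standard_normal(8)         # assumed learnable scoring vector
context, alpha = temporal_attention(H, w)
```

Because the weights sum to 1, the context vector stays in the convex hull of the timestep outputs while emphasizing the informative frames.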
46
Zhang X, Li H, Dong R, Lu Z, Li C. Electroencephalogram and surface electromyogram fusion-based precise detection of lower limb voluntary movement using convolution neural network-long short-term memory model. Front Neurosci 2022;16:954387. [PMID: 36213740] [PMCID: PMC9538146] [DOI: 10.3389/fnins.2022.954387]
Abstract
Fusion of the electroencephalogram (EEG) and surface electromyogram (sEMG) has been widely used to detect human movement intention for human–robot interaction, but the internal relationship between EEG and sEMG signals is not clear, so their fusion still has some shortcomings. In this study, a precise fusion method of EEG and sEMG using a CNN-LSTM model was investigated to detect lower limb voluntary movement. First, the signal processing of EEG and sEMG at each stage was analyzed so that the response time difference between EEG and sEMG could be estimated, and this difference was calculated by symbolic transfer entropy: the estimated value was about 24–26 ms and the calculated value was between 25 and 45 ms. Second, both data fusion and feature fusion of EEG and sEMG were used to obtain a data matrix for the model, and a hybrid CNN-LSTM model was established as the EEG- and sEMG-based decoding model of lower limb voluntary movement. Finally, the offline experimental results showed that the accuracy of data fusion was significantly higher than the feature fusion-based accuracy in 5-fold cross-validation; the average accuracy of EEG and sEMG data fusion was more than 95%, and eliminating the response time difference between EEG and sEMG improved the average data fusion accuracy by about 0.7 ± 0.26%. Meanwhile, the online average accuracy of data fusion-based CNN-LSTM was more than 87% in all subjects. These results demonstrate that the time difference influences EEG and sEMG fusion for detecting lower limb voluntary movement, and that the proposed CNN-LSTM model can achieve high performance. This work provides a stable and reliable basis for human–robot interaction of the lower limb exoskeleton.
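The alignment step implied by the abstract, shifting one modality by the estimated response delay before stacking channels into a data matrix, can be sketched as follows. The sampling rate, channel counts, and constant-valued toy signals are assumptions for illustration; the paper estimates the delay (roughly 25-45 ms) via symbolic transfer entropy rather than fixing it.

```python
import numpy as np

def build_fused_matrix(eeg, semg, delay_samples):
    """Shift sEMG by the estimated EEG-to-sEMG response delay and
    stack the aligned channels into one data matrix (the kind of
    input a CNN-LSTM decoder could consume)."""
    if delay_samples > 0:
        eeg = eeg[:, :-delay_samples]   # drop trailing EEG samples
        semg = semg[:, delay_samples:]  # drop leading sEMG samples
    return np.vstack([eeg, semg])       # (eeg_ch + semg_ch, T')

fs = 1000                               # assumed 1 kHz sampling rate
delay = int(0.025 * fs)                 # 25 ms -> 25 samples
eeg = np.zeros((4, 200))                # 4 toy EEG channels
semg = np.ones((2, 200))                # 2 toy sEMG channels
X = build_fused_matrix(eeg, semg, delay)
```

Trimming both ends keeps the two modalities the same length so each column of the matrix corresponds to the same underlying movement instant.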
Affiliation(s)
- Xiaodong Zhang
- School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Shaanxi Key Laboratory of Intelligent Robots, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Wearable Human Enhancement Technology Innovation Center, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Hanzhe Li
- School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Wearable Human Enhancement Technology Innovation Center, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- *Correspondence: Hanzhe Li,
- Runlin Dong
- School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Zhufeng Lu
- School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Cunxin Li
- School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
47
A hybrid data-driven online solar energy disaggregation system from the grid supply point. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00842-2]
Abstract
The integration of small-scale Photovoltaics (PV) systems (such as rooftop PVs) decreases the visibility of power systems, since the real demand load is masked. Most rooftop systems are behind the metre and cannot be measured by household smart meters. To overcome the challenges mentioned above, this paper proposes an online solar energy disaggregation system to decouple the solar energy generated by rooftop PV systems and the ground truth demand load from net measurements. A 1D Convolutional Neural Network (CNN) Bidirectional Long Short-Term Memory (BiLSTM) deep learning method is used as the core algorithm of the proposed system. The system takes a wide range of online information (Advanced Metering Infrastructure (AMI) data, meteorological data, satellite-driven irradiance, and temporal information) as inputs to evaluate PV generation, and the system also enables online and offline modes. The effectiveness of the proposed algorithm is evaluated by comparing it to baselines. The results show that the proposed method achieves good performance under different penetration rates and different feeder levels. Finally, a transfer learning process is introduced to verify that the proposed system has good robustness and can be applied to other feeders.
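The disaggregation relationship at the heart of the system is simple to state: the meter sees net load = demand minus behind-the-meter PV generation, so an estimate of PV generation recovers the masked demand. The sketch below stubs out the model with a perfect PV estimate on toy data; the actual system would obtain that estimate from the 1D CNN-BiLSTM fed with AMI, weather, and irradiance inputs, and the toy demand/PV profiles are assumptions.

```python
import numpy as np

def recover_demand(net_load, pv_estimate):
    """Recover the underlying demand from the masked net measurement:
    net = demand - PV generation, hence demand = net + PV estimate."""
    return net_load + pv_estimate

# Toy day: true demand is a flat 3 kW; PV follows a half-sine that
# peaks at 2 kW around midday and is zero at night.
hours = np.arange(24)
pv_true = np.clip(2.0 * np.sin((hours - 6) * np.pi / 12), 0, None)
demand_true = np.full(24, 3.0)
net = demand_true - pv_true            # what the grid supply point sees
pv_hat = pv_true                       # stand-in for a perfect PV model
demand_hat = recover_demand(net, pv_hat)
```

In practice the quality of the recovered demand is bounded by the PV model's error, which is why the paper evaluates across penetration rates and feeder levels.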
48
Saurav S, Saini R, Singh S. Fast facial expression recognition using Boosted Histogram of Oriented Gradient (BHOG) features. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01112-0]
49
Bagadi KR, Sivappagari CMR. A robust feature selection method based on meta-heuristic optimization for speech emotion recognition. Evolutionary Intelligence 2022. [DOI: 10.1007/s12065-022-00772-5]
50
Xu X, Li D, Zhou Y, Wang Z. Multi-type features separating fusion learning for Speech Emotion Recognition. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109648]