1. Ding X, Yang F, Ma F, Chen S. A Unified Multi-Class Feature Selection Framework for Microarray Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3725-3736. PMID: 37698974. DOI: 10.1109/tcbb.2023.3314432
Abstract
In feature selection research, simultaneous multi-class feature selection technologies are popular because they simultaneously select informative features for all classes. Recursive feature elimination (RFE) methods are state-of-the-art binary feature selection algorithms. However, extending existing RFE algorithms to multi-class tasks may increase the computational cost and lead to performance degradation. With this motivation, we introduce a unified multi-class feature selection (UFS) framework for randomization-based neural networks to address these challenges. First, we propose a new multi-class feature ranking criterion using the output weights of neural networks. The heuristic underlying this criterion is that "the importance of a feature should be related to the magnitude of the output weights of a neural network". Subsequently, the UFS framework utilizes the original features to construct a training model based on a randomization-based neural network, ranks these features by the criterion of the norm of the output weights, and recursively removes a feature with the lowest ranking score. Extensive experiments on 15 real-world datasets suggest that our proposed framework outperforms state-of-the-art algorithms. The code of UFS is available at https://github.com/SVMrelated/UFS.git.
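The elimination loop the abstract describes can be sketched as follows. This is a hedged illustration in plain Python, not the authors' implementation: the `train_fn` stand-in and the toy weights are assumptions, since the abstract does not detail the randomization-based network's training.

```python
# Sketch of the recursive elimination loop: a feature's score is the norm of
# the output weights tied to it, and the lowest-scoring feature is removed
# each round. The "train" step is a stand-in for the paper's
# randomization-based neural network.
import math

def feature_scores(output_weights):
    """Score each feature by the L2 norm of its row of output weights."""
    return [math.sqrt(sum(w * w for w in row)) for row in output_weights]

def recursive_elimination(features, train_fn, keep=1):
    """Repeatedly train, score, and drop the weakest feature."""
    features = list(features)
    while len(features) > keep:
        W = train_fn(features)      # rows of W correspond to features
        scores = feature_scores(W)
        del features[scores.index(min(scores))]
    return features

# Toy example: pretend training returns fixed per-feature weights, so the
# feature with the smallest weight norm ("f2") is eliminated first.
toy_weights = {"f1": [0.9, 0.4], "f2": [0.05, 0.01], "f3": [0.7, 0.6]}
survivors = recursive_elimination(
    ["f1", "f2", "f3"], lambda fs: [toy_weights[f] for f in fs], keep=2
)
```

In a real pipeline the model would be retrained after every removal, which is exactly the cost that motivates the paper's single unified framework for the multi-class case.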
2. Zang Z, Xu Y, Lu L, Geng Y, Yang S, Li SZ. UDRN: Unified Dimensional Reduction Neural Network for feature selection and feature projection. Neural Netw 2023; 161:626-637. PMID: 36827960. DOI: 10.1016/j.neunet.2023.02.018
Abstract
Dimensional reduction (DR) maps high-dimensional data into a lower-dimensional latent space by minimizing a defined optimization objective. Its two independent branches are feature selection (FS) and feature projection (FP). FS selects a critical subset of dimensions but risks destroying the data distribution (structure); FP combines all input features into a lower-dimensional space, aiming to maintain the data structure, but lacks interpretability and sparsity. Moreover, FS and FP have traditionally been treated as incompatible categories and have not been unified in a common framework. We therefore argue that the ideal DR approach combines FS and FP in a unified end-to-end manifold learning framework, simultaneously performing fundamental feature discovery while maintaining the intrinsic relationships between data samples in the latent space. This paper proposes such a framework, the Unified Dimensional Reduction Network (UDRN), which integrates FS and FP in an end-to-end way. A novel network design implements the FS and FP tasks separately with a stacked feature selection network and a feature projection network, and a stronger manifold assumption and a novel loss function are introduced; the loss can further leverage data-augmentation priors to enhance the generalization ability of UDRN. Finally, comprehensive experimental results on four image and four biological datasets, including very high-dimensional data, demonstrate the advantages of UDRN over existing methods (FS, FP, and FS&FP pipelines), especially in downstream tasks such as classification and visualization.
Affiliation(s)
- Zelin Zang: Zhejiang University, Hangzhou, 310000, China; Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China; Westlake Institute for Advanced Study, Institute of Advanced Technology, Hangzhou, 310000, China
- Yongjie Xu: Zhejiang University, Hangzhou, 310000, China; Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China; Westlake Institute for Advanced Study, Institute of Advanced Technology, Hangzhou, 310000, China
- Linyan Lu: China Telecom Corporation Limited, Hangzhou Branch, Hangzhou, 310000, China
- Yulan Geng: Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China
- Senqiao Yang: Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China
- Stan Z Li: Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China; Westlake Institute for Advanced Study, Institute of Advanced Technology, Hangzhou, 310000, China
3. Vargas VM, Gutierrez PA, Barbero-Gomez J, Hervas-Martinez C. Activation Functions for Convolutional Neural Networks: Proposals and Experimental Study. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1478-1488. PMID: 34428161. DOI: 10.1109/tnnls.2021.3105444
Abstract
Activation functions lie at the core of every neural network model, from shallow networks to deep convolutional ones. Their properties and characteristics shape the output range of each layer and, thus, its capabilities. Modern approaches rely mostly on a single function choice for the whole network, usually ReLU or a similar alternative. In this work, we propose two new activation functions, analyze their properties, and compare them with 17 function proposals from the recent literature on six distinct problems with different characteristics, with the objective of shedding some light on their comparative performance. The results show that the proposed functions achieved better performance than the most commonly used ones.
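The abstract does not name the two proposed functions, so as a hedged illustration here are three of the standard baselines such comparisons typically include; the key behavioural difference shows up on negative inputs.

```python
# Baseline activation functions commonly compared in studies like the one
# above: ReLU zeroes negative inputs, leaky ReLU keeps a small linear
# response, and ELU saturates smoothly toward -alpha.
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# Evaluate each function on a negative and a positive input.
outputs = [(f(-2.0), f(3.0)) for f in (relu, leaky_relu, elu)]
```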
4. de Lope J, Graña M. An ongoing review of speech emotion recognition. Neurocomputing 2023. DOI: 10.1016/j.neucom.2023.01.002
5. Duville MM, Alonso-Valerdi LM, Ibarra-Zarate DI. Neuronal and behavioral affective perceptions of human and naturalness-reduced emotional prosodies. Front Comput Neurosci 2022; 16:1022787. PMID: 36465969. PMCID: PMC9716567. DOI: 10.3389/fncom.2022.1022787
Abstract
Artificial voices are nowadays embedded in our daily lives, with the latest neural voices approaching human voice consistency (naturalness). Nevertheless, the behavioral and neuronal correlates of the perception of less naturalistic emotional prosodies remain poorly understood. In this study, we explored the acoustic tendencies that distinguish natural human voices from synthesized ones. We then created naturalness-reduced emotional utterances by acoustic editing of human voices. Finally, we used event-related potentials (ERPs) to assess the time dynamics of emotional integration when listening to both human and synthesized voices in a healthy adult sample. Listeners additionally rated their perceptions of valence, arousal, discrete emotions, naturalness, and intelligibility. Synthesized voices were characterized by less lexical stress (i.e., a reduced difference between stressed and unstressed syllables within words) in duration and median pitch modulations; in addition, spectral content was attenuated toward lower F2 and F3 frequencies and lower intensities for harmonics 1 and 4. Both psychometric and neuronal correlates were sensitive to the naturalness reduction: (1) naturalness and intelligibility ratings dropped when emotional utterances were synthesized, (2) discrete-emotion recognition was impaired as naturalness declined, consistent with the P200 and Late Positive Potentials (LPP) being less sensitive to emotional differentiation at lower naturalness, and (3) relative P200 and LPP amplitudes between prosodies were modulated by synthesization.
Nevertheless, (4) valence and arousal perceptions were preserved at lower naturalness, (5) valence (arousal) ratings correlated negatively (positively) with Higuchi's fractal dimension extracted from the neuronal data under all naturalness perturbations, and (6) inter-trial phase coherence (ITPC) and standard-deviation measurements revealed high inter-individual heterogeneity in emotion perception that is preserved even as naturalness is reduced. Notably, partial between-participant synchrony (low ITPC), together with high amplitude dispersion in ERPs at both early and late stages, emphasized the diversity of emotional responses among subjects. This study highlights for the first time both the behavioral and the neuronal basis of emotional perception under acoustic naturalness alterations: the partial dependence of emotion understanding on ecological relevance shows that synthesization modulates, but does not abolish, emotional integration.
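Higuchi's fractal dimension, the complexity measure correlated with valence and arousal above, can be computed from a single signal with the textbook algorithm below. This is a minimal pure-Python sketch, not the authors' exact EEG preprocessing pipeline; the `kmax` value is an assumption.

```python
# Higuchi's fractal dimension: build sub-series at step sizes k, measure
# their normalised curve lengths L(k), and fit the slope of log L(k)
# against log(1/k). Smooth signals give values near 1, noise near 2.
import math

def higuchi_fd(x, kmax=8):
    n = len(x)
    log_k, log_l = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            steps = (n - 1 - m) // k        # usable steps of size k from offset m
            if steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, steps + 1))
            # Higuchi's normalisation for the shortened sub-series.
            lengths.append(dist * (n - 1) / (steps * k) / k)
        log_k.append(math.log(1.0 / k))
        log_l.append(math.log(sum(lengths) / len(lengths)))
    # Least-squares slope of log L(k) vs log(1/k) is the dimension.
    mk = sum(log_k) / len(log_k)
    ml = sum(log_l) / len(log_l)
    num = sum((a - mk) * (b - ml) for a, b in zip(log_k, log_l))
    return num / sum((a - mk) ** 2 for a in log_k)

# A straight line is perfectly smooth, so its dimension is 1.
fd_line = higuchi_fd([float(i) for i in range(200)])
```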
6. Xu X, Li D, Zhou Y, Wang Z. Multi-type features separating fusion learning for Speech Emotion Recognition. Appl Soft Comput 2022. DOI: 10.1016/j.asoc.2022.109648
8. Binary Aquila Optimizer for Selecting Effective Features from Medical Data: A COVID-19 Case Study. Mathematics 2022. DOI: 10.3390/math10111929
Abstract
Medical technological advancements have led to the creation of large datasets with numerous attributes. Redundant and irrelevant features in such datasets negatively influence learning algorithms and degrade their performance. Using effective features in data mining and analysis tasks such as classification can increase the accuracy of the results and of the decisions made from them, which becomes all the more important for challenging, large-scale problems in medical applications. Nature-inspired metaheuristics show superior performance in finding optimal feature subsets in the literature. As a first attempt of this kind, this work presents a wrapper feature selection approach based on the newly proposed Aquila optimizer (AO), using AO as the search algorithm to discover the most effective feature subset. Two binary algorithms are suggested for feature selection in medical datasets: the S-shaped binary Aquila optimizer (SBAO) and the V-shaped binary Aquila optimizer (VBAO). Binary position vectors are generated using S- and V-shaped transfer functions while the search space itself stays continuous. The suggested algorithms are compared with six recent binary optimization algorithms on seven benchmark medical datasets; the results demonstrate that both proposed BAO variants can improve classification accuracy on these datasets. The proposed algorithm is also tested on a real COVID-19 dataset, where SBAO outperforms the comparative algorithms, selecting the fewest features with the highest accuracy.
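The S- and V-shaped transfer functions mentioned above can be sketched as follows: the optimizer keeps searching a continuous space, and the transfer function maps each continuous coordinate to a probability that the corresponding feature bit is 1. This is a hedged sketch of the binarization step only; the Aquila optimizer's own update rules are omitted.

```python
# S- and V-shaped transfer functions for binarizing a continuous position
# vector into a 0/1 feature-selection mask.
import math
import random

def s_shaped(x):
    """Classic sigmoid transfer: probability that the bit is 1."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x):
    """|tanh|-style transfer: symmetric around 0."""
    return abs(math.tanh(x))

def binarize(position, transfer, rng):
    """Turn a continuous position vector into a 0/1 feature mask."""
    return [1 if rng.random() < transfer(x) else 0 for x in position]

rng = random.Random(0)
# Large positive coordinates make the corresponding feature likely selected.
mask = binarize([2.5, -2.5, 0.0], s_shaped, rng)
```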
9. Manohar K, Logashanmugam E. Hybrid deep learning with optimal feature selection for speech emotion recognition using improved meta-heuristic algorithm. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.108659
10. Emotion recognition of speech signal using Taylor series and deep belief network based classification. Evolutionary Intelligence 2022. DOI: 10.1007/s12065-019-00333-3
11. The Influence of E-book Teaching on the Motivation and Effectiveness of Learning Law by Using Data Mining Analysis. J Organ End User Com 2022. DOI: 10.4018/joeuc.295092
Abstract
This paper studies the motivation for learning law, compares the teaching effectiveness of two different methods, e-book teaching and traditional teaching, and analyses the influence of e-book teaching on learning effectiveness in law using big data analysis. From the perspective of law-student psychology, e-book teaching can attract students' attention, stimulate their interest in learning, deepen knowledge impressions while learning, expand knowledge, and ultimately improve performance in practical assessment. Given the small sample size, the representativeness of the results may be limited. Nevertheless, the findings have referential significance for stimulating learning motivation in law and other theoretical disciplines and provide ideas for reforming the teaching mode at colleges and universities. The paper uses a decision tree algorithm from data mining for the analysis and identifies, from the students' perspective, the factors influencing law students' learning motivation and effectiveness.
12. A Novel Reformed Reduced Kernel Extreme Learning Machine with RELIEF-F for Classification. Computational Intelligence and Neuroscience 2022; 2022:4795535. PMID: 35371239. PMCID: PMC8970950. DOI: 10.1155/2022/4795535
Abstract
With the exponential growth of the Internet population, scientists and researchers face large-scale data processing tasks for which traditional algorithms, with their heavy computation, are unsuitable, even though classification and regression on such data remain essential. The Reduced Kernel Extreme Learning Machine (Reduced-KELM), a variant of the kernel extreme learning machine, is widely used for classification and attracts attention from researchers due to its superior performance. However, it still has limitations, such as unstable predictions caused by its random sample selection, and redundant training samples and features caused by large-scale input data. This study proposes a novel model called the Reformed Reduced Kernel Extreme Learning Machine with RELIEF-F (R-RKELM) for human activity recognition. RELIEF-F is applied to discard attributes whose weights are negative. A new sample selection approach, which further reduces the training samples and replaces the random selection step of Reduced-KELM, addresses the unstable classification and computational complexity of the conventional Reduced-KELM. According to the experimental results and statistical analysis, the proposed model obtains the best classification performance on human activity datasets among the compared baselines, with accuracies of 92.87% for HAPT, 92.81% for HARUS, and 86.92% for Smartphone.
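The RELIEF-style weighting step can be sketched as follows: features whose values separate nearest neighbours of different classes gain weight, features that vary within a class lose it, and negative-weight features are discarded as in the abstract. This is a hedged, simplified two-class Relief, not the full multi-class RELIEF-F the paper uses; the toy data are assumptions.

```python
# Simplified Relief weighting: for each sample, compare it with its nearest
# hit (same class) and nearest miss (other class), rewarding features that
# differ across classes and penalising those that differ within a class.
def relief_weights(X, y):
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hit = min((j for j in range(n) if j != i and y[j] == y[i]),
                  key=lambda j: dist(X[i], X[j]))
        miss = min((j for j in range(n) if y[j] != y[i]),
                   key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            w[f] += (abs(X[i][f] - X[miss][f])
                     - abs(X[i][f] - X[hit][f])) / n
    return w

# Feature 0 separates the classes; feature 1 is noise, so it ends up with a
# negative weight and is discarded.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.8], [0.9, 0.2]]
y = [0, 0, 1, 1]
w = relief_weights(X, y)
kept = [f for f, wf in enumerate(w) if wf > 0]
```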
13. Human-Computer Interaction with Detection of Speaker Emotions Using Convolution Neural Networks. Computational Intelligence and Neuroscience 2022; 2022:7463091. PMID: 35401731. PMCID: PMC8989588. DOI: 10.1155/2022/7463091
Abstract
Emotions play an essential role in human relationships, and many real-time applications rely on interpreting the speaker's emotion from their words. Speech emotion recognition (SER) modules aid human-computer interface (HCI) applications, but they are challenging to implement because of the lack of balanced training data and of clarity about which features suffice for categorization. This research discusses the impact of the classification approach, the choice of feature combination, and data augmentation on speech emotion detection accuracy. Selecting the right combination of handcrafted features for the classifier plays an integral part in reducing computational complexity. The suggested classification model, a 1D convolutional neural network (1D CNN), outperforms traditional machine learning approaches. Unlike most earlier studies, which examined emotions primarily through a single-language lens, our analysis covers multiple language datasets. With the most discriminating features and data augmentation, our technique achieves 97.09%, 96.44%, and 83.33% accuracy on the BAVED, ANAD, and SAVEE datasets, respectively.
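The operation at the heart of such a 1D CNN can be sketched as a kernel sliding along a feature sequence (e.g. frames of handcrafted features), producing ReLU-activated dot products. This is a hedged single-layer illustration with toy sizes, not the paper's architecture.

```python
# Valid-mode 1D convolution followed by ReLU, the core building block of a
# 1D CNN classifier for sequential speech features.
def conv1d(signal, kernel, bias=0.0):
    """Slide the kernel over the signal and ReLU-activate each dot product."""
    k = len(kernel)
    out = []
    for start in range(len(signal) - k + 1):
        acc = bias + sum(signal[start + j] * kernel[j] for j in range(k))
        out.append(max(0.0, acc))   # ReLU
    return out

# A [-1, 1] kernel responds to upward steps and zeroes everything else.
feature_map = conv1d([0.0, 0.0, 1.0, 1.0, 0.0], [-1.0, 1.0])
```

A full model stacks such layers with pooling before a dense classifier head; the single kernel here just shows how a learned filter detects a local pattern.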
14. Jemioło P, Storman D, Mamica M, Szymkowski M, Żabicka W, Wojtaszek-Główka M, Ligęza A. Datasets for Automated Affect and Emotion Recognition from Cardiovascular Signals Using Artificial Intelligence: A Systematic Review. Sensors (Basel, Switzerland) 2022; 22:2538. PMID: 35408149. PMCID: PMC9002643. DOI: 10.3390/s22072538
Abstract
Our review aimed to assess the current state and quality of publicly available datasets used for automated affect and emotion recognition (AAER) with artificial intelligence (AI), with an emphasis on cardiovascular (CV) signals. The quality of such datasets is essential for building the replicable systems future work will grow from. Using a developed search strategy, we investigated nine sources up to 31 August 2020, including studies that consider the use of AI in AAER based on CV signals. Two independent reviewers performed the screening of identified records, full-text assessment, data extraction, and credibility assessment; all discrepancies were resolved by discussion. We descriptively synthesised the results and assessed their credibility. The protocol was registered on the Open Science Framework (OSF) platform. Of the 4649 identified records, 195 were considered in detail and eighteen, focusing on datasets containing CV signals for AAER, were selected. The included papers analysed and shared data from 812 participants aged 17 to 47. Electrocardiography was the most explored signal (83.33% of datasets), and video stimulation was used most frequently (52.38% of experiments). Despite these results, much information went unreported, and the quality of the analysed papers was mainly low. Researchers in the field should concentrate more on methodology.
Affiliation(s)
- Paweł Jemioło: AGH University of Science and Technology, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, al. A. Mickiewicza 30, 30-059 Krakow, Poland
- Dawid Storman: Chair of Epidemiology and Preventive Medicine, Department of Hygiene and Dietetics, Jagiellonian University Medical College, ul. M. Kopernika 7, 31-034 Krakow, Poland
- Maria Mamica: AGH University of Science and Technology, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, al. A. Mickiewicza 30, 30-059 Krakow, Poland
- Mateusz Szymkowski: AGH University of Science and Technology, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, al. A. Mickiewicza 30, 30-059 Krakow, Poland
- Wioletta Żabicka: Students' Scientific Research Group of Systematic Reviews, Jagiellonian University Medical College, ul. M. Kopernika 7, 31-034 Krakow, Poland
- Magdalena Wojtaszek-Główka: Students' Scientific Research Group of Systematic Reviews, Jagiellonian University Medical College, ul. M. Kopernika 7, 31-034 Krakow, Poland
- Antoni Ligęza: AGH University of Science and Technology, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, al. A. Mickiewicza 30, 30-059 Krakow, Poland
15. Kaur K, Singh P. Impact of Feature Extraction and Feature Selection Algorithms on Punjabi Speech Emotion Recognition Using Convolutional Neural Network. ACM T Asian Low-Reso 2022. DOI: 10.1145/3511888
Abstract
Driven by the challenge of making human-machine interaction more spontaneous and productive, speech emotion recognition has been an overriding area of research. The reliability and success of such emotion recognition depend largely on the feature extraction and selection processes. The feature extraction phase plays an important role in exploring and distinguishing audio content, and the extracted features should be robust to a number of disturbances and reliable enough for an adequate classification system. This paper focuses on the three main components of a Speech Emotion Recognition (SER) process. The first is the optimal feature extraction method for a Punjabi SER system. The second is an appropriate feature selection method that selects effectual features from those extracted in the first step and removes redundant ones, improving emotion recognition. The third is the classification model used for emotion recognition. The scope of this paper is thus to explain the three main steps of the Punjabi SER system: feature extraction, feature selection, and emotion recognition with a classifier. Results are calculated and compared for a number of feature-set combinations, with and without the feature selection process. A total of 10 experiments are carried out, and performance metrics such as precision, recall, F1-score, and accuracy are used to present the results.
Affiliation(s)
- Kamaldeep Kaur: Research Scholar, IKG Punjab Technical University, Punjab, India, and Department of Computer Science & Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India
- Parminder Singh: Department of Computer Science & Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India
16. Xie X, Zhang X, Shen J, Du K. Poplar's Waterlogging Resistance Modeling and Evaluating: Exploring and Perfecting the Feasibility of Machine Learning Methods in Plant Science. Frontiers in Plant Science 2022; 13:821365. PMID: 35222479. PMCID: PMC8874143. DOI: 10.3389/fpls.2022.821365
Abstract
Floods, among the most common natural disasters, have caused huge losses of human life and property. Predicting the waterlogging resistance of poplar can help researchers select seedlings scientifically and guard against floods precisely. Using machine learning algorithms, models of poplar's waterlogging tolerance were established and evaluated. First, the evaluation indexes of poplar's waterlogging tolerance were analyzed and determined. Then, significance testing, correlation analysis, and three feature selection algorithms (hierarchical clustering, Lasso, and stepwise regression) were used to screen photosynthesis, chlorophyll fluorescence, and environmental parameters. On this basis, four machine learning methods, BP neural network regression (BPR), extreme learning machine regression (ELMR), support vector regression (SVR), and random forest regression (RFR), were used to predict the flood resistance of poplar. The results show that RFR and SVR achieve high precision: on the test set, the coefficients of determination (R2) are 0.8351 and 0.6864, the root mean square errors (RMSE) are 0.2016 and 0.2780, and the mean absolute errors (MAE) are 0.1782 and 0.2031, respectively. Therefore, random forest regression and support vector regression can be given priority for predicting poplar flood resistance.
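The three metrics reported above have standard definitions, sketched here for reference in plain Python; the toy vectors are illustrative only.

```python
# Standard regression metrics: coefficient of determination (R2),
# root mean square error (RMSE), and mean absolute error (MAE).
import math

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = (r2(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```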
Affiliation(s)
- Xuelin Xie: College of Sciences, Huazhong Agricultural University, Wuhan, China
- Jingfang Shen: College of Sciences, Huazhong Agricultural University, Wuhan, China
- Kebing Du: College of Horticulture and Forestry Sciences, Hubei Engineering Technology Research Center for Forestry Information, Huazhong Agricultural University, Wuhan, China
17. Mexican Emotional Speech Database Based on Semantic, Frequency, Familiarity, Concreteness, and Cultural Shaping of Affective Prosody. Data 2021. DOI: 10.3390/data6120130
Abstract
This paper describes the Mexican Emotional Speech Database (MESD), which contains single-word emotional utterances for anger, disgust, fear, happiness, neutral, and sadness in adult (male and female) and child voices. To validate the emotional prosody of the uttered words, a cubic Support Vector Machine classifier was trained on prosodic, spectral, and voice-quality features for each case study: (1) male adult, (2) female adult, and (3) child. In addition, the cultural, semantic, and linguistic shaping of emotional expression was assessed by statistical analysis. This study was registered at BioMed Central and is part of the implementation of a published study protocol. Mean emotional classification accuracies reached 93.3%, 89.4%, and 83.3% for male, female, and child utterances, respectively. Statistical analysis emphasized the shaping of emotional prosody by semantic and linguistic features, and a cultural variation in emotional expression was highlighted by comparing the MESD with the INTERFACE database for Castilian Spanish. The MESD provides reliable content for linguistic emotional prosody shaped by the Mexican cultural environment. To facilitate further investigation, a corpus controlled for linguistic features and emotional semantics, as well as one containing words repeated across voices and emotions, is provided. The MESD is made freely available.
18. Singh P, Srivastava R, Rana K, Kumar V. A multimodal hierarchical approach to speech emotion recognition from audio and text. Knowl Based Syst 2021. DOI: 10.1016/j.knosys.2021.107316
19. Dong Y, Yang X. Affect-salient event sequence modelling for continuous speech emotion recognition. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.06.036
20. Liu ZT, Jiang CS, Li SH, Wu M, Cao WH, Hao M. Eye state detection based on Weight Binarization Convolution Neural Network and Transfer Learning. Appl Soft Comput 2021. DOI: 10.1016/j.asoc.2021.107565
21. García-Ordás MT, Alaiz-Moretón H, Benítez-Andrades JA, García-Rodríguez I, García-Olalla O, Benavides C. Sentiment analysis in non-fixed length audios using a Fully Convolutional Neural Network. Biomed Signal Process Control 2021. DOI: 10.1016/j.bspc.2021.102946
22. Li S, Xing X, Fan W, Cai B, Fordson P, Xu X. Spatiotemporal and frequential cascaded attention networks for speech emotion recognition. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.02.094
23. Fernandes B, Mannepalli K. Speech Emotion Recognition Using Deep Learning LSTM for Tamil Language. Pertanika Journal of Science and Technology 2021. DOI: 10.47836/pjst.29.3.33
Abstract
Deep neural networks (DNNs), neural networks with several hidden layers, give better results with classification algorithms in automated voice recognition tasks. Because traditional feedforward networks do not handle the temporal structure of speech signals properly, recurrent neural networks (RNNs) were adopted instead. Long Short-Term Memory (LSTM) networks are a special case of RNNs for speech processing that capture long-term dependencies, so deep hierarchical LSTM and BiLSTM models were designed with dropout layers to reduce gradient problems and long-term learning error in emotional speech analysis. Four combinations of deep hierarchical architectures were designed: Deep Hierarchical LSTM and LSTM (DHLL), Deep Hierarchical LSTM and BiLSTM (DHLB), Deep Hierarchical BiLSTM and LSTM (DHBL), and Deep Hierarchical dual BiLSTM (DHBB), each with dropout layers to improve the networks. This paper compares the performance of all four models, attaining good classification efficiency with a minimal Tamil-language dataset. The experimental results show that DHLB reaches the best precision, about 84%, in recognising emotions for the Tamil database, while DHBL gives 83% efficiency. The other designs show comparable but lower performance: DHLL and DHBB reach 81% efficiency with a smaller dataset and minimal execution and training time.
24. Dhal P, Azad C. A comprehensive survey on feature selection in the various fields of machine learning. Appl Intell 2021. DOI: 10.1007/s10489-021-02550-9
25. Speech emotion recognition based on formant characteristics feature extraction and phoneme type convergence. Inf Sci (N Y) 2021. DOI: 10.1016/j.ins.2021.02.016
|
26
|
Song P, Zheng W, Yu Y, Ou S. Speech Emotion Recognition Based on Robust Discriminative Sparse Regression. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2020.2990928] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
27
|
Abstract
Emotions are an integral part of human interactions and are significant factors in determining user satisfaction or customer opinion. Speech emotion recognition (SER) modules also play an important role in the development of human–computer interaction (HCI) applications. A tremendous number of SER systems have been developed over the last decades. Attention-based deep neural networks (DNNs) have been shown to be suitable tools for mining information that is unevenly distributed in time in multimedia content, and the attention mechanism has recently been incorporated into DNN architectures to also emphasise emotionally salient information. This paper provides a review of recent developments in SER and examines the impact of various attention mechanisms on SER performance. An overall comparison of system accuracies is performed on the widely used IEMOCAP benchmark database.
Collapse
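The attention mechanism surveyed above typically reduces to a weighted pooling of frame-level features, where emotionally salient frames receive higher weight. A minimal sketch with hypothetical shapes (T frames of d-dimensional acoustic features, and a learned scoring vector w standing in for a trained attention head):

```python
import numpy as np

def attention_pool(frames, w):
    """Collapse a (T, d) frame sequence into one d-vector, weighting
    salient frames higher via softmax over a learned scoring vector w."""
    scores = frames @ w                            # (T,) unnormalised saliency
    scores = scores - scores.max()                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax attention weights
    return alpha @ frames                          # weighted average, shape (d,)

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 8))   # 50 frames, 8 features each
w = rng.standard_normal(8)
utterance_vec = attention_pool(frames, w)
print(utterance_vec.shape)
```

Because the weights are a convex combination, the pooled vector always stays within the per-feature range of the input frames.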
|
28
|
Ouyang CS, Chen YJ, Tsai JT, Chang YJ, Huang TH, Hwang KS, Ho YC, Ho WH. Data mining analysis of the influences of electrocardiogram P-wave morphology parameters on atrial fibrillation. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-189612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Atrial fibrillation (AF) is a type of paroxysmal cardiac disease that presents no obvious symptoms during onset; even the electrocardiogram (ECG) results of patients with AF can appear normal in the premorbid state, rendering AF difficult to detect and diagnose. However, it can lead to deterioration and an increased risk of stroke if not detected and treated early. This study used the ECG database provided by the Physionet website (https://physionet.org), filtered the data, and employed parameter-extraction methods to identify parameters that signify ECG features. A total of 31 parameters were obtained, consisting of P-wave morphology parameters and heart rate variability parameters, and the data were further examined with a decision tree, whose topmost node indicated a significant causal relationship. The experimental results verified that the P-wave morphology parameters significantly affected the ECG results of patients with AF.
Collapse
Affiliation(s)
- Chen-Sen Ouyang
- Department of Information Engineering, I-Shou University, Kaohsiung, Taiwan
| | - Yenming J. Chen
- Department of Logistics Management, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Jinn-Tsong Tsai
- Department of Computer Science, National Pingtung University, Pingtung, Taiwan
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Yiu-Jen Chang
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Tian-Hsiang Huang
- Center for Big Data Research, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Kao-Shing Hwang
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan
| | - Yuan-Chih Ho
- Division of Cardiology, Department of Internal Medicine, Yuan’s General Hospital, Kaohsiung, Taiwan
| | - Wen-Hsien Ho
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| |
Collapse
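The decision-tree analysis above identifies its topmost node by choosing the parameter whose best single threshold most reduces class impurity. A hedged sketch of that criterion with a made-up toy cohort (parameter names and values are illustrative, not the study's data):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a binary label array."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    return 2 * p * (1 - p)

def best_split_gain(values, labels):
    """Best Gini-impurity reduction over all thresholds of one parameter."""
    base = gini(labels)
    best = 0.0
    for t in np.unique(values):
        left, right = labels[values <= t], labels[values > t]
        w = len(left) / len(labels)
        best = max(best, base - (w * gini(left) + (1 - w) * gini(right)))
    return best

# Toy cohort: P-wave duration separates AF from non-AF; heart rate does not.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
params = {
    "p_wave_duration": np.array([90, 95, 100, 105, 140, 145, 150, 155.0]),
    "heart_rate": np.array([60, 80, 70, 90, 65, 85, 75, 95.0]),
}
root = max(params, key=lambda k: best_split_gain(params[k], labels))
print(root)  # p_wave_duration
```

The parameter with the largest impurity reduction becomes the tree's root, which is why the topmost node signals the most influential feature.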
|
29
|
Mustaqeem, Kwon S. Att-Net: Enhanced emotion recognition system using lightweight self-attention module. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107101] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
|
30
|
A Smartphone-Based Cursor Position System in Cross-Device Interaction Using Machine Learning Techniques. SENSORS 2021; 21:s21051665. [PMID: 33670978 PMCID: PMC7957670 DOI: 10.3390/s21051665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/29/2020] [Revised: 02/20/2021] [Accepted: 02/24/2021] [Indexed: 12/02/2022]
Abstract
The use of mobile devices, especially smartphones, has become popular in recent years. There is an increasing need for cross-device interaction techniques that seamlessly integrate mobile devices and large display devices together. This paper develops a novel cross-device cursor position system that maps a mobile device’s movement on a flat surface to a cursor’s movement on a large display. The system allows a user to directly manipulate objects on a large display device through a mobile device and supports seamless cross-device data sharing without physical distance restrictions. To achieve this, we utilize sound localization to initialize the mobile device position as the starting location of a cursor on the large screen. Then, the mobile device’s movement is detected through an accelerometer and is accordingly translated to the cursor’s movement on the large display using machine learning models. In total, 63 features and 10 classifiers were employed to construct the machine learning models for movement detection. The evaluation results have demonstrated that three classifiers, in particular, gradient boosting, linear discriminant analysis (LDA), and naïve Bayes, are suitable for detecting the movement of a mobile device.
Collapse
|
31
|
Noh KJ, Jeong CY, Lim J, Chung S, Kim G, Lim JM, Jeong H. Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets. SENSORS 2021; 21:s21051579. [PMID: 33668254 PMCID: PMC7956608 DOI: 10.3390/s21051579] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/07/2021] [Revised: 02/15/2021] [Accepted: 02/21/2021] [Indexed: 01/01/2023]
Abstract
Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To deploy SER models in real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of SER models to an unseen target domain. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a transferred feature extractor from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously based on multiple losses according to the association of emotion labels in the discrete and dimensional models. For the evaluation of the MPGLN SER as applied to multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), including KESDy18 and KESDy19, is constructed, and the English-speaking Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluation of multi-domain adaptation and domain generalization showed 3.7% and 3.5% improvements, respectively, of the F1 score when comparing the performance of MPGLN SER with a baseline SER model that uses a temporal feature generator. We show that the MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
Collapse
|
32
|
Design and analysis of a decision intelligent system based on enzymatic numerical technology. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.07.033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
33
|
Algorithm for speech emotion recognition classification based on Mel-frequency Cepstral coefficients and broad learning system. EVOLUTIONARY INTELLIGENCE 2021. [DOI: 10.1007/s12065-020-00532-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
34
|
Spezialetti M, Placidi G, Rossi S. Emotion Recognition for Human-Robot Interaction: Recent Advances and Future Perspectives. Front Robot AI 2020; 7:532279. [PMID: 33501307 PMCID: PMC7806093 DOI: 10.3389/frobt.2020.532279] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2020] [Accepted: 09/18/2020] [Indexed: 12/11/2022] Open
Abstract
A fascinating challenge in the field of human-robot interaction is the possibility to endow robots with emotional intelligence in order to make the interaction more intuitive, genuine, and natural. To achieve this, a critical point is the capability of the robot to infer and interpret human emotions. Emotion recognition has been widely explored in the broader fields of human-machine interaction and affective computing. Here, we report recent advances in emotion recognition, with particular regard to the human-robot interaction context. Our aim is to review the state of the art of currently adopted emotional models, interaction modalities, and classification strategies and offer our point of view on future developments and critical issues. We focus on facial expressions, body poses and kinematics, voice, brain activity, and peripheral physiological responses, also providing a list of available datasets containing data from these modalities.
Collapse
Affiliation(s)
- Matteo Spezialetti
- PRISCA (Intelligent Robotics and Advanced Cognitive System Projects) Laboratory, Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Naples, Italy
- Department of Information Engineering, Computer Science and Mathematics, University of L'Aquila, L'Aquila, Italy
| | - Giuseppe Placidi
- AVI (Acquisition, Analysis, Visualization & Imaging Laboratory) Laboratory, Department of Life, Health and Environmental Sciences (MESVA), University of L'Aquila, L'Aquila, Italy
| | - Silvia Rossi
- PRISCA (Intelligent Robotics and Advanced Cognitive System Projects) Laboratory, Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Naples, Italy
| |
Collapse
|
35
|
Bromuri S, Henkel AP, Iren D, Urovi V. Using AI to predict service agent stress from emotion patterns in service interactions. JOURNAL OF SERVICE MANAGEMENT 2020. [DOI: 10.1108/josm-06-2019-0163] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
Purpose: A vast body of literature has documented the negative consequences of stress on employee performance and well-being. These deleterious effects are particularly pronounced for service agents, who must constantly endure and manage customer emotions. The purpose of this paper is to introduce and describe a deep learning model that predicts service agent stress in real time from emotion patterns in voice-to-voice service interactions.
Design/methodology/approach: A deep learning model was developed to identify emotion patterns in call center interactions based on 363 recorded service interactions, subdivided into 27,889 manually expert-labeled three-second audio snippets. In a second step, the model was deployed in a call center for a period of one month to be further trained on data collected from 40 service agents in another 4,672 service interactions.
Findings: The deep learning emotion classifier reached a balanced accuracy of 68% in predicting discrete emotions in service interactions. Integrated into a binary classification model, it predicted service agent stress with a balanced accuracy of 80%.
Practical implications: Service managers can employ the deep learning model to continuously and unobtrusively monitor the stress level of their service agents, with numerous practical applications, including real-time early warning systems for service agents, customized training, and automatically linking stress to customer-related outcomes.
Originality/value: The present study is the first to document an artificial intelligence (AI)-based model able to identify emotions in natural (i.e., non-staged) interactions. It further pioneers a smart emotion-based stress measure for service agents. Finally, the study contributes to the literature on the role of emotions in service interactions and employee stress.
Collapse
|
36
|
Yang N, Dey N, Sherratt RS, Shi F. Recognize basic emotional states in speech by machine learning techniques using mel-frequency cepstral coefficient features. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2020. [DOI: 10.3233/jifs-179963] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Speech Emotion Recognition (SER) has been widely used in many fields, such as the smart home assistants commonly found on the market. A smart home assistant that could detect the user's emotion would improve communication between user and assistant, enabling the assistant to offer more productive feedback. Thus, the aim of this work is to analyze emotional states in speech and propose a suitable algorithm, considering performance versus complexity, for deployment in smart home devices. Four emotional speech sets were selected from the Berlin Emotional Database (EMO-DB) as experimental data, and 26 MFCC features were extracted from each type of emotional speech to identify the emotions of happiness, anger, sadness and neutrality. Speaker-independent SER experiments were then conducted using the Back Propagation Neural Network (BPNN), Extreme Learning Machine (ELM), Probabilistic Neural Network (PNN) and Support Vector Machine (SVM). Weighing recognition accuracy against processing time, this work shows that the SVM performed best among the four methods, making it a good candidate for deployment in smart home devices. The SVM achieved an overall accuracy of 92.4% while imposing low computational requirements for training and testing. We conclude that the MFCC features and the SVM classification models used in speaker-independent experiments are highly effective for the automatic prediction of emotion.
Collapse
Affiliation(s)
- Ningning Yang
- First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
| | - Nilanjan Dey
- Department of Information Technology, Techno India College of Technology, West Bengal, India
| | | | - Fuqian Shi
- First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
| |
Collapse
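The 26 MFCC features used above follow a standard pipeline: power spectrum, mel filterbank, log compression, then a DCT. A rough self-contained numpy sketch for a single frame, with illustrative settings (16 kHz audio, 26 coefficients as in the paper); a production system would use a tested library:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

def mfcc_frame(frame, sr=16000, n_mfcc=26, n_filters=26):
    spec = np.abs(np.fft.rfft(frame)) ** 2                 # power spectrum
    energies = mel_filterbank(n_filters, len(frame), sr) @ spec
    log_e = np.log(energies + 1e-10)                       # log compression
    n = np.arange(n_filters)
    # type-II DCT decorrelates the log filterbank energies
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e

frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)   # 25 ms of a 440 Hz tone
coeffs = mfcc_frame(frame)
print(coeffs.shape)  # (26,)
```

The resulting 26-vector per frame is what feeds the BPNN/ELM/PNN/SVM classifiers compared in the paper.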
|
37
|
Identifying redundant features using unsupervised learning for high-dimensional data. SN APPLIED SCIENCES 2020. [DOI: 10.1007/s42452-020-3157-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022] Open
|
38
|
Vargas VM, Gutiérrez PA, Hervás-Martínez C. Cumulative link models for deep ordinal classification. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.034] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
39
|
Kallipolitis A, Galliakis M, Menychtas A, Maglogiannis I. Affective analysis of patients in homecare video-assisted telemedicine using computational intelligence. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05203-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
40
|
Griol D, Molina JM, Sanchis A, Callejas Z. A data-driven approach to spoken dialog segmentation. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.02.072] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
41
|
Hao M, Cao WH, Liu ZT, Wu M, Xiao P. Visual-audio emotion recognition based on multi-task and ensemble learning with multiple features. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.048] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
42
|
Liu ZT, Wu BH, Li DY, Xiao P, Mao JW. Speech Emotion Recognition Based on Selective Interpolation Synthetic Minority Over-Sampling Technique in Small Sample Environment. SENSORS 2020; 20:s20082297. [PMID: 32316473 PMCID: PMC7219047 DOI: 10.3390/s20082297] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/15/2020] [Revised: 04/10/2020] [Accepted: 04/14/2020] [Indexed: 11/16/2022]
Abstract
Speech emotion recognition often encounters the problems of data imbalance and redundant features in different application scenarios, and researchers usually design different recognition models for different sample conditions. In this study, a speech emotion recognition model for a small-sample environment is proposed. A data imbalance processing method based on the selective interpolation synthetic minority over-sampling technique (SISMOTE) is proposed to reduce the impact of sample imbalance on emotion recognition results. In addition, a feature selection method based on variance analysis and gradient boosting decision trees (GBDT) is introduced, which excludes redundant features with poor emotional representation. Experimental results for speaker-dependent speech emotion recognition on three databases (CASIA, Emo-DB, SAVEE) show that our method obtains average recognition accuracies of 90.28% (CASIA), 75.00% (SAVEE) and 85.82% (Emo-DB), which is superior to several state-of-the-art works.
Collapse
Affiliation(s)
- Zhen-Tao Liu
- School of Automation, China University of Geosciences, Wuhan 430074, China; (Z.-T.L.); (B.-H.W.); (P.X.); (J.-W.M.)
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China
| | - Bao-Han Wu
- School of Automation, China University of Geosciences, Wuhan 430074, China; (Z.-T.L.); (B.-H.W.); (P.X.); (J.-W.M.)
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China
| | - Dan-Yun Li
- School of Automation, China University of Geosciences, Wuhan 430074, China; (Z.-T.L.); (B.-H.W.); (P.X.); (J.-W.M.)
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China
- Correspondence:
| | - Peng Xiao
- School of Automation, China University of Geosciences, Wuhan 430074, China; (Z.-T.L.); (B.-H.W.); (P.X.); (J.-W.M.)
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China
| | - Jun-Wei Mao
- School of Automation, China University of Geosciences, Wuhan 430074, China; (Z.-T.L.); (B.-H.W.); (P.X.); (J.-W.M.)
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China
| |
Collapse
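The over-sampling in SISMOTE builds on the core SMOTE idea: synthesize minority samples by interpolating between a minority point and one of its minority-class neighbours. An illustrative sketch of that core step only (the paper's selective sample-selection refinement is omitted, and the data is made up):

```python
import numpy as np

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples on segments between each picked
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)  # distances to other samples
        d[i] = np.inf                             # exclude the point itself
        nbr = minority[rng.choice(np.argsort(d)[:k])]
        lam = rng.random()                        # interpolation factor in [0, 1)
        out.append(x + lam * (nbr - x))           # point on the segment x -> nbr
    return np.array(out)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
synthetic = smote_like(minority, n_new=4)
print(synthetic.shape)  # (4, 2)
```

Because each synthetic point lies on a segment between two real minority samples, the new data stays inside the minority class's local region rather than being drawn at random.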
|
43
|
Chu Y, Lin H, Yang L, Diao Y, Zhang D, Zhang S, Fan X, Shen C, Xu B, Yan D. Discriminative globality-locality preserving extreme learning machine for image classification. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
44
|
Ensemble Learning of Hybrid Acoustic Features for Speech Emotion Recognition. ALGORITHMS 2020. [DOI: 10.3390/a13030070] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Automatic recognition of emotion is important for facilitating seamless interactivity between a human being and an intelligent robot, toward the full realization of a smart society. The methods of signal processing and machine learning are widely applied to recognize human emotions based on features extracted from facial images, video files or speech signals. However, these features have not been able to recognize the fear emotion with the same level of precision as other emotions. The authors propose the agglutination of prosodic and spectral features from a group of carefully selected features to realize hybrid acoustic features for improving the task of emotion recognition. Experiments were performed to test the effectiveness of the proposed features extracted from speech files of two public databases and used to train five popular ensemble learning algorithms. Results show that random decision forest ensemble learning of the proposed hybrid acoustic features is highly effective for speech emotion recognition.
Collapse
|
45
|
Liu ZT, Li SH, Wu M, Cao WH, Hao M, Xian LB. Eye localization based on weight binarization cascade convolution neural network. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.10.048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
46
|
Kaur R, Sharma R, Kumar P. Speaker Classification with Support Vector Machine and Crossover-Based Particle Swarm Optimization. INT J PATTERN RECOGN 2020. [DOI: 10.1142/s0218001420510106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
It has been observed from the literature that speech is the most natural means of communication between humans. Human beings start speaking without any tool or explicit education; the surrounding environment helps them learn the art of speaking. Existing speaker classification techniques suffer from over-fitting and parameter-tuning issues, and efficient tuning of machine learning techniques can improve classification accuracy. To overcome this issue, this paper proposes an efficient particle swarm optimization-based support vector machine. The proposed and competing speaker classification techniques are tested on speaker classification data of Punjabi speakers. The comparative analysis reveals that the proposed technique outperforms existing techniques in terms of accuracy, F-measure, specificity and sensitivity.
Collapse
Affiliation(s)
- Rupinderdeep Kaur
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
| | - R. K. Sharma
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
| | - Parteek Kumar
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
| |
Collapse
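The hyperparameter tuning above rests on plain particle swarm optimisation: particles track their personal best and the swarm's global best while searching the hyperparameter space. A hedged sketch of the vanilla algorithm; the paper adds a crossover operator and optimises SVM cross-validation accuracy, while here a toy quadratic stands in for that fitness so the example stays self-contained (all names and coefficients are illustrative):

```python
import numpy as np

def pso(fitness, dim=2, n_particles=20, iters=100, seed=0):
    """Minimise `fitness` over [0, 2]^dim with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0, 2, (n_particles, dim))   # e.g. (log C, log gamma)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # inertia + pull toward personal best + pull toward global best
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([fitness(p) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

# Stand-in fitness with its minimum at (1, 1), mimicking "error vs hyperparameters"
best, best_f = pso(lambda p: np.sum((p - 1.0) ** 2))
print(best, best_f)
```

In the real pipeline the lambda would be replaced by a cross-validated SVM error, which is far more expensive but plugs into the same loop unchanged.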
|
47
|
Langari S, Marvi H, Zahedi M. Efficient speech emotion recognition using modified feature extraction. INFORMATICS IN MEDICINE UNLOCKED 2020. [DOI: 10.1016/j.imu.2020.100424] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open
|
48
|
Mustaqeem, Kwon S. A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition. SENSORS (BASEL, SWITZERLAND) 2019; 20:E183. [PMID: 31905692 PMCID: PMC6982825 DOI: 10.3390/s20010183] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/12/2019] [Revised: 12/25/2019] [Accepted: 12/26/2019] [Indexed: 01/09/2023]
Abstract
Speech is the most significant mode of communication among human beings and a potential method for human-computer interaction (HCI) using a microphone sensor. Quantifiable emotion recognition from speech signals captured by such sensors is an emerging area of research in HCI, with applications including human-robot interaction, virtual reality, behavior assessment, healthcare, and emergency call centers that determine a speaker's emotional state from an individual's speech. In this paper, we present two major contributions: (i) increasing the accuracy of speech emotion recognition (SER) compared to the state of the art, and (ii) reducing the computational complexity of the presented SER model. We propose an artificial intelligence-assisted deep stride convolutional neural network (DSCNN) architecture using the plain-nets strategy to learn salient and discriminative features from spectrograms of speech signals that are enhanced in prior steps. Local hidden patterns are learned in convolutional layers with special strides that down-sample the feature maps in place of a pooling layer, and global discriminative features are learned in fully connected layers. A SoftMax classifier is used for the classification of emotions in speech. The proposed technique is evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) datasets, improving accuracy by 7.85% and 4.5%, respectively, with the model size reduced by 34.5 MB. This demonstrates the effectiveness and significance of the proposed SER technique and its applicability in real-world applications.
Collapse
Affiliation(s)
| | - Soonil Kwon
- Interaction Technology Laboratory, Department of Software, Sejong University, Seoul 05006, Korea;
| |
Collapse
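The central trick above, down-sampling feature maps with strided convolutions instead of pooling layers, is easy to see in a minimal numpy sketch (sizes and the averaging kernel are illustrative; a real DSCNN learns its kernels):

```python
import numpy as np

def conv2d(img, kernel, stride=1):
    """Valid 2-D convolution; stride > 1 down-samples the output."""
    kh, kw = kernel.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # learned, weighted down-sampling
    return out

spectrogram = np.random.default_rng(0).standard_normal((64, 64))
kernel = np.ones((3, 3)) / 9.0
same_res = conv2d(spectrogram, kernel, stride=1)   # (62, 62)
halved = conv2d(spectrogram, kernel, stride=2)     # (31, 31): stride replaces pooling
print(same_res.shape, halved.shape)
```

Unlike max pooling, the stride-2 convolution's down-sampling weights are trainable, which is the efficiency/accuracy trade-off the paper exploits.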
|
49
|
Liu ZT, Xie Q, Wu M, Cao WH, Li DY, Li SH. Electroencephalogram Emotion Recognition Based on Empirical Mode Decomposition and Optimal Feature Selection. IEEE Trans Cogn Dev Syst 2019. [DOI: 10.1109/tcds.2018.2868121] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
50
|
An Optimisation-Driven Prediction Method for Automated Diagnosis and Prognosis. MATHEMATICS 2019. [DOI: 10.3390/math7111051] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
Abstract
This article presents a novel hybrid classification paradigm for predicting medical diagnoses and prognoses. The core mechanism of the proposed method relies on a centroid classification algorithm whose logic is exploited to formulate the classification task as a real-valued optimisation problem. A novel metaheuristic combining the algorithmic structure of Swarm Intelligence optimisers with the probabilistic search models of Estimation of Distribution Algorithms is designed to optimise such a problem, thus leading to high-accuracy predictions. This method is tested over 11 medical datasets and compared against 14 cherry-picked classification algorithms. Results show that the proposed approach is competitive and superior to the state of the art on several occasions.
Collapse
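The formulation above can be sketched directly: treat the class centroids as one flat real-valued vector, and score any candidate vector by its nearest-centroid classification error, so any continuous optimiser can search for good centroids. Here a crude random search stands in for the paper's hybrid metaheuristic, and the two-class data is a toy stand-in:

```python
import numpy as np

def error(centroids_flat, X, y, n_classes):
    """Nearest-centroid misclassification rate for a flattened centroid vector."""
    C = centroids_flat.reshape(n_classes, X.shape[1])
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)  # (n, n_classes)
    return np.mean(d.argmin(axis=1) != y)

# Toy two-class data in 2-D: tight clusters near (0, 0) and (1, 1)
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])

rng = np.random.default_rng(0)
best, best_e = None, np.inf
for _ in range(200):                  # crude random search over centroid vectors
    cand = rng.uniform(-0.5, 1.5, 4)  # 2 centroids x 2 dims, flattened
    e = error(cand, X, y, n_classes=2)
    if e < best_e:
        best, best_e = cand, e
print(best_e)
```

Any real-valued optimiser, including the swarm/EDA hybrid the paper proposes, can replace the random-search loop because the objective is just a function from a flat vector to an error rate.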
|