1. Xie W, Peng Z, Shen L, Lu W, Zhang Y, Song S. Cross-Layer Contrastive Learning of Latent Semantics for Facial Expression Recognition. IEEE Trans Image Process 2024; 33:2514-2529. [PMID: 38530732] [DOI: 10.1109/tip.2024.3378459]
Abstract
Convolutional neural networks (CNNs) have achieved significant improvements on the task of facial expression recognition. However, current training still suffers from inconsistent learning intensities among different layers, i.e., the feature representations in the shallow layers are not learned as thoroughly as those in the deep layers. To this end, this work proposes a contrastive learning framework that aligns the feature semantics of shallow and deep layers, followed by an attention module that represents the multi-scale features in a weight-adaptive manner. The proposed algorithm has three main merits. First, the learning intensity, defined as the magnitude of the backpropagation gradient, of the shallow-layer features is enhanced by cross-layer contrastive learning. Second, the latent semantics in the shallow-layer and deep-layer features are explored and aligned during contrastive learning, so the fine-grained characteristics of expressions can be taken into account in feature representation learning. Third, by integrating the multi-scale features from multiple layers with an attention module, the algorithm achieves state-of-the-art accuracies of 92.21%, 89.50%, and 62.82% on three in-the-wild expression databases (RAF-DB, FERPlus, and SFEW, respectively), and the second-best accuracy of 65.29% on the AffectNet dataset. The codes will be made publicly available.
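The cross-layer alignment described in this abstract can be illustrated with a minimal InfoNCE-style sketch. The cosine similarity, temperature value, and loss form below are generic assumptions, not the paper's exact formulation; shallow-layer and deep-layer features of the same sample form the positive pair.

```python
import numpy as np

def info_nce(shallow, deep, temperature=0.1):
    """InfoNCE-style loss aligning shallow-layer features with their
    deep-layer counterparts; matching rows are positives, all other
    rows in the batch serve as negatives."""
    # L2-normalize both feature sets so the dot product is cosine similarity.
    s = shallow / np.linalg.norm(shallow, axis=1, keepdims=True)
    d = deep / np.linalg.norm(deep, axis=1, keepdims=True)
    logits = s @ d.T / temperature               # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal (sample i in both layers).
    return -np.mean(np.diag(log_prob))
```

Perfectly aligned features yield a lower loss than unrelated ones, which is the gradient signal that strengthens shallow-layer learning.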
2. Kartheek MN, Prasad MVNK, Bhukya R. Texture based feature extraction using symbol patterns for facial expression recognition. Cogn Neurodyn 2024; 18:317-335. [PMID: 38699622] [PMCID: PMC11061079] [DOI: 10.1007/s11571-022-09824-z]
Abstract
Facial expressions convey the internal emotions of a person within a given scenario and play a major role in human social interaction. In automatic Facial Expression Recognition (FER) systems, the feature extraction method plays a major role in determining overall performance. In this regard, drawing inspiration from the Swastik symbol, three texture based feature descriptors named Symbol Patterns (SP1, SP2 and SP3) have been proposed for facial feature extraction. SP1 generates one pattern value by comparing eight pixels within a 3×3 neighborhood, whereas SP2 and SP3 generate two pattern values each by comparing twelve and sixteen pixels within a 5×5 neighborhood, respectively. In this work, the proposed Symbol Patterns (SP) have been evaluated with natural, Fibonacci, odd, prime, square and binary weights to determine the optimal recognition accuracy. The proposed SP methods have been tested on the MUG, TFEID, CK+, KDEF, FER2013 and FERG datasets, and the experimental analysis demonstrated an improvement in recognition accuracy compared to existing FER methods.
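The descriptor family described above is in the spirit of local binary patterns. Since the exact Swastik-inspired comparison scheme is not spelled out in the abstract, the following is only an LBP-like sketch of how the eight pixels of a 3×3 neighborhood can be thresholded against the centre pixel and combined with a configurable weight vector (natural, Fibonacci, prime, etc.).

```python
import numpy as np

def symbol_pattern_3x3(img, weights=(1, 2, 4, 8, 16, 32, 64, 128)):
    """LBP-like texture code: each of the eight neighbours is compared
    against the centre pixel and the resulting bits are combined with a
    configurable weight vector."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Offsets of the eight neighbours, clockwise from top-left.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[1:h - 1, 1:w - 1]
    codes = np.zeros_like(centre)
    for wgt, (dy, dx) in zip(weights, offs):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += wgt * (neigh >= centre)  # one bit per neighbour comparison
    return codes
```

Swapping the `weights` tuple for Fibonacci or prime values reproduces the kind of weight-scheme evaluation the paper reports.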
Affiliation(s)
- Mukku Nisanth Kartheek
- Institute for Development and Research in Banking Technology, Hyderabad, India
- Department of Computer Science and Engineering, National Institute of Technology, Warangal, India
- Raju Bhukya
- Department of Computer Science and Engineering, National Institute of Technology, Warangal, India
3. Kim J, Lee D. Facial Expression Recognition Robust to Occlusion and to Intra-Similarity Problem Using Relevant Subsampling. Sensors (Basel) 2023; 23:2619. [PMID: 36904823] [PMCID: PMC10007059] [DOI: 10.3390/s23052619]
Abstract
This paper proposes facial expression recognition (FER) for in-the-wild data sets. In particular, it chiefly deals with two issues: occlusion and the intra-similarity problem. An attention mechanism enables the use of the most relevant areas of facial images for specific expressions, and the triplet loss function addresses the intra-similarity problem, in which the same expression from different faces sometimes fails to aggregate, and vice versa. The proposed approach is robust to occlusion; it uses a spatial transformer network (STN) with an attention mechanism to utilize the specific facial regions that contribute most (i.e., are most relevant) to particular facial expressions, e.g., anger, contempt, disgust, fear, joy, sadness, and surprise. In addition, the STN model is combined with the triplet loss function to improve the recognition rate, outperforming existing approaches that employ cross-entropy or rely only on deep neural networks or classical methods. The triplet loss module alleviates the limitations of the intra-similarity problem, further improving classification. Experimental results substantiate the proposed approach, which outperforms existing methods in more practical cases, e.g., under occlusion. Quantitatively, it achieves more than 2.09% higher accuracy than existing FER results on the CK+ data set and 0.48% higher accuracy than a modified ResNet model on the FER2013 data set.
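The triplet loss mentioned above has a standard form: pull same-expression embeddings together and push different-expression embeddings at least a margin apart. A minimal sketch (the margin value is an assumption, not taken from the paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss over a batch of embedding rows: anchors and
    positives share an expression label, negatives do not."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor-negative distance
    # Hinge: zero loss once negatives are at least `margin` farther away.
    return np.mean(np.maximum(d_pos - d_neg + margin, 0.0))
```

When the negative already sits far from the anchor, the hinge saturates at zero and contributes no gradient, which is why such losses are usually paired with hard-example mining.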
4. Wang K, He R, Wang S, Liu L, Yamauchi T. The Efficient-CapsNet model for facial expression recognition. Appl Intell 2022. [DOI: 10.1007/s10489-022-04349-8]
5. Pascual AM, Valverde EC, Kim JI, Jeong JW, Jung Y, Kim SH, Lim W. Light-FER: A Lightweight Facial Emotion Recognition System on Edge Devices. Sensors (Basel) 2022; 22:9524. [PMID: 36502225] [PMCID: PMC9738842] [DOI: 10.3390/s22239524]
Abstract
Facial emotion recognition (FER) systems are imperative in recent advanced artificial intelligence (AI) applications to realize better human-computer interaction. Most deep learning-based FER systems suffer from low accuracy and high resource requirements, especially when deployed on edge devices with limited computing power and memory. To tackle these problems, a lightweight FER system, called Light-FER, is proposed in this paper, obtained from the Xception model through model compression. First, pruning is performed during network training to remove the less important connections within the Xception architecture. Second, the model is quantized to half-precision format, which significantly reduces its memory consumption. Third, several deep learning compilers that perform advanced optimization techniques are benchmarked to further accelerate the inference speed of the FER system. Lastly, to experimentally demonstrate the objectives of the proposed system on edge devices, Light-FER is deployed on an NVIDIA Jetson Nano.
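The first two compression steps, magnitude pruning and half-precision quantization, can be sketched as follows. This is a generic illustration of the techniques, not the paper's exact pruning schedule or tooling.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Unstructured magnitude pruning: zero out the smallest-magnitude
    fraction of weights, keeping the rest untouched."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_fp16(weights):
    """Cast weights to half precision, halving memory versus float32."""
    return weights.astype(np.float16)
```

In a real deployment, pruning is interleaved with training so the remaining weights can compensate, and the fp16 cast is applied to the whole model before compilation for the target device.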
Affiliation(s)
- Alexander M. Pascual
- Department of Aeronautics, Mechanical and Electronic Convergence Engineering, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea
- Erick C. Valverde
- Department of Aeronautics, Mechanical and Electronic Convergence Engineering, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea
- Jeong-in Kim
- Department of Aeronautics, Mechanical and Electronic Convergence Engineering, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea
- Jin-Woo Jeong
- Department of Data Science, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
- Yuchul Jung
- Department of Computer Engineering, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea
- Sang-Ho Kim
- Department of Industrial Engineering, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea
- Wansu Lim
- Department of Aeronautics, Mechanical and Electronic Convergence Engineering, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea
6. Zhu Y, Lin G, Song L, Zhang J. The application of neural network for software vulnerability detection: a review. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08046-y]
7. Mukhopadhyay M, Dey A, Kahali S. A deep-learning-based facial expression recognition method using textural features. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08005-7]
8. Tan X, Fan Y, Sun M, Zhuang M, Qu F. An Emotion Index Estimation based on Facial Action Unit Prediction. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.11.019]
9. Aung ST, Hassan M, Brady M, Mannan ZI, Azam S, Karim A, Zaman S, Wongsawat Y. Entropy-Based Emotion Recognition from Multichannel EEG Signals Using Artificial Neural Network. Comput Intell Neurosci 2022; 2022:6000989. [PMID: 36275950] [PMCID: PMC9584707] [DOI: 10.1155/2022/6000989]
Abstract
Humans experience a variety of emotions throughout the course of their daily lives, including happiness, sadness, and rage. As a result, an effective emotion-identification system is essential if electroencephalography (EEG) data are to reflect emotion accurately in real time. Although recent studies of this problem report acceptable performance measures, they are still not adequate for a complete emotion recognition system. In this research work, we propose a new approach to emotion recognition that combines multichannel EEG computation using our newly developed entropy measure, multivariate multiscale modified-distribution entropy (MM-mDistEn), with an artificial neural network (ANN) model to attain better outcomes than existing methods. The proposed system was tested on two different datasets and achieved better accuracy than existing methods. For the GAMEEMO dataset, we achieved an average accuracy ± standard deviation of 95.73% ± 0.67 for valence and 96.78% ± 0.25 for arousal. Moreover, the average accuracy for the DEAP dataset reached 92.57% ± 1.51 in valence and 80.23% ± 1.83 in arousal.
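MM-mDistEn itself is the authors' novel measure, but its single-channel ancestor, distribution entropy (DistEn), can be sketched as follows. The embedding dimension and bin count are assumed defaults; the multivariate, multiscale modifications of the paper are not reproduced here.

```python
import numpy as np

def dist_en(x, m=2, bins=64):
    """Single-channel distribution entropy (DistEn): embed the signal into
    m-dimensional vectors, take all pairwise Chebyshev distances, histogram
    them, and return the Shannon entropy normalized by log2(bins)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m + 1
    emb = np.stack([x[i:i + m] for i in range(n)])            # embedded vectors
    diff = np.abs(emb[:, None, :] - emb[None, :, :]).max(-1)  # Chebyshev dist.
    iu = np.triu_indices(n, k=1)                              # unique pairs only
    p, _ = np.histogram(diff[iu], bins=bins)
    p = p / p.sum()
    p = p[p > 0]                                              # avoid log(0)
    return -np.sum(p * np.log2(p)) / np.log2(bins)            # normalized to (0, 1]
```

The resulting scalar per channel (and per scale, in the multiscale variant) is what feeds the ANN classifier described in the abstract.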
Affiliation(s)
- Si Thu Aung
- Department of Biomedical Engineering, Faculty of Engineering, Mahidol University, Salaya, Thailand
- Mehedi Hassan
- Computer Science and Engineering, North Western University, Khulna, Bangladesh
- Mark Brady
- Asia Pacific College of Business and Law, Charles Darwin University, Casuarina, NT, Australia
- Zubaer Ibna Mannan
- Department of Smart Computing, Kyungdong University, Global Campus, Goseong-Gun, Republic of Korea
- Sami Azam
- College of Engineering IT and Environment, Charles Darwin University, Casuarina, NT, Australia
- Asif Karim
- College of Engineering IT and Environment, Charles Darwin University, Casuarina, NT, Australia
- Sadika Zaman
- Computer Science and Engineering, North Western University, Khulna, Bangladesh
- Yodchanan Wongsawat
- Department of Biomedical Engineering, Faculty of Engineering, Mahidol University, Salaya, Thailand
10. Kartheek MN, Prasad MVNK, Bhukya R. DRCP: Dimensionality Reduced Chess Pattern for Person Independent Facial Expression Recognition. Int J Pattern Recogn 2022. [DOI: 10.1142/s021800142256016x]
11. Real-Time Facial Expression Recognition Using Deep Learning with Application in the Active Classroom Environment. Electronics 2022. [DOI: 10.3390/electronics11081240]
Abstract
The quality of a teaching method used in a classroom can be assessed by observing the facial expressions of students. To automate this, Facial Expression Recognition (FER) can be employed. Based on the recognized emotions of students, teachers can improve their lectures by determining which activities during the lecture evoke which emotions and how these emotions relate to the tasks the students solve. Previous work mostly addresses the problem in the context of passive teaching, where teachers present while students listen and take notes, usually in online courses. We take this a step further and develop predictive models that can classify emotions in the context of active teaching, specifically a robotics workshop, which is more challenging. The two best-generalizing models on the test set (Inception-v3 and ResNet-34) were combined with the goal of real-time emotion prediction on videos of workshop participants solving eight tasks using an educational robot. As a proof of concept, we applied the models to the video data and analyzed the predicted emotions with regard to the activities, tasks, and gender of the participants. Statistical analysis showed that female participants were more likely to show emotions in almost all activity types. In addition, for all activity types, happiness was the most likely emotion regardless of gender. Finally, the activity type in which the analyzed emotions were most frequent was programming. These results indicate that students' facial expressions are related to the activities they are currently engaged in and contain valuable information for teachers about what they can improve in their teaching practice.
12. Effective attention feature reconstruction loss for facial expression recognition in the wild. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07016-8]
13. Šumak B, Brdnik S, Pušnik M. Sensors and Artificial Intelligence Methods and Algorithms for Human-Computer Intelligent Interaction: A Systematic Mapping Study. Sensors (Basel) 2021; 22:20. [PMID: 35009562] [PMCID: PMC8747169] [DOI: 10.3390/s22010020]
Abstract
To equip computers with human communication skills and to enable natural interaction between the computer and a human, intelligent solutions are required based on artificial intelligence (AI) methods, algorithms, and sensor technology. This study aimed to identify and analyze the state-of-the-art AI methods, algorithms, and sensor technology in existing human-computer intelligent interaction (HCII) research in order to explore trends in HCII research, categorize existing evidence, and identify potential directions for future research. We conducted a systematic mapping study of the HCII body of research. Four hundred fifty-four studies published in various journals and conferences between 2010 and 2021 were identified and analyzed. Studies in the HCII and IUI fields have primarily focused on intelligent recognition of emotions, gestures, and facial expressions using sensor technology such as cameras, EEG, Kinect, wearable sensors, eye trackers, gyroscopes, and others. Researchers most often apply deep-learning and instance-based AI methods and algorithms. The support vector machine (SVM) is the most widely used algorithm for various kinds of recognition, primarily of emotions, facial expressions, and gestures. The convolutional neural network (CNN) is the most often used deep-learning algorithm for emotion recognition, facial recognition, and gesture recognition solutions.
14. Multimodal magnetic resonance image and electroencephalogram constrained fusion algorithm using deep learning. Soft Comput 2021. [DOI: 10.1007/s00500-021-06574-8]
15.
Abstract
Facial emotion recognition is an inherently complex problem due to individual diversity in facial features and racial and cultural differences. Moreover, facial expressions typically reflect a mixture of a person's emotional states, which can be expressed as compound emotions. Compound facial emotion recognition makes the problem even more difficult because the discrimination between dominant and complementary emotions is usually weak. To address compound emotion recognition, we have created a database of 31,250 facial images with different emotions from 115 subjects whose gender distribution is almost uniform. In addition, we organized a competition based on the proposed dataset, held at the FG 2020 workshop. This paper analyzes the winner's approach, a two-stage recognition method (coarse recognition in the first stage, fine recognition in the second), which enhances the classification of symmetrical emotion labels.
16. Zhuang M, Yin L, Wang Y, Bai Y, Zhan J, Hou C, Yin L, Xu Z, Tan X, Huang Y. Highly Robust and Wearable Facial Expression Recognition via Deep-Learning-Assisted, Soft Epidermal Electronics. Research 2021; 2021:9759601. [PMID: 34368767] [PMCID: PMC8302843] [DOI: 10.34133/2021/9759601]
Abstract
Facial expressions are a mirror of the elusive emotions hidden in the mind, and thus capturing expressions is a crucial way of merging the inner world and the virtual world. However, typical facial expression recognition (FER) systems are restricted to environments where faces can be clearly seen by computer vision, or rely on rigid devices that are not suitable for time-dynamic, curvilinear faces. Here, we present a robust, highly wearable FER system based on deep-learning-assisted, soft epidermal electronics. The epidermal electronics, which fully conform to the face, enable high-fidelity biosignal acquisition without hindering spontaneous facial expressions, releasing the constraints of movement, space, and light. The deep learning method significantly enhances the recognition accuracy of facial expression types and intensities from a small sample. The proposed wearable FER system offers wide applicability and high accuracy. It is suited to individual use and remains robust to different lighting, occlusion, and various face poses. It is entirely different from, but complementary to, computer vision technology, which is suitable only for simultaneous FER of multiple individuals in a specific place. This wearable FER system has been successfully applied to human-avatar emotion interaction and verbal communication disambiguation in a real-life environment, enabling promising human-computer interaction applications.
Affiliation(s)
- Meiqi Zhuang
- Information Engineering College, Capital Normal University, Beijing 100048, China
- Lang Yin
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Flexible Electronics Research Center, Huazhong University of Science and Technology, Wuhan 430074, China
- Youhua Wang
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Flexible Electronics Research Center, Huazhong University of Science and Technology, Wuhan 430074, China
- Yunzhao Bai
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Flexible Electronics Research Center, Huazhong University of Science and Technology, Wuhan 430074, China
- Jian Zhan
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Flexible Electronics Research Center, Huazhong University of Science and Technology, Wuhan 430074, China
- Chao Hou
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Flexible Electronics Research Center, Huazhong University of Science and Technology, Wuhan 430074, China
- Liting Yin
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Flexible Electronics Research Center, Huazhong University of Science and Technology, Wuhan 430074, China
- Zhangyu Xu
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Flexible Electronics Research Center, Huazhong University of Science and Technology, Wuhan 430074, China
- Xiaohui Tan
- Information Engineering College, Capital Normal University, Beijing 100048, China
- YongAn Huang
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Flexible Electronics Research Center, Huazhong University of Science and Technology, Wuhan 430074, China
|
17
|
Karnati M, Seal A, Yazidi A, Krejcar O. LieNet: A Deep Convolution Neural Networks Framework for Detecting Deception. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2021.3086011] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]