1. Wang Z, Yang M, Jiao Q, Xu L, Han B, Li Y, Tan X. Two-Level Spatio-Temporal Feature Fused Two-Stream Network for Micro-Expression Recognition. Sensors (Basel) 2024; 24:1574. [PMID: 38475109] [DOI: 10.3390/s24051574]
Abstract
Micro-expressions, which are spontaneous and difficult to suppress, reveal a person's true emotions. They are characterized by short duration and low intensity, making micro-expression recognition a challenging task in affective computing. In recent years, deep learning-based feature extraction and fusion techniques have been widely used for micro-expression recognition, and methods based on the Vision Transformer in particular have gained popularity. However, Vision Transformer-based architectures used in micro-expression recognition involve a significant amount of invalid computation. Additionally, in the traditional two-stream architecture, although separate streams are combined through late fusion, only the output features from the deepest level of the network are used for classification, limiting the network's ability to capture subtle details due to the lack of fine-grained information. To address these issues, we propose a new two-stream architecture with two-level spatio-temporal feature fusion. This architecture includes a spatial encoder (a modified ResNet) for learning facial texture features, a temporal encoder (a Swin Transformer) for learning facial muscle motion features, a feature fusion algorithm for integrating multi-level spatio-temporal features, a classification head, and a weighted average operator for temporal aggregation. The two-stream architecture has the advantage of extracting richer features than a single-stream architecture, leading to improved performance. The shifted-window scheme of the Swin Transformer restricts self-attention computation to non-overlapping local windows while allowing cross-window connections, significantly improving performance and reducing computation compared to the Vision Transformer. Moreover, the modified ResNet is computationally less intensive. Our proposed feature fusion algorithm exploits the similarity in output feature shapes at each stage of the two streams, enabling effective fusion of multi-level spatio-temporal features; it yields an improvement of approximately 4% in both the F1 score and the UAR. Comprehensive evaluations on three widely used spontaneous micro-expression datasets (SMIC-HS, CASME II, and SAMM) consistently demonstrate the superiority of our approach over comparative methods. Notably, our approach achieves a UAR exceeding 0.905 on CASME II, making it one of the few frameworks in the published micro-expression recognition literature to achieve such high performance.
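To make the fusion idea concrete, here is a minimal PyTorch sketch of concatenating the stage-wise outputs of two streams at two depths and classifying the fused features. ResNet-18 stands in for both encoders (the paper pairs a modified ResNet with a Swin Transformer), and the chosen layers and channel sizes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: fuse multi-level features from two streams whose
# per-stage output shapes match, then classify the fused representation.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TwoStreamTwoLevelFusion(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.spatial = resnet18(weights=None)   # texture stream (apex frame)
        self.temporal = resnet18(weights=None)  # motion stream (e.g., optical flow)
        # Fuse features at two levels: layer3 (256-ch) and layer4 (512-ch).
        self.fuse_mid = nn.Conv2d(256 * 2, 256, kernel_size=1)
        self.fuse_deep = nn.Conv2d(512 * 2, 512, kernel_size=1)
        self.head = nn.Linear(256 + 512, num_classes)

    def _stages(self, net, x):
        x = net.maxpool(net.relu(net.bn1(net.conv1(x))))
        x = net.layer2(net.layer1(x))
        mid = net.layer3(x)      # mid-level features
        deep = net.layer4(mid)   # deepest features
        return mid, deep

    def forward(self, spatial_in, temporal_in):
        s_mid, s_deep = self._stages(self.spatial, spatial_in)
        t_mid, t_deep = self._stages(self.temporal, temporal_in)
        mid = self.fuse_mid(torch.cat([s_mid, t_mid], dim=1)).mean(dim=(2, 3))
        deep = self.fuse_deep(torch.cat([s_deep, t_deep], dim=1)).mean(dim=(2, 3))
        return self.head(torch.cat([mid, deep], dim=1))

logits = TwoStreamTwoLevelFusion()(torch.randn(2, 3, 224, 224),
                                   torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 3])
```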
Affiliation(s)
- Zebiao Wang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Mingyu Yang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Qingbin Jiao: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Liang Xu: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Bing Han: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yuhang Li: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xin Tan: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
2. Yang H, Xie L, Pan H, Li C, Wang Z, Zhong J. Multimodal Attention Dynamic Fusion Network for Facial Micro-Expression Recognition. Entropy (Basel) 2023; 25:1246. [PMID: 37761545] [PMCID: PMC10528512] [DOI: 10.3390/e25091246]
Abstract
The emotional changes in facial micro-expressions are combinations of action units. Research has shown that action units can be used as auxiliary data to improve facial micro-expression recognition, and most existing work attempts to fuse image features with action unit information. However, these works ignore the impact of action units on the facial image feature extraction process. This paper therefore proposes a local detail feature enhancement model based on a multimodal attention dynamic fusion network (MADFN) for micro-expression recognition. The method uses a masked autoencoder based on learnable class tokens to remove local areas with low emotional expressiveness in micro-expression images. We then use an action unit dynamic fusion module to fuse action unit representations and improve the latent representation ability of the image features. The proposed model is evaluated on the SMIC, CASME II, and SAMM datasets and their combined 3DB-Combined dataset. It achieves competitive accuracy of 81.71%, 82.11%, and 77.21% on SMIC, CASME II, and SAMM, respectively, showing that MADFN helps improve the discrimination of emotional features in facial images.
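A minimal sketch of what an action-unit fusion module of this kind could look like, assuming an AU activation vector is injected into patch-level image features via cross-attention; the layer sizes, AU count, and use of nn.MultiheadAttention are illustrative assumptions, not the authors' code.

```python
# Hypothetical AU fusion: image patch tokens attend to a single AU token.
import torch
import torch.nn as nn

class AUDynamicFusion(nn.Module):
    def __init__(self, dim=256, num_aus=17):
        super().__init__()
        self.au_embed = nn.Linear(num_aus, dim)  # AU activation vector -> token
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, au_vector):
        # img_tokens: (B, N, dim) patch features; au_vector: (B, num_aus) in [0, 1]
        au_token = self.au_embed(au_vector).unsqueeze(1)       # (B, 1, dim)
        fused, _ = self.attn(query=img_tokens, key=au_token, value=au_token)
        return self.norm(img_tokens + fused)                   # residual fusion

tokens = torch.randn(2, 49, 256)
aus = torch.rand(2, 17)
print(AUDynamicFusion()(tokens, aus).shape)  # torch.Size([2, 49, 256])
```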
Affiliation(s)
- Hongling Yang: Department of Computer Science, Changzhi University, Changzhi 046011, China
- Lun Xie: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Hang Pan: Department of Computer Science, Changzhi University, Changzhi 046011, China
- Chiqin Li: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Zhiliang Wang: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Jialiang Zhong: School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
3. Pan H, Yang H, Xie L, Wang Z. Multi-scale fusion visual attention network for facial micro-expression recognition. Front Neurosci 2023; 17:1216181. [PMID: 37575295] [PMCID: PMC10412924] [DOI: 10.3389/fnins.2023.1216181]
Abstract
Introduction: Micro-expressions are facial muscle movements associated with hidden genuine emotions. To address the challenge of their low intensity, recent studies have attempted to locate localized areas of facial muscle movement. However, this ignores the feature redundancy caused by inaccurate localization of the regions of interest. Methods: This paper proposes a novel multi-scale fusion visual attention network (MFVAN) that learns multi-scale local attention weights to mask regions of redundant features. Specifically, the model extracts multi-scale features of the apex frame in micro-expression video clips with convolutional neural networks. The attention mechanism focuses on the weights of local region features in the multi-scale feature maps. We then mask redundant regions in the multi-scale features and fuse local features with high attention weights for micro-expression recognition. Self-supervision and transfer learning reduce the influence of individual identity attributes and increase the robustness of the multi-scale feature maps. Finally, the multi-scale classification loss, the mask loss, and an identity-attribute-removal loss jointly optimize the model. Results: The proposed MFVAN is evaluated on the SMIC, CASME II, SAMM, and 3DB-Combined datasets and achieves state-of-the-art performance. The experimental results show that focusing on local regions at multiple scales contributes to micro-expression recognition. Discussion: MFVAN is the first model to combine image generation with visual attention mechanisms to tackle the combined challenge of individual identity attribute interference and low-intensity facial muscle movements. The model also reveals the impact of individual attributes on the localization of local ROIs.
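The masking idea, learning per-location attention weights and zeroing out low-weight (redundant) regions, can be sketched as follows for a single scale; the keep ratio and single-scale simplification are assumptions for illustration, not the MFVAN design.

```python
# Hypothetical sketch: score each spatial location, keep only the top
# fraction of locations by attention weight, and suppress the rest.
import torch
import torch.nn as nn

class AttentionMask(nn.Module):
    def __init__(self, channels=64, keep_ratio=0.5):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-location score
        self.keep_ratio = keep_ratio

    def forward(self, feat):                     # feat: (B, C, H, W)
        w = torch.sigmoid(self.score(feat))      # (B, 1, H, W) attention weights
        flat = w.flatten(2)                      # (B, 1, H*W)
        k = int(flat.shape[-1] * self.keep_ratio)
        thresh = flat.topk(k, dim=-1).values[..., -1, None]  # per-sample cutoff
        mask = (flat >= thresh).float().view_as(w)
        return feat * w * mask                   # keep high-attention regions only

x = torch.randn(2, 64, 14, 14)
print(AttentionMask()(x).shape)  # torch.Size([2, 64, 14, 14])
```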
Affiliation(s)
- Hang Pan: Department of Computer Science, Changzhi University, Changzhi, China
- Hongling Yang: Department of Computer Science, Changzhi University, Changzhi, China
- Lun Xie: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Zhiliang Wang: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
4. Zhou H, Huang S, Li J, Wang SJ. Dual-ATME: Dual-Branch Attention Network for Micro-Expression Recognition. Entropy (Basel) 2023; 25:460. [PMID: 36981348] [PMCID: PMC10048169] [DOI: 10.3390/e25030460]
Abstract
Micro-expression recognition (MER) is challenging due to the difficulty of capturing the instantaneous and subtle motion changes of micro-expressions (MEs). Early works based on hand-crafted features extracted from prior knowledge showed some promising results but have recently been replaced by deep learning methods based on the attention mechanism. However, with limited ME sample sizes, the features extracted by these methods lack discriminative ME representations, resulting in MER performance that still needs improvement. This paper proposes the Dual-branch Attention Network (Dual-ATME) for MER to address the problem of single-scale features representing MEs ineffectively. Specifically, Dual-ATME consists of two components: Hand-crafted Attention Region Selection (HARS) and Automated Attention Region Selection (AARS). HARS uses prior knowledge to manually extract features from regions of interest (ROIs), while AARS is based on attention mechanisms and automatically extracts hidden information from the data. Finally, through similarity comparison and feature fusion, the dual-scale features can be used to learn ME representations effectively. Experiments on spontaneous ME datasets (CASME II, SAMM, and SMIC) and their composite dataset, MEGC2019-CD, show that Dual-ATME achieves better, or more competitive, performance than state-of-the-art MER methods.
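A toy sketch of the dual-branch idea, with one branch pooling fixed prior-knowledge ROIs and the other learning attention over the whole face; the ROI coordinates, backbone, and fusion by concatenation are assumptions for illustration, not the Dual-ATME implementation.

```python
# Hypothetical dual branch: hand-picked ROI pooling (HARS-like) plus
# learned attention pooling (AARS-like), fused by concatenation.
import torch
import torch.nn as nn

class DualBranch(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.attn = nn.Conv2d(64, 1, 1)            # learned attention map
        self.head = nn.Linear(64 * 2, num_classes)

    def forward(self, x):                          # x: (B, 3, 112, 112)
        feat = self.backbone(x)                    # (B, 64, 28, 28)
        # HARS-like: average-pool two hand-picked row bands (eyes, mouth).
        hars = 0.5 * (feat[:, :, 4:12, :].mean((2, 3)) +
                      feat[:, :, 18:26, :].mean((2, 3)))
        # AARS-like: attention-weighted global pooling.
        w = torch.softmax(self.attn(feat).flatten(2), dim=-1)  # (B, 1, HW)
        aars = (feat.flatten(2) * w).sum(-1)                   # (B, 64)
        return self.head(torch.cat([hars, aars], dim=1))

print(DualBranch()(torch.randn(2, 3, 112, 112)).shape)  # torch.Size([2, 3])
```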
Affiliation(s)
- Haoliang Zhou: School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China; Key Laboratory of Behavior Sciences, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
- Shucheng Huang: School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
- Jingting Li: Key Laboratory of Behavior Sciences, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of the Chinese Academy of Sciences, Beijing 100049, China
- Su-Jing Wang: Key Laboratory of Behavior Sciences, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of the Chinese Academy of Sciences, Beijing 100049, China
5. Li J, Dong Z, Lu S, Wang SJ, Yan WJ, Ma Y, Liu Y, Huang C, Fu X. CAS(ME)³: A Third Generation Facial Spontaneous Micro-Expression Database With Depth Information and High Ecological Validity. IEEE Trans Pattern Anal Mach Intell 2023; 45:2782-2800. [PMID: 35560102] [DOI: 10.1109/tpami.2022.3174895]
Abstract
Micro-expression (ME) is a significant non-verbal communication clue that reveals a person's genuine emotional state. The development of micro-expression analysis (MEA) has gained attention only in the last decade. However, the small sample size problem constrains the use of deep learning for MEA. Besides, ME samples are distributed across six different databases, leading to database bias. Moreover, developing an ME database is complicated. In this article, we introduce a large-scale spontaneous ME database: CAS(ME)³. The contributions of this article are summarized as follows: (1) CAS(ME)³ offers around 80 hours of video with over 8,000,000 frames, including 1,109 manually labeled MEs and 3,490 macro-expressions. Such a large sample size allows effective validation of MEA methods while avoiding database bias. (2) Inspired by psychological experiments, CAS(ME)³ provides depth information as an additional modality, a first among ME databases, contributing to multi-modal MEA. (3) For the first time, CAS(ME)³ elicits MEs with high ecological validity using the mock crime paradigm, along with physiological and voice signals, contributing to practical MEA. (4) CAS(ME)³ also provides 1,508 unlabeled videos with more than 4,000,000 frames, i.e., a data platform for unsupervised MEA methods. (5) Finally, we demonstrate the effectiveness of depth information through the proposed depth flow algorithm and RGB-D information.
6. Temporal augmented contrastive learning for micro-expression recognition. Pattern Recognit Lett 2023. [DOI: 10.1016/j.patrec.2023.02.003]
7. A Survey of Micro-expression Recognition Methods Based on LBP, Optical Flow and Deep Learning. Neural Process Lett 2023. [DOI: 10.1007/s11063-022-11123-x]
8. Zeng X, Zhao X, Wang S, Qin J, Xie J, Zhong X, Chen J, Liu G. Affection of facial artifacts caused by micro-expressions on electroencephalography signals. Front Neurosci 2022; 16:1048199. [PMID: 36507351] [PMCID: PMC9729706] [DOI: 10.3389/fnins.2022.1048199]
Abstract
Macro-expressions are widely used in emotion recognition based on electroencephalography (EEG) because they are intuitive external expressions. Similarly, micro-expressions, as suppressed and brief emotional expressions, can also reflect a person's genuine emotional state. Researchers have therefore started to focus on emotion recognition studies based on micro-expressions and EEG. However, compared with the effect of artifacts generated by macro-expressions on the EEG signal, it is not clear how artifacts generated by micro-expressions affect EEG signals. In this study, we investigated the effects of facial muscle activity caused by micro-expressions in positive emotions on EEG signals. We recorded participants' facial expression images and EEG signals while they watched positive emotion-inducing videos. We then divided the face into 13 regions, extracted main directional mean optical flow features as facial micro-expression image features, and used the power spectral densities of the theta, alpha, beta, and gamma frequency bands as EEG features. Multiple linear regression and Granger causality test analyses were used to determine the extent to which facial muscle activity artifacts affect EEG signals. The results showed that the average percentage of EEG signals affected by muscle artifacts caused by micro-expressions was 11.5%, with the frontal and temporal regions significantly affected. After removing the artifacts from the EEG signal, the average percentage of affected EEG signal dropped to 3.7%. To the best of our knowledge, this is the first study to investigate the effect of facial artifacts caused by micro-expressions on EEG signals.
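The analysis pipeline, band power spectral densities from EEG obtained with Welch's method and regressed on region-wise optical-flow features, might look like the following sketch on synthetic data; the sampling rate, band edges, and feature counts are assumptions, not the study's parameters.

```python
# Hypothetical sketch: per-epoch EEG band power regressed on per-epoch
# facial optical-flow features with multiple linear regression.
import numpy as np
from scipy.signal import welch
from sklearn.linear_model import LinearRegression

fs = 256                                  # assumed EEG sampling rate (Hz)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((60, fs * 2))   # 60 epochs x 2 s of one channel
flow = rng.standard_normal((60, 13))      # per-epoch mean flow, 13 face regions

bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}
freqs, psd = welch(eeg, fs=fs, nperseg=fs)         # psd: (60, len(freqs))

for name, (lo, hi) in bands.items():
    power = psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
    r2 = LinearRegression().fit(flow, power).score(flow, power)
    print(f"{name}: R^2 of flow features on band power = {r2:.3f}")
```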
Affiliation(s)
- Xiaomei Zeng: School of Electronics and Information Engineering, Southwest University, Chongqing, China; Institute of Affective Computing and Information Processing, Southwest University, Chongqing, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing, China
- Xingcong Zhao: School of Electronics and Information Engineering, Southwest University, Chongqing, China; Institute of Affective Computing and Information Processing, Southwest University, Chongqing, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing, China
- Shiyuan Wang: School of Electronics and Information Engineering, Southwest University, Chongqing, China; Institute of Affective Computing and Information Processing, Southwest University, Chongqing, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing, China
- Jian Qin: School of Electronics and Information Engineering, Southwest University, Chongqing, China; Institute of Affective Computing and Information Processing, Southwest University, Chongqing, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing, China
- Jialan Xie: School of Electronics and Information Engineering, Southwest University, Chongqing, China; Institute of Affective Computing and Information Processing, Southwest University, Chongqing, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing, China
- Xinyue Zhong: School of Electronics and Information Engineering, Southwest University, Chongqing, China; Institute of Affective Computing and Information Processing, Southwest University, Chongqing, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing, China
- Jiejia Chen: School of Electronics and Information Engineering, Southwest University, Chongqing, China; Institute of Affective Computing and Information Processing, Southwest University, Chongqing, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing, China
- Guangyuan Liu (corresponding author): School of Electronics and Information Engineering, Southwest University, Chongqing, China; Institute of Affective Computing and Information Processing, Southwest University, Chongqing, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing, China
9. Xie T, Sun G, Sun H, Lin Q, Ben X. Decoupling facial motion features and identity features for micro-expression recognition. PeerJ Comput Sci 2022; 8:e1140. [PMID: 36426264] [PMCID: PMC9680898] [DOI: 10.7717/peerj-cs.1140]
Abstract
BACKGROUND: Micro-expression is a kind of expression produced spontaneously and unconsciously when a person receives a stimulus. It has the characteristics of low intensity and short duration, and it cannot be controlled or disguised; thus, micro-expressions can objectively reflect people's real emotional states. Automatic recognition of micro-expressions can therefore help machines better understand users' emotions, which can promote human-computer interaction. Moreover, micro-expression recognition has a wide range of applications in fields like security systems and psychological treatment. Thanks to the development of artificial intelligence, most micro-expression recognition algorithms are now based on deep learning. The features a deep learning model extracts from micro-expression video sequences mainly contain facial motion feature information and identity feature information. However, in micro-expression recognition tasks, the motions of facial muscles are subtle, so recognition is easily interfered with by identity feature information. METHODS: To solve the above problem, this paper proposes a micro-expression recognition algorithm that decouples facial motion features and identity features. A Micro-Expression Motion Information Features Extraction Network (MENet) and an Identity Information Features Extraction Network (IDNet) are designed. By adding a Diverse Attention Operation (DAO) module and constructing a divergence loss function in MENet, facial motion features can be effectively extracted. Global attention operations are used in IDNet to extract identity features. A Mutual Information Neural Estimator (MINE) is utilized to decouple facial motion features and identity features, which helps the model obtain more discriminative micro-expression features. RESULTS: Experiments conducted on the SDU, MMEW, SAMM, and CASME II datasets achieved competitive results and demonstrated the superiority of the proposed algorithm.
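A compact sketch of a Mutual Information Neural Estimator of the kind MINE denotes, using the Donsker-Varadhan lower bound to score the dependence between motion and identity features; the statistics network and feature sizes are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical MINE sketch: a statistics network T scores joint vs.
# shuffled (marginal) pairs; the gap lower-bounds mutual information.
import torch
import torch.nn as nn

class MINE(nn.Module):
    def __init__(self, dim_a, dim_b, hidden=128):
        super().__init__()
        self.T = nn.Sequential(nn.Linear(dim_a + dim_b, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def mi_lower_bound(self, a, b):
        # Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)].
        joint = self.T(torch.cat([a, b], dim=1)).mean()
        b_perm = b[torch.randperm(b.size(0))]          # break pairing
        marg = self.T(torch.cat([a, b_perm], dim=1)).squeeze(1)
        log_mean_exp = torch.logsumexp(marg, dim=0) - torch.log(
            torch.tensor(float(marg.numel())))
        return joint - log_mean_exp

motion = torch.randn(32, 64)     # placeholder motion features
identity = torch.randn(32, 64)   # placeholder identity features
mi = MINE(64, 64).mi_lower_bound(motion, identity)
# In training, the encoders would minimize `mi` while T maximizes it.
print(float(mi))
```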
10. Micro-expression recognition with supervised contrastive learning. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.09.006]
11. Fan X, Shahid AR, Yan H. Edge-aware motion based facial micro-expression generation with attention mechanism. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.09.010]
12. Liu Y, Li Y, Yi X, Hu Z, Zhang H, Liu Y. Lightweight ViT Model for Micro-Expression Recognition Enhanced by Transfer Learning. Front Neurorobot 2022; 16:922761. [PMID: 35845761] [PMCID: PMC9280988] [DOI: 10.3389/fnbot.2022.922761]
Abstract
As opposed to macro-expressions, micro-expressions are subtle and not easily detectable emotional expressions, often containing rich information about mental activities. Practical recognition of micro-expressions is essential in interrogation and healthcare. Neural networks are currently one of the most common approaches to micro-expression recognition, but they often become more complex as accuracy improves, and overly large networks place extremely high hardware demands on the equipment that runs them. In recent years, vision transformers based on self-attention mechanisms have achieved accuracy in image recognition and classification that is no less than that of neural networks; the drawback is that, without the image-specific inductive biases inherent to convolutional networks, the cost of improving accuracy is a substantial increase in the number of parameters. This paper describes training a facial expression feature extractor by transfer learning and then fine-tuning and optimizing a MobileViT model to perform the micro-expression recognition task. First, the CASME II, SAMM, and SMIC datasets are combined into a compound dataset, and macro-expression samples are extracted from three macro-expression datasets. Each macro-expression sample and micro-expression sample is pre-processed identically to make them similar. Second, the macro-expression samples are used to train the MobileNetV2 block in MobileViT as a facial expression feature extractor, saving the weights when accuracy is highest. Finally, some of the hyperparameters of the MobileViT model are determined by grid search, the micro-expression samples are fed in for training, and the samples are classified using an SVM classifier. In the experiments, the proposed method obtained an accuracy of 84.27%, and the time to process an individual sample was only 35.4 ms. Comparative experiments show that the proposed method is comparable to state-of-the-art methods in terms of accuracy while improving recognition efficiency.
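The overall recipe, a pretrained CNN block as facial-feature extractor followed by an SVM on pooled features, can be sketched as below; MobileNetV2 stands in for the fine-tuned MobileViT, and the data, shapes, and labels are dummy assumptions rather than the paper's pipeline.

```python
# Hypothetical sketch: CNN feature extraction + SVM classification.
import torch
from torchvision.models import mobilenet_v2
from sklearn.svm import SVC

extractor = mobilenet_v2(weights=None).features.eval()  # pretrained in practice
with torch.no_grad():
    imgs = torch.randn(40, 3, 224, 224)                 # dummy ME apex frames
    feats = extractor(imgs).mean(dim=(2, 3)).numpy()    # (40, 1280) pooled

labels = ([0] * 20) + ([1] * 20)                        # dummy binary labels
clf = SVC(kernel="rbf").fit(feats[:30], labels[:30])
print("val acc:", clf.score(feats[30:], labels[30:]))
```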
Affiliation(s)
- Yanju Liu: School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing, China
- Yange Li: School of Computer and Control Engineering, Qiqihar University, Qiqihar, China
- Xinhai Yi: School of Computer and Control Engineering, Qiqihar University, Qiqihar, China
- Zuojin Hu: School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing, China
- Huiyu Zhang: School of Computer and Control Engineering, Qiqihar University, Qiqihar, China
- Yanzhong Liu (corresponding author): School of Computer and Control Engineering, Qiqihar University, Qiqihar, China
13. Saffaryazdi N, Wasim ST, Dileep K, Nia AF, Nanayakkara S, Broadbent E, Billinghurst M. Using Facial Micro-Expressions in Combination With EEG and Physiological Signals for Emotion Recognition. Front Psychol 2022; 13:864047. [PMID: 35837650] [PMCID: PMC9275379] [DOI: 10.3389/fpsyg.2022.864047]
Abstract
Emotions are multimodal processes that play a crucial role in our everyday lives. Recognizing emotions is becoming more critical in a wide range of application domains such as healthcare, education, human-computer interaction, Virtual Reality, intelligent agents, entertainment, and more. Facial macro-expressions or intense facial expressions are the most common modalities in recognizing emotional states. However, since facial expressions can be voluntarily controlled, they may not accurately represent emotional states. Earlier studies have shown that facial micro-expressions are more reliable than facial macro-expressions for revealing emotions. They are subtle, involuntary movements responding to external stimuli that cannot be controlled. This paper proposes using facial micro-expressions combined with brain and physiological signals to more reliably detect underlying emotions. We describe our models for measuring arousal and valence levels from a combination of facial micro-expressions, Electroencephalography (EEG) signals, galvanic skin responses (GSR), and Photoplethysmography (PPG) signals. We then evaluate our model using the DEAP dataset and our own dataset based on a subject-independent approach. Lastly, we discuss our results, the limitations of our work, and how these limitations could be overcome. We also discuss future directions for using facial micro-expressions and physiological signals in emotion recognition.
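A toy sketch of feature-level fusion across the four modalities for a valence classifier, with random placeholders standing in for the real feature extractors; the feature dimensions and classifier choice are assumptions, not the authors' models.

```python
# Hypothetical multimodal fusion: concatenate per-trial features from
# face, EEG, GSR, and PPG, then classify low/high valence.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120
face = rng.standard_normal((n, 26))  # e.g., optical-flow ME features
eeg = rng.standard_normal((n, 32))   # e.g., band powers per channel
gsr = rng.standard_normal((n, 4))    # e.g., tonic/phasic statistics
ppg = rng.standard_normal((n, 3))    # e.g., heart-rate statistics
y = rng.integers(0, 2, n)            # low/high valence labels

fused = np.hstack([face, eeg, gsr, ppg])
print(cross_val_score(LogisticRegression(max_iter=1000), fused, y, cv=5).mean())
```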
Affiliation(s)
- Nastaran Saffaryazdi: Empathic Computing Laboratory, Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
- Syed Talal Wasim: Empathic Computing Laboratory, Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
- Kuldeep Dileep: Empathic Computing Laboratory, Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
- Alireza Farrokhi Nia: Empathic Computing Laboratory, Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
- Suranga Nanayakkara: Augmented Human Laboratory, Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
- Elizabeth Broadbent: Department of Psychological Medicine, The University of Auckland, Auckland, New Zealand
- Mark Billinghurst: Empathic Computing Laboratory, Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
14. Zhao S, Tang H, Liu S, Zhang Y, Wang H, Xu T, Chen E, Guan C. ME-PLAN: A deep prototypical learning with local attention network for dynamic micro-expression recognition. Neural Netw 2022; 153:427-443. [DOI: 10.1016/j.neunet.2022.06.024]
15. The design of error-correcting output codes based deep forest for the micro-expression recognition. Appl Intell 2022. [DOI: 10.1007/s10489-022-03590-5]
16. Facial Micro-Expression Recognition Based on Deep Local-Holistic Network. Appl Sci (Basel) 2022. [DOI: 10.3390/app12094643]
Abstract
A micro-expression is a subtle, local and brief facial movement. It can reveal genuine emotions that a person tries to conceal and is considered an important clue for lie detection. Micro-expression research has attracted much attention due to its promising applications in various fields. However, because of the short duration and low intensity of micro-expression movements, micro-expression recognition faces great challenges, and accuracy still leaves room for improvement. To improve the efficiency of micro-expression feature extraction, and inspired by psychological studies of attentional resource allocation in micro-expression cognition, we propose a deep local-holistic network method for micro-expression recognition. The proposed algorithm consists of two sub-networks. The first is a Hierarchical Convolutional Recurrent Neural Network (HCRNN), which extracts local and abundant spatio-temporal micro-expression features. The second is a robust principal-component-analysis-based recurrent neural network (RPRNN), which extracts global and sparse features with micro-expression-specific representations. The effective features extracted by the two sub-networks are fused for micro-expression recognition. We evaluate the proposed method on a combination of the four most commonly used databases, i.e., CASME, CASME II, CAS(ME)², and SAMM. The experimental results show that our method achieves reasonably good performance.
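Schematically, two sub-networks whose outputs are fused for classification, one convolutional-recurrent branch for local spatio-temporal features and a second recurrent branch for holistic features, might be wired as follows; all sizes are illustrative assumptions rather than the HCRNN/RPRNN specifics.

```python
# Hypothetical local-holistic fusion: CNN+GRU branch per frame plus a
# holistic GRU over raw frames, concatenated before the classifier.
import torch
import torch.nn as nn

class LocalHolistic(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.local_rnn = nn.GRU(16 * 16, 64, batch_first=True)     # local branch
        self.holistic_rnn = nn.GRU(48 * 48, 64, batch_first=True)  # holistic branch
        self.head = nn.Linear(128, num_classes)

    def forward(self, clip):                    # clip: (B, T, 1, 48, 48)
        b, t = clip.shape[:2]
        f = self.cnn(clip.flatten(0, 1)).view(b, t, -1)  # per-frame features
        _, h_local = self.local_rnn(f)
        _, h_hol = self.holistic_rnn(clip.view(b, t, -1))
        return self.head(torch.cat([h_local[-1], h_hol[-1]], dim=1))

print(LocalHolistic()(torch.randn(2, 10, 1, 48, 48)).shape)  # torch.Size([2, 4])
```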
17. Facial Kinship Verification: A Comprehensive Review and Outlook. Int J Comput Vis 2022; 130:1494-1525. [PMID: 35465628] [PMCID: PMC9016696] [DOI: 10.1007/s11263-022-01605-9]
Abstract
The goal of Facial Kinship Verification (FKV) is to automatically determine whether two individuals have a kin relationship or not from their given facial images or videos. It is an emerging and challenging problem that has attracted increasing attention due to its practical applications. Over the past decade, significant progress has been achieved in this new field. Handcrafted features and deep learning techniques have been widely studied in FKV. The goal of this paper is to conduct a comprehensive review of the problem of FKV. We cover different aspects of the research, including problem definition, challenges, applications, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance. In retrospect of what has been achieved so far, we identify gaps in current research and discuss potential future research directions.
18. Deep learning-based microexpression recognition: a survey. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07157-w]
19. Sun MX, Liong ST, Liu KH, Wu QQ. The heterogeneous ensemble of deep forest and deep neural networks for micro-expressions recognition. Appl Intell 2022. [DOI: 10.1007/s10489-022-03284-y]
20. Motion magnification multi-feature relation network for facial microexpression recognition. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00680-2]
Abstract
Microexpressions cannot be observed easily due to their short duration and small expression range, properties that pose considerable challenges for microexpression recognition. Video motion magnification techniques help us see small motions previously invisible to the naked eye. This study aimed to enhance microexpression features with the help of motion amplification technology, and proposes a motion magnification multi-feature relation network (MMFRN) combining video motion amplification with two feature relation modules. The spatial feature is enlarged while spatial feature extraction is completed, and is used for classification. In addition, we transfer a ResNet50 network to extract global features and improve feature comprehensiveness. The magnification of the features is controlled through the amplification factor hyperparameter α. The effects of different magnification factors on the results are compared, and the best is selected. Experiments verify that the enlarged network can resolve the misclassification problem caused by the one-to-one correspondence between microexpressions and facial action coding units. On the CASME II dataset, MMFRN outperforms traditional recognition methods and other neural networks.
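A bare-bones, Eulerian-style illustration of the amplification-factor idea: deviations of each frame from a temporal low-pass reference are scaled by α. This is a generic sketch under simplifying assumptions, not the MMFRN magnification pipeline.

```python
# Hypothetical motion magnification: amplify per-frame deviation from a
# crude temporal low-pass reference by a factor alpha.
import numpy as np

def magnify(frames, alpha=4.0):
    """frames: (T, H, W) float array in [0, 1]; returns magnified frames."""
    reference = frames.mean(axis=0)  # temporal low-pass (crude stand-in)
    return np.clip(reference + alpha * (frames - reference), 0.0, 1.0)

clip = np.random.rand(8, 64, 64)     # dummy grayscale clip
print(magnify(clip, alpha=4.0).shape)  # (8, 64, 64)
```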
21. Zhao Y, Chen Z, Luo S. Micro-Expression Recognition Based on Pixel Residual Sum and Cropped Gaussian Pyramid. Front Neurorobot 2022; 15:746985. [PMID: 34987370] [PMCID: PMC8721811] [DOI: 10.3389/fnbot.2021.746985]
Abstract
Facial micro-expression (ME) recognition has great significance for the progress of human society, as it can uncover a person's true feelings. At the same time, ME recognition faces a huge challenge, since MEs are difficult to detect and easily disturbed by the environment. In this article, we propose two novel preprocessing methods based on Pixel Residual Sum. These methods preprocess video clips according to the unit pixel displacement of images, resist environmental interference, and make it easy to extract subtle facial features. Furthermore, we propose a Cropped Gaussian Pyramid with Overlapping (CGPO) module, which produces images of different resolutions through Gaussian pyramids and crops each resolution into multiple overlapping subplots. We then use a convolutional neural network of progressively increasing channels, based on depthwise convolution, to extract preliminary features. Finally, we fuse the preliminary features and apply position embeddings to obtain the final features. Our experiments show that the proposed methods and model outperform well-known methods.
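The CGPO idea of pyramid levels cut into overlapping subplots can be sketched as follows; the crop size and stride (which set the overlap) are assumptions, and the paper's exact scheme may differ.

```python
# Hypothetical cropped Gaussian pyramid: build pyramid levels with
# cv2.pyrDown, then cut each level into overlapping crops.
import numpy as np
import cv2

def cropped_gaussian_pyramid(img, levels=3, crop=24, stride=16):
    crops = []
    level = img
    for _ in range(levels):
        h, w = level.shape[:2]
        for y in range(0, h - crop + 1, stride):   # stride < crop => overlap
            for x in range(0, w - crop + 1, stride):
                crops.append(level[y:y + crop, x:x + crop])
        level = cv2.pyrDown(level)                 # next (blurred, halved) level
    return crops

img = np.random.rand(96, 96).astype(np.float32)
print(len(cropped_gaussian_pyramid(img)))          # number of overlapping subplots
```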
Affiliation(s)
- Yuan Zhao: School of Computer Science and Engineering, Chongqing University of Technology, Chongqing, China
- Zhuang Chen: School of Computer Science and Engineering, Chongqing University of Technology, Chongqing, China
- Song Luo: School of Computer Science and Engineering, Chongqing University of Technology, Chongqing, China
22. Liu J. Convolutional Neural Network-Based Human Movement Recognition Algorithm in Sports Analysis. Front Psychol 2021; 12:663359. [PMID: 34248758] [PMCID: PMC8267374] [DOI: 10.3389/fpsyg.2021.663359]
Abstract
In order to analyse the sports psychology of athletes and identify their psychology from their movements, a human action recognition (HAR) algorithm is designed in this study. First, a HAR model is established based on a convolutional neural network (CNN) to classify the current action state by analysing the action information in the collected videos. Second, the psychology of basketball players displaying fake actions during the offensive and defensive process is investigated in combination with related sports psychology theories. The psychology of athletes is then analysed from the collected videos to predict their next response action. Experimental results show that combining grayscale and red-green-blue (RGB) images reduces image loss and effectively improves the recognition accuracy of the model. The optimised convolutional three-dimensional network (C3D) HAR model designed in this study achieves a recognition accuracy of 80% with an image loss of 5.6, while time complexity is reduced by 33%. The proposed optimised C3D can therefore recognise human actions effectively, and the results provide a reference for image-based recognition of human action in sports.
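For reference, a minimal C3D-style block, 3D convolutions over a video clip followed by a classification head, shows how spatio-temporal action features are learned; channel counts, depth, and input size are assumptions rather than the optimised model.

```python
# Hypothetical C3D-style sketch: 3D convolutions over (frames, H, W).
import torch
import torch.nn as nn

c3d = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool3d((1, 2, 2)),                 # pool space, keep time early on
    nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool3d(2),                         # pool space and time
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(64, 10))                       # 10 hypothetical action classes

clip = torch.randn(2, 3, 16, 112, 112)       # (batch, rgb, frames, H, W)
print(c3d(clip).shape)                       # torch.Size([2, 10])
```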
Affiliation(s)
- Jiatian Liu: College of Strength and Conditioning, Beijing Sport University, Beijing, China