1
Tang H, Chai L. Facial micro-expression recognition using stochastic graph convolutional network and dual transferred learning. Neural Netw 2024; 178:106421. [PMID: 38850638] [DOI: 10.1016/j.neunet.2024.106421]
Abstract
Micro-expression recognition (MER) has drawn increasing attention due to its wide applications in lie detection, criminal investigation and psychological consultation. However, the best recognition accuracy on recent public datasets is still low compared with that of macro-expression recognition. In this paper, we propose a novel graph convolutional network (GCN) for MER that achieves state-of-the-art accuracy. Unlike existing GCNs with fixed graph structures, we define a stochastic graph structure in which some neighbors are selected randomly. As shown by numerical examples, this randomness enables better feature characterization while reducing computational complexity. The whole network consists of two branches: a spatial branch that takes micro-expression images as input, and a temporal branch that takes optical-flow images as input. Because micro-expression datasets do not contain enough images to train the GCN, we employ a transfer learning mechanism: different stochastic GCNs (SGCNs) are first trained on a macro-expression dataset in the source network, and the well-trained SGCNs are then transferred to the target network. The proposed method achieves state-of-the-art performance on all four well-known datasets. This paper explores stochastic GCNs and transfer learning with this random structure for the MER task, which is of great importance for improving recognition performance.
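As an illustration of the random-neighbor idea described above, here is a minimal PyTorch sketch of a graph-convolution layer that samples a random subset of edges at each training pass. The class name, the keep_prob parameter and the normalization choice are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StochasticGCNLayer(nn.Module):
    """Graph convolution that aggregates over a random subset of neighbors.

    A sketch of the general idea only; the paper's exact layer is not
    reproduced here.
    """
    def __init__(self, in_dim, out_dim, keep_prob=0.5):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.keep_prob = keep_prob

    def forward(self, x, adj):
        # x: [N, in_dim] node features; adj: [N, N] float binary adjacency.
        if self.training:
            # Randomly drop edges so each node sees a stochastic neighborhood.
            mask = (torch.rand_like(adj) < self.keep_prob).float()
            adj = adj * mask
        # Keep self-loops and row-normalize the sampled adjacency.
        adj = adj + torch.eye(adj.size(0), device=adj.device)
        adj = adj / adj.sum(dim=1, keepdim=True)
        return torch.relu(self.linear(adj @ x))
```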
Affiliation(s)
- Hui Tang
- School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, 430081, Hubei, China
- Li Chai
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China.
2
Ahmad A, Li Z, Iqbal S, Aurangzeb M, Tariq I, Flah A, Blazek V, Prokop L. A comprehensive bibliometric survey of micro-expression recognition system based on deep learning. Heliyon 2024; 10:e27392. [PMID: 38495163] [PMCID: PMC10943397] [DOI: 10.1016/j.heliyon.2024.e27392]
Abstract
Micro-expressions (MEs) are rapidly occurring expressions that reveal the true emotions a person is trying to hide, cover, or suppress. Because they expose a person's actual feelings, they have a broad spectrum of applications in public safety and clinical diagnosis. This study provides a comprehensive review of the area of ME recognition. Bibliometric and network analysis techniques are used to compile the available literature on ME recognition. A total of 735 publications from the Web of Science (WOS) and Scopus databases, published between December 2012 and December 2022, were evaluated using all relevant keywords. The first round of data screening produced basic information, which was further extracted for citation, coupling, co-authorship, co-occurrence, bibliographic, and co-citation analysis. Additionally, a thematic and descriptive analysis was executed to investigate the content and research techniques of prior work. Year-wise counts indicate that output between 2012 and 2017 was relatively low, but by 2021 a nearly 24-fold increase had brought the annual total to 154 publications. The three most productive journals and conferences were IEEE Transactions on Affective Computing (n = 20 publications), followed by Neurocomputing (n = 17) and Multimedia Tools and Applications (n = 15). Zhao G was the most prolific author with 48 publications, and the most influential country was China (620 publications). Citation counts per author ranged from 100 to 1225, and by organization the University of Oulu had the most published papers (n = 51). Deep learning, facial expression recognition, and emotion recognition were among the most frequently used terms. ME research has been classified primarily within the engineering discipline, with the largest contributions coming from China and Malaysia.
Affiliation(s)
- Adnan Ahmad
- Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
- Zhao Li
- Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
- Sheeraz Iqbal
- Department of Electrical Engineering, University of Azad Jammu and Kashmir, Muzaffarabad, 13100, AJK, Pakistan
- Muhammad Aurangzeb
- School of Electrical Engineering, Southeast University, Nanjing, 210096, China
- Irfan Tariq
- Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
- Ayman Flah
- College of Engineering, University of Business and Technology (UBT), Jeddah, 21448, Saudi Arabia
- MEU Research Unit, Middle East University, Amman, Jordan
- The Private Higher School of Applied Sciences and Technology of Gabes, University of Gabes, Gabes, Tunisia
- National Engineering School of Gabes, University of Gabes, Gabes, 6029, Tunisia
- Vojtech Blazek
- ENET Centre, VSB—Technical University of Ostrava, Ostrava, Czech Republic
- Lukas Prokop
- ENET Centre, VSB—Technical University of Ostrava, Ostrava, Czech Republic
3
Li X, Yi X, Lu L, Wang H, Zheng Y, Han M, Wang Q. TSFFM: Depression detection based on latent association of facial and body expressions. Comput Biol Med 2024; 168:107805. [PMID: 38064845] [DOI: 10.1016/j.compbiomed.2023.107805]
Abstract
Depression is a prevalent mental disorder worldwide, and early screening and treatment are crucial in preventing progression of the illness. Existing emotion-based depression recognition methods rely primarily on facial expressions, while body expressions as a means of emotional expression have been overlooked. To aid the identification of depression, we recruited 156 participants for an emotional stimulation experiment, gathering data on facial and body expressions. Our analysis revealed notable distinctions in facial and body expressions between the case and control groups, as well as a synergistic relationship between the two. Hence, we propose a two-stream feature fusion model (TSFFM) that integrates facial and body features. The central component of TSFFM is the Fusion and Extraction (FE) module. In contrast to conventional approaches such as feature concatenation and decision fusion, FE places greater emphasis on in-depth analysis during feature extraction and fusion. First, FE locally enhances facial and body features with an embedded attention mechanism, eliminating the need to segment the original images or use multiple feature extractors. Second, FE extracts temporal features to better capture the dynamics of expression patterns. Finally, we retain and fuse informative data from different temporal and spatial features to support the final decision. TSFFM achieves an accuracy and F1-score of 0.896 and 0.896, respectively, on the depression emotional-stimulus dataset. On the AVEC2014 dataset, TSFFM achieves MAE and RMSE values of 5.749 and 7.909, respectively. TSFFM has also been tested on additional public datasets to demonstrate the effectiveness of the FE module.
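The abstract describes attention-weighted fusion of a facial and a body stream; the sketch below shows that general pattern under our own assumptions. The class name, projection dimensions and softmax stream-weighting are illustrative, and the paper's temporal modeling inside FE is omitted.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Fuse facial and body feature vectors with learned attention weights."""
    def __init__(self, face_dim, body_dim, fused_dim, num_classes):
        super().__init__()
        self.face_proj = nn.Linear(face_dim, fused_dim)
        self.body_proj = nn.Linear(body_dim, fused_dim)
        # Attention head that outputs one weight per stream.
        self.attn = nn.Sequential(nn.Linear(2 * fused_dim, 2), nn.Softmax(dim=-1))
        self.head = nn.Linear(fused_dim, num_classes)

    def forward(self, face_feat, body_feat):
        f = torch.relu(self.face_proj(face_feat))   # [B, D]
        b = torch.relu(self.body_proj(body_feat))   # [B, D]
        w = self.attn(torch.cat([f, b], dim=-1))    # [B, 2] stream weights
        fused = w[:, :1] * f + w[:, 1:] * b         # weighted sum of streams
        return self.head(fused)
```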
Affiliation(s)
- Xingyun Li
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Xinyu Yi
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Lin Lu
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Hao Wang
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Yunshao Zheng
- Shandong Mental Health Center, Shandong University, Jinan, China
- Mengmeng Han
- Advanced Technology Research Institute, Beijing Institute of Technology, Jinan, China
- Qingxiang Wang
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Mental Health Center, Shandong University, Jinan, China; Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China.
4
Tang X. Application of Intelligent Lie Recognition Technology in Laws and Regulations Based on Occupational Mental Health Protection. Psychol Res Behav Manag 2023; 16:2943-2959. [PMID: 37554305] [PMCID: PMC10404594] [DOI: 10.2147/prbm.s409723]
Abstract
INTRODUCTION: Since the reform and opening up, the economy has developed rapidly. Competition in the employer market is fierce, leading employers to impose strict requirements on workers and increasing workplace stress. The blind pursuit of corporate economic benefits has led to the neglect of workers' mental health, and employee retaliation against employers occurs frequently. Perfecting the legal system for occupational mental health protection is therefore urgent.
METHODS: The introduction first presents the research background, significance, and purpose. The literature review then surveys the current state of research, summarizes problems in existing work, and highlights the innovations of this study. The methods section introduces the algorithms and models used, including convolutional neural networks, long short-term memory networks, and the design of the interview process. Finally, the results of the questionnaire survey and the experimental tests are analyzed.
RESULTS: (1) There is further room to optimize intelligent lie recognition technology. (2) An employee assistance program system can effectively address employees' mental health problems. (3) The legislative mechanism for protecting workers' mental health needs to be expanded at the legal level.
DISCUSSION: This study mainly explores gaps in occupational mental health protection in the formulation of laws and regulations. Intelligent lie recognition technology reduces the risks to workers' physical and mental health arising from work, and the study is dedicated to protecting workers' legitimate rights and interests through the formulation of laws and regulations.
Affiliation(s)
- Xin Tang
- School of Law, Chongqing University, Chongqing, 400044, People’s Republic of China
5
Zheng Y, Blasch E. Facial Micro-Expression Recognition Enhanced by Score Fusion and a Hybrid Model from Convolutional LSTM and Vision Transformer. Sensors (Basel) 2023; 23:5650. [PMID: 37420815] [PMCID: PMC10303532] [DOI: 10.3390/s23125650]
Abstract
In the billions of faces shaped by thousands of cultures and ethnicities, one thing remains universal: the way emotions are expressed. To take the next step in human-machine interaction, a machine (e.g., a humanoid robot) must be able to classify facial emotions. Allowing systems to recognize micro-expressions gives a machine a deeper view into a person's true feelings, letting it take human emotion into account while making optimal decisions; for instance, such machines could detect dangerous situations, alert caregivers to challenges, and provide appropriate responses. Micro-expressions are involuntary and transient facial expressions capable of revealing genuine emotions. We propose a new hybrid neural network (NN) model capable of micro-expression recognition in real-time applications. Several NN models are first compared, and a hybrid NN model is then created by combining a convolutional neural network (CNN), a recurrent neural network (RNN, e.g., long short-term memory (LSTM)), and a vision transformer. The CNN extracts spatial features (within a neighborhood of an image), whereas the LSTM summarizes temporal features; a transformer with an attention mechanism can additionally capture sparse spatial relations within an image or between frames in a video clip. The inputs of the model are short facial videos, and the outputs are the micro-expressions recognized from the videos. The NN models are trained and tested on publicly available facial micro-expression datasets to recognize different micro-expressions (e.g., happiness, fear, anger, surprise, disgust, sadness). Score fusion and improvement metrics are also presented in our experiments. Compared with literature-reported methods tested on the same datasets, the proposed hybrid model performs best, and score fusion can dramatically increase recognition performance.
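For intuition, here is a generic CNN-plus-LSTM sketch with simple score fusion across trained models. It illustrates the spatial/temporal split and the score-averaging idea mentioned in the abstract, not the paper's exact architecture: the ViT branch, layer sizes and fusion weights below are assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Per-frame CNN features summarized over time by an LSTM (sketch)."""
    def __init__(self, num_classes, feat_dim=128, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clips):            # clips: [B, T, 3, H, W]
        B, T = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(B, T, -1)
        _, (h, _) = self.lstm(feats)     # h: [num_layers, B, hidden]
        return self.fc(h[-1])

def fuse_scores(models, clips):
    """Score fusion: average class probabilities from several models."""
    probs = [torch.softmax(m(clips), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)
```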
Affiliation(s)
- Yufeng Zheng
- Department of Data Science, University of Mississippi Medical Center, Jackson, MS 39216, USA
6
Li Z, Zhang Y, Xing H, Chan KL. Facial Micro-Expression Recognition Using Double-Stream 3D Convolutional Neural Network with Domain Adaptation. Sensors (Basel) 2023; 23:3577. [PMID: 37050637] [PMCID: PMC10098639] [DOI: 10.3390/s23073577]
Abstract
Humans show micro-expressions (MEs) under some circumstances. MEs are a display of emotions that a person wants to conceal, and their recognition has been applied in various fields. However, automatic ME recognition remains challenging due to two major obstacles: as MEs are typically of short duration and low intensity, it is hard to extract discriminative features from ME videos; moreover, collecting ME data is tedious, and existing ME datasets usually contain insufficient video samples. In this paper, we propose a deep learning model, the double-stream 3D convolutional neural network (DS-3DCNN), for recognizing MEs captured on video. The recognition framework contains two 3D-CNN streams: the first extracts spatiotemporal features from the raw ME videos, while the second extracts variations of the facial motions within the spatiotemporal domain. To facilitate feature extraction, the subtle motion embedded in an ME is amplified. To address the scarcity of ME data, a macro-expression dataset is employed to expand the training sample size, and supervised domain adaptation is adopted during model training to bridge the difference between ME and macro-expression datasets. The DS-3DCNN model is evaluated on two publicly available ME datasets; the results show that it outperforms various state-of-the-art models, in particular beating the best model presented in MEGC2019 by more than 6%.
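A structural PyTorch sketch of a two-stream 3D CNN follows. Channel counts are placeholders, and the paper's motion-amplification preprocessing and domain-adaptation loss are omitted.

```python
import torch
import torch.nn as nn

class DoubleStream3DCNN(nn.Module):
    """Two 3D-CNN streams (appearance and motion) fused before classification."""
    def __init__(self, num_classes):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
                nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.appearance = stream(3)   # raw video frames
        self.motion = stream(2)       # e.g., optical-flow (dx, dy) volumes
        self.fc = nn.Linear(64, num_classes)

    def forward(self, frames, flow):  # [B,3,T,H,W], [B,2,T,H,W]
        feat = torch.cat([self.appearance(frames), self.motion(flow)], dim=1)
        return self.fc(feat)
```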
Affiliation(s)
- Zhengdao Li
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China; (Z.L.); (H.X.)
- Yupei Zhang
- Centre for Intelligent Multidimensional Data Analysis Limited, Hong Kong, China;
- Hanwen Xing
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China; (Z.L.); (H.X.)
- Kwok-Leung Chan
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China; (Z.L.); (H.X.)
7
Li J, Dong Z, Lu S, Wang SJ, Yan WJ, Ma Y, Liu Y, Huang C, Fu X. CAS(ME)³: A Third Generation Facial Spontaneous Micro-Expression Database With Depth Information and High Ecological Validity. IEEE Trans Pattern Anal Mach Intell 2023; 45:2782-2800. [PMID: 35560102] [DOI: 10.1109/tpami.2022.3174895]
Abstract
Micro-expression (ME) is a significant non-verbal communication clue that reveals a person's genuine emotional state. Micro-expression analysis (MEA) has gained attention only in the last decade, and the small sample size problem constrains the use of deep learning for MEA. Besides, ME samples are spread across six different databases, leading to database bias, and ME database development is complicated. In this article, we introduce a large-scale spontaneous ME database: CAS(ME)³. The contributions of this article are summarized as follows: (1) CAS(ME)³ offers around 80 hours of video with over 8,000,000 frames, including 1,109 manually labeled MEs and 3,490 macro-expressions; such a large sample size allows effective MEA method validation while avoiding database bias. (2) Inspired by psychological experiments, CAS(ME)³ provides depth information as an additional modality for the first time, contributing to multi-modal MEA. (3) CAS(ME)³ is the first to elicit MEs with high ecological validity using the mock crime paradigm, along with physiological and voice signals, contributing to practical MEA. (4) CAS(ME)³ also provides 1,508 unlabeled videos with more than 4,000,000 frames, i.e., a data platform for unsupervised MEA methods. (5) Finally, we demonstrate the effectiveness of the depth information through the proposed depth-flow algorithm and RGB-D information.
8
A Survey of Micro-expression Recognition Methods Based on LBP, Optical Flow and Deep Learning. Neural Process Lett 2023. [DOI: 10.1007/s11063-022-11123-x]
9
Kapadia S, Praladhka U, Unadkat U, Parekh V, Karani R. Video-Based Micro Expressions Recognition Using Deep Learning and Transfer Learning. Springer Proc Math Stat 2023: 221-231. [DOI: 10.1007/978-3-031-16178-0_16]
10
Alaskar H, Sbaï Z, Khan W, Hussain A, Alrawais A. Intelligent techniques for deception detection: a survey and critical study. Soft Comput 2022. [DOI: 10.1007/s00500-022-07603-w]
11
Verma M, Reddy MSK, Meedimale YR, Mandal M, Vipparthi SK. AutoMER: Spatiotemporal Neural Architecture Search for Microexpression Recognition. IEEE Trans Neural Netw Learn Syst 2022; 33:6116-6128. [PMID: 33886480] [DOI: 10.1109/tnnls.2021.3072290]
Abstract
Facial microexpressions offer useful insights into subtle human emotions, as this unpremeditated emotional leakage exhibits a person's true emotions. However, the minute temporal changes in video sequences are very difficult to model for accurate classification. In this article, we propose AutoMER, a novel spatiotemporal architecture search algorithm for microexpression recognition (MER). Our main contribution is a new parallelogram-design-based search space for efficient architecture search. We introduce a spatiotemporal feature module named 3D singleton convolution for cell-level analysis, and present four such candidate operators and two 3D dilated convolution operators to encode raw video sequences end-to-end. To the best of our knowledge, this is the first attempt to discover 3D convolutional neural network (CNN) architectures with a network-level search for MER. Models searched with AutoMER are evaluated on five microexpression datasets: CASME-I, SMIC, CASME-II, CAS(ME)², and SAMM, and quantitatively outperform existing state-of-the-art approaches. AutoMER is further validated under different configurations, such as the downsampling rate factor, multi-scale singleton 3D convolution, parallelogram design, and multi-scale kernels; overall, five ablation experiments were conducted to analyze its operational behavior.
12
Ben X, Ren Y, Zhang J, Wang SJ, Kpalma K, Meng W, Liu YJ. Video-Based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms. IEEE Trans Pattern Anal Mach Intell 2022; 44:5826-5846. [PMID: 33739920] [DOI: 10.1109/tpami.2021.3067464]
Abstract
Unlike conventional facial expressions, micro-expressions are involuntary and transient facial expressions capable of revealing the genuine emotions that people attempt to hide; they can therefore provide important information in a broad range of applications such as lie detection and criminal detection. Since micro-expressions are transient and of low intensity, however, their detection and recognition are difficult and rely heavily on expert experience. Due to its intrinsic particularity and complexity, video-based micro-expression analysis is attractive but challenging, and has recently become an active area of research. Although there have been numerous developments in this area, no comprehensive survey has thus far provided researchers with a systematic overview with a unified evaluation. Accordingly, in this survey we first highlight the key differences between macro- and micro-expressions, then use these differences to guide our survey of video-based micro-expression analysis in a cascaded structure, encompassing the neuropsychological basis, datasets, features, spotting algorithms, recognition algorithms, applications, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments, and major challenges are addressed and discussed. Furthermore, after considering the limitations of existing micro-expression datasets, we present and release a new dataset, the micro-and-macro expression warehouse (MMEW), containing more video samples and more labeled emotion types. We then perform a unified comparison of representative methods on CAS(ME)² for spotting, and on MMEW and SAMM for recognition. Finally, some potential future research directions are explored and outlined.
13
Liu Y, Li Y, Yi X, Hu Z, Zhang H, Liu Y. Lightweight ViT Model for Micro-Expression Recognition Enhanced by Transfer Learning. Front Neurorobot 2022; 16:922761. [PMID: 35845761] [PMCID: PMC9280988] [DOI: 10.3389/fnbot.2022.922761]
Abstract
As opposed to macro-expressions, micro-expressions are subtle, hard-to-detect emotional expressions that often contain rich information about mental activities, and their practical recognition is essential in interrogation and healthcare. Neural networks are currently among the most common approaches to micro-expression recognition, but accuracy gains usually come with added complexity, and overly large networks impose heavy hardware requirements on the devices that run them. In recent years, vision transformers based on self-attention have achieved image recognition and classification accuracy on par with neural networks; without the image-specific inductive biases of convolutional networks, however, the cost of improving accuracy is a steep increase in parameter count. This paper describes training a facial expression feature extractor by transfer learning and then fine-tuning and optimizing the MobileViT model to perform the micro-expression recognition task. First, the CASME II, SAMM, and SMIC datasets are combined into a compound dataset, and macro-expression samples are extracted from three macro-expression datasets; each macro-expression and micro-expression sample is pre-processed identically to make them similar. Second, the macro-expression samples are used to train the MobileNetV2 block in MobileViT as a facial expression feature extractor, saving the weights at the highest accuracy. Finally, some hyperparameters of the MobileViT model are determined by grid search, the micro-expression samples are fed in for training, and the samples are classified with an SVM classifier. In the experiments, the proposed method obtained an accuracy of 84.27%, processing an individual sample in only 35.4 ms. Comparative experiments show the method is comparable to state-of-the-art methods in accuracy while improving recognition efficiency.
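The final classification step, training an SVM on backbone features, can be sketched with scikit-learn as follows. The feature extractor itself (a fine-tuned MobileViT in the paper) is assumed to exist and is not reproduced; the kernel and C value below are illustrative.

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def classify_with_svm(train_feats, train_labels, test_feats):
    # train_feats/test_feats: [N, D] arrays produced by the backbone.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(train_feats, train_labels)
    return clf.predict(test_feats)
```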
Affiliation(s)
- Yanju Liu
- School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing, China
- Yange Li
- School of Computer and Control Engineering, Qiqihar University, Qiqihar, China
- Xinhai Yi
- School of Computer and Control Engineering, Qiqihar University, Qiqihar, China
- Zuojin Hu
- School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing, China
- Huiyu Zhang
- School of Computer and Control Engineering, Qiqihar University, Qiqihar, China
- Yanzhong Liu
- School of Computer and Control Engineering, Qiqihar University, Qiqihar, China
- Correspondence: Yanzhong Liu
14
Zhao S, Tang H, Liu S, Zhang Y, Wang H, Xu T, Chen E, Guan C. ME-PLAN: A deep prototypical learning with local attention network for dynamic micro-expression recognition. Neural Netw 2022; 153:427-443. [DOI: 10.1016/j.neunet.2022.06.024]
15
The design of error-correcting output codes based deep forest for the micro-expression recognition. Appl Intell 2022. [DOI: 10.1007/s10489-022-03590-5]
16
Wang Y, Han J, Guo Z. LCBP-STGCN: A local cube binary pattern spatial temporal graph convolutional network for micro-expression recognition. J Intell Fuzzy Syst 2022. [DOI: 10.3233/jifs-213079]
Abstract
Automated micro-expression recognition has become a research highlight in the emotion recognition field. Recent work proposed the Local Cube Binary Pattern (LCBP) for micro-expression recognition, making full use of spatiotemporal features to represent micro-expressions. Nevertheless, LCBP loses features and ignores underlying discriminative information. In this paper, we present LCBP-STGCN (Local Cube Binary Pattern Spatial-Temporal Graph Convolutional Network) to resolve these problems. A new STGCN, able to handle non-Euclidean data, is proposed to extract high-level micro-expression features. The STGCN is composed of a Spatial Graph Convolutional Network (SGCN) that captures spatial information and a Temporal Convolutional Network (TCN) that captures temporal information. To establish the spatiotemporal graph of the SGCN, we use regions of interest (ROIs) as node positions and LCBP features as node attributes. Through alternating SGCN and TCN convolutions, high-level spatiotemporal features are obtained. Extensive experiments on four spontaneous micro-expression datasets (SMIC, CASME I, CASME II, and SAMM) demonstrate that LCBP-STGCN can effectively recognize micro-expressions and outperforms several state-of-the-art methods.
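Below is a minimal sketch of one SGCN+TCN block of the kind the abstract describes, with an identity adjacency standing in for the real graph; in the paper, nodes are facial ROIs carrying LCBP features and the adjacency encodes their spatial relations. Kernel sizes and the 1x1 projection are our assumptions.

```python
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """Spatial graph convolution followed by a temporal convolution."""
    def __init__(self, in_ch, out_ch, num_nodes, t_kernel=9):
        super().__init__()
        self.gcn = nn.Conv2d(in_ch, out_ch, 1)              # per-node projection
        self.tcn = nn.Conv2d(out_ch, out_ch, (t_kernel, 1),
                             padding=(t_kernel // 2, 0))    # conv over time only
        # Placeholder graph; a real model would use a normalized ROI adjacency.
        self.register_buffer("adj", torch.eye(num_nodes))

    def forward(self, x):          # x: [B, C, T, V] (V = ROI nodes)
        x = self.gcn(x)            # mix channels per node
        x = torch.einsum("bctv,vw->bctw", x, self.adj)      # aggregate neighbors
        return torch.relu(self.tcn(x))
```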
Affiliation(s)
- Yan Wang
- College of Information Engineering, Tianjin University of Commerce, Tianjin, China
- Jianfeng Han
- College of Information Engineering, Tianjin University of Commerce, Tianjin, China
- Ziqi Guo
- Tsinghua Unigroup, Tianjin, China
17
Facial Micro-Expression Recognition Based on Deep Local-Holistic Network. Appl Sci (Basel) 2022. [DOI: 10.3390/app12094643]
Abstract
A micro-expression is a subtle, local and brief facial movement. It can reveal the genuine emotions that a person tries to conceal and is considered an important clue for lie detection. Micro-expression research has attracted much attention due to its promising applications in various fields. However, because of the short duration and low intensity of micro-expression movements, recognition faces great challenges and accuracy still demands improvement. To improve the efficiency of micro-expression feature extraction, and inspired by the psychological study of attentional resource allocation in micro-expression cognition, we propose a deep local-holistic network for micro-expression recognition. The proposed algorithm consists of two sub-networks: a Hierarchical Convolutional Recurrent Neural Network (HCRNN), which extracts local and abundant spatiotemporal micro-expression features, and a Robust PCA-based Recurrent Neural Network (RPRNN), which extracts global and sparse features with micro-expression-specific representations. The effective features extracted by the two sub-networks are fused for micro-expression recognition. We evaluate the proposed method on a combined database consisting of the four most commonly used databases, i.e., CASME, CASME II, CAS(ME)², and SAMM. The experimental results show that our method achieves reasonably good performance.
18
Deep learning-based microexpression recognition: a survey. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07157-w]
19
Learning two groups of discriminative features for micro-expression recognition. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.12.088]
20
Motion magnification multi-feature relation network for facial microexpression recognition. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00680-2]
Abstract
Microexpressions cannot be observed easily due to their short duration and small expression range, properties that pose considerable challenges for microexpression recognition. Video motion magnification techniques, however, make small motions that were previously invisible to the naked eye observable. This study enhances microexpression features with the help of motion amplification, proposing a motion magnification multi-feature relation network (MMFRN) that combines video motion amplification with two feature relation modules. The spatial features are magnified during spatial feature extraction and used for classification. In addition, a transferred ResNet50 network extracts global features to improve feature comprehensiveness. Feature magnification is controlled by the amplification factor hyperparameter α; the effects of different magnification factors on the results are compared and the best is selected. Experiments verify that the magnification network can resolve the misclassification problem caused by the one-to-one correspondence between microexpressions and facial action coding units. On the CASME II dataset, MMFRN outperforms traditional recognition methods and other neural networks.
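The core of motion magnification, amplifying small temporal deviations by a factor α, can be sketched in a few lines. Real systems (and presumably MMFRN's magnifier) band-pass filter the temporal signal and operate on a spatial pyramid, which this toy version omits.

```python
import numpy as np

def magnify_motion(frames, alpha=10.0):
    """Amplify small temporal variations in a video (Eulerian-style sketch).

    frames: float array [T, H, W] with values in [0, 1];
    alpha: amplification factor (the hyperparameter tuned in the paper).
    """
    baseline = frames.mean(axis=0, keepdims=True)  # static appearance
    # Deviations from the baseline carry the subtle motion; boost them.
    return np.clip(frames + alpha * (frames - baseline), 0.0, 1.0)
```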
21
Zhang J, Deng X, Li C, Su G, Yu Y. Cloud-edge collaboration based transferring prediction of building energy consumption. J Intell Fuzzy Syst 2021. [DOI: 10.3233/jifs-211607]
Abstract
Building energy consumption (BEC) prediction often requires constructing a separate model for each building from its historical data, but a model constructed for one building is difficult to reuse for others. Recent work shows that a cloud-edge collaboration architecture is promising for model reuse; how to reuse cloud-side prediction models at the edge while reducing the computational cost of model training is a key open issue. To handle these problems, this paper proposes a cloud-edge collaborative transfer prediction method for BEC. Specifically, a model library storing prediction models for different building types is first constructed in the cloud from historical energy consumption data using long short-term memory (LSTM) networks; then, similarity measures for time series at different granularities are given, and the model to be transferred is matched from the library by analyzing the similarity between observation data uploaded to the cloud and the historical data collected there; finally, a fine-tuning strategy for the matched model is given, and the model is fine-tuned at the edge for reuse in the concrete application scenario. Experiments on practical datasets reveal that, compared with a prediction model that does not use the transfer strategy, the proposed model performs better in terms of MAE and RMSE. The results also confirm that the proposed method effectively reduces the computational cost of network training at the edge.
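The edge-side fine-tuning step can be sketched as freezing a cloud-trained LSTM's recurrent weights and retraining only its output head on local data. Layer sizes, the frozen/trainable split, and the training loop below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LSTMPredictor(nn.Module):
    """One-step-ahead consumption predictor (sketch)."""
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: [B, T, n_features]
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # predict the next consumption value

def fine_tune_at_edge(model, loader, epochs=5):
    for p in model.lstm.parameters():  # keep transferred dynamics fixed
        p.requires_grad = False
    opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:            # y: [B, 1] local targets
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```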
Affiliation(s)
- Jinping Zhang
- Shandong Key Laboratory of Intelligent Buildings Technology, School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan, China
- Xiaoping Deng
- Shandong Key Laboratory of Intelligent Buildings Technology, School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan, China
- Chengdong Li
- Shandong Key Laboratory of Intelligent Buildings Technology, School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan, China
- Guanqun Su
- Shandong Internet of Things Association, Jinan, China
- Yulong Yu
- Shandong Key Laboratory of Intelligent Buildings Technology, School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan, China
22
Cen S, Yu Y, Yan G, Yu M, Kong Y. Micro-expression recognition based on facial action learning with muscle movement constraints. J Intell Fuzzy Syst 2021. [DOI: 10.3233/jifs-202962]
Abstract
As a spontaneous facial expression, a micro-expression reveals the psychological responses of human beings. However, micro-expression recognition (MER) is highly susceptible to noise interference due to the short duration and low intensity of the facial actions. Research on facial action coding systems explores the correlation between emotional states and facial actions, providing more discriminative features. Therefore, building on this correlation information, our goal is a spatiotemporal network that is robust to low-intensity muscle movements for the MER task. First, a multi-scale weighted module is proposed to encode the spatial global context, obtained by merging features of different resolutions preserved from the backbone network. Second, we propose a multi-task facial action learning module that uses the correlation between muscle movements and micro-expressions as a constraint to encode local action features. Besides, a clustering constraint term is introduced to restrict the feature distribution of similar actions and improve category separability in feature space. Finally, the global context and local action features are stacked into high-quality spatial descriptions and passed through a Convolutional Long Short-Term Memory (ConvLSTM) network to predict micro-expressions. Comparative experiments on the SMIC, CASME-I, and CASME-II datasets show that the proposed method outperforms other mainstream methods.
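For reference, a standard ConvLSTM cell (the recurrent unit named in the abstract) looks like the sketch below; the paper's multi-scale and facial-action modules that feed it are not shown, and the kernel size is an assumption.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """LSTM cell whose gates are computed by 2D convolutions over feature maps."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):       # x: [B, in_ch, H, W]; state: (h, c)
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = gates.chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)  # update cell state
        h = o * torch.tanh(c)          # emit hidden feature map
        return h, c
```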
Affiliation(s)
- Shixin Cen
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, P.R. China
- Yang Yu
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, P.R. China
- Gang Yan
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, P.R. China
- Ming Yu
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, P.R. China
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, P.R. China
- Yanlei Kong
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, P.R. China
23
A comparative study on movement feature in different directions for micro-expression recognition. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.063]
25
Wang SJ, He Y, Li J, Fu X. MESNet: A Convolutional Neural Network for Spotting Multi-Scale Micro-Expression Intervals in Long Videos. IEEE Trans Image Process 2021; 30:3956-3969. [PMID: 33788686] [DOI: 10.1109/tip.2021.3064258]
Abstract
Micro-expression spotting is a fundamental step in micro-expression analysis. This paper proposes a novel convolutional neural network (CNN) based architecture for spotting multi-scale spontaneous micro-expression intervals in long videos, named the Micro-Expression Spotting Network (MESNet). It is composed of three modules. The first is a (2+1)D spatiotemporal convolutional network, which uses 2D convolution to extract spatial features and 1D convolution to extract temporal features. The second is a clip proposal network, which proposes candidate micro-expression clips. The last is a classification regression network, which classifies the proposed clips as micro-expression or not and further regresses their temporal boundaries. We also propose a novel evaluation metric for micro-expression spotting. Extensive experiments have been conducted on two long-video datasets, CAS(ME)² and SAMM, with leave-one-subject-out cross-validation used to evaluate spotting performance. Results show that MESNet effectively improves the F1-score and outperforms other state-of-the-art methods, especially on the SAMM dataset.
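The (2+1)D idea, factorizing a 3D convolution into a 2D spatial convolution followed by a 1D temporal one, can be sketched as below; kernel sizes and activation placement are our assumptions, not MESNet's exact configuration.

```python
import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    """3D convolution factorized into spatial (1,k,k) and temporal (k,1,1) parts."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        p = k // 2
        self.spatial = nn.Conv3d(in_ch, out_ch, (1, k, k), padding=(0, p, p))
        self.temporal = nn.Conv3d(out_ch, out_ch, (k, 1, 1), padding=(p, 0, 0))

    def forward(self, x):  # x: [B, C, T, H, W]
        return torch.relu(self.temporal(torch.relu(self.spatial(x))))
```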
26
Gao J, Chen H, Zhang X, Guo J, Liang W. A New Feature Extraction and Recognition Method for Microexpression Based on Local Non-negative Matrix Factorization. Front Neurorobot 2020; 14:579338. [PMID: 33312122] [PMCID: PMC7702905] [DOI: 10.3389/fnbot.2020.579338]
Abstract
Microexpression is usually characterized by short duration and small action range, and existing general expression recognition algorithms do not work well for it. As a feature extraction method, non-negative matrix factorization can decompose the original data into different components and has been successfully applied to facial recognition. In this paper, local non-negative matrix factorization is explored to decompose a microexpression into facial muscle actions and to extract features for recognition based on the apex frame. However, existing microexpression datasets lack enough samples to train a classifier with good generalization. The macro-to-micro algorithm based on singular value decomposition can augment the number of microexpression samples, but it cannot preserve the non-negativity of feature vectors. To address these problems, we propose an improved macro-to-micro algorithm that augments microexpression samples by manipulating macroexpression data via local non-negative matrix factorization. Finally, several experiments on the CK+, CASME2 and SAMM datasets verify the effectiveness of the proposed scheme, showing higher recognition accuracy for microexpression than related algorithms.
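A plain NMF feature-extraction step can be sketched with scikit-learn as follows. Note that the paper's local NMF adds locality constraints that the stock sklearn NMF does not implement, so this conveys only the parts-based decomposition idea; the data and component count are placeholders.

```python
import numpy as np
from sklearn.decomposition import NMF

# Stand-in for flattened, non-negative apex-frame vectors: [N samples, H*W].
X = np.abs(np.random.rand(100, 64 * 64))

model = NMF(n_components=30, init="nndsvda", max_iter=500)
W = model.fit_transform(X)     # [100, 30] per-sample features for a classifier
H = model.components_          # [30, 4096] parts-based "muscle action" bases
```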
Affiliation(s)
- Junli Gao
- School of Automation, Guangdong University of Technology, Guangzhou, China
- Huajun Chen
- School of Automation, Guangdong University of Technology, Guangzhou, China
- Xiaohua Zhang
- College of Automation, Zhongkai University of Agriculture and Engineering, Guangzhou, China
- Jing Guo
- School of Automation, Guangdong University of Technology, Guangzhou, China
- Wenyu Liang
- Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
27
Xia Z, Peng W, Khor HQ, Feng X, Zhao G. Revealing the Invisible with Model and Data Shrinking for Composite-database Micro-expression Recognition. IEEE Trans Image Process 2020; 29:8590-8605. [PMID: 32845838] [DOI: 10.1109/tip.2020.3018222]
Abstract
Composite-database micro-expression recognition is attracting increasing attention as it is more practical for real-world applications. Although a composite database provides more sample diversity for learning good representation models, important subtle dynamics are prone to disappearing under domain shift, so models greatly degrade in performance, especially deep ones. In this paper, we analyze the influence of learning complexity, including input complexity and model complexity, and discover that lower-resolution input data and a shallower architecture help ease the degradation of deep models on the composite-database task. Based on this, we propose a recurrent convolutional network (RCN) that exploits a shallower architecture and lower-resolution input data, shrinking model and input complexity simultaneously. Furthermore, we develop three parameter-free modules (wide expansion, shortcut connection and attention unit) that integrate with the RCN without adding any learnable parameters. These modules enhance representation ability from various perspectives while preserving a not-very-deep architecture for lower-resolution data. The three modules can further be combined by an automatic strategy (a neural architecture search strategy), and the searched architecture is more robust. Extensive experiments on the MEGC2019 dataset (composed of the existing SMIC, CASME II and SAMM datasets) verify the influence of learning complexity and show that RCNs with the three modules and the searched combination outperform state-of-the-art approaches.
28
Cen S, Yu Y, Yan G, Yu M, Yang Q. Sparse Spatiotemporal Descriptor for Micro-Expression Recognition Using Enhanced Local Cube Binary Pattern. Sensors (Basel) 2020; 20:4437. [PMID: 32784460] [PMCID: PMC7471998] [DOI: 10.3390/s20164437]
Abstract
As a spontaneous facial expression, a micro-expression can reveal the psychological responses of human beings, so micro-expression recognition is widely studied and applied for its potential in clinical diagnosis, psychological research, and security. However, micro-expression recognition is a formidable challenge due to the short duration and low intensity of the facial actions. In this paper, a sparse spatiotemporal descriptor for micro-expression recognition is developed using the Enhanced Local Cube Binary Pattern (Enhanced LCBP), which is composed of three complementary binary features: Spatial Difference LCBP, Temporal Direction LCBP, and Temporal Gradient LCBP. With Enhanced LCBP, binary features with spatiotemporal-domain complementarity can capture subtle facial changes. In addition, because redundant information among the division grids weakens the descriptor's ability to distinguish micro-expressions, Multi-Regional Joint Sparse Learning is designed to perform feature selection over the grids, paying more attention to critical local regions. Finally, a multi-kernel Support Vector Machine (SVM) fuses the selected features for the final classification. The proposed method achieves promising results on four spontaneous micro-expression datasets, and further examination of the parameter evaluation and confusion matrices demonstrates its sufficiency and effectiveness.
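For intuition about the binary-pattern family this descriptor extends, here is a plain 2D eight-neighbor LBP; Enhanced LCBP extends the idea into the temporal cube with the three complementary binary channels named above, which this sketch does not implement.

```python
import numpy as np

def lbp_image(gray):
    """Plain 8-neighbor local binary pattern over a grayscale image."""
    g = gray.astype(np.float32)
    c = g[1:-1, 1:-1]                       # center pixels
    codes = np.zeros_like(c, dtype=np.uint8)
    # Offsets of the 8 neighbors, visited in a fixed circular order.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= ((nb >= c).astype(np.uint8) << bit)  # set bit if neighbor >= center
    return codes  # histogram of codes serves as the texture feature
```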
Affiliation(s)
- Shixin Cen
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China; (S.C.); (Q.Y.)
- Yang Yu
- School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; (Y.Y.); (G.Y.)
- Gang Yan
- School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; (Y.Y.); (G.Y.)
- Ming Yu
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China; (S.C.); (Q.Y.)
- School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; (Y.Y.); (G.Y.)
- Correspondence: ; Tel.: +86-137-0217-3627
- Qing Yang
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China; (S.C.); (Q.Y.)
- Department of Electronic and Optical Engineering, Army Engineering University Shijiazhuang Campus, Shijiazhuang 050000, China
29
Spatiotemporal Convolutional Neural Network with Convolutional Block Attention Module for Micro-Expression Recognition. Information 2020. [DOI: 10.3390/info11080380]
Abstract
A micro-expression is defined as an uncontrollable muscular movement shown on a human face when one is trying to conceal or repress one's true emotions. Many researchers have applied deep learning frameworks to micro-expression recognition in recent years, but few have introduced the human visual attention mechanism. In this study, we propose a three-dimensional (3D) spatiotemporal convolutional neural network with a convolutional block attention module (CBAM) for micro-expression recognition. Image sequences are first input to a medium-sized convolutional neural network (CNN) to extract visual features; the network then learns to allocate feature weights adaptively with the help of the convolutional block attention module. The method was tested on spontaneous micro-expression databases (Chinese Academy of Sciences Micro-expression II (CASME II) and the Spontaneous Micro-expression Database (SMIC)). The experimental results show that the 3D CNN with the convolutional block attention module outperformed other algorithms in micro-expression recognition.
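CBAM is a published, well-known module; the sketch below is its standard 2D form (channel attention followed by spatial attention). The paper applies the same mechanism inside a 3D spatiotemporal CNN, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                          # x: [B, C, H, W]
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[:, :, None, None]
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```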
30
Abstract
Micro-Expression (ME) recognition is a hot topic in computer vision, as it presents a gateway to capturing and understanding everyday human emotions. It is nonetheless a challenging problem because MEs are typically transient (lasting less than 200 ms) and subtle. Recent advances in machine learning enable new and effective methods for diverse computer vision tasks; in particular, deep learning on large datasets outperforms classical approaches that rely on hand-crafted features. Even though available spontaneous ME datasets are scarce and much smaller, off-the-shelf Convolutional Neural Networks (CNNs) still achieve satisfactory classification results. However, these networks are memory- and compute-intensive, which poses great challenges when deploying CNN-based solutions in applications such as driver monitoring and comprehension recognition in virtual classrooms, which demand fast and accurate recognition. Because such networks were initially designed for tasks in other domains, they are over-parameterized and need to be optimized for ME recognition. In this paper, we propose a new network based on the well-known ResNet18, optimized for ME classification in two ways: first, we reduce the depth of the network by removing residual layers; second, we introduce a more compact representation of the optical flow used as input. We present extensive experiments and demonstrate that the proposed network obtains accuracy comparable to state-of-the-art methods while significantly reducing the necessary memory space. Our best classification accuracy was 60.17% on a challenging composite dataset with five objective classes. Our method takes only 24.6 ms to classify an ME video clip (less than the 40 ms duration of the shortest ME), making our CNN design suitable for real-time embedded applications with limited memory and computing resources.
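One plausible reading of the two optimizations, sketched with torchvision: drop the deepest residual stage of ResNet18 and accept a compact two-channel optical-flow input. The exact layers the authors removed and their flow encoding are not specified here, so treat this as an assumption-laden sketch.

```python
import torch
import torch.nn as nn
from torchvision import models

def reduced_resnet18(num_classes=5):
    net = models.resnet18(weights=None)
    # Accept a 2-channel (dx, dy) optical-flow field instead of RGB.
    net.conv1 = nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.layer4 = nn.Identity()             # remove the deepest residual stage
    net.fc = nn.Linear(256, num_classes)   # layer3 outputs 256 channels
    return net

flow = torch.randn(1, 2, 112, 112)         # a dummy optical-flow input
logits = reduced_resnet18()(flow)          # [1, 5] class scores
```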