1
Bellamkonda S, Gopalan NP, Mala C, Settipalli L. Facial expression recognition on partially occluded faces using component based ensemble stacked CNN. Cogn Neurodyn 2023;17:985-1008. [PMID: 37522034] [PMCID: PMC10374495] [DOI: 10.1007/s11571-022-09879-y]
Abstract
Facial Expression Recognition (FER) is the basis for many applications, including human-computer interaction and surveillance, and developing such applications requires machines to understand human emotions. Among the many FER models developed so far, Ensemble Stacked Convolutional Neural Networks (ES-CNN) have shown an empirical impact in improving FER performance on static images. However, existing ES-CNN based FER models, trained on features extracted from the entire face, cannot cope with ambient factors such as pose, illumination, and occlusion. To mitigate the reduced performance of ES-CNN on partially occluded faces, a Component based ES-CNN (CES-CNN) is proposed. CES-CNN applies ES-CNN to the action units of individual face components, such as the eyes, eyebrows, nose, cheeks, mouth, and glabella, each forming one subnet of the network. A max-voting ensemble classifier combines the decisions of the subnets to obtain the optimized recognition accuracy. The proposed CES-CNN is validated through experiments on benchmark datasets, and its performance is compared with state-of-the-art models. The experimental results show that the proposed model significantly improves recognition accuracy over existing models.
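As an illustration of the max-voting step described above, here is a minimal Python sketch that combines the class-probability outputs of the six component subnets; the names and the tie-breaking rule are ours, not the authors'.

```python
# Minimal sketch of max-voting over component subnets (illustrative, not the
# authors' implementation). Assumes each subnet returns a class-probability
# vector for its facial component crop.
import numpy as np

EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def max_voting(subnet_probs: list[np.ndarray]) -> str:
    """Each subnet casts one vote: the argmax of its probability vector.
    The expression with the most votes wins; ties fall back to the class
    with the highest summed probability across subnets."""
    votes = np.zeros(len(EXPRESSIONS), dtype=int)
    summed = np.zeros(len(EXPRESSIONS))
    for p in subnet_probs:
        votes[int(np.argmax(p))] += 1
        summed += p
    best = np.flatnonzero(votes == votes.max())
    winner = best[np.argmax(summed[best])] if len(best) > 1 else best[0]
    return EXPRESSIONS[winner]

# Example: six component subnets (eyes, eyebrows, nose, cheek, mouth, glabella)
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(len(EXPRESSIONS))) for _ in range(6)]
print(max_voting(probs))
```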
Affiliation(s)
- Sivaiah Bellamkonda: Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015, India
- N. P. Gopalan: Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015, India
- C. Mala: Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015, India
- Lavanya Settipalli: Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015, India
2
Cao M, Xie K, Liu F, Li B, Wen C, He J, Zhang W. Recognition of Occluded Goods under Prior Inference Based on Generative Adversarial Network. Sensors (Basel) 2023;23:3355. [PMID: 36992064] [PMCID: PMC10058100] [DOI: 10.3390/s23063355]
Abstract
In intelligent retail, recognizing goods in dynamic visual containers must address two problems that lower recognition accuracy: the loss of goods features caused by hand occlusion, and the high similarity between goods. This study therefore proposes an approach for recognizing occluded goods based on a generative adversarial network combined with prior inference. With DarkNet53 as the backbone network, semantic segmentation locates the occluded part in the feature extraction network while the YOLOX decoupled head produces the detection box. A generative adversarial network under prior inference then restores and expands the features of the occluded parts, and a weighted attention module combining multi-scale spatial attention with efficient channel attention is proposed to select fine-grained features of the goods. Finally, a metric learning method based on the von Mises-Fisher distribution increases the inter-class spacing of features, and the resulting discriminative features are used to recognize goods at a fine-grained level. The experimental data were all obtained from a self-made smart retail container dataset containing 12 types of goods, including four pairs of similar goods. Experimental results show that the peak signal-to-noise ratio and structural similarity under the improved prior inference are 0.7743 and 0.0183 higher, respectively, than those of the other models. Compared with the best existing models, mAP improves by 1.2% and recognition accuracy by 2.82%. The approach thus addresses both hand occlusion and high inter-goods similarity, meeting the accuracy requirements of commodity recognition in intelligent retail and showing good application prospects.
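The von Mises-Fisher metric learning step can be pictured as classification on the unit hypersphere. The sketch below is a generic cosine-similarity classifier with a concentration-like scale, assuming L2-normalized features and class prototypes; it illustrates the idea of widening angular class spacing, not the paper's exact loss.

```python
# Sketch of hyperspherical (von Mises-Fisher style) metric learning:
# L2-normalized features vs. L2-normalized class prototypes, scored by
# scaled cosine similarity. Generic illustration, not the paper's exact loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, kappa: float = 16.0):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.kappa = kappa  # concentration: larger -> sharper class separation

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        z = F.normalize(feats, dim=1)             # unit-norm features
        mu = F.normalize(self.prototypes, dim=1)  # unit-norm class means
        return self.kappa * z @ mu.t()            # scaled cosine logits

# Cross-entropy on these logits enlarges angular margins between classes.
clf = CosineClassifier(feat_dim=128, num_classes=12)  # 12 goods types
feats = torch.randn(4, 128)
labels = torch.randint(0, 12, (4,))
loss = F.cross_entropy(clf(feats), labels)
loss.backward()
```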
Affiliation(s)
- Mingxuan Cao: School of Electronic and Information, Yangtze University, Jingzhou 434023, China; National Electrical and Electronic Experimental Teaching Demonstration Center, Yangtze University, Jingzhou 434023, China
- Kai Xie: School of Electronic and Information, Yangtze University, Jingzhou 434023, China; National Electrical and Electronic Experimental Teaching Demonstration Center, Yangtze University, Jingzhou 434023, China; Western Research Institute, Yangtze University, Karamay 834000, China
- Feng Liu: School of Electronic and Information, Yangtze University, Jingzhou 434023, China; National Electrical and Electronic Experimental Teaching Demonstration Center, Yangtze University, Jingzhou 434023, China
- Bohao Li: School of Electronic and Information, Yangtze University, Jingzhou 434023, China; National Electrical and Electronic Experimental Teaching Demonstration Center, Yangtze University, Jingzhou 434023, China
- Chang Wen: Western Research Institute, Yangtze University, Karamay 834000, China; School of Computer Science, Yangtze University, Jingzhou 434023, China
- Jianbiao He: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Wei Zhang: School of Computer Science and Engineering, Central South University, Changsha 410083, China
3
Zhou Y, Jin L, Ma G, Xu X. Quaternion Capsule Neural Network With Region Attention for Facial Expression Recognition in Color Images. IEEE Trans Emerg Top Comput Intell 2022. [DOI: 10.1109/tetci.2021.3120513]
Affiliation(s)
- Yu Zhou: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Lianghai Jin: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Guangzhi Ma: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xiangyang Xu: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
4
Hybrid Approach for Facial Expression Recognition Using Convolutional Neural Networks and SVM. Applied Sciences 2022. [DOI: 10.3390/app12115493]
Abstract
Facial expression recognition is very useful for effective human-computer interaction, robot interfaces, and emotion-aware smart agent systems. This paper presents a new framework for facial expression recognition using a hybrid model that combines convolutional neural networks (CNNs) and a support vector machine (SVM) classifier on dynamic facial expression data. To capture facial motion characteristics, dense facial motion flows and geometric landmark flows of expression sequences are used as inputs to the CNN and the SVM classifier, respectively, and CNN architectures for recognizing expressions from dense motion flows are proposed. An optimally weighted combination of the hybrid classifiers provides better recognition results than either classifier alone. The system successfully classifies seven expressions (anger, contempt, disgust, fear, happiness, sadness, and surprise) on the CK+ database and six expressions (anger, disgust, fear, happiness, sadness, and surprise) on the BU4D database, with recognition rates of 99.69% and 94.69%, respectively. The proposed method achieves state-of-the-art results on CK+ and proves effective on BU4D when compared with previous schemes.
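The "optimally weighted combination" of the two classifiers can be read as score-level fusion with a weight chosen on validation data. A minimal sketch, assuming both streams output class probabilities (variable names are illustrative):

```python
# Sketch of weighted score-level fusion: combine CNN class probabilities
# (dense motion flows) with SVM probabilities (landmark flows) and pick the
# weight on a validation split. Illustrative only.
import numpy as np

def fuse(cnn_probs: np.ndarray, svm_probs: np.ndarray, alpha: float) -> np.ndarray:
    """alpha in [0, 1]: weight on the CNN stream."""
    return alpha * cnn_probs + (1.0 - alpha) * svm_probs

def pick_alpha(cnn_probs, svm_probs, labels, grid=np.linspace(0, 1, 21)) -> float:
    # Choose the fusion weight that maximizes validation accuracy.
    accs = [(fuse(cnn_probs, svm_probs, a).argmax(1) == labels).mean() for a in grid]
    return float(grid[int(np.argmax(accs))])

# Toy validation data: 10 samples, 7 expression classes.
rng = np.random.default_rng(1)
labels = rng.integers(0, 7, size=10)
cnn_p = rng.dirichlet(np.ones(7), size=10)
svm_p = rng.dirichlet(np.ones(7), size=10)
alpha = pick_alpha(cnn_p, svm_p, labels)
preds = fuse(cnn_p, svm_p, alpha).argmax(1)
```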
5
Deploying Machine Learning Techniques for Human Emotion Detection. Comput Intell Neurosci 2022;2022:8032673. [PMID: 35154306] [PMCID: PMC8828335] [DOI: 10.1155/2022/8032673]
Abstract
Emotion recognition is a trending research field involved in several applications, most notably robotic vision and interactive robotic communication. Human emotions can be detected from both speech and visual modalities, and facial expressions are an ideal means of detecting a person's emotions. This paper presents a real-time approach for emotion detection and its deployment in robotic vision applications. The proposed approach consists of four phases: preprocessing, key point generation, key point selection with angular encoding, and classification. The main idea is to generate key points using the MediaPipe face mesh algorithm, which is based on real-time deep learning. The generated key points are encoded using a carefully designed sequence of mesh generator and angular encoding modules, and feature decomposition is performed using Principal Component Analysis (PCA) to enhance the accuracy of emotion detection. Finally, the decomposed features are fed to a machine learning (ML) classifier: a Support Vector Machine (SVM), k-Nearest Neighbor (KNN), Naïve Bayes (NB), Logistic Regression (LR), or Random Forest (RF); a Multilayer Perceptron (MLP) is also deployed as an efficient deep neural network technique. The presented techniques are evaluated on different datasets with different evaluation metrics, and the simulation results show superior performance, with a human emotion detection accuracy of 97%.
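A minimal sketch of the front of this pipeline, assuming MediaPipe's legacy "solutions" API and substituting raw landmark coordinates for the paper's angular encoding; the training lines are commented out because they assume labeled data (X, y) that is not shown here.

```python
# Sketch of the described pipeline: MediaPipe face-mesh key points ->
# feature vector -> PCA -> SVM. Angular encoding is simplified to raw
# (x, y, z) coordinates here.
import cv2
import mediapipe as mp
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1)

def landmarks(image_bgr: np.ndarray) -> np.ndarray | None:
    """Return a flat (468*3,) vector of face-mesh landmark coordinates."""
    res = mesh.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not res.multi_face_landmarks:
        return None
    pts = res.multi_face_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in pts]).ravel()

# X: stacked landmark vectors, y: emotion labels (assumed available).
# clf = make_pipeline(PCA(n_components=50), SVC(kernel="rbf"))
# clf.fit(X, y)
# clf.predict(landmarks(frame)[None, :])
```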
7
Poux D, Allaert B, Ihaddadene N, Bilasco IM, Djeraba C, Bennamoun M. Dynamic Facial Expression Recognition Under Partial Occlusion With Optical Flow Reconstruction. IEEE Trans Image Process 2021;31:446-457. [PMID: 34855591] [DOI: 10.1109/tip.2021.3129120]
Abstract
Video facial expression recognition is useful for many applications and has received much interest lately. Although some methods perform well in controlled environments (without occlusion), recognition in the presence of partial facial occlusion remains a challenging task. To handle facial occlusions, methods based on reconstructing the occluded part of the face have been proposed, relying mainly on the texture or the geometry of the face. However, the similarity of facial movement between different persons performing the same expression is a real asset for reconstruction. In this paper we exploit this asset and propose a new method based on an auto-encoder with skip connections that reconstructs the occluded part of the face in the optical flow domain. To the best of our knowledge, this is the first work that directly reconstructs movement for facial expression recognition. We validated our approach on the controlled CK+ dataset, on which different occlusions were generated. Our experiments show that the proposed method reduces the gap in recognition accuracy between occluded and unoccluded situations, and we compare it with existing state-of-the-art approaches. To lay the basis for reproducible and fair comparison in the future, we also propose a new experimental protocol that includes occlusion generation and reconstruction evaluation.
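A minimal sketch of such an auto-encoder with one skip connection, operating on a 2-channel (u, v) optical-flow field; the depth and channel counts are illustrative, not the paper's architecture.

```python
# Minimal sketch of an auto-encoder with a skip connection that maps an
# occluded 2-channel optical-flow field to a reconstructed one. Shapes and
# depths are illustrative; the paper's exact architecture may differ.
import torch
import torch.nn as nn

class FlowInpainter(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(64, 2, 4, stride=2, padding=1)  # 64 = 32 dec + 32 skip

    def forward(self, flow: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(flow)                 # (B, 32, H/2, W/2)
        e2 = self.enc2(e1)                   # (B, 64, H/4, W/4)
        d2 = self.dec2(e2)                   # (B, 32, H/2, W/2)
        return self.dec1(torch.cat([d2, e1], dim=1))  # skip connection

# Train with an L2 loss between reconstructed and unoccluded ground-truth flow.
net = FlowInpainter()
occluded = torch.randn(1, 2, 64, 64)  # (u, v) flow with a masked region
restored = net(occluded)              # (1, 2, 64, 64)
```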
9
Liu Y, Zhang X, Lin Y, Wang H. Facial Expression Recognition via Deep Action Units Graph Network Based on Psychological Mechanism. IEEE Trans Cogn Dev Syst 2020. [DOI: 10.1109/tcds.2019.2917711]
10
Grossard C, Dapogny A, Cohen D, Bernheim S, Juillet E, Hamel F, Hun S, Bourgeois J, Pellerin H, Serret S, Bailly K, Chaby L. Children with autism spectrum disorder produce more ambiguous and less socially meaningful facial expressions: an experimental study using random forest classifiers. Mol Autism 2020;11:5. [PMID: 31956394] [PMCID: PMC6958757] [DOI: 10.1186/s13229-020-0312-2]
Abstract
Background: Computer vision combined with human annotation could offer a novel method for exploring facial expression (FE) dynamics in children with autism spectrum disorder (ASD).
Methods: We recruited 157 children with typical development (TD) and 36 children with ASD in Paris and Nice to perform two experimental tasks producing FEs with emotional valence. FEs were explored through judges' ratings and through random forest (RF) classifiers. To do so, we located a set of 49 facial landmarks in the task videos, generated a set of geometric and appearance features, and used RF classifiers to explore how children with ASD differed from TD children when producing FEs.
Results: Using multivariate models including other factors known to predict FEs (age, gender, intellectual quotient, emotion subtype, cultural background), ratings from expert raters showed that children with ASD had more difficulty producing FEs than TD children. When we examined RF classifier performance, the classification tasks were highly accurate except for sadness, and the classifiers needed more facial landmarks to achieve the best classification for children with ASD. Confusion matrices showed that when the classifiers were tested on children with ASD, anger was often confounded with happiness.
Limitations: The sample size of the ASD group was smaller than that of the TD group; we tried to compensate for this limitation with several control calculations.
Conclusion: Children with ASD have more difficulty producing socially meaningful FEs, and the computer vision methods used to explore FE dynamics highlight that their FE production carries more ambiguity.
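A minimal sketch of the classification setup, assuming 49 landmarks per sample and substituting simple pairwise-distance features for the study's full geometric and appearance feature set; labels and data here are synthetic placeholders.

```python
# Sketch: geometric features derived from facial landmarks fed to a random
# forest, evaluated with a confusion matrix. Feature construction here
# (pairwise distances) is a simplification of the study's feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

def pairwise_distances(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (49, 2) -> flat vector of inter-landmark distances."""
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    d = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(landmarks), k=1)
    return d[iu]

# X: one distance vector per trial, y: produced-emotion labels (toy data).
rng = np.random.default_rng(0)
X = np.stack([pairwise_distances(rng.random((49, 2))) for _ in range(200)])
y = rng.integers(0, 4, size=200)  # four placeholder emotion classes
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(Xtr, ytr)
print(confusion_matrix(yte, rf.predict(Xte)))  # rows: true, cols: predicted
```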
Affiliation(s)
- Charline Grossard: Service de Psychiatrie de l'Enfant et de l'Adolescent, GH Pitié-Salpêtrière Charles Foix, APHP.6, Paris, France; Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, ISIR CNRS UMR 7222, Paris, France
- Arnaud Dapogny: Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, ISIR CNRS UMR 7222, Paris, France
- David Cohen: Service de Psychiatrie de l'Enfant et de l'Adolescent, GH Pitié-Salpêtrière Charles Foix, APHP.6, Paris, France; Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, ISIR CNRS UMR 7222, Paris, France
- Sacha Bernheim: Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, ISIR CNRS UMR 7222, Paris, France
- Estelle Juillet: Service de Psychiatrie de l'Enfant et de l'Adolescent, GH Pitié-Salpêtrière Charles Foix, APHP.6, Paris, France
- Fanny Hamel: Service de Psychiatrie de l'Enfant et de l'Adolescent, GH Pitié-Salpêtrière Charles Foix, APHP.6, Paris, France
- Hugues Pellerin: Service de Psychiatrie de l'Enfant et de l'Adolescent, GH Pitié-Salpêtrière Charles Foix, APHP.6, Paris, France
- Kevin Bailly: Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, ISIR CNRS UMR 7222, Paris, France
- Laurence Chaby: Service de Psychiatrie de l'Enfant et de l'Adolescent, GH Pitié-Salpêtrière Charles Foix, APHP.6, Paris, France; Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, ISIR CNRS UMR 7222, Paris, France; Institut de Psychologie, Université de Paris, 92100 Boulogne-Billancourt, France
11
Facial Expression Recognition of Nonlinear Facial Variations Using Deep Locality De-Expression Residue Learning in the Wild. Electronics 2019. [DOI: 10.3390/electronics8121487]
Abstract
Automatic facial expression recognition (FER) is an emerging field, and interest has increased with the transition from laboratory-controlled conditions to in-the-wild scenarios. Most research has targeted non-occluded faces in constrained environments, while automatic FER under partial occlusion in real-world conditions remains less understood and less implemented. Our research also aims to tackle overfitting (caused by the shortage of adequate training data) and to alleviate expression-unrelated, intraclass, nonlinear facial variations such as head pose, eye gaze, intensity, and micro-expressions. We control the magnitude of each Action Unit (AU) and combine several AU combinations to leverage learning from generative and discriminative representations for automatic FER. We also address the diversification of expressions from lab-controlled to real-world scenarios through a cross-database study and propose a model that enhances the discriminative power of deep features, increasing inter-class scatter while preserving locality closeness. Furthermore, since a facial expression consists of an expressive component and a neutral component, we propose a generative model capable of generating the neutral expression from an input image using a cGAN. The expressive component is filtered and passed to the intermediate layers, a process called De-expression Residue Learning; the residue in these intermediate layers is essential for learning from the expressive components. Finally, we validate the effectiveness of our method (DLP-DeRL) through qualitative and quantitative experiments on four databases. Our method is more accurate and robust, outperforming existing hand-crafted-feature and deep learning methods when dealing with images in the wild.
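A highly simplified sketch of the de-expression residue idea: generate a neutral face from the input, then classify the difference between encoder features of the input and of its neutral version. The adversarial (cGAN) training and the multi-layer residues of the actual method are omitted, and all module shapes are illustrative.

```python
# Highly simplified sketch of de-expression residue learning: generate a
# neutral face from the input, then classify the difference between encoder
# features of the input and of the neutral image. The actual method trains
# the generator as a cGAN and taps several intermediate layers.
import torch
import torch.nn as nn

encoder = nn.Sequential(  # shared feature extractor
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
generator = nn.Sequential(  # expressive face -> neutral face (stand-in)
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
)
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 7))

face = torch.rand(1, 3, 64, 64)
neutral = generator(face)
residue = encoder(face) - encoder(neutral)  # expressive component only
logits = classifier(residue)                # 7 expression classes
```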
12
Facial Expression Recognition Using Computer Vision: A Systematic Review. Applied Sciences 2019. [DOI: 10.3390/app9214678]
Abstract
Emotion recognition has attracted major attention in numerous fields because of its relevant applications in the contemporary world: marketing, psychology, surveillance, and entertainment are some examples. An emotion can be recognized in several ways; this paper focuses on facial expressions and presents a systematic review of the matter. A total of 112 papers published in ACM, IEEE, BASE, and Springer between January 2006 and April 2019 were extensively reviewed. The most frequently used methods and algorithms, such as face detection, smoothing, Principal Component Analysis (PCA), Local Binary Patterns (LBP), Optical Flow (OF), and Gabor filters, are first introduced and summarized for better understanding. The review identifies a clear difficulty in translating the high facial expression recognition (FER) accuracy achieved in controlled environments to uncontrolled and pose-variant environments. Future efforts in the FER field should therefore focus on multimodal systems robust enough to face the adversities of real-world scenarios. A thorough analysis of the research done on FER in computer vision, based on the selected papers, is presented. This review aims not only to become a reference for future research on emotion recognition but also to provide an overview of the work done on this topic for potential readers.
13
Li Y, Zeng J, Shan S, Chen X. Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans Image Process 2018;28:2439-2450. [PMID: 30571627] [DOI: 10.1109/tip.2018.2886767]
Abstract
Facial expression recognition in the wild is challenging due to various unconstrained conditions. Although existing facial expression classifiers are nearly perfect at analyzing constrained frontal faces, they fail to perform well on the partially occluded faces that are common in the wild. In this paper, we propose a Convolutional Neural Network with attention mechanism (ACNN) that can perceive the occluded regions of the face and focus on the most discriminative unoccluded regions. ACNN is an end-to-end learning framework that combines multiple representations from facial regions of interest (ROIs). Each representation is weighted via a proposed Gate Unit that computes an adaptive weight from the region itself according to its unobstructedness and importance. Considering different ROIs, we introduce two versions of ACNN: patch-based ACNN (pACNN), which attends only to local facial patches, and global-local-based ACNN (gACNN), which integrates patch-level local representations with an image-level global representation. The proposed ACNNs are evaluated on both real and synthetic occlusions, including a self-collected facial expression dataset with real-world occlusions (FED-RO), the two largest in-the-wild facial expression datasets (RAF-DB and AffectNet), and their modifications with synthesized facial occlusions. Experimental results show that ACNNs improve recognition accuracy on both non-occluded and occluded faces. Visualization results demonstrate that, compared with a CNN without the Gate Unit, ACNNs can shift attention from occluded patches to related but unobstructed ones. ACNNs also outperform other state-of-the-art methods on several widely used in-the-lab facial expression datasets under the cross-dataset evaluation protocol.
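A minimal sketch of a Gate Unit as described: a scalar weight computed from each ROI representation, used to pool patch features. Layer sizes are illustrative, not the authors' configuration.

```python
# Sketch of the Gate Unit idea: each region-of-interest representation is
# re-weighted by a scalar computed from the region itself, so occluded
# patches receive low weights. Layer sizes are illustrative.
import torch
import torch.nn as nn

class GateUnit(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(),
                                  nn.Linear(dim // 2, 1), nn.Sigmoid())

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        """patch_feats: (B, P, D) -> weighted sum over P patches, (B, D)."""
        w = self.gate(patch_feats)            # (B, P, 1), near 0 if occluded
        return (w * patch_feats).sum(dim=1)   # attention-pooled face feature

feats = torch.randn(2, 24, 256)  # 24 ROI patch embeddings per face
pooled = GateUnit(256)(feats)    # (2, 256), fed to the expression classifier
```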