1. Li S, Wang J, Tian L, Wang J, Huang Y. A fine-grained human facial key feature extraction and fusion method for emotion recognition. Sci Rep 2025;15:6153. PMID: 39979500; PMCID: PMC11842553; DOI: 10.1038/s41598-025-90440-2.
Abstract
Emotion, a fundamental mapping of human responses to external stimuli, has been extensively studied in human-computer interaction, particularly in areas such as intelligent cockpits and systems. However, accurately recognizing emotions from facial expressions remains a significant challenge due to variations in lighting, posture, and micro-expressions. Emotion recognition using global or local facial features is a key research direction. However, relying solely on global or local features often results in models that exhibit uneven attention across facial features, neglecting key variations critical for detecting emotional changes. This paper proposes a method for modeling and extracting key facial features by integrating global and local facial data. First, we construct a comprehensive image preprocessing model that includes super-resolution processing, lighting and shading processing, and texture enhancement. This preprocessing step significantly enriches the expression of image features. Second, a global facial feature recognition model is developed using an encoder-decoder architecture, which effectively eliminates environmental noise and generates a comprehensive global feature dataset for facial analysis. Simultaneously, the Haar cascade classifier is employed to extract refined features from key facial regions, including the eyes, mouth, and overall face, resulting in a corresponding local feature dataset. Finally, a two-branch convolutional neural network is designed to integrate both global and local facial feature datasets, enhancing the model's ability to recognize facial characteristics accurately. The global feature branch fully characterizes the global features of the face, while the local feature branch focuses on the local features. An adaptive fusion module integrates the global and local features, enhancing the model's ability to differentiate subtle emotional changes.
To evaluate the accuracy and robustness of the model, we train and test it on the FER-2013 and JAFFE emotion datasets, achieving average accuracies of 80.59% and 97.61%, respectively. Compared to existing state-of-the-art models, our refined face feature extraction and fusion model demonstrates superior performance in emotion recognition. Additionally, comparative analysis shows that emotional features across different faces exhibit similarities. Building on psychological research, we categorize the dataset into three emotion classes: positive, neutral, and negative. The accuracy of emotion recognition is significantly improved under the new classification criteria. Finally, a self-built dataset is used to further validate that this classification approach has important implications for practical applications.
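The adaptive fusion module described in this abstract is not specified in detail there; as a rough illustration of the general idea, a gating-based blend of the two branch outputs might look like the following sketch (NumPy, with hypothetical names and a fixed stand-in for weights that would be learned in practice):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_fuse(global_feat, local_feat, w):
    """Blend global- and local-branch features with a learned gate.

    global_feat, local_feat: 1-D feature vectors from the two CNN branches.
    w: gating parameters (a fixed example here; learned during training).
    """
    # One gate score per branch, computed from the concatenated features.
    scores = w @ np.concatenate([global_feat, local_feat])
    alpha = softmax(scores)  # alpha[0] + alpha[1] == 1
    return alpha[0] * global_feat + alpha[1] * local_feat

g = np.array([0.2, 0.8, 0.5])   # hypothetical global-branch output
l = np.array([0.6, 0.1, 0.3])   # hypothetical local-branch output
w = np.ones((2, 6)) * 0.1       # stand-in for learned gating weights
fused = adaptive_fuse(g, l, w)
print(fused)                    # equal gate scores give a 50/50 blend here
```

With equal gate scores the module degenerates to a plain average; with learned weights it can emphasize whichever branch is more informative for a given face.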
Affiliation(s)
- Shiwei Li
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China
- Key Laboratory of Railway Industry on Plateau Railway Transportation Intelligent Management and Control, Lanzhou, 730070, China
- Jisen Wang
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China
- Linbo Tian
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China
- Jianqiang Wang
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China
- Key Laboratory of Railway Industry on Plateau Railway Transportation Intelligent Management and Control, Lanzhou, 730070, China
- Yan Huang
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China
- Key Laboratory of Railway Industry on Plateau Railway Transportation Intelligent Management and Control, Lanzhou, 730070, China
2. Elsheikh RA, Mohamed MA, Abou-Taleb AM, Ata MM. Improved facial emotion recognition model based on a novel deep convolutional structure. Sci Rep 2024;14:29050. PMID: 39580589; PMCID: PMC11585570; DOI: 10.1038/s41598-024-79167-8.
Abstract
Facial Emotion Recognition (FER) is a very challenging task due to the varying nature of facial expressions, occlusions, illumination, pose variations, cultural and gender differences, and many other factors that cause a drastic degradation in the quality of facial images. In this paper, an anti-aliased deep convolutional network (AA-DCN) model is developed and proposed to explore how anti-aliasing can improve the recognition fidelity of facial emotions. The AA-DCN model detects eight distinct emotions from image data, and features have been extracted using the proposed model and numerous classical deep learning algorithms. The proposed AA-DCN model has been applied to three different datasets to evaluate its performance: on the Extended Cohn-Kanade (CK+) database it achieved an ultimate accuracy of 99.26% in 5 min 25 s, on the Japanese Female Facial Expression (JAFFE) dataset it obtained 98% accuracy in 8 min 13 s, and on one of the most challenging FER datasets, the Real-world Affective Faces (RAF) dataset, it reached 82% in a low training time of 12 min 2 s. The experimental results demonstrate that the anti-aliased DCN model significantly improves emotion recognition while mitigating the aliasing artifacts caused by the down-sampling layers.
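The abstract does not spell out the anti-aliasing mechanism, but anti-aliased downsampling in CNNs is commonly implemented by low-pass filtering before strided subsampling ("blur pooling"). A minimal 1-D NumPy sketch of that general idea, not the paper's actual AA-DCN layers:

```python
import numpy as np

def naive_downsample(x):
    # Stride-2 subsampling with no low-pass filtering: prone to aliasing.
    return x[::2]

def antialiased_downsample(x):
    # Low-pass blur (binomial [1, 2, 1] / 4) before subsampling, in the
    # spirit of blur-pool style anti-aliased networks.
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    blurred = np.convolve(x, kernel, mode="same")
    return blurred[::2]

# A fast-alternating signal: the worst case for plain stride-2 subsampling.
x = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
print(naive_downsample(x))        # keeps only one phase: all zeros
print(antialiased_downsample(x))  # blurring first averages the two phases
```

The naive path throws away every odd sample and reports a constant signal; the blurred path preserves the signal's average energy, which is the artifact reduction the down-sampling layers need.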
Affiliation(s)
- Reham A Elsheikh
- Department of Electronics and Communications Engineering, Faculty of Engineering, Mansoura University, Mansoura, Egypt
- M A Mohamed
- Department of Electronics and Communications Engineering, Faculty of Engineering, Mansoura University, Mansoura, Egypt
- Ahmed Mohamed Abou-Taleb
- Department of Electronics and Communications Engineering, Faculty of Engineering, Mansoura University, Mansoura, Egypt
- Mohamed Maher Ata
- School of Computational Sciences and Artificial Intelligence (CSAI), Zewail City of Science and Technology, October Gardens, 6th of October City, 12578, Giza, Egypt
3. Alsubai S, Alqahtani A, Alanazi A, Sha M, Gumaei A. Facial emotion recognition using deep quantum and advanced transfer learning mechanism. Front Comput Neurosci 2024;18:1435956. PMID: 39539995; PMCID: PMC11557492; DOI: 10.3389/fncom.2024.1435956.
Abstract
Introduction: Facial expressions have become a common means of interaction among humans. People cannot comprehend and predict the emotions or expressions of individuals through simple observation alone. Thus, in psychology, detecting facial expressions or analyzing emotion demands assessment and evaluation to identify the emotions of a person or any group during communication. With the recent evolution of technology, Artificial Intelligence (AI) has gained significant usage, wherein Deep Learning (DL) based algorithms are employed for detecting facial expressions.
Methods: The study proposes a system design that detects facial expressions by extracting relevant features using a Modified ResNet model. The proposed system stacks building blocks with residual connections and employs an advanced extraction method with quantum computing, which significantly reduces computation time compared to conventional methods. The backbone stem utilizes a quantum convolutional layer comprised of several parameterized quantum filters. Additionally, the research integrates residual connections in the ResNet-18 model with the Modified up Sampled Bottle Neck Process (MuS-BNP), retaining computational efficacy while benefiting from residual connections.
Results: The proposed model demonstrates superior performance by overcoming the issue of maximum similarity within varied facial expressions. The system's ability to accurately detect and differentiate between expressions is measured using performance metrics such as accuracy, F1-score, recall, and precision.
Discussion: This performance analysis confirms the efficacy of the proposed system, highlighting the advantages of quantum computing in feature extraction and the integration of residual connections. The model achieves quantum superiority, providing faster and more accurate computations compared to existing methodologies. The results suggest that the proposed approach offers a promising solution for facial expression recognition tasks, significantly improving both speed and accuracy.
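The residual connections referenced in this abstract follow the standard ResNet pattern y = ReLU(x + F(x)). The quantum convolutional layers are beyond a short sketch, but the plain residual shortcut can be illustrated in a few lines of NumPy (a toy fully-connected block, not the paper's architecture):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Minimal fully-connected residual block: y = ReLU(x + F(x)).

    F is two linear maps with a ReLU in between; the identity shortcut
    lets information (and gradients) bypass F, which is what makes
    deep stacks of such blocks trainable.
    """
    fx = relu(x @ W1) @ W2
    return relu(x + fx)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((4, 8)) * 0.1
W2 = rng.standard_normal((8, 4)) * 0.1

y = residual_block(x, W1, W2)
# With zero weights, F vanishes and the block reduces to ReLU(x):
y_id = residual_block(x, np.zeros((4, 8)), np.zeros((8, 4)))
```

The zero-weight case shows why stacking these blocks is safe: an untrained block starts close to the identity instead of scrambling its input.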
Affiliation(s)
- Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Abdullah Alqahtani
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Abed Alanazi
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Mohemmed Sha
- Department of Software Engineering, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Abdu Gumaei
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
4. Pereira R, Mendes C, Ribeiro J, Ribeiro R, Miragaia R, Rodrigues N, Costa N, Pereira A. Systematic Review of Emotion Detection with Computer Vision and Deep Learning. Sensors (Basel) 2024;24:3484. PMID: 38894274; PMCID: PMC11175284; DOI: 10.3390/s24113484.
Abstract
Emotion recognition has become increasingly important in the fields of Deep Learning (DL) and computer vision due to its broad applicability in human-computer interaction (HCI) areas such as psychology, healthcare, and entertainment. In this paper, we conduct a systematic review of facial and pose emotion recognition using DL and computer vision, analyzing and evaluating 77 papers from different sources under the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Our review covers several topics, including the scope and purpose of the studies, the methods employed, and the datasets used. The studies were categorized based on a proposed taxonomy that describes the type of expressions used for emotion detection, the testing environment, the currently relevant DL methods, and the datasets used. The taxonomy of methods in our review includes Convolutional Neural Network (CNN), Faster Region-based Convolutional Neural Network (Faster R-CNN), Vision Transformer (ViT), and "Other NNs", which are the most commonly used models in the analyzed studies, indicating their prevalence in the field. Hybrid and augmented models are not explicitly categorized within this taxonomy, but they remain important to the field. This review offers an understanding of state-of-the-art computer vision algorithms and datasets for emotion recognition through facial expressions and body poses, allowing researchers to understand the field's fundamental components and trends.
Affiliation(s)
- Rafael Pereira
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal
- Carla Mendes
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal
- José Ribeiro
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal
- Roberto Ribeiro
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal
- Rolando Miragaia
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal
- Nuno Rodrigues
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal
- Nuno Costa
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal
- António Pereira
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal
- INOV INESC Inovação, Institute of New Technologies, Leiria Office, 2411-901 Leiria, Portugal
5. Ding J, Hou C, Zhao Y, Liu H, Hu Z, Meng F, Liang S. Virtual draw of microstructured optical fiber based on physics-informed neural networks. Opt Express 2024;32:9316-9331. PMID: 38571169; DOI: 10.1364/oe.518238.
Abstract
The implementation of microstructured optical fibers (MOFs) with novel micro-structures and perfect performance is challenging due to the complex fabrication processes. Physics-informed neural networks (PINNs) offer what we believe to be a new approach to solving complex partial differential equations within the virtual fabrication model of MOFs. This study, for what appears to be the first time, integrates the complex partial differential equations and boundary conditions describing the fiber drawing process into the loss function of a neural network. To more accurately solve the free boundary of the fiber's inner and outer diameters, we additionally construct a neural network to describe the free boundary conditions. This model not only captures the evolution of the fiber's inner and outer diameters but also provides the velocity distribution and pressure distribution within the molten glass, thus laying the foundation for a quantitative analysis of capillary collapse. Furthermore, results indicate that the trends in the effects of temperature, feed speed, and draw speed on the fiber drawing process align with actual fabrication conditions, validating the feasibility of the model. The methodology proposed in this study offers what we believe to be a novel approach to simulating the fiber drawing process and holds promise for advancing the practical applications of MOFs.
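The core PINN idea referenced in this abstract, folding the governing equation's residual into the training loss, can be sketched on a toy problem. The sketch below uses a finite difference in place of the automatic differentiation a real PINN would use, and a simple ODE rather than the paper's fiber-drawing equations:

```python
import numpy as np

def pinn_style_loss(u, x, u0=1.0, h=1e-5):
    """Physics-informed loss for the toy ODE u'(x) = u(x), u(0) = u0.

    Real PINNs differentiate a neural network with autodiff; here a
    central finite difference stands in for that derivative so the
    illustration stays dependency-free.
    """
    du = (u(x + h) - u(x - h)) / (2 * h)        # approximate u'(x)
    physics_residual = du - u(x)                 # zero iff the ODE holds
    boundary_residual = u(np.array([0.0])) - u0  # enforce u(0) = u0
    return np.mean(physics_residual**2) + np.mean(boundary_residual**2)

x = np.linspace(0.0, 1.0, 50)   # collocation points
good = pinn_style_loss(np.exp, x)   # exp solves u' = u exactly
bad = pinn_style_loss(np.cos, x)    # cos does not
print(good, bad)                    # near-zero loss vs. a large one
```

Training a PINN means minimizing exactly this kind of composite loss over the network's parameters, so that the fitted function satisfies both the differential equation at the collocation points and the boundary conditions, which is how the fiber model above can absorb its free-boundary conditions into a second network's loss.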
6. Cross MP, Acevedo AM, Hunter JF. A Critique of Automated Approaches to Code Facial Expressions: What Do Researchers Need to Know? Affect Sci 2023;4:500-505. PMID: 37744972; PMCID: PMC10514002; DOI: 10.1007/s42761-023-00195-0.
Abstract
Facial expression recognition software is becoming more commonly used by affective scientists to measure facial expressions. Although the use of this software has exciting implications, there are persistent and concerning issues regarding the validity and reliability of these programs. In this paper, we highlight three of these issues: biases of the programs against certain skin colors and genders; the common inability of these programs to capture facial expressions made in non-idealized conditions (e.g., "in the wild"); and programs being forced to adopt the underlying assumptions of the specific theory of emotion on which each software is based. We then discuss three directions for the future of affective science in the area of automated facial coding. First, researchers need to be cognizant of exactly how and on which data sets the machine learning algorithms underlying these programs are being trained. In addition, there are several ethical considerations, such as privacy and data storage, surrounding the use of facial expression recognition programs. Finally, researchers should consider collecting additional emotion data, such as body language, and combine these data with facial expression data in order to achieve a more comprehensive picture of complex human emotions. Facial expression recognition programs are an excellent method of collecting facial expression data, but affective scientists should ensure that they recognize the limitations and ethical implications of these programs.
Affiliation(s)
- Marie P. Cross
- Department of Biobehavioral Health, Pennsylvania State University, University Park, PA, USA
- Amanda M. Acevedo
- Basic Biobehavioral and Psychological Sciences Branch, National Cancer Institute, Rockville, MD, USA
- John F. Hunter
- Department of Psychology, Chapman University, Orange, CA, USA
7. Pushpalatha MN, Meherishi H, Vaishnav A, Anurag Pillai R, Gupta A. Facial emotion recognition and encoding application for the visually impaired. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-07807-z.
8. Saurav S, Saini R, Singh S. Fast facial expression recognition using Boosted Histogram of Oriented Gradient (BHOG) features. Pattern Anal Appl 2022. DOI: 10.1007/s10044-022-01112-0.
9. A cascaded spatiotemporal attention network for dynamic facial expression recognition. Appl Intell 2022. DOI: 10.1007/s10489-022-03781-0.
10. Mehta NK, Prasad SS, Saurav S, Saini R, Singh S. Three-dimensional DenseNet self-attention neural network for automatic detection of student's engagement. Appl Intell 2022;52:13803-13823. PMID: 35340984; PMCID: PMC8932470; DOI: 10.1007/s10489-022-03200-4.
Abstract
Today, due to the widespread outbreak of the deadly coronavirus, popularly known as COVID-19, traditional classroom education has shifted to computer-based learning. Students of various cognitive and psychological abilities participate in the learning process. However, most students are hesitant to provide regular and honest feedback on the comprehensiveness of the course, making it difficult for the instructor to ensure that all students are grasping the information at the same rate. Students' understanding of the course and their emotional engagement, as indicated via facial expressions, are intertwined. This paper presents a three-dimensional DenseNet self-attention neural network (DenseAttNet) used to identify and evaluate student participation in modern and traditional educational programs. On the Dataset for Affective States in E-Environments (DAiSEE), the proposed DenseAttNet model outperformed all other existing methods, achieving baseline accuracies of 63.59% for engagement classification and 54.27% for boredom classification. Besides, DenseAttNet trained on all four labels, namely boredom, engagement, confusion, and frustration, registered accuracies of 81.17%, 94.85%, 90.96%, and 95.85%, respectively. In addition, we performed a regression experiment on DAiSEE and obtained the lowest Mean Square Error (MSE) value of 0.0347. Finally, the proposed approach achieves a competitive MSE of 0.0877 when validated on the Emotion Recognition in the Wild Engagement Prediction (EmotiW-EP) dataset.
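The self-attention component named in this abstract is, at its core, the standard scaled dot-product mechanism. A minimal single-head NumPy sketch of that mechanism (not the paper's 3D formulation; all names here are illustrative):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    X: (T, d) sequence, e.g. per-frame features of a video clip.
    Each output position is a weighted mix of all value vectors, with
    weights given by a row-wise softmax over query-key similarities.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # (T, T) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ V

rng = np.random.default_rng(1)
T, d = 6, 4
X = rng.standard_normal((T, d))
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 4): same sequence length, attended features
```

Letting every frame attend to every other frame is what allows an engagement model to pool evidence across a whole clip rather than judging frames in isolation.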
Affiliation(s)
- Naval Kishore Mehta
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- CSIR-Central Electronics Engineering Research Institute (CSIR-CEERI), Pilani, India
- Shyam Sunder Prasad
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- CSIR-Central Electronics Engineering Research Institute (CSIR-CEERI), Pilani, India
- Sumeet Saurav
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- CSIR-Central Electronics Engineering Research Institute (CSIR-CEERI), Pilani, India
- Ravi Saini
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- CSIR-Central Electronics Engineering Research Institute (CSIR-CEERI), Pilani, India
- Sanjay Singh
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- CSIR-Central Electronics Engineering Research Institute (CSIR-CEERI), Pilani, India
11. Wu P, Pan K, Ji L, Gong S, Feng W, Yuan W, Pain C. Navier–Stokes Generative Adversarial Network: a physics-informed deep learning model for fluid flow generation. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-07042-6.