1. Liao J, Lin Y, Ma T, He S, Liu X, He G. Facial Expression Recognition Methods in the Wild Based on Fusion Feature of Attention Mechanism and LBP. Sensors (Basel). 2023;23(9):4204. PMID: 37177408; PMCID: PMC10180539; DOI: 10.3390/s23094204.
Abstract
Facial expression recognition plays a vital role in human-computer interaction and other fields, but occlusion, illumination, and pose changes in in-the-wild face images, together with category imbalance across datasets, cause large variations in recognition rates and low per-category accuracy. This study introduces RCL-Net, a method for recognizing facial expressions in the wild based on the fusion of an attention mechanism with LBP features. The network consists of two main branches: a ResNet-CBAM residual attention branch and a local binary pattern (LBP) feature extraction branch. First, a residual attention network is built by merging a residual network with a hybrid attention mechanism to emphasize local detail in facial expressions; salient expression characteristics are extracted along both the channel and spatial dimensions to form the residual attention classification model. Second, LBP features are introduced into the feature extraction stage to capture texture information from expression images, further emphasizing facial detail and improving recognition accuracy. Finally, experiments on the FER2013, FERPlus, CK+, and RAF-DB datasets demonstrate that the proposed method generalizes better and is more robust than recent methods in both laboratory-controlled and in-the-wild settings.
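As a rough illustration of the hybrid attention idea behind the ResNet-CBAM branch, the following PyTorch sketch implements a standard CBAM block (channel attention followed by spatial attention); it is not the authors' released code, and the module layout is an assumption.

```python
# Minimal CBAM-style hybrid attention sketch (channel + spatial),
# illustrating the attention idea named in the abstract; names and
# hyperparameters (reduction ratio, kernel size) are illustrative.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # pool along the channel axis
        mx = x.amax(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAMBlock(nn.Module):
    """Channel attention followed by spatial attention, as in CBAM."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```

In a residual attention branch, a block like this would typically be inserted after each residual stage so the attention-weighted features are added back to the identity path.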
Affiliation(s)
- Jun Liao
- Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- College of Mechanical Engineering, Chongqing University of Technology, Chongqing 400054, China
- Chongqing Key Laboratory of Artificial Intelligence and Service Robot Control Technology, Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Yuanchang Lin
- Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Chongqing Key Laboratory of Artificial Intelligence and Service Robot Control Technology, Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Tengyun Ma
- Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Chongqing Key Laboratory of Artificial Intelligence and Service Robot Control Technology, Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Songxiying He
- Chongqing Key Laboratory of Artificial Intelligence and Service Robot Control Technology, Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Xiaofang Liu
- Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Chongqing Key Laboratory of Artificial Intelligence and Service Robot Control Technology, Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Guotian He
- Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Chongqing Key Laboratory of Artificial Intelligence and Service Robot Control Technology, Chongqing Institute of Green Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
2. Ahmad M, Sanawar S, Alfandi O, Qadri SF, Saeed IA, Khan S, Hayat B, Ahmad A. Facial expression recognition using lightweight deep learning modeling. Math Biosci Eng. 2023;20:8208-8225. PMID: 37161193; DOI: 10.3934/mbe.2023357.
Abstract
Facial expression is a form of communication and is useful in many areas of computer vision, including intelligent visual surveillance, human-robot interaction, and human behavior analysis. A deep learning approach is presented to classify happy, sad, angry, fearful, contemptuous, surprised, and disgusted expressions. Accurately detecting and classifying facial expressions is difficult because of complicating factors such as illumination changes, occlusion, noise, and over-fitting. A stacked sparse auto-encoder for facial expression recognition (SSAE-FER) is trained with unsupervised pre-training followed by supervised fine-tuning: SSAE-FER automatically extracts features from input images, and a softmax classifier assigns the expression labels. The method achieved an accuracy of 92.50% on the JAFFE dataset and 99.30% on the CK+ dataset, comparing favorably with other methods in the same domain.
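A minimal sketch of the stacked-sparse-autoencoder pipeline the abstract describes, assuming 48x48 grayscale faces (2304-dim vectors) and illustrative layer sizes; the paper's exact architecture and hyperparameters are not reproduced here.

```python
# SSAE-style pipeline sketch: layer-wise unsupervised pre-training
# followed by supervised softmax fine-tuning. All sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAE(nn.Module):
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.enc = nn.Linear(n_in, n_hidden)
        self.dec = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        return self.dec(h), h

def kl_sparsity(h, rho=0.05):
    """KL penalty pulling mean hidden activations toward sparsity target rho."""
    rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

def pretrain(ae, data, epochs=20, beta=1e-3):
    """Unsupervised layer-wise pre-training: reconstruction + sparsity."""
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for _ in range(epochs):
        recon, h = ae(data)
        loss = F.mse_loss(recon, data) + beta * kl_sparsity(h)
        opt.zero_grad(); loss.backward(); opt.step()

# Layer-wise stacking: pretrain AE1 on pixels, AE2 on AE1's hidden codes,
# then fine-tune both encoders plus a softmax head with cross-entropy.
ae1, ae2 = SparseAE(2304, 512), SparseAE(512, 128)
# pretrain(ae1, X); h1 = torch.sigmoid(ae1.enc(X)).detach(); pretrain(ae2, h1)
classifier = nn.Sequential(ae1.enc, nn.Sigmoid(), ae2.enc, nn.Sigmoid(),
                           nn.Linear(128, 7))  # 7 expression classes
```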
Affiliation(s)
- Mubashir Ahmad
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Tobe Camp, Abbottabad-22060, Pakistan
- Department of Computer Science, the University of Lahore, Sargodha Campus 40100, Pakistan
- Saira Sanawar
- Department of Computer Science, the University of Lahore, Sargodha Campus 40100, Pakistan
- Omar Alfandi
- College of Technological Innovation, Zayed University, Abu Dhabi, UAE
- Syed Furqan Qadri
- Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou 311121, China
- Iftikhar Ahmed Saeed
- Department of Computer Science, the University of Lahore, Sargodha Campus 40100, Pakistan
- Salabat Khan
- College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
- Bashir Hayat
- Department of Computer Science, Institute of Management Sciences, Peshawar, Pakistan
- Arshad Ahmad
- Department of IT & CS, Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology (PAF-IAST), Haripur 22620, Pakistan
3. Gong X, Jia L, Li N. Research on mobile traffic data augmentation methods based on SA-ACGAN-GN. Math Biosci Eng. 2022;19:11512-11532. PMID: 36124601; DOI: 10.3934/mbe.2022536.
Abstract
With the rapid development and application of the mobile Internet, mobile traffic must be analyzed and classified to meet users' needs. Because data for some applications are difficult to collect, mobile traffic follows a long-tailed distribution, which lowers classification accuracy. In addition, the original GAN is difficult to train and prone to mode collapse. This paper therefore introduces a self-attention mechanism and gradient normalization into the auxiliary classifier GAN, forming the SA-ACGAN-GN model, to address the long-tailed distribution and training stability problems of mobile traffic data. The method first converts the traffic into images; second, to improve the quality of the generated images, it adds a self-attention mechanism to the ACGAN model to capture the images' global geometric features; finally, it applies a gradient normalization strategy to SA-ACGAN to further improve the augmentation effect and stabilize training. Cross-validation experiments show that, with the same downstream classifier, the proposed SA-ACGAN-GN algorithm achieves the best precision among the compared algorithms, reaching 93.8%. After gradient normalization is added, the classification loss decreases rapidly during training and the loss curve fluctuates less, indicating that the method not only effectively mitigates the dataset's long-tail problem but also improves training stability.
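For intuition, the sketch below shows one published formulation of gradient normalization for a GAN discriminator (rescaling the critic output by its input-gradient norm); the paper's exact formulation may differ, and the function names are illustrative.

```python
# Gradient-normalization sketch for a GAN critic:
# f_hat = f / (||grad_x f|| + |f|), which bounds the effective
# Lipschitz constant of the critic and stabilizes adversarial training.
import torch

def grad_normalized_output(disc, x):
    """Return gradient-normalized critic scores for input batch x."""
    x = x.clone().requires_grad_(True)
    f = disc(x)                                   # raw scores, shape (B, 1)
    grad, = torch.autograd.grad(f.sum(), x, create_graph=True)
    grad_norm = grad.flatten(1).norm(dim=1, keepdim=True)
    return f / (grad_norm + f.abs() + 1e-12)      # differentiable rescaling
```

Because `create_graph=True` keeps the normalization differentiable, both generator and discriminator updates can backpropagate through the rescaled scores.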
Affiliation(s)
- Xingyu Gong
- College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710054, China
- Ling Jia
- College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710054, China
- Hanzhong Vocational School of Science and Technology, Hanzhong 723200, China
- Na Li
- College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710054, China
4. Wang H, Sun G, Zheng K, Li H, Liu J, Bai Y. Privacy protection generalization with adversarial fusion. Math Biosci Eng. 2022;19:7314-7336. PMID: 35730308; DOI: 10.3934/mbe.2022345.
Abstract
Several biometric privacy-enhancing techniques have been proposed to protect face image privacy. However, a face privacy protection algorithm is usually designed for one specific face recognition algorithm (FRA), and when the structure or threshold of that FRA is fine-tuned, the protection may fail; with existing technology, targeting multiple FRAs simultaneously bloats the network and distorts the image. To address this problem, a fusion technique is developed that copes with changing face recognition algorithms via image perturbation. The perturbation is produced by an improved GAN comprising a generator, nozzles, and a validator, referred to as the Adversarial Fusion algorithm. A nozzle structure is proposed to replace the discriminator: running multiple face recognition algorithms in parallel on the nozzle improves the compatibility of the generated image. A validator is then added to the training network and participates in the inverse feedback coupling of the generator, ensuring that the generated images remain visually unchanged to human observers. Furthermore, group hunting theory is invoked to stabilize the network and make training up to 4.8 times faster than other models. The experimental results show that the Adversarial Fusion algorithm not only changes the image feature distribution by over 42% but also handles at least five commercial face recognition algorithms at the same time.
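The following sketch illustrates only the general idea of training one perturbation generator against several frozen face recognizers in parallel; `generator` and `fr_models` are hypothetical placeholders, and this is not the paper's Adversarial Fusion implementation (in particular, the validator's visual-fidelity constraint is omitted).

```python
# Hypothetical multi-recognizer ("nozzle"-style) perturbation loss:
# push the protected image's embedding away from the original under
# every face recognizer at once, with a bounded perturbation.
import torch
import torch.nn.functional as F

def nozzle_loss(generator, fr_models, faces, eps=0.05):
    """Average cosine similarity across recognizers; minimizing it
    decorrelates protected and original embeddings for all models."""
    delta = eps * torch.tanh(generator(faces))    # perturbation in [-eps, eps]
    protected = (faces + delta).clamp(0, 1)
    loss = 0.0
    for fr in fr_models:                          # frozen recognizers
        with torch.no_grad():
            ref = fr(faces)                       # original embedding
        emb = fr(protected)                       # grads flow to generator
        loss += F.cosine_similarity(emb, ref, dim=1).mean()
    return loss / len(fr_models)
```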
Affiliation(s)
- Hao Wang
- Beijing University of Technology, Beijing 100124, China
- Guangmin Sun
- Beijing University of Technology, Beijing 100124, China
- Kun Zheng
- Beijing University of Technology, Beijing 100124, China
- Hui Li
- Beijing University of Technology, Beijing 100124, China
- Jie Liu
- Beijing University of Technology, Beijing 100124, China
- Yu Bai
- Beijing Friendship Hospital, Beijing 100050, China
5. Nandhini Abirami R, Durai Raj Vincent PM, Srinivasan K, Manic KS, Chang CY. Multimodal Medical Image Fusion of Positron Emission Tomography and Magnetic Resonance Imaging Using Generative Adversarial Networks. Behav Neurol. 2022;2022:6878783. PMID: 35464043; PMCID: PMC9023223; DOI: 10.1155/2022/6878783.
Abstract
Multimodal medical image fusion combines images from the same or different modalities to improve their visual content for further operations such as image segmentation, and it is in high demand in biomedical research and medical image analysis. Fusing multimodal brain images lets medical practitioners visualize hard structures such as the skull and soft tissue simultaneously, and brain tumor segmentation can be performed more accurately on the fused image: the tumor can be located precisely using the combined information of Positron Emission Tomography (PET) and Magnetic Resonance Imaging (MRI) in a single image, which increases diagnostic accuracy and reduces the time needed to diagnose and locate the tumor. PET carries the functional information of the brain, while MRI provides the anatomy of the brain tissue, so a robust multimodal fusion model yields both spatial characteristics and functional information in one image. The proposed approach uses a generative adversarial network to fuse PET and MRI into a single image, whose output can support further medical analysis such as tumor localization and surgical planning. The GAN-based model is evaluated using two metrics, structural similarity index (SSIM) and mutual information, achieving an SSIM of 0.8551 and a mutual information of 2.8059.
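The two reported metrics can be computed as in the sketch below, which assumes 8-bit grayscale NumPy arrays and uses scikit-image's SSIM together with a histogram-based mutual information estimate; the paper's own evaluation code is not shown.

```python
# Evaluation sketch: SSIM and mutual information between a fused image
# and each source modality, for 8-bit grayscale arrays of equal shape.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mutual_information(a, b, bins=64):
    """MI (in bits) from the joint gray-level histogram of two images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                                  # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum())

def evaluate_fusion(fused, pet, mri):
    # report both metrics of the fused image against both source modalities
    return {
        "ssim_pet": ssim(fused, pet, data_range=255),
        "ssim_mri": ssim(fused, mri, data_range=255),
        "mi_pet": mutual_information(fused, pet),
        "mi_mri": mutual_information(fused, mri),
    }
```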
Affiliation(s)
- R. Nandhini Abirami
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, India
- P. M. Durai Raj Vincent
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, India
- Kathiravan Srinivasan
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- K. Suresh Manic
- Department of Electrical and Communication Engineering, National University of Science and Technology, Muscat, Oman
- Chuan-Yu Chang
- Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Yunlin 64002, Taiwan
- Service Systems Technology Center, Industrial Technology Research Institute, Hsinchu, Taiwan
6. Yang S, Qiao K, Qin R, Xie P, Shi S, Liang N, Wang L, Chen J, Hu G, Yan B. ShapeEditor: A StyleGAN Encoder for Stable and High Fidelity Face Swapping. Front Neurorobot. 2022;15:785808. PMID: 35126081; PMCID: PMC8814752; DOI: 10.3389/fnbot.2021.785808.
Abstract
With the continuous development of deep-learning technology, ever more advanced face-swapping methods are being proposed. Recently, face-swapping methods based on generative adversarial networks (GANs) have achieved many-to-many face exchange from few samples, advancing the field. However, the images generated by previous GAN-based methods are often unstable, fundamentally because the GANs in these frameworks struggle to fully converge to the distribution of the face space during training. To solve this problem, we propose a novel face-swapping method built on a pretrained StyleGAN generator, which offers stronger high-quality face image generation. The critical issue is how to control StyleGAN so that it generates the swapped image accurately. We design a control strategy for the generator based on the idea of encoding and decoding, and propose an encoder called ShapeEditor to complete this task. ShapeEditor is a two-step encoder that produces a set of coding vectors integrating the identity and attributes of the input faces: in the first step, we extract the identity vector of the source image and the attribute vector of the target image; in the second step, we map the concatenation of the identity and attribute vectors into the latent space of StyleGAN. Extensive experiments on the test dataset show that the proposed method is not only superior in clarity and authenticity to other state-of-the-art methods but also integrates identity and attributes sufficiently well.
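A conceptual sketch of the two-step encoding follows; every module here (identity encoder, attribute encoder, the W+ mapper, the StyleGAN generator) is a placeholder standing in for a pretrained network, and all shapes and dimensions are assumptions rather than the paper's specification.

```python
# Two-step encoder sketch in the spirit of ShapeEditor:
# step 1 extracts identity (source) and attribute (target) codes,
# step 2 maps their concatenation into StyleGAN's extended W+ space.
import torch
import torch.nn as nn

class ShapeEditorSketch(nn.Module):
    def __init__(self, id_encoder, attr_encoder, id_dim=512, attr_dim=512,
                 n_styles=18, w_dim=512):
        super().__init__()
        self.id_encoder = id_encoder        # e.g., a frozen face-ID network
        self.attr_encoder = attr_encoder    # pose/expression/lighting encoder
        self.mapper = nn.Linear(id_dim + attr_dim, n_styles * w_dim)
        self.n_styles, self.w_dim = n_styles, w_dim

    def forward(self, source, target):
        z_id = self.id_encoder(source)      # step 1a: identity of source
        z_attr = self.attr_encoder(target)  # step 1b: attributes of target
        w_plus = self.mapper(torch.cat([z_id, z_attr], dim=1))  # step 2
        return w_plus.view(-1, self.n_styles, self.w_dim)

# swapped = stylegan_generator(shape_editor(source_img, target_img))
```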
Affiliation(s)
- Bin Yan
- Henan Key Laboratory of Imaging and Intelligent Processing, People's Liberation Army (PLA) Strategy Support Force Information Engineering University, Zhengzhou, China
7. Nandhini Abirami R, Durai Raj Vincent PM. Low-Light Image Enhancement Based on Generative Adversarial Network. Front Genet. 2021;12:799777. PMID: 34912381; PMCID: PMC8667858; DOI: 10.3389/fgene.2021.799777.
Abstract
Image enhancement is one of the more complex tasks in image processing. When images are captured in dim light, their quality degrades due to low visibility, which in turn degrades the performance of vision-based algorithms designed for high-quality, well-lit images. Since the emergence of deep neural networks, a number of methods have been put forward to improve images captured under low light, but the results of existing low-light enhancement methods remain unsatisfactory for lack of effective network structures. This paper presents a low-light image enhancement technique (LIMET) based on a fine-tuned conditional generative adversarial network. The approach employs two discriminators to capture semantics that force the results to be realistic and natural. It is evaluated on benchmark datasets, and the experimental results show that it attains state-of-the-art performance compared with existing methods. Performance is assessed using Visual Information Fidelity (VIF), which measures the quality of the generated image relative to the degraded input; the proposed approach obtains VIF scores of 0.709123 on the LIME dataset, 0.849982 on the DICM dataset, and 0.619342 on the MEF dataset.
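As a hedged illustration of a generator update driven by two discriminators, the sketch below assumes one global and one patch-local discriminator, paired normal-light references, and an illustrative loss weight; it is not the paper's LIMET code.

```python
# Two-discriminator generator step sketch: fool a global and a local
# (patch) discriminator while staying close to the normal-light reference.
import torch
import torch.nn.functional as F

def random_patch(imgs, size=64):
    """One random crop per batch for the local discriminator."""
    _, _, h, w = imgs.shape
    y = torch.randint(0, h - size + 1, (1,)).item()
    x = torch.randint(0, w - size + 1, (1,)).item()
    return imgs[:, :, y:y + size, x:x + size]

def generator_step(gen, d_global, d_local, low, ref, opt, lam=10.0):
    """Non-saturating GAN loss from both discriminators + L1 content term."""
    enhanced = gen(low)
    adv = (F.softplus(-d_global(enhanced)).mean()
           + F.softplus(-d_local(random_patch(enhanced))).mean())
    loss = adv + lam * F.l1_loss(enhanced, ref)   # ref: normal-light target
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```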
Affiliation(s)
- Durai Raj Vincent P. M.
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
8. Wang S, Sun G, Zheng B, Du Y. A Crop Image Segmentation and Extraction Algorithm Based on Mask RCNN. Entropy (Basel). 2021;23(9):1160. PMID: 34573785; PMCID: PMC8469590; DOI: 10.3390/e23091160.
Abstract
The wide variety of crops in agricultural product images, and their confusion with surrounding environmental information, make it difficult for traditional methods to extract crops accurately and efficiently. In this paper, an automatic crop image extraction algorithm based on Mask RCNN is proposed. First, the Fruits 360 dataset is labeled with Labelme, preprocessed, and divided into a training set and a test set. An improved Mask RCNN network structure is then built with the PyTorch 1.8.1 deep learning framework: path aggregation and feature enhancement functions are added to the network design, and the region proposal network and feature pyramid network are optimized. The spatial information of the feature map is preserved by bilinear interpolation in ROIAlign. Finally, the edge accuracy of the segmentation mask is further improved by adding a small fully connected layer to the mask branch of the ROI output, using the Sobel operator to predict the target edge, and adding an edge loss to the loss function. Experimental results demonstrate that, compared with FCN, Mask RCNN, and other image extraction algorithms, the improved Mask RCNN algorithm achieves better precision, recall, average precision, mean average precision, and F1 scores for crop image extraction.
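The Sobel-based edge loss can be sketched as follows; the kernels are the standard Sobel filters, while the way the edge term is weighted into Mask RCNN's total loss is an assumption.

```python
# Sobel edge-loss sketch for a mask branch: compute edge magnitudes of
# predicted and ground-truth masks and penalize their disagreement.
import torch
import torch.nn.functional as F

_SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
_SOBEL_Y = _SOBEL_X.t()

def sobel_edges(mask):
    """Edge magnitude of a (B, 1, H, W) mask via Sobel convolution."""
    kx = _SOBEL_X.view(1, 1, 3, 3).to(mask.device)
    ky = _SOBEL_Y.view(1, 1, 3, 3).to(mask.device)
    gx = F.conv2d(mask, kx, padding=1)
    gy = F.conv2d(mask, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_loss(pred_mask, gt_mask):
    return F.mse_loss(sobel_edges(pred_mask), sobel_edges(gt_mask))

# total_loss = mask_rcnn_loss + edge_weight * edge_loss(pred, gt)
# (edge_weight is a tunable hyperparameter, assumed here)
```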