1
|
Yang X, Li R, Yang X, Zhou Y, Liu Y, Han JDJ. Coordinate-wise monotonic transformations enable privacy-preserving age estimation with 3D face point cloud. SCIENCE CHINA. LIFE SCIENCES 2024; 67:1489-1501. [PMID: 38573362 DOI: 10.1007/s11427-023-2518-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Accepted: 12/25/2023] [Indexed: 04/05/2024]
Abstract
The human face is a valuable biomarker of aging, but the collection and use of its image raise significant privacy concerns. Here we present an approach for facial data masking that preserves age-related features using coordinate-wise monotonic transformations. We first develop a deep learning model that estimates age directly from non-registered face point clouds with high accuracy and generalizability. We show that the model learns a highly indistinguishable mapping using faces treated with coordinate-wise monotonic transformations, indicating that the relative positioning of facial information is a low-level biomarker of facial aging. Through visual perception tests and computational 3D face verification experiments, we demonstrate that transformed faces are significantly more difficult to perceive for human but not for machines, except when only the face shape information is accessible. Our study leads to a facial data protection guideline that has the potential to broaden public access to face datasets with minimized privacy risks.
Collapse
Affiliation(s)
- Xinyu Yang
- School of Life Sciences, Peking University, Beijing, 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
| | - Runhan Li
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
| | - Xindi Yang
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
| | - Yong Zhou
- Clinical Research Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Yi Liu
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
| | - Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
| |
Collapse
|
2
|
Pereira R, Mendes C, Ribeiro J, Ribeiro R, Miragaia R, Rodrigues N, Costa N, Pereira A. Systematic Review of Emotion Detection with Computer Vision and Deep Learning. SENSORS (BASEL, SWITZERLAND) 2024; 24:3484. [PMID: 38894274 PMCID: PMC11175284 DOI: 10.3390/s24113484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/29/2024] [Revised: 05/20/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024]
Abstract
Emotion recognition has become increasingly important in the field of Deep Learning (DL) and computer vision due to its broad applicability by using human-computer interaction (HCI) in areas such as psychology, healthcare, and entertainment. In this paper, we conduct a systematic review of facial and pose emotion recognition using DL and computer vision, analyzing and evaluating 77 papers from different sources under Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Our review covers several topics, including the scope and purpose of the studies, the methods employed, and the used datasets. The scope of this work is to conduct a systematic review of facial and pose emotion recognition using DL methods and computer vision. The studies were categorized based on a proposed taxonomy that describes the type of expressions used for emotion detection, the testing environment, the currently relevant DL methods, and the datasets used. The taxonomy of methods in our review includes Convolutional Neural Network (CNN), Faster Region-based Convolutional Neural Network (R-CNN), Vision Transformer (ViT), and "Other NNs", which are the most commonly used models in the analyzed studies, indicating their trendiness in the field. Hybrid and augmented models are not explicitly categorized within this taxonomy, but they are still important to the field. This review offers an understanding of state-of-the-art computer vision algorithms and datasets for emotion recognition through facial expressions and body poses, allowing researchers to understand its fundamental components and trends.
Collapse
Affiliation(s)
- Rafael Pereira
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal; (R.P.); (C.M.); (J.R.); (R.R.); (R.M.); (N.R.); (N.C.)
| | - Carla Mendes
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal; (R.P.); (C.M.); (J.R.); (R.R.); (R.M.); (N.R.); (N.C.)
| | - José Ribeiro
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal; (R.P.); (C.M.); (J.R.); (R.R.); (R.M.); (N.R.); (N.C.)
| | - Roberto Ribeiro
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal; (R.P.); (C.M.); (J.R.); (R.R.); (R.M.); (N.R.); (N.C.)
| | - Rolando Miragaia
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal; (R.P.); (C.M.); (J.R.); (R.R.); (R.M.); (N.R.); (N.C.)
| | - Nuno Rodrigues
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal; (R.P.); (C.M.); (J.R.); (R.R.); (R.M.); (N.R.); (N.C.)
| | - Nuno Costa
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal; (R.P.); (C.M.); (J.R.); (R.R.); (R.M.); (N.R.); (N.C.)
| | - António Pereira
- Computer Science and Communications Research Centre, School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal; (R.P.); (C.M.); (J.R.); (R.R.); (R.M.); (N.R.); (N.C.)
- INOV INESC Inovação, Institute of New Technologies, Leiria Office, 2411-901 Leiria, Portugal
| |
Collapse
|
3
|
Kong W, You Z, Lv X. 3D face recognition algorithm based on deep Laplacian pyramid under the normalization of epidemic control. COMPUTER COMMUNICATIONS 2023; 199:30-41. [PMID: 36531215 PMCID: PMC9744674 DOI: 10.1016/j.comcom.2022.12.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Revised: 12/02/2022] [Accepted: 12/06/2022] [Indexed: 06/17/2023]
Abstract
Under the normalization of epidemic control in COVID-19, it is essential to realize fast and high-precision face recognition without feeling for epidemic prevention and control. This paper proposes an innovative Laplacian pyramid algorithm for deep 3D face recognition, which can be used in public. Through multi-mode fusion, dense 3D alignment and multi-scale residual fusion are ensured. Firstly, the 2D to 3D structure representation method is used to fully correlate the information of crucial points, and dense alignment modeling is carried out. Then, based on the 3D critical point model, a five-layer Laplacian depth network is constructed. High-precision recognition can be achieved by multi-scale and multi-modal mapping and reconstruction of 3D face depth images. Finally, in the training process, the multi-scale residual weight is embedded into the loss function to improve the network's performance. In addition, to achieve high real-time performance, our network is designed in an end-to-end cascade. While ensuring the accuracy of identification, it guarantees personnel screening under the normalization of epidemic control. This ensures fast and high-precision face recognition and establishes a 3D face database. This method is adaptable and robust in harsh, low light, and noise environments. Moreover, it can complete face reconstruction and recognize various skin colors and postures.
Collapse
Affiliation(s)
- Weiyi Kong
- National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, 610065, PR China
| | - Zhisheng You
- National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, 610065, PR China
- School of Computer Science, Sichuan University, Chengdu, 610064, PR China
| | - Xuebin Lv
- School of Computer Science, Sichuan University, Chengdu, 610064, PR China
| |
Collapse
|
4
|
He Y, Fu K, Cheng P, Zhang J. Facial Expression Recognition with Geometric Scattering on 3D Point Clouds. SENSORS (BASEL, SWITZERLAND) 2022; 22:8293. [PMID: 36366000 PMCID: PMC9658755 DOI: 10.3390/s22218293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/06/2022] [Revised: 10/22/2022] [Accepted: 10/25/2022] [Indexed: 06/16/2023]
Abstract
As one of the pioneering data representations, the point cloud has shown its straightforward capacity to depict fine geometry in many applications, including computer graphics, molecular structurology, modern sensing signal processing, and more. However, unlike computer graphs obtained with auxiliary regularization techniques or from syntheses, raw sensor/scanner (metric) data often contain natural random noise caused by multiple extrinsic factors, especially in the case of high-speed imaging scenarios. On the other hand, grid-like imaging techniques (e.g., RGB images or video frames) tend to entangle interesting aspects with environmental variations such as pose/illuminations with Euclidean sampling/processing pipelines. As one such typical problem, 3D Facial Expression Recognition (3D FER) has been developed into a new stage, with remaining difficulties involving the implementation of efficient feature abstraction methods for high dimensional observations and of stabilizing methods to obtain adequate robustness in cases of random exterior variations. In this paper, a localized and smoothed overlapping kernel is proposed to extract discriminative inherent geometric features. By association between the induced deformation stability and certain types of exterior perturbations through manifold scattering transform, we provide a novel framework that directly consumes point cloud coordinates for FER while requiring no predefined meshes or other features/signals. As a result, our compact framework achieves 78.33% accuracy on the Bosphorus dataset for expression recognition challenge and 77.55% on 3D-BUFE.
Collapse
Affiliation(s)
- Yi He
- National Key Laboratory of Fundamental Science on Synthetic Vision, Chengdu 610065, China
| | - Keren Fu
- College of Computer Science, Sichuan University, Chengdu 610065, China
| | - Peng Cheng
- School of Aeronautics and Astronautics, Sichuan University, Chengdu 610065, China
| | - Jianwei Zhang
- College of Computer Science, Sichuan University, Chengdu 610065, China
| |
Collapse
|
5
|
Fast 3D Face Reconstruction from a Single Image Using Different Deep Learning Approaches for Facial Palsy Patients. Bioengineering (Basel) 2022; 9:bioengineering9110619. [PMID: 36354529 PMCID: PMC9687570 DOI: 10.3390/bioengineering9110619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Revised: 10/20/2022] [Accepted: 10/22/2022] [Indexed: 11/16/2022] Open
Abstract
The 3D reconstruction of an accurate face model is essential for delivering reliable feedback for clinical decision support. Medical imaging and specific depth sensors are accurate but not suitable for an easy-to-use and portable tool. The recent development of deep learning (DL) models opens new challenges for 3D shape reconstruction from a single image. However, the 3D face shape reconstruction of facial palsy patients is still a challenge, and this has not been investigated. The contribution of the present study is to apply these state-of-the-art methods to reconstruct the 3D face shape models of facial palsy patients in natural and mimic postures from one single image. Three different methods (3D Basel Morphable model and two 3D Deep Pre-trained models) were applied to the dataset of two healthy subjects and two facial palsy patients. The reconstructed outcomes were compared to the 3D shapes reconstructed using Kinect-driven and MRI-based information. As a result, the best mean error of the reconstructed face according to the Kinect-driven reconstructed shape is 1.5±1.1 mm. The best error range is 1.9±1.4 mm when compared to the MRI-based shapes. Before using the procedure to reconstruct the 3D faces of patients with facial palsy or other facial disorders, several ideas for increasing the accuracy of the reconstruction can be discussed based on the results. This present study opens new avenues for the fast reconstruction of the 3D face shapes of facial palsy patients from a single image. As perspectives, the best DL method will be implemented into our computer-aided decision support system for facial disorders.
Collapse
|
6
|
|