1
Čuta M, Jurda M, Kováčová V, Jandová M, Bezděková V, Černý D, Urbanová P. Virtual fit and design improvements of a filtering half-mask for sub-adult wearers. ERGONOMICS 2024; 67:1267-1283. [PMID: 38351576] [DOI: 10.1080/00140139.2023.2298984]
Abstract
The recent pandemic has shown that protecting the general population from hazardous substances or pathogens can be a challenging and urgent task. The key element of adequate protection is appropriately sized, well-fitted and sufficiently distributed personal protective equipment (PPE). While these conditions are met for adult PPE wearers, they are less considered when it comes to protecting subadults. This study assesses the fit of a 3D-printed facial half mask for subadult wearers (4-18 years) and designs improvements to it. The target population was represented by 1137 subadults, aged 4.06-18.94 years, for whom 3D face models were acquired. The half mask tested, which was originally provided in one subadult size, did not fit the target population appropriately. This finding prompted the creation of four size categories using the age-dependent distribution of the centroid size calculated from 7 facial landmarks. For each size category, a modified half-mask virtual design was created, including resizing and reshaping, and fit was evaluated visually and numerically using averaged and random 3D face representatives.
Practitioner summary: The purpose of this study was to describe procedures that led to design improvements of an existing half-mask and to provide respiratory protection for subadults. To address this, fit was assessed using an innovative metric approach. Four sizes were then created based on centroid size, resulting in improved fit and design.
Abbreviations: CH: cheilion landmark; CS: centroid size; EX: exocanthion landmark; GN: gnathion landmark; N: nasion landmark; PPE: personal protective equipment; PR: pronasale landmark; RPE: respiratory protective equipment.
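The sizing scheme hinges on the centroid size of seven facial landmarks. The abstract does not spell out the computation, but centroid size has a standard geometric-morphometrics definition, sketched below; the function name and the (7, 3) landmark layout are illustrative assumptions, not code from the paper.

```python
import numpy as np

def centroid_size(landmarks: np.ndarray) -> float:
    """Centroid size: square root of the summed squared distances of
    all landmarks to their centroid (standard geometric-morphometrics
    definition; not code from the paper)."""
    centroid = landmarks.mean(axis=0)  # mean of the (7, 3) points
    return float(np.sqrt(((landmarks - centroid) ** 2).sum()))

# landmarks would hold N, PR, GN and the left/right EX and CH points,
# one (x, y, z) row per landmark.
```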
Affiliation(s)
- Martin Čuta
- Department of Anthropology, Faculty of Science, Laboratory of Morphology and Forensic Anthropology, Masaryk University, Brno, Czech Republic
- Mikoláš Jurda
- Department of Anthropology, Faculty of Science, Laboratory of Morphology and Forensic Anthropology, Masaryk University, Brno, Czech Republic
- Veronika Kováčová
- Department of Anthropology, Faculty of Science, Laboratory of Morphology and Forensic Anthropology, Masaryk University, Brno, Czech Republic
- Marie Jandová
- Department of Anthropology, Faculty of Science, Laboratory of Morphology and Forensic Anthropology, Masaryk University, Brno, Czech Republic
- Vendula Bezděková
- Department of Anthropology, Faculty of Science, Laboratory of Morphology and Forensic Anthropology, Masaryk University, Brno, Czech Republic
- Dominik Černý
- Department of Anthropology, Faculty of Science, Laboratory of Morphology and Forensic Anthropology, Masaryk University, Brno, Czech Republic
- Petra Urbanová
- Department of Anthropology, Faculty of Science, Laboratory of Morphology and Forensic Anthropology, Masaryk University, Brno, Czech Republic
2
Sheng C, Kuang G, Bai L, Hou C, Guo Y, Xu X, Pietikainen M, Liu L. Deep Learning for Visual Speech Analysis: A Survey. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:6001-6022. [PMID: 38478434] [DOI: 10.1109/tpami.2024.3376710]
Abstract
Visual speech, referring to the visual domain of speech, has attracted increasing attention due to its wide applications, such as public security, medical treatment, military defense, and film entertainment. As a powerful AI strategy, deep learning techniques have extensively promoted the development of visual speech learning. Over the past five years, numerous deep learning-based methods have been proposed to address various problems in this area, especially automatic visual speech recognition and generation. To push forward future research on visual speech, this paper presents a comprehensive review of recent progress in deep learning methods for visual speech analysis. We cover different aspects of visual speech, including fundamental problems, challenges, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance. We also identify gaps in current research and discuss inspiring future research directions.
3
Zhang J, Zhou K, Luximon Y, Lee TY, Li P. MeshWGAN: Mesh-to-Mesh Wasserstein GAN With Multi-Task Gradient Penalty for 3D Facial Geometric Age Transformation. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:4927-4940. [PMID: 37307186] [DOI: 10.1109/tvcg.2023.3284500]
Abstract
As the metaverse develops rapidly, 3D facial age transformation is attracting increasing attention, which may bring many potential benefits to a wide variety of users, e.g., 3D aging figures creation, 3D facial data augmentation and editing. Compared with 2D methods, 3D face aging is an underexplored problem. To fill this gap, we propose a new mesh-to-mesh Wasserstein generative adversarial network (MeshWGAN) with a multi-task gradient penalty to model a continuous bi-directional 3D facial geometric aging process. To the best of our knowledge, this is the first architecture to achieve 3D facial geometric age transformation via real 3D scans. As previous image-to-image translation methods cannot be directly applied to the 3D facial mesh, which is totally different from 2D images, we built a mesh encoder, decoder, and multi-task discriminator to facilitate mesh-to-mesh transformations. To mitigate the lack of 3D datasets containing children's faces, we collected scans from 765 subjects aged 5-17 in combination with existing 3D face databases, which provided a large training dataset. Experiments have shown that our architecture can predict 3D facial aging geometries with better identity preservation and age closeness compared to 3D trivial baselines. We also demonstrated the advantages of our approach via various 3D face-related graphics applications.
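For context on the adversarial core the abstract refers to, the sketch below shows the standard single-task WGAN gradient penalty; the paper's multi-task variant extends this idea, and the discriminator signature here is an assumption.

```python
import torch

def gradient_penalty(discriminator, real, fake):
    """Standard WGAN-GP term E[(||grad D(x_hat)||_2 - 1)^2], evaluated
    at random interpolations between real and generated meshes.
    A generic sketch, not MeshWGAN's multi-task penalty."""
    # one interpolation coefficient per sample, broadcast over mesh dims
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).detach().requires_grad_(True)
    grads, = torch.autograd.grad(discriminator(x_hat).sum(), x_hat,
                                 create_graph=True)
    return ((grads.flatten(1).norm(dim=1) - 1.0) ** 2).mean()
```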
4
Luo Z, Du D, Zhu H, Yu Y, Fu H, Han X. SketchMetaFace: A Learning-Based Sketching Interface for High-Fidelity 3D Character Face Modeling. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5260-5275. [PMID: 37467083] [DOI: 10.1109/tvcg.2023.3291703]
Abstract
Modeling 3D avatars benefits various application scenarios such as AR/VR, gaming, and filming. Character faces contribute significant diversity and vividness as a vital component of avatars. However, building 3D character face models usually requires a heavy workload with commercial tools, even for experienced artists. Various existing sketch-based tools fail to support amateurs in modeling diverse facial shapes and rich geometric details. In this article, we present SketchMetaFace, a sketching system targeting amateur users to model high-fidelity 3D faces in minutes. We carefully design both the user interface and the underlying algorithm. First, curvature-aware strokes are adopted to better support the controllability of carving facial details. Second, considering the key problem of mapping a 2D sketch map to a 3D model, we develop a novel learning-based method termed "Implicit and Depth Guided Mesh Modeling" (IDGMM). It fuses the advantages of mesh, implicit, and depth representations to achieve high-quality results with high efficiency. In addition, to further support usability, we present a coarse-to-fine 2D sketching interface design and a data-driven stroke suggestion tool. User studies demonstrate the superiority of our system over existing modeling tools in terms of ease of use and the visual quality of results. Experimental analyses also show that IDGMM reaches a better trade-off between accuracy and efficiency.
5
Osorio Quero C, Durini D, Rangel-Magdaleno J, Martinez-Carranza J, Ramos-Garcia R. Enhancing 3D human pose estimation with NIR single-pixel imaging and time-of-flight technology: a deep learning approach. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA. A, OPTICS, IMAGE SCIENCE, AND VISION 2024; 41:414-423. [PMID: 38437432] [DOI: 10.1364/josaa.499933]
Abstract
The extraction of 3D human pose and body shape details from a single monocular image is a significant challenge in computer vision. Traditional methods use RGB images, but these are constrained by varying lighting and occlusions. However, cutting-edge developments in imaging technologies have introduced new techniques such as single-pixel imaging (SPI) that can surmount these hurdles. In the near-infrared (NIR) spectrum, SPI demonstrates impressive capabilities in capturing a 3D human pose. This wavelength range can penetrate clothing and is less influenced by lighting variations than visible light, thus providing a reliable means to accurately capture body shape and pose data, even in difficult settings. In this work, we explore the use of an SPI camera operating in the NIR with time-of-flight (TOF) in the 850-1550 nm band as a solution for detecting humans in nighttime environments. The proposed system uses a vision transformer (ViT) model to detect and extract the characteristic features of humans, which are then integrated into the SMPL-X 3D body model through deep-learning-based 3D body shape regression. To evaluate the efficacy of NIR-SPI 3D image reconstruction, we constructed a laboratory scenario that simulates nighttime conditions, enabling us to test the feasibility of employing NIR-SPI as a vision sensor in outdoor environments. By assessing the results obtained from this setup, we aim to demonstrate the potential of NIR-SPI as an effective tool to detect humans in nighttime scenarios and to capture their accurate 3D body pose and shape.
6
Sariyanidi E, Zampella CJ, Schultz RT, Tunc B. Inequality-Constrained 3D Morphable Face Model Fitting. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:1305-1318. [PMID: 38015704] [PMCID: PMC10823595] [DOI: 10.1109/tpami.2023.3334948]
Abstract
3D morphable model (3DMM) fitting on 2D data is traditionally done via unconstrained optimization with regularization terms to ensure that the result is a plausible face shape and is consistent with a set of 2D landmarks. This paper presents inequality-constrained 3DMM fitting as the first alternative to regularization in optimization-based 3DMM fitting. Inequality constraints on the 3DMM's shape coefficients ensure face-like shapes without modifying the objective function for smoothness, thus allowing for more flexibility to capture person-specific shape details. Moreover, inequality constraints on landmarks increase robustness in a way that does not require per-image tuning. We show that the proposed method stands out with its ability to estimate person-specific face shapes by jointly fitting a 3DMM to multiple frames of a person. Further, when used with a robust objective function, namely gradient correlation, the method can work "in-the-wild" even with a 3DMM constructed from controlled data. Lastly, we show how to use the log-barrier method to efficiently implement the method. To our knowledge, we present the first 3DMM fitting framework that requires no learning yet is accurate, robust, and efficient. The absence of learning enables a generic solution that allows flexibility in the input image size, interchangeable morphable models, and incorporation of camera matrix.
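The log-barrier method the authors mention has a compact generic form, sketched below under the simplifying assumption of box constraints on the shape coefficients; the paper's actual constraint sets and gradient-correlation objective are not reproduced here.

```python
import numpy as np

def barrier_objective(alpha, data_term, c_max, t):
    """Log-barrier relaxation of the constraints |alpha_i| <= c_max:
    minimize data_term(alpha) - (1/t) * sum_i log(c_max^2 - alpha_i^2),
    with t increased over outer iterations so the barrier tightens.
    A generic sketch of the log-barrier method, not the paper's code."""
    slack = c_max ** 2 - alpha ** 2     # positive while constraints hold
    if np.any(slack <= 0.0):
        return np.inf                   # infeasible point
    return data_term(alpha) - np.log(slack).sum() / t
```

An outer loop would minimize this for increasing t (e.g., t = 1, 10, 100, ...), warm-starting each solve from the previous optimum.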
7
Sun K, Wu S, Zhang N, Huang Z, Wang Q, Li H. CGOF++: Controllable 3D Face Synthesis With Conditional Generative Occupancy Fields. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:913-926. [PMID: 38153826] [DOI: 10.1109/tpami.2023.3328912]
Abstract
Capitalizing on recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some level of controllability, e.g., controlling the shapes, expressions, textures, and poses of the generated face images. However, previous methods focus on controllable 2D image generative models, which are prone to producing inconsistent face images under large expression and pose changes. In this paper, we propose a new NeRF-based conditional 3D face synthesis framework, which enables 3D controllability over the generated face images by imposing explicit 3D conditions from 3D face priors. At its core is a conditional Generative Occupancy Field (cGOF++) that effectively enforces the shape of the generated face to conform to a given 3D Morphable Model (3DMM) mesh, built on top of EG3D (Chan et al. 2022), a recent tri-plane-based generative model. To achieve accurate control over fine-grained 3D face shapes of the synthesized images, we additionally incorporate a 3D landmark loss as well as a volume warping loss into our synthesis framework. Experiments validate the effectiveness of the proposed method, which is able to generate high-fidelity face images and shows more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods.
8
Zhu H, Yang H, Guo L, Zhang Y, Wang Y, Huang M, Wu M, Shen Q, Yang R, Cao X. FaceScape: 3D Facial Dataset and Benchmark for Single-View 3D Face Reconstruction. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:14528-14545. [PMID: 37607140] [DOI: 10.1109/tpami.2023.3307338]
Abstract
In this article, we present a large-scale detailed 3D face dataset, FaceScape, and a corresponding benchmark to evaluate single-view facial 3D reconstruction. By training on FaceScape data, a novel algorithm is proposed to predict elaborate riggable 3D face models from a single image input. The FaceScape dataset provides 16,940 textured 3D faces, captured from 847 subjects, each with 20 specific expressions. The 3D models contain pore-level facial geometry and are processed to be topologically uniform. These fine 3D facial models can be represented as a 3D morphable model for coarse shapes and displacement maps for detailed geometry. Taking advantage of the large-scale and high-accuracy dataset, a novel algorithm is further proposed to learn expression-specific dynamic details using a deep neural network. The learned relationship serves as the foundation of our 3D face prediction system from a single image input. Unlike most previous methods, our predicted 3D models are riggable with highly detailed geometry under different expressions. We also use FaceScape data to generate in-the-wild and in-the-lab benchmarks to evaluate recent methods of single-view face reconstruction. The accuracy is reported and analyzed along the dimensions of camera pose and focal length, which provides a faithful and comprehensive evaluation and reveals new challenges. The unprecedented dataset, benchmark, and code have been released to the public for research purposes.
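The coarse-plus-detail representation described above is conventionally realized by offsetting each vertex of the coarse 3DMM shape along its normal; a minimal sketch follows, assuming the per-vertex displacements have already been sampled from the UV displacement map (names are illustrative, not FaceScape's pipeline code).

```python
import numpy as np

def apply_displacement(base_vertices, vertex_normals, disp):
    """Add fine-scale detail to a coarse shape: v' = v + d * n, with
    base_vertices and vertex_normals of shape (V, 3) and disp of
    shape (V,) holding one scalar offset per vertex."""
    return base_vertices + disp[:, None] * vertex_normals
```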
9
Tian Y, Zhang H, Liu Y, Wang L. Recovering 3D Human Mesh From Monocular Images: A Survey. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:15406-15425. [PMID: 37494160] [DOI: 10.1109/tpami.2023.3298850]
Abstract
Estimating human pose and shape from monocular images is a long-standing problem in computer vision. Since the release of statistical body models, 3D human mesh recovery has been drawing broader attention. With the same goal of obtaining well-aligned and physically plausible mesh results, two paradigms have been developed to overcome challenges in the 2D-to-3D lifting process: i) an optimization-based paradigm, where different data terms and regularization terms are exploited as optimization objectives; and ii) a regression-based paradigm, where deep learning techniques are embraced to solve the problem in an end-to-end fashion. Meanwhile, continuous efforts are devoted to improving the quality of 3D mesh labels for a wide range of datasets. Though remarkable progress has been achieved in the past decade, the task is still challenging due to flexible body motions, diverse appearances, complex environments, and insufficient in-the-wild annotations. To the best of our knowledge, this is the first survey that focuses on the task of monocular 3D human mesh recovery. We start with the introduction of body models and then elaborate on recovery frameworks and training objectives by providing in-depth analyses of their strengths and weaknesses. We also summarize datasets, evaluation metrics, and benchmark results. Open issues and future directions are discussed in the end, hoping to motivate researchers and facilitate their research in this area.
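As a concrete illustration of the optimization-based paradigm, a SMPLify-style objective combines a reprojection data term with pose and shape regularizers. The formula below is generic and not quoted from the survey: here \(\theta, \beta\) are pose and shape parameters, \(J_i\) the model joints, \(\Pi_K\) the camera projection, \(x_i\) the detected 2D joints, and \(\rho\) a robust penalty.

```latex
% Generic optimization-based recovery objective (SMPLify-style);
% illustrative only, not a formula from the survey itself.
E(\theta, \beta) = \sum_i \rho\big( \Pi_K J_i(\theta, \beta) - x_i \big)
  + \lambda_\theta \, E_{\mathrm{prior}}(\theta)
  + \lambda_\beta \, \lVert \beta \rVert^2
```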
10
Liu Y, Li Q, Deng Q, Sun Z, Yang MH. GAN-Based Facial Attribute Manipulation. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:14590-14610. [PMID: 37494159] [DOI: 10.1109/tpami.2023.3298868]
Abstract
Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes, and it has received significant attention due to its broad practical applications ranging from digital entertainment to biometric forensics. In the last decade, with the remarkable success of Generative Adversarial Networks (GANs) in synthesizing realistic images, numerous GAN-based models have been proposed to solve FAM with various problem formulations and guiding information representations. This paper presents a comprehensive survey of GAN-based FAM methods with a focus on summarizing their principal motivations and technical details. The main contents of this survey include: (i) an introduction to the research background and basic concepts related to FAM, (ii) a systematic review of GAN-based FAM methods in three main categories, and (iii) an in-depth discussion of important properties of FAM methods, open issues, and future research directions. This survey not only provides a good starting point for researchers new to this field but also serves as a reference for the vision community.
11
Gao Z, Yan J, Zhai G, Zhang J, Yang X. Robust Mesh Representation Learning via Efficient Local Structure-Aware Anisotropic Convolution. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:8566-8578. [PMID: 35226610] [DOI: 10.1109/tnnls.2022.3151609]
Abstract
Mesh is a type of data structure commonly used for 3-D shapes. Representation learning for 3-D meshes is essential in many computer vision and graphics applications. The recent success of convolutional neural networks (CNNs) for structured data (e.g., images) suggests the value of adapting insights from CNNs to 3-D shapes. However, 3-D shape data are irregular since each node's neighbors are unordered. Various graph neural networks for 3-D shapes have been developed with isotropic filters or predefined local coordinate systems to overcome the node inconsistency on graphs. However, isotropic filters and predefined local coordinate systems limit the representation power. In this article, we propose a local structure-aware anisotropic convolutional operation (LSA-Conv) that learns an adaptive weighting matrix for each of the template's nodes according to its neighboring structure and applies shared anisotropic filters. In fact, the learnable weighting matrix is similar to the attention matrix in the Random Synthesizer, a recent Transformer model for natural language processing (NLP). Since the learnable weighting matrices require large numbers of parameters for high-resolution 3-D shapes, we introduce a matrix factorization technique that notably reduces the parameter size, denoted LSA-small. Furthermore, a residual connection with a linear transformation is introduced to improve the performance of our LSA-Conv. Comprehensive experiments demonstrate that our model produces significant improvements in 3-D shape reconstruction compared to state-of-the-art methods.
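A minimal PyTorch sketch of the factorized variant (LSA-small) under the reading above: each template node owns a rank-r factorized K-by-K neighbor-mixing matrix, followed by a shared filter. Layer shapes, initialization, and naming are assumptions, not the authors' code.

```python
import torch

class LSASmallConv(torch.nn.Module):
    """Local structure-aware anisotropic convolution with factorized
    per-node weights: every one of the N template nodes owns a K x K
    neighbor-mixing matrix stored as a rank-r product to cut
    parameters, followed by a shared anisotropic filter."""

    def __init__(self, n_nodes, k_neighbors, c_in, c_out, rank=4):
        super().__init__()
        self.U = torch.nn.Parameter(0.1 * torch.randn(n_nodes, k_neighbors, rank))
        self.V = torch.nn.Parameter(0.1 * torch.randn(n_nodes, rank, k_neighbors))
        self.shared = torch.nn.Linear(k_neighbors * c_in, c_out)

    def forward(self, x, neighbor_idx):
        # x: (B, N, C_in); neighbor_idx: (N, K) fixed template connectivity
        feats = x[:, neighbor_idx]                    # (B, N, K, C_in)
        weights = self.U @ self.V                     # (N, K, K) per-node mixing
        mixed = torch.einsum('nkj,bnjc->bnkc', weights, feats)
        return self.shared(mixed.flatten(2))          # (B, N, C_out)
```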
12
Romero A, Carvalho P, Côrte-Real L, Pereira A. Synthesizing Human Activity for Data Generation. J Imaging 2023; 9:204. [PMID: 37888311] [PMCID: PMC10607066] [DOI: 10.3390/jimaging9100204]
Abstract
Gathering sufficiently representative data about human actions, shapes, and facial expressions, as required to train robust models, is costly and time-consuming. This has led to techniques such as transfer learning and data augmentation; however, these are often insufficient. To address this, we propose a semi-automated mechanism that allows the generation and editing of visual scenes with synthetic humans performing various actions, with features such as background modification and manual adjustment of the 3D avatars, allowing users to create data with greater variability. We also propose a two-fold evaluation methodology for assessing the results obtained using our method: (i) applying an action classifier to the output data produced by the mechanism and (ii) generating masks of the avatars and the actors and comparing them through segmentation. The avatars were robust to occlusion, and their actions were recognizable and faithful to their respective input actors. The results also showed that even though the action classifier concentrates on the pose and movement of the synthetic humans, it strongly depends on contextual information to precisely recognize the actions. Generating the avatars for complex activities also proved problematic for action recognition and for the clean and precise formation of the masks.
Affiliation(s)
- Ana Romero
- Faculdade de Engenharia, Universidade do Porto, 4200-465 Porto, Portugal
- Pedro Carvalho
- Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência, 4200-465 Porto, Portugal
- School of Engineering, Polytechnic of Porto, 4200-072 Porto, Portugal
- Luís Côrte-Real
- Faculdade de Engenharia, Universidade do Porto, 4200-465 Porto, Portugal
- Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência, 4200-465 Porto, Portugal
- Américo Pereira
- Faculdade de Engenharia, Universidade do Porto, 4200-465 Porto, Portugal
- Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência, 4200-465 Porto, Portugal
13
Wang C, Chai M, He M, Chen D, Liao J. Cross-Domain and Disentangled Face Manipulation With 3D Guidance. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:2053-2066. [PMID: 34982684] [DOI: 10.1109/tvcg.2021.3139913]
Abstract
Face image manipulation via three-dimensional guidance has been widely applied in various interactive scenarios due to its semantically-meaningful understanding and user-friendly controllability. However, existing 3D-morphable-model-based manipulation methods are not directly applicable to out-of-domain faces, such as non-photorealistic paintings, cartoon portraits, or even animals, mainly due to the formidable difficulties in building the model for each specific face domain. To overcome this challenge, we propose, as far as we know, the first method to manipulate faces in arbitrary domains using human 3DMM. This is achieved through two major steps: 1) disentangled mapping from 3DMM parameters to the latent space embedding of a pre-trained StyleGAN2 [1] that guarantees disentangled and precise controls for each semantic attribute; and 2) cross-domain adaptation that bridges domain discrepancies and makes human 3DMM applicable to out-of-domain faces by enforcing a consistent latent space embedding. Experiments and comparisons demonstrate the superiority of our high-quality semantic manipulation method on a variety of face domains with all major 3D facial attributes controllable - pose, expression, shape, albedo, and illumination. Moreover, we develop an intuitive editing interface to support user-friendly control and instant feedback. Our project page is https://cassiepython.github.io/cddfm3d/index.html.
14
Ye Z, Xia M, Sun Y, Yi R, Yu M, Zhang J, Lai YK, Liu YJ. 3D-CariGAN: An End-to-End Solution to 3D Caricature Generation From Normal Face Photos. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:2203-2210. [PMID: 34752397] [DOI: 10.1109/tvcg.2021.3126659]
Abstract
Caricature is an artistic style of depicting human faces that attracts considerable attention in the entertainment industry. So far, only a few 3D caricature generation methods exist, and all of them require some caricature information (e.g., a caricature sketch or 2D caricature) as input. This kind of input, however, is difficult for non-professional users to provide. In this paper, we propose an end-to-end deep neural network model that generates high-quality 3D caricatures directly from a normal 2D face photo. The most challenging issue for our system is that the source domain of face photos (characterized by normal 2D faces) is significantly different from the target domain of 3D caricatures (characterized by 3D exaggerated face shapes and textures). To address this challenge, we: (1) build a large dataset of 5,343 3D caricature meshes and use it to establish a PCA model of the 3D caricature shape space; (2) reconstruct a normal full 3D head from the input face photo and use its PCA representation in the 3D caricature shape space to establish correspondences between the input photo and the 3D caricature shape; and (3) propose a novel character loss and a novel caricature loss based on previous psychological studies of caricatures. Experiments, including a novel two-level user study, show that our system can generate high-quality 3D caricatures directly from normal face photos.
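Step (1) above is a standard PCA construction over registered meshes; a minimal sketch under that reading follows, with names and shapes chosen for illustration rather than taken from the authors' implementation.

```python
import numpy as np

def build_pca_shape_space(meshes, n_components):
    """PCA shape space from registered meshes of shape (n, V, 3):
    flatten, center, and keep the leading right-singular vectors.
    A generic sketch, not the authors' code."""
    X = meshes.reshape(len(meshes), -1)      # (n, 3V)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]           # coeffs = (x - mean) @ Vt.T
```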
15
Yang W, Chen Z, Chen C, Chen G, Wong KYK. Deep Face Video Inpainting via UV Mapping. IEEE TRANSACTIONS ON IMAGE PROCESSING 2023; PP:1145-1157. [PMID: 37022429] [DOI: 10.1109/tip.2023.3240835]
Abstract
This paper addresses the problem of face video inpainting. Existing video inpainting methods target primarily natural scenes with repetitive patterns. They do not make use of any prior knowledge of the face to help retrieve correspondences for the corrupted face, and therefore achieve only sub-optimal results, particularly for faces under large pose and expression variations, where face components appear very differently across frames. In this paper, we propose a two-stage deep learning method for face video inpainting. We employ a 3DMM as our 3D face prior to transform a face between the image space and the UV (texture) space. In Stage I, we perform face inpainting in the UV space. This largely removes the influence of face poses and expressions and makes the learning task much easier with well-aligned face features. We introduce a frame-wise attention module to fully exploit correspondences in neighboring frames to assist the inpainting task. In Stage II, we transform the inpainted face regions back to the image space and perform face video refinement, which inpaints any background regions not covered in Stage I and also refines the inpainted face regions. Extensive experiments show that our method significantly outperforms methods based merely on 2D information, especially for faces under large pose and expression variations. Project page: https://ywq.github.io/FVIP.
16
Han F, Ye S, He M, Chai M, Liao J. Exemplar-Based 3D Portrait Stylization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:1371-1383. [PMID: 34559656] [DOI: 10.1109/tvcg.2021.3114308]
Abstract
Exemplar-based portrait stylization is widely attractive and highly desired. Despite recent successes, it remains challenging, especially when considering both texture and geometric styles. In this article, we present the first framework for one-shot 3D portrait style transfer, which can generate 3D face models with both the geometry exaggerated and the texture stylized, while preserving the identity from the original content. It requires only one arbitrary style image instead of a large set of training examples for a particular style, provides geometry and texture outputs that are fully parameterized and disentangled, and enables further graphics applications with the 3D representations. The framework consists of two stages. In the first stage, geometric style transfer, we use facial landmark translation to capture the coarse geometry style and guide the deformation of the dense 3D face geometry. In the second stage, texture style transfer, we focus on performing style transfer on the canonical texture by adopting a differentiable renderer to optimize the texture in a multi-view framework. Experiments show that our method achieves consistently good results across different artistic styles and outperforms existing methods. We also demonstrate the advantages of our method via various 2D and 3D graphics applications.
17
Zhu X, Yu C, Huang D, Lei Z, Wang H, Li SZ. Beyond 3DMM: Learning to Capture High-Fidelity 3D Face Shape. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:1442-1457. [PMID: 35363609] [DOI: 10.1109/tpami.2022.3164131]
Abstract
3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior. However, previously reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry, which is attributed to insufficient ground-truth 3D shapes, unreliable training strategies, and the limited representation power of the 3DMM. To alleviate this issue, this paper proposes a complete solution for capturing the personalized shape so that the reconstructed shape looks identical to the corresponding person. Specifically, given a 2D image as the input, we virtually render the image in several calibrated views to normalize pose variations while preserving the original image geometry. A many-to-one hourglass network serves as the encoder-decoder to fuse multiview features and generate vertex displacements as the fine-grained geometry. Besides, the neural network is trained by directly optimizing the visual effect, where two 3D shapes are compared by measuring the similarity between the multiview images rendered from the shapes. Finally, we propose to generate the ground-truth 3D shapes by registering RGB-D images followed by pose and shape augmentation, providing sufficient data for network training. Experiments on several challenging protocols demonstrate the superior reconstruction accuracy of our proposal on the face shape.
18
Kang Z, Sadeghi M, Horaud R, Alameda-Pineda X. Expression-Preserving Face Frontalization Improves Visually Assisted Speech Processing. Int J Comput Vis 2023. [DOI: 10.1007/s11263-022-01742-1]
19
Yang Z, Chen B, Zheng Y, Chen X, Zhou K. Human Bas-Relief Generation From a Single Photograph. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:4558-4569. [PMID: 34191727] [DOI: 10.1109/tvcg.2021.3092877]
Abstract
We present a semi-automatic method for producing human bas-relief from a single photograph. Given an input photo of one or multiple persons, our method first estimates a 3D skeleton for each person in the image. SMPL models are then fitted to the 3D skeletons to generate a 3D guide model. To align the 3D guide model with the image, we compute a 2D warping field to non-rigidly register the projected contours of the guide model with the body contours in the image. Then the normal map of the 3D guide model is warped by the 2D deformation field to reconstruct an overall base shape. Finally, the base shape is integrated with a fine-scale normal map to produce the final bas-relief. To tackle the complex intra- and inter-body interactions, we design an occlusion relationship resolution method that operates at the level of 3D skeletons with minimal user inputs. To tightly register the model contours to the image contours, we propose a non-rigid point matching algorithm harnessing user-specified sparse correspondences. Experiments demonstrate that our human bas-relief generation method is capable of producing perceptually realistic results on various single-person and multi-person images, on which the state-of-the-art depth and pose estimation methods often fail.
20
Wu H, Gu J, Fan X, Li H, Xie L, Zhao J. 3D-Guided Frontal Face Generation for Pose-Invariant Recognition. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3572035]
Abstract
Although deep learning techniques have achieved extraordinary accuracy in recognizing human faces, the pose variance of images captured in real-world scenarios still hinders reliable model application. To mitigate this gap, we propose to recognize faces via the generation of frontal face images with a 3D-Guided Deep Pose-Invariant Face Recognition Model (3D-PIM) consisting of a simulator and a refiner module. The simulator employs a 3D Morphable Model (3DMM) to fit the shape and appearance features and recover primary frontal images with less training data. The refiner further enhances the image realism of both the global facial structure and local details with adversarial training, while keeping the discriminative identity information consistent with the original images. An Adaptive Weighting (AW) metric is then adopted to leverage the complementary information from recovered frontal faces and original profile faces and to obtain credible similarity scores for recognition. Extensive experiments verify the superiority of the proposed "recognition via generation" framework over state-of-the-art methods.
Affiliation(s)
- Hao Wu
- School of Artificial Intelligence, Beijing Normal University, China
- Jianyang Gu
- College of Control Science and Engineering, Zhejiang University, China
- Lidong Xie
- School of Artificial Intelligence, Beijing Normal University, China
- Jian Zhao
- Institute of North Electronic Equipment, China; Department of Mathematics and Theories, Peng Cheng Laboratory, China
21
Fast 3D Face Reconstruction from a Single Image Using Different Deep Learning Approaches for Facial Palsy Patients. Bioengineering (Basel) 2022; 9:619. [PMID: 36354529] [PMCID: PMC9687570] [DOI: 10.3390/bioengineering9110619]
Abstract
The 3D reconstruction of an accurate face model is essential for delivering reliable feedback for clinical decision support. Medical imaging and specific depth sensors are accurate but not suitable for an easy-to-use and portable tool. The recent development of deep learning (DL) models opens new possibilities for 3D shape reconstruction from a single image. However, the 3D face shape reconstruction of facial palsy patients is still a challenge and has not previously been investigated. The contribution of the present study is to apply these state-of-the-art methods to reconstruct 3D face shape models of facial palsy patients in natural and mimic postures from a single image. Three different methods (the 3D Basel Morphable Model and two deep pre-trained 3D models) were applied to a dataset of two healthy subjects and two facial palsy patients. The reconstructed outcomes were compared to 3D shapes reconstructed using Kinect-driven and MRI-based information. The best mean error of the reconstructed face relative to the Kinect-driven reconstructed shape is 1.5±1.1 mm, and the best error relative to the MRI-based shapes is 1.9±1.4 mm. Based on these results, several ways of increasing reconstruction accuracy can be discussed before applying the procedure to patients with facial palsy or other facial disorders. This study opens new avenues for the fast reconstruction of 3D face shapes of facial palsy patients from a single image. As a next step, the best DL method will be implemented in our computer-aided decision support system for facial disorders.
22
Ferrari C, Berretti S, Pala P, Bimbo AD. A Sparse and Locally Coherent Morphable Face Model for Dense Semantic Correspondence Across Heterogeneous 3D Faces. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:6667-6682. [PMID: 34156937] [DOI: 10.1109/tpami.2021.3090942]
Abstract
The 3D Morphable Model (3DMM) is a powerful statistical tool for representing 3D face shapes. To build a 3DMM, a training set of face scans in full point-to-point correspondence is required, and its modeling capabilities directly depend on the variability contained in the training data. Thus, to increase the descriptive power of the 3DMM, establishing a dense correspondence across heterogeneous scans with sufficient diversity in terms of identities, ethnicities, or expressions becomes essential. In this manuscript, we present a fully automatic approach that leverages a 3DMM to transfer its dense semantic annotation across raw 3D faces, establishing a dense correspondence between them. We propose a novel formulation to learn a set of sparse deformation components with local support on the face that, together with an original non-rigid deformation algorithm, allow the 3DMM to precisely fit unseen faces and transfer its semantic annotation. We extensively evaluated our approach, showing that it can effectively generalize to highly diverse samples and accurately establish a dense correspondence even in the presence of complex facial expressions. The accuracy of the dense registration is demonstrated by building a heterogeneous, large-scale 3DMM from more than 9,000 fully registered scans obtained by joining three large datasets together.
23
Spatio-Frequency Decoupled Weak-Supervision for Face Reconstruction. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:5903514. [PMID: 36188707] [PMCID: PMC9522507] [DOI: 10.1155/2022/5903514]
Abstract
3D face reconstruction has witnessed considerable progress in recovering 3D face shapes and textures from in-the-wild images. However, due to the lack of texture detail information, a shape and texture reconstructed with deep learning cannot be used to re-render a photorealistic facial image when weak supervision comes from the spatial domain alone. In this paper, we propose a method of spatio-frequency decoupled weak supervision for face reconstruction, which applies losses from not only the spatial domain but also the frequency domain to learn a reconstruction process that approaches a photorealistic effect based on the output shape and texture. In detail, the spatial-domain losses cover image-level and perceptual-level supervision. Moreover, frequency-domain information is separated from the input and rendered images, respectively, and is then used to build a frequency-based loss. In particular, we devise a spectrum-wise weighted Wing loss to implement balanced attention across different spectrums. Through the spatio-frequency decoupled weak supervision, the reconstruction process can be learned harmoniously and generate detailed texture and high-quality shape with landmark labels only. Experiments on several benchmarks show that our method can generate high-quality results and outperforms state-of-the-art methods in qualitative and quantitative comparisons.
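The Wing loss the paper adapts has a known closed form (Feng et al.): logarithmic near zero, linear for large residuals. The sketch below shows the standard loss only; the paper's spectrum-wise weighting scheme is not reproduced.

```python
import numpy as np

def wing_loss(residual, w=10.0, eps=2.0):
    """Standard Wing loss: w * ln(1 + |x|/eps) for |x| < w, else
    |x| - C, where C = w - w * ln(1 + w/eps) makes the two branches
    meet. Generic formula, not the paper's weighted variant."""
    x = np.abs(residual)
    c = w - w * np.log(1.0 + w / eps)
    return np.where(x < w, w * np.log(1.0 + x / eps), x - c)
```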
24
Deng Q, Ma L, Jin A, Bi H, Le BH, Deng Z. Plausible 3D Face Wrinkle Generation Using Variational Autoencoders. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:3113-3125. [PMID: 33439847] [DOI: 10.1109/tvcg.2021.3051251]
Abstract
Realistic 3D facial modeling and animation have been increasingly used in many graphics, animation, and virtual reality applications. However, generating realistic fine-scale wrinkles on 3D faces, in particular on animated 3D faces, is still a challenging problem that is far from being resolved. In this article, we propose an end-to-end system to automatically augment coarse-scale 3D faces with synthesized fine-scale geometric wrinkles. By formulating the wrinkle generation problem as a supervised generation task, we implicitly model the continuous space of face wrinkles via a compact generative model, such that plausible face wrinkles can be generated through effective sampling and interpolation in the space. We also introduce a complete pipeline to transfer the synthesized wrinkles between faces with different shapes and topologies. Through many experiments, we demonstrate that our method can robustly synthesize plausible fine-scale wrinkles on a variety of coarse-scale 3D faces with different shapes and expressions.
25
Lu T, Peng Z, Xing X, Xu X, Pang J. A General Method of Realistic Avatar Modeling and Driving for Head-Mounted Display Users. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2021.3080588]
Affiliation(s)
- Ting Lu
- School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
- Zhengfu Peng
- School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
- Xiaofen Xing
- School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
- Xiangmin Xu
- School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
- Jianxin Pang
- UBTech Research, Ubtech Robotics Corporation, Shenzhen, China
26
Geng X, Zheng R, Lv J, Zhang Y. Multilabel Ranking With Inconsistent Rankers. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:5211-5224. [PMID: 33798071] [DOI: 10.1109/tpami.2021.3070709]
Abstract
While most existing multilabel ranking methods assume the availability of a single objective label ranking for each instance in the training set, this paper deals with a more common case where only subjective, inconsistent rankings from multiple rankers are associated with each instance. Two ranking methods are proposed, from the perspectives of instances and rankers, respectively. The first method, Instance-oriented Preference Distribution Learning (IPDL), learns a latent preference distribution for each instance. IPDL generates a common preference distribution that is most compatible with all the personal rankings, and then learns a mapping from the instances to the preference distributions. The second method, Ranker-oriented Preference Distribution Learning (RPDL), leverages interpersonal inconsistency among rankers to learn a unified model from the personal preference distribution models of all rankers. These two methods are applied to a natural scene image dataset and the 3D facial expression dataset BU-3DFE. Experimental results show that IPDL and RPDL can effectively incorporate the information given by inconsistent rankers, and perform remarkably better than state-of-the-art multilabel ranking algorithms.
27
Hartmann TJ, Hartmann JBJ, Friebe-Hoffmann U, Lato C, Janni W, Lato K. Novel Method for Three-Dimensional Facial Expression Recognition Using Self-Normalizing Neural Networks and Mobile Devices. Geburtshilfe Frauenheilkd 2022; 82:955-969. [PMID: 36110895] [PMCID: PMC9470291] [DOI: 10.1055/a-1866-2943]
Abstract
Introduction: To date, most approaches to facial expression recognition rely on two-dimensional images; advanced approaches using three-dimensional data exist, but they demand stationary apparatuses and thus lack portability and scalability of deployment. As human emotions, intent and even diseases may condense into distinct facial expressions or changes therein, a portable yet capable solution is needed. Owing to the superior informative value of three-dimensional data on facial morphology, and because certain syndromes find expression in specific facial dysmorphisms, such a solution should allow portable acquisition of true three-dimensional facial scans in real time. In this study, we present a novel solution for the three-dimensional acquisition of facial geometry data and the recognition of facial expressions from it. The new technology presented here requires only a smartphone or tablet with an integrated TrueDepth camera and enables real-time acquisition of the geometry and its categorization into distinct facial expressions.
Material and Methods: Our approach consisted of two parts. First, training data were acquired by asking a collective of 226 medical students to adopt defined facial expressions while their current facial morphology was captured by our specially developed app running on iPads placed in front of the students. In total, the list of facial expressions to be shown by the participants consisted of "disappointed", "stressed", "happy", "sad" and "surprised". Second, the data were used to train a self-normalizing neural network. A set of all factors describing the current facial expression at a given time is referred to as a "snapshot".
Results: In total, over half a million snapshots were recorded in the study. Ultimately, the network achieved an overall accuracy of 80.54% after 400 epochs of training. On the test set, an overall accuracy of 81.15% was determined. Recall values differed by snapshot category and ranged from 74.79% for "stressed" to 87.61% for "happy". Precision showed similar results, with "sad" achieving the lowest value at 77.48% and "surprised" the highest at 86.87%.
Conclusions: The present work demonstrates that respectable results can be achieved even with datasets that pose some challenges. Through various measures already incorporated into an optimized version of our app, the training results are expected to improve significantly and become more precise in the future. A follow-up study with the new version of our app, which incorporates the suggested alterations and adaptations, is currently being conducted. We aim to build a large and open database of facial scans, not only for facial expression recognition but also to perform disease recognition and to monitor the treatment progress of diseases.
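The per-expression recall and precision figures quoted above are conventionally derived from a confusion matrix; a minimal sketch of that computation follows (the study's own evaluation tooling is not shown).

```python
import numpy as np

def per_class_metrics(confusion):
    """Per-class recall and precision from a confusion matrix with
    confusion[i, j] = number of snapshots of true class i predicted
    as class j. This is the conventional computation behind figures
    like 74.79% recall for "stressed"."""
    tp = np.diag(confusion).astype(float)
    recall = tp / confusion.sum(axis=1)      # per true class
    precision = tp / confusion.sum(axis=0)   # per predicted class
    return recall, precision
```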
Affiliation(s)
- Tim Johannes Hartmann
- Universitäts-Hautklinik Tübingen, Tübingen, Germany
- Universitätsfrauenklinik Ulm, Ulm, Germany
28
Wang Z, Ling J, Feng C, Lu M, Xu F. Emotion-Preserving Blendshape Update With Real-Time Face Tracking. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2364-2375. [PMID: 33104513] [DOI: 10.1109/tvcg.2020.3033838]
Abstract
Blendshape representations are widely used in facial animation. To build the blendshapes of one character, consistent semantics must be maintained across all the blendshapes. However, this is difficult for real characters because the face shape corresponding to the same semantics varies significantly across identities. Previous studies have handled this issue by asking users to perform a set of predefined expressions with specified semantics. We observe that facial emotions can be used to define semantics. Herein, we propose a real-time technique that directly updates blendshapes without predefined expressions. Its aim is to preserve semantics based on the emotion information extracted from an arbitrary facial motion sequence. In addition, we have designed corresponding algorithms to efficiently update blendshapes with large- and middle-scale face shapes and fine-scale facial details, such as wrinkles, in a real-time face tracking system. The experimental results indicate that, using a commodity RGBD sensor, we can achieve real-time online blendshape updates with well-preserved semantics and user-specific facial features and details.
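For readers unfamiliar with the representation being updated, the classic delta-blendshape model evaluates a face as the neutral shape plus weighted expression offsets; a minimal sketch follows, as background rather than the paper's update algorithm.

```python
import numpy as np

def blendshape_face(neutral, deltas, weights):
    """Delta-blendshape evaluation: F = B0 + sum_i w_i * (B_i - B0).
    neutral: (V, 3) neutral shape; deltas: (n, V, 3) precomputed
    B_i - B0 offsets; weights: (n,) expression coefficients."""
    return neutral + np.tensordot(weights, deltas, axes=1)
```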
29
Giannakakis G, Koujan MR, Roussos A, Marias K. Correction to: Automatic stress analysis from facial videos based on deep facial action units recognition. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01060-9]
30
Sharma S, Kumar V. 3D Face Reconstruction in Deep Learning Era: A Survey. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING 2022; 29:3475-3507. [PMID: 35035213] [PMCID: PMC8744573] [DOI: 10.1007/s11831-021-09705-4]
Abstract
3D face reconstruction is among the most captivating topics in biometrics with the advent of deep learning and readily available graphics processing units. This paper explores the various aspects of 3D face reconstruction techniques. Five techniques are discussed, namely deep learning, epipolar geometry, one-shot learning, 3D morphable models, and shape-from-shading methods. The paper provides an in-depth analysis of 3D face reconstruction using deep learning techniques. The performance of different face reconstruction techniques is analyzed in terms of software, hardware, pros, and cons. The challenges and future scope of 3D face reconstruction techniques are also discussed.
Affiliation(s)
- Sahil Sharma
- Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India
- Vijay Kumar
- Computer Science and Engineering Department, National Institute of Technology, Hamirpur, India
31
Cui H, Yang J, Lai YK, Li K. Monocular 3D Face Reconstruction with Joint 2D and 3D Constraints. ARTIF INTELL 2022. [DOI: 10.1007/978-3-031-20497-5_11]
32
Grasshof S, Ackermann H, Brandt SS, Ostermann J. Multilinear Modelling of Faces and Expressions. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:3540-3554. [PMID: 32305893] [DOI: 10.1109/tpami.2020.2986496]
Abstract
In this work, we present a new, versatile 3D multilinear statistical face model, based on a tensor factorisation of 3D face scans, that decomposes the shapes into person and expression subspaces. Investigation of the expression subspace reveals an inherent low-dimensional substructure and, further, a star-shaped structure. This is due to two novel findings. (1) Increasing the strength of one emotion approximately forms a linear trajectory in the subspace. (2) All these trajectories intersect at a single point, not at the neutral expression as assumed by almost all prior works, but at an apathetic expression. We utilise these structural findings by reparameterising the expression subspace by the fourth-order moment tensor centred at the point of apathy. We propose a 3D face reconstruction method from single or multiple 2D projections, assuming an uncalibrated projective camera model. The non-linearity caused by the perspective projection can be neatly included in the model. The proposed algorithm separates person and expression subspaces convincingly, and enables flexible, natural modelling of expressions for a wide variety of human faces. Applying the method to independent faces showed that morphing between different persons and expressions can be performed without strong deformations.
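A generic multilinear face model synthesizes a shape by contracting a core tensor with identity and expression weight vectors; the sketch below shows that baseline construction, not the paper's moment-tensor reparameterisation, and the tensor layout is an assumption.

```python
import numpy as np

def multilinear_face(core, w_id, w_exp):
    """Synthesize a face from a multilinear model via mode products:
    shape = C x_id w_id x_exp w_exp, with core tensor C of shape
    (3V, n_id, n_exp), w_id of shape (n_id,), w_exp of shape (n_exp,).
    Returns the stacked (3V,) vertex coordinate vector."""
    return np.einsum('vie,i,e->v', core, w_id, w_exp)
```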
33
Automatic stress analysis from facial videos based on deep facial action units recognition. Pattern Anal Appl 2021. [DOI: 10.1007/s10044-021-01012-9]
34
|
Abstract
Standard registration algorithms need to be independently applied to each surface to register, following careful pre-processing and hand-tuning. Recently, learning-based approaches have emerged that reduce the registration of new scans to running inference with a previously-trained model. The potential benefits are multifold: inference is typically orders of magnitude faster than solving a new instance of a difficult optimization problem, deep learning models can be made robust to noise and corruption, and the trained model may be re-used for other tasks, e.g. through transfer learning. In this paper, we cast the registration task as a surface-to-surface translation problem, and design a model to reliably capture the latent geometric information directly from raw 3D face scans. We introduce Shape-My-Face (SMF), a powerful encoder-decoder architecture based on an improved point cloud encoder, a novel visual attention mechanism, graph convolutional decoders with skip connections, and a specialized mouth model that we smoothly integrate with the mesh convolutions. Compared to the previous state-of-the-art learning algorithms for non-rigid registration of face scans, SMF only requires the raw data to be rigidly aligned (with scaling) with a pre-defined face template. Additionally, our model provides topologically-sound meshes with minimal supervision, offers faster training time, has orders of magnitude fewer trainable parameters, is more robust to noise, and can generalize to previously unseen datasets. We extensively evaluate the quality of our registrations on diverse data. We demonstrate the robustness and generalizability of our model with in-the-wild face scans across different modalities, sensor types, and resolutions. Finally, we show that, by learning to register scans, SMF produces a hybrid linear and non-linear morphable model. Manipulation of the latent space of SMF allows for shape generation, and morphing applications such as expression transfer in-the-wild. We train SMF on a dataset of human faces comprising 9 large-scale databases on commodity hardware.
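As a small illustration of the only preprocessing SMF is said to require, the sketch below rigidly aligns a raw scan (with scaling) to a face template via orthogonal Procrustes; it assumes point correspondences are available and is a stand-in, not the paper's code.

import numpy as np

def similarity_align(source, template):
    # Align `source` (N, 3) to `template` (N, 3) with rotation, scale
    # and translation (classic Procrustes with corresponding points).
    mu_s, mu_t = source.mean(0), template.mean(0)
    S, T = source - mu_s, template - mu_t
    M = S.T @ T
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))   # avoid reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    scale = np.trace(R.T @ M) / (S ** 2).sum()
    return scale * S @ R + mu_t          # aligned copy of the source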
Collapse
|
35
|
Cao M, Huang H, Wang H, Wang X, Shen L, Wang S, Bao L, Li Z, Luo J. UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:6107-6116. [PMID: 34166189 DOI: 10.1109/tip.2021.3089909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Recent research has witnessed advances in facial image editing tasks including face swapping and face reenactment. However, these methods are confined to dealing with one specific task at a time. In addition, for video facial editing, previous methods either simply apply transformations frame by frame or utilize multiple frames in a concatenated or iterative fashion, which leads to noticeable visual flicker. In this paper, we propose a unified temporally consistent facial video editing framework termed UniFaceGAN. Based on a 3D reconstruction model and a simple yet efficient dynamic training sample selection mechanism, our framework is designed to handle face swapping and face reenactment simultaneously. To enforce temporal consistency, a novel 3D temporal loss constraint is introduced based on barycentric coordinate interpolation. In addition, we propose a region-aware conditional normalization layer to replace the traditional AdaIN or SPADE to synthesize more context-harmonious results. Compared with state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
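The temporal loss rests on barycentric coordinate interpolation; a minimal, self-contained version of that building block (illustrative, not the paper's implementation) is:

import numpy as np

def barycentric(p, a, b, c):
    # Barycentric coordinates of 2D point p in triangle (a, b, c).
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

# The same coordinates can then re-weight per-vertex attributes (e.g. the
# 3D positions of the enclosing mesh triangle) to track a point over time.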
Collapse
|
36
|
Liu Y, Zheng C, Xu F, Tong X, Guo B. Data-Driven 3D Neck Modeling and Animation. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3226-3237. [PMID: 31944979 DOI: 10.1109/tvcg.2020.2967036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, we present a data-driven approach for modeling and animation of 3D necks. Our method is based on a new neck animation model that decomposes the neck animation into local deformation caused by larynx motion and global deformation driven by head poses, facial expressions, and speech. A skinning model is introduced for the local deformation and underlying larynx motions, while the global deformation caused by each factor is modeled by its own corrective blendshape set. Based on this neck model, we introduce a regression method to drive the larynx motion and neck deformation from speech. Both the neck model and the speech regressor are learned from a dataset of 3D neck animation sequences captured from different identities. Our neck model significantly improves the realism of facial animation and allows users to easily create plausible neck animations from speech and facial expressions. We verify our neck model and demonstrate its advantages in 3D neck tracking and animation.
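The global deformation term reduces to a standard corrective-blendshape sum; a minimal sketch under assumed array shapes (not the authors' code):

import numpy as np

def neck_shape(neutral, correctives, weights):
    # neutral: (V, 3) rest neck surface; correctives: (K, V, 3) blendshape
    # deltas, one set per driving factor (head pose, expression, speech);
    # weights: (K,) activations predicted, e.g., by the speech regressor.
    return neutral + np.tensordot(weights, correctives, axes=1)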
Collapse
|
37
|
Khan A, Hayat S, Ahmad M, Cao J, Tahir MF, Ullah A, Javed MS. Learning-detailed 3D face reconstruction based on convolutional neural networks from a single image. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05373-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
38
|
Liu N, Zhang D, Ru X, Zhao H, Wang X, Wu Z. Caricature Expression Extrapolation Based on Kendall Shape Space Theory. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2021; 41:71-84. [PMID: 33788684 DOI: 10.1109/mcg.2021.3069948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Facial expression editing plays a fundamental role in facial expression generation and has been widely applied in modern film productions and computer games. While existing 2-D caricature facial expression editing methods are mostly realized by expression interpolation from the original image to the target image, expression extrapolation has rarely been studied before. In this article, we propose a novel expression extrapolation method for caricature facial expressions based on the Kendall shape space. The key idea is to represent the 3-D expression model in the Kendall shape space, which factors out similarity transformations (translation, scaling, and rotation). Built upon this representation, the 2-D caricature expression extrapolation process is controlled by a 3-D model reconstructed from the input 2-D caricature image; exaggerated caricature expressions are then generated from the extrapolated expression of this 3-D model, which is robust to facial poses and can be computed with tools such as the exponential map in Riemannian geometry. The experimental results demonstrate that our method can effectively and automatically extrapolate facial expressions in caricatures with high consistency and fidelity. In addition, we derive 3-D facial models with diverse expressions and expand the scale of the original FaceWarehouse database. Furthermore, compared with deep learning methods, our approach is based on standard face datasets and avoids the construction of complicated 3-D caricature training sets.
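A minimal sketch of the Kendall shape space machinery the method relies on: landmarks are mapped to the pre-shape sphere (translation and scale removed), rotationally aligned, and then extrapolated along the sphere geodesic past the target expression. The function names and the extrapolation factor are illustrative assumptions.

import numpy as np

def preshape(X):
    # Map landmarks (N, d) to the pre-shape sphere: remove translation
    # and scale; rotation is handled by aligning shapes pairwise below.
    X = X - X.mean(0)
    return X / np.linalg.norm(X)

def extrapolate(neutral, expressive, t=1.5):
    # Walk along the geodesic from `neutral` through `expressive`;
    # t > 1 exaggerates the expression (assumes the shapes differ).
    A, B = preshape(neutral), preshape(expressive)
    U, _, Vt = np.linalg.svd(B.T @ A)   # optimal rotation of B onto A
    B = B @ (U @ Vt)
    theta = np.arccos(np.clip((A * B).sum(), -1.0, 1.0))
    return (np.sin((1 - t) * theta) * A + np.sin(t * theta) * B) / np.sin(theta)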
Collapse
|
39
|
Lou J, Cai X, Dong J, Yu H. Real-Time 3D Facial Tracking via Cascaded Compositional Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:3844-3857. [PMID: 33735081 DOI: 10.1109/tip.2021.3065819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
We propose to learn a cascade of globally-optimized modular boosted ferns (GoMBF) to solve multi-modal facial motion regression for real-time 3D facial tracking from a monocular RGB camera. GoMBF is a deep composition of multiple regression models, each of which is a boosted ferns model initially trained to predict partial motion parameters of the same modality; these models are then concatenated via a global optimization step to form a single strong boosted ferns model that can effectively handle the whole regression target. It explicitly copes with the modality variety in the output variables, while offering increased fitting power and a faster learning speed compared with conventional boosted ferns. By further cascading a sequence of GoMBFs (GoMBF-Cascade) to regress facial motion parameters, we achieve tracking performance on a variety of in-the-wild videos competitive with state-of-the-art methods that either have higher computational complexity or require much more training data. It provides a robust and elegant solution to real-time 3D facial tracking from a small set of training data, which makes it more practical in real-world applications. We further investigate the effect of synthesized facial images on training non-deep-learning methods such as GoMBF-Cascade for 3D facial tracking. We apply three types of synthetic images with various naturalness levels to train two different tracking methods, and compare the performance of models trained on real data, on synthetic data, and on a mixture of both. The experimental results indicate that (i) a model trained purely on synthetic facial imagery hardly generalizes to unconstrained real-world data, and (ii) involving synthetic faces in training benefits tracking in certain scenarios but degrades the tracking model's generalization ability. These two insights could benefit a range of non-deep-learning facial image analysis tasks where labelled real data are difficult to acquire.
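For readers unfamiliar with ferns: a regression fern indexes a sample into one of 2^F bins via F threshold tests and predicts the bin's mean target. A minimal fern sketch (illustrative only; boosting fits ferns sequentially on residuals, and GoMBF additionally re-optimizes the concatenated ferns globally):

import numpy as np

class Fern:
    # A depth-F fern: F (feature, threshold) tests jointly index one of
    # 2**F bins; each bin stores the mean regression target of the
    # training samples that fall into it.
    def __init__(self, feat_ids, thresholds):
        self.feat_ids = np.asarray(feat_ids)
        self.thresholds = np.asarray(thresholds)

    def _bins(self, X):
        bits = (X[:, self.feat_ids] > self.thresholds).astype(int)
        return bits @ (1 << np.arange(len(self.feat_ids)))

    def fit(self, X, Y):
        idx = self._bins(X)
        self.table = np.zeros((2 ** len(self.feat_ids), Y.shape[1]))
        for b in range(len(self.table)):
            hit = idx == b
            if hit.any():
                self.table[b] = Y[hit].mean(axis=0)
        return self

    def predict(self, X):
        return self.table[self._bins(X)]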
Collapse
|
40
|
Guo Y, Cai L, Zhang J. 3D Face From X: Learning Face Shape From Diverse Sources. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:3815-3827. [PMID: 33735079 DOI: 10.1109/tip.2021.3065798] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
We present a novel method to jointly learn a 3D face parametric model and 3D face reconstruction from diverse sources. Previous methods usually learn 3D face modeling from one kind of source, such as scanned data or in-the-wild images. Although 3D scanned data contain accurate geometric information of face shapes, the capture system is expensive and such datasets usually contain a small number of subjects. On the other hand, in-the-wild face images are easily obtained and there are a large number of facial images. However, facial images do not contain explicit geometric information. In this paper, we propose a method to learn a unified face model from diverse sources. Besides scanned face data and face images, we also utilize a large number of RGB-D images captured with an iPhone X to bridge the gap between the two sources. Experimental results demonstrate that with training data from more sources, we can learn a more powerful face model.
Collapse
|
41
|
Liu L, Ke Z, Huo J, Chen J. Head Pose Estimation through Keypoints Matching between Reconstructed 3D Face Model and 2D Image. SENSORS (BASEL, SWITZERLAND) 2021; 21:1841. [PMID: 33800750 PMCID: PMC7961623 DOI: 10.3390/s21051841] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/12/2021] [Revised: 02/26/2021] [Accepted: 03/02/2021] [Indexed: 11/27/2022]
Abstract
Mainstream methods treat head pose estimation as a supervised classification/regression problem, whose performance heavily depends on the accuracy of the ground-truth labels of the training data. However, it is rather difficult to obtain accurate head pose labels in practice, due to the lack of effective equipment and reasonable approaches for head pose labeling. In this paper, we propose a head pose estimation method that does not need to be trained with head pose labels; instead, it matches keypoints between a reconstructed 3D face model and the 2D input image. The proposed method consists of two components: 3D face reconstruction and 3D-2D keypoint matching. At the 3D face reconstruction phase, a personalized 3D face model is reconstructed from the input head image using convolutional neural networks, which are jointly optimized by an asymmetric Euclidean loss and a keypoint loss. At the 3D-2D keypoint matching phase, an iterative optimization algorithm is proposed to efficiently match keypoints between the reconstructed 3D face model and the 2D input image under the constraint of perspective transformation. The proposed method is extensively evaluated on five widely used head pose estimation datasets, including Pointing'04, BIWI, AFLW2000, Multi-PIE, and Pandora. The experimental results demonstrate that the proposed method achieves excellent cross-dataset performance and surpasses most of the existing state-of-the-art approaches, with average MAEs of 4.78° on Pointing'04, 6.83° on BIWI, 7.05° on AFLW2000, 5.47° on Multi-PIE, and 5.06° on Pandora, although the model is not trained on any of these five datasets.
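The paper proposes its own iterative optimization for the 3D-2D matching, but the core geometric step (recovering head pose from matched 3D model keypoints and 2D image keypoints under a perspective camera) can be illustrated with OpenCV's standard PnP solver; this is a stand-in, not the authors' algorithm.

import cv2
import numpy as np

def head_pose(pts3d, pts2d, K):
    # pts3d: (N, 3) keypoints on the reconstructed face model;
    # pts2d: (N, 2) matched keypoints detected in the image;
    # K:     (3, 3) camera intrinsics (e.g., a guessed focal length).
    ok, rvec, tvec = cv2.solvePnP(
        pts3d.astype(np.float64), pts2d.astype(np.float64),
        K, distCoeffs=None, flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> rotation matrix
    # Yaw/pitch/roll can be read off R under a chosen angle convention.
    return R, tvec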
Collapse
Affiliation(s)
- Leyuan Liu
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; (L.L.); (Z.K.); (J.H.)
- National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China
| | - Zeran Ke
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; (L.L.); (Z.K.); (J.H.)
| | - Jiao Huo
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; (L.L.); (Z.K.); (J.H.)
| | - Jingying Chen
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; (L.L.); (Z.K.); (J.H.)
- National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China
| |
Collapse
|
42
|
Xu T, An D, Jia Y, Yue Y. A Review: Point Cloud-Based 3D Human Joints Estimation. SENSORS (BASEL, SWITZERLAND) 2021; 21:1684. [PMID: 33804411 PMCID: PMC7957572 DOI: 10.3390/s21051684] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/28/2021] [Revised: 02/23/2021] [Accepted: 02/23/2021] [Indexed: 11/17/2022]
Abstract
Estimation of human body joints is useful in many fields such as human-computer interaction, autonomous driving, video analysis and virtual reality. Although many depth-based studies have been classified and summarized in previous reviews and surveys, point cloud-based human pose estimation remains difficult due to the unordered nature and rotation invariance of point clouds. In this review, we summarize recent developments in point cloud-based pose estimation of the human body. The existing works are divided into three categories based on their working principles: template-based methods, feature-based methods and machine learning-based methods. In particular, the significant works are highlighted with a detailed introduction to analyze their characteristics and limitations. The widely used datasets in the field are summarized, and quantitative comparisons are provided for the representative methods. Moreover, this review helps further understand the pertinent applications in many frontier research directions. Finally, we conclude with the challenges involved and the problems to be solved in future research.
Collapse
Affiliation(s)
- Tianxu Xu
- Institute of Modern Optics, Nankai University, Tianjin 300350, China; (T.X.); (D.A.); (Y.J.)
| | - Dong An
- Institute of Modern Optics, Nankai University, Tianjin 300350, China; (T.X.); (D.A.); (Y.J.)
| | - Yuetong Jia
- Institute of Modern Optics, Nankai University, Tianjin 300350, China; (T.X.); (D.A.); (Y.J.)
| | - Yang Yue
- Institute of Modern Optics, Nankai University, Tianjin 300350, China; (T.X.); (D.A.); (Y.J.)
- Angle AI (Tianjin) Technology Company Ltd., Tianjin 300450, China
| |
Collapse
|
43
|
Yu J, Ramamoorthi R. Selfie Video Stabilization. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:701-711. [PMID: 31380744 DOI: 10.1109/tpami.2019.2931897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We propose a novel algorithm for stabilizing selfie videos. Our goal is to automatically generate stabilized video with optimally smooth motion for both the foreground and the background. The key insight is that non-rigid foreground motion in selfie videos can be analyzed using a 3D face model, while background motion can be analyzed using optical flow. We use the second derivative of the temporal trajectories of selected pixels as the measure of smoothness. Our algorithm stabilizes selfie videos by minimizing the smoothness measure of the background, regularized by the motion of the foreground. Experiments show that our method outperforms state-of-the-art general video stabilization techniques on selfie videos.
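The smoothness measure is easy to state precisely: the mean squared second temporal difference of the tracked trajectories. A minimal sketch under assumed array shapes:

import numpy as np

def smoothness(trajectories):
    # trajectories: (T, N, 2) pixel positions of N tracked points over
    # T frames. The cost is the mean squared second temporal derivative
    # (acceleration), the quantity minimized for the background.
    accel = np.diff(trajectories, n=2, axis=0)
    return (accel ** 2).sum(-1).mean()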
Collapse
|
44
|
Ariano L, Ferrari C, Berretti S, Del Bimbo A. Action Unit Detection by Learning the Deformation Coefficients of a 3D Morphable Model. SENSORS (BASEL, SWITZERLAND) 2021; 21:E589. [PMID: 33467595 PMCID: PMC7830313 DOI: 10.3390/s21020589] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/03/2020] [Revised: 12/31/2020] [Accepted: 01/12/2021] [Indexed: 11/16/2022]
Abstract
Facial Action Units (AUs) correspond to the deformation/contraction of individual facial muscles or their combinations. As such, each AU affects just a small portion of the face, with deformations that are asymmetric in many cases. Generating and analyzing AUs in 3D is particularly relevant for the potential applications it can enable. In this paper, we propose a solution for 3D AU detection and synthesis by building on a newly defined 3D Morphable Model (3DMM) of the face. Unlike most 3DMMs in the literature, which mainly model global variations of the face and show limitations in adapting to local and asymmetric deformations, the proposed solution is specifically devised to cope with such difficult morphings. During a training phase, deformation coefficients are learned that enable the 3DMM to deform to 3D target scans showing neutral and expressive faces of the same individual, thus decoupling expression from identity deformations. Such deformation coefficients are then used, on the one hand, to train an AU classifier; on the other hand, they can be applied to a 3D neutral scan to generate AU deformations in a subject-independent manner. The proposed approach for AU detection is validated on the Bosphorus dataset, reporting competitive results with respect to the state-of-the-art, even in a challenging cross-dataset setting. We further show that the learned coefficients are general enough to synthesize realistic 3D face instances with AU activation.
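A minimal sketch of the coefficient-fitting idea: given a learned deformation basis, ridge-regularized least squares recovers the coefficients that morph a neutral scan toward a target, and those coefficients then serve as features for an AU classifier. The shapes and the regularizer are illustrative assumptions, not the authors' training procedure.

import numpy as np

def fit_deformation_coeffs(neutral, target, components, lam=1e-3):
    # neutral, target: (V*3,) flattened scans in correspondence;
    # components: (V*3, K) learned deformation basis.
    A, b = components, target - neutral
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)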
Collapse
Affiliation(s)
| | | | | | - Alberto Del Bimbo
- Media Integration and Communication Center, University of Florence, 50134 Firenze, Italy; (L.A.); (C.F.); (S.B.)
| |
Collapse
|
45
|
Tran L, Liu X. On Learning 3D Face Morphable Model from In-the-Wild Images. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:157-171. [PMID: 31329546 DOI: 10.1109/tpami.2019.2927975] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
As a classic statistical model of 3D facial shape and albedo, the 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting and image synthesis. A conventional 3DMM is learned from a set of 3D face scans with associated well-controlled 2D face images, and is represented by two sets of PCA basis functions. Due to the type and amount of training data, as well as the linear bases, the representation power of the 3DMM can be limited. To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM model from a large set of in-the-wild face images, without collecting 3D face scans. Specifically, given a face image as input, a network encoder estimates the projection, lighting, shape and albedo parameters. Two decoders serve as the nonlinear 3DMM to map from the shape and albedo parameters to the 3D shape and albedo, respectively. With the projection parameters, lighting, 3D shape, and albedo, a novel analytically-differentiable rendering layer is designed to reconstruct the original input face. The entire network is end-to-end trainable with only weak supervision. We demonstrate the superior representation power of our nonlinear 3DMM over its linear counterpart, and its contribution to face alignment, 3D reconstruction, and face editing. Source code and additional results can be found at our project page: http://cvlab.cse.msu.edu/project-nonlinear-3dmm.html.
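For contrast with the paper's learned decoders, the conventional linear 3DMM it generalizes is just two PCA bases weighted by parameter vectors; a minimal sketch with illustrative names:

import numpy as np

def linear_3dmm(mean_shape, mean_albedo, shape_basis, albedo_basis, alpha, beta):
    # Conventional linear 3DMM: mean plus PCA bases weighted by the
    # parameter vectors alpha and beta. The paper replaces these two
    # linear maps with learned nonlinear decoders.
    shape = mean_shape + shape_basis @ alpha     # (V*3,)
    albedo = mean_albedo + albedo_basis @ beta   # (V*3,)
    return shape, albedo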
Collapse
|
46
|
Wen X, Wang M, Richardt C, Chen ZY, Hu SM. Photorealistic Audio-driven Video Portraits. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:3457-3466. [PMID: 32941145 DOI: 10.1109/tvcg.2020.3023573] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Video portraits are common in a variety of applications, such as videoconferencing, news broadcasting, and virtual education and training. We present a novel method to synthesize photorealistic video portraits for an input portrait video, automatically driven by a person's voice. The main challenge in this task is the hallucination of plausible, photorealistic facial expressions from input speech audio. To address this challenge, we employ a parametric 3D face model represented by geometry, facial expression, illumination, etc., and learn a mapping from audio features to model parameters. The input source audio is first represented as a high-dimensional feature, which is used to predict facial expression parameters of the 3D face model. We then replace the expression parameters computed from the original target video with the predicted one, and rerender the reenacted face. Finally, we generate a photorealistic video portrait from the reenacted synthetic face sequence via a neural face renderer. One appealing feature of our approach is the generalization capability for various input speech audio, including synthetic speech audio from text-to-speech software. Extensive experimental results show that our approach outperforms previous general-purpose audio-driven video portrait methods. This includes a user study demonstrating that our results are rated as more realistic than previous methods.
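The pipeline's key substitution step is simple to state: keep the target frame's pose, identity and lighting parameters, but replace its expression parameters with ones predicted from audio. A toy sketch, with a plain linear map standing in for the learned audio-to-expression network:

import numpy as np

def reenact(frame_params, audio_feature, W, b):
    # frame_params: dict of 3D face model parameters fitted to the
    # target frame. Only the expression parameters are swapped; the
    # paper learns this audio-to-expression mapping with a network.
    new_params = dict(frame_params)
    new_params['expression'] = W @ audio_feature + b
    return new_params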
Collapse
|
48
|
Chen Y, Wu F, Wang Z, Song Y, Ling Y, Bao L. Self-supervised Learning of Detailed 3D Face Reconstruction. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 29:8696-8705. [PMID: 32853150 DOI: 10.1109/tip.2020.3017347] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
In this paper, we present an end-to-end learning framework for detailed 3D face reconstruction from a single image. Our approach uses a 3DMM-based coarse model and a displacement map in UV-space to represent a 3D face. Unlike previous work addressing the problem, our learning framework does not require supervision from surrogate ground-truth 3D models computed with traditional approaches. Instead, we utilize the input image itself as supervision during learning. In the first stage, we combine a photometric loss and a facial perceptual loss between the input face and the rendered face to regress a 3DMM-based coarse model. In the second stage, both the input image and the regressed texture of the coarse model are unwrapped into UV-space and then sent through an image-to-image translation network to predict a displacement map in UV-space. The displacement map and the coarse model are used to render a final detailed face, which again can be compared with the original input image to serve as a photometric loss for the second stage. The advantage of learning the displacement map in UV-space is that face alignment can be done explicitly during the unwrapping, so facial details are easier to learn from a large amount of data. Extensive experiments demonstrate the superiority of the proposed method over previous work.
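The self-supervision signal in both stages is a masked photometric comparison between the input and the rendered face; a minimal sketch (the facial perceptual term the paper adds is omitted here):

import numpy as np

def photometric_loss(image, rendered, mask):
    # image, rendered: (H, W, 3) float arrays; mask: (H, W) binary face
    # region. L1 photometric loss over the rendered face region only.
    diff = np.abs(image - rendered) * mask[..., None]
    return diff.sum() / (3.0 * mask.sum() + 1e-8)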
Collapse
|
49
|
Zhang YW, Zhang C, Wang W, Chen Y, Ji Z, Liu H. Portrait Relief Modeling from a Single Image. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:2659-2670. [PMID: 30640615 DOI: 10.1109/tvcg.2019.2892439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
We present a novel solution to enable portrait relief modeling from a single image. The main challenges are geometry reconstruction, facial detail recovery and depth structure preservation. Previous image-based methods were developed for portrait bas-relief modeling in 2.5D form, but are not adequate for 3D-like high relief modeling with undercut features. In this paper, we propose a template-based framework to generate portrait reliefs of various forms. Our method benefits from Shape-from-Shading (SFS). Specifically, we use bi-Laplacian mesh deformation to guide the relief modeling. Given a portrait image, we first fit a template face to the portrait. We then apply bi-Laplacian mesh deformation to align the facial features. Afterwards, SFS-based reconstruction with a few user interactions is used to optimize the face depth and create a relief with a similar appearance to the input. Both depth structures and geometric details are well preserved in the final relief. Experiments and comparisons to other methods demonstrate the effectiveness of the proposed method.
Collapse
|
50
|
Han X, Hou K, Du D, Qiu Y, Cui S, Zhou K, Yu Y. CaricatureShop: Personalized and Photorealistic Caricature Sketching. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:2349-2361. [PMID: 30575537 DOI: 10.1109/tvcg.2018.2886007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
In this paper, we propose the first sketching system for interactively personalized and photorealistic face caricaturing. Given an image of a human face, users can create caricature photos by manipulating its facial feature curves. Our system first performs exaggeration on the recovered 3D face model by assigning the Laplacian of each vertex a scaling factor according to the edited sketches. The mapping between 2D sketches and the vertex-wise scaling field is constructed by a novel deep learning architecture. Our approach outputs different exaggerations when the same sketching is applied to different input faces, in accordance with their different geometric characteristics, which makes the generated results "personalized". With the obtained 3D caricature model, two images are generated: one by applying 2D warping guided by the underlying 3D mesh deformation, and the other by re-rendering the deformed 3D textured model. These two images are then seamlessly integrated to produce the final output. Due to the severe stretching of meshes, the rendered texture appears blurry; a deep learning approach is exploited to infer the missing details and enhance these blurry regions. Moreover, a relighting operation is introduced to further improve the photorealism of the result. These steps make our results "photorealistic". Qualitative experimental results validate the effectiveness of our sketching system.
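A minimal sketch of the exaggeration step: scale each vertex's Laplacian (detail) coordinate by the predicted factor, then solve a least-squares system that reproduces the scaled details while softly pinning anchor vertices. The uniform Laplacian and the solver choice are illustrative assumptions, not the paper's implementation.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def exaggerate(V, L, scale, anchor_ids, anchor_pos, w=10.0):
    # V: (n, 3) vertices; L: (n, n) sparse uniform mesh Laplacian;
    # scale: (n,) per-vertex scaling field predicted from the sketches.
    delta = scale[:, None] * (L @ V)          # scaled detail coordinates
    C = sp.identity(V.shape[0], format='csr')[anchor_ids]
    A = sp.vstack([L, w * C]).tocsr()         # details + soft anchors
    B = np.vstack([delta, w * anchor_pos])
    return np.column_stack([spla.lsqr(A, B[:, k])[0] for k in range(3)])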
Collapse
|