1
Patel KY, Wilcox LM, Maloney LT, Ehinger KA, Patel JY, Wiedenmann E, Murray RF. Lightness constancy in reality, in virtual reality, and on flat-panel displays. Behav Res Methods 2024; 56:6389-6407. PMID: 38443726. DOI: 10.3758/s13428-024-02352-0.
Abstract
Virtual reality (VR) displays are being used in an increasingly wide range of applications. However, previous work shows that viewers often perceive scene properties very differently in real and virtual environments, so realistic perception of virtual stimuli should always be a carefully tested conclusion, not an assumption. One important property for realistic scene perception is surface color. To evaluate how well virtual platforms support realistic perception of achromatic surface color, we assessed lightness constancy in a physical apparatus with real lights and surfaces, in a commercial VR headset, and on a traditional flat-panel display. We found that lightness constancy was good in all three environments, though significantly better in the real environment than on the flat-panel display. We also found that variability across observers was significantly greater in VR and on the flat-panel display than in the physical environment. We conclude that these discrepancies should be taken into account in applications where realistic perception is critical, but also that in many cases VR can be used as a flexible alternative to flat-panel displays and a reasonable proxy for real environments.
Affiliation(s)
- Khushbu Y Patel
- Department of Psychology and Centre for Vision Research, York University, Toronto, Canada
- Laurie M Wilcox
- Department of Psychology and Centre for Vision Research, York University, Toronto, Canada
- Krista A Ehinger
- School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Jaykishan Y Patel
- Department of Psychology and Centre for Vision Research, York University, Toronto, Canada
- Emma Wiedenmann
- Department of Psychology and Centre for Vision Research, York University, Toronto, Canada
- Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Richard F Murray
- Department of Psychology and Centre for Vision Research, York University, Toronto, Canada
2
Ma Z, Li C, Liu X, Wu H, Wen Z. Separating Shading and Reflectance From Cartoon Illustrations. IEEE Trans Vis Comput Graph 2024; 30:3664-3679. PMID: 37021997. DOI: 10.1109/tvcg.2023.3239364.
Abstract
Shading plays an important role in cartoon drawings, conveying 3D lighting and depth information in a 2D image and enhancing its visual appeal. However, it also introduces substantial challenges in analyzing and processing cartoon drawings for computer graphics and vision applications such as segmentation, depth estimation, and relighting. Extensive research has gone into removing or separating shading information to facilitate these applications. Unfortunately, existing work has focused only on natural images, which differ fundamentally from cartoons: shading in natural images is physically correct and can be modeled with physical priors, whereas shading in cartoons is created manually by artists and may be imprecise, abstract, and stylized. This makes shading in cartoon drawings extremely difficult to model. Instead of modeling a shading prior, in this paper we propose a learning-based solution that separates shading from the original colors using a two-branch system consisting of two subnetworks. To the best of our knowledge, our method is the first attempt at separating shading information from cartoon drawings. It significantly outperforms methods tailored for natural images, and extensive evaluations show convincing results in all cases.
3
Schmitt C, Antic B, Neculai A, Lee JH, Geiger A. Towards Scalable Multi-View Reconstruction of Geometry and Materials. IEEE Trans Pattern Anal Mach Intell 2023; 45:15850-15869. PMID: 37708017. DOI: 10.1109/tpami.2023.3314348.
Abstract
In this paper, we propose a novel method for joint recovery of camera pose, object geometry, and the spatially varying bidirectional reflectance distribution function (svBRDF) of 3D scenes that exceed object scale and hence cannot be captured with stationary light stages. The input consists of high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. To facilitate scalability to large numbers of observation views and optimization variables, we introduce a distributed optimization algorithm that reconstructs 2.5D keyframe-based representations of the scene. A novel multi-view consistency regularizer effectively synchronizes neighboring keyframes such that the local optimization results allow for seamless integration into a globally consistent 3D model. We provide a study on the importance of each component in our formulation and show that our method compares favorably to baselines. We further demonstrate that our method accurately reconstructs various objects and materials and scales to spatially larger scenes. We believe this work represents a significant step towards making geometry and material estimation from hand-held scanners scalable.
4
Liu Y, Li Q, Deng Q, Sun Z, Yang MH. GAN-Based Facial Attribute Manipulation. IEEE Trans Pattern Anal Mach Intell 2023; 45:14590-14610. PMID: 37494159. DOI: 10.1109/tpami.2023.3298868.
Abstract
Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes, and has received significant attention due to its broad practical applications, ranging from digital entertainment to biometric forensics. In the last decade, with the remarkable success of Generative Adversarial Networks (GANs) in synthesizing realistic images, numerous GAN-based models have been proposed to solve FAM with various problem formulations and representations of guiding information. This paper presents a comprehensive survey of GAN-based FAM methods, focusing on summarizing their principal motivations and technical details. The main contents of this survey include: (i) an introduction to the research background and basic concepts related to FAM, (ii) a systematic review of GAN-based FAM methods in three main categories, and (iii) an in-depth discussion of important properties of FAM methods, open issues, and future research directions. This survey not only provides a good starting point for researchers new to this field but also serves as a reference for the vision community.
5
Guo Z, Gu Z, Zheng B, Dong J, Zheng H. Transformer for Image Harmonization and Beyond. IEEE Trans Pattern Anal Mach Intell 2023; 45:12960-12977. PMID: 36107900. DOI: 10.1109/tpami.2022.3207091.
Abstract
Image harmonization, which aims to make composite images look more realistic, is an important and challenging task. A composite, synthesized by combining the foreground of one image with the background of another, inevitably suffers from inharmonious appearance caused by distinct imaging conditions, i.e., lighting. Current solutions mainly adopt an encoder-decoder architecture built on convolutional neural networks (CNNs) to capture the context of composite images, trying to infer what the foreground should look like given the surrounding background. In this work, we seek to solve image harmonization with Transformers, leveraging their powerful ability to model long-range context dependencies, to adjust the foreground lighting so that it is compatible with the background lighting while keeping structure and semantics unchanged. We present the design of our two vision Transformer frameworks and corresponding methods, as well as comprehensive experiments and an empirical study, demonstrating the power of Transformers for vision tasks. Our methods achieve state-of-the-art performance on image harmonization as well as four additional vision and graphics tasks, i.e., image enhancement, image inpainting, white-balance editing, and portrait relighting, indicating the superiority of our work. Code, models, and further results and details can be found at the project website http://ouc.ai/project/HarmonyTransformer.
6
Marlow PJ, Prior de Heer B, Anderson BL. The role of self-occluding contours in material perception. Curr Biol 2023. PMID: 37196655. DOI: 10.1016/j.cub.2023.04.056.
Abstract
The human visual system extracts both the three-dimensional (3D) shape and the material properties of surfaces from single images [1-14]. Understanding this remarkable ability is difficult because the problem of extracting both shape and material is formally ill posed: information about one appears to be needed to recover the other [14-17]. Recent work has suggested that a particular class of image contours formed by a surface curving smoothly out of sight (self-occluding contours) contains information that co-specifies both surface shape and material for opaque surfaces [18]. However, many natural materials are light permeable (translucent); it is unknown whether there is information along self-occluding contours that can be used to distinguish opaque and translucent materials. Here, we present physical simulations, which demonstrate that variations in intensity generated by opaque and translucent materials are linked to different shape attributes of self-occluding contours. Psychophysical experiments demonstrate that the human visual system exploits the different forms of intensity-shape covariation along self-occluding contours to distinguish opaque and translucent materials. These results provide insight into how the visual system manages to solve the putatively ill-posed problem of extracting both the shape and material properties of 3D surfaces from images.
Affiliation(s)
- Phillip J Marlow
- The University of Sydney, School of Psychology, Griffith Taylor Building, Manning Road, Camperdown, Sydney, NSW 2006, Australia
- Belinda Prior de Heer
- The University of Sydney, School of Psychology, Griffith Taylor Building, Manning Road, Camperdown, Sydney, NSW 2006, Australia
- Barton L Anderson
- The University of Sydney, School of Psychology, Griffith Taylor Building, Manning Road, Camperdown, Sydney, NSW 2006, Australia
7
Langenegger J, Karunaratne G, Hersche M, Benini L, Sebastian A, Rahimi A. In-memory factorization of holographic perceptual representations. Nat Nanotechnol 2023; 18:479-485. PMID: 36997756. DOI: 10.1038/s41565-023-01357-8.
Abstract
Disentangling the attributes of a sensory signal is central to sensory perception and cognition and hence is a critical task for future artificial intelligence systems. Here we present a compute engine capable of efficiently factorizing high-dimensional holographic representations of combinations of such attributes, by exploiting the computation-in-superposition capability of brain-inspired hyperdimensional computing and the intrinsic stochasticity associated with analogue in-memory computing based on nanoscale memristive devices. Such an iterative in-memory factorizer is shown to solve problems at least five orders of magnitude larger than those solvable otherwise, while substantially lowering the computational time and space complexity. We present a large-scale experimental demonstration of the factorizer by employing two in-memory compute chips based on phase-change memristive devices. The dominant matrix-vector multiplication operations take a constant time, irrespective of the size of the matrix, thus reducing the computational time complexity to merely the number of iterations. Moreover, we experimentally demonstrate the ability to reliably and efficiently factorize visual perceptual representations.
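In software terms, the factorizer described here behaves like a resonator network over hyperdimensional vectors: attributes are encoded as random bipolar codewords, binding is elementwise multiplication, and factorization alternates unbinding with a cleanup projection onto each factor's codebook. The NumPy sketch below illustrates that iteration for two factors; the dimensionality, codebook sizes, and sign-based cleanup are illustrative assumptions rather than the authors' exact configuration (the in-memory chips accelerate the matrix-vector products that dominate each step).

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 1000, 20                      # vector dimensionality, codewords per factor
sgn = lambda x: np.where(x >= 0, 1, -1)

# Random bipolar codebooks for two factors; binding = elementwise product.
A = rng.choice([-1, 1], size=(M, D))
B = rng.choice([-1, 1], size=(M, D))
ia, ib = 3, 7
s = A[ia] * B[ib]                    # composite holographic vector to factorize

# Resonator iteration: unbind with the current estimate of the other factor,
# then "clean up" by projecting onto the codebook and re-binarizing.
a_hat = sgn(A.sum(axis=0))           # start from a superposition of candidates
b_hat = sgn(B.sum(axis=0))
for _ in range(50):
    a_hat = sgn(A.T @ (A @ (s * b_hat)))
    b_hat = sgn(B.T @ (B @ (s * a_hat)))

print(np.argmax(A @ a_hat), np.argmax(B @ b_hat))   # decodes to 3 7
```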
Affiliation(s)
- Jovin Langenegger
- IBM Research-Zurich, Rüschlikon, Switzerland
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Geethan Karunaratne
- IBM Research-Zurich, Rüschlikon, Switzerland
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Michael Hersche
- IBM Research-Zurich, Rüschlikon, Switzerland
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Luca Benini
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
8
Liu Q, Deng W, Pham DT, Hu J, Wang Y, Zhou Z. A Two-Stage Screw Detection Framework for Automatic Disassembly Using a Reflection Feature Regression Model. Micromachines 2023; 14:946. PMID: 37241570. DOI: 10.3390/mi14050946.
Abstract
For remanufacturing to be more economically attractive, automatic disassembly and automated visual detection methods need to be developed. Screw removal is a common step in end-of-life product disassembly for remanufacturing. This paper presents a two-stage detection framework for structurally damaged screws, together with a linear regression model of reflection features that allows the framework to operate under uneven illumination conditions. The first stage uses reflection features, in combination with the regression model, to extract screw candidates. The second stage uses texture features to filter out false regions whose reflection features resemble those of screws. A self-optimisation strategy and weighted fusion connect the two stages. The detection framework was implemented on a robotic platform designed for disassembling electric vehicle batteries. This method allows screw removal to be conducted automatically in complex disassembly tasks, and the use of reflection features and data learning provides new ideas for further research.
Affiliation(s)
- Quan Liu
- School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
- Wupeng Deng
- School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
- Department of Mechanical Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Duc Truong Pham
- Department of Mechanical Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Jiwei Hu
- School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
- Yongjing Wang
- Department of Mechanical Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Zude Zhou
- School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
9
Vo AV, Bertolotto M, Ofterdinger U, Laefer DF. In Search of Basement Indicators from Street View Imagery Data: An Investigation of Data Sources and Analysis Strategies. Künstl Intell 2023; 37:41-53. PMID: 37283695. PMCID: PMC10239739. DOI: 10.1007/s13218-022-00792-4.
Abstract
Street view imagery databases such as Google Street View, Mapillary, and Karta View provide great spatial and temporal coverage for many cities globally. Those data, when coupled with appropriate computer vision algorithms, can provide an effective means to analyse aspects of the urban environment at scale. In an effort to enhance current practices in urban flood risk assessment, this project investigates a potential use of street view imagery data to identify building features that indicate buildings' vulnerability to flooding (e.g., basements and semi-basements). In particular, this paper discusses (1) building features indicating the presence of basement structures, (2) available imagery data sources capturing those features, and (3) computer vision algorithms capable of automatically detecting the features of interest. The paper also reviews existing methods for reconstructing geometry representations of the extracted features from images, and potential approaches to account for data quality issues. Preliminary experiments confirmed that freely available Mapillary images are usable for detecting basement railings, as an example class of basement features, and for geolocating the detected features.
Affiliation(s)
- Anh Vu Vo
- School of Computer Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
- Michela Bertolotto
- School of Computer Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
- Ulrich Ofterdinger
- School of Natural and Built Environment, Queen’s University Belfast, Stranmillis Road, Belfast BT9 5AG, Northern Ireland
- Debra F. Laefer
- Center for Urban Science & Progress, New York University, 370 Jay Street, Brooklyn, NY 11201, USA
- Department of Civil and Urban Engineering, New York University, 6 MetroTech Center, Brooklyn, NY 11201, USA
10
Anderson BL, Marlow PJ. Perceiving the shape and material properties of 3D surfaces. Trends Cogn Sci 2023; 27:98-110. PMID: 36372694. DOI: 10.1016/j.tics.2022.10.005.
Abstract
Our visual experience of the world relies on the interaction of light with the different substances, surfaces, and objects in our environment. These optical interactions generate images that contain a conflated mixture of different scene variables, which our visual system must somehow disentangle to extract information about the shape and material properties of the world. Such problems have historically been considered to be ill-posed, but recent work suggests that there are complex patterns of covariation in light that co-specify the 3D shape and material properties of surfaces. This work provides new insights into how the visual system acquired the ability to solve problems that have historically been considered intractable.
Affiliation(s)
- Phillip J Marlow
- School of Psychology, University of Sydney, Sydney 2006, Australia
11
Hu R, Ye Z, Chen B, van Kaick O, Huang H. Self-Supervised Color-Concept Association via Image Colorization. IEEE Trans Vis Comput Graph 2023; 29:247-256. PMID: 36166543. DOI: 10.1109/tvcg.2022.3209481.
Abstract
The interpretation of colors in visualizations is facilitated when the assignments between colors and concepts in the visualizations match humans' expectations, implying that the colors can be interpreted in a semantic manner. However, manually creating a dataset of suitable associations between colors and concepts for use in visualizations is costly, as such associations would have to be collected from humans for a large variety of concepts. To address the challenge of collecting this data, we introduce a method to extract color-concept associations automatically from a set of concept images. While the state-of-the-art method extracts associations from data with supervised learning, we developed a self-supervised method based on colorization that does not require the preparation of ground truth color-concept associations. Our key insight is that a set of images of a concept should be sufficient for learning color-concept associations, since humans also learn to associate colors with concepts mainly from past visual input. Thus, we propose to use an automatic colorization method to extract statistical models of the color-concept associations that appear in concept images. Specifically, we take a colorization model pre-trained on ImageNet and fine-tune it on the set of images associated with a given concept, to predict pixel-wise probability distributions in Lab color space for the images. Then, we convert the predicted probability distributions into color ratings for a given color library and aggregate them over all the images of a concept to obtain the final color-concept associations. We evaluate our method using four different evaluation metrics and via a user study. Experiments show that, although the state-of-the-art method based on supervised learning with user-provided ratings is more effective at capturing relative associations, our self-supervised method obtains overall better results according to metrics like Earth Mover's Distance (EMD) and Entropy Difference (ED), which are closer to human perception of color distributions.
12
Zhang Q, Zhou J, Zhu L, Sun W, Xiao C, Zheng WS. Unsupervised Intrinsic Image Decomposition Using Internal Self-Similarity Cues. IEEE Trans Pattern Anal Mach Intell 2022; 44:9669-9686. PMID: 34813466. DOI: 10.1109/tpami.2021.3129795.
Abstract
Recent learning-based intrinsic image decomposition methods have achieved remarkable progress. However, they usually require massive ground truth intrinsic images for supervised learning, which limits their applicability to real-world images, since obtaining ground truth intrinsic decompositions for natural images is very challenging. In this paper, we present an unsupervised framework that is able to learn the decomposition effectively from a single natural image by training solely with the image itself. Our approach is built upon the observations that the reflectance of a natural image typically has high internal self-similarity of patches, and that a convolutional generation network tends to boost the self-similarity of an image when trained for image reconstruction. Based on these observations, an unsupervised intrinsic decomposition network (UIDNet) consisting of two fully convolutional encoder-decoder sub-networks, i.e., a reflectance prediction network (RPN) and a shading prediction network (SPN), is devised to decompose an image into reflectance and shading by promoting the internal self-similarity of the reflectance component, in a way that jointly trains the RPN and SPN to reproduce the given image. A novel loss function is also designed to make the training effective for intrinsic decomposition. Experimental results on three benchmark real-world datasets demonstrate the superiority of the proposed method.
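A minimal PyTorch sketch of the core training signal described above: two convolutional encoder-decoder networks (standing in for the RPN and SPN) predict reflectance and shading, and are trained jointly to reproduce the single input image as their product. The tiny architectures, grayscale shading, and plain reconstruction loss are simplifying assumptions; the paper's self-similarity-promoting terms are omitted.

```python
import torch
import torch.nn as nn

def tiny_net(out_ch):
    # Toy fully convolutional stand-in for an encoder-decoder sub-network.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1), nn.Sigmoid(),
    )

rpn = tiny_net(out_ch=3)     # reflectance prediction network
spn = tiny_net(out_ch=1)     # shading prediction network (grayscale)

image = torch.rand(1, 3, 64, 64)   # the single natural image used for training
opt = torch.optim.Adam(list(rpn.parameters()) + list(spn.parameters()), lr=1e-3)

for step in range(200):
    recon = rpn(image) * spn(image)       # image = reflectance x shading
    loss = ((recon - image) ** 2).mean()  # reconstruction loss only (simplified)
    opt.zero_grad()
    loss.backward()
    opt.step()
```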
13
Yang Z, Chen B, Zheng Y, Chen X, Zhou K. Human Bas-Relief Generation From a Single Photograph. IEEE Trans Vis Comput Graph 2022; 28:4558-4569. PMID: 34191727. DOI: 10.1109/tvcg.2021.3092877.
Abstract
We present a semi-automatic method for producing human bas-relief from a single photograph. Given an input photo of one or multiple persons, our method first estimates a 3D skeleton for each person in the image. SMPL models are then fitted to the 3D skeletons to generate a 3D guide model. To align the 3D guide model with the image, we compute a 2D warping field to non-rigidly register the projected contours of the guide model with the body contours in the image. Then the normal map of the 3D guide model is warped by the 2D deformation field to reconstruct an overall base shape. Finally, the base shape is integrated with a fine-scale normal map to produce the final bas-relief. To tackle the complex intra- and inter-body interactions, we design an occlusion relationship resolution method that operates at the level of 3D skeletons with minimal user inputs. To tightly register the model contours to the image contours, we propose a non-rigid point matching algorithm harnessing user-specified sparse correspondences. Experiments demonstrate that our human bas-relief generation method is capable of producing perceptually realistic results on various single-person and multi-person images, on which the state-of-the-art depth and pose estimation methods often fail.
14
Cooper VL, Bieron JC, Peers P. Estimating Homogeneous Data-Driven BRDF Parameters From a Reflectance Map Under Known Natural Lighting. IEEE Trans Vis Comput Graph 2022; 28:4289-4303. PMID: 34061745. DOI: 10.1109/tvcg.2021.3085560.
Abstract
In this article we demonstrate robust estimation of the model parameters of a fully-linear data-driven BRDF model from a reflectance map under known natural lighting. To regularize the estimation of the model parameters, we leverage the reflectance similarities within a material class. We approximate the space of homogeneous BRDFs using a Gaussian mixture model, and assign a material class to each Gaussian in the mixture model. We formulate the estimation of the model parameters as a non-linear maximum a posteriori optimization, and introduce a linear approximation that estimates a solution per material class, from which the best solution is selected. We demonstrate the efficacy and robustness of our method using the MERL BRDF database under a variety of natural lighting conditions, and we provide a proof-of-concept real-world experiment.
15
Forsyth D, Rock JJ. Intrinsic Image Decomposition Using Paradigms. IEEE Trans Pattern Anal Mach Intell 2022; 44:7624-7637. PMID: 34648429. DOI: 10.1109/tpami.2021.3119551.
Abstract
Intrinsic image decomposition is the task of mapping an image to albedo and shading. Classical approaches derive methods from spatial models. The modern literature stresses evaluation, comparing predictions to human judgements ("lighter", "same as", "darker"). The best modern intrinsic image methods train a map from image to albedo using images rendered from computer graphics models and example human judgements. This approach yields practical methods, but obtaining rendered images can be inconvenient. Furthermore, the approach cannot explain how one could learn to recover intrinsic images without geometric, surface, and illumination models, as people and animals appear to do. This paper describes a method that learns intrinsic image decomposition without seeing human annotations, rendered data, or ground truth data. Instead, the method relies on paradigms: spatial models of albedo and of shading. Rather than finding the "best" albedo and shading for an image via optimization, our approach trains a neural network on synthetic images. The synthetic images are constructed by multiplying albedo and shading fields sampled from our models. The network is subject to a novel smoothing procedure that ensures good behavior at short scales on real images. An averaging procedure ensures that reported albedo and shading are largely equivariant: different crops and scalings of an image report the same albedo and shading at shared points. This averaging procedure controls long-scale error. The standard evaluation for an intrinsic image method is the weighted human disagreement rate (WHDR). Our method achieves WHDR scores competitive with those of strong recent methods that are allowed to see training WHDR annotations, rendered data, and ground truth data. Our method produces albedo and shading maps with attractive qualitative properties; for example, albedo fields do not suppress wood grain and represent narrow grooves in surfaces well. Because our method is unsupervised, we can compute estimates of the test/train variance of WHDR scores; these are quite large, and suggest it is unsafe to rely on small differences in reported WHDR.
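A rough NumPy illustration of the data construction described above: sample a piecewise-constant random field as an albedo paradigm and a smooth positive field as a shading paradigm, then multiply them to obtain a synthetic training image with known ground truth. The specific random-field choices here (nearest-seed regions, blurred Gaussian noise) are stand-ins for the paper's paradigm models.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
H = W = 128

# Albedo paradigm: piecewise-constant regions (each pixel takes the value
# of its nearest random seed, giving flat patches with sharp boundaries).
seeds = rng.uniform(0, H, size=(30, 2))
values = rng.uniform(0.2, 0.9, size=30)
yy, xx = np.mgrid[0:H, 0:W]
d2 = (yy[..., None] - seeds[:, 0]) ** 2 + (xx[..., None] - seeds[:, 1]) ** 2
albedo = values[np.argmin(d2, axis=-1)]

# Shading paradigm: smooth, strictly positive field from blurred noise.
shading = gaussian_filter(rng.standard_normal((H, W)), sigma=12)
shading = 0.5 + 0.5 * (shading - shading.min()) / (np.ptp(shading) + 1e-8)

image = albedo * shading   # one synthetic training triple (image, albedo, shading)
```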
16
Dave A, Hold-Geoffroy Y, Hašan M, Sunkavalli K, Veeraraghavan A. Snapshot polarimetric diffuse-specular separation. Opt Express 2022; 30:34239-34255. PMID: 36242441. DOI: 10.1364/oe.460984.
Abstract
We present a polarization-based approach to perform diffuse-specular separation from a single polarimetric image, acquired using a flexible, practical capture setup. Our key technical insight is that, unlike previous polarization-based separation methods that assume completely unpolarized diffuse reflectance, we use a more general polarimetric model that accounts for partially polarized diffuse reflections. We capture the scene with a polarimetric sensor and produce an initial analytical diffuse-specular separation that we further pass into a deep network trained to refine the separation. We demonstrate that our combination of analytical separation and deep network refinement produces state-of-the-art diffuse-specular separation, which enables image-based appearance editing of dynamic scenes and enhanced appearance estimation.
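For context, a minimal version of the analytical step under the classic assumption that this paper relaxes, namely that diffuse reflectance is completely unpolarized: from the four polarizer-angle channels of a snapshot polarimetric sensor, compute the linear Stokes components and assign the polarized part of the light to the specular term. The refinement network and the partially polarized diffuse model are the paper's contributions and are not reflected here.

```python
import numpy as np

def diffuse_specular_baseline(i0, i45, i90, i135):
    """Classic separation from 0/45/90/135-degree polarizer channels,
    assuming the diffuse component is fully unpolarized."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # linear Stokes components
    s2 = i45 - i135
    polarized = np.sqrt(s1**2 + s2**2)   # magnitude of the polarized light
    diffuse = np.clip(s0 - polarized, 0.0, None)
    return diffuse, polarized            # polarized part -> specular estimate

# Toy pixel: unpolarized base plus a polarized highlight (Malus's law).
channels = [0.4 + 0.3 * np.cos(np.deg2rad(a)) ** 2 for a in (0, 45, 90, 135)]
print(diffuse_specular_baseline(*map(np.asarray, channels)))   # (0.8, 0.3)
```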
17
Wasee FR, Joy A, Poullis C. Predicting Surface Reflectance Properties of Outdoor Scenes Under Unknown Natural Illumination. IEEE Comput Graph Appl 2022; 42:19-27. PMID: 35157581. DOI: 10.1109/mcg.2022.3151010.
Abstract
Estimating and modeling the appearance of an object under outdoor illumination conditions is a complex process. This article addresses this problem and proposes a complete framework to predict the surface reflectance properties of outdoor scenes under unknown natural illumination. Uniquely, we recast the problem into its two constituent components involving the bidirectional reflectance distribution function (BRDF) and the incoming light and outgoing view directions: first, the radiance of surface points captured in the images, together with the outgoing view directions, is aggregated and encoded into reflectance maps; second, a neural network trained on reflectance maps infers a low-parameter reflection model. Our model is based on phenomenological and physics-based scattering models. Experiments show that rendering with the predicted reflectance properties results in an appearance visually similar to using textures that cannot otherwise be disentangled from the reflectance properties.
18
Zhu Y, Li C, Li S, Shi B, Tai YW. Hybrid Face Reflectance, Illumination, and Shape From a Single Image. IEEE Trans Pattern Anal Mach Intell 2022; 44:5002-5015. PMID: 33989152. DOI: 10.1109/tpami.2021.3080586.
Abstract
We propose HyFRIS-Net to jointly estimate hybrid reflectance and illumination models, as well as the refined face shape, from a single unconstrained face image in a pre-defined texture space. The proposed hybrid reflectance and illumination representation ensures photometric face appearance modeling in both parametric and non-parametric spaces for efficient learning. By enforcing a reflectance consistency constraint for the same person and a face identity constraint across different persons, our approach recovers an occlusion-free face albedo whose color is disambiguated from the illumination color. Our network is trained in a self-evolving manner to achieve general applicability on real-world data. We conduct comprehensive qualitative and quantitative evaluations against state-of-the-art methods to demonstrate the advantages of HyFRIS-Net in modeling photo-realistic face albedo, illumination, and shape.
19
Zhang Y, Tsang IW, Luo Y, Hu C, Lu X, Yu X. Recursive Copy and Paste GAN: Face Hallucination From Shaded Thumbnails. IEEE Trans Pattern Anal Mach Intell 2022; 44:4321-4338. PMID: 33621168. DOI: 10.1109/tpami.2021.3061312.
Abstract
Existing face hallucination methods based on convolutional neural networks (CNNs) have achieved impressive performance on low-resolution (LR) faces in a normal illumination condition. However, their performance degrades dramatically when LR faces are captured in non-uniform illumination conditions. This paper proposes a Recursive Copy and Paste Generative Adversarial Network (Re-CPGAN) to recover authentic high-resolution (HR) face images while compensating for non-uniform illumination. To this end, we develop two key components in our Re-CPGAN: internal and recursive external Copy and Paste networks (CPnets). Our internal CPnet exploits facial self-similarity information residing in the input image to enhance facial details; while our recursive external CPnet leverages an external guided face for illumination compensation. Specifically, our recursive external CPnet stacks multiple external Copy and Paste (EX-CP) units in a compact model to learn normal illumination and enhance facial details recursively. By doing so, our method offsets illumination and upsamples facial details progressively in a coarse-to-fine fashion, thus alleviating the ambiguity of correspondences between LR inputs and external guided inputs. Furthermore, a new illumination compensation loss is developed to capture illumination from the external guided face image effectively. Extensive experiments demonstrate that our method achieves authentic HR face images in a uniform illumination condition with a 16× magnification factor and outperforms state-of-the-art methods qualitatively and quantitatively.
20
Yu Y, Smith WAP. Outdoor Inverse Rendering From a Single Image Using Multiview Self-Supervision. IEEE Trans Pattern Anal Mach Intell 2022; 44:3659-3675. PMID: 33560977. DOI: 10.1109/tpami.2021.3058105.
Abstract
In this paper we show how to perform scene-level inverse rendering to recover shape, reflectance, and lighting from a single, uncontrolled image using a fully convolutional neural network. The network takes an RGB image as input and regresses albedo, shadow, and normal maps, from which we infer least-squares optimal spherical harmonic lighting coefficients. Our network is trained using large uncontrolled multiview and timelapse image collections without ground truth. By incorporating a differentiable renderer, our network can learn from self-supervision. Since the problem is ill-posed, we introduce additional supervision. Our key insight is to perform offline multiview stereo (MVS) on images containing rich illumination variation. From the MVS pose and depth maps, we can cross-project between overlapping views such that Siamese training can be used to ensure consistent estimation of photometric invariants. MVS depth also provides direct coarse supervision for normal map estimation. We believe this is the first attempt to use MVS supervision for learning inverse rendering. In addition, we learn a statistical natural illumination prior. We evaluate performance on inverse rendering, normal map estimation, and intrinsic image decomposition benchmarks.
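The least-squares lighting step has a compact closed form under a Lambertian model: given per-pixel albedo and normals, the spherical harmonic (SH) lighting coefficients that best explain the image follow from a single linear solve. A sketch with an order-1 SH basis (4 coefficients; the paper's setup may use the usual order-2 basis with 9):

```python
import numpy as np

def sh_lighting_coeffs(image, albedo, normals):
    """Least-squares SH lighting under a Lambertian model.

    image, albedo: (N,) grayscale intensities and albedos
    normals:       (N, 3) unit surface normals
    Model: image ~= albedo * (L . [1, nx, ny, nz]), order-1 SH basis.
    """
    basis = np.concatenate([np.ones((normals.shape[0], 1)), normals], axis=1)
    design = albedo[:, None] * basis        # (N, 4) design matrix
    L, *_ = np.linalg.lstsq(design, image, rcond=None)
    return L

# Synthetic sanity check: recover a known light from noiseless renderings.
rng = np.random.default_rng(0)
n = rng.standard_normal((500, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
alb = rng.uniform(0.2, 1.0, 500)
L_true = np.array([0.6, 0.1, 0.3, 0.7])
img = alb * (np.c_[np.ones(500), n] @ L_true)
print(sh_lighting_coeffs(img, alb, n))      # ~= L_true
```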
21
Hu Z, Nsampi NE, Wang X, Wang Q. PNRNet: Physically-Inspired Neural Rendering for Any-to-Any Relighting. IEEE Trans Image Process 2022; 31:3935-3948. PMID: 35635816. DOI: 10.1109/tip.2022.3177311.
Abstract
Existing any-to-any relighting methods suffer from task-aliasing effects and the loss of local details, such as shading and attached shadows, in the image generation process. In this paper, we present PNRNet, a novel neural architecture that decomposes the any-to-any relighting task into three simpler sub-tasks, i.e., lighting estimation, color temperature transfer, and lighting direction transfer, to avoid task-aliasing effects. These sub-tasks are easy to learn and can be trained independently with direct supervision. To better preserve local shading and attached-shadow details, we propose a parallel multi-scale network that incorporates multiple physical attributes to model local illumination for lighting direction transfer. We also introduce a simple yet effective color temperature transfer network that learns a pixel-level non-linear function, which allows color temperature adjustment beyond the predefined color temperatures and generalizes well to real images. Extensive experiments demonstrate that our proposed approach achieves better results quantitatively and qualitatively than prior works.
22
Sengupta S, Lichy D, Kanazawa A, Castillo CD, Jacobs DW. SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild. IEEE Trans Pattern Anal Mach Intell 2022; 44:3272-3284. PMID: 33360981. DOI: 10.1109/tpami.2020.3046915.
Abstract
We present SfSNet, an end-to-end learning framework for producing an accurate decomposition of an unconstrained human face image into shape, reflectance, and illuminance. SfSNet is designed to reflect a physical Lambertian rendering model. SfSNet learns from a mixture of labeled synthetic and unlabeled real-world images, which allows the network to capture low-frequency variations from synthetic images and high-frequency details from real images through the photometric reconstruction loss. SfSNet consists of a new decomposition architecture with residual blocks that learns a complete separation of albedo and normal. This is used along with the original image to predict lighting. SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation. We also introduce a companion network, SfSMesh, that utilizes the normals estimated by SfSNet to reconstruct a 3D face mesh. We demonstrate that SfSMesh produces face meshes with greater accuracy than state-of-the-art methods on real-world images.
23
Su Z, Wan W, Yu T, Liu L, Fang L, Wang W, Liu Y. MulayCap: Multi-Layer Human Performance Capture Using a Monocular Video Camera. IEEE Trans Vis Comput Graph 2022; 28:1862-1879. PMID: 32991282. DOI: 10.1109/tvcg.2020.3027763.
Abstract
We introduce MulayCap, a novel human performance capture method that uses a monocular video camera without the need for pre-scanning. The method uses "multi-layer" representations for geometry reconstruction and texture rendering, respectively. For geometry reconstruction, we decompose the clothed human into multiple geometry layers, namely a body mesh layer and a garment piece layer. The key technique behind this is a Garment-from-Video (GfV) method for optimizing the garment shape and reconstructing the dynamic cloth to fit the input video sequence, based on a cloth simulation model which is effectively solved with gradient descent. For texture rendering, we decompose each input image frame into a shading layer and an albedo layer, and propose a method for fusing a fixed albedo map and solving for detailed garment geometry using the shading layer. Compared with existing single-view human performance capture systems, our "multi-layer" approach bypasses the tedious and time-consuming scanning step otherwise needed to obtain a person-specific mesh template. Experimental results demonstrate that MulayCap produces realistic renderings of dynamically changing details that have not been achieved in any previous monocular video camera system. Benefiting from its fully semantic modeling, MulayCap can be applied to various important editing applications, such as cloth editing, re-targeting, relighting, and AR applications.
24
Liu X, Zhou S, Wu S, Tan D, Yao R. 3D visualization model construction based on generative adversarial networks. PeerJ Comput Sci 2022; 8:e768. PMID: 35494873. PMCID: PMC9044199. DOI: 10.7717/peerj-cs.768.
Abstract
Computer vision technology is developing rapidly and supports automatic quality control of precision components efficiently and reliably. This paper focuses on the application of computer vision technology to manufacturing quality control. A new deep learning algorithm, Multi-angle projective Generative Adversarial Networks (MapGANs), is presented to automatically generate 3D visualization models of products and components. The generated 3D visualization models can intuitively and accurately display product parameters and indicators, and based on these indicators our model can accurately determine whether a product meets the standard. MapGANs works by automatically inferring the basic three-dimensional shape distribution through the product's projection module, while using multiple angles and multiple views to improve the fineness and accuracy of the three-dimensional visualization model. Experimental results show that MapGANs can effectively reconstruct two-dimensional images into three-dimensional visualization models and accurately predict whether product quality meets the standard.
Affiliation(s)
- Xiaojuan Liu
- College of Computer Science, Chongqing University, Chongqing, China
- Shangbo Zhou
- Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing University, Chongqing, China
- Sheng Wu
- College of Computer and Information Science, Southwest University, Chongqing, China
- Duo Tan
- College of Computer and Information Science, Southwest University, Chongqing, China
- Rui Yao
- College of Computer and Information Science, Southwest University, Chongqing, China
25
Garces E, Rodriguez-Pardo C, Casas D, Lopez-Moreno J. A Survey on Intrinsic Images: Delving Deep into Lambert and Beyond. Int J Comput Vis 2022. DOI: 10.1007/s11263-021-01563-8.
26
Qi X, Liu Z, Liao R, Torr PHS, Urtasun R, Jia J. GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement for Joint Depth and Surface Normal Estimation. IEEE Trans Pattern Anal Mach Intell 2022; 44:969-984. PMID: 32870785. DOI: 10.1109/tpami.2020.3020800.
Abstract
In this paper, we propose a geometric neural network with edge-aware refinement (GeoNet++) to jointly predict both depth and surface normal maps from a single image. Building on top of two-stream CNNs, GeoNet++ captures the geometric relationships between depth and surface normals with the proposed depth-to-normal and normal-to-depth modules. In particular, the "depth-to-normal" module exploits the least square solution of estimating surface normals from depth to improve their quality, while the "normal-to-depth" module refines the depth map based on the constraints on surface normals through kernel regression. Boundary information is exploited via an edge-aware refinement module. GeoNet++ effectively predicts depth and surface normals with high 3D consistency and sharp boundaries resulting in better reconstructed 3D scenes. Note that GeoNet++ is generic and can be used in other depth/normal prediction frameworks to improve 3D reconstruction quality and pixel-wise accuracy of depth and surface normals. Furthermore, we propose a new 3D geometric metric (3DGM) for evaluating depth prediction in 3D. In contrast to current metrics that focus on evaluating pixel-wise error/accuracy, 3DGM measures whether the predicted depth can reconstruct high quality 3D surface normals. This is a more natural metric for many 3D application domains. Our experiments on NYUD-V2 [1] and KITTI [2] datasets verify that GeoNet++ produces fine boundary details and the predicted depth can be used to reconstruct high quality 3D surfaces.
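The depth-to-normal direction of the module has a simple geometric core: under an orthographic approximation, the surface normal at a pixel is proportional to (-dz/dx, -dz/dy, 1). The sketch below uses finite differences for brevity, whereas GeoNet++ solves a least-squares fit over local neighborhoods (and accounts for camera intrinsics), which is more robust to noise.

```python
import numpy as np

def normals_from_depth(depth):
    """Surface normals from a depth map (orthographic approximation)."""
    dz_dy, dz_dx = np.gradient(depth)                       # finite differences
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# Sanity check on a tilted plane z = 0.5 * x: normal ~ (-0.447, 0, 0.894).
yy, xx = np.mgrid[0:16, 0:16].astype(float)
print(normals_from_depth(0.5 * xx)[8, 8])
```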
27
Siddique A, Lee S. Sym3DNet: Symmetric 3D Prior Network for Single-View 3D Reconstruction. Sensors 2022; 22:518. PMID: 35062479. PMCID: PMC8781397. DOI: 10.3390/s22020518.
Abstract
Three-dimensional (3D) symmetry plays a critical role in the reconstruction and recognition of 3D objects under occlusion or partial viewpoint observation, and a symmetry structure prior is particularly useful in recovering missing or unseen parts of an object. In this work, we propose Sym3DNet for single-view 3D reconstruction, which employs a 3D reflection symmetry prior of an object. More specifically, Sym3DNet includes 2D-to-3D encoder-decoder networks followed by a symmetry fusion step and a multi-level perceptual loss. The symmetry fusion step builds flipped and overlapped 3D shapes that are fed to a 3D shape encoder to calculate the multi-level perceptual loss. Perceptual loss calculated in different feature spaces accounts not only for voxel-wise shape symmetry but also for the overall global symmetry of an object. Experimental evaluations are conducted on both large-scale synthetic 3D data (ShapeNet) and real-world 3D data (Pix3D). The proposed method outperforms state-of-the-art approaches in terms of efficiency and accuracy on both synthetic and real-world datasets. To demonstrate the generalization ability of our approach, we conduct an experiment with unseen category samples of ShapeNet, also exhibiting promising reconstruction results.
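One way to picture the symmetry fusion step: mirror the predicted voxel grid across a canonical reflection plane and combine it with the original before re-encoding for the perceptual loss. The toy PyTorch sketch below makes two guesses that the abstract leaves open, using a flip of the last axis as the reflection and elementwise max as the overlap operator, so it should be read as an illustration of the idea rather than the paper's exact operator.

```python
import torch

def symmetry_fuse(voxels: torch.Tensor) -> torch.Tensor:
    """Fuse a voxel prediction with its mirror image.

    voxels: (B, D, H, W) occupancy probabilities. Flipping the W axis stands
    in for reflection about a canonical symmetry plane, and elementwise max
    stands in for the overlap operation (both are assumptions).
    """
    flipped = torch.flip(voxels, dims=[-1])
    return torch.maximum(voxels, flipped)

v = torch.rand(2, 32, 32, 32)
fused = symmetry_fuse(v)   # would be fed to the 3D shape encoder
```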
28
Mehami J, Falque R, Vidal-Calleja T, Alempijevic A. Multi-modal Non-Isotropic Light Source Modelling for Reflectance Estimation in Hyperspectral Imaging. IEEE Robot Autom Lett 2022. DOI: 10.1109/lra.2022.3192208.
Affiliation(s)
- Jasprabhjit Mehami
- UTS Robotics Institute, University of Technology Sydney, NSW 2007, Australia
- Raphael Falque
- UTS Robotics Institute, University of Technology Sydney, NSW 2007, Australia
- Alen Alempijevic
- UTS Robotics Institute, University of Technology Sydney, NSW 2007, Australia
29
Sulc A, Johannsen O, Goldluecke B. Recovery of geometry, natural illumination, and BRDF from a single light field image. J Opt Soc Am A 2022; 39:72-85. PMID: 35200984. DOI: 10.1364/josaa.433491.
Abstract
We propose an inverse rendering model for light fields to recover surface normals, depth, reflectance, and natural illumination. Our setting is fully uncalibrated, with the reflectance modeled with a spatially constant Blinn-Phong bidirectional reflectance distribution function (BRDF) and illumination as an environment map. While previous work makes strong assumptions in this difficult scenario, focusing solely on specific types of objects such as faces or imposing very strong priors, our approach leverages only the light field structure, where a solution consistent across all subaperture views is sought. The optimization is based primarily on shading, which is sensitive to fine geometric details that are propagated to the initial coarse depth map. Despite the problem being inherently ill posed, we achieve encouraging results on synthetic as well as real-world data.
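The reflectance model assumed throughout is a spatially constant Blinn-Phong BRDF, whose specular lobe peaks when the surface normal aligns with the half-vector between the light and view directions. A reference evaluation for a single light, with the diffuse weight, specular weight, and shininess playing the role of the unknowns such a method optimizes (the paper's exact parameterization may differ):

```python
import numpy as np

def blinn_phong(n, l, v, kd, ks, shininess):
    """Blinn-Phong shading for unit vectors n (normal), l (to light), v (to eye).

    Returns kd * max(n.l, 0) + ks * max(n.h, 0)^shininess for a unit-intensity
    light, where h is the half-vector between l and v.
    """
    h = (l + v) / np.linalg.norm(l + v)
    diffuse = kd * max(np.dot(n, l), 0.0)
    specular = ks * max(np.dot(n, h), 0.0) ** shininess
    return diffuse + specular

n = np.array([0.0, 0.0, 1.0])
l = np.array([0.0, 0.6, 0.8])
v = np.array([0.0, -0.6, 0.8])
print(blinn_phong(n, l, v, kd=0.7, ks=0.3, shininess=50))   # 0.86
```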
30
Zhang A, Zhao Y, Wang S. An Improved Augmented-Reality Framework for Differential Rendering Beyond the Lambertian-World Assumption. IEEE Trans Vis Comput Graph 2021; 27:4374-4386. PMID: 32746268. DOI: 10.1109/tvcg.2020.3004195.
Abstract
In augmented reality, it is important to achieve visual consistency between inserted virtual objects and the real scene. As specular and transparent objects can produce caustics, which affect the appearance of inserted virtual objects, we propose a framework for differential rendering beyond the Lambertian-world assumption. Our key idea is to jointly optimize the illumination and the parameters of specular and transparent objects. To estimate the parameters of transparent objects efficiently, a psychophysical scaling method that accounts for the visual characteristics of the human eye is introduced to obtain the step size for estimating the refractive index. We verify our technique on multiple real scenes, and the experimental results show that the fusion effects are visually consistent.
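For reference, the classic differential rendering composite that such frameworks extend: render the reconstructed local scene with and without the virtual objects, and add the rendered difference (shadows, and here caustics from specular and transparent objects) to the background photograph. This NumPy sketch shows the standard compositing equation only; the joint optimization of illumination and object parameters is the paper's contribution.

```python
import numpy as np

def differential_composite(background, with_objects, without_objects, mask):
    """Classic differential rendering composite.

    background:      photograph of the real scene, (H, W, 3) in [0, 1]
    with_objects:    rendering of the local scene plus virtual objects
    without_objects: rendering of the local scene alone
    mask:            (H, W), 1 where virtual objects are visible, else 0
    Non-object pixels receive the rendered difference (shadows, caustics).
    """
    m = mask[..., None].astype(float)
    delta = with_objects - without_objects
    return m * with_objects + (1 - m) * np.clip(background + delta, 0.0, 1.0)

# Toy usage: a virtual object that brightens its own pixels and darkens others.
bg = np.full((4, 4, 3), 0.5)
rn = np.full((4, 4, 3), 0.5)            # local scene rendered without objects
ro = rn - 0.1                           # object casts a soft shadow everywhere
ro[1:3, 1:3] = 0.8                      # the object's own pixels
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1
out = differential_composite(bg, ro, rn, mask)
```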
31
Liang B, Weng D, Tu Z, Luo L, Hao J. Research on face specular removal and intrinsic decomposition based on polarization characteristics. Opt Express 2021; 29:32256-32270. PMID: 34615301. DOI: 10.1364/oe.440778.
Abstract
It is well known that the specular component in a face image destroys the true information of the original image and is detrimental to feature extraction and subsequent processing. However, in many face image processing tasks based on deep learning methods, the lack of effective datasets and methods has led researchers to routinely neglect the specular removal step. To address this problem, we first built the first high-resolution Asian Face Specular-Diffuse-Image-Material (FaceSDIM) dataset based on polarization characteristics, which consists of real human face specular images, diffuse images, and various corresponding material maps. Second, we propose a joint specular removal and intrinsic decomposition multi-task GAN to generate a de-specular image, normal map, albedo map, residue map, and visibility map from a single face image, and we further verify that the predicted de-specular images have a positive enhancement effect on face intrinsic decomposition. Compared with state-of-the-art algorithms, our method achieves the best performance on both corrected linear images and uncorrected in-the-wild face images.
32
Pizlo Z, de Barros JA. The Concept of Symmetry and the Theory of Perception. Front Comput Neurosci 2021; 15:681162. PMID: 34497499. PMCID: PMC8419223. DOI: 10.3389/fncom.2021.681162.
Abstract
Perceptual constancy refers to the fact that the perceived geometrical and physical characteristics of objects remain constant despite transformations of the objects, such as rigid motion. Perceptual constancy is essential in everything we do, such as recognizing familiar objects and scenes, planning and executing visual navigation, and visuomotor coordination. Perceptual constancy would not exist without the geometrical and physical permanence of objects: their shape, size, and weight. Formally, perceptual constancy and the permanence of objects are invariants, also known in mathematics and physics as symmetries. Symmetries of the laws of physics acquired a central status through the mathematical theorems that Emmy Noether formulated and proved over 100 years ago. These theorems connected the symmetries of physical laws to conservation laws through the least-action principle. We show how Noether's theorem applies to mirror-symmetrical objects and establishes mental shape representation (perceptual conservation) through the application of a simplicity (least-action) principle. This way, the formalism of Noether's theorem provides a computational explanation of the relation between the physical world and its mental representation.
Affiliation(s)
- Zygmunt Pizlo
- Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, United States
- J Acacio de Barros
- School of Humanities and Liberal Studies, San Francisco State University, San Francisco, CA, United States
33
Ishihara S, Sulc A, Sato I. Depth estimation using spectrally varying defocus blur. J Opt Soc Am A 2021; 38:1140-1149. PMID: 34613308. DOI: 10.1364/josaa.422059.
Abstract
This paper proposes a method to estimate depth from a single multispectral image by using a lens property known as chromatic aberration. Chromatic aberration causes light passing through a lens to be refracted by an amount that depends on wavelength. As a result, the angle of the rays, and hence the effective focal length, varies with wavelength, which produces a wavelength-dependent defocus blur. We propose a theory to recover a continuous depth map from the blur in a single multispectral image that includes chromatic aberration. The proposed method needs only a standard wide-aperture lens, which naturally exhibits chromatic aberration, and a multispectral camera. Moreover, we use a simple yet effective depth-of-field synthesis method to obtain the all-in-focus images needed to approximate spectral derivatives. We verified the effectiveness of the proposed method on various real-world scenes.
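A hedged sketch of the underlying geometry: with a thin-lens model, each wavelength has its own focal length, so a pair of measured per-channel blurs constrains depth. Everything here (the function names, the brute-force search, the calibration constants) is an illustrative assumption, not the paper's estimator.

```python
import numpy as np

def blur_radius(depth, focal_len, aperture, sensor_dist):
    """Thin-lens blur-circle radius for a point at `depth` (metres).
    Under chromatic aberration, `focal_len` varies with wavelength."""
    z_image = 1.0 / (1.0 / focal_len - 1.0 / depth)  # image distance for this depth
    return aperture * abs(sensor_dist - z_image) / z_image

def depth_from_two_channels(sigma1, sigma2, f1, f2, aperture, sensor_dist,
                            depths=np.linspace(0.3, 5.0, 500)):
    """Brute-force the depth whose predicted per-wavelength blurs best
    match the blurs measured in two spectral channels at one pixel."""
    pred1 = np.array([blur_radius(d, f1, aperture, sensor_dist) for d in depths])
    pred2 = np.array([blur_radius(d, f2, aperture, sensor_dist) for d in depths])
    err = (pred1 - sigma1) ** 2 + (pred2 - sigma2) ** 2
    return depths[int(np.argmin(err))]
```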
Collapse
|
34
|
Murray RF. Lightness perception in complex scenes. Annu Rev Vis Sci 2021; 7.
Abstract
Lightness perception is the perception of achromatic surface colors: black, white, and shades of grey. Lightness has long been a central research topic in experimental psychology, as perceiving surface color is an important visual task but also a difficult one due to the deep ambiguity of retinal images. In this article, I review psychophysical work on lightness perception in complex scenes over the past 20 years, with an emphasis on work that supports the development of computational models. I discuss Bayesian models, equivalent illumination models, multidimensional scaling, anchoring theory, spatial filtering models, natural scene statistics, and related work in computer vision. I review open topics in lightness perception that seem ready for progress, including the relationship between lightness and brightness, and developing more sophisticated computational models of lightness in complex scenes. Expected final online publication date for the Annual Review of Vision Science, Volume 7 is September 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Collapse
Affiliation(s)
- Richard F Murray
- Department of Psychology and Centre for Vision Research, York University, Toronto M3J 1P3, Canada;
| |
Collapse
|
35
|
Baslamisli AS, Gevers T. Invariant descriptors for intrinsic reflectance optimization. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA. A, OPTICS, IMAGE SCIENCE, AND VISION 2021; 38:887-896. [PMID: 34143158 DOI: 10.1364/josaa.414682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Accepted: 05/11/2021] [Indexed: 06/12/2023]
Abstract
Intrinsic image decomposition aims to factorize an image into albedo (reflectance) and shading (illumination) sub-components. Being ill posed and under-constrained, it is a very challenging computer vision problem. There are infinite pairs of reflectance and shading images that can reconstruct the same input. To address the problem, Intrinsic Images in the Wild by Bell et al. provides an optimization framework based on a dense conditional random field (CRF) formulation that considers long-range material relations. We improve upon their model by introducing illumination invariant image descriptors: color ratios. The color ratios and the intrinsic reflectance are both invariant to illumination and thus are highly correlated. Through detailed experiments, we provide ways to inject the color ratios into the dense CRF optimization. Our approach is physics based and learning free and leads to more accurate and robust reflectance decompositions.
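The invariance the descriptors rely on fits in two lines of numpy: when illumination varies slowly, neighboring pixels share approximately the same shading, so per-channel ratios cancel it. A hedged sketch, with our own choice of neighborhood and log form:

```python
import numpy as np

def color_ratios(img: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Per-channel log-ratios between horizontally adjacent pixels.

    Under a Lambertian model with slowly varying illumination, shading
    is locally shared by neighbours, so these ratios depend
    (approximately) only on reflectance, which is the illumination
    invariance the paper exploits. img is an (H, W, 3) float array."""
    left, right = img[:, :-1, :], img[:, 1:, :]
    return np.log((left + eps) / (right + eps))
```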
Collapse
|
36
|
Hold-Geoffroy Y, Gotardo P, Lalonde JF. Single Day Outdoor Photometric Stereo. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:2062-2074. [PMID: 31899414 DOI: 10.1109/tpami.2019.2962693] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Photometric Stereo (PS) under outdoor illumination remains a challenging, ill-posed problem due to insufficient variability in illumination. Months-long capture sessions are typically used in this setup, with little success on shorter, single-day time intervals. In this paper, we investigate the solution of outdoor PS over a single day, under different weather conditions. First, we investigate the relationship between weather and surface reconstructability in order to understand when natural lighting allows existing PS algorithms to work. Our analysis reveals that partially cloudy days improve the conditioning of the outdoor PS problem while sunny days do not allow the unambiguous recovery of surface normals from photometric cues alone. We demonstrate that calibrated PS algorithms can thus be employed to reconstruct Lambertian surfaces accurately under partially cloudy days. Second, we solve the ambiguity arising in clear days by combining photometric cues with prior knowledge on material properties, local surface geometry and the natural variations in outdoor lighting through a CNN-based, weakly-calibrated PS technique. Given a sequence of outdoor images captured during a single sunny day, our method robustly estimates the scene surface normals with unprecedented quality for the considered scenario. Our approach does not require precise geolocation and significantly outperforms several state-of-the-art methods on images with real lighting, showing that our CNN can combine efficiently learned priors and photometric cues available during a single sunny day.
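When the lighting is sufficiently varied (as on a partially cloudy day) and calibrated, Lambertian PS reduces to one linear solve per pixel. A standard least-squares sketch of that calibrated baseline (not the paper's CNN-based, weakly-calibrated method):

```python
import numpy as np

def lambertian_ps(intensities: np.ndarray, lights: np.ndarray):
    """Calibrated photometric stereo for a Lambertian surface (sketch).

    intensities: (num_images, num_pixels) stacked observations
    lights:      (num_images, 3) known directional lights, e.g., sampled
                 from a sky model over one day
    Solves I = L @ (albedo * normal) per pixel in one least-squares step.
    """
    g, *_ = np.linalg.lstsq(lights, intensities, rcond=None)  # (3, num_pixels)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-8)
    return normals.T, albedo
```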
Collapse
|
37
|
Baslamisli AS, Das P, Le HA, Karaoglu S, Gevers T. ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition. Int J Comput Vis 2021. [DOI: 10.1007/s11263-021-01477-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
In general, intrinsic image decomposition algorithms interpret shading as one unified component that includes all photometric effects. Because shading transitions are generally smoother than reflectance (albedo) changes, these methods may fail to distinguish strong photometric effects from reflectance variations. Therefore, in this paper, we propose to decompose the shading component into direct shading (illumination) and indirect shading (ambient light and shadows) subcomponents. The aim is to distinguish strong photometric effects from reflectance variations. An end-to-end deep convolutional neural network (ShadingNet) is proposed that operates in a fine-to-coarse manner with a specialized fusion and refinement unit exploiting the fine-grained shading model. It is designed to learn specific reflectance cues separated from specific photometric effects to analyze the disentanglement capability. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with fine-grained intrinsic image ground truths. Large-scale experiments show that our approach using fine-grained shading decompositions outperforms state-of-the-art algorithms utilizing unified shading on the NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD datasets.
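For intuition, the fine-grained model can be written as albedo composed with a shading term split into shadow-modulated direct light plus indirect ambient light. This tiny sketch is our simplification of the forward direction that the decomposition inverts; the paper's model is learned, not closed-form:

```python
import numpy as np

def compose(albedo, direct, ambient, shadow):
    """Assumed fine-grained image formation, for illustration only:
    `shadow` is a visibility map in [0, 1] modulating direct light,
    `ambient` is indirect illumination. All arrays share one shape."""
    shading = shadow * direct + ambient
    return albedo * shading
```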
Collapse
|
38
|
Colonoscopic 3D reconstruction by tubular non-rigid structure-from-motion. Int J Comput Assist Radiol Surg 2021; 16:1237-1241. [PMID: 34031817 DOI: 10.1007/s11548-021-02409-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Accepted: 05/11/2021] [Indexed: 10/21/2022]
Abstract
PURPOSE Visual examination of colonoscopic images fails to extract precise geometric information about the colonic surface. Reconstructing the 3D surface of the colon from colonoscopic image sequences may thus add valuable clinical information. We address the problem of extracting precise spatio-temporal 3D structure information from colonoscopic images. METHODS Using just the intrinsically calibrated monocular image stream, we develop a technique to compute the depth of feature points that have been tracked across images. Our method uses prior knowledge of an approximate geometry of the colon, referred to as the TTP. It works by fitting a deformable cylindrical model to points reconstructed independently by non-rigid structure-from-motion (NRSfM), compromising between the data term and a novel tubular smoothing prior. Our method is the first to exploit a very weak topological prior to improve NRSfM. As such, it lies in between standard NRSfM, which does not use a topological prior beyond the mere plane, and shape-from-template (SfT), which uses a very strong prior in the form of a full deformable 3D object model. RESULTS We validate our method on both synthetic images of tubular structures and real colonoscopic data. Our method improves the results obtained by existing NRSfM methods by 71.74% on average on synthetic data and succeeds in obtaining a 3D reconstruction from a real colonoscopic sequence where the existing methods fail. CONCLUSION Colonoscopic 3D reconstruction is a difficult problem that existing computer vision methods have yet to resolve. Our dedicated NRSfM method and experiments show that visual motion may be the right visual cue to use in colonoscopy.
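The tubular prior can be illustrated by its simplest possible instance: fitting a straight cylinder to NRSfM point estimates. This toy stand-in is ours; the paper's prior allows a curved, deforming tube and is balanced against an NRSfM data term.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_tube(points: np.ndarray, radius0: float = 1.0):
    """Fit a straight cylinder to (N, 3) points as a toy stand-in for a
    tubular prior. Parameters: a point on the axis (3), the axis
    direction (3, normalized inside the residual), and the radius (1)."""
    def residuals(p):
        c, d, r = p[:3], p[3:6], p[6]
        d = d / np.linalg.norm(d)
        rel = points - c
        radial = rel - np.outer(rel @ d, d)   # component orthogonal to the axis
        return np.linalg.norm(radial, axis=1) - r
    p0 = np.concatenate([points.mean(axis=0), [0.0, 0.0, 1.0], [radius0]])
    return least_squares(residuals, p0).x
```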
Collapse
|
39
|
Huang B, Ling H. DeProCams: Simultaneous Relighting, Compensation and Shape Reconstruction for Projector-Camera Systems. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:2725-2735. [PMID: 33750703 DOI: 10.1109/tvcg.2021.3067771] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Image-based relighting, projector compensation, and depth/normal reconstruction are three important tasks of projector-camera systems (ProCams) and spatial augmented reality (SAR). Although they share a similar pipeline of finding projector-camera image mappings, they have traditionally been addressed independently, sometimes with different prerequisites, devices, and sampling images. In practice, addressing them one-by-one may be cumbersome for SAR applications. In this paper, we propose a novel end-to-end trainable model named DeProCams that explicitly learns the photometric and geometric mappings of ProCams; once trained, DeProCams can be applied simultaneously to the three tasks. DeProCams explicitly decomposes the projector-camera image mappings into three subprocesses: shading attribute estimation, rough direct light estimation, and photorealistic neural rendering. A particular challenge addressed by DeProCams is occlusion, for which we exploit the epipolar constraint and propose a novel differentiable projector direct light mask that can be learned end-to-end along with the other modules. Afterwards, to improve convergence, we apply photometric and geometric constraints such that the intermediate results are plausible. In our experiments, DeProCams shows clear advantages over previous methods, with promising quality while being fully differentiable. Moreover, by solving the three tasks in a unified model, DeProCams waives the need for additional optical devices, radiometric calibrations, and structured light.
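As a point of reference for what a photometric ProCams mapping means, here is the classical per-pixel affine light-transport fit that DeProCams-style models generalize. The function and the affine model are our illustrative assumptions, not the paper's network:

```python
import numpy as np

def fit_pixel_response(projected: np.ndarray, captured: np.ndarray):
    """Per-pixel affine model captured ~= a * projected + b, fit from a
    few grayscale sampling images (n, H, W). A drastically simplified
    stand-in for a learned photometric mapping: no geometry, no
    occlusion handling, no inter-pixel light transport."""
    n = projected.shape[0]
    x = projected.reshape(n, -1).astype(np.float64)
    y = captured.reshape(n, -1).astype(np.float64)
    xm, ym = x.mean(0), y.mean(0)
    a = ((x - xm) * (y - ym)).sum(0) / np.maximum(((x - xm) ** 2).sum(0), 1e-8)
    b = ym - a * xm
    return a, b  # compensation: project (target - b) / a, clipped to [0, 1]
```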
Collapse
|
40
|
A Systematic Comparison of Depth Map Representations for Face Recognition. SENSORS 2021; 21:s21030944. [PMID: 33572608 PMCID: PMC7867027 DOI: 10.3390/s21030944] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Revised: 01/22/2021] [Accepted: 01/26/2021] [Indexed: 11/17/2022]
Abstract
Nowadays, we are witnessing the wide diffusion of active depth sensors. However, the generalization capabilities and performance of the deep face recognition approaches that are based on depth data are hindered by the different sensor technologies and the currently available depth-based datasets, which are limited in size and acquired through the same device. In this paper, we present an analysis on the use of depth maps, as obtained by active depth sensors and deep neural architectures for the face recognition task. We compare different depth data representations (depth and normal images, voxels, point clouds), deep models (two-dimensional and three-dimensional Convolutional Neural Networks, PointNet-based networks), and pre-processing and normalization techniques in order to determine the configuration that maximizes the recognition accuracy and is capable of generalizing better on unseen data and novel acquisition settings. Extensive intra- and cross-dataset experiments, which were performed on four public databases, suggest that representations and methods that are based on normal images and point clouds perform and generalize better than other 2D and 3D alternatives. Moreover, we propose a novel challenging dataset, namely MultiSFace, in order to specifically analyze the influence of the depth map quality and the acquisition distance on the face recognition accuracy.
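Among the compared representations, normal images are derived directly from depth maps. A minimal sketch of that conversion, using finite differences under an orthographic assumption (the study's exact pre-processing may differ):

```python
import numpy as np

def depth_to_normals(depth: np.ndarray) -> np.ndarray:
    """Convert an (H, W) depth map to a normal image via finite
    differences. Output is (H, W, 3) with unit normals mapped to
    [0, 1] so the result can be fed to an image-based network."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    n = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return 0.5 * (n + 1.0)
```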
Collapse
|
41
|
Frady EP, Kent SJ, Olshausen BA, Sommer FT. Resonator Networks, 1: An Efficient Solution for Factoring High-Dimensional, Distributed Representations of Data Structures. Neural Comput 2020; 32:2311-2331. [PMID: 33080162 DOI: 10.1162/neco_a_01331] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
The ability to encode and manipulate data structures with distributed neural representations could qualitatively enhance the capabilities of traditional neural networks by supporting rule-based symbolic reasoning, a central property of cognition. Here we show how this may be accomplished within the framework of Vector Symbolic Architectures (VSAs) (Plate, 1991; Gayler, 1998; Kanerva, 1996), whereby data structures are encoded by combining high-dimensional vectors with operations that together form an algebra on the space of distributed representations. In particular, we propose an efficient solution to a hard combinatorial search problem that arises when decoding elements of a VSA data structure: the factorization of products of multiple codevectors. Our proposed algorithm, called a resonator network, is a new type of recurrent neural network that interleaves VSA multiplication operations and pattern completion. We show in two examples, parsing of a tree-like data structure and parsing of a visual scene, how the factorization problem arises and how the resonator network can solve it. More broadly, resonator networks open the possibility of applying VSAs to myriad artificial intelligence problems in real-world domains. The companion article in this issue (Kent, Frady, Sommer, & Olshausen, 2020) presents a rigorous analysis and evaluation of the performance of resonator networks, showing that it outperforms alternative approaches.
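A resonator network is concrete enough to sketch in a few lines for bipolar (plus/minus 1) vectors, where binding and unbinding are both the elementwise product. This is our minimal reading of the published dynamics; the variable names and the toy usage are ours:

```python
import numpy as np

def resonator(s, codebooks, iters=100):
    """Factorize s = x1 * x2 * ... (elementwise) over given codebooks.

    codebooks: list of (num_codes, dim) arrays of +/-1 codevectors.
    Each factor estimate is updated by unbinding the other estimates
    from s, pattern-completing against its codebook, and re-binarizing.
    Estimates start as the (binarized) superposition of each codebook."""
    est = [np.where(cb.sum(axis=0) >= 0, 1, -1) for cb in codebooks]
    for _ in range(iters):
        for i, cb in enumerate(codebooks):
            others = np.ones_like(s)
            for j, e in enumerate(est):
                if j != i:
                    others = others * e
            unbound = s * others              # divide out the other factors
            proj = cb.T @ (cb @ unbound)      # completion via outer-product memory
            est[i] = np.where(proj >= 0, 1, -1)
    return est

# Toy usage: three factors, 20 codevectors each, dimension 1000.
rng = np.random.default_rng(0)
books = [rng.choice([-1, 1], size=(20, 1000)) for _ in range(3)]
truth = [b[rng.integers(20)] for b in books]
s = truth[0] * truth[1] * truth[2]
estimates = resonator(s, books)
```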
Collapse
Affiliation(s)
- E Paxon Frady
- Redwood Center for Theoretical Neuroscience, University of California, Berkeley, Berkeley, CA 94720, U.S.A., and Intel Laboratories, Neuromorphic Computing Lab, San Francisco, CA, 94111, U.S.A.
| | - Spencer J Kent
- Redwood Center for Theoretical Neuroscience and Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, U.S.A.
| | - Bruno A Olshausen
- Redwood Center for Theoretical Neuroscience, Helen Wills Neuroscience Institute, and School of Optometry, University of California, Berkeley, Berkeley, CA 94720, U.S.A.
| | - Friedrich T Sommer
- Redwood Center for Theoretical Neuroscience and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, U.S.A., and Intel Laboratories, Neuromorphic Computing Lab, San Francisco, CA 94111, U.S.A.
| |
Collapse
|
42
|
Luo J, Huang Z, Li Y, Zhou X, Zhang G, Bao H. NIID-Net: Adapting Surface Normal Knowledge for Intrinsic Image Decomposition in Indoor Scenes. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:3434-3445. [PMID: 32941141 DOI: 10.1109/tvcg.2020.3023565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Intrinsic image decomposition, i.e., decomposing a natural image into a reflectance image and a shading image, is used in many augmented reality applications for achieving better visual coherence between virtual contents and real scenes. The main challenge is that the decomposition is ill-posed, especially in indoor scenes where lighting conditions are complicated, while real training data is inadequate. To solve this challenge, we propose NIID-Net, a novel learning-based framework that adapts surface normal knowledge for improving the decomposition. The knowledge learned from relatively more abundant data for surface normal estimation is integrated into intrinsic image decomposition in two novel ways. First, normal feature adapters are proposed to incorporate scene geometry features when decomposing the image. Second, a map of integrated lighting is proposed for propagating object contour and planarity information during shading rendering. Furthermore, this map is capable of representing spatially-varying lighting conditions indoors. Experiments show that NIID-Net achieves competitive performance in reflectance estimation and outperforms all previous methods in shading estimation quantitatively and qualitatively. The source code of our implementation is released at https://github.com/zju3dv/NIID-Net.
Collapse
|
43
|
Torrents-Barrena J, Piella G, Valenzuela-Alcaraz B, Gratacos E, Eixarch E, Ceresa M, Gonzalez Ballester MA. TTTS-STgan: Stacked Generative Adversarial Networks for TTTS Fetal Surgery Planning Based on 3D Ultrasound. IEEE TRANSACTIONS ON MEDICAL IMAGING 2020; 39:3595-3606. [PMID: 32746107 DOI: 10.1109/tmi.2020.3001028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Twin-to-twin transfusion syndrome (TTTS) is characterized by an unbalanced blood transfer through placental abnormal vascular connections. Prenatal ultrasound (US) is the imaging technique to monitor monochorionic pregnancies and diagnose TTTS. Fetoscopic laser photocoagulation is an elective treatment to coagulate placental communications between both twins. To locate the anomalous connections ahead of surgery, preoperative planning is crucial. In this context, we propose a novel multi-task stacked generative adversarial framework to jointly learn synthetic fetal US generation, multi-class segmentation of the placenta, its inner acoustic shadows and peripheral vasculature, and placenta shadowing removal. Specifically, the designed architecture is able to learn anatomical relationships and global US image characteristics. In addition, we also extract for the first time the umbilical cord insertion on the placenta surface from 3D HD-flow US images. The database consisted of 70 US volumes including singleton, mono- and dichorionic twins at 17-37 gestational weeks. Our experiments show that 71.8% of the synthesized US slices were categorized as realistic by clinicians, and that the multi-class segmentation achieved Dice scores of 0.82 ± 0.13, 0.71 ± 0.09, and 0.72 ± 0.09, for placenta, acoustic shadows, and vasculature, respectively. Moreover, fetal surgeons classified 70.2% of our completed placenta shadows as satisfactory texture reconstructions. The umbilical cord was successfully detected on 85.45% of the volumes. The framework developed could be implemented in a TTTS fetal surgery planning software to improve the intrauterine scene understanding and facilitate the location of the optimum fetoscope entry point.
Collapse
|
44
|
Hu X, Fu CW, Zhu L, Qin J, Heng PA. Direction-Aware Spatial Context Features for Shadow Detection and Removal. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:2795-2808. [PMID: 31150337 DOI: 10.1109/tpami.2019.2919616] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Shadow detection and shadow removal are fundamental and challenging tasks, requiring an understanding of the global image semantics. This paper presents a novel deep neural network design for shadow detection and removal by analyzing the spatial image context in a direction-aware manner. To achieve this, we first formulate the direction-aware attention mechanism in a spatial recurrent neural network (RNN) by introducing attention weights when aggregating spatial context features in the RNN. By learning these weights through training, we can recover direction-aware spatial context (DSC) for detecting and removing shadows. This design is developed into the DSC module and embedded in a convolutional neural network (CNN) to learn the DSC features at different levels. Moreover, we design a weighted cross-entropy loss to make the training for shadow detection effective, and further adapt the network for shadow removal by using a Euclidean loss function and formulating a color transfer function to address the color and luminosity inconsistencies in the training pairs. We employed two shadow detection benchmark datasets and two shadow removal benchmark datasets, and performed various experiments to evaluate our method. Experimental results show that our method performs favorably against the state-of-the-art methods for both shadow detection and shadow removal.
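To make the direction-aware aggregation concrete, here is one left-to-right pass of a spatial-RNN-style sweep in which an attention map gates how much context each position admits. This is our simplified sketch of the mechanism; the full DSC module runs four directions over two rounds and learns the weights, which are passed in here:

```python
import numpy as np

def dsc_aggregate(feat: np.ndarray, attn: np.ndarray, w: float = 0.5):
    """One left-to-right pass of a direction-aware spatial RNN (sketch).

    feat: (H, W, C) feature map; attn: (H, W) attention weights gating
    the context carried into each position; w: a shared recurrence
    weight standing in for the learned transition."""
    out = feat.astype(np.float64).copy()
    _, W, _ = feat.shape
    for x in range(1, W):
        carried = np.maximum(out[:, x - 1, :], 0.0)          # ReLU'd context
        out[:, x, :] = feat[:, x, :] + attn[:, x, None] * w * carried
    return out
```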
Collapse
|
46
|
Haefner B, Peng S, Verma A, Queau Y, Cremers D. Photometric Depth Super-Resolution. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:2453-2464. [PMID: 31226068 DOI: 10.1109/tpami.2019.2923621] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
This study explores the use of photometric techniques (shape-from-shading and uncalibrated photometric stereo) for upsampling the low-resolution depth map from an RGB-D sensor to the higher resolution of the companion RGB image. A single-shot variational approach is first put forward, which is effective as long as the target's reflectance is piecewise constant. It is then shown that this dependency upon a specific reflectance model can be relaxed by focusing on a specific class of objects (e.g., faces) and delegating reflectance estimation to a deep neural network. A multi-shot strategy based on randomly varying lighting conditions is eventually discussed. It requires no training or prior on the reflectance, yet this comes at the price of a dedicated acquisition setup. Both quantitative and qualitative evaluations illustrate the effectiveness of the proposed methods on synthetic and real-world scenarios.
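The photometric data term shared by these approaches is simple to state: render Lambertian shading from the normals implied by a candidate depth map and compare it with the observed image. A hedged sketch of that residual (our function and light model; the paper embeds such a term in a variational solver with additional priors):

```python
import numpy as np

def shading_residual(z, img, light, albedo):
    """Photometric residual for shading-based depth refinement (sketch).

    z: (H, W) candidate depth; img: (H, W) grayscale observation;
    light: (3,) directional light; albedo: (H, W) or scalar reflectance.
    Normals come from finite differences of z (orthographic assumption)."""
    dzdx = np.gradient(z, axis=1)
    dzdy = np.gradient(z, axis=0)
    n = np.dstack([-dzdx, -dzdy, np.ones_like(z)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    shading = np.clip(n @ light, 0.0, None)
    return img - albedo * shading
```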
Collapse
|
47
|
Zuo X, Wang S, Zheng J, Pan Z, Yang R. Detailed Surface Geometry and Albedo Recovery from RGB-D Video under Natural Illumination. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:2720-2734. [PMID: 31765304 DOI: 10.1109/tpami.2019.2955459] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
This article presents a novel approach for depth map enhancement from an RGB-D video sequence. The basic idea is to exploit the photometric information in the color sequence to resolve the inherent ambiguity of the shape-from-shading problem. Instead of making any assumption about surface albedo or requiring controlled object motion and lighting, we use the lighting variations introduced by casual object movement. We are effectively calculating photometric stereo from a moving object under natural illumination. One of the key technical challenges is to establish correspondences over the entire image set. We therefore develop a lighting-insensitive robust pixel matching technique that outperforms optical flow methods in the presence of lighting variations. An adaptive reference frame selection procedure is introduced to increase robustness to imperfect Lambertian reflections. In addition, we present an expectation-maximization framework to recover the surface normal and albedo simultaneously, without any regularization term. We have validated our method on both synthetic and real datasets to show its superior performance on both surface detail recovery and intrinsic decomposition.
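The joint normal-and-albedo recovery can be previewed with a toy alternating scheme at a single (aligned) pixel observed under varying lighting. This is our stand-in for the paper's EM formulation, with assumed shapes and names:

```python
import numpy as np

def alternate_normal_albedo(I, L, iters=10):
    """Alternately refine albedo and normal for one tracked pixel.

    I: (num_frames,) observed intensities; L: (num_frames, 3) per-frame
    lighting directions. Fixing the normal gives a closed-form albedo;
    fixing the albedo gives a least-squares normal."""
    n = np.array([0.0, 0.0, 1.0])
    rho = 1.0
    for _ in range(iters):
        shading = L @ n
        rho = float(I @ shading) / max(float(shading @ shading), 1e-8)
        g, *_ = np.linalg.lstsq(rho * L, I, rcond=None)
        n = g / max(np.linalg.norm(g), 1e-8)
    return n, rho
```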
Collapse
|
48
|
Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images. Int J Comput Vis 2020. [DOI: 10.1007/s11263-020-01347-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
49
|
Zhou M, Ding Y, Ji Y, Young SS, Yu J, Ye J. Shape and Reflectance Reconstruction Using Concentric Multi-Spectral Light Field. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:1594-1605. [PMID: 32305895 DOI: 10.1109/tpami.2020.2986764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Recovering the shape and reflectance of non-Lambertian surfaces remains a challenging problem in computer vision, since the view-dependent appearance invalidates the traditional photo-consistency constraint. In this paper, we introduce a novel concentric multi-spectral light field (CMSLF) design that is able to recover the shape and reflectance of surfaces of various materials in one shot. Our CMSLF system consists of an array of cameras arranged on concentric circles, where each ring captures a specific spectrum. Coupled with a multi-spectral ring light, we are able to sample viewpoint and lighting variations in a single shot via spectral multiplexing. We further show that our concentric camera and light source setting results in a unique single-peak pattern in specularity variations across viewpoints. This property enables robust depth estimation for specular points. To estimate depth and the multi-spectral reflectance map, we formulate a physics-based reflectance model for the CMSLF under the surface camera (S-Cam) representation. Extensive synthetic and real experiments show that our method outperforms state-of-the-art shape reconstruction methods, especially for non-Lambertian surfaces.
Collapse
|
50
|
Murray RF. A model of lightness perception guided by probabilistic assumptions about lighting and reflectance. J Vis 2020; 20:28. [PMID: 32725175 PMCID: PMC7424934 DOI: 10.1167/jov.20.7.28] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Lightness perception is the ability to perceive black, white, and gray surface colors in a wide range of lighting conditions and contexts. This ability is fundamental for any biological or artificial visual system, but it poses a difficult computational problem, and how the human visual system computes lightness is not well understood. Here I show that several key phenomena in lightness perception can be explained by a probabilistic graphical model that makes a few simple assumptions about local patterns of lighting and reflectance, and infers globally optimal interpretations of stimulus images. Like human observers, the model exhibits partial lightness constancy, codetermination, contrast, glow, and articulation effects. It also arrives at human-like interpretations of strong lightness illusions that have challenged previous models. The model's assumptions are reasonable and generic, including, for example, that lighting intensity spans a much wider range than surface reflectance and that shadow boundaries tend to be straighter than reflectance edges. Thus, a probabilistic model based on simple assumptions about lighting and reflectance gives a good computational account of lightness perception over a wide range of conditions. This work also shows how graphical models can be extended to develop more powerful models of constancy that incorporate features such as color and depth.
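The flavor of the model's generic assumptions can be conveyed by a toy Bayesian choice between "lighting" and "reflectance" for a single edge, using two of the priors named in the abstract. All numbers and names below are illustrative, not the paper's parameters:

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def classify_edge(log_ratio: float, straightness: float) -> str:
    """Toy Bayesian edge classification (sketch, invented parameters).

    Assumptions mirrored from the abstract: lighting spans a wider
    intensity range than reflectance, so the log intensity ratio has a
    broader likelihood under 'lighting'; and shadow borders tend to be
    straighter, so straightness (in [0, 1]) favours 'lighting'."""
    p_ratio_light = norm_pdf(log_ratio, 0.0, 2.0)   # broad: strong edges plausible
    p_ratio_refl = norm_pdf(log_ratio, 0.0, 0.5)    # narrow: mild edges expected
    p_straight_light = straightness
    p_straight_refl = 1.0 - straightness
    post_light = p_ratio_light * p_straight_light
    post_refl = p_ratio_refl * p_straight_refl
    return "lighting" if post_light > post_refl else "reflectance"
```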
Collapse
Affiliation(s)
- Richard F Murray
- Department of Psychology and Centre for Vision Research, York University, Toronto, Ontario, Canada
| |
Collapse
|