1. Zohar E, Kozak S, Abeles D, Shahar M, Censor N. Convolutional neural networks uncover the dynamics of human visual memory representations over time. Cereb Cortex 2024; 34:bhae447. PMID: 39530747. DOI: 10.1093/cercor/bhae447.
Abstract
The ability to accurately retrieve visual details of past events is a fundamental cognitive function relevant for daily life. While a visual stimulus contains an abundance of information, only some of it is later encoded into long-term memory representations. However, an ongoing challenge has been to isolate memory representations that integrate various visual features and to uncover their dynamics over time. To address this question, we leveraged a novel combination of empirical and computational frameworks based on the hierarchical structure of convolutional neural networks and their correspondence to human visual processing. This enabled us to reveal the contribution of different levels of visual representation to memory strength and their dynamics over time. Visual memory strength was measured with distractors selected for their similarity to the target memory along low or high layers of the convolutional neural network hierarchy. The results show that visual working memory relies similarly on low- and high-level visual representations. However, within a few minutes, and continuing to the next day, visual memory comes to rely more strongly on high-level visual representations. These findings suggest that visual representations transform from a distributed to a stronger high-level conceptual representation, providing novel insights into the dynamics of visual memory over time.
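As an illustration of the distractor-selection approach described above, a minimal sketch follows (Python/PyTorch; VGG-16 and the specific layer indices are assumptions for illustration, not the study's actual backbone or layers):

```python
# Hypothetical sketch: ranking candidate distractors by their similarity to a
# target image along low- vs. high-level layers of a pretrained CNN.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
LOW_LAYER, HIGH_LAYER = 4, 28  # early vs. late conv stage (assumed indices)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def layer_features(img: Image.Image, layer_idx: int) -> torch.Tensor:
    """Run the image through the network up to layer_idx and flatten."""
    x = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        for i, module in enumerate(vgg):
            x = module(x)
            if i == layer_idx:
                break
    return x.flatten()

def similarity(a: Image.Image, b: Image.Image, layer_idx: int) -> float:
    """Cosine similarity between two images at a given layer."""
    fa, fb = layer_features(a, layer_idx), layer_features(b, layer_idx)
    return torch.nn.functional.cosine_similarity(fa, fb, dim=0).item()

# Distractors similar at LOW_LAYER probe low-level visual detail in memory;
# distractors similar at HIGH_LAYER probe conceptual, high-level content:
# low_rank  = sorted(candidates, key=lambda c: -similarity(target, c, LOW_LAYER))
# high_rank = sorted(candidates, key=lambda c: -similarity(target, c, HIGH_LAYER))
```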
Affiliation(s)
- Eden Zohar: Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
- Stas Kozak: School of Psychological Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Dekel Abeles: School of Psychological Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Moni Shahar: The Center for Artificial Intelligence and Data Science (TAD), Tel Aviv University, Tel Aviv 6997801, Israel
- Nitzan Censor: Sagol School of Neuroscience and School of Psychological Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
2. Tominaga S, Horiuchi T. High Dynamic Range Image Reconstruction from Saturated Images of Metallic Objects. J Imaging 2024; 10:92. PMID: 38667990. PMCID: PMC11051178. DOI: 10.3390/jimaging10040092.
Abstract
This study considers a method for reconstructing a high dynamic range (HDR) original image from a single saturated low dynamic range (LDR) image of metallic objects. A deep neural network approach was adopted for the direct mapping of an 8-bit LDR image to HDR. An HDR image database was first constructed from a large set of metallic objects with various shapes. Each captured HDR image was clipped to create a set of 8-bit LDR images, and all pairs of HDR and LDR images were used to train and test the network. A convolutional neural network (CNN) was then designed as a deep U-Net-like architecture, consisting of an encoder, a decoder, and a skip connection to maintain high image resolution. The network, built with the learning functions in MATLAB, comprised 32 layers and 85,900 learnable parameters. The performance of the proposed method was examined in experiments using a test image set. The proposed method was also compared with other methods and confirmed to be significantly superior in terms of reconstruction accuracy, histogram fitting, and psychological evaluation.
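A minimal sketch of a U-Net-like LDR-to-HDR mapping network follows (Python/PyTorch; the paper's network was built in MATLAB with 32 layers and 85,900 parameters, so the architecture and layer sizes below are purely illustrative):

```python
# Minimal U-Net-like encoder-decoder sketch for direct LDR-to-HDR mapping.
import torch
import torch.nn as nn

class TinyLDR2HDR(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU())
        self.out = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, x):
        e = self.enc(x)                         # full-resolution features
        d = self.up(self.down(e))               # bottleneck, then upsample
        return self.out(torch.cat([e, d], 1))   # skip connection keeps detail

# Training would pair clipped 8-bit LDR images with their HDR originals:
# loss = nn.functional.l1_loss(model(ldr_batch), hdr_batch)
```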
Affiliation(s)
- Shoji Tominaga: Department of Computer Science, Norwegian University of Science and Technology, 2815 Gjøvik, Norway; Department of Business and Informatics, Nagano University, Ueda 386-0032, Japan
- Takahiko Horiuchi: Graduate School of Engineering, Chiba University, Chiba 263-8522, Japan
3. Morimoto T, Akbarinia A, Storrs K, Cheeseman JR, Smithson HE, Gegenfurtner KR, Fleming RW. Color and gloss constancy under diverse lighting environments. J Vis 2023; 23:8. PMID: 37432844. PMCID: PMC10351023. DOI: 10.1167/jov.23.7.8.
Abstract
When we look at an object, we simultaneously see how glossy or matte it is, how light or dark, and what color it is. Yet, at each point on the object's surface, both diffuse and specular reflections are mixed in different proportions, resulting in substantial spatial chromatic and luminance variations. To further complicate matters, this pattern changes radically when the object is viewed under different lighting conditions. The purpose of this study was to simultaneously measure our ability to judge color and gloss using an image set capturing diverse object and illuminant properties. Participants adjusted the hue, lightness, chroma, and specular reflectance of a reference object so that it appeared to be made of the same material as a test object. Critically, the two objects were presented under different lighting environments. We found that hue matches were highly accurate, except under a chromatically atypical illuminant. Chroma and lightness constancy were generally poor, but these failures correlated well with simple image statistics. Gloss constancy was particularly poor, and these failures were only partially explained by reflection contrast. Importantly, across all measures, participants were highly consistent with one another in their deviations from constancy. Although color and gloss constancy hold well in simple conditions, the variety of lighting and shape in the real world presents significant challenges to our visual system's ability to judge intrinsic material properties.
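As a sketch of the kind of simple image statistics such constancy failures could be correlated with (an assumption; the paper's exact statistics are not specified in the abstract), mean CIELAB descriptors might be computed as follows:

```python
# Sketch: summary chromatic statistics of an object rendering, computed in
# CIELAB. Comparing these between test and reference lighting environments
# gives a baseline predictor of lightness/chroma constancy failures.
import numpy as np
from skimage import color

def lab_statistics(rgb_image: np.ndarray) -> dict:
    """Mean lightness, chroma, and hue angle of an sRGB image in [0, 1]."""
    lab = color.rgb2lab(rgb_image)
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    chroma = np.hypot(a, b)
    # Hue angle of the mean chromatic coordinates, in degrees.
    hue = np.degrees(np.arctan2(b.mean(), a.mean())) % 360
    return {"mean_L": L.mean(), "mean_chroma": chroma.mean(), "mean_hue": hue}
```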
Affiliation(s)
- Takuma Morimoto: Justus Liebig University Giessen, Giessen, Germany; Department of Experimental Psychology, University of Oxford, Oxford, UK
- Katherine Storrs: Justus Liebig University Giessen, Giessen, Germany; School of Psychology, University of Auckland, New Zealand
- Jacob R Cheeseman: Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), Universities of Marburg, Giessen and Darmstadt, Germany
- Hannah E Smithson: Department of Experimental Psychology, University of Oxford, Oxford, UK
- Roland W Fleming: Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), Universities of Marburg, Giessen and Darmstadt, Germany
4. Nohira H, Nagai T. Texture statistics involved in specular highlight exclusion for object lightness perception. J Vis 2023; 23:1. PMID: 36857040. PMCID: PMC9987166. DOI: 10.1167/jov.23.3.1.
Abstract
The human visual system estimates the physical properties of objects, such as their lightness. Previous studies on the lightness perception of glossy three-dimensional objects have suggested that specular highlights are detected and excluded in lightness perception. However, only a few studies have attempted to elucidate the mechanisms underlying this exclusion. This study aimed to identify the image features that contribute to highlight exclusion in lightness perception. We used Portilla-Simoncelli texture statistics (PS statistics), an image feature set similar to the representation in the early visual cortex, to explore their relationship with highlight exclusion. In experiment 1, computer graphics images of bumpy plastic plates with various physical parameters were used as stimuli, and lightness perception was measured using a lightness matching task. We then calculated a highlight exclusion index representing the degree of highlight exclusion and evaluated its correlation with four subsets of PS statistics. In experiment 2, an image synthesis algorithm was used to create images in which one of the PS statistic subsets was manipulated, and the highlight exclusion indexes of the synthesized images were measured. The results revealed that the PS statistic subset consisting of the lowest-order image features, such as moment statistics of luminance, acts as a necessary condition for highlight exclusion, whereas the other three subsets, consisting of higher-order features, are not crucial. These results suggest that low-order image features are the most important among the PS statistics for highlight exclusion, even though image features of higher order than those in PS statistics must also be directly involved.
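The lowest-order PS statistic subset, the marginal (moment) statistics of luminance, could be computed as in the following sketch (Python; the choice of moments follows the standard Portilla-Simoncelli marginal set, which is an assumption about the subset used here):

```python
# Sketch: marginal (moment) statistics of a luminance image, the lowest-order
# subset of Portilla-Simoncelli texture statistics. The full PS set also
# includes autocorrelation, magnitude-correlation, and cross-scale phase
# statistics, which the study found not to be crucial for highlight exclusion.
import numpy as np
from scipy import stats

def marginal_statistics(luminance: np.ndarray) -> dict:
    """Moment statistics of a luminance image."""
    x = luminance.ravel()
    return {
        "mean": x.mean(),
        "variance": x.var(),
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x, fisher=False),  # non-excess kurtosis
        "min": x.min(),
        "max": x.max(),
    }
```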
Affiliation(s)
- Hiroki Nohira: Department of Information and Communications Engineering, Tokyo Institute of Technology, Nagatsuta-cho, Midori-ku, Yokohama, Japan
- Takehiro Nagai: Department of Information and Communications Engineering, Tokyo Institute of Technology, Nagatsuta-cho, Midori-ku, Yokohama, Japan
5. Ponting S, Morimoto T, Smithson HE. Modeling surface color discrimination under different lighting environments using image chromatic statistics and convolutional neural networks. J Opt Soc Am A 2023; 40:A149-A159. PMID: 36846077. PMCID: PMC7614229. DOI: 10.1364/josaa.479986.
Abstract
We modeled discrimination thresholds for object colors under different lighting environments [J. Opt. Soc. Am. 35, B244 (2018)]. First, we built models based on chromatic statistics, testing 60 models in total. Second, we trained convolutional neural networks (CNNs), using 160,280 images labeled by either ground truth or human responses. No single chromatic statistics model was sufficient to describe human discrimination thresholds across conditions, while human-response-trained CNNs nearly perfectly predicted human thresholds. Guided by region-of-interest analysis of the network, we modified the chromatic statistics models to use only the lower regions of the objects, which substantially improved performance.
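A sketch of one plausible region-of-interest analysis, occlusion sensitivity, follows (an assumed implementation; the paper's exact method may differ):

```python
# Sketch: occlusion-style region-of-interest analysis. Mask image patches one
# at a time and record how much the trained CNN's output changes; regions
# with large changes (e.g., the lower parts of objects) are the ones the
# network relies on for its discrimination judgments.
import torch

def occlusion_map(model: torch.nn.Module, image: torch.Tensor,
                  patch: int = 16) -> torch.Tensor:
    """image: (3, H, W) tensor. Returns an (H//patch, W//patch) sensitivity map."""
    model.eval()
    with torch.no_grad():
        baseline = model(image.unsqueeze(0))
        _, H, W = image.shape
        sens = torch.zeros(H // patch, W // patch)
        for i in range(0, H - patch + 1, patch):
            for j in range(0, W - patch + 1, patch):
                occluded = image.clone()
                occluded[:, i:i + patch, j:j + patch] = 0.0  # blank one patch
                out = model(occluded.unsqueeze(0))
                sens[i // patch, j // patch] = (out - baseline).abs().sum()
    return sens
```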
Affiliation(s)
- Samuel Ponting: Department of Experimental Psychology, University of Oxford, Oxford, UK (contributed equally)
- Takuma Morimoto: Department of Experimental Psychology, University of Oxford, Oxford, UK; Department of Psychology, Justus-Liebig-Universität Giessen, Giessen, Germany (contributed equally)
6. Liao C, Sawayama M, Xiao B. Unsupervised learning reveals interpretable latent representations for translucency perception. PLoS Comput Biol 2023; 19:e1010878. PMID: 36753520. PMCID: PMC9942964. DOI: 10.1371/journal.pcbi.1010878.
Abstract
Humans constantly assess the appearance of materials to plan actions, such as stepping on icy roads without slipping. Visual inference of materials is important but challenging because a given material can appear dramatically different in various scenes. This problem especially stands out for translucent materials, whose appearance strongly depends on lighting, geometry, and viewpoint. Despite this, humans can still distinguish between different materials, and it remains unsolved how to systematically discover visual features pertinent to material inference from natural images. Here, we develop an unsupervised style-based image generation model to identify perceptually relevant dimensions for translucent material appearances from photographs. We find that our model, with its layer-wise latent representation, can synthesize images of diverse and realistic materials. Importantly, without supervision, human-understandable scene attributes, including the object's shape, material, and body color, spontaneously emerge in the model's layer-wise latent space in a scale-specific manner. By embedding an image into the learned latent space, we can manipulate specific layers' latent codes to modify the appearance of the object in the image. Specifically, we find that manipulation of the early layers (coarse spatial scale) transforms the object's shape, while manipulation of the later layers (fine spatial scale) modifies its body color. The middle layers of the latent space selectively encode translucency features, and manipulation of these layers coherently modifies the translucency appearance without changing the object's shape or body color. Moreover, we find that the middle layers of the latent space can successfully predict human translucency ratings, suggesting that translucent impressions are established in mid-to-low spatial scale features. This layer-wise latent representation allows us to systematically discover perceptually relevant image features for human translucency perception. Together, our findings reveal that learning the scale-specific statistical structure of natural images might be crucial for humans to efficiently represent material properties across contexts.
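A sketch of the layer-wise latent manipulation follows (Python/NumPy; the generator `G`, the 16-layer configuration, and the layer split points are hypothetical, included only to illustrate the style-mixing idea):

```python
# Sketch: layer-wise latent manipulation in a style-based (StyleGAN-like)
# generator. Swapping only the middle layers' codes alters translucency
# while early-layer (shape) and late-layer (body color) codes stay fixed.
import numpy as np

def mix_styles(w_source: np.ndarray, w_reference: np.ndarray,
               layers: slice) -> np.ndarray:
    """w arrays have shape (num_layers, latent_dim); swap codes for `layers`."""
    w_mixed = w_source.copy()
    w_mixed[layers] = w_reference[layers]
    return w_mixed

# With a hypothetical 16-layer generator G:
# w_new = mix_styles(w_object, w_other, slice(5, 10))  # middle layers only
# img = G.synthesize(w_new)  # translucency changes; shape and color do not
```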
Affiliation(s)
- Chenxi Liao: Department of Neuroscience, American University, Washington, DC, USA
- Masataka Sawayama: Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
- Bei Xiao: Department of Computer Science, American University, Washington, DC, USA
7. Prokott E, Fleming RW. Identifying specular highlights: Insights from deep learning. J Vis 2022; 22:6. PMID: 35713928. PMCID: PMC9206496. DOI: 10.1167/jov.22.7.6.
Abstract
Specular highlights are the most important image feature for surface gloss perception. Yet, recognizing whether a bright patch in an image is due to specular reflection or some other cause (e.g., texture marking) is challenging, and it remains unclear how the visual system reliably identifies highlights. There is currently no image-computable model that emulates human highlight identification, so here we sought to develop a neural network that reproduces observers' characteristic successes and failures. We rendered 179,085 images of glossy, undulating, textured surfaces. Given such images as input, a feedforward convolutional neural network was trained to output an image containing only the specular reflectance component. Participants viewed such images and reported whether or not specific pixels were highlights. The queried pixels were carefully selected to distinguish between ground truth and a simple thresholding of image intensity. The neural network outperformed both the simple thresholding model and ground truth at predicting human responses. We then used a genetic algorithm to selectively delete connections within the neural network to identify variants of the network that approximated human judgments even more closely. The best resulting network shared 68% of the variance with human judgments, more than the unpruned network. As a first step toward interpreting the network, we then used representational similarity analysis to compare its inner representations to a wide variety of hand-engineered image features. We find that the network learns representations similar not only to directly image-computable predictors but also to more complex predictors, such as intrinsic or geometric factors, with some indication that the network has learned photo-geometric constraints. However, our network fails to replicate human response patterns to violations of photo-geometric constraints (rotated highlights) as described by other authors.
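A sketch of the connection-pruning idea follows (Python/NumPy; the mask granularity, mutation scheme, and fitness function are illustrative assumptions, and `apply_mask`/`corr_with_humans` are hypothetical helpers):

```python
# Sketch: evolve binary masks over network connections with a simple genetic
# algorithm, keeping variants whose outputs correlate best with human
# judgments. Each mask element enables (True) or deletes (False) one weight.
import numpy as np

rng = np.random.default_rng(0)

def evolve_masks(n_weights: int, fitness, generations: int = 100,
                 pop_size: int = 20, flip_prob: float = 0.01) -> np.ndarray:
    """fitness(mask) -> correlation with human responses (higher is better)."""
    population = [rng.random(n_weights) < 0.9 for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]                      # keep best half
        children = [p ^ (rng.random(n_weights) < flip_prob)   # flip a few bits
                    for p in parents]
        population = parents + children
    return max(population, key=fitness)

# mask = evolve_masks(n, fitness=lambda m: corr_with_humans(apply_mask(net, m)))
```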
Affiliation(s)
- Eugen Prokott: Department of Experimental Psychology, Justus-Liebig-University Giessen, Giessen, Germany
- Roland W Fleming: Department of Experimental Psychology, Justus-Liebig-University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior, University of Marburg and Justus-Liebig-University Giessen, Giessen, Germany
8. Flachot A, Akbarinia A, Schütt HH, Fleming RW, Wichmann FA, Gegenfurtner KR. Deep neural models for color classification and color constancy. J Vis 2022; 22:17. PMID: 35353153. PMCID: PMC8976922. DOI: 10.1167/jov.22.4.17.
Abstract
Color constancy is our ability to perceive constant colors across varying illuminations. Here, we trained deep neural networks to be color constant and evaluated their performance with varying cues. Inputs to the networks consisted of two-dimensional images of simulated cone excitations derived from three-dimensional (3D) rendered scenes of 2,115 different 3D shapes, with spectral reflectances of 1,600 different Munsell chips, illuminated under 278 different natural illuminations. The models were trained to classify the reflectance of the objects. Testing was done with four new illuminations with equally spaced CIE L*a*b* chromaticities, two along the daylight locus and two orthogonal to it. High levels of color constancy were achieved with different deep neural networks, and constancy was higher along the daylight locus. When gradually removing cues from the scene, constancy decreased. Both ResNets and classical ConvNets of varying degrees of complexity performed well. However, DeepCC, our simplest sequential convolutional network, represented colors along the three color dimensions of human color vision, while ResNets showed a more complex representation.
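Constancy levels like those reported are often quantified with a color constancy index; a sketch follows (the CIE L*a*b* chromaticity formulation is an assumption consistent with the abstract, not the paper's stated measure):

```python
# Sketch: a color constancy index (CCI) compares the model's error under a
# novel illuminant to the chromatic shift the illuminant itself induces.
import numpy as np

def constancy_index(predicted_ab: np.ndarray, true_ab: np.ndarray,
                    illum_shift_ab: np.ndarray) -> float:
    """1 = perfect constancy; 0 = no compensation for the illuminant shift.
    All arguments are (a*, b*) chromaticity coordinates or shifts."""
    error = np.linalg.norm(predicted_ab - true_ab)
    shift = np.linalg.norm(illum_shift_ab)
    return 1.0 - error / shift
```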
Affiliation(s)
- Alban Flachot: Abteilung Allgemeine Psychologie, Justus Liebig University, Giessen, Germany
- Arash Akbarinia: Abteilung Allgemeine Psychologie, Justus Liebig University, Giessen, Germany
- Heiko H Schütt: Center for Neural Science, New York University, New York, NY, USA
- Roland W Fleming: Experimental Psychology, Justus Liebig University, Giessen, Germany
- Felix A Wichmann: Neural Information Processing Group, University of Tübingen, Tübingen, Germany
- Karl R Gegenfurtner: Abteilung Allgemeine Psychologie, Justus Liebig University, Giessen, Germany