1
Vincent J, Maertens M, Aguilar G. What Fechner could not do: Separating perceptual encoding and decoding with difference scaling. J Vis 2024; 24:5. [PMID: 38722273 PMCID: PMC11090143 DOI: 10.1167/jov.24.5.5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 02/29/2024] [Indexed: 05/15/2024]
Abstract
A key question in perception research is how stimulus variations translate into perceptual magnitudes, that is, the perceptual encoding process. As experimenters, we cannot probe perceptual magnitudes directly, but infer the encoding process from responses obtained in a psychophysical experiment. The most prominent experimental technique to measure perceptual appearance is matching, where observers adjust a probe stimulus to match a target in its appearance along the dimension of interest. The resulting data quantify the perceived magnitude of the target in physical units of the probe, and are thus an indirect expression of the underlying encoding process. In this paper, we show analytically and in simulation that data from matching tasks do not sufficiently constrain perceptual encoding functions, because there exist an infinite number of pairs of encoding functions that generate the same matching data. We use simulation to demonstrate that maximum likelihood conjoint measurement (Ho, Landy, & Maloney, 2008; Knoblauch & Maloney, 2012) does an excellent job of recovering the shape of ground truth encoding functions from data that were generated with these very functions. Finally, we measure perceptual scales and matching data for White's effect (White, 1979) and show that the matching data can be predicted from the estimated encoding functions, down to individual differences.
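The non-identifiability argument in this abstract can be illustrated with a minimal sketch. It assumes power-law encoding functions and a noise-free observer who sets the probe so that its encoded magnitude equals that of the target; the function names and exponents are hypothetical and are not taken from the paper.

```python
import numpy as np

def power_encoder(exponent):
    """Return a power-law encoding function and its inverse (illustrative assumption)."""
    encode = lambda x: np.power(x, exponent)
    decode = lambda psi: np.power(psi, 1.0 / exponent)
    return encode, decode

def predict_matches(targets, encode_target, decode_probe):
    """Noise-free match: the probe value whose encoded magnitude equals the target's."""
    return decode_probe(encode_target(targets))

targets = np.linspace(0.1, 1.0, 10)

# Pair 1: compressive encoding for the target context, linear for the probe context.
enc_target_1, _ = power_encoder(0.5)
_, dec_probe_1 = power_encoder(1.0)

# Pair 2: both functions of pair 1 passed through the same monotonic
# transform h(u) = u**3, giving exponents 1.5 and 3.0.
enc_target_2, _ = power_encoder(1.5)
_, dec_probe_2 = power_encoder(3.0)

matches_1 = predict_matches(targets, enc_target_1, dec_probe_1)
matches_2 = predict_matches(targets, enc_target_2, dec_probe_2)

# The encoding functions differ, yet the predicted matching data coincide.
print(np.allclose(matches_1, matches_2))
print(np.allclose(enc_target_1(targets), enc_target_2(targets)))
```

Any other strictly monotonic transform applied to both functions would leave the matches unchanged in the same way, which is why matching data alone cannot single out one pair.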
Affiliation(s)
- Joris Vincent
  - Computational Psychology, Technische Universität, Berlin, Germany
  - https://www.psyco.tu-berlin.de/vincent.html
- Marianne Maertens
  - Computational Psychology, Technische Universität, Berlin, Germany
  - https://www.psyco.tu-berlin.de/maertens.html
- Guillermo Aguilar
  - Computational Psychology, Technische Universität, Berlin, Germany
  - https://www.psyco.tu-berlin.de/aguilar.html

2
Luna R, Zabaleta I, Bertalmío M. State-of-the-art image and video quality assessment with a metric based on an intrinsically non-linear neural summation model. Front Neurosci 2023; 17:1222815. [PMID: 37559700 PMCID: PMC10408451 DOI: 10.3389/fnins.2023.1222815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 06/30/2023] [Indexed: 08/11/2023]
Abstract
The development of automatic methods for image and video quality assessment that correlate well with the perception of human observers is a very challenging open problem in vision science, with numerous practical applications in disciplines such as image processing and computer vision, as well as in the media industry. In the past two decades, the goal of image quality research has been to improve upon classical metrics by developing models that emulate some aspects of the visual system, and while the progress has been considerable, state-of-the-art quality assessment methods still share a number of shortcomings, like their performance dropping considerably when they are tested on a database that is quite different from the one used to train them, or their significant limitations in predicting observer scores for high framerate videos. In this work we propose a novel objective method for image and video quality assessment that is based on the recently introduced Intrinsically Non-linear Receptive Field (INRF) formulation, a neural summation model that has been shown to be better at predicting neural activity and visual perception phenomena than the classical linear receptive field. Here we start by optimizing, on a classic image quality database, the four parameters of a very simple INRF-based metric, and proceed to test this metric on three other databases, showing that its performance equals or surpasses that of the state-of-the-art methods, some of them having millions of parameters. Next, we extend to the temporal domain this INRF image quality metric, and test it on several popular video quality datasets; again, the results of our proposed INRF-based video quality metric are shown to be very competitive.
Affiliation(s)
- Raúl Luna
  - Institute of Optics, Spanish National Research Council (CSIC), Madrid, Spain
- Itziar Zabaleta
  - Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Marcelo Bertalmío
  - Institute of Optics, Spanish National Research Council (CSIC), Madrid, Spain

3
Abstract
Human vision relies on mechanisms that respond to luminance edges in space and time. Most edge models use orientation-selective mechanisms on multiple spatial scales and operate on static inputs assuming that edge processing occurs within a single fixational instance. Recent studies, however, demonstrate functionally relevant temporal modulations of the sensory input due to fixational eye movements. Here we propose a spatiotemporal model of human edge detection that combines elements of spatial and active vision. The model augments a spatial vision model by temporal filtering and shifts the input images over time, mimicking an active sampling scheme via fixational eye movements. The first model test was White's illusion, a lightness effect that has been shown to depend on edges. The model reproduced the spatial-frequency-specific interference with the edges by superimposing narrowband noise (1–5 cpd), similar to the psychophysical interference observed in White's effect. Second, we compare the model's edge detection performance in natural images in the presence and absence of Gaussian white noise with human-labeled contours for the same (noise-free) images. Notably, the model detects edges robustly against noise in both test cases without relying on orientation-selective processes. Eliminating model components, we demonstrate the relevance of multiscale spatiotemporal filtering and scale-specific normalization for edge detection. The proposed model facilitates efficient edge detection in (artificial) vision systems and challenges the notion that orientation-selective mechanisms are required for edge detection.
Affiliation(s)
- Lynn Schmittwilken
  - Science of Intelligence and Computational Psychology, Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Marianne Maertens
  - Science of Intelligence and Computational Psychology, Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany

4
Lerer A, Supèr H, Keil MS. Dynamic decorrelation as a unifying principle for explaining a broad range of brightness phenomena. PLoS Comput Biol 2021; 17:e1007907. [PMID: 33901165 PMCID: PMC8102013 DOI: 10.1371/journal.pcbi.1007907] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/18/2020] [Revised: 05/06/2021] [Accepted: 04/06/2021] [Indexed: 11/29/2022]
Abstract
The visual system is highly sensitive to spatial context for encoding luminance patterns. Context sensitivity inspired the proposal of many neural mechanisms for explaining the perception of luminance (brightness). Here we propose a novel computational model for estimating the brightness of many visual illusions. We hypothesize that many aspects of brightness can be explained by a dynamic filtering process that reduces the redundancy in edge representations on the one hand, while non-redundant activity is enhanced on the other. The dynamic filter is learned for each input image and implements context sensitivity. Dynamic filtering is applied to the responses of (model) complex cells in order to build a gain control map. The gain control map then acts on simple cell responses before they are used to create a brightness map via activity propagation. Our approach is successful in predicting many challenging visual illusions, including contrast effects, assimilation, and reverse contrast with the same set of model parameters.
Affiliation(s)
- Alejandro Lerer
  - Departament de Cognició, Desenvolupament i Psicologia de l’Educació, Faculty of Psychology, University of Barcelona, Barcelona, Spain
- Hans Supèr
  - Departament de Cognició, Desenvolupament i Psicologia de l’Educació, Faculty of Psychology, University of Barcelona, Barcelona, Spain
  - Institut de Neurociències, Universitat de Barcelona, Barcelona, Spain
  - Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Barcelona, Spain
  - Catalan Institute for Advanced Studies (ICREA), Barcelona, Spain
- Matthias S. Keil
  - Departament de Cognició, Desenvolupament i Psicologia de l’Educació, Faculty of Psychology, University of Barcelona, Barcelona, Spain
  - Institut de Neurociències, Universitat de Barcelona, Barcelona, Spain

5
Bertalmío M, Gomez-Villa A, Martín A, Vazquez-Corral J, Kane D, Malo J. Evidence for the intrinsically nonlinear nature of receptive fields in vision. Sci Rep 2020; 10:16277. [PMID: 33004868 PMCID: PMC7530701 DOI: 10.1038/s41598-020-73113-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/19/2019] [Accepted: 09/11/2020] [Indexed: 11/10/2022]
Abstract
The responses of visual neurons, as well as visual perception phenomena in general, are highly nonlinear functions of the visual input, while most vision models are grounded on the notion of a linear receptive field (RF). The linear RF has a number of inherent problems: it changes with the input, it presupposes a set of basis functions for the visual system, and it conflicts with recent studies on dendritic computations. Here we propose to model the RF in a nonlinear manner, introducing the intrinsically nonlinear receptive field (INRF). Apart from being more physiologically plausible and embodying the efficient representation principle, the INRF has a key property of wide-ranging implications: for several vision science phenomena where a linear RF must vary with the input in order to predict responses, the INRF can remain constant under different stimuli. We also prove that Artificial Neural Networks with INRF modules instead of linear filters have a remarkably improved performance and better emulate basic human perception. Our results suggest a change of paradigm for vision science as well as for artificial intelligence.
Affiliation(s)
- David Kane
  - Universitat Pompeu Fabra, Barcelona, Spain
- Jesús Malo
  - Universitat de Valencia, Valencia, Spain

6
Gomez-Villa A, Martín A, Vazquez-Corral J, Bertalmío M, Malo J. Color illusions also deceive CNNs for low-level vision tasks: Analysis and implications. Vision Res 2020; 176:156-174. [PMID: 32896717 DOI: 10.1016/j.visres.2020.07.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/02/2019] [Revised: 07/10/2020] [Accepted: 07/22/2020] [Indexed: 11/18/2022]
Abstract
The study of visual illusions has proven to be a very useful approach in vision science. In this work we start by showing that, while convolutional neural networks (CNNs) trained for low-level visual tasks in natural images may be deceived by brightness and color illusions, some network illusions can be inconsistent with the perception of humans. Next, we analyze where these similarities and differences may come from. On one hand, the proposed linear eigenanalysis explains the overall similarities: in simple CNNs trained for tasks like denoising or deblurring, the linear version of the network has center-surround receptive fields, and global transfer functions are very similar to the human achromatic and chromatic contrast sensitivity functions in human-like opponent color spaces. These similarities are consistent with the long-standing hypothesis that considers low-level visual illusions as a by-product of the optimization to natural environments. Specifically, here human-like features emerge from error minimization. On the other hand, the observed differences must be due to the behavior of the human visual system not explained by the linear approximation. However, our study also shows that more 'flexible' network architectures, with more layers and a higher degree of nonlinearity, may actually have a worse capability of reproducing visual illusions. This implies, in line with other works in the vision science literature, a word of caution on using CNNs to study human vision: on top of the intrinsic limitations of the L + NL formulation of artificial networks to model vision, the nonlinear behavior of flexible architectures may easily be markedly different from that of the visual system.
Affiliation(s)
- A Gomez-Villa
  - Dept. Inf. Comm. Tech., Universitat Pompeu Fabra, Barcelona, Spain
- A Martín
  - Dept. Inf. Comm. Tech., Universitat Pompeu Fabra, Barcelona, Spain
- J Vazquez-Corral
  - Dept. Inf. Comm. Tech., Universitat Pompeu Fabra, Barcelona, Spain
- M Bertalmío
  - Dept. Inf. Comm. Tech., Universitat Pompeu Fabra, Barcelona, Spain
- J Malo
  - Image Proc. Lab, Universitat de València, València, Spain

7
Coia AJ, Crognale MA. Contour adaptation reduces the spreading of edge induced colors. Vision Res 2017; 151:135-140. [PMID: 28427892 DOI: 10.1016/j.visres.2017.01.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/26/2016] [Revised: 12/24/2016] [Accepted: 01/19/2017] [Indexed: 10/19/2022]
Abstract
Brief exposure to flickering achromatic outlines of an area causes a reduction in the brightness contrast of the surface inside the area. This contour adaptation to achromatic contours does not reduce surface contrast when the surface is chromatic (the saturation or colorimetric purity of the surface is maintained). In addition to reducing the brightness of physical luminance contrast, contour adaptation also reduces (or even reverses) the illusory brightness contrast seen in the Craik-O'Brien-Cornsweet illusion, in which two physically identical grey areas appear to differ in brightness because of a sharp luminance edge separating them. Chromatic color spreading illusions also occur with chromatic inducing edges, and an unanswered question is whether contour adaptation can reduce the perceived contrast of illusory color spreading from edges, even though it cannot reduce the perceived contrast of physical surface color. The current studies use a color spreading illusion known as the watercolor effect in order to test whether illusory color spreading is affected by contour adaptation. The general findings of physical achromatic contrast being reduced and chromatic contrast being robust to contour adaptation were replicated. However, both illusory brightness and color were reduced by contour adaptation, even when the illusion edges only differed in chromatic contrast with each other and the background. Additional studies adapting to chromatic contours showed effects on illusory color contrast opposite to those of achromatic adaptation.
Affiliation(s)
- Andrew J Coia
  - University of Nevada Reno, Psychology Department, 1664 N Virginia St, Reno, NV 89557, United States
- Michael A Crognale
  - University of Nevada Reno, Psychology Department, 1664 N Virginia St, Reno, NV 89557, United States

8
Blakeslee B, Padmanabhan G, McCourt ME. Dissecting the influence of the collinear and flanking bars in White's effect. Vision Res 2016; 127:11-17. [PMID: 27425384 DOI: 10.1016/j.visres.2016.07.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/07/2015] [Revised: 06/29/2016] [Accepted: 07/05/2016] [Indexed: 11/18/2022]
Abstract
In White's effect, equiluminant test patches placed on the black and white bars of a square-wave grating appear different in brightness. The illusion has generated intense interest because the direction of the brightness effect does not correlate with the amount of black or white border in contact with the test patch, or in its general vicinity. Therefore, unlike brightness induction effects such as simultaneous contrast, White's effect is not consistent with explanations based on contrast or assimilation that depend solely on the relative amounts of black and white surrounding the test patches. We independently manipulated the luminance of the collinear and flanking bars to investigate their influence on test patch matching luminance (brightness). The inducing grating was a 0.5 c/d square-wave and test patches measured 1.0° in width and either 0.5° or 3.0° in height. Test patches measuring 0.5° in height had more extensive contact with the collinear bars and test patches measuring 3.0° in height had more extensive contact with the flanking bars. The luminance of the collinear (or flanking) bars assumed twenty values from 3.2 to 124.8 cd/m², while the luminance of the flanking (or collinear) bars remained white (124.8 cd/m²) or black (3.2 cd/m²). Under these conditions the influence of the collinear and flanking bars was found to be purely in the direction of contrast. The effect was dominated by contrast from the collinear bars (which results in White's effect); however, the influence of the flanking bars was also in the contrast direction. The data elucidate the luminance relationships between the collinear and flanking bars which produce the behavior associated with White's effect as well as that associated with "the inverted White effect", which is akin to simultaneous contrast.
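The stimulus geometry described in this abstract can be sketched as a luminance image. The grating frequency, patch dimensions and black/white luminances are taken from the abstract; the pixel density (`ppd`), image size, patch luminance (`lum_patch`) and the function name are assumptions made only for illustration. The bars are drawn vertically, so that a short patch mostly borders the collinear bar and a tall patch mostly borders the flanking bars.

```python
import numpy as np

def whites_stimulus(patch_h_deg=0.5, on_white_bar=False, ppd=32, size_deg=8.0,
                    freq_cpd=0.5, lum_black=3.2, lum_white=124.8, lum_patch=64.0):
    """Square-wave grating (vertical bars) with one test patch, in cd/m^2.

    ppd, size_deg and lum_patch are illustrative assumptions, not values
    reported in the abstract.
    """
    n = int(round(size_deg * ppd))            # image width and height in pixels
    x_deg = np.arange(n) / ppd                # horizontal position in degrees
    bar_w_deg = 1.0 / (2.0 * freq_cpd)        # one bar is half a grating cycle
    bar_index = np.floor(x_deg / bar_w_deg).astype(int)

    # Even-numbered bars are black, odd-numbered bars are white.
    row = np.where(bar_index % 2 == 0, lum_black, lum_white)
    img = np.tile(row, (n, 1))

    # Place the patch on a bar near the centre; its width equals one bar width
    # (1.0 deg at 0.5 c/d), matching the abstract.
    center_bar = int(size_deg / bar_w_deg) // 2
    if on_white_bar:
        center_bar += 1
    cols = bar_index == center_bar
    half_h = int(round(patch_h_deg * ppd / 2))
    img[n // 2 - half_h : n // 2 + half_h, cols] = lum_patch
    return img

short_patch = whites_stimulus(patch_h_deg=0.5)   # more contact with collinear bar
tall_patch = whites_stimulus(patch_h_deg=3.0)    # more contact with flanking bars
```

Varying `lum_black` or `lum_white` for only the collinear or only the flanking bars, as the study did, would need a per-bar luminance argument; this sketch keeps the plain two-level grating.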
Affiliation(s)
- Barbara Blakeslee
  - Center for Visual and Cognitive Neuroscience, Department of Psychology, North Dakota State University, Fargo, ND 58105-5075, United States
- Ganesh Padmanabhan
  - Center for Visual and Cognitive Neuroscience, Department of Psychology, North Dakota State University, Fargo, ND 58105-5075, United States
- Mark E McCourt
  - Center for Visual and Cognitive Neuroscience, Department of Psychology, North Dakota State University, Fargo, ND 58105-5075, United States

9
Betz T, Shapley R, Wichmann FA, Maertens M. Noise masking of White's illusion exposes the weakness of current spatial filtering models of lightness perception. J Vis 2015; 15:1. [PMID: 26426914 PMCID: PMC6894438 DOI: 10.1167/15.14.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/16/2015] [Accepted: 08/23/2015] [Indexed: 11/24/2022]
Abstract
Spatial filtering models are currently a widely accepted mechanistic account of human lightness perception. Their popularity can be ascribed to two reasons: They correctly predict how human observers perceive a variety of lightness illusions, and the processing steps involved in the models bear an apparent resemblance with known physiological mechanisms at early stages of visual processing. Here, we tested the adequacy of these models by probing their response to stimuli that have been modified by adding narrowband noise. Psychophysically, it has been shown that noise in the range of one to five cycles per degree (cpd) can drastically reduce the strength of some lightness phenomena, while noise outside this range has little or no effect on perceived lightness. Choosing White's illusion (White, 1979) as a test case, we replicated and extended the psychophysical results, and found that none of the spatial filtering models tested was able to reproduce the spatial frequency specific effect of narrowband noise. We discuss the reasons for failure for each model individually, but we argue that the failure is indicative of the general inadequacy of this class of spatial filtering models. Given the present evidence we do not believe that spatial filtering models capture the mechanisms that are responsible for producing many of the lightness phenomena observed in human perception. Instead we think that our findings support the idea that low-level contributions to perceived lightness are primarily determined by the luminance contrast at surface boundaries.
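Narrowband noise of the kind mentioned in this abstract can be approximated by band-pass filtering white noise in the Fourier domain. The sketch below is not the authors' procedure: the ideal (hard-edged) annular filter, the one-octave bandwidth, the pixel density and the function name are all assumptions chosen for illustration.

```python
import numpy as np

def narrowband_noise(n=256, ppd=32, center_cpd=3.0, bandwidth_oct=1.0, seed=0):
    """Isotropic noise limited to a frequency band around center_cpd.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n, n))

    # Spatial frequency of each Fourier coefficient in cycles per degree.
    f = np.fft.fftfreq(n, d=1.0 / ppd)
    fx, fy = np.meshgrid(f, f)
    radius = np.hypot(fx, fy)

    # Keep only frequencies within bandwidth_oct octaves centred on center_cpd.
    lo = center_cpd * 2.0 ** (-bandwidth_oct / 2.0)
    hi = center_cpd * 2.0 ** (bandwidth_oct / 2.0)
    mask = (radius >= lo) & (radius <= hi)

    filtered = np.fft.ifft2(np.fft.fft2(white) * mask).real
    return filtered / filtered.std()          # normalise to unit contrast

noise = narrowband_noise()
```

The noise image would then be scaled to the desired contrast and added to the stimulus; sweeping `center_cpd` across values inside and outside 1 to 5 cpd mirrors the manipulation the abstract describes.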