1
Manning TS, Alexander E, Cumming BG, DeAngelis GC, Huang X, Cooper EA. Transformations of sensory information in the brain suggest changing criteria for optimality. PLoS Comput Biol 2024; 20:e1011783. PMID: 38206969; PMCID: PMC10807827; DOI: 10.1371/journal.pcbi.1011783.
Abstract
Neurons throughout the brain modulate their firing rate lawfully in response to sensory input. Theories of neural computation posit that these modulations reflect the outcome of a constrained optimization in which neurons aim to robustly and efficiently represent sensory information. Our understanding of how this optimization varies across different areas in the brain, however, is still in its infancy. Here, we show that neural sensory responses transform along the dorsal stream of the visual system in a manner consistent with a transition from optimizing for information preservation towards optimizing for perceptual discrimination. Focusing on the representation of binocular disparities (the slight differences between the retinal images of the two eyes), we re-analyze measurements characterizing neuronal tuning curves in brain areas V1, V2, and MT (middle temporal) in the macaque monkey. We compare these to measurements of the statistics of binocular disparity typically encountered during natural behaviors using a Fisher information framework. The differences in tuning curve characteristics across areas are consistent with a shift in optimization goals: V1 and V2 population-level responses are more consistent with maximizing the information encoded about naturally occurring binocular disparities, while MT responses shift towards maximizing the ability to support disparity discrimination. We find that a change towards tuning curves preferring larger disparities is a key driver of this shift. These results provide new insight into previously identified differences between disparity-selective areas of cortex and suggest these differences play an important role in supporting visually guided behavior. Our findings emphasize the need to consider not just information preservation and neural resources, but also relevance to behavior, when assessing the optimality of neural codes.
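The Fisher information framework referenced above can be illustrated with a minimal sketch: for a population of independent Poisson-spiking neurons with tuning curves f_n(s), the population Fisher information at disparity s is J(s) = Σ_n f_n'(s)² / f_n(s), and the best achievable discrimination threshold scales as 1/√J(s). The tuning-curve shapes and parameters below are illustrative stand-ins, not the paper's fitted V1/V2/MT data.

```python
import numpy as np

# Minimal sketch: population Fisher information for independent Poisson
# neurons, J(s) = sum_n f_n'(s)^2 / f_n(s). Tuning shapes and parameters
# are illustrative, not the paper's measured tuning curves.

disparities = np.linspace(-2.0, 2.0, 401)              # deg

def gaussian_tuning(pref, width, gain=30.0, base=2.0):
    return lambda s: base + gain * np.exp(-0.5 * ((s - pref) / width) ** 2)

prefs = np.linspace(-1.5, 1.5, 25)                     # hypothetical preferred disparities
rates = np.stack([gaussian_tuning(p, 0.4)(disparities) for p in prefs])
derivs = np.gradient(rates, disparities, axis=1)       # f_n'(s)
fisher = (derivs ** 2 / rates).sum(axis=0)             # J(s)

# Cramer-Rao bound: best achievable discrimination threshold ~ 1/sqrt(J(s))
thresholds = 1.0 / np.sqrt(fisher)
```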
Affiliation(s)
- Tyler S. Manning
- Herbert Wertheim School of Optometry & Vision Science, University of California, Berkeley
- Emma Alexander
- Department of Computer Science, Northwestern University, Illinois, United States of America
- Bruce G. Cumming
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Maryland, United States of America
- Gregory C. DeAngelis
- Department of Brain and Cognitive Sciences, University of Rochester, New York, United States of America
- Xin Huang
- Department of Neuroscience, University of Wisconsin, Madison
- Emily A. Cooper
- Herbert Wertheim School of Optometry & Vision Science, University of California, Berkeley
- Helen Wills Neuroscience Institute, University of California, Berkeley
2
Burge J, Burge T. Shape, perspective, and what is and is not perceived: Comment on Morales, Bax, and Firestone (2020). Psychol Rev 2023; 130:1125-1136. PMID: 35549319; PMCID: PMC11366222; DOI: 10.1037/rev0000363.
Abstract
Psychology and philosophy have long reflected on the role of perspective in vision. Since the dawn of modern vision science (roughly, since Helmholtz in the late 1800s), scientific explanations in vision have focused on understanding the computations that transform the sensed retinal image into percepts of the three-dimensional environment. The standard view in the science is that distal properties, both viewpoint-independent properties of the environment (object shape) and viewpoint-dependent relational properties (3D orientation relative to the viewer), are perceptually represented, and that properties of the proximal stimulus (in vision, the retinal image) are not. This view is woven into the nature of scientific explanation in perceptual psychology and has guided impressive advances over the past 150 years. A recently published article suggests that in shape perception, the standard view must be revised. It argues, on the basis of new empirical data, that a new entity, perspectival shape, should be introduced into scientific explanations of shape perception. Specifically, the article's centrally advertised claim is that, in addition to distal shape, perspectival shape is perceived. We argue that this claim rests on a series of mistakes. Problems in experimental design entail that the article provides no empirical support for any claims regarding either perspective or the perception of shape. There are further problems in scientific reasoning and conceptual development. Detailing these criticisms and explaining how science treats these issues are meant to clarify method and theory, and to improve exchanges between the science and philosophy of perception.
Affiliation(s)
- Johannes Burge
- Department of Psychology, University of Pennsylvania
- Neuroscience Graduate Group, University of Pennsylvania
- Bioengineering Graduate Group, University of Pennsylvania
- Tyler Burge
- Department of Philosophy, University of California, Los Angeles
3
Anderson MD, Elder JH, Graf EW, Adams WJ. The time-course of real-world scene perception: Spatial and semantic processing. iScience 2022; 25:105633. PMID: 36505927; PMCID: PMC9732406; DOI: 10.1016/j.isci.2022.105633.
Abstract
Real-world scene perception unfolds remarkably quickly, yet the underlying visual processes are poorly understood. Space-centered theory maintains that a scene's spatial structure (e.g., openness, mean depth) can be rapidly recovered from low-level image statistics. In turn, the statistical relationship between a scene's spatial properties and semantic content allows for semantic identity to be inferred from its layout. We tested this theory by investigating (1) the temporal dynamics of spatial and semantic perception in real-world scenes, and (2) dependencies between spatial and semantic judgments. Participants viewed backward-masked images for 13.3 to 106.7 ms, and identified the semantic (e.g., beach, road) or spatial structure (e.g., open, closed-off) category. We found no temporal precedence of spatial discrimination relative to semantic discrimination. Computational analyses further suggest that, instead of using spatial layout to infer semantic categories, humans exploit semantic information to discriminate spatial structure categories. These findings challenge traditional 'bottom-up' views of scene perception.
Affiliation(s)
- Matt D. Anderson
- Centre for Perception and Cognition, Psychology, University of Southampton, Southampton, UK
- James H. Elder
- Centre for Vision Research, Department of Psychology, Department of Electrical Engineering and Computer Science, York University, Toronto, Canada
- Erich W. Graf
- Centre for Perception and Cognition, Psychology, University of Southampton, Southampton, UK
- Wendy J. Adams
- Centre for Perception and Cognition, Psychology, University of Southampton, Southampton, UK
4
Chin BM, Burge J. Perceptual consequences of interocular differences in the duration of temporal integration. J Vis 2022; 22:12. PMID: 36355360; PMCID: PMC9652723; DOI: 10.1167/jov.22.12.12.
Abstract
Temporal differences in visual information processing between the eyes can cause dramatic misperceptions of motion and depth. Processing delays between the eyes cause the Pulfrich effect: oscillating targets in the frontal plane are misperceived as moving along near-elliptical motion trajectories in depth (Pulfrich, 1922). Here, we explain a previously reported but poorly understood variant: the anomalous Pulfrich effect. When this variant is perceived, the illusory motion trajectory appears oriented left- or right-side back in depth, rather than aligned with the true direction of motion. Our data indicate that this perceived misalignment is due to interocular differences in neural temporal integration periods, as opposed to interocular differences in delay. For oscillating motion, differences in the duration of temporal integration dampen the effective motion amplitude in one eye relative to the other. In a dynamic analog of the Geometric effect in stereo-surface-orientation perception (Ogle, 1950), the different motion amplitudes cause the perceived misorientation of the motion trajectories. Forced-choice psychophysical experiments, conducted with both different spatial frequencies and different onscreen motion damping in the two eyes, show that the perceived misorientation in depth is associated with the eye having greater motion damping. A target-tracking experiment provided more direct evidence that the anomalous Pulfrich effect is caused by interocular differences in temporal integration and delay. These findings highlight the computational hurdles posed to the visual system by temporal differences in sensory processing. Future work will explore how the visual system overcomes these challenges to achieve accurate perception.
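The amplitude-damping mechanism described above can be sketched with a box-shaped temporal integration window: averaging x(t) = A sin(ωt) over a window of duration T damps the amplitude by sinc(ωT/2) and delays it by T/2, so unequal windows in the two eyes produce unequal effective amplitudes in addition to a delay. A minimal sketch with illustrative values, not the paper's fitted parameters:

```python
import numpy as np

# Minimal sketch: a box-shaped temporal integration window of duration T
# applied to sinusoidal motion x(t) = A*sin(w*t) damps the effective
# amplitude by sinc(w*T/2) and delays it by T/2. Interocular differences
# in T therefore yield interocular amplitude differences (a misoriented
# trajectory) on top of the delay behind the classic Pulfrich effect.

A, f = 2.0, 2.0                       # amplitude (deg) and frequency (Hz)
w = 2 * np.pi * f

def effective_motion(T):
    """Damping and delay produced by a box integration window of length T (s)."""
    damping = np.sinc(w * T / 2 / np.pi)      # np.sinc(x) = sin(pi*x)/(pi*x)
    return damping * A, T / 2

amp_L, delay_L = effective_motion(T=0.050)    # 50 ms window in the left eye
amp_R, delay_R = effective_motion(T=0.150)    # 150 ms window in the right eye

t = np.linspace(0, 1, 500)
x_L = amp_L * np.sin(w * (t - delay_L))
x_R = amp_R * np.sin(w * (t - delay_R))
disparity = x_L - x_R     # time-varying disparity traces the misoriented path
```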
Affiliation(s)
- Benjamin M Chin
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
5
Kalou K, Sedda G, Gibaldi A, Sabatini SP. Learning bio-inspired head-centric representations of 3D shapes in an active fixation setting. Front Robot AI 2022; 9:994284. PMID: 36329691; PMCID: PMC9623882; DOI: 10.3389/frobt.2022.994284.
Abstract
When exploring the surrounding environment with the eyes, humans and primates must interpret three-dimensional (3D) shapes in a fast and invariant way, exploiting highly variable, gaze-dependent visual information. Because they have front-facing eyes, binocular disparity is a prominent cue for depth perception. Specifically, it serves as the computational substrate for two grounding mechanisms of binocular active vision: stereopsis and binocular coordination. To this aim, disparity information, which is expressed in a retinotopic reference frame, is combined along the visual cortical pathways with gaze information and transformed into a head-centric reference frame. Despite the importance of this mechanism, the underlying neural substrates remain largely unknown. In this work, we investigate the capability of the human visual system to interpret the 3D scene by exploiting disparity and gaze information. In a psychophysical experiment, human subjects were asked to judge the depth orientation of a planar surface either while fixating a target point or while freely exploring the surface. Moreover, we used the same stimuli to train a recurrent neural network to exploit the responses of a modelled population of cortical (V1) cells to interpret the 3D scene layout. The results for both the human observers and the model network show that integrating disparity information across gaze directions is crucial for a reliable and invariant interpretation of the 3D geometry of the scene.
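The retinotopic-to-head-centric combination studied here can be sketched with the standard small-angle geometry: for symmetric fixation, vergence ≈ IPD/Z_fix and absolute disparity ≈ IPD(1/Z_fix − 1/Z), so absolute distance is recoverable from retinal disparity plus gaze (vergence) information. This conveys only the geometric intuition; the paper's model instead trains a recurrent network on V1-like population responses. Values below are illustrative.

```python
import numpy as np

# Minimal geometric sketch: recover head-centric (absolute) distance from
# retinal disparity plus a vergence (gaze) signal, under the small-angle
# approximation for symmetric fixation. Illustrative values only.

IPD = 0.063                                        # interocular distance (m)

def distance_from_disparity(disparity_rad, fixation_dist):
    vergence = IPD / fixation_dist                 # gaze signal (radians)
    return IPD / (vergence - disparity_rad)        # head-centric distance (m)

Z_fix = 0.8                                        # fixating at 0.8 m
for disp_arcmin in (-10, 0, 10):                   # crossed / zero / uncrossed
    disp_rad = np.deg2rad(disp_arcmin / 60)
    print(disp_arcmin, "arcmin ->",
          round(distance_from_disparity(disp_rad, Z_fix), 3), "m")
```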
Affiliation(s)
- Katerina Kalou
- Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genoa, Italy
- Giulia Sedda
- Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genoa, Italy
- Agostino Gibaldi
- School of Optometry, University of California, Berkeley, CA, United States
- Silvio P. Sabatini
- Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genoa, Italy
6
Oluk C, Bonnen K, Burge J, Cormack LK, Geisler WS. Stereo slant discrimination of planar 3D surfaces: Frontoparallel versus planar matching. J Vis 2022; 22:6. PMID: 35467704; PMCID: PMC9055558; DOI: 10.1167/jov.22.5.6.
Abstract
Binocular stereo cues are important for discriminating 3D surface orientation, especially at near distances. We devised a single-interval task where observers discriminated the slant of a densely textured planar test surface relative to a textured planar surround reference surface. Although surfaces were rendered with correct perspective, the stimuli were designed so that the binocular cues dominated performance. Slant discrimination performance was measured as a function of the reference slant and the level of uncorrelated white noise added to the test-plane images in the left and right eyes. We compared human performance with an approximate ideal observer (planar matching [PM]) and two subideal observers. The PM observer uses the image in one eye and back projection to predict a test image in the other eye for all possible slants, tilts, and distances. The estimated slant, tilt, and distance are determined by the prediction that most closely matches the measured image in the other eye. The first subideal observer (local planar matching [LPM]) applies PM over local neighborhoods and then pools estimates across the test plane. The second subideal observer (local frontoparallel matching [LFM]) uses only location disparity. We find that the ideal observer (PM) and the first subideal observer (LPM) outperform the second subideal observer (LFM), demonstrating the additional benefit of pattern disparities. We also find that all three model observers can account for human performance if two free parameters are included: a fixed small level of internal estimation noise, and a fixed overall efficiency scalar on slant discriminability.
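Of the three model observers, the location-disparity-only (LFM-style) computation is the simplest to sketch: slide a left-eye patch along the right-eye epipolar line and take the disparity that maximizes normalized cross-correlation. The back-projection over candidate slants, tilts, and distances used by the PM and LPM observers is omitted here, and the patch sizes are hypothetical.

```python
import numpy as np

# Minimal sketch of an LFM-style (location-disparity only) matcher:
# normalized cross-correlation between a left-eye patch and shifted
# right-eye patches along the epipolar line.

def lfm_disparity(left_patch, right_strip, max_disp):
    """left_patch: 1D luminance patch; right_strip: strip of length
    len(left_patch) + 2*max_disp centered on the same location."""
    lp = left_patch - left_patch.mean()
    lp /= (np.linalg.norm(lp) + 1e-12)
    n = left_patch.size
    best_d, best_corr = 0, -np.inf
    for d in range(-max_disp, max_disp + 1):
        start = max_disp + d
        rp = right_strip[start:start + n]
        rp = rp - rp.mean()
        rp = rp / (np.linalg.norm(rp) + 1e-12)
        corr = float(lp @ rp)
        if corr > best_corr:
            best_d, best_corr = d, corr
    return best_d

# Toy usage: a shifted noise pattern is recovered at the correct disparity.
rng = np.random.default_rng(0)
row = rng.standard_normal(101)
left = row[40:60]                       # 20-sample left-eye patch
right = np.roll(row, 3)                 # ground-truth disparity of 3 samples
print(lfm_disparity(left, right[40 - 8:60 + 8], max_disp=8))   # -> 3
```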
Affiliation(s)
- Can Oluk
- Center for Perceptual Systems and Department of Psychology, University of Texas at Austin, Austin, TX, USA
- Kathryn Bonnen
- School of Optometry, Indiana University Bloomington, Bloomington, IN, USA
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Lawrence K Cormack
- Center for Perceptual Systems and Department of Psychology, University of Texas at Austin, Austin, TX, USA
- Wilson S Geisler
- Center for Perceptual Systems and Department of Psychology, University of Texas at Austin, Austin, TX, USA
7
Basgöze Z, White DN, Burge J, Cooper EA. Natural statistics of depth edges modulate perceptual stability. J Vis 2020; 20:10. PMID: 32761107; PMCID: PMC7438667; DOI: 10.1167/jov.20.8.10.
Abstract
Binocular fusion relies on matching points in the two eyes that correspond to the same physical feature in the world; however, not all world features are binocularly visible. Near depth edges, some regions of a scene are often visible to only one eye (so-called half occlusions). Accurate detection of these monocularly visible regions is likely to be important for stable visual perception. If monocular regions are not detected as such, the visual system may attempt to binocularly fuse non-corresponding points, which can result in unstable percepts. We investigated the hypothesis that the visual system capitalizes on statistical regularities associated with depth edges in natural scenes to aid binocular fusion and facilitate perceptual stability. By sampling from a large set of stereoscopic natural images with co-registered distance information, we found evidence that monocularly visible regions near depth edges primarily result from background occlusions. Accordingly, monocular regions tended to be more visually similar to the adjacent binocularly visible background region than to the adjacent binocularly visible foreground. Consistent with our hypothesis, perceptual experiments showed that perception tended to be more stable when the image properties of the depth edge were statistically more likely given the probability of occurrence in natural scenes (i.e., when monocular regions were more visually similar to the binocular background). The generality of these results was supported by a parametric study with simulated environments. Exploiting regularities in natural environments may allow the visual system to facilitate fusion and perceptual stability when both binocular and monocular regions are visible.
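The occlusion geometry underlying these statistics admits a compact back-of-envelope form: the angular width of the half-occluded background strip at a depth edge approximately equals the disparity difference between foreground and background, w ≈ IPD(1/Z_fore − 1/Z_back), so larger depth gaps produce wider monocular zones. A minimal sketch with illustrative values, not the paper's analysis code:

```python
import numpy as np

# Minimal geometric sketch: angular width of the half-occluded (monocular)
# background zone at a depth edge, under a small-angle approximation.

IPD = 0.063                                   # interocular distance (m)

def monocular_zone_width(z_fore, z_back):
    w_rad = IPD * (1.0 / z_fore - 1.0 / z_back)
    return np.degrees(w_rad) * 60             # arcmin

for z_back in (1.0, 2.0, 4.0):
    print(f"foreground 0.5 m, background {z_back} m:",
          round(monocular_zone_width(0.5, z_back), 1), "arcmin")
```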
8
Chauhan T, Héjja-Brichard Y, Cottereau BR. Modelling binocular disparity processing from statistics in natural scenes. Vision Res 2020; 176:27-39. DOI: 10.1016/j.visres.2020.07.009.
9
Kim S, Burge J. Natural scene statistics predict how humans pool information across space in surface tilt estimation. PLoS Comput Biol 2020; 16:e1007947. PMID: 32579559; PMCID: PMC7340327; DOI: 10.1371/journal.pcbi.1007947.
Abstract
Visual systems estimate the three-dimensional (3D) structure of scenes from information in two-dimensional (2D) retinal images. Visual systems use multiple sources of information to improve the accuracy of these estimates, including statistical knowledge of the probable spatial arrangements of natural scenes. Here, we examine how 3D surface tilts are spatially related in real-world scenes, and show that humans pool information across space when estimating surface tilt in accordance with these spatial relationships. We develop a hierarchical model of surface tilt estimation that is grounded in the statistics of tilt in natural scenes and images. The model computes a global tilt estimate by pooling local tilt estimates within an adaptive spatial neighborhood. The spatial neighborhood in which local estimates are pooled changes according to the value of the local estimate at a target location. The hierarchical model provides more accurate estimates of groundtruth tilt in natural scenes and provides a better account of human performance than the local estimates. Taken together, the results imply that the human visual system pools information about surface tilt across space in accordance with natural scene statistics.

Visual systems estimate three-dimensional (3D) properties of scenes from two-dimensional images on the retinas. To solve this difficult problem as accurately as possible, visual systems use many available sources of information, including information about how the 3D properties of the world are spatially arranged. This manuscript reports a systematic analysis of 3D surface tilt in natural scenes, a model of surface tilt estimation that makes use of these scene statistics, and human psychophysical data on the estimation of surface tilt from natural images. The results show that the regularities present in the natural environment predict both how to maximize the accuracy of tilt estimation and how to maximize the prediction of human performance. This work contributes to a growing line of work that establishes links between rigorous measurements of natural scenes and the function of sensory and perceptual systems.
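One detail of the pooling step worth making concrete: tilt is a circular variable, so local estimates must be combined with a vector-average (circular) mean rather than an arithmetic mean. The sketch below reduces the paper's adaptive spatial neighborhood to optional weights and is illustrative only.

```python
import numpy as np

# Minimal sketch of pooling local tilt estimates: tilt is circular, so a
# weighted vector-average (circular mean) is used. The adaptive spatial
# neighborhood of the full model is reduced here to optional weights.

def pool_tilt(tilts_deg, weights=None):
    """Weighted circular mean of tilt estimates (degrees)."""
    a = np.deg2rad(np.asarray(tilts_deg, float))
    w = np.ones_like(a) if weights is None else np.asarray(weights, float)
    mean = np.arctan2((w * np.sin(a)).sum(), (w * np.cos(a)).sum())
    return np.rad2deg(mean) % 360.0

# Local estimates straddling the wrap-around: the circular mean is ~0 deg,
# whereas a naive arithmetic mean would report 180 deg.
local = [350.0, 5.0, 10.0, 355.0]
print(pool_tilt(local))
```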
Affiliation(s)
- Seha Kim
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
10
Abstract
An ideal observer is a theoretical model observer that performs a specific sensory-perceptual task optimally, making the best possible use of the available information given physical and biological constraints. An image-computable ideal observer (pixels in, estimates out) is a particularly powerful type of ideal observer that explicitly models the flow of visual information from the stimulus-encoding process to the eventual decoding of a sensory-perceptual estimate. Image-computable ideal observer analyses underlie some of the most important results in vision science. However, most of what we know from ideal observers about visual processing and performance derives from relatively simple tasks and relatively simple stimuli. This review describes recent efforts to develop image-computable ideal observers for a range of tasks with natural stimuli and shows how these observers can be used to predict and understand perceptual and neurophysiological performance. The reviewed results establish principled links among models of neural coding, computational methods for dimensionality reduction, and sensory-perceptual performance in tasks with natural stimuli.
Affiliation(s)
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
11
Abstract
To model the responses of neurons in the early visual system, at least three basic components are required: a receptive field, a normalization term, and a specification of encoding noise. Here, we examine how the receptive field, the normalization factor, and the encoding noise affect the drive to model-neuron responses when stimulated with natural images. We show that when these components are modeled appropriately, the response drives elicited by natural stimuli are Gaussian-distributed and scale invariant, and very nearly maximize the sensitivity (d') for natural-image discrimination. We discuss the statistical models of natural stimuli that can account for these response statistics, and we show how some commonly used modeling practices may distort these results. Finally, we show that normalization can equalize important properties of neural response across different stimulus types. Specifically, narrowband (stimulus- and feature-specific) normalization causes model neurons to yield Gaussian response-drive statistics when stimulated with natural stimuli, 1/f noise stimuli, and white-noise stimuli. The current work makes recommendations for best practices and lays a foundation, grounded in the response statistics to natural stimuli, upon which to build principled models of more complex visual tasks.
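A minimal sketch of the three model components named above (receptive field, divisive normalization, encoding noise) for a single model neuron; the constants and the broadband normalization pool are assumptions for illustration, whereas the paper compares broadband against narrowband (stimulus- and feature-specific) pools.

```python
import numpy as np

# Minimal sketch of one model neuron's response drive: receptive-field
# projection, broadband divisive normalization, additive encoding noise.
# Constants are illustrative assumptions.

rng = np.random.default_rng(1)

def response_drive(image_patch, rf, c50=0.1, sigma_noise=0.05):
    x = image_patch.ravel()
    x = x - x.mean()                       # contrast (mean-subtracted) input
    drive = rf.ravel() @ x                 # receptive-field projection
    norm = np.sqrt(x @ x + c50 ** 2)       # broadband divisive normalization
    return drive / norm + sigma_noise * rng.standard_normal()  # encoding noise

# Toy usage: random patch, edge-like receptive field
patch = rng.standard_normal((8, 8))
rf = np.zeros((8, 8))
rf[:, :4] = 1.0
rf[:, 4:] = -1.0
rf /= np.linalg.norm(rf)
print(response_drive(patch, rf))
```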
Affiliation(s)
- Arvind Iyer
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
12
Chin BM, Burge J. Predicting the Partition of Behavioral Variability in Speed Perception with Naturalistic Stimuli. J Neurosci 2020; 40:864-879. PMID: 31772139; PMCID: PMC6975300; DOI: 10.1523/jneurosci.1904-19.2019.
Abstract
A core goal of visual neuroscience is to predict human perceptual performance from natural signals. Performance in any natural task can be limited by at least three sources of uncertainty: stimulus variability, internal noise, and suboptimal computations. Determining the relative importance of these factors has been a focus of interest for decades but requires methods for predicting the fundamental limits imposed by stimulus variability on sensory-perceptual precision. Most successes have been limited to simple stimuli and simple tasks. But perception science ultimately aims to understand how vision works with natural stimuli. Successes in this domain have proven elusive. Here, we develop a model of human observers based on an image-computable (images in, estimates out) Bayesian ideal observer. Given biological constraints, the ideal observer optimally uses the statistics relating local intensity patterns in moving images to speed, specifying the fundamental limits imposed by natural stimuli. Next, we propose a theoretical link between two key decision-theoretic quantities that suggests how to experimentally disentangle the impacts of internal noise and deterministic suboptimal computations. In several interlocking discrimination experiments with three male observers, we confirm this link and determine the quantitative impact of each candidate performance-limiting factor. Human performance is near-exclusively limited by natural stimulus variability and internal noise, and humans use near-optimal computations to estimate speed from naturalistic image movies. The findings indicate that the partition of behavioral variability can be predicted from a principled analysis of natural images and scenes. The approach should be extendable to studies of neural variability with natural signals.

SIGNIFICANCE STATEMENT: Accurate estimation of speed is critical for determining motion in the environment, but humans cannot perform this task without error. Different objects moving at the same speed cast different images on the eyes. This stimulus variability imposes fundamental external limits on the human ability to estimate speed. Predicting these limits has proven difficult. Here, by analyzing natural signals, we predict the quantitative impact of natural stimulus variability on human performance given biological constraints. With integrated experiments, we compare its impact to well-studied performance-limiting factors internal to the visual system. The results suggest that the deterministic computations humans perform are near optimal, and that behavioral responses to natural stimuli can be studied with the rigor and interpretability defining work with simpler stimuli.
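The variance-partition logic can be made concrete with back-of-envelope arithmetic: if a perceptual estimate reflects stimulus-driven variability plus internal noise, the total variance decomposes as σ²_total = σ²_stim + σ²_int, and the repeatability of responses to repeated identical stimuli (double-pass agreement) constrains the internal share. This is a simplification of the paper's decision-theoretic link; the numbers below are made up.

```python
import numpy as np

# Back-of-envelope sketch of the variance partition. If
#     estimate = f(stimulus) + internal noise,
# then across natural stimuli
#     sigma_total^2 = sigma_stim^2 + sigma_int^2,
# and the decision-variable correlation across two passes with identical
# stimuli is rho = sigma_stim^2 / sigma_total^2. Illustrative numbers only.

sigma_stim = 0.6     # stimulus-driven variability (ideal-observer limit)
sigma_int = 0.3      # internal noise
sigma_total = np.hypot(sigma_stim, sigma_int)
rho = sigma_stim ** 2 / sigma_total ** 2
print(f"total sd = {sigma_total:.3f}, double-pass correlation = {rho:.2f}")
```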
Affiliation(s)
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104
13
Optimized but Not Maximized Cue Integration for 3D Visual Perception. eNeuro 2020; 7:ENEURO.0411-19.2019. PMID: 31836597; PMCID: PMC6948924; DOI: 10.1523/eneuro.0411-19.2019.
Abstract
Reconstructing three-dimensional (3D) scenes from two-dimensional (2D) retinal images is an ill-posed problem. Despite this, 3D perception of the world based on 2D retinal images is seemingly accurate and precise. The integration of distinct visual cues is essential for robust 3D perception in humans, but it is unclear whether this is true for non-human primates (NHPs). Here, we assessed 3D perception in macaque monkeys using a planar surface orientation discrimination task. Perception was accurate across a wide range of spatial poses (orientations and distances), but precision was highly dependent on the plane's pose. The monkeys achieved robust 3D perception by dynamically reweighting the integration of stereoscopic and perspective cues according to their pose-dependent reliabilities. Errors in performance could be explained by a prior resembling the 3D orientation statistics of natural scenes. We used neural network simulations based on 3D orientation-selective neurons recorded from the same monkeys to assess how neural computation might constrain perception. The perceptual data were consistent with a model in which the responses of two independent neuronal populations representing stereoscopic cues and perspective cues (with perspective signals from the two eyes combined using nonlinear canonical computations) were optimally integrated through linear summation. Perception of combined-cue stimuli was optimal given this architecture. However, an alternative architecture in which stereoscopic cues, left eye perspective cues, and right eye perspective cues were represented by three independent populations yielded two times greater precision than the monkeys. This result suggests that, due to canonical computations, cue integration for 3D perception is optimized but not maximized.
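The optimal-integration benchmark used here is the standard reliability-weighted linear combination: each cue is weighted in proportion to its reliability 1/σ², and the combined reliability is the sum of the single-cue reliabilities. A minimal sketch with illustrative, pose-dependent sigmas:

```python
import numpy as np

# Minimal sketch of reliability-weighted (statistically optimal) linear
# cue integration. Sigmas are illustrative, pose-dependent values.

def integrate_cues(estimates, sigmas):
    r = 1.0 / np.asarray(sigmas, float) ** 2       # reliabilities
    w = r / r.sum()                                # normalized weights
    combined = float(w @ np.asarray(estimates, float))
    sigma_combined = float(1.0 / np.sqrt(r.sum()))
    return combined, sigma_combined

# Stereo cue precise at this pose, perspective cue less so:
est, sd = integrate_cues(estimates=[30.0, 36.0], sigmas=[2.0, 6.0])
print(est, sd)    # weighted toward the stereo estimate; sd < min(sigmas)
```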
14
Abstract
Virtual reality (VR) is becoming an increasingly important way to investigate sensory processing. The converse is also true: in order to build good VR technologies, one needs an intimate understanding of how our brain processes sensory information. One of the key advantages of studying perception with VR is that it allows an experimenter to probe perceptual processing in a more naturalistic way than has been possible previously. In VR, one is able to actively explore and interact with the environment, just as one would do in real life. In this article, we review the history of VR displays, including the philosophical origins of VR, before discussing some key challenges involved in generating good VR and how a sense of presence in a virtual environment can be measured. We discuss the importance of multisensory VR and evaluate the experimental tension that exists between artifice and realism when investigating sensory processing.
Affiliation(s)
- Peter Scarfe
- School of Psychology and Clinical Language Sciences, University of Reading, Reading RG6 7BE, United Kingdom
- Andrew Glennerster
- School of Psychology and Clinical Language Sciences, University of Reading, Reading RG6 7BE, United Kingdom
15
Singh V, Cottaris NP, Heasly BS, Brainard DH, Burge J. Computational luminance constancy from naturalistic images. J Vis 2018; 18:19. PMID: 30593061; PMCID: PMC6314111; DOI: 10.1167/18.13.19.
Abstract
The human visual system supports stable percepts of object color even though the light that reflects from object surfaces varies significantly with the scene illumination. To understand the computations that support stable color perception, we study how estimating a target object's luminous reflectance factor (LRF; a measure of the light reflected from the object under a standard illuminant) depends on variation in key properties of naturalistic scenes. Specifically, we study how variation in target object reflectance, illumination spectra, and the reflectance of background objects in a scene impact estimation of a target object's LRF. To do this, we applied supervised statistical learning methods to the simulated excitations of human cone photoreceptors, obtained from labeled naturalistic images. The naturalistic images were rendered with computer graphics. The illumination spectra of the light sources and the reflectance spectra of the surfaces in the scene were generated using statistical models of natural spectral variation. Optimally decoding target object LRF from the responses of a small learned set of task-specific linear receptive fields that operate on a contrast representation of the cone excitations yields estimates that are within 13% of the correct LRF. Our work provides a framework for evaluating how different sources of scene variability limit performance on luminance constancy.
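A minimal stand-in for the decoding approach described above: learn linear receptive fields that map a contrast representation of simulated cone excitations to the target object's LRF. Plain ridge regression on synthetic data replaces the paper's rendered naturalistic scenes and its specific learning method; every constant and the generative model below are illustrative assumptions.

```python
import numpy as np

# Toy stand-in: decode a target's luminous reflectance factor (LRF) from a
# contrast representation of simulated cone excitations via ridge regression.
# The synthetic generative model (multiplicative illumination, random
# surround reflectances) is an assumption for illustration.

rng = np.random.default_rng(0)
n_scenes, n_cones, n_target = 2000, 300, 50
lrf = rng.uniform(0.05, 0.95, n_scenes)                    # target labels
illum = rng.uniform(0.5, 2.0, (n_scenes, 1))               # unknown light level
surround = rng.uniform(0.05, 0.95, (n_scenes, n_cones - n_target))
reflect = np.hstack([np.tile(lrf[:, None], (1, n_target)), surround])
cone = illum * reflect                                     # simulated excitations

contrast = cone / cone.mean(axis=1, keepdims=True) - 1.0   # discounts illumination

lam = 1e-3                                                 # ridge penalty
A = contrast.T @ contrast + lam * np.eye(n_cones)
w = np.linalg.solve(A, contrast.T @ (lrf - lrf.mean()))    # learned linear RF
est = contrast @ w + lrf.mean()
print("decoding correlation:", round(float(np.corrcoef(est, lrf)[0, 1]), 3))
```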
Affiliation(s)
- Vijay Singh
- Computational Neuroscience Initiative, Department of Physics, University of Pennsylvania, Philadelphia, PA, USA
- Nicolas P Cottaris
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Benjamin S Heasly
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- David H Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Johannes Burge
- Neuroscience Graduate Group, Bioengineering Graduate Group, Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
16
Iyer AV, Burge J. Depth variation and stereo processing tasks in natural scenes. J Vis 2018; 18:4. PMID: 30029214; PMCID: PMC6005632; DOI: 10.1167/18.6.4.
Abstract
Local depth variation is a distinctive property of natural scenes, but its effects on perception have only recently begun to be investigated. Depth variation in natural scenes is due to depth edges between objects and surface nonuniformities within objects. Here, we demonstrate how natural depth variation impacts performance in two fundamental tasks related to stereopsis: half-occlusion detection and disparity detection. We report the results of a computational study that uses a large database of natural stereo-images and coregistered laser-based distance measurements. First, we develop a procedure for precisely sampling stereo-image patches from the stereo-images and then quantify the local depth variation in each patch by its disparity contrast. Next, we show that increased disparity contrast degrades half-occlusion detection and disparity detection performance and changes the size and shape of the spatial integration areas ("receptive fields") that optimize performance. Then, we show that a simple image-computable binocular statistic predicts disparity contrast in natural scenes. Finally, we report the most likely spatial patterns of disparity variation and disparity discontinuities (half-occlusions) in natural scenes. Our findings motivate computational and psychophysical investigations of the mechanisms that underlie stereo processing tasks in local regions of natural scenes.
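A patch-level statistic in the spirit of the paper's disparity contrast can be made concrete as the (optionally windowed) root-mean-square deviation of groundtruth disparity from its mean within the patch. The flat window below is an assumption for brevity; the paper specifies its own windowing.

```python
import numpy as np

# Minimal sketch: quantify local depth variation in a patch as the weighted
# RMS deviation of groundtruth disparity from its windowed mean.

def disparity_contrast(disparity_patch, window=None):
    d = np.asarray(disparity_patch, float).ravel()
    w = np.ones_like(d) if window is None else np.asarray(window, float).ravel()
    w = w / w.sum()
    mean = w @ d
    return np.sqrt(w @ (d - mean) ** 2)       # e.g., in arcmin

flat = np.full((8, 8), 5.0)                   # frontoparallel patch
edge = np.hstack([np.full((8, 4), 2.0), np.full((8, 4), 9.0)])  # depth edge
print(disparity_contrast(flat), disparity_contrast(edge))       # 0.0 vs 3.5
```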
Affiliation(s)
- Arvind V Iyer
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
17
McCann BC, Hayhoe MM, Geisler WS. Contributions of monocular and binocular cues to distance discrimination in natural scenes. J Vis 2018; 18:12. PMID: 29710302; PMCID: PMC5901372; DOI: 10.1167/18.4.12.
Abstract
Little is known about distance discrimination in real scenes, especially at long distances. This is not surprising given the logistical difficulties of making such measurements. To circumvent these difficulties, we collected 81 stereo images of outdoor scenes, together with precisely registered range images that provided the ground-truth distance at each pixel location. We then presented the stereo images in the correct viewing geometry and measured the ability of human subjects to discriminate the distance between locations in the scene, as a function of absolute distance (3 m to 30 m) and the angular spacing between the locations being compared (2°, 5°, and 10°). Measurements were made for binocular and monocular viewing. Thresholds for binocular viewing were quite small at all distances (Weber fractions less than 1% at 2° spacing and less than 4% at 10° spacing). Thresholds for monocular viewing were higher than those for binocular viewing out to distances of 15-20 m, beyond which they were the same. Using standard cue-combination analysis, we also estimated what the thresholds would be based on binocular-stereo cues alone. With two exceptions, we show that the entire pattern of results is consistent with what one would expect from classical studies of binocular disparity thresholds and separation/size discrimination thresholds measured with simple laboratory stimuli. The first exception is some deviation from the expected pattern at close distances (especially for monocular viewing). The second exception is that thresholds in natural scenes are lower, presumably because of the rich figural cues contained in natural images.
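The cue-combination analysis mentioned above follows the standard independence assumption: thresholds combine as 1/T²_bino = 1/T²_mono + 1/T²_stereo, so a stereo-alone threshold can be inferred from the measured binocular and monocular thresholds. A minimal sketch with made-up values:

```python
import numpy as np

# Minimal sketch of standard cue-combination threshold analysis under the
# assumption of independent monocular (figural) and stereo cues:
#     1/T_bino^2 = 1/T_mono^2 + 1/T_stereo^2
# Values are illustrative, not the paper's data.

def stereo_alone_threshold(t_bino, t_mono):
    inv = 1.0 / t_bino ** 2 - 1.0 / t_mono ** 2
    return 1.0 / np.sqrt(inv) if inv > 0 else np.inf   # inf: no stereo benefit

print(stereo_alone_threshold(t_bino=0.8, t_mono=2.0))  # ~0.87 (same units)
```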
Affiliation(s)
- Brian C McCann
- Texas Advanced Computing Center, Center for Perceptual Systems and Department of Psychology, University of Texas at Austin, Austin, TX, USA
- Mary M Hayhoe
- Center for Perceptual Systems and Department of Psychology, University of Texas at Austin, Austin, TX, USA
- Wilson S Geisler
- Center for Perceptual Systems and Department of Psychology, University of Texas at Austin, Austin, TX, USA
18
Duan Y, Yakovleva A, Norcia AM. Determinants of neural responses to disparity in natural scenes. J Vis 2018; 18:21. PMID: 29677337; PMCID: PMC6097643; DOI: 10.1167/18.3.21.
Abstract
We studied disparity-evoked responses in natural scenes using high-density electroencephalography (EEG) in an event-related design. Thirty natural scenes that mainly included outdoor settings with trees and buildings were used. Twenty-four subjects viewed a series of trials composed of sequential two-alternative temporal forced-choice presentation of two different versions (two-dimensional [2D] vs. three-dimensional [3D]) of the same scene interleaved by a scrambled image with the same power spectrum. Scenes were viewed orthostereoscopically at 3 m through a pair of shutter glasses. After each trial, participants indicated with a key press which version of the scene was 3D. Performance on the discrimination was >90%. Participants who were more accurate also tended to respond faster; scenes that were reported more accurately as 3D also led to faster reaction times. We compared visual evoked potentials elicited by scrambled, 2D, and 3D scenes using reliable component analysis to reduce dimensionality. The disparity-evoked response to natural scene stimuli, measured from the difference potential between 2D and 3D scenes, comprised a sustained relative negativity in the dominant response component. The magnitude of the disparity-specific response was correlated with the observer's stereoacuity. Scenes with more homogeneous depth maps also tended to elicit large disparity-specific responses. Finally, the magnitude of the disparity-specific response was correlated with the magnitude of the differential response between scrambled and 2D scenes, suggesting that monocular higher-order scene statistics modulate disparity-specific responses.
Affiliation(s)
- Yiran Duan
- Department of Psychology, Stanford University, Stanford, CA, USA
- Anthony M Norcia
- Department of Psychology, Stanford University, Stanford, CA, USA
19
Kim S, Burge J. The lawful imprecision of human surface tilt estimation in natural scenes. eLife 2018; 7:e31448. PMID: 29384477; PMCID: PMC5844693; DOI: 10.7554/elife.31448.
Abstract
Estimating local surface orientation (slant and tilt) is fundamental to recovering the three-dimensional structure of the environment. It is unknown how well humans perform this task in natural scenes. Here, with a database of natural stereo-images having groundtruth surface orientation at each pixel, we find dramatic differences in human tilt estimation with natural and artificial stimuli. Estimates are precise and unbiased with artificial stimuli and imprecise and strongly biased with natural stimuli. An image-computable Bayes optimal model grounded in natural scene statistics predicts human bias, precision, and trial-by-trial errors without fitting parameters to the human data. The similarities between human and model performance suggest that the complex human performance patterns with natural stimuli are lawful, and that human visual systems have internalized local image and scene statistics to optimally infer the three-dimensional structure of the environment. These results generalize our understanding of vision from the lab to the real world.
Affiliation(s)
- Seha Kim
- Department of Psychology, University of Pennsylvania, Philadelphia, United States
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, United States
20
Jaini P, Burge J. Linking normative models of natural tasks to descriptive models of neural response. J Vis 2017; 17:16. PMID: 29071353; PMCID: PMC6097587; DOI: 10.1167/17.12.16.
Abstract
Understanding how nervous systems exploit task-relevant properties of sensory stimuli to perform natural tasks is fundamental to the study of perceptual systems. However, there are few formal methods for determining which stimulus properties are most useful for a given natural task. As a consequence, it is difficult to develop principled models for how to compute task-relevant latent variables from natural signals, and it is difficult to evaluate descriptive models fit to neural responses. Accuracy maximization analysis (AMA) is a recently developed Bayesian method for finding the optimal task-specific filters (receptive fields). Here, we introduce AMA-Gauss, a new, faster form of AMA that incorporates the assumption that the class-conditional filter responses are Gaussian distributed. Then, we use AMA-Gauss to show that its assumptions are justified for two fundamental visual tasks: retinal speed estimation and binocular disparity estimation. Next, we show that AMA-Gauss has striking formal similarities to popular quadratic models of neural response: the energy model and the generalized quadratic model (GQM). Together, these developments deepen our understanding of why the energy model of neural response has proven useful, improve our ability to evaluate results from subunit model fits to neural data, and should help accelerate psychophysics and neuroscience research with natural stimuli.
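The formal link to quadratic models can be made concrete: if filter responses conditioned on each latent level are Gaussian, the log-likelihood of a response vector is a quadratic form in the responses, so the optimal decoder performs energy-model-like computations. A minimal sketch with illustrative response statistics (here the task information is carried by the covariance, as in the energy model):

```python
import numpy as np

# Minimal sketch: with Gaussian class-conditional filter responses
# r | X = x_k ~ N(mu_k, C_k), the log-likelihood is quadratic in r, so the
# MAP decoder is a quadratic computation. Statistics below are illustrative.

def gaussian_loglik(r, mu, C):
    d = r - mu
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (d @ np.linalg.solve(C, d) + logdet + len(r) * np.log(2 * np.pi))

def map_estimate(r, mus, Cs, levels, log_prior=None):
    ll = np.array([gaussian_loglik(r, m, C) for m, C in zip(mus, Cs)])
    if log_prior is not None:
        ll = ll + log_prior
    return levels[int(np.argmax(ll))]

# Toy usage: two latent levels with level-dependent response statistics
levels = np.array([-0.5, 0.5])
mus = [np.zeros(2), np.zeros(2)]                  # means can be zero...
Cs = [np.diag([1.0, 0.2]), np.diag([0.2, 1.0])]   # ...with info in covariance
r_obs = np.array([1.4, 0.1])                      # strong filter-1 response
print(map_estimate(r_obs, mus, Cs, levels))       # -> -0.5
```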
Affiliation(s)
- Priyank Jaini
- Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
21
Accuracy Maximization Analysis for Sensory-Perceptual Tasks: Computational Improvements, Filter Robustness, and Coding Advantages for Scaled Additive Noise. PLoS Comput Biol 2017; 13:e1005281. PMID: 28178266; PMCID: PMC5298250; DOI: 10.1371/journal.pcbi.1005281.
Abstract
Accuracy Maximization Analysis (AMA) is a recently developed Bayesian ideal observer method for task-specific dimensionality reduction. Given a training set of proximal stimuli (e.g., retinal images), a response noise model, and a cost function, AMA returns the filters (i.e., receptive fields) that extract the most useful stimulus features for estimating a user-specified latent variable from those stimuli. Here, we first contribute two technical advances that significantly reduce AMA's compute time: we derive gradients of cost functions for which two popular estimators are appropriate, and we implement a stochastic gradient descent (AMA-SGD) routine for filter learning. Next, we show how the method can be used to simultaneously probe the impact on neural encoding of natural stimulus variability, the prior over the latent variable, noise power, and the choice of cost function. Then, we examine the geometry of AMA's unique combination of properties that distinguish it from better-known statistical methods. Using binocular disparity estimation as a concrete test case, we develop insights that have general implications for understanding neural encoding and decoding in a broad class of fundamental sensory-perceptual tasks connected to the energy model. Specifically, we find that non-orthogonal (partially redundant) filters with scaled additive noise tend to outperform orthogonal filters with constant additive noise; non-orthogonal filters and scaled additive noise can interact to sculpt noise-induced stimulus encoding uncertainty to match task-irrelevant stimulus variability. Thus, we show that some properties of neural response thought to be biophysical nuisances can confer coding advantages to neural systems. Finally, we speculate that, if repurposed for the problem of neural systems identification, AMA may be able to overcome a fundamental limitation of standard subunit model estimation. As natural stimuli become more widely used in the study of psychophysical and neurophysiological performance, we expect that task-specific methods for feature learning like AMA will become increasingly important.

In psychophysics and neurophysiology, the stimulus features that are manipulated in experiments are often selected based on intuition, trial-and-error, and historical precedence. Accuracy Maximization Analysis (AMA) is a Bayesian ideal observer method for determining the task-relevant features (i.e., filters) from natural stimuli that nervous systems should select for. In other words, AMA is a method for finding optimal receptive fields for specific tasks. Early results suggest that this method has the potential to be of fundamental importance to neuroscience and perception science. First, we develop AMA-SGD, a new version of AMA that significantly reduces filter-learning time, and use it to learn optimal filters for the classic task of binocular disparity estimation. Then, we find that measurable, task-relevant properties of natural stimuli are the most important determinants of the optimal filters; changes to the prior, cost function, and internal noise have little effect on the filters. Last, we demonstrate that some ubiquitous properties of neural systems, generally thought to be biophysical nuisances, can actually improve the fidelity of neural codes. In particular, we show for the first time that scaled additive noise and redundant (non-orthogonal) filters can interact to sculpt uncertainty due to internal noise to match task-irrelevant natural stimulus variability.
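A toy stand-in for the AMA-SGD loop (not the published algorithm): linear filters and a Gaussian-prototype read-out, substituting for AMA's non-parametric posterior, are jointly trained by stochastic gradient descent to minimize a posterior-weighted squared estimation error over discrete latent levels. All shapes, constants, and the synthetic stimulus family are assumptions.

```python
import torch

# Toy stand-in for AMA-SGD: learn q linear filters whose noisy responses
# support estimation of a discrete latent variable. A learned Gaussian-
# prototype (softmax) decoder replaces AMA's non-parametric posterior.

torch.manual_seed(0)
n_levels, n_per, dim, q = 10, 200, 64, 2
levels = torch.linspace(-1.0, 1.0, n_levels)

# Synthetic stimuli: each latent level shifts a Gabor-like template.
t = torch.linspace(-1, 1, dim)
stim = torch.stack([
    torch.sin(2 * torch.pi * 3 * (t - 0.1 * lv)) * torch.exp(-t ** 2 / 0.18)
    + 0.3 * torch.randn(n_per, dim)
    for lv in levels
])                                              # (n_levels, n_per, dim)
labels = torch.arange(n_levels).repeat_interleave(n_per)
X = stim.reshape(-1, dim)
X = X / X.norm(dim=1, keepdim=True)             # contrast-normalized stimuli

F = torch.randn(dim, q, requires_grad=True)             # filters
proto = torch.randn(n_levels, q, requires_grad=True)    # class prototypes
opt = torch.optim.SGD([F, proto], lr=0.5)

for step in range(2000):
    idx = torch.randint(0, X.shape[0], (128,))          # stochastic minibatch
    r = X[idx] @ F + 0.05 * torch.randn(128, q)         # noisy filter responses
    logp = -((r[:, None, :] - proto[None]) ** 2).sum(-1)  # Gaussian log-lik
    post = torch.softmax(logp, dim=1)                   # posterior over levels
    err = (levels[None] - levels[labels[idx]][:, None]) ** 2
    loss = (post * err).sum(1).mean()                   # posterior-weighted cost
    opt.zero_grad(); loss.backward(); opt.step()
```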