1
Kemp JT, Cesanek E, Domini F. Perceiving depth from texture and disparity cues: Evidence for a non-probabilistic account of cue integration. J Vis 2023; 23:13. [PMID: 37486299] [PMCID: PMC10382782] [DOI: 10.1167/jov.23.7.13] [Received: 11/16/2022] [Accepted: 06/12/2023]
Abstract
Bayesian inference theories have been extensively used to model how the brain derives three-dimensional (3D) information from ambiguous visual input. In particular, the maximum likelihood estimation (MLE) model combines estimates from multiple depth cues according to their relative reliability to produce the most probable 3D interpretation. Here, we tested an alternative theory of cue integration, termed the intrinsic constraint (IC) theory, which postulates that the visual system derives the most stable, not the most probable, interpretation of the visual input amid variations in viewing conditions. The vector sum model provides a normative approach for achieving this goal, in which individual cue estimates are components of a multidimensional vector whose norm determines the combined estimate. Individual cue estimates are not accurate, but are related to distal 3D properties through a deterministic mapping. In three experiments, we show that the IC theory accounts for 3D cue integration more adeptly than MLE models. In Experiment 1, we show systematic biases in the perception of depth from texture and depth from binocular disparity. Critically, we demonstrate that the vector sum model predicts an increase in perceived depth when these cues are combined. In Experiment 2, we illustrate the IC theory's radical reinterpretation of the just noticeable difference (JND) and test the related vector sum model prediction of the classic finding of smaller JNDs for combined-cue versus single-cue stimuli. In Experiment 3, we confirm the vector sum prediction that biases found in cue integration experiments cannot be attributed to flatness cues, as the MLE model predicts.
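The contrast the abstract draws between the two models can be made concrete with a minimal numerical sketch. The snippet below is illustrative, not the authors' implementation: the single-cue depth values and noise levels are arbitrary, and the function names are invented for clarity.

```python
import numpy as np

def mle_combined(estimates, sigmas):
    """MLE cue combination: reliability-weighted average, where
    reliability is inverse variance and the weights sum to 1."""
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    w /= w.sum()
    return float(np.dot(w, np.asarray(estimates, dtype=float)))

def vector_sum_combined(estimates):
    """IC vector sum: individual cue estimates are components of a
    multidimensional vector; its norm is the combined output."""
    return float(np.linalg.norm(np.asarray(estimates, dtype=float)))

# Two equally reliable cues signaling the same depth (arbitrary units):
texture, disparity = 3.0, 3.0
mle = mle_combined([texture, disparity], sigmas=[1.0, 1.0])  # 3.0
ic = vector_sum_combined([texture, disparity])               # 3 * sqrt(2), about 4.24
```

With two equally reliable cues signaling the same depth, the MLE average leaves the combined estimate unchanged, whereas the vector-sum norm exceeds either single-cue estimate, which is the kind of increase in perceived depth for combined cues that Experiment 1 tests.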
Affiliation(s)
- Jovan T Kemp
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA
- Evan Cesanek
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Fulvio Domini
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA
- Italian Institute of Technology, Rovereto, Italy
2
Domini F. The case against probabilistic inference: a new deterministic theory of 3D visual processing. Philos Trans R Soc Lond B Biol Sci 2023; 378:20210458. [PMID: 36511407] [PMCID: PMC9745883] [DOI: 10.1098/rstb.2021.0458] [Received: 04/14/2022] [Accepted: 10/03/2022]
Abstract
How the brain derives 3D information from inherently ambiguous visual input remains the fundamental question of human vision. The past two decades of research have addressed this question as a problem of probabilistic inference, the dominant model being maximum-likelihood estimation (MLE). This model assumes that independent depth-cue modules derive noisy but statistically accurate estimates of 3D scene parameters that are combined through a weighted average. Cue weights are adjusted based on the system's representation of each module's output variability. Here I demonstrate that the MLE model fails to account for important psychophysical findings and, importantly, misinterprets the just noticeable difference, a hallmark measure of stimulus discriminability, as an estimate of perceptual uncertainty. I propose a new theory, termed Intrinsic Constraint, which postulates that the visual system derives not the most probable interpretation of the visual input, but the most stable interpretation amid variations in viewing conditions. This goal is achieved with the Vector Sum model, which represents individual cue estimates as components of a multi-dimensional vector whose norm determines the combined output. This model accounts for the psychophysical findings cited in support of MLE, while predicting existing and new findings that contradict the MLE model. This article is part of a discussion meeting issue 'New approaches to 3D vision'.
Affiliation(s)
- Fulvio Domini
- CLPS, Brown University, 190 Thayer Street Providence, Rhode Island 02912-9067, USA
3
Campagnoli C, Hung B, Domini F. Explicit and implicit depth-cue integration: Evidence of systematic biases with real objects. Vision Res 2021; 190:107961. [PMID: 34757304] [DOI: 10.1016/j.visres.2021.107961] [Received: 03/19/2021] [Revised: 09/28/2021] [Accepted: 10/03/2021]
Abstract
In previous studies using VR, we found evidence that 3D shape estimation conforms to a superadditivity rule of depth-cue combination, by which adding depth cues leads to greater perceived depth and, in principle, to depth overestimation. Superadditivity can be quantitatively accounted for by a normative theory of cue integration, by adapting a model termed Intrinsic Constraint (IC). As for its qualitative nature, it remains unclear whether superadditivity represents the genuine readout of depth-cue integration, as predicted by IC; a byproduct of artificial virtual displays, which carry flatness cues that can bias depth estimates in a Bayesian fashion; or simply a way for observers to express that a scene "looks deeper" with more depth cues by explicitly inflating their depth judgments. In the present study, we addressed this question by testing whether the IC model's prediction of superadditivity generalizes to real-world settings. We asked participants to judge the perceived 3D shape of cardboard prisms in a matching task. To control for the potential interference of explicit reasoning, we also asked participants to reach to grasp the same objects, and we analyzed the in-flight grip size throughout the reach. We designed a novel technique to carefully control binocular and monocular 3D cues independently, allowing depth information to be added or removed seamlessly. Even with real objects, participants exhibited a clear superadditivity effect in both tasks. Furthermore, the magnitude of this effect was accurately predicted by the IC model. These results confirm that superadditivity is an inherent feature of depth estimation.
Affiliation(s)
- Carlo Campagnoli
- School of Psychology, University of Leeds, Leeds, UK; Department of Psychology, Princeton University, Princeton, NJ, USA.
- Bethany Hung
- The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Fulvio Domini
- Department of Cognitive, Linguistic and Psychological Science, Brown University, Providence, RI, USA
4
The Z-Box illusion: dominance of motion perception among multiple 3D objects. Psychol Res 2021; 86:1683-1697. [PMID: 34480245] [DOI: 10.1007/s00426-021-01589-0] [Received: 04/20/2021] [Accepted: 08/27/2021]
Abstract
In the present article, we examine a novel motion illusion, the Z-Box illusion, in which the presence of a bounding object influences the perception of motion of an ambiguous stimulus that appears within it. Specifically, the stimuli are a structure-from-motion (SFM) particle orb and a wireframe cube. The orb could be perceived as rotating clockwise or counterclockwise, while the cube could only be perceived as moving in one direction. Both stimuli were presented on a two-dimensional (2D) display with inferred three-dimensional (3D) properties. In a single experiment, we examine motion perception of a particle orb, both in isolation and when it appears within a rotating cube. Participants indicated the orb's direction of motion and whether the direction changed at any point during the trial. Accuracy was the critical measure, while motion direction, the number of particles in the orb, and the presence of the wireframe cube were all manipulated. The results suggest that participants could perceive the orb's true rotation in the absence of the cube so long as it was made up of at least ten particles. The presence of the cube dominated perception: participants consistently perceived congruent motion of the orb and cube, even when the two moved in objectively different directions. These findings are discussed in relation to prior research on motion perception, computational modelling of motion perception, structure from motion, and 3D object perception.
5
Cesanek E, Taylor JA, Domini F. Persistent grasping errors produce depth cue reweighting in perception. Vision Res 2020; 178:1-11. [PMID: 33070029] [DOI: 10.1016/j.visres.2020.09.007] [Received: 03/16/2020] [Revised: 09/08/2020] [Accepted: 09/17/2020]
Abstract
When a grasped object is larger or smaller than expected, haptic feedback automatically recalibrates motor planning. Intriguingly, haptic feedback can also affect 3D shape perception through a process called depth cue reweighting. Although signatures of cue reweighting also appear in motor behavior, it is unclear whether this motor reweighting is the result of upstream perceptual reweighting, or a separate process. We propose that perceptual reweighting is directly related to motor control; in particular, that it is caused by persistent, systematic movement errors that cannot be resolved by motor recalibration alone. In Experiment 1, we inversely varied texture and stereo cues to create a set of depth-metamer objects: when texture specified a deep object, stereo specified a shallow object, and vice versa, such that all objects appeared equally deep. The stereo-texture pairings that produced this perceptual metamerism were determined for each participant in a matching task (Pre-test). Next, participants repeatedly grasped these depth metamers, receiving haptic feedback that was positively correlated with one cue and negatively correlated with the other, resulting in persistent movement errors. Finally, participants repeated the perceptual matching task (Post-test). In the condition where haptic feedback reinforced the texture cue, perceptual changes were correlated with changes in grasping performance across individuals, demonstrating a link between perceptual reweighting and improved motor control. Experiment 2 showed that cue reweighting does not occur when movement errors are rapidly corrected by standard motor adaptation. These findings suggest a mutual dependency between perception and action, with perception directly guiding action, and actions producing error signals that drive motor and perceptual learning.
Affiliation(s)
- Evan Cesanek
- Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, RI, United States.
- Jordan A Taylor
- Department of Psychology, Princeton University, Princeton, NJ, United States
- Fulvio Domini
- Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, RI, United States
6
Nawrot M, Ratzlaff M, Leonard Z, Stroyan K. Modeling depth from motion parallax with the motion/pursuit ratio. Front Psychol 2014; 5:1103. [PMID: 25339926] [PMCID: PMC4186274] [DOI: 10.3389/fpsyg.2014.01103] [Received: 06/05/2014] [Accepted: 09/11/2014]
Abstract
The perception of unambiguous scaled depth from motion parallax relies on both retinal image motion and an extra-retinal pursuit eye movement signal. The motion/pursuit ratio represents a dynamic geometric model linking these two proximal cues to the ratio of depth to viewing distance. An important step in understanding the visual mechanisms serving the perception of depth from motion parallax is to determine the relationship between these stimulus parameters and empirically determined perceived depth magnitude. Observers compared perceived depth magnitude of dynamic motion parallax stimuli to static binocular disparity comparison stimuli at three different viewing distances, in both head-moving and head-stationary conditions. A stereo-viewing system provided ocular separation for stereo stimuli and monocular viewing of parallax stimuli. For each motion parallax stimulus, a point of subjective equality (PSE) was estimated for the amount of binocular disparity that generates the equivalent magnitude of perceived depth from motion parallax. Similar to previous results, perceived depth from motion parallax had significant foreshortening. Head-moving conditions produced even greater foreshortening due to the differences in the compensatory eye movement signal. An empirical version of the motion/pursuit law, termed the empirical motion/pursuit ratio, which models perceived depth magnitude from these stimulus parameters, is proposed.
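The motion/pursuit ratio links the two proximal cues, retinal image motion and pursuit eye velocity, to the ratio of depth to viewing distance. A minimal sketch of that first-order relationship follows; it is a simplification of the dynamic geometric model described above, and the function name is illustrative.

```python
def motion_pursuit_depth(retinal_speed, pursuit_speed, viewing_distance):
    """First-order motion/pursuit approximation: depth relative to the
    fixation point scales with the ratio of retinal image speed to
    pursuit eye speed, times the viewing distance.

    retinal_speed and pursuit_speed share units (e.g., deg/s); the
    returned depth is in the units of viewing_distance."""
    return viewing_distance * (retinal_speed / pursuit_speed)

# Retinal motion of 0.5 deg/s during a 5 deg/s pursuit at a 1 m
# viewing distance implies roughly 0.1 m of depth from fixation.
d = motion_pursuit_depth(0.5, 5.0, 1.0)  # 0.1
```

The foreshortening the study reports corresponds to perceived depth falling short of this geometric prediction, especially in the head-moving conditions.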
Affiliation(s)
- Mark Nawrot
- Department of Psychology, Center for Visual and Cognitive Neuroscience, North Dakota State University, Fargo, ND, USA
- Michael Ratzlaff
- Department of Psychology, Center for Visual and Cognitive Neuroscience, North Dakota State University, Fargo, ND, USA
- Zachary Leonard
- Department of Psychology, Center for Visual and Cognitive Neuroscience, North Dakota State University, Fargo, ND, USA
- Keith Stroyan
- Math Department, University of Iowa, Iowa City, IA, USA
| |
7
Abstract
How do retinal images lead to perceived environmental objects? Vision involves a series of spatial and material transformations: from environmental objects to retinal images, to neurophysiological patterns, and finally to perceptual experience and action. A rationale for understanding functional relations among these physically different systems occurred to Gustav Fechner: Differences in sensation correspond to differences in physical stimulation. The concept of information is similar: Relationships in one system may correspond to, and thus represent, those in another. Criteria for identifying and evaluating information include (a) resolution, or the precision of correspondence; (b) uncertainty about which input (output) produced a given output (input); and (c) invariance, or the preservation of correspondence under transformations of input and output. We apply this framework to psychophysical evidence to identify visual information for perceiving surfaces. The elementary spatial structure shared by objects and images is the second-order differential structure of local surface shape. Experiments have shown that human vision is directly sensitive to this higher-order spatial information from interimage disparities (stereopsis and motion parallax), boundary contours, texture, shading, and combined variables. Psychophysical evidence contradicts other common ideas about retinal information for spatial vision and object perception.
8
Dokka K, MacNeilage PR, DeAngelis GC, Angelaki DE. Estimating distance during self-motion: a role for visual-vestibular interactions. J Vis 2011; 11:11.13.2. [PMID: 22045777] [DOI: 10.1167/11.13.2]
Abstract
A fundamental challenge for the visual system is to extract the 3D spatial structure of the environment. When an observer translates without moving the eyes, the retinal speed of a stationary object is related to its distance by a scale factor that depends on the velocity of the observer's self-motion. Here, we aim to test whether the brain uses vestibular cues to self-motion to estimate distance to stationary surfaces in the environment. This relationship was systematically probed using a two-alternative forced-choice task in which distance perceived from monocular image motion during passive body translation was compared to distance perceived from binocular disparity while subjects were stationary. We show that perceived distance from motion depended on both observer velocity and retinal speed. For a given head speed, slower retinal speeds led to the perception of farther distances. Likewise, for a given retinal speed, slower head speeds led to the perception of nearer distances. However, these relationships were weak in some subjects and absent in others, and distance estimated from self-motion and retinal image motion was substantially compressed relative to distance estimated from binocular disparity. Overall, our findings suggest that the combination of retinal image motion and vestibular signals related to head velocity can provide a rudimentary capacity for distance estimation.
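The scale factor described above can be sketched for the simplest case: pure lateral translation with a stationary eye, where a point's retinal angular speed is approximately the observer's speed divided by the point's distance. This is a simplified small-angle approximation, not the authors' analysis, and the function name is illustrative.

```python
import math

def distance_from_self_motion(observer_speed, retinal_speed_deg):
    """Invert the small-angle scale factor for lateral self-motion with
    no eye rotation: retinal angular speed (rad/s) ~ v / d, hence
    d ~ v / retinal angular speed. observer_speed in m/s; the retinal
    speed is given in deg/s and converted to rad/s internally."""
    retinal_speed_rad = math.radians(retinal_speed_deg)
    return observer_speed / retinal_speed_rad

# Translating at 1 m/s while a stationary point drifts across the
# retina at about 5.73 deg/s (0.1 rad/s) implies a distance near 10 m.
d = distance_from_self_motion(1.0, math.degrees(0.1))  # 10.0
```

Note how the sketch mirrors the psychophysics: for a fixed observer speed, slower retinal speeds map to farther distances, and for a fixed retinal speed, slower observer speeds map to nearer distances.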
Affiliation(s)
- Kalpana Dokka
- Department of Anatomy and Neurobiology, Washington University in St. Louis, USA
9
Domini F, Shah R, Caudek C. Do we perceive a flattened world on the monitor screen? Acta Psychol (Amst) 2011; 138:359-66. [PMID: 21986481] [DOI: 10.1016/j.actpsy.2011.07.007] [Received: 03/04/2011] [Revised: 07/27/2011] [Accepted: 07/29/2011]
Abstract
The current model of three-dimensional perception hypothesizes that the brain integrates depth cues in a statistically optimal fashion through a weighted linear combination, with weights proportional to the reliabilities obtained for each cue in isolation (Landy, Maloney, Johnston, & Young, 1995). Even though many investigations support this theoretical framework, some recent empirical findings are at odds with this view (e.g., Domini, Caudek, & Tassinari, 2006). Failures of linear cue integration have been attributed to cue conflict and to unmodelled cues to flatness present in computer-generated displays. We describe two cue-combination experiments designed to test the integration of stereo and motion cues in the presence of consistent or conflicting blur and accommodation information (i.e., when flatness cues are either absent, with physical stimuli, or present, with computer-generated displays). In both conditions, we replicated the results of Domini et al. (2006): the amount of perceived depth increased as more cues were available, also producing an overestimation of depth in some conditions. These results can be explained by the Intrinsic Constraint model, but not by linear cue combination.
Affiliation(s)
- Fulvio Domini
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, RI 02912, USA.
10
Integration of disparity and velocity information for haptic and perceptual judgments of object depth. Acta Psychol (Amst) 2011; 136:300-10. [PMID: 21237442] [DOI: 10.1016/j.actpsy.2010.12.003] [Received: 03/16/2010] [Revised: 12/09/2010] [Accepted: 12/10/2010]
Abstract
Do reach-to-grasp (prehension) movements require a metric representation of three-dimensional (3D) layouts and objects? We propose a model relying only on direct sensory information to account for the planning and execution of prehension movements in the absence of haptic feedback and when the hand is not visible. In the present investigation, we isolate relative motion and binocular disparity information from other depth cues and we study their efficacy for reach-to-grasp movements and visual judgments. We show that (i) the amplitude of the grasp increases when relative motion is added to binocular disparity information, even if depth from disparity information is already veridical, and (ii) similar distortions of derived depth are found for haptic tasks and perceptual judgments. With a quantitative test, we demonstrate that our results are consistent with the Intrinsic Constraint model and do not require 3D metric inferences (Domini, Caudek, & Tassinari, 2006). By contrast, the linear cue integration model (Landy, Maloney, Johnston, & Young, 1995) cannot explain the present results, even if the flatness cues are taken into account.
11
Di Luca M, Domini F, Caudek C. Inconsistency of perceived 3D shape. Vision Res 2010; 50:1519-31. [PMID: 20470815] [DOI: 10.1016/j.visres.2010.05.006] [Received: 02/05/2010] [Revised: 05/05/2010] [Accepted: 05/05/2010]
Abstract
Internal consistency of local depth, slant, and curvature judgments was studied by asking participants to match two 3D surfaces rendered by different mixtures of 3D cues (velocity, texture, and shading). We found that perceptual judgments were not consistent with each other, showing cue-specific distortions. Adding multiple cues did not eliminate the inconsistencies of the judgments. These results can be predicted by the Intrinsic Constraint (IC) model, according to which perceived metric local estimates are a monotonically increasing function of the signal-to-noise ratio of the optimal combination of direct 3D shape information (Domini, Caudek, & Tassinari, 2006).
Affiliation(s)
- M Di Luca
- Max Planck Institute for Biological Cybernetics, Tuebingen, Germany