1
Kim C, Chong SC. Metacognition of perceptual resolution across and around the visual field. Cognition 2024; 253:105938. PMID: 39232476. DOI: 10.1016/j.cognition.2024.105938.
Abstract
Do people have accurate metacognition of non-uniformities in perceptual resolution across (i.e., eccentricity) and around (i.e., polar angle) the visual field? Despite its theoretical and practical importance, this question has not yet been empirically tested. This study investigated metacognition of perceptual resolution by guessing patterns during a degradation (i.e., loss of high spatial frequencies) localization task. Participants localized the degraded face among the nine faces that simultaneously appeared throughout the visual field: fovea (fixation at the center of the screen), parafovea (left, right, above, and below fixation at 4° eccentricity), and periphery (left, right, above, and below fixation at 10° eccentricity). We presumed that if participants had accurate metacognition, in the absence of a degraded face, they would exhibit compensatory guessing patterns based on counterfactual reasoning ("The degraded face must have been presented at locations with lower perceptual resolution, because if it were presented at locations with higher perceptual resolution, I would have easily detected it."), meaning that we would expect more guess responses for locations with lower perceptual resolution. In two experiments, we observed guessing patterns that suggest that people can monitor non-uniformities in perceptual resolution across, but not around, the visual field during tasks, indicating partial in-the-moment metacognition. Additionally, we found that global explicit knowledge of perceptual resolution is not sufficient to guide in-the-moment metacognition during tasks, which suggests a dissociation between local and global metacognition.
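The compensatory-guessing prediction can be made concrete with a small simulation (a minimal sketch of the logic, not the authors' analysis code; the detection rates below are illustrative assumptions):

```python
# A minimal sketch of compensatory guessing: an observer with accurate
# metacognition who fails to detect the degraded face should guess locations
# in proportion to how likely a degradation there would have been missed.
import numpy as np

locations = ["fovea", "parafovea_L", "parafovea_R", "parafovea_U", "parafovea_D",
             "periphery_L", "periphery_R", "periphery_U", "periphery_D"]

# Assumed probability of detecting the degradation at each location,
# decreasing with eccentricity (fovea > parafovea > periphery).
p_detect = np.array([0.95, 0.80, 0.80, 0.75, 0.75, 0.50, 0.50, 0.45, 0.45])

# Counterfactual reasoning: "had the target been at a high-resolution location,
# I would have seen it" -> guess probability tracks the miss rate, normalized.
p_guess = (1.0 - p_detect) / np.sum(1.0 - p_detect)

for loc, p in zip(locations, p_guess):
    print(f"{loc:>12}: guess probability {p:.3f}")
```

On target-absent trials, an observer reasoning this way concentrates guesses at peripheral, lower-resolution locations; a flat guess distribution would instead indicate absent in-the-moment metacognition.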
Affiliation(s)
- Cheongil Kim
- Graduate Program in Cognitive Science, Yonsei University, South Korea
- Sang Chul Chong
- Graduate Program in Cognitive Science, Yonsei University, South Korea
- Department of Psychology, Yonsei University, South Korea
2
Kim T, Pasupathy A. Neural Correlates of Crowding in Macaque Area V4. J Neurosci 2024; 44:e2260232024. PMID: 38670806. PMCID: PMC11170949. DOI: 10.1523/jneurosci.2260-23.2024.
Abstract
Visual crowding refers to the phenomenon where a target object that is easily identifiable in isolation becomes difficult to recognize when surrounded by other stimuli (distractors). Many psychophysical studies have investigated this phenomenon and proposed alternative models for the underlying mechanisms. One prominent hypothesis, albeit with mixed psychophysical support, posits that crowding arises from the loss of information due to pooled encoding of features from target and distractor stimuli in the early stages of cortical visual processing. However, neurophysiological studies have not rigorously tested this hypothesis. We studied the responses of single neurons in macaque (one male, one female) area V4, an intermediate stage of the object-processing pathway, to parametrically designed crowded displays and texture statistics-matched metameric counterparts. Our investigations reveal striking parallels between how crowding parameters (number, distance, and position of distractors) influence human psychophysical performance and V4 shape selectivity. Importantly, we also found that enhancing the salience of a target stimulus could alleviate crowding effects in highly cluttered scenes, and that this alleviation could be temporally protracted, reflecting a dynamical process. Thus, a pooled encoding of nearby stimuli cannot explain the observed responses, and we propose an alternative model in which V4 neurons preferentially encode salient stimuli in crowded displays. Overall, we conclude that the magnitude of crowding effects is determined not just by the number of distractors and target-distractor separation but also by the relative salience of targets versus distractors based on their feature attributes: the similarity of distractors and the contrast between target and distractor stimuli.
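The distinction between pooled and salience-weighted encoding can be sketched in a few lines (our illustration, not the recorded data or the authors' fitted model; the firing rates, salience values, and softmax weighting are assumptions):

```python
# Schematic contrast between a simple pooling account and a salience-weighted
# readout. r values are hypothetical isolated-stimulus firing rates (sp/s).
import numpy as np

def v4_response(rates, salience, beta=4.0):
    """Salience-weighted average of isolated responses; equal salience
    reduces to pure pooling (a plain average)."""
    w = np.exp(beta * np.asarray(salience))
    w /= w.sum()
    return float(np.dot(w, rates))

rates = np.array([40.0, 10.0, 10.0, 10.0])      # target + three distractors
uniform_salience = [0.5, 0.5, 0.5, 0.5]         # target blends in
popout_salience  = [1.0, 0.2, 0.2, 0.2]         # target differs in contrast

print(v4_response(rates, uniform_salience))      # 17.5: shape signal diluted
print(v4_response(rates, popout_salience))       # ~36.7: selectivity preserved
```

With uniform salience the weighting collapses to simple averaging and the target's shape signal is diluted; a salient target dominates the weighted sum, mirroring the reported alleviation of crowding.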
Affiliation(s)
- Taekjun Kim
- Department of Biological Structure, University of Washington, Seattle, Washington 98195
- Washington National Primate Research Center, University of Washington, Seattle, Washington 98195
- Anitha Pasupathy
- Department of Biological Structure, University of Washington, Seattle, Washington 98195
- Washington National Primate Research Center, University of Washington, Seattle, Washington 98195
3
Kim T, Pasupathy A. Neural correlates of crowding in macaque area V4. bioRxiv 2023; 2023.10.16.562617. PMID: 37905025. PMCID: PMC10614871. DOI: 10.1101/2023.10.16.562617.
Abstract
Visual crowding refers to the phenomenon where a target object that is easily identifiable in isolation becomes difficult to recognize when surrounded by other stimuli (distractors). Extensive psychophysical studies support two alternative possibilities for the underlying mechanisms. One hypothesis suggests that crowding results from the loss of visual information due to pooled encoding of multiple nearby stimuli in the mid-level processing stages along the ventral visual pathway. Alternatively, crowding may arise from limited resolution in decoding object information during recognition, and the encoded information may remain inaccessible unless it is salient. To rigorously test these alternatives, we studied the responses of single neurons in macaque area V4, an intermediate stage of the ventral, object-processing pathway, to parametrically designed crowded displays and their texture-statistics-matched metameric counterparts. Our investigations reveal striking parallels between how crowding parameters (e.g., number, distance, and position of distractors) influence human psychophysical performance and V4 shape selectivity. Importantly, we found that enhancing the salience of a target stimulus could reverse crowding effects even in highly cluttered scenes, and such reversals could be protracted, reflecting a dynamical process. Overall, we conclude that a pooled encoding of nearby stimuli cannot explain the observed responses, and we propose an alternative model in which V4 neurons preferentially encode salient stimuli in crowded displays.
Affiliation(s)
- Taekjun Kim
- Department of Biological Structure, University of Washington, Seattle, WA 98195
- Washington National Primate Research Center, University of Washington, Seattle, WA 98195
- Anitha Pasupathy
- Department of Biological Structure, University of Washington, Seattle, WA 98195
- Washington National Primate Research Center, University of Washington, Seattle, WA 98195
4
Vacher J, Launay C, Mamassian P, Coen-Cagli R. Measuring uncertainty in human visual segmentation. arXiv 2023; arXiv:2301.07807v3. PMID: 36824425. PMCID: PMC9949179.
Abstract
Segmenting visual stimuli into distinct groups of features and visual objects is central to visual function. Classical psychophysical methods have helped uncover many rules of human perceptual segmentation, and recent progress in machine learning has produced successful algorithms. Yet, the computational logic of human segmentation remains unclear, partially because we lack well-controlled paradigms to measure perceptual segmentation maps and compare models quantitatively. Here we propose a new, integrated approach: given an image, we measure multiple pixel-based same-different judgments and perform model-based reconstruction of the underlying segmentation map. The reconstruction is robust to several experimental manipulations and captures the variability of individual participants. We demonstrate the validity of the approach on human segmentation of natural images and composite textures. We show that image uncertainty affects measured human variability, and it influences how participants weigh different visual features. Because any putative segmentation algorithm can be inserted to perform the reconstruction, our paradigm affords quantitative tests of theories of perception as well as new benchmarks for segmentation algorithms.
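The reconstruction idea can be illustrated with a toy pipeline (a simplified stand-in for the paper's model-based estimator; the segment layout, repeat counts, and clustering step are our assumptions):

```python
# Toy sketch: pool repeated pixel-pair same-different judgments into an
# affinity matrix, then recover a segmentation map by clustering.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n_px, n_segments = 30, 2
true_labels = (np.arange(n_px) >= n_px // 2).astype(int)   # ground-truth map

# Simulate noisy same-different responses: P("same") is high within a segment.
p_same = np.where(true_labels[:, None] == true_labels[None, :], 0.9, 0.15)
judgments = rng.binomial(n=20, p=p_same) / 20.0             # 20 repeats per pair
affinity = (judgments + judgments.T) / 2.0                  # enforce symmetry

labels = SpectralClustering(n_clusters=n_segments,
                            affinity="precomputed",
                            random_state=0).fit_predict(affinity)
agreement = max(np.mean(labels == true_labels), np.mean(labels != true_labels))
print(f"recovered map agrees with ground truth on {agreement:.0%} of pixels")
```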
Affiliation(s)
- Jonathan Vacher
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, Paris, France
- Claire Launay
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
- Pascal Mamassian
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, Paris, France
- Ruben Coen-Cagli
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
- Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Department of Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY, USA
5
Leadholm N, Stringer S. Hierarchical binding in convolutional neural networks: Making adversarial attacks geometrically challenging. Neural Netw 2022; 155:258-286. DOI: 10.1016/j.neunet.2022.07.003.
6
Abstract
Humans are exquisitely sensitive to the spatial arrangement of visual features in objects and scenes, but not in visual textures. Category-selective regions in the visual cortex are widely believed to underlie object perception, suggesting such regions should distinguish natural images of objects from synthesized images containing similar visual features in scrambled arrangements. Contrary to this view, we demonstrate that representations in category-selective cortex do not discriminate natural images from feature-matched scrambles but can discriminate images of different categories, suggesting a texture-like encoding. We find similar insensitivity to feature arrangement in ImageNet-trained deep convolutional neural networks. This suggests the need to reconceptualize the role of category-selective cortex as representing a basis set of complex texture-like features, useful for a myriad of behaviors.

The human visual ability to recognize objects and scenes is widely thought to rely on representations in category-selective regions of the visual cortex. These representations could support object vision by specifically representing objects or, more simply, by representing complex visual features regardless of the particular spatial arrangement needed to constitute real-world objects, that is, by representing visual textures. To discriminate between these hypotheses, we leveraged an image synthesis approach that, unlike previous methods, provides independent control over the complexity and spatial arrangement of visual features. We found that human observers could easily detect a natural object among synthetic images with similar complex features that were spatially scrambled. However, observer models built from BOLD responses from category-selective regions, as well as a model of macaque inferotemporal cortex and ImageNet-trained deep convolutional neural networks, were all unable to identify the real object. This inability was not due to a lack of signal to noise, as all observer models could predict human performance in image categorization tasks. How then might these texture-like representations in category-selective regions support object perception? An image-specific readout from category-selective cortex yielded a representation that was more selective for natural feature arrangement, showing that the information necessary for natural object discrimination is available. Thus, our results suggest that the role of the human category-selective visual cortex is not to explicitly encode objects but rather to provide a basis set of texture-like features that can be infinitely reconfigured to flexibly learn and identify new object categories.
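Why spatially averaged feature statistics behave like textures can be shown directly (a self-contained sketch in which a random filter bank stands in for CNN features; the Gram matrix of spatially averaged pairwise products is the kind of arrangement-insensitive statistic at issue):

```python
# Scrambling the arrangement of image blocks barely changes a Gram matrix of
# filter responses, because the statistic averages over space.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(1)
img = rng.standard_normal((64, 64))                  # stand-in for an image

# Scramble: permute 16x16 blocks, destroying global arrangement.
blocks = [img[r:r+16, c:c+16] for r in range(0, 64, 16) for c in range(0, 64, 16)]
perm = rng.permutation(len(blocks))
scrambled = np.block([[blocks[perm[4*i + j]] for j in range(4)] for i in range(4)])

filters = rng.standard_normal((8, 5, 5))             # fixed random filter bank

def gram(image):
    maps = np.stack([convolve2d(image, f, mode="valid") for f in filters])
    flat = maps.reshape(len(filters), -1)
    return flat @ flat.T / flat.shape[1]             # spatial average of products

g1, g2 = gram(img), gram(scrambled)
print(np.linalg.norm(g1 - g2) / np.linalg.norm(g1))  # small: arrangement lost
```

Because the statistic discards spatial arrangement, natural and scrambled images yield nearly identical summaries, which is exactly the insensitivity reported here for category-selective cortex and ImageNet-trained networks.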
7
Rideaux R, West RK, Wallis TSA, Bex PJ, Mattingley JB, Harrison WJ. Spatial structure, phase, and the contrast of natural images. J Vis 2022; 22:4. PMID: 35006237. PMCID: PMC8762697. DOI: 10.1167/jov.22.1.4.
Abstract
The sensitivity of the human visual system is thought to be shaped by environmental statistics. A major endeavor in vision science, therefore, is to uncover the image statistics that predict perceptual and cognitive function. When searching for targets in natural images, for example, it has recently been proposed that target detection is inversely related to the spatial similarity of the target to its local background. We tested this hypothesis by measuring observers' sensitivity to targets that were blended with natural image backgrounds. Targets were designed to have a spatial structure that was either similar or dissimilar to the background. Contrary to masking from similarity, we found that observers were most sensitive to targets that were most similar to their backgrounds. We hypothesized that a coincidence of phase alignment between target and background results in a local contrast signal that facilitates detection when target-background similarity is high. We confirmed this prediction in a second experiment. Indeed, we show that, by solely manipulating the phase of a target relative to its background, the target can be rendered easily visible or undetectable. Our study thus reveals that, in addition to its structural similarity, the phase of the target relative to the background must be considered when predicting detection sensitivity in natural images.
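The phase manipulation at the heart of the second experiment can be sketched as follows (a toy version with assumed blend weights; the study used calibrated natural image backgrounds and psychophysical thresholds):

```python
# Blending a target with a background produces high local contrast when their
# Fourier phases align and low contrast when the target phase is shifted.
import numpy as np

rng = np.random.default_rng(2)
background = rng.standard_normal((64, 64))

# Build a target with the background's amplitude spectrum ("similar structure").
F = np.fft.fft2(background)
amplitude, phase = np.abs(F), np.angle(F)

def blend_contrast(phase_offset):
    target = np.real(np.fft.ifft2(amplitude * np.exp(1j * (phase + phase_offset))))
    blend = 0.5 * background + 0.5 * target
    return blend.std()                      # RMS contrast of the composite

print(blend_contrast(0.0))                  # phases aligned -> contrast summates
print(blend_contrast(np.pi))                # opposed phases -> contrast cancels
```

When target and background phases align, their contrast energies summate locally and the target is easy to see; an opposed phase cancels local contrast, rendering the same target structure undetectable.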
Affiliation(s)
- Reuben Rideaux
- Queensland Brain Institute, University of Queensland, St. Lucia, Queensland, Australia
- Rebecca K West
- School of Psychology, University of Queensland, St. Lucia, Queensland, Australia
- Thomas S A Wallis
- Institut für Psychologie & Centre for Cognitive Science, Technische Universität Darmstadt, Darmstadt, Germany
- Peter J Bex
- Department of Psychology, Northeastern University, Boston, MA, USA
- Jason B Mattingley
- Queensland Brain Institute, University of Queensland, St. Lucia, Queensland, Australia
- School of Psychology, University of Queensland, St. Lucia, Queensland, Australia
- William J Harrison
- Queensland Brain Institute, University of Queensland, St. Lucia, Queensland, Australia
- School of Psychology, University of Queensland, St. Lucia, Queensland, Australia
8
Bornet A, Choung OH, Doerig A, Whitney D, Herzog MH, Manassi M. Global and high-level effects in crowding cannot be predicted by either high-dimensional pooling or target cueing. J Vis 2021; 21:10. PMID: 34812839. PMCID: PMC8626847. DOI: 10.1167/jov.21.12.10.
Abstract
In visual crowding, the perception of a target deteriorates in the presence of nearby flankers. Traditionally, target-flanker interactions have been considered as local, mostly deleterious, low-level, and feature specific, occurring when information is pooled along the visual processing hierarchy. Recently, a vast literature of high-level effects in crowding (grouping effects and face-holistic crowding in particular) led to a different understanding of crowding, as a global, complex, and multilevel phenomenon that cannot be captured or explained by simple pooling models. It was recently argued that these high-level effects may still be captured by more sophisticated pooling models, such as the Texture Tiling model (TTM). Unlike simple pooling models, the high-dimensional pooling stage of the TTM preserves rich information about a crowded stimulus and, in principle, this information may be sufficient to drive high-level and global aspects of crowding. In addition, it was proposed that grouping effects in crowding may be explained by post-perceptual target cueing. Here, we extensively tested the predictions of the TTM on the results of six different studies that highlighted high-level effects in crowding. Our results show that the TTM cannot explain any of these high-level effects, and that the behavior of the model is equivalent to a simple pooling model. In addition, we show that grouping effects in crowding cannot be predicted by post-perceptual factors, such as target cueing. Taken together, these results reinforce once more the idea that complex target-flanker interactions determine crowding and that crowding occurs at multiple levels of the visual hierarchy.
Affiliation(s)
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Oh-Hyeon Choung
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- David Whitney
- Department of Psychology, University of California, Berkeley, California, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA
- Vision Science Group, University of California, Berkeley, California, USA
- Michael H Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Mauro Manassi
- School of Psychology, University of Aberdeen, King's College, Aberdeen, UK
9
Abstract
In crowding, perception of a target deteriorates in the presence of nearby flankers. Surprisingly, perception can be rescued from crowding if additional flankers are added (uncrowding). Uncrowding is a major challenge for all classic models of crowding and vision in general, because the global configuration of the entire stimulus is crucial. However, it is unclear which characteristics of the configuration impact (un)crowding. Here, we systematically dissected flanker configurations and showed that (un)crowding cannot be easily explained by the effects of the sub-parts or low-level features of the stimulus configuration. Our modeling results suggest that (un)crowding requires global processing. These results are well in line with previous studies showing the importance of global aspects in crowding.
Affiliation(s)
- Oh-Hyeon Choung
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Michael H Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
10
Ziemba CM, Simoncelli EP. Opposing effects of selectivity and invariance in peripheral vision. Nat Commun 2021; 12:4597. PMID: 34321483. PMCID: PMC8319169. DOI: 10.1038/s41467-021-24880-5.
Abstract
Sensory processing necessitates discarding some information in service of preserving and reformatting more behaviorally relevant information. Sensory neurons seem to achieve this by responding selectively to particular combinations of features in their inputs, while averaging over or ignoring irrelevant combinations. Here, we expose the perceptual implications of this tradeoff between selectivity and invariance, using stimuli and tasks that explicitly reveal their opposing effects on discrimination performance. We generate texture stimuli with statistics derived from natural photographs, and ask observers to perform two different tasks: Discrimination between images drawn from families with different statistics, and discrimination between image samples with identical statistics. For both tasks, the performance of an ideal observer improves with stimulus size. In contrast, humans become better at family discrimination but worse at sample discrimination. We demonstrate through simulations that these behaviors arise naturally in an observer model that relies on a common set of physiologically plausible local statistical measurements for both tasks.
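A toy observer model makes the tradeoff explicit (our simulation, not the authors' physiologically grounded model; the family means, noise levels, and readout noise are assumptions):

```python
# An observer that summarizes a stimulus by window-averaged local measurements
# gets better at family discrimination but worse at sample discrimination as
# the pooled window grows.
import numpy as np

rng = np.random.default_rng(3)
n_trials = 5000
readout_noise = 0.5

def family_pc(n_pool):
    # Two families differ in the mean of a local statistic; pooling more
    # measurements denoises the summary, so discrimination improves.
    a = rng.normal(0.0, 1.0, (n_trials, n_pool)).mean(axis=1)
    b = rng.normal(0.3, 1.0, (n_trials, n_pool)).mean(axis=1)
    return np.mean(b > a)

def sample_separation(n_pool):
    # Two samples from the SAME family differ only in local detail, which the
    # spatial average removes: their summaries converge as pooling grows.
    s1 = rng.normal(0.0, 1.0, (n_trials, n_pool)).mean(axis=1)
    s2 = rng.normal(0.0, 1.0, (n_trials, n_pool)).mean(axis=1)
    return np.mean(np.abs(s1 - s2)) / readout_noise

for n in (4, 16, 64):
    print(f"pool={n:3d}  family p(correct)={family_pc(n):.2f}  "
          f"sample separation={sample_separation(n):.2f}")
```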
Affiliation(s)
- Corey M Ziemba
- Center for Perceptual Systems, The University of Texas at Austin, Austin, TX, USA.
- Center for Neural Science, New York University, New York, NY, USA.
- Eero P Simoncelli
- Center for Neural Science, New York University, New York, NY, USA
- Flatiron Institute, Simons Foundation, New York, NY, USA
11
Bornet A, Doerig A, Herzog MH, Francis G, Van der Burg E. Shrinking Bouma's window: How to model crowding in dense displays. PLoS Comput Biol 2021; 17:e1009187. PMID: 34228703. PMCID: PMC8284675. DOI: 10.1371/journal.pcbi.1009187.
Abstract
In crowding, perception of a target deteriorates in the presence of nearby flankers. Traditionally, it is thought that visual crowding obeys Bouma's law, i.e., all elements within a certain distance interfere with the target, and that adding more elements always leads to stronger crowding. Crowding is predominantly studied using sparse displays (a target surrounded by a few flankers). However, many studies have shown that this approach leads to wrong conclusions about human vision. Van der Burg and colleagues proposed a paradigm to measure crowding in dense displays using genetic algorithms. Displays were selected and combined over several generations to maximize human performance. In contrast to Bouma's law, only the target's nearest neighbours affected performance. Here, we tested various models to explain these results. We used the same genetic algorithm, but instead of selecting displays based on human performance we selected displays based on the model's outputs. We found that all models based on the traditional feedforward pooling framework of vision were unable to reproduce human behaviour. In contrast, all models involving a dedicated grouping stage explained the results successfully. We show how traditional models can be improved by adding a grouping stage.
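The model-in-the-loop procedure can be sketched as a standard genetic algorithm (the display encoding, fitness function, and GA settings below are placeholder assumptions; the study plugged full crowding models in where `model_performance` stands):

```python
# Displays are selected and recombined over generations to maximize the
# performance of a candidate model rather than a human observer.
import numpy as np

rng = np.random.default_rng(4)
pop_size, n_slots, n_generations = 40, 20, 30

def model_performance(display):
    # Placeholder fitness standing in for a crowding model's accuracy on a
    # display (a binary vector marking which flanker slots are occupied);
    # here only the two "nearest" slots strongly hurt performance.
    return 1.0 - 0.4 * display[:2].sum() / 2.0 - 0.01 * display[2:].sum()

population = rng.integers(0, 2, (pop_size, n_slots))
for _ in range(n_generations):
    fitness = np.array([model_performance(d) for d in population])
    parents = population[np.argsort(fitness)[-pop_size // 2:]]   # keep best half
    pairs = parents[rng.integers(0, len(parents), (pop_size - len(parents), 2))]
    mask = rng.integers(0, 2, (len(pairs), n_slots)).astype(bool)
    children = np.where(mask, pairs[:, 0], pairs[:, 1])          # uniform crossover
    mutate = rng.random(children.shape) < 0.02
    population = np.vstack([parents, np.where(mutate, 1 - children, children)])

print(population.mean(axis=0).round(2))   # slot occupancy after selection
```

Under this placeholder fitness, the evolved displays empty the nearest-neighbour slots while other slots drift, the nearest-neighbour signature the study used to distinguish candidate models.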
Affiliation(s)
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Michael H. Herzog
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Gregory Francis
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Erik Van der Burg
- TNO, Human Factors, Soesterberg, The Netherlands
- Brain and Cognition, University of Amsterdam, Amsterdam, The Netherlands
12
Redundancy between spectral and higher-order texture statistics for natural image segmentation. Vision Res 2021; 187:55-65. DOI: 10.1016/j.visres.2021.06.007.
Abstract
Visual texture, defined by local image statistics, provides important information to the human visual system for perceptual segmentation. Second-order or spectral statistics (equivalent to the Fourier power spectrum) are a well-studied segmentation cue. However, the role of higher-order statistics (HOS) in segmentation remains unclear, particularly for natural images. Recent experiments indicate that, in peripheral vision, the HOS of the widely adopted Portilla-Simoncelli texture model are a weak segmentation cue compared to spectral statistics, despite the fact that both are necessary to explain other perceptual phenomena and to support high-quality texture synthesis. Here we test whether this discrepancy reflects a property of natural image statistics. First, we observe that differences in spectral statistics across segments of natural images are redundant with differences in HOS. Second, using linear and nonlinear classifiers, we show that each set of statistics individually affords high performance in natural scenes and texture segmentation tasks, but combining spectral statistics and HOS produces relatively small improvements. Third, we find that HOS improve segmentation for a subset of images, although these images are difficult to identify. We also find that different subsets of HOS improve segmentation to a different extent, in agreement with previous physiological and perceptual work. These results show that the HOS add modestly to spectral statistics for natural image segmentation. We speculate that tuning to natural image statistics under resource constraints could explain the weak contribution of HOS to perceptual segmentation in human peripheral vision.
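The classifier comparison can be schematized with simulated features (the study computed spectral and Portilla-Simoncelli statistics from natural images; here redundancy is mimicked by giving the two feature sets a shared signal and partly shared noise):

```python
# When two feature sets are redundant, each alone supports segmentation well
# and combining them adds little.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 2000
labels = rng.integers(0, 2, n)                      # same segment vs. different
signal = labels * 2.0 - 1.0

shared_noise = rng.standard_normal((n, 1))          # source of redundancy
spectral = signal[:, None] + shared_noise + 0.8 * rng.standard_normal((n, 4))
hos = signal[:, None] + shared_noise + 0.8 * rng.standard_normal((n, 4))

for name, X in [("spectral", spectral), ("HOS", hos),
                ("combined", np.hstack([spectral, hos]))]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean()
    print(f"{name:>9}: {acc:.3f}")
```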
13
Yildirim FZ, Coates DR, Sayim B. Redundancy masking: The loss of repeated items in crowded peripheral vision. J Vis 2021; 20:14. PMID: 32330230. PMCID: PMC7405779. DOI: 10.1167/jov.20.4.14.
Abstract
Crowding is the deterioration of target identification in the presence of neighboring objects. Recent studies using appearance-based methods showed that the perceived number of target elements is often diminished in crowding. Here we introduce a related type of diminishment in repeating patterns (sets of parallel lines), which we term “redundancy masking.” In four experiments, observers were presented with arrays of small numbers of lines centered at 10° eccentricity. The task was to indicate the number of lines. In Experiment 1, spatial characteristics of redundancy masking were examined by varying the inter-line spacing. We found that redundancy masking decreased with increasing inter-line spacing and ceased at spacings of approximately 0.25 times the eccentricity. In Experiment 2, we assessed whether the strength of redundancy masking differed between radial and tangential arrangements of elements as it does in crowding. Redundancy masking was strong with radially arranged lines (horizontally arranged vertical lines), and absent with tangentially arranged lines (vertically arranged horizontal lines). In Experiment 3, we investigated whether target size (line width and length) modulated redundancy masking. There was an effect of width: Thinner lines yielded stronger redundancy masking. We did not find any differences between the tested line lengths. In Experiment 4, we varied the regularity of the line arrays by vertically or horizontally jittering the positions of the lines. Redundancy masking was strongest with regular spacings and weakened with decreasing regularity. Our experiments show under which conditions whole items are lost in crowded displays, and how this redundancy masking resembles—and partly diverges from—crowded identification. We suggest that redundancy masking is a contributor to the deterioration of performance in crowded displays with redundant patterns.
14
Herrera-Esposito D, Coen-Cagli R, Gomez-Sena L. Flexible contextual modulation of naturalistic texture perception in peripheral vision. J Vis 2021; 21:1. PMID: 33393962. PMCID: PMC7794279. DOI: 10.1167/jov.21.1.1.
Abstract
Peripheral vision comprises most of our visual field, and is essential in guiding visual behavior. Its characteristic capabilities and limitations, which distinguish it from foveal vision, have been explained by the most influential theory of peripheral vision as the product of representing the visual input using summary statistics. Despite its success, this account may provide a limited understanding of peripheral vision, because it neglects processes of perceptual grouping and segmentation. To test this hypothesis, we studied how contextual modulation, namely the modulation of the perception of a stimulus by its surrounds, interacts with segmentation in human peripheral vision. We used naturalistic textures, which are directly related to summary-statistics representations. We show that segmentation cues affect contextual modulation, and that this is not captured by our implementation of the summary-statistics model. We then characterize the effects of different texture statistics on contextual modulation, providing guidance for extending the model, as well as for probing neural mechanisms of peripheral vision.
Affiliation(s)
- Daniel Herrera-Esposito
- Laboratorio de Neurociencias, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Ruben Coen-Cagli
- Department of Systems and Computational Biology and Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Leonel Gomez-Sena
- Laboratorio de Neurociencias, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
15
Abstract
In this article, I present a framework that would accommodate the classic ideas of visual information processing together with more recent computational approaches. I used the current knowledge about visual crowding, capacity limitations, attention, and saliency to place these phenomena within a standard neural network model. I suggest some revisions to traditional mechanisms of attention and feature integration that are required to fit better into this framework. The results allow us to explain some apparent theoretical controversies in vision research, suggesting a rationale for the limited spatial extent of crowding, a role of saliency in crowding experiments, and several amendments to the feature integration theory. The scheme can be elaborated or modified by future research.
Affiliation(s)
- Endel Põder
- Institute of Psychology, University of Tartu, Tartu, Estonia
- www.ut.ee/~endelp/
16
Vacher J, Davila A, Kohn A, Coen-Cagli R. Texture Interpolation for Probing Visual Perception. Advances in Neural Information Processing Systems 2020; 33:22146-22157. PMID: 36420050. PMCID: PMC9681139.
Abstract
Texture synthesis models are important tools for understanding visual processing. In particular, statistical approaches based on neurally relevant features have been instrumental in understanding aspects of visual perception and of neural coding. New deep learning-based approaches further improve the quality of synthetic textures. Yet, it is still unclear why deep texture synthesis performs so well, and applications of this new framework to probe visual perception are scarce. Here, we show that distributions of deep convolutional neural network (CNN) activations of a texture are well described by elliptical distributions and therefore, following optimal transport theory, constraining their mean and covariance is sufficient to generate new texture samples. Then, we propose the natural geodesics (i.e. the shortest path between two points) arising with the optimal transport metric to interpolate between arbitrary textures. Compared to other CNN-based approaches, our interpolation method appears to match more closely the geometry of texture perception, and our mathematical framework is better suited to study its statistical nature. We apply our method by measuring the perceptual scale associated to the interpolation parameter in human observers, and the neural sensitivity of different areas of visual cortex in macaque monkeys.
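The central construction, the Wasserstein-2 geodesic between two Gaussian (elliptical) distributions, can be written compactly (a standalone sketch; the paper applies this to distributions of deep CNN activations):

```python
# Displacement interpolation between N(mu0, S0) and N(mu1, S1): the optimal
# transport map between Gaussians is linear, and the geodesic interpolates it.
import numpy as np
from scipy.linalg import sqrtm

def w2_geodesic(mu0, S0, mu1, S1, t):
    """Mean and covariance at point t of the Wasserstein-2 geodesic."""
    S0_half = sqrtm(S0)
    S0_half_inv = np.linalg.inv(S0_half)
    A = S0_half_inv @ sqrtm(S0_half @ S1 @ S0_half) @ S0_half_inv  # OT map matrix
    At = (1 - t) * np.eye(len(mu0)) + t * A
    return (1 - t) * mu0 + t * mu1, At @ S0 @ At.T

mu0, S0 = np.zeros(2), np.array([[2.0, 0.3], [0.3, 0.5]])
mu1, S1 = np.ones(2), np.array([[0.6, -0.2], [-0.2, 1.5]])
for t in (0.0, 0.5, 1.0):
    mu_t, S_t = w2_geodesic(mu0, S0, mu1, S1, t)
    print(t, mu_t.round(2), np.real(S_t).round(2))
```

Sweeping t from 0 to 1 traces the shortest path between the two feature distributions under the optimal transport metric; textures synthesized along this path are what observers and neural populations were probed with.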
Affiliation(s)
- Jonathan Vacher
- Albert Einstein College of Medicine, Dept. of Systems and Comp. Biology, 10461 Bronx, NY, USA
- Aida Davila
- Albert Einstein College of Medicine, Dominick P. Purpura Dept. of Neuroscience, 10461 Bronx, NY, USA
- Adam Kohn
- Albert Einstein College of Medicine, Dept. of Systems and Comp. Biology, and Dominick P. Purpura Dept. of Neuroscience, 10461 Bronx, NY, USA
- Ruben Coen-Cagli
- Albert Einstein College of Medicine, Dept. of Systems and Comp. Biology, and Dominick P. Purpura Dept. of Neuroscience, 10461 Bronx, NY, USA
17
Abstract
Area V4, the focus of this review, is a mid-level processing stage along the ventral visual pathway of the macaque monkey. V4 is extensively interconnected with other visual cortical areas along the ventral and dorsal visual streams, with frontal cortical areas, and with several subcortical structures. Thus, it is well poised to play a broad and integrative role in visual perception and recognition, the functional domain of the ventral pathway. Neurophysiological studies in monkeys engaged in passive fixation and behavioral tasks suggest that V4 responses are dictated by tuning in a high-dimensional stimulus space defined by form, texture, color, depth, and other attributes of visual stimuli. This high-dimensional tuning may underlie the development of object-based representations in the visual cortex that are critical for tracking, recognizing, and interacting with objects. Neurophysiological and lesion studies also suggest that V4 responses are important for guiding perceptual decisions and higher-order behavior.
Affiliation(s)
- Anitha Pasupathy
- Department of Biological Structure, University of Washington, Seattle, Washington 98195, USA
- Washington National Primate Research Center, University of Washington, Seattle, Washington 98121, USA
- Dina V Popovkina
- Department of Psychology, University of Washington, Seattle, Washington 98105, USA
- Taekjun Kim
- Department of Biological Structure, University of Washington, Seattle, Washington 98195, USA
- Washington National Primate Research Center, University of Washington, Seattle, Washington 98121, USA
18
Doerig A, Bornet A, Choung OH, Herzog MH. Crowding reveals fundamental differences in local vs. global processing in humans and machines. Vision Res 2020; 167:39-45. PMID: 31918074. DOI: 10.1016/j.visres.2019.12.006.
Abstract
Feedforward Convolutional Neural Networks (ffCNNs) have become state-of-the-art models both in computer vision and neuroscience. However, human-like performance of ffCNNs does not necessarily imply human-like computations. Previous studies have suggested that current ffCNNs do not make use of global shape information. However, it is currently unclear whether this reflects fundamental differences between ffCNN and human processing or is merely an artefact of how ffCNNs are trained. Here, we use visual crowding as a well-controlled, specific probe to test global shape computations. Our results provide evidence that ffCNNs cannot produce human-like global shape computations for principled architectural reasons. We lay out approaches that may address shortcomings of ffCNNs to provide better models of the human visual system.
Affiliation(s)
- A Doerig
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
- A Bornet
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
- O H Choung
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
- M H Herzog
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
19
Object shape and surface properties are jointly encoded in mid-level ventral visual cortex. Curr Opin Neurobiol 2019; 58:199-208. PMID: 31586749. DOI: 10.1016/j.conb.2019.09.009.
Abstract
Rapidly recognizing myriad visual objects is a hallmark of the primate visual system. Traditional theories of object recognition have focused on how crucial form features, for example, the orientation of edges, may be extracted in early visual cortex and utilized to recognize objects. An alternative view argues that much of early and mid-level visual processing focuses on encoding surface characteristics, for example, texture. Neurophysiological evidence from primate area V4 supports a third alternative, the joint but independent encoding of form and texture, which would be advantageous for segmenting objects from the background in natural scenes and for object recognition that is independent of surface texture. Future studies that leverage deep convolutional network models, especially focusing on network failures to match biology and behavior, can advance our insights into how such a joint representation of form and surface properties might emerge in visual cortex.
20
Wallis TS, Funke CM, Ecker AS, Gatys LA, Wichmann FA, Bethge M. Image content is more important than Bouma's Law for scene metamers. eLife 2019; 8:e42512. PMID: 31038458. PMCID: PMC6491040. DOI: 10.7554/elife.42512.
Abstract
We subjectively perceive our visual field with high fidelity, yet peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). Prior work showed that humans could not discriminate images synthesised to match the responses of a mid-level ventral visual stream model when information was averaged in receptive fields with a scaling of about half their retinal eccentricity. This result implicated ventral visual area V2, approximated ‘Bouma's Law’ of crowding, and has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our perceptual experience. However, this experiment never assessed natural images. We find that humans can easily discriminate real and model-generated images at V2 scaling, requiring scales at least as small as V1 receptive fields to generate metamers. We speculate that explaining why scenes look as they do may require incorporating segmentation and global organisational constraints in addition to local pooling.

As you read this digest, your eyes move to follow the lines of text. But now try to hold your eyes in one position, while reading the text on either side and below: it soon becomes clear that peripheral vision is not as good as we tend to assume. It is not possible to read text far away from the center of your line of vision, but you can see ‘something’ out of the corner of your eye. You can see that there is text there, even if you cannot read it, and you can see where your screen or page ends. So how does the brain generate peripheral vision, and why does it differ from what you see when you look straight ahead? One idea is that the visual system averages information over areas of the peripheral visual field. This gives rise to texture-like patterns, as opposed to images made up of fine details. Imagine looking at an expanse of foliage, gravel or fur, for example. Your eyes cannot make out the individual leaves, pebbles or hairs. Instead, you perceive an overall pattern in the form of a texture. Our peripheral vision may also consist of such textures, created when the brain averages information over areas of space. Wallis, Funke et al. have now tested this idea using an existing computer model that averages visual input in this way. By giving the model a series of photographs to process, Wallis, Funke et al. obtained images that should in theory simulate peripheral vision. If the model mimics the mechanisms that generate peripheral vision, then healthy volunteers should be unable to distinguish the processed images from the original photographs. But in fact, the participants could easily discriminate the two sets of images. This suggests that the visual system does not solely use textures to represent information in the peripheral visual field. Wallis, Funke et al. propose that other factors, such as how the visual system separates and groups objects, may instead determine what we see in our peripheral vision. This knowledge could ultimately benefit patients with eye diseases such as macular degeneration, a condition that causes loss of vision in the center of the visual field and forces patients to rely on their peripheral vision.
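The scaling argument reduces to simple arithmetic (the scale factors below are illustrative assumptions, not the paper's fitted values):

```python
# A pooling model averages over regions whose diameter grows linearly with
# eccentricity: diameter = scale * eccentricity. Halving the scale quarters
# each pooling region's area, sharply increasing the detail preserved.
eccentricities_deg = [2.0, 5.0, 10.0, 20.0]
scales = {"V2-like / Bouma's Law (~0.5)": 0.5, "assumed V1-like (~0.2)": 0.2}

for name, s in scales.items():
    diameters = [s * e for e in eccentricities_deg]
    print(name, " ".join(f"{d:4.1f} deg" for d in diameters))
```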
Affiliation(s)
- Thomas S. A. Wallis
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Berlin, Germany
- Christina M Funke
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Berlin, Germany
- Alexander S Ecker
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Berlin, Germany
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States
- Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Leon A Gatys
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Felix A Wichmann
- Neural Information Processing Group, Faculty of Science, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Matthias Bethge
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States
- Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
21
Knotts JD, Odegaard B, Lau H, Rosenthal D. Subjective inflation: phenomenology's get-rich-quick scheme. Curr Opin Psychol 2018; 29:49-55. PMID: 30503986. DOI: 10.1016/j.copsyc.2018.11.006.
Abstract
How do we explain the seemingly rich nature of visual phenomenology while accounting for impoverished perception in the periphery? This apparent mismatch has led some to posit that rich phenomenological content overflows cognitive access, whereas others hold that phenomenology is in fact sparse and constrained by cognitive access. Here, we review the Rich versus Sparse debate as it relates to a phenomenon called subjective inflation, wherein minimally attended or peripheral visual perception tends to be subjectively evaluated as more reliable than attended or foveal perception when objective performance is matched. We argue that subjective inflation can account for rich phenomenology without invoking phenomenological overflow. On this view, visual phenomenology is constrained by cognitive access, but seemingly inflated above what would be predicted based on sparse sensory content.
Affiliation(s)
- J D Knotts
- Department of Psychology, University of California, Los Angeles, CA 90095, USA.
- Brian Odegaard
- Department of Psychology, University of California, Los Angeles, CA 90095, USA
- Hakwan Lau
- Department of Psychology, University of California, Los Angeles, CA 90095, USA; Brain Research Institute, University of California, Los Angeles, CA 90095, USA; Department of Psychology, University of Hong Kong, Hong Kong; State Key Laboratory of Brain and Cognitive Sciences, University of Hong Kong, Hong Kong
- David Rosenthal
- Philosophy, Cognitive Science, and Cognitive Neuroscience, CUNY Graduate Center, New York, NY 10016, USA