1
Moore CM, Zheng Q. Limited midlevel mediation of visual crowding: Surface completion fails to support uncrowding. J Vis 2024; 24:11. [PMID: 38294775 PMCID: PMC10839818 DOI: 10.1167/jov.24.1.11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 12/10/2023] [Indexed: 02/01/2024]
Abstract
Visual crowding refers to impaired object recognition caused by nearby stimuli, and it increases with eccentricity. Image-level explanations of crowding maintain that it is caused by information loss within early encoding processes that vary in functionality with eccentricity. Alternative explanations maintain that the interference is not limited to two-dimensional image-level interactions but is mediated within representations that reflect three-dimensional scene structure. Uncrowding refers to cases in which adding stimulus information to a display, which increases the noise at an image level, nonetheless decreases the amount of crowding that occurs. Uncrowding has been interpreted as evidence of midlevel mediation of crowding because the additional information tends to provide an opportunity for perceptually organizing stimuli into distinct and therefore protected representations. It is difficult, however, to rule out image-level explanations of crowding and uncrowding when stimulus differences exist between conditions. We adapted displays of a specific form of uncrowding to minimize stimulus differences across conditions, while retaining the potential for perceptual organization, specifically perceptual surface completion. Uncrowding under these conditions would provide strong support for midlevel mediation of crowding. In five experiments, however, we found no evidence of midlevel mediation of crowding, indicating that, at least for this version of uncrowding, image-level explanations cannot be ruled out.
Affiliation(s)
- Cathleen M Moore
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
- Qingzi Zheng
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
2
Choung OH, Gordillo D, Roinishvili M, Brand A, Herzog MH, Chkonia E. Intact and deficient contextual processing in schizophrenia patients. Schizophr Res Cogn 2022; 30:100265. [PMID: 36119400 PMCID: PMC9477851 DOI: 10.1016/j.scog.2022.100265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2022] [Revised: 07/09/2022] [Accepted: 07/09/2022] [Indexed: 11/25/2022]
Abstract
Schizophrenia patients are known to have deficits in contextual vision. However, results are often mixed. In some paradigms, patients do not take the context into account and, hence, perform more veridically than healthy controls. In other paradigms, context impairs performance much more strongly in patients than in healthy controls. These mixed results may be explained by differences in the paradigms as well as by small or biased samples, given the large heterogeneity of patients' deficits. Here, we show that mixed results may also come from idiosyncrasies of the stimuli used because, in variants of the same visual paradigm tested with the same participants, we found both intact and deficient processing.
Affiliation(s)
- Oh-Hyeon Choung
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Corresponding author. http://lpsy.epfl.ch
- Dario Gordillo
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Maya Roinishvili
- Laboratory of Vision Physiology, Ivane Beritashvili Centre of Experimental Biomedicine, Tbilisi, Georgia
- Institute of Cognitive Neurosciences, Free University of Tbilisi, Tbilisi, Georgia
- Andreas Brand
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael H. Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Eka Chkonia
- Department of Psychiatry, Tbilisi State Medical University, Tbilisi, Georgia
3
Herzog MH. The Irreducibility of Vision: Gestalt, Crowding and the Fundamentals of Vision. Vision (Basel) 2022; 6:vision6020035. [PMID: 35737422 PMCID: PMC9228288 DOI: 10.3390/vision6020035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2022] [Revised: 05/25/2022] [Accepted: 05/31/2022] [Indexed: 11/16/2022]
Abstract
What is fundamental in vision has been discussed for millennia. For philosophical realists and the physiological approach to vision, the objects of the outer world are truly given, and failures to perceive objects properly, such as in illusions, are just sporadic misperceptions. The goal is to replace the subjectivity of the mind with careful physiological analyses. Continental philosophy and the Gestaltists are rather skeptical about external objects or ignore them. Their starting point is the percepts themselves, because it is hard to deny the truth of one's own percepts. I will show that, whereas both approaches can explain many visual phenomena well with classic visual stimuli, both run into trouble when stimuli become slightly more complex. I suggest that these failures have a deeper conceptual reason, namely that their foundations (objects, percepts) do not hold. I propose that only physical states exist in a mind-independent manner and that everyday objects, such as bottles and trees, are perceived in a mind-dependent way. The fundamental units for processing objects are extended windows of unconscious processing, followed by short, discrete conscious percepts.
Affiliation(s)
- Michael H Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
4
Bornet A, Choung OH, Doerig A, Whitney D, Herzog MH, Manassi M. Global and high-level effects in crowding cannot be predicted by either high-dimensional pooling or target cueing. J Vis 2021; 21:10. [PMID: 34812839 PMCID: PMC8626847 DOI: 10.1167/jov.21.12.10] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/27/2021] [Accepted: 09/30/2021] [Indexed: 11/24/2022]
Abstract
In visual crowding, the perception of a target deteriorates in the presence of nearby flankers. Traditionally, target-flanker interactions have been considered local, mostly deleterious, low-level, and feature specific, occurring when information is pooled along the visual processing hierarchy. Recently, a large literature on high-level effects in crowding (in particular grouping effects and face-holistic crowding) has led to a different understanding of crowding as a global, complex, and multilevel phenomenon that cannot be captured or explained by simple pooling models. It was recently argued that these high-level effects may still be captured by more sophisticated pooling models, such as the Texture Tiling model (TTM). Unlike simple pooling models, the high-dimensional pooling stage of the TTM preserves rich information about a crowded stimulus and, in principle, this information may be sufficient to drive high-level and global aspects of crowding. In addition, it was proposed that grouping effects in crowding may be explained by post-perceptual target cueing. Here, we extensively tested the predictions of the TTM on the results of six different studies that highlighted high-level effects in crowding. Our results show that the TTM cannot explain any of these high-level effects, and that the behavior of the model is equivalent to a simple pooling model. In addition, we show that grouping effects in crowding cannot be predicted by post-perceptual factors, such as target cueing. Taken together, these results reinforce the idea that complex target-flanker interactions determine crowding and that crowding occurs at multiple levels of the visual hierarchy.
Affiliation(s)
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Oh-Hyeon Choung
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- David Whitney
- Department of Psychology, University of California, Berkeley, California, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA
- Vision Science Group, University of California, Berkeley, California, USA
- Michael H Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Mauro Manassi
- School of Psychology, University of Aberdeen, King's College, Aberdeen, UK
5
Abstract
In crowding, perception of a target deteriorates in the presence of nearby flankers. Surprisingly, perception can be rescued from crowding if additional flankers are added (uncrowding). Uncrowding is a major challenge for all classic models of crowding and vision in general, because the global configuration of the entire stimulus is crucial. However, it is unclear which characteristics of the configuration impact (un)crowding. Here, we systematically dissected flanker configurations and showed that (un)crowding cannot be easily explained by the effects of the sub-parts or low-level features of the stimulus configuration. Our modeling results suggest that (un)crowding requires global processing. These results are well in line with previous studies showing the importance of global aspects in crowding.
Affiliation(s)
- Oh-Hyeon Choung
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Michael H Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
6
Unraveling brain interactions in vision: The example of crowding. Neuroimage 2021; 240:118390. [PMID: 34271157 DOI: 10.1016/j.neuroimage.2021.118390] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/26/2021] [Revised: 07/09/2021] [Accepted: 07/12/2021] [Indexed: 11/22/2022]
Abstract
Crowding, the impairment of target discrimination in clutter, is the standard situation in vision. Traditionally, crowding is explained with (feedforward) models, in which only neighboring elements interact, leading to a "bottleneck" at the earliest stages of vision. It is with this implicit prior that most functional magnetic resonance imaging (fMRI) studies approach the identification of the "neural locus" of crowding, searching for the earliest visual area in which the blood-oxygenation-level-dependent (BOLD) signal is suppressed under crowded conditions. Using this classic approach, we replicated previous findings of crowding-related BOLD suppression starting in V2 and increasing up the visual hierarchy. Surprisingly, under conditions of uncrowding, in which adding flankers improves performance, the BOLD signal was further suppressed. This suggests an important role for top-down connections, which is in line with global models of crowding. To discriminate between various possible models, we used dynamic causal modeling (DCM). We show that recurrent interactions between all visual areas, including higher-level areas like V4 and the lateral occipital complex (LOC), are crucial in crowding and uncrowding. Our results explain the discrepancies in previous findings: in a recurrent visual hierarchy, the crowding effect can theoretically be detected at any stage. Beyond crowding, we demonstrate the need for models like DCM to understand the complex recurrent processing which most likely underlies human perception in general.
7
Bornet A, Doerig A, Herzog MH, Francis G, Van der Burg E. Shrinking Bouma's window: How to model crowding in dense displays. PLoS Comput Biol 2021; 17:e1009187. [PMID: 34228703 PMCID: PMC8284675 DOI: 10.1371/journal.pcbi.1009187] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/22/2020] [Revised: 07/16/2021] [Accepted: 06/16/2021] [Indexed: 11/22/2022]
Abstract
In crowding, perception of a target deteriorates in the presence of nearby flankers. Traditionally, it is thought that visual crowding obeys Bouma's law, i.e., all elements within a certain distance interfere with the target, and that adding more elements always leads to stronger crowding. Crowding is predominantly studied using sparse displays (a target surrounded by a few flankers). However, many studies have shown that this approach leads to wrong conclusions about human vision. Van der Burg and colleagues proposed a paradigm to measure crowding in dense displays using genetic algorithms. Displays were selected and combined over several generations to maximize human performance. In contrast to Bouma's law, only the target's nearest neighbours affected performance. Here, we tested various models to explain these results. We used the same genetic algorithm, but instead of selecting displays based on human performance we selected displays based on the model's outputs. We found that all models based on the traditional feedforward pooling framework of vision were unable to reproduce human behaviour. In contrast, all models involving a dedicated grouping stage explained the results successfully. We show how traditional models can be improved by adding a grouping stage.
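The traditional rule this abstract argues against, Bouma's law, can be sketched as a simple distance test. This is a minimal sketch, assuming the commonly cited approximation that critical spacing is about half the target's eccentricity (b ≈ 0.5) and, for simplicity, a circular window, although measured windows are usually elongated along the radial direction. The function names are hypothetical, and the code is not one of the models tested in the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in degrees of visual angle, fixation at (0, 0)


def bouma_radius(eccentricity_deg: float, b: float = 0.5) -> float:
    """Critical spacing predicted by Bouma's law: b times the target eccentricity."""
    return b * eccentricity_deg


def flankers_in_window(target: Point, flankers: List[Point], b: float = 0.5) -> List[Point]:
    """Return the flankers inside a circular Bouma window centred on the target.

    Under the traditional account, every flanker returned here interferes with
    the target, so adding elements inside the window should always strengthen crowding.
    """
    radius = bouma_radius(math.hypot(*target), b)
    return [f for f in flankers if math.dist(target, f) < radius]
```

For a target at 10° eccentricity the window radius is 5°, so a flanker 2° away is counted while one 6° away is not. The dense-display results summarized above contradict this all-elements-within-the-radius rule, since only the target's nearest neighbours affected performance.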
Affiliation(s)
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Michael H. Herzog
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Gregory Francis
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Erik Van der Burg
- TNO, Human Factors, Soesterberg, The Netherlands
- Brain and Cognition, University of Amsterdam, Amsterdam, The Netherlands
8
Melnik N, Coates DR, Sayim B. Geometrically restricted image descriptors: A method to capture the appearance of shape. J Vis 2021; 21:14. [PMID: 33688921 PMCID: PMC7961119 DOI: 10.1167/jov.21.3.14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
Shape perception varies depending on many factors. For example, presenting a stimulus in the periphery often yields a different appearance compared with its foveal presentation. However, how exactly shape appearance is altered under different conditions remains elusive. One reason for this is that studies typically measure identification performance, leaving details about target appearance unknown. The lack of appearance-based methods and general challenges to quantify appearance complicate the investigation of shape appearance. Here, we introduce Geometrically Restricted Image Descriptors (GRIDs), a method to investigate the appearance of shapes. Stimuli in the GRID paradigm are shapes consisting of distinct line elements placed on a grid by connecting grid nodes. Each line is treated as a discrete target. Observers are asked to capture target appearance by placing lines on a freely viewed response grid. We used GRIDs to investigate the appearance of letters and letter-like shapes. Targets were presented at 10° eccentricity in the right visual field. Gaze-contingent stimulus presentation was used to prevent eye movements to the target. The data were analyzed by quantifying the differences between targets and responses with regard to overall accuracy, element discriminability, and several distinct error types. Our results show how shape appearance can be captured by GRIDs, and how a fine-grained analysis of stimulus parts provides quantifications of appearance typically not available in standard measures of performance. We propose that GRIDs are an effective tool to investigate the appearance of shapes.
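Because each line element in a GRID stimulus is a discrete target, a response can be compared with the stimulus element by element as a set comparison. This is a minimal sketch, assuming each line is stored as an unordered pair of grid-node coordinates and scoring hits, misses, false alarms, and a Jaccard overlap. These metrics and names are illustrative assumptions, not the accuracy, discriminability, and error-type measures defined by the authors.

```python
from typing import Dict, FrozenSet, Set, Tuple

Node = Tuple[int, int]     # (column, row) of a grid node
Segment = FrozenSet[Node]  # a line element connects two grid nodes


def seg(a: Node, b: Node) -> Segment:
    """Build an undirected line element, so seg(a, b) == seg(b, a)."""
    return frozenset((a, b))


def score(target: Set[Segment], response: Set[Segment]) -> Dict[str, float]:
    """Compare a presented shape with the observer's reconstruction, element by element."""
    hits = target & response          # elements present in both
    misses = target - response        # presented but not reported
    false_alarms = response - target  # reported but not presented
    union = target | response
    return {
        "hits": len(hits),
        "misses": len(misses),
        "false_alarms": len(false_alarms),
        "jaccard": len(hits) / len(union) if union else 1.0,
    }
```

For a T drawn with four elements, a response that reproduces the crossbar and the upper stem element but replaces the lower stem element with a spurious one scores 3 hits, 1 miss, 1 false alarm, and a Jaccard overlap of 0.6. Storing lines as unordered node pairs means the direction in which an observer draws a line does not matter.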
Affiliation(s)
- Natalia Melnik
- Institute of Psychology, University of Bern, Bern, Switzerland
- Daniel R Coates
- Institute of Psychology, University of Bern, Bern, Switzerland
- College of Optometry, University of Houston, Houston, Texas, USA
- Bilge Sayim
- Institute of Psychology, University of Bern, Bern, Switzerland
- Univ. Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Lille, France
- http://www.appearancelab.org/
9
Herrera-Esposito D, Coen-Cagli R, Gomez-Sena L. Flexible contextual modulation of naturalistic texture perception in peripheral vision. J Vis 2021; 21:1. [PMID: 33393962 PMCID: PMC7794279 DOI: 10.1167/jov.21.1.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/21/2020] [Accepted: 12/01/2020] [Indexed: 11/24/2022]
Abstract
Peripheral vision comprises most of our visual field and is essential in guiding visual behavior. The most influential theory of peripheral vision explains its characteristic capabilities and limitations, which distinguish it from foveal vision, as the product of representing the visual input using summary statistics. Despite its success, this account may provide a limited understanding of peripheral vision, because it neglects processes of perceptual grouping and segmentation. To test this hypothesis, we studied how contextual modulation, namely the modulation of the perception of a stimulus by its surrounds, interacts with segmentation in human peripheral vision. We used naturalistic textures, which are directly related to summary-statistics representations. We show that segmentation cues affect contextual modulation, and that this is not captured by our implementation of the summary-statistics model. We then characterize the effects of different texture statistics on contextual modulation, providing guidance for extending the model, as well as for probing neural mechanisms of peripheral vision.
Affiliation(s)
- Daniel Herrera-Esposito
- Laboratorio de Neurociencias, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Ruben Coen-Cagli
- Department of Systems and Computational Biology and Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Leonel Gomez-Sena
- Laboratorio de Neurociencias, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
10
van Bergen RS, Kriegeskorte N. Going in circles is the way forward: the role of recurrence in visual inference. Curr Opin Neurobiol 2020; 65:176-193. [PMID: 33279795 DOI: 10.1016/j.conb.2020.11.009] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 03/26/2020] [Revised: 11/16/2020] [Accepted: 11/16/2020] [Indexed: 11/30/2022]
Abstract
Biological visual systems exhibit abundant recurrent connectivity. State-of-the-art neural network models for visual recognition, by contrast, rely heavily or exclusively on feedforward computation. Any finite-time recurrent neural network (RNN) can be unrolled along time to yield an equivalent feedforward neural network (FNN). This important insight suggests that computational neuroscientists may not need to engage recurrent computation, and that computer-vision engineers may be limiting themselves to a special case of FNN if they build recurrent models. Here we argue, to the contrary, that FNNs are a special case of RNNs and that computational neuroscientists and engineers should engage recurrence to understand how brains and machines can (1) achieve greater and more flexible computational depth, (2) compress complex computations into limited hardware, (3) integrate priors and priorities into visual inference through expectation and attention, (4) exploit sequential dependencies in their data for better inference and prediction, and (5) leverage the power of iterative computation.
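The unrolling argument at the start of this abstract can be checked on a toy network. This is a minimal sketch, assuming a simple tanh recurrent network with a constant input and a zero initial state (chosen for brevity, not a model from the paper; all names are hypothetical): running it for T steps gives exactly the same output as a T-layer feedforward stack in which each layer carries its own copy of the recurrent weights.

```python
import copy
import math


def rnn_step(W, U, h, x):
    """One recurrent update with a tanh nonlinearity: h_new = tanh(W h + U x)."""
    return [
        math.tanh(sum(w * hj for w, hj in zip(W_row, h))
                  + sum(u * xk for u, xk in zip(U_row, x)))
        for W_row, U_row in zip(W, U)
    ]


def run_recurrent(W, U, x, T):
    """Reuse the same weights for T time steps (the recurrent network)."""
    h = [0.0] * len(W)
    for _ in range(T):
        h = rnn_step(W, U, h, x)
    return h


def unroll(W, U, T):
    """Return a T-layer feedforward network whose layer t holds its own copy of (W, U)."""
    layers = [(copy.deepcopy(W), copy.deepcopy(U)) for _ in range(T)]

    def feedforward(x):
        h = [0.0] * len(W)
        for W_t, U_t in layers:  # strictly bottom-up: layer t feeds layer t + 1
            h = rnn_step(W_t, U_t, h, x)
        return h

    return feedforward
```

The equivalence holds only for a fixed, finite T, and the unrolled network stores T copies of W and U where the recurrent one stores a single copy. Conversely, untying the copied weights gives a feedforward network the recurrent one cannot express, which is why the direction of the "special case" relation is the point the authors contest.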
Affiliation(s)
- Ruben S van Bergen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States
- Department of Psychology, Columbia University, New York, NY, United States
- Department of Neuroscience, Columbia University, New York, NY, United States
- Affiliated member, Electrical Engineering, Columbia University, New York, NY, United States
11
Abstract
In this article, I present a framework that would accommodate the classic ideas of visual information processing together with more recent computational approaches. I use current knowledge about visual crowding, capacity limitations, attention, and saliency to place these phenomena within a standard neural network model. I suggest some revisions to traditional mechanisms of attention and feature integration that are required for them to fit better into this framework. The results help explain some apparent theoretical controversies in vision research, suggesting a rationale for the limited spatial extent of crowding, a role of saliency in crowding experiments, and several amendments to feature integration theory. The scheme can be elaborated or modified by future research.
Affiliation(s)
- Endel Põder
- Institute of Psychology, University of Tartu, Tartu, Estonia
- www.ut.ee/~endelp/
12
Xi H, Wu R, Wang B, Chen L. Topological difference between target and flankers alleviates crowding effect. J Vis 2020; 20:9. [PMID: 32926072 PMCID: PMC7509911 DOI: 10.1167/jov.20.9.9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
In the crowding effect, object recognition in the periphery deteriorates when other items flank the target, especially if they share similarities. Here, we report that similarity defined by a topological property (differences in the number of holes) influences the crowding effect. Orientation discrimination tasks suggested that the crowding effect was weaker with a topologically different (TD) flanker than with a topologically equivalent (TE) flanker, and also showed the established inward-outward anisotropy. In another experiment, both an outer and an inner flanker were used to create four different conditions. Performance with an outer TD flanker and an inner TE flanker was better than with an outer TE flanker and an inner TD flanker, even though the stimulus items were the same. Different stimuli were used to control for local features. To rule out confusability as an explanation, we selected pairs of letters with matched confusability, of which one pair was TD and the other TE. Letter identification performance was better in the TD condition. Lastly, we investigated digit identification under four conditions with varied spacing. Regardless of spacing, the crowding effect was reduced by a topologically different flanker. The results collectively suggest that topological properties play a role in perceptual grouping, which modulates the crowding effect.
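The topological property manipulated here, the number of holes, can be computed from a binarized shape with a flood fill. This is a minimal sketch, assuming stimuli are given as small character bitmaps ('#' for ink) and counting background regions that never reach the image border; the function name is hypothetical, it is not the authors' stimulus procedure, and a real implementation must choose connectivity rules (here, 4-connected background) consistently with how ink pixels touch.

```python
from collections import deque
from typing import List


def count_holes(image: List[str]) -> int:
    """Count enclosed background regions in a bitmap ('#' = ink, anything else = background)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == "#" or seen[r][c]:
                continue
            # Flood-fill one 4-connected background region starting at (r, c).
            touches_border = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] != "#"):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # A region that never reaches the border is enclosed by ink: a hole.
            if not touches_border:
                holes += 1
    return holes
```

An O-like bitmap yields 1 hole, an 8-like bitmap 2, and a C-like bitmap 0, so an O flanked by C-like shapes would be a TD pairing, while an O flanked by other one-hole shapes would be TE.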
Affiliation(s)
- Huanjun Xi
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Ruijie Wu
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Bo Wang
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Lin Chen
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China
13
Doerig A, Schmittwilken L, Sayim B, Manassi M, Herzog MH. Capsule networks as recurrent models of grouping and segmentation. PLoS Comput Biol 2020; 16:e1008017. [PMID: 32692780 PMCID: PMC7394447 DOI: 10.1371/journal.pcbi.1008017] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/26/2019] [Revised: 07/31/2020] [Accepted: 06/04/2020] [Indexed: 11/18/2022]
Abstract
Classically, visual processing is described as a cascade of local feedforward computations. Feedforward Convolutional Neural Networks (ffCNNs) have shown how powerful such models can be. However, using visual crowding as a well-controlled challenge, we previously showed that no classic model of vision, including ffCNNs, can explain human global shape processing. Here, we show that Capsule Neural Networks (CapsNets), combining ffCNNs with recurrent grouping and segmentation, solve this challenge. We also show that ffCNNs and standard recurrent CNNs do not, suggesting that the grouping and segmentation capabilities of CapsNets are crucial. Furthermore, we provide psychophysical evidence that grouping and segmentation are implemented recurrently in humans, and show that CapsNets reproduce these results well. We discuss why recurrence seems needed to implement grouping and segmentation efficiently. Together, we provide mutually reinforcing psychophysical and computational evidence that a recurrent grouping and segmentation process is essential to understand the visual system and create better models that harness global shape computations.
Affiliation(s)
- Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Lynn Schmittwilken
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Dept. Computational Psychology, Institute of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Berlin, Germany
- Bilge Sayim
- Institute of Psychology, University of Bern, Bern, Switzerland
- Univ. Lille, CNRS, UMR 9193—SCALab—Sciences Cognitives et Sciences Affectives, F-59000 Lille, France
- Mauro Manassi
- School of Psychology, University of Aberdeen, Scotland, United Kingdom
- Michael H. Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
14
Doerig A, Bornet A, Choung OH, Herzog MH. Crowding reveals fundamental differences in local vs. global processing in humans and machines. Vision Res 2020; 167:39-45. [PMID: 31918074 DOI: 10.1016/j.visres.2019.12.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/22/2019] [Revised: 12/10/2019] [Accepted: 12/16/2019] [Indexed: 11/17/2022]
Abstract
Feedforward Convolutional Neural Networks (ffCNNs) have become state-of-the-art models both in computer vision and neuroscience. However, human-like performance of ffCNNs does not necessarily imply human-like computations. Previous studies have suggested that current ffCNNs do not make use of global shape information. However, it is currently unclear whether this reflects fundamental differences between ffCNN and human processing or is merely an artefact of how ffCNNs are trained. Here, we use visual crowding as a well-controlled, specific probe to test global shape computations. Our results provide evidence that ffCNNs cannot produce human-like global shape computations for principled architectural reasons. We lay out approaches that may address shortcomings of ffCNNs to provide better models of the human visual system.
Affiliation(s)
- A Doerig
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
- A Bornet
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
- O H Choung
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
- M H Herzog
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
15
Rosenholtz R, Yu D, Keshvari S. Challenges to pooling models of crowding: Implications for visual mechanisms. J Vis 2019. [DOI: 10.1167/jov.19.7.15] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022]
Affiliation(s)
- Ruth Rosenholtz
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Dian Yu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Shaiyan Keshvari
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
16
Rosenholtz R, Yu D, Keshvari S. Challenges to pooling models of crowding: Implications for visual mechanisms. J Vis 2019; 19:15. [PMID: 31348486 PMCID: PMC6660188 DOI: 10.1167/19.7.15] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 02/26/2018] [Accepted: 03/10/2019] [Indexed: 12/02/2022]
Abstract
A set of phenomena known as crowding reveal peripheral vision's vulnerability in the face of clutter. Crowding is important both because of its ubiquity, making it relevant for many real-world tasks and stimuli, and because of the window it provides onto mechanisms of visual processing. Here we focus on models of the underlying mechanisms. This review centers on a popular class of models known as pooling models, as well as the phenomenology that appears to challenge a pooling account. Using a candidate high-dimensional pooling model, we gain intuitions about whether a pooling model suffices and reexamine the logic behind the pooling challenges. We show that pooling mechanisms can yield substitution phenomena and therefore predict better performance judging the properties of a set versus a particular item. Pooling models can also exhibit some similarity effects without requiring mechanisms that pool at multiple levels of processing, and without constraining pooling to a particular perceptual group. Moreover, we argue that other similarity effects may in part be due to noncrowding influences like cuing. Unlike low-dimensional straw-man pooling models, high-dimensional pooling preserves rich information about the stimulus, which may be sufficient to support high-level processing. To gain insights into the implications for pooling mechanisms, one needs a candidate high-dimensional pooling model and cannot rely on intuitions from low-dimensional models. Furthermore, to uncover the mechanisms of crowding, experiments need to separate encoding from decision effects. While future work must quantitatively examine all of the challenges to a high-dimensional pooling account, insights from a candidate model allow us to conclude that a high-dimensional pooling mechanism remains viable as a model of the loss of information leading to crowding.
Affiliation(s)
- Ruth Rosenholtz
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Dian Yu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Shaiyan Keshvari
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
17
Doerig A, Bornet A, Rosenholtz R, Francis G, Clarke AM, Herzog MH. Beyond Bouma's window: How to explain global aspects of crowding? PLoS Comput Biol 2019; 15:e1006580. [PMID: 31075131 PMCID: PMC6530878 DOI: 10.1371/journal.pcbi.1006580] [Received: 03/29/2018] [Revised: 05/22/2019] [Accepted: 10/04/2018]
Abstract
In crowding, perception of an object deteriorates in the presence of nearby elements. Although crowding is a ubiquitous phenomenon, since elements are rarely seen in isolation, to date there exists no consensus on how to model it. Previous experiments showed that the global configuration of the entire stimulus must be taken into account. These findings rule out simple pooling or substitution models and favor models sensitive to global spatial aspects. In order to investigate how to incorporate global aspects into models, we tested a large number of models with a database of forty stimuli tailored for the global aspects of crowding. Our results show that incorporating grouping like components strongly improves model performance.
Affiliation(s)
- Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Ruth Rosenholtz
- Department of Brain and Cognitive Sciences, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
- Gregory Francis
- Department of Psychological Sciences, Purdue University, West Lafayette, IN, United States of America
- Aaron M. Clarke
- Laboratory of Computational Vision, Psychology Department, Bilkent University, Ankara, Turkey
- Michael H. Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
18
A few remarks on spatial interference in visual stimuli. Behav Res Methods 2017; 50:1716-1722. [PMID: 29067673 DOI: 10.3758/s13428-017-0978-3]
Abstract
Many vision experiments, e.g., tests of masking and visual crowding, involve the effect of adding a second stimulus to an initial one. The effects of such additions are generally considered in terms of physiological mechanisms and the possibility of interference in the stimuli is generally not considered. In the present study, interference between two stimuli was assessed by comparing the sum of amplitudes in the combined stimulus to the sums of the amplitudes in the two stimuli determined separately. With this approach, evidence for interference was found. It was also found that adding a second stimulus may alter the phase angles. These observations mean that the same stimulus presented together with other stimuli may have less stimulus power than when presented by itself. Thus, it is necessary to take account of the possibility of interference when interpreting results from experiments in which the effect of one stimulus element upon another is explored.
19
Grossberg S. Towards solving the hard problem of consciousness: The varieties of brain resonances and the conscious experiences that they support. Neural Netw 2016; 87:38-95. [PMID: 28088645 DOI: 10.1016/j.neunet.2016.11.003] [Received: 08/02/2016] [Revised: 10/21/2016] [Accepted: 11/20/2016]
Abstract
The hard problem of consciousness is the problem of explaining how we experience qualia or phenomenal experiences, such as seeing, hearing, and feeling, and knowing what they are. To solve this problem, a theory of consciousness needs to link brain to mind by modeling how emergent properties of several brain mechanisms interacting together embody detailed properties of individual conscious psychological experiences. This article summarizes evidence that Adaptive Resonance Theory, or ART, accomplishes this goal. ART is a cognitive and neural theory of how advanced brains autonomously learn to attend, recognize, and predict objects and events in a changing world. ART has predicted that "all conscious states are resonant states" as part of its specification of mechanistic links between processes of consciousness, learning, expectation, attention, resonance, and synchrony. It hereby provides functional and mechanistic explanations of data ranging from individual spikes and their synchronization to the dynamics of conscious perceptual, cognitive, and cognitive-emotional experiences. ART has reached sufficient maturity to begin classifying the brain resonances that support conscious experiences of seeing, hearing, feeling, and knowing. Psychological and neurobiological data in both normal individuals and clinical patients are clarified by this classification. This analysis also explains why not all resonances become conscious, and why not all brain dynamics are resonant. The global organization of the brain into computationally complementary cortical processing streams (complementary computing), and the organization of the cerebral cortex into characteristic layers of cells (laminar computing), figure prominently in these explanations of conscious and unconscious processes. Alternative models of consciousness are also discussed.
Affiliation(s)
- Stephen Grossberg
- Center for Adaptive Systems, Boston University, 677 Beacon Street, Boston, MA 02215, USA; Graduate Program in Cognitive and Neural Systems, Departments of Mathematics & Statistics, Psychological & Brain Sciences, and Biomedical Engineering, Boston University, 677 Beacon Street, Boston, MA 02215, USA.