1. Vaishnav M, Cadene R, Alamia A, Linsley D, VanRullen R, Serre T. Understanding the Computational Demands Underlying Visual Reasoning. Neural Comput 2022; 34:1075-1099. [PMID: 35231926 DOI: 10.1162/neco_a_01485] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2021] [Accepted: 12/07/2021] [Indexed: 11/04/2022]
Abstract
Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands for abstract visual reasoning. We do this by systematically assessing the ability of modern deep convolutional neural networks (CNNs) to learn to solve the synthetic visual reasoning test (SVRT) challenge, a collection of 23 visual reasoning problems. Our analysis reveals a novel taxonomy of visual reasoning tasks, which can be primarily explained by both the type of relations (same-different versus spatial-relation judgments) and the number of relations used to compose the underlying rules. Prior cognitive neuroscience work suggests that attention plays a key role in humans' visual reasoning ability. To test this hypothesis, we extended the CNNs with spatial and feature-based attention mechanisms. In a second series of experiments, we evaluated the ability of these attention networks to learn to solve the SVRT challenge and found the resulting architectures to be much more efficient at solving the hardest of these visual reasoning tasks. Most important, the corresponding improvements on individual tasks partially explained our novel taxonomy. Overall, this work provides a granular computational account of visual reasoning and yields testable neuroscience predictions regarding the differential need for feature-based versus spatial attention depending on the type of visual reasoning problem.
Affiliation(s)
- Mohit Vaishnav
- Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, 31052 Toulouse, France
- Carney Institute for Brain Science, Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI 02912, U.S.A.
- Remi Cadene
- Carney Institute for Brain Science, Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI 02912, U.S.A.
- Andrea Alamia
- Centre de Recherche Cerveau et Cognition, CNRS, Université de Toulouse, 31052 Toulouse, France
- Drew Linsley
- Carney Institute for Brain Science, Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI 02912, U.S.A.
- Rufin VanRullen
- Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, and Centre de Recherche Cerveau et Cognition, CNRS, Université de Toulouse, 31052 Toulouse, France
- Thomas Serre
- Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, 31052 Toulouse, France
- Carney Institute for Brain Science, Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI 02912, U.S.A.
2. Kim J, Ricci M, Serre T. Not-So-CLEVR: learning same-different relations strains feedforward neural networks. Interface Focus 2018; 8:20180011. [PMID: 29951191 DOI: 10.1098/rsfs.2018.0011] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Accepted: 05/08/2018] [Indexed: 11/12/2022] Open
Abstract
The advent of deep learning has recently led to great successes in various engineering applications. As a prime example, convolutional neural networks, a type of feedforward neural network, now approach human accuracy on visual recognition tasks like image classification and face recognition. However, here we will show that feedforward neural networks struggle to learn abstract visual relations that are effortlessly recognized by non-human primates, birds, rodents and even insects. We systematically study the ability of feedforward neural networks to learn to recognize a variety of visual relations and demonstrate that same-different visual relations pose a particular strain on these networks. Networks fail to learn same-different visual relations when stimulus variability makes rote memorization difficult. Further, we show that learning same-different problems becomes trivial for a feedforward network that is fed with perceptually grouped stimuli. This demonstration and the comparative success of biological vision in learning visual relations suggests that feedback mechanisms such as attention, working memory and perceptual grouping may be the key components underlying human-level abstract visual reasoning.
Affiliation(s)
- Junkyung Kim
- Department of Cognitive, Linguistic & Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA
- Matthew Ricci
- Department of Cognitive, Linguistic & Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA
- Thomas Serre
- Department of Cognitive, Linguistic & Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA
3. van der Ham IJM, Brummelman J, Aerts ME, de Haan AM, Dijkerman HC. Lateralized pointing does not cause a cognitive bias. Cogn Process 2017; 19:17-25. [PMID: 28871445 DOI: 10.1007/s10339-017-0833-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2016] [Accepted: 08/30/2017] [Indexed: 10/18/2022]
Abstract
Lateralized pointing has been shown to cause not only a shift in visuo-motor midline, but also a shift in non-lateralized spatial attention. Non-lateralized cognitive consequences of lateralized pointing have been reported for local and global visuospatial processing. Here, we evaluate these findings and examine this effect for categorical and coordinate spatial relation processing, for which the attentional processes are thought to be highly similar to local and global visuospatial processing, respectively. Participants performed a commonly used working memory task to assess categorical and coordinate spatial relation processing. Lateralized pointing with either the left or the right hand, to either the left or the right side was introduced as a manipulation, as well as a new control condition without any pointing. Performance on the spatial relation task was measured before and after pointing. The results suggest that non-lateralized consequences of lateralized pointing cannot be generalized to other cognitive tasks relying on attentional processing. Further examination of lateralized pointing is recommended before drawing further conclusions concerning its impact on non-lateralized cognition.
Affiliation(s)
- Ineke J M van der Ham
- Department of Health, Medical, and Neuropsychology, Leiden University, Wassenaarseweg 52, 2333 AK, Leiden, The Netherlands
- Jantina Brummelman
- Department of Experimental Psychology, Helmholtz Institute Utrecht University, Utrecht, The Netherlands
- Marie Elise Aerts
- Department of Experimental Psychology, Helmholtz Institute Utrecht University, Utrecht, The Netherlands
- Alyanne M de Haan
- Department of Experimental Psychology, Helmholtz Institute Utrecht University, Utrecht, The Netherlands
- H Chris Dijkerman
- Department of Experimental Psychology, Helmholtz Institute Utrecht University, Utrecht, The Netherlands
4. Ruotolo F, Iachini T, Ruggiero G, van der Ham IJM, Postma A. Frames of reference and categorical/coordinate spatial relations in a "what was where" task. Exp Brain Res 2016; 234:2687-96. [PMID: 27180248 PMCID: PMC4978766 DOI: 10.1007/s00221-016-4672-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/14/2016] [Accepted: 05/05/2016] [Indexed: 11/30/2022]
Abstract
The aim of this study was to explore how people use egocentric (i.e., with respect to their body) and allocentric (i.e., with respect to another element in the environment) references in combination with coordinate (metric) or categorical (abstract) spatial information to identify a target element. Participants were asked to memorize triads of 3D objects or 2D figures, and immediately or after a delay of 5 s, they had to verbally indicate what was the object/figure: (1) closest/farthest to them (egocentric coordinate task); (2) on their right/left (egocentric categorical task); (3) closest/farthest to another object/figure (allocentric coordinate task); (4) on the right/left of another object/figure (allocentric categorical task). Results showed that the use of 2D figures favored categorical judgments over the coordinate ones with either an egocentric or an allocentric reference frame, whereas the use of 3D objects specifically favored egocentric coordinate judgments rather than the allocentric ones. Furthermore, egocentric judgments were more accurate than allocentric judgments when the response was immediate rather than delayed and 3D objects rather than 2D figures were used. This pattern of results is discussed in the light of the functional roles attributed to the frames of reference and spatial relations by relevant theories of visuospatial processing.
Affiliation(s)
- Francesco Ruotolo
- Helmholtz Institute, Experimental Psychology, Utrecht University, Utrecht, The Netherlands
- Laboratory of Cognitive Science and Immersive Virtual Reality, Department of Psychology, Second University of Naples, Caserta, Italy
- Tina Iachini
- Laboratory of Cognitive Science and Immersive Virtual Reality, Department of Psychology, Second University of Naples, Caserta, Italy
- Gennaro Ruggiero
- Laboratory of Cognitive Science and Immersive Virtual Reality, Department of Psychology, Second University of Naples, Caserta, Italy
- Ineke J M van der Ham
- Faculty of Social and Behavioral Sciences, Leiden University, Leiden, The Netherlands
- Albert Postma
- Helmholtz Institute, Experimental Psychology, Utrecht University, Utrecht, The Netherlands
5. van der Ham IJ, Postma A, Laeng B. Lateralized perception: The role of attention in spatial relation processing. Neurosci Biobehav Rev 2014; 45:142-8. [DOI: 10.1016/j.neubiorev.2014.05.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/16/2013] [Revised: 05/08/2014] [Accepted: 05/13/2014] [Indexed: 10/25/2022]
6. Franciotti R, D’Ascenzo S, Di Domenico A, Onofrj M, Tommasi L, Laeng B. Focusing narrowly or broadly attention when judging categorical and coordinate spatial relations: a MEG study. PLoS One 2013; 8:e83434. [PMID: 24386197 PMCID: PMC3873295 DOI: 10.1371/journal.pone.0083434] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/17/2013] [Accepted: 11/05/2013] [Indexed: 12/02/2022] Open
Abstract
We measured activity in the dorsal system of the human cortex with magnetoencephalography (MEG) during a matching-to-sample plus cueing paradigm, where participants judged the occurrence of changes in either categorical or coordinate spatial relations (e.g., exchanges of left versus right positions or changes in the relative distances) between images of pairs of animals. The attention window was primed in each trial to be either small or large by using cues that immediately preceded the matching image. In this manner, we could assess the modulatory effects of the scope of attention on the activity of the dorsal system of the human cortex during spatial relations processing. The MEG measurements revealed that large spatial cues yielded greater activations and longer peak latencies in the right inferior parietal lobe for coordinate trials, whereas small cues yielded greater activations and longer peak latencies in the left inferior parietal lobe for categorical trials. The activity in the superior parietal lobe, middle frontal gyrus, and visual cortex, was also modulated by the size of the spatial cues and by the type of spatial relation change. The present results support the theory that the lateralization of each kind of spatial processing hinges on differences in the sizes of regions of space attended to by the two hemispheres. In addition, the present findings are inconsistent with the idea of a right-hemispheric dominance for all kinds of challenging spatial tasks, since response times and accuracy rates showed that the categorical spatial relation task was more difficult than the coordinate task and the cortical activations were overall greater in the left hemisphere than in the right hemisphere.
Affiliation(s)
- Raffaella Franciotti
- Department of Neuroscience and Imaging, G. d’Annunzio University, Chieti, Italy
- ITAB, “G. d’Annunzio” University Foundation, Chieti, Italy
- Stefania D’Ascenzo
- Department of Neuroscience and Imaging, G. d’Annunzio University, Chieti, Italy
- Alberto Di Domenico
- Department of Neuroscience and Imaging, G. d’Annunzio University, Chieti, Italy
- Marco Onofrj
- Department of Neuroscience and Imaging, G. d’Annunzio University, Chieti, Italy
- Luca Tommasi
- Department of Psychology, Humanities and Territory, G. d’Annunzio University, Chieti, Italy
- Bruno Laeng
- Department of Psychology, University of Oslo, Oslo, Norway