1
Machida I, Shishikura M, Yamane Y, Sakai K. Representation of Natural Contours by a Neural Population in Monkey V4. eNeuro 2024; 11:ENEURO.0445-23.2024. [PMID: 38423791] [PMCID: PMC10946029] [DOI: 10.1523/eneuro.0445-23.2024]
Abstract
Cortical visual area V4 is considered to code contours that contribute to the intermediate-level representation of objects. Neural responses to the complex contour features intrinsic to natural contours are expected to clarify the essence of this representation. To approach the cortical coding of natural contours, we investigated the simultaneous coding of multiple contour features in monkey (Macaca fuscata) V4 neurons and their population-level representation. A substantial number of neurons showed significant tuning for two or more features such as curvature and closure, indicating that many V4 neurons simultaneously code multiple contour features. A large portion of the neurons responded vigorously to acutely curved contours that surrounded the center of the classical receptive field, suggesting that V4 neurons tend to code prominent features of object contours. The analysis of mutual information (MI) between the neural responses and each contour feature showed that most neurons exhibited similar magnitudes for each type of MI, indicating that the responses of many neurons depended on multiple contour features. We next examined the population-level representation by using multidimensional scaling analysis. The neural preference for multiple contour features increased along the primary axis, and the preference for natural over silhouette stimuli increased along the secondary axis, indicating the contribution of multiple contour features and surface textures to the population responses. Our analyses suggest that V4 neurons simultaneously code multiple contour features in natural images and represent contour and surface properties at the population level.
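The mutual-information analysis mentioned in this abstract can be sketched with a minimal plug-in estimator. This is illustrative only, not the authors' pipeline: the binning of firing rates and the binary "closure" feature below are invented toy data.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of the mutual information (in bits) between two
    equal-length discrete sequences: sum over joint outcomes of
    p(x,y) * log2( p(x,y) / (p(x) p(y)) )."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# toy data: binned firing rates vs. a binary "closure" feature per stimulus
rates   = [0, 0, 1, 1, 2, 2, 0, 1]
closure = [0, 0, 0, 0, 1, 1, 0, 1]
mi = mutual_information(rates, closure)  # ≈ 0.61 bits for this toy data
```

Computing one such MI value per contour feature, and comparing their magnitudes across a population, is the kind of analysis the abstract describes.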
Affiliation(s)
- Itsuki Machida
- Department of Computer Science, University of Tsukuba, Tsukuba 305-8573, Japan
- Motofumi Shishikura
- Department of Computer Science, University of Tsukuba, Tsukuba 305-8573, Japan
- Yukako Yamane
- Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa 904-0495, Japan
- Ko Sakai
- Department of Computer Science, University of Tsukuba, Tsukuba 305-8573, Japan
2
DiMattina C. Luminance texture boundaries and luminance step boundaries are segmented using different mechanisms. Vision Res 2022; 190:107968. [PMID: 34794083] [PMCID: PMC8712411] [DOI: 10.1016/j.visres.2021.107968]
Abstract
In natural scenes, two adjacent surfaces may differ in mean luminance without any sharp change in luminance at their boundary, but rather due to different relative proportions of light and dark regions within each surface. We refer to such boundaries as luminance texture boundaries (LTBs), and in this study we investigate whether LTBs are segmented using different mechanisms than luminance step boundaries (LSBs). We develop a novel method to generate luminance texture boundaries from natural uniform textures, and using these natural LTB stimuli in a boundary segmentation task, we find that observers are much more sensitive to identical luminance differences which are defined by textures (LTBs) than by uniform luminance steps (LSBs), consistent with the possibility of different mechanisms. In a second and third set of experiments, we characterize observer performance segmenting natural LTBs in the presence of masking LSBs which observers are instructed to ignore. We show that there is very little effect of masking LSBs on LTB segmentation performance. Furthermore, any masking effects we find are far less than those observed in a control experiment where both the masker and target are LSBs, and far less than those predicted by a model assuming identical mechanisms. Finally, we perform a fourth set of boundary segmentation experiments using artificial LTB stimuli comprised of differing proportions of white and black dots on opposite sides of the boundary. We find that these stimuli are also highly robust to masking by supra-threshold LSBs, consistent with our results using natural stimuli, and with our earlier studies using similar stimuli. Taken as a whole, these results suggest that the visual system contains mechanisms well suited to detecting surface boundaries that are robust to interference from luminance differences arising from luminance steps like those formed by cast shadows.
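The LTB/LSB distinction in this abstract can be illustrated with a toy one-dimensional stimulus sketch. This is not the authors' method, which derives LTBs from natural uniform textures; it only shows how the same mean-luminance difference can be carried either by a uniform step or by the proportion of white vs. black elements.

```python
import random

def step_boundary(n, delta):
    """Luminance step boundary (LSB): every element on the right side is
    brighter than the left by a uniform offset `delta`."""
    return [0.5] * n, [0.5 + delta] * n

def texture_boundary(n, delta, rng):
    """Luminance texture boundary (LTB): the same mean-luminance difference,
    carried only by the proportion of white (1.0) vs. black (0.0) elements,
    with no per-element luminance step."""
    left  = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    right = [1.0 if rng.random() < 0.5 + delta else 0.0 for _ in range(n)]
    return left, right

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(1)
lt, rt = texture_boundary(100_000, 0.1, rng)
ls, rs = step_boundary(100_000, 0.1)
# both boundary types carry (approximately) the same mean-luminance difference
```

The paper's finding is that observers segment these two stimulus classes with different sensitivity despite the matched mean difference.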
Affiliation(s)
- Christopher DiMattina
- Computational Perception Laboratory, Fort Myers, FL, USA 33965-6565; Department of Psychology, Florida Gulf Coast University, Fort Myers, FL, USA 33965-6565
3
Nobre ADP, de Melo GM, Gauer G, Wagemans J. Implicit processing during inattentional blindness: A systematic review and meta-analysis. Neurosci Biobehav Rev 2020; 119:355-375. [PMID: 33086130] [DOI: 10.1016/j.neubiorev.2020.10.005]
Abstract
The occurrence of implicit processing of visual stimuli during inattentional blindness is still a matter of debate. To assess the evidence available in this debate, we conducted a systematic review of articles that explored whether unexpected visual stimuli presented during inattentional blindness are implicitly processed despite not being reported. Additionally, we employed meta-analysis to combine 59 behavioral experiments and investigate the statistical support for such implicit processing across experiments. Results showed that visual stimuli can be processed when unattended and unnoticed. Additionally, we reviewed the measures used to assess participants' awareness of the unexpected stimuli. We also employed meta-analysis to search for differences in awareness of the unexpected stimuli that may result from adopting distinct criteria to categorize participants as aware or unaware. The results showed that the overall effect of awareness changed depending on whether more or less demanding measures of awareness were employed. This suggests that the choice of awareness measure may influence conclusions about whether processing of the unexpected stimuli is implicit or explicit. We discuss the implications of these results for the study of implicit processing and the role of attention in visual cognition.
Affiliation(s)
- Alexandre de Pontes Nobre
- Institute of Psychology, Federal University of Rio Grande do Sul, Ramiro Barcelos 2600, room 227, 90035-003, Porto Alegre, Rio Grande do Sul, Brazil; Brain & Cognition, KU Leuven, Tiensestraat 102, box 3711, 3000, Leuven, Belgium
- Gabriela Mueller de Melo
- Institute of Biosciences, University of São Paulo (USP), Rua do Matão, tv. 14, n° 321, Cidade Universitária, 05508-090, São Paulo, SP, Brazil
- Gustavo Gauer
- Institute of Psychology, Federal University of Rio Grande do Sul, Ramiro Barcelos 2600, room 227, 90035-003, Porto Alegre, Rio Grande do Sul, Brazil
- Johan Wagemans
- Brain & Cognition, KU Leuven, Tiensestraat 102, box 3711, 3000, Leuven, Belgium
4
Le QV, Le QV, Nishimaru H, Matsumoto J, Takamura Y, Hori E, Maior RS, Tomaz C, Ono T, Nishijo H. A Prototypical Template for Rapid Face Detection Is Embedded in the Monkey Superior Colliculus. Front Syst Neurosci 2020; 14:5. [PMID: 32158382] [PMCID: PMC7025518] [DOI: 10.3389/fnsys.2020.00005]
Abstract
Human babies respond preferentially to faces or face-like images. It has been proposed that an innate and rapid face detection system is present at birth before the cortical visual pathway is developed in many species, including primates. However, in primates, the visual area responsible for this process is yet to be unraveled. We hypothesized that the superior colliculus (SC), which receives direct and indirect retinal visual inputs, may serve as an innate rapid face-detection system in primates. To test this hypothesis, we examined the responsiveness of monkey SC neurons to first-order information of faces required for face detection (basic spatial layout of facial features including eyes, nose, and mouth), by analyzing neuronal responses to line drawing images of: (1) face-like patterns with contours and properly placed facial features; (2) nonface patterns including face contours only; and (3) nonface random patterns with contours and randomly placed facial features. Here, we show that SC neurons respond more strongly and rapidly to upright and inverted face-like patterns than to nonface patterns, regardless of contrast polarity and contour shapes. Furthermore, SC neurons with central receptive fields (RFs) were more selective to face-like patterns. In addition, the population activity of SC neurons with central RFs can discriminate face-like patterns from nonface patterns as early as 50 ms after stimulus onset. Our results provide strong neurophysiological evidence for the involvement of the primate SC in face detection and suggest the existence of a broadly tuned template for face detection in the subcortical visual pathway.
Affiliation(s)
- Quang Van Le
- System Emotional Science, Faculty of Medicine, University of Toyama, Toyama, Japan
- Quan Van Le
- System Emotional Science, Faculty of Medicine, University of Toyama, Toyama, Japan
- Hiroshi Nishimaru
- System Emotional Science, Faculty of Medicine, University of Toyama, Toyama, Japan
- Jumpei Matsumoto
- System Emotional Science, Faculty of Medicine, University of Toyama, Toyama, Japan
- Yusaku Takamura
- System Emotional Science, Faculty of Medicine, University of Toyama, Toyama, Japan
- Etsuro Hori
- System Emotional Science, Faculty of Medicine, University of Toyama, Toyama, Japan
- Rafael S Maior
- Primate Center and Laboratory of Neurosciences and Behavior, Department of Physiological Sciences, Institute of Biology, University of Brasília, Brasilia, Brazil
- Carlos Tomaz
- Laboratory of Neuroscience and Behavior, CEUMA University, São Luis, Brazil
- Taketoshi Ono
- System Emotional Science, Faculty of Medicine, University of Toyama, Toyama, Japan
- Hisao Nishijo
- System Emotional Science, Faculty of Medicine, University of Toyama, Toyama, Japan
5
Modelling face memory reveals task-generalizable representations. Nat Hum Behav 2019; 3:817-826. [PMID: 31209368] [DOI: 10.1038/s41562-019-0625-3]
Abstract
Current cognitive theories are cast in terms of information-processing mechanisms that use mental representations [1-4]. For example, people use their mental representations to identify familiar faces under various conditions of pose, illumination and ageing, or to draw resemblance between family members. Yet, the actual information contents of these representations are rarely characterized, which hinders knowledge of the mechanisms that use them. Here, we modelled the three-dimensional representational contents of 4 faces that were familiar to 14 participants as work colleagues. The representational contents were created by reverse-correlating identity information generated on each trial with judgements of the face's similarity to the individual participant's memory of this face. In a second study, testing new participants, we demonstrated the validity of the modelled contents using everyday face tasks that generalize identity judgements to new viewpoints, age and sex. Our work highlights that such models of mental representations are critical to understanding generalization behaviour and its underlying information-processing mechanisms.
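The reverse-correlation logic in this abstract can be sketched in a generic linear form. The paper works with a three-dimensional generative model of face identity; the version below is a simplified stand-in with an invented "hidden template" and simulated ratings, showing only the core idea: weight each trial's stimulus features by the (mean-centered) similarity rating and average.

```python
import random

def reverse_correlate(trials, ratings):
    """Generic linear reverse correlation: estimate the observer's internal
    template as the rating-weighted average of trial feature vectors."""
    mean_r = sum(ratings) / len(ratings)
    weights = [r - mean_r for r in ratings]          # mean-center the ratings
    norm = sum(abs(w) for w in weights) or 1.0
    template = [0.0] * len(trials[0])
    for vec, w in zip(trials, weights):
        for i, v in enumerate(vec):
            template[i] += w * v / norm
    return template

# simulate an observer whose similarity ratings follow a hidden template
rng = random.Random(0)
hidden = [1.0, -1.0, 0.5]
trials = [[rng.gauss(0.0, 1.0) for _ in hidden] for _ in range(2000)]
ratings = [sum(h * v for h, v in zip(hidden, t)) for t in trials]
est = reverse_correlate(trials, ratings)  # recovers the sign pattern of `hidden`
```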
6
Papale P, Betta M, Handjaras G, Malfatti G, Cecchetti L, Rampinini A, Pietrini P, Ricciardi E, Turella L, Leo A. Common spatiotemporal processing of visual features shapes object representation. Sci Rep 2019; 9:7601. [PMID: 31110195] [PMCID: PMC6527710] [DOI: 10.1038/s41598-019-43956-3]
Abstract
Biological vision relies on representations of the physical world at different levels of complexity. Relevant features span from simple low-level properties, such as contrast and spatial frequencies, to object-based attributes, such as shape and category. However, how these features are integrated into coherent percepts is still debated. Moreover, these dimensions often share common biases: for instance, stimuli from the same category (e.g., tools) may have similar shapes. Here, using magnetoencephalography, we revealed the temporal dynamics of feature processing in human subjects attending to objects from six semantic categories. By employing Relative Weights Analysis, we mitigated collinearity between model-based descriptions of stimuli and showed that low-level properties (contrast and spatial frequencies), shape (medial-axis) and category are represented within the same spatial locations early in time: 100–150 ms after stimulus onset. This fast and overlapping processing may result from independent parallel computations, with categorical representation emerging later than the onset of low-level feature processing, yet before shape coding. Categorical information is represented both before and after shape, suggesting a role for this feature in the refinement of categorical matching.
Affiliation(s)
- Paolo Papale
- Momilab, IMT School for Advanced Studies Lucca, 55100, Lucca, Italy
- Monica Betta
- Momilab, IMT School for Advanced Studies Lucca, 55100, Lucca, Italy
- Giulia Malfatti
- Center for Mind/Brain Sciences (CIMeC), University of Trento, 38068, Trento, Italy
- Luca Cecchetti
- Momilab, IMT School for Advanced Studies Lucca, 55100, Lucca, Italy
- Pietro Pietrini
- Momilab, IMT School for Advanced Studies Lucca, 55100, Lucca, Italy
- Luca Turella
- Center for Mind/Brain Sciences (CIMeC), University of Trento, 38068, Trento, Italy
- Andrea Leo
- Momilab, IMT School for Advanced Studies Lucca, 55100, Lucca, Italy
7
Papale P, Leo A, Cecchetti L, Handjaras G, Kay KN, Pietrini P, Ricciardi E. Foreground-Background Segmentation Revealed during Natural Image Viewing. eNeuro 2018; 5:ENEURO.0075-18.2018. [PMID: 29951579] [PMCID: PMC6019392] [DOI: 10.1523/eneuro.0075-18.2018]
Abstract
Foreground-background segmentation is one of the major challenges in visual neuroscience. Data from nonhuman primates show that segmentation leads to two distinct but associated processes: the enhancement of neural activity during figure processing (i.e., foreground enhancement) and the suppression of background-related activity (i.e., background suppression). To study foreground-background segmentation in ecological conditions, we introduce a novel method based on parametric modulation of low-level image properties followed by application of simple computational image-processing models. By correlating the outcome of this procedure with human fMRI activity, measured during passive viewing of 334 natural images, we produced easily interpretable "correlation images" from visual populations. Results show evidence of foreground enhancement in all tested regions, from V1 to lateral occipital complex (LOC), while background suppression occurs in V4 and LOC only. Correlation images derived from V4 and LOC revealed a preserved spatial resolution of foreground textures, indicating a richer representation of the salient part of natural images, rather than a simplistic model of object shape. Our results indicate that scene segmentation occurs during natural viewing, even when individuals are not required to perform any particular task.
Affiliation(s)
- Paolo Papale
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100 Italy
- Andrea Leo
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100 Italy
- Luca Cecchetti
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100 Italy
- Giacomo Handjaras
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100 Italy
- Kendrick N. Kay
- Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Twin Cities, Minneapolis, MN, 55455
- Pietro Pietrini
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100 Italy
- Emiliano Ricciardi
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100 Italy
8
Neri P. Object segmentation controls image reconstruction from natural scenes. PLoS Biol 2017; 15:e1002611. [PMID: 28827801] [PMCID: PMC5565198] [DOI: 10.1371/journal.pbio.1002611]
Abstract
The structure of the physical world projects images onto our eyes. However, those images are often poorly representative of environmental structure: well-defined boundaries within the eye may correspond to irrelevant features of the physical world, while critical features of the physical world may be nearly invisible at the retinal projection. The challenge for the visual cortex is to sort these two types of features according to their utility in ultimately reconstructing percepts and interpreting the constituents of the scene. We describe a novel paradigm that enabled us to selectively evaluate the relative role played by these two feature classes in signal reconstruction from corrupted images. Our measurements demonstrate that this process is quickly dominated by the inferred structure of the environment, and only minimally controlled by variations of raw image content. The inferential mechanism is spatially global and its impact on early visual cortex is fast. Furthermore, it retunes local visual processing for more efficient feature extraction without altering the intrinsic transduction noise. The basic properties of this process can be partially captured by a combination of small-scale circuit models and large-scale network architectures. Taken together, our results challenge compartmentalized notions of bottom-up/top-down perception and suggest instead that these two modes are best viewed as an integrated perceptual mechanism.

Biological vision is designed to discover the structure of the environment around us. To do this, it relies on ambiguous and often misleading information from the eyes: the boundary of a critical object may be invisible against a background of similar appearance, and may be overlooked in favour of the sharp contour projected by an irrelevant shadow. It remains unclear how human vision sorts different image features according to their relevance to the layout of objects within the scene.
We demonstrate that vision achieves this goal via a specialized perceptual system for object segmentation that is one and the same with the feature extraction system: immediately after information is relayed to cortex by the eyes, the process of reconstructing image content from local features is controlled by a dedicated inferential mechanism that attempts to recover the underlying environmental structure; perception is quickly organized around the operation of this mechanism, which becomes the primary contextual influence on image reconstruction. The integrated nature of this perceptual mechanism defies current notions of separate top-down and bottom-up processes, offering a fresh view of how human vision operates on natural signals.
Affiliation(s)
- Peter Neri
- Laboratoire des Systèmes Perceptifs, Département d'études cognitives, Ecole Normale Supérieure, PSL Research University, CNRS, Paris, France
9
Gauker C. Three Kinds of Nonconceptual Seeing-as. Review of Philosophy and Psychology 2017; 8:763-779. [PMID: 29104705] [PMCID: PMC5660133] [DOI: 10.1007/s13164-017-0339-2]
Abstract
It is commonly supposed that perceptual representations in some way embed concepts and that this embedding accounts for the phenomenon of seeing-as. But there are good reasons, which will be reviewed here, to doubt that perceptions embed concepts. The alternative is to suppose that perceptions are marks in a perceptual similarity space that map into locations in an objective quality space. From this point of view, there are at least three sorts of seeing-as. First, in cases of ambiguity resolution (such as the duck-rabbit), the schematicity of the figure leaves us with a choice as to where in perceptual similarity space to place a mark (closer to the marks that represent rabbits or closer to the marks that represent ducks). Second, in cases where expertise affects perception (as when, for example, we learn to distinguish various kinds of tree leaves), the accumulation of perceptual landmarks permits a more precise placement of a mark in perceptual similarity space. Third, extensive experience with an object (e.g., the family dog) allows similarity to that object to serve as an acquired dimension in perceptual similarity space, which in turn affects the relative similarities of other objects.
Affiliation(s)
- Christopher Gauker
- Department of Philosophy, Faculty of Cultural and Social Sciences, University of Salzburg, Franziskanergasse 1, 5020 Salzburg, Austria
10
Modality-independent encoding of individual concepts in the left parietal cortex. Neuropsychologia 2017; 105:39-49. [PMID: 28476573] [DOI: 10.1016/j.neuropsychologia.2017.05.001]
Abstract
The organization of semantic information in the brain has been mainly explored through category-based models, on the assumption that categories broadly reflect the organization of conceptual knowledge. However, the analysis of concepts as individual entities, rather than as items belonging to distinct superordinate categories, may represent a significant advancement in the comprehension of how conceptual knowledge is encoded in the human brain. Here, we studied the individual representation of thirty concrete nouns from six different categories, across different sensory modalities (i.e., auditory and visual) and groups (i.e., sighted and congenitally blind individuals) in a core hub of the semantic network, the left angular gyrus, and in its neighboring regions within the lateral parietal cortex. Four models based on either perceptual or semantic features at different levels of complexity (i.e., low- or high-level) were used to predict fMRI brain activity using representational similarity encoding analysis. When controlling for the superordinate component, high-level models based on semantic and shape information led to significant encoding accuracies in the intraparietal sulcus only. This region is involved in feature binding and combination of concepts across multiple sensory modalities, suggesting its role in high-level representation of conceptual knowledge. Moreover, when the information regarding superordinate categories is retained, a large extent of parietal cortex is engaged. This result indicates the need to control for the coarse-level categorical organization when performing studies on higher-level processes related to the retrieval of semantic information.
11
Kubilius J, Sleurs C, Wagemans J. Sensitivity to Nonaccidental Configurations of Two-Line Stimuli. Iperception 2017; 8:2041669517699628. [PMID: 28491272] [PMCID: PMC5405893] [DOI: 10.1177/2041669517699628]
Abstract
According to Recognition-By-Components theory, object recognition relies on a specific subset of three-dimensional shapes called geons. In particular, these configurations constitute a powerful cue to three-dimensional object reconstruction because their two-dimensional projection remains viewpoint-invariant. While a large body of literature has demonstrated sensitivity to changes in these so-called nonaccidental configurations, it remains unclear what information is used in establishing such sensitivity. In this study, we explored the possibility that nonaccidental configurations can already be inferred from the basic constituents of objects, namely, their edges. We constructed a set of stimuli composed of two lines corresponding to various nonaccidental properties and configurations underlying the distinction between geons, including collinearity, alignment, curvature of contours, curvature of configuration axis, expansion, cotermination, and junction type. Using a simple visual search paradigm, we demonstrated that participants were faster at detecting targets that differed from distractors in a nonaccidental property than in a metric property. We also found that only some but not all of the observed sensitivity could have resulted from simple low-level properties of our stimuli. Given that such sensitivity emerged from a configuration of only two lines, our results support the view that nonaccidental configurations could be encoded throughout the visual processing hierarchy even in the absence of object context.
12
Kubilius J, Bracci S, Op de Beeck HP. Deep Neural Networks as a Computational Model for Human Shape Sensitivity. PLoS Comput Biol 2016; 12:e1004896. [PMID: 27124699] [PMCID: PMC4849740] [DOI: 10.1371/journal.pcbi.1004896]
Abstract
Theories of object recognition agree that shape is of primordial importance, but there is no consensus about how shape might be represented, and so far attempts to implement a model of shape perception that would work with realistic stimuli have largely failed. Recent studies suggest that state-of-the-art convolutional ‘deep’ neural networks (DNNs) capture important aspects of human object perception. We hypothesized that these successes might be partially related to a human-like representation of object shape. Here we demonstrate that sensitivity for shape features, characteristic to human and primate vision, emerges in DNNs when trained for generic object recognition from natural photographs. We show that these models explain human shape judgments for several benchmark behavioral and neural stimulus sets on which earlier models mostly failed. In particular, although never explicitly trained for such stimuli, DNNs develop acute sensitivity to minute variations in shape and to non-accidental properties that have long been implicated to form the basis for object recognition. Even more strikingly, when tested with a challenging stimulus set in which shape and category membership are dissociated, the most complex model architectures capture human shape sensitivity as well as some aspects of the category structure that emerges from human judgments. As a whole, these results indicate that convolutional neural networks not only learn physically correct representations of object categories but also develop perceptually accurate representational spaces of shapes. An even more complete model of human object representations might be in sight by training deep architectures for multiple tasks, which is so characteristic in human development.

Shape plays an important role in object recognition. Despite years of research, no models of vision could account for shape understanding as found in human vision of natural images.
Given recent successes of deep neural networks (DNNs) in object recognition, we hypothesized that DNNs might in fact learn to capture perceptually salient shape dimensions. Using a variety of stimulus sets, we demonstrate here that the output layers of several DNNs develop representations that relate closely to human perceptual shape judgments. Surprisingly, such sensitivity to shape develops in these models even though they were never explicitly trained for shape processing. Moreover, we show that these models also represent categorical object similarity that follows human semantic judgments, albeit to a lesser extent. Taken together, our results bring forward the exciting idea that DNNs capture not only objective dimensions of stimuli, such as their category, but also their subjective, or perceptual, aspects, such as shape and semantic similarity as judged by humans.
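A common way to quantify how well a model's representational space matches human judgments, in the spirit of the comparisons this abstract describes, is representational similarity analysis: correlate the model's pairwise-distance structure with human dissimilarity ratings. The sketch below uses invented toy feature vectors and ratings, not the paper's DNNs or stimulus sets.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def rdm(features):
    """Upper triangle of the pairwise Euclidean-distance matrix
    (a representational dissimilarity matrix, RDM)."""
    out = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            d = sum((p - q) ** 2 for p, q in zip(features[i], features[j])) ** 0.5
            out.append(d)
    return out

# toy example: 4 "stimuli" in a model feature space, plus hypothetical human
# dissimilarity ratings for the 6 stimulus pairs (same pair ordering as rdm)
model_feats  = [[0, 0], [0, 1], [3, 0], [3, 1]]
human_dissim = [1, 9, 9, 9, 9, 1]
fit = pearson(rdm(model_feats), human_dissim)  # high when the spaces agree
```

Repeating this for each layer of a DNN, and for shape-based vs. category-based human judgments, is one standard way to localize where human-like shape sensitivity emerges.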
Affiliation(s)
- Jonas Kubilius
- Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Stefania Bracci
- Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Hans P. Op de Beeck
- Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
13
Gilaie-Dotan S. Which visual functions depend on intermediate visual regions? Insights from a case of developmental visual form agnosia. Neuropsychologia 2016. [DOI: 10.1016/j.neuropsychologia.2015.07.023]
14
Rodríguez-Sánchez AJ, Fallah M, Leonardis A. Editorial: Hierarchical Object Representations in the Visual Cortex and Computer Vision. Front Comput Neurosci 2015; 9:142. [PMID: 26635595] [PMCID: PMC4653288] [DOI: 10.3389/fncom.2015.00142]
Affiliation(s)
- Antonio J Rodríguez-Sánchez
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Mazyar Fallah
- Visual Perception and Attention Laboratory, Centre for Vision Research, School of Kinesiology and Health Science, York University, Toronto, ON, Canada
- Aleš Leonardis
- School of Computer Science, University of Birmingham, Birmingham, UK