1. Bogadhi AR, Hafed ZM. Express detection of visual objects by primate superior colliculus neurons. Sci Rep 2023; 13:21730. [PMID: 38066070 PMCID: PMC10709564 DOI: 10.1038/s41598-023-48979-5]
Abstract
Primate superior colliculus (SC) neurons exhibit visual feature tuning properties and are implicated in a subcortical network hypothesized to mediate fast threat and/or conspecific detection. However, the mechanisms through which SC neurons contribute to peripheral object detection, for supporting rapid orienting responses, remain unclear. Here we explored whether, and how quickly, SC neurons detect real-life object stimuli. We presented experimentally-controlled gray-scale images of seven different object categories, and their corresponding luminance- and spectral-matched image controls, within the extrafoveal response fields of SC neurons. We found that all of our functionally-identified SC neuron types preferentially detected real-life objects even in their very first stimulus-evoked visual bursts. Intriguingly, even visually-responsive motor-related neurons exhibited such robust early object detection. We further identified spatial frequency information in visual images as an important, but not exhaustive, source for the earliest (within 100 ms) but not for the late (after 100 ms) component of object detection by SC neurons. Our results demonstrate rapid and robust detection of extrafoveal visual objects by the SC. Besides supporting recent evidence that even SC saccade-related motor bursts can preferentially represent visual objects, these results reveal a plausible mechanism through which rapid orienting responses to extrafoveal visual objects can be mediated.
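A standard way to construct the spectral-matched controls described above is Fourier phase scrambling, which preserves an image's amplitude (spatial-frequency) spectrum and mean luminance while destroying its object structure. A minimal sketch of that general technique in Python, assuming a grayscale image as a NumPy array (illustrative only, not the authors' stimulus code):

```python
import numpy as np

def phase_scrambled_control(img: np.ndarray, seed: int = 0) -> np.ndarray:
    """Control image with the same amplitude spectrum (spatial-frequency
    content) and mean luminance as `img`, but with scrambled phase."""
    rng = np.random.default_rng(seed)
    amplitude = np.abs(np.fft.fft2(img))
    # Taking the phase of a random *real* image gives a Hermitian-symmetric
    # phase field, so the inverse transform below is (numerically) real.
    random_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    control = np.real(np.fft.ifft2(amplitude * np.exp(1j * random_phase)))
    return control + (img.mean() - control.mean())  # re-match mean luminance
```

In the logic of the study, any response difference between an object image and such a control reflects detection that spatial-frequency content alone cannot explain.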
Affiliation(s)
- Amarender R Bogadhi
- Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, Otfried-Müller Str. 25, 72076, Tübingen, Germany
- Hertie Institute for Clinical Brain Research, University of Tübingen, 72076, Tübingen, Germany
- Central Nervous System Diseases Research, Boehringer Ingelheim Pharma GmbH & Co. KG, 88400, Biberach, Germany
- Ziad M Hafed
- Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, Otfried-Müller Str. 25, 72076, Tübingen, Germany.
- Hertie Institute for Clinical Brain Research, University of Tübingen, 72076, Tübingen, Germany.
2. Xu Y. Parietal-driven visual working memory representation in occipito-temporal cortex. Curr Biol 2023; 33:4516-4523.e5. [PMID: 37741281 PMCID: PMC10615870 DOI: 10.1016/j.cub.2023.08.080]
Abstract
Human fMRI studies have documented extensively that the content of visual working memory (VWM) can be reliably decoded from fMRI voxel response patterns during the delay period in both the occipito-temporal cortex (OTC), including early visual areas (EVC), and the posterior parietal cortex (PPC).1,2,3,4 Further work has revealed that VWM signal in OTC is largely sustained by feedback from associative areas such as prefrontal cortex (PFC) and PPC.4,5,6,7,8,9 It is unclear, however, if feedback during VWM simply restores sensory representations initially formed in OTC or if it can reshape the representational content of OTC during VWM delay. Taking advantage of a recent finding showing that object representational geometry differs between OTC and PPC in perception,10 here we find that, during VWM delay, the object representational geometry in OTC becomes more aligned with that of PPC during perception than with itself during perception. This finding supports the role of feedback in shaping the content of VWM in OTC, with the VWM content of OTC more determined by information retained in PPC than by the sensory information initially encoded in OTC.
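The central comparison is between representational geometries, summarized as representational dissimilarity matrices (RDMs). A schematic of the three-way comparison on simulated patterns (region names, array shapes, and data are illustrative, not the study's pipeline):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_objects = 24
otc_delay = rng.standard_normal((n_objects, 200))    # OTC patterns, VWM delay
otc_percept = rng.standard_normal((n_objects, 200))  # OTC patterns, perception
ppc_percept = rng.standard_normal((n_objects, 150))  # PPC patterns, perception

def rdm(patterns):
    """Object-by-object representational geometry (1 - Pearson r, vectorized)."""
    return pdist(patterns, metric="correlation")

# If delay-period OTC geometry correlates more with PPC's perceptual geometry
# than with OTC's own, feedback has reshaped OTC's working-memory content.
print("OTC delay vs PPC perception:", spearmanr(rdm(otc_delay), rdm(ppc_percept))[0])
print("OTC delay vs OTC perception:", spearmanr(rdm(otc_delay), rdm(otc_percept))[0])
```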
Affiliation(s)
- Yaoda Xu
- Department of Psychology, Yale University, 100 College Street, New Haven, CT 06510, USA.
3. An expanded neural framework for shape perception. Trends Cogn Sci 2023; 27:212-213. [PMID: 36635181 DOI: 10.1016/j.tics.2022.12.001]
4. Xu Y. Global object shape representations in the primate brain. Trends Cogn Sci 2023; 27:210-211. [PMID: 36635178 DOI: 10.1016/j.tics.2022.11.004]
Affiliation(s)
- Yaoda Xu
- Yale University, New Haven, CT 06520, USA.
5. Yargholi E, Hossein-Zadeh GA, Vaziri-Pashkam M. Two distinct networks containing position-tolerant representations of actions in the human brain. Cereb Cortex 2023; 33:1462-1475. [PMID: 35511702 PMCID: PMC10310977 DOI: 10.1093/cercor/bhac149]
Abstract
Humans can recognize others' actions in the social environment. This action recognition ability is rarely hindered by the movement of people in the environment. The neural basis of this position tolerance for observed actions is not fully understood. Here, we aimed to identify brain regions capable of generalizing representations of actions across different positions and investigate the representational content of these regions. In a functional magnetic resonance imaging experiment, participants viewed point-light displays of different human actions. Stimuli were presented in either the upper or the lower visual field. Multivariate pattern analysis and a surface-based searchlight approach were employed to identify brain regions that contain position-tolerant action representation: Classifiers were trained with patterns in response to stimuli presented in one position and were tested with stimuli presented in another position. Results showed above-chance classification in the left and right lateral occipitotemporal cortices, right intraparietal sulcus, and right postcentral gyrus. Further analyses exploring the representational content of these regions showed that responses in the lateral occipitotemporal regions were more related to subjective judgments, while those in the parietal regions were more related to objective measures. These results provide evidence for two networks that contain abstract representations of human actions with distinct representational content.
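The position-tolerance test is a cross-decoding analysis: a classifier is trained on response patterns from one visual-field position and tested on patterns from the other. A minimal sketch with scikit-learn on simulated data (array shapes and labels are illustrative, not the authors' pipeline):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 100, 200
y = rng.integers(0, 2, n_trials)                     # two action categories
X_upper = rng.standard_normal((n_trials, n_voxels))  # patterns, upper field
X_lower = rng.standard_normal((n_trials, n_voxels))  # patterns, lower field

# Train at one position, test at the other; above-chance accuracy marks a
# searchlight sphere carrying position-tolerant action information.
acc = LinearSVC().fit(X_upper, y).score(X_lower, y)
print(f"cross-position decoding accuracy: {acc:.2f} (chance = 0.50)")
```

Run inside a surface-based searchlight (and averaged over both train/test directions), this score is what the above-chance occipitotemporal and parietal clusters reflect.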
Affiliation(s)
- Elahé Yargholi
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran 1956836484, Iran
- Laboratory of Biological Psychology, Department of Brain and Cognition, Leuven Brain Institute, Katholieke Universiteit Leuven, Leuven 3714, Belgium
- Gholam-Ali Hossein-Zadeh
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran 1956836484, Iran
- School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran 1439957131, Iran
- Maryam Vaziri-Pashkam
- Laboratory of Brain and Cognition, National Institute of Mental Health (NIMH), Bethesda, MD 20814, United States
6. Disentangling Object Category Representations Driven by Dynamic and Static Visual Input. J Neurosci 2023; 43:621-634. [PMID: 36639892 PMCID: PMC9888510 DOI: 10.1523/jneurosci.0371-22.2022]
Abstract
Humans can label and categorize objects in a visual scene with high accuracy and speed, a capacity well characterized with studies using static images. However, motion is another cue that could be used by the visual system to classify objects. To determine how motion-defined object category information is processed by the brain in the absence of luminance-defined form information, we created a novel stimulus set of "object kinematograms" to isolate motion-defined signals from other sources of visual information. Object kinematograms were generated by extracting motion information from videos of 6 object categories and applying the motion to limited-lifetime random dot patterns. Using functional magnetic resonance imaging (fMRI) (n = 15, 40% women), we investigated whether category information from the object kinematograms could be decoded within the occipitotemporal and parietal cortex and evaluated whether the information overlapped with category responses to static images from the original videos. We decoded object category for both stimulus formats in all higher-order regions of interest (ROIs). More posterior occipitotemporal and ventral regions showed higher accuracy in the static condition, while more anterior occipitotemporal and dorsal regions showed higher accuracy in the dynamic condition. Further, decoding across the two stimulus formats was possible in all regions. These results demonstrate that motion cues can elicit widespread and robust category responses on par with those elicited by static luminance cues, even in ventral regions of visual cortex that have traditionally been associated with primarily image-defined form processing.

SIGNIFICANCE STATEMENT Much research on visual object recognition has focused on recognizing objects in static images. However, motion is a rich source of information that humans might also use to categorize objects. Here, we present the first study to compare neural representations of several animate and inanimate objects when category information is presented in two formats: static cues or isolated dynamic motion cues. Our study shows that, while higher-order brain regions differentially process object categories depending on format, they also contain robust, abstract category representations that generalize across format. These results expand our previous understanding of motion-derived animate and inanimate object category processing and provide useful tools for future research on object category processing driven by multiple sources of visual information.
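The limited-lifetime dot logic behind object kinematograms can be sketched compactly: on each frame, dots move by the local motion vector extracted from the source video, and dots that expire or exit are reborn at random positions, so static dot layout never reveals object form. A toy version, assuming a dense flow field from a separate optical-flow step (all names and parameters are illustrative):

```python
import numpy as np

def update_dots(xy, age, flow, lifetime=5, rng=None):
    """Advance limited-lifetime dots one frame along a dense motion field.

    xy   : (n, 2) float array of dot positions (x, y)
    age  : (n,) int array of frames each dot has been alive
    flow : (H, W, 2) per-pixel motion vectors extracted from the source video
    """
    rng = rng or np.random.default_rng()
    h, w = flow.shape[:2]
    row = np.clip(xy[:, 1].astype(int), 0, h - 1)
    col = np.clip(xy[:, 0].astype(int), 0, w - 1)
    xy = xy + flow[row, col]          # displace each dot by the local motion
    age = age + 1
    # Expired or out-of-frame dots are reborn at random locations, so only
    # the motion field, never the dot layout, carries category information.
    dead = (age >= lifetime) | (xy[:, 0] < 0) | (xy[:, 0] >= w) \
           | (xy[:, 1] < 0) | (xy[:, 1] >= h)
    xy[dead] = rng.uniform([0, 0], [w, h], size=(int(dead.sum()), 2))
    age[dead] = 0
    return xy, age
```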
7. Ayzenberg V, Simmons C, Behrmann M. Temporal asymmetries and interactions between dorsal and ventral visual pathways during object recognition. Cereb Cortex Commun 2023; 4:tgad003. [PMID: 36726794 PMCID: PMC9883614 DOI: 10.1093/texcom/tgad003]
Abstract
Despite their anatomical and functional distinctions, there is growing evidence that the dorsal and ventral visual pathways interact to support object recognition. However, the exact nature of these interactions remains poorly understood. Is the presence of identity-relevant object information in the dorsal pathway simply a byproduct of ventral input? Or, might the dorsal pathway be a source of input to the ventral pathway for object recognition? In the current study, we used high-density EEG, a technique with high temporal precision and spatial resolution sufficient to distinguish parietal and temporal lobes, to characterise the dynamics of dorsal and ventral pathways during object viewing. Using multivariate analyses, we found that category decoding in the dorsal pathway preceded that in the ventral pathway. Importantly, the dorsal pathway predicted the multivariate responses of the ventral pathway in a time-dependent manner, rather than the other way around. Together, these findings suggest that the dorsal pathway is a critical source of input to the ventral pathway for object recognition.
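The direction of the dorsal-ventral interaction can be illustrated with time-lagged multivariate regression: do dorsal patterns at time t predict ventral patterns at t + lag better than the reverse? A schematic on simulated source-level patterns (a simplification of the paper's analysis; all names and numbers are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_times, n_feat = 300, 30
dorsal = rng.standard_normal((n_times, n_feat))   # dorsal pattern per time point
ventral = rng.standard_normal((n_times, n_feat))  # ventral pattern per time point

def lagged_r2(source, target, lag):
    """How well source patterns at t predict target patterns at t + lag."""
    X, y = source[:-lag], target[lag:]
    return cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()

lag = 10  # e.g., ~40 ms at a 250 Hz sampling rate
print("dorsal -> ventral:", lagged_r2(dorsal, ventral, lag))
print("ventral -> dorsal:", lagged_r2(ventral, dorsal, lag))
```

An asymmetry favoring dorsal -> ventral is the signature the authors report; on the random data above, both directions hover near zero.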
Affiliation(s)
- Vladislav Ayzenberg
- Neuroscience Institute and Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Claire Simmons
- School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Marlene Behrmann
- Neuroscience Institute and Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Department of Ophthalmology, University of Pittsburgh, Pittsburgh, PA 15213, USA
8. Yoshihara S, Fukiage T, Nishida S. Does training with blurred images bring convolutional neural networks closer to humans with respect to robust object recognition and internal representations? Front Psychol 2023; 14:1047694. [PMID: 36874839 PMCID: PMC9975555 DOI: 10.3389/fpsyg.2023.1047694]
Abstract
It has been suggested that perceiving blurry images in addition to sharp images contributes to the development of robust human visual processing. To computationally investigate the effect of exposure to blurry images, we trained convolutional neural networks (CNNs) on ImageNet object recognition with a variety of combinations of sharp and blurred images. In agreement with recent reports, mixed training on blurred and sharp images (B+S training) brings CNNs closer to humans with respect to robust object recognition against a change in image blur. B+S training also slightly reduces the texture bias of CNNs in recognition of shape-texture cue conflict images, but the effect is not strong enough to achieve human-level shape bias. Other tests also suggest that B+S training cannot produce robust human-like object recognition based on global configuration features. Using representational similarity analysis and zero-shot transfer learning, we also show that B+S-Net does not facilitate blur-robust object recognition through separate specialized sub-networks, one network for sharp images and another for blurry images, but through a single network analyzing image features common across sharp and blurry images. However, blur training alone does not automatically create a mechanism like the human brain in which sub-band information is integrated into a common representation. Our analysis suggests that experience with blurred images may help the human brain recognize objects in blurred images, but that alone does not lead to robust, human-like object recognition.
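The B+S regime amounts to mixing blurred and sharp images during training. A minimal sketch of such an augmentation in PyTorch (blur probability and sigma range are placeholders, not the paper's exact settings):

```python
from torchvision import transforms

# Each training image is Gaussian-blurred with probability 0.5 (B+S training);
# sigma is sampled per image, so the network sees a range of blur levels.
bs_train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=21, sigma=(1.0, 4.0))], p=0.5
    ),
    transforms.ToTensor(),
])

# e.g.: torchvision.datasets.ImageFolder("imagenet/train",
#                                        transform=bs_train_transform)
```

Setting p=0 recovers sharp-only training and p=1 blur-only training, so the different regimes can be compared under one pipeline.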
Affiliation(s)
- Sou Yoshihara
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan
- Taiki Fukiage
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, Japan
- Shin'ya Nishida
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, Japan
9. Ayzenberg V, Behrmann M. Does the brain's ventral visual pathway compute object shape? Trends Cogn Sci 2022; 26:1119-1132. [PMID: 36272937 DOI: 10.1016/j.tics.2022.09.019]
Abstract
A rich behavioral literature has shown that human object recognition is supported by a representation of shape that is tolerant to variations in an object's appearance. Such 'global' shape representations are achieved by describing objects via the spatial arrangement of their local features, or structure, rather than by the appearance of the features themselves. However, accumulating evidence suggests that the ventral visual pathway - the primary substrate underlying object recognition - may not represent global shape. Instead, ventral representations may be better described as a basis set of local image features. We suggest that this evidence forces a reevaluation of the role of the ventral pathway in object perception and posits a broader network for shape perception that encompasses contributions from the dorsal pathway.
Affiliation(s)
- Vladislav Ayzenberg
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
- Marlene Behrmann
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; The Department of Ophthalmology, University of Pittsburgh, Pittsburgh, PA 15260, USA.
10. Mocz V, Vaziri-Pashkam M, Chun M, Xu Y. Predicting Identity-Preserving Object Transformations in Human Posterior Parietal Cortex and Convolutional Neural Networks. J Cogn Neurosci 2022; 34:2406-2435. [PMID: 36122358 PMCID: PMC9988239 DOI: 10.1162/jocn_a_01916]
Abstract
Previous research shows that, within human occipito-temporal cortex (OTC), we can use a general linear mapping function to link visual object responses across nonidentity feature changes, including Euclidean features (e.g., position and size) and non-Euclidean features (e.g., image statistics and spatial frequency). Although the learned mapping is capable of predicting responses of objects not included in training, these predictions are better for categories included than those not included in training. These findings demonstrate a near-orthogonal representation of object identity and nonidentity features throughout human OTC. Here, we extended these findings to examine the mapping across both Euclidean and non-Euclidean feature changes in human posterior parietal cortex (PPC), including functionally defined regions in inferior and superior intraparietal sulcus. We additionally examined responses in five convolutional neural networks (CNNs) pretrained with object classification, as CNNs are considered the current best model of the primate ventral visual system. We separately compared results from PPC and CNNs with those of OTC. We found that a linear mapping function could successfully link object responses in different states of nonidentity transformations in human PPC and CNNs for both Euclidean and non-Euclidean features. Overall, we found that object identity and nonidentity features are represented in a near-orthogonal, rather than completely orthogonal, manner in PPC and CNNs, just as they are in OTC. Meanwhile, some differences existed among OTC, PPC, and CNNs. These results demonstrate the similarities and differences in how visual object information across an identity-preserving image transformation may be represented in OTC, PPC, and CNNs.
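The core analysis fits a linear mapping from response patterns to objects in one transformation state to patterns for the same objects in another state, then tests generalization to held-out objects. A bare-bones least-squares sketch on simulated data (the published work uses more careful cross-validation; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_units = 40, 150
R_small = rng.standard_normal((n_objects, n_units))  # responses, small objects
R_large = rng.standard_normal((n_objects, n_units))  # responses, large objects

train, test = np.arange(30), np.arange(30, 40)
# Fit W so that R_small @ W approximates R_large on the training objects
W, *_ = np.linalg.lstsq(R_small[train], R_large[train], rcond=None)

# Generalization: predict large-size patterns for objects never used in fitting
pred = R_small[test] @ W
r = [np.corrcoef(pred[i], R_large[test][i])[0, 1] for i in range(len(test))]
print("mean prediction correlation on held-out objects:", np.mean(r))
```

Near-orthogonality of identity and nonidentity coding is what lets one such W work across many objects; the reported category dependence shows the orthogonality is only approximate.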
11. Xu Y, Vaziri-Pashkam M. Understanding transformation tolerant visual object representations in the human brain and convolutional neural networks. Neuroimage 2022; 263:119635. [PMID: 36116617 PMCID: PMC11283825 DOI: 10.1016/j.neuroimage.2022.119635]
Abstract
Forming transformation-tolerant object representations is critical to high-level primate vision. Despite its significance, many details of tolerance in the human brain remain unknown. Likewise, despite the ability of convolutional neural networks (CNNs) to exhibit human-like object categorization performance, whether CNNs form tolerance similar to that of the human brain is unknown. Here we provide the first comprehensive documentation and comparison of three tolerance measures in the human brain and CNNs. We measured fMRI responses from human ventral visual areas to real-world objects across both Euclidean and non-Euclidean feature changes. In single fMRI voxels in higher visual areas, we observed robust object response rank-order preservation across feature changes. This is indicative of functional smoothness in tolerance at the fMRI meso-scale level that has never been reported before. At the voxel population level, we found highly consistent object representational structure across feature changes towards the end of ventral processing. Rank-order preservation, consistency, and a third tolerance measure, cross-decoding success (i.e., a linear classifier's ability to generalize performance across feature changes) showed an overall tight coupling. These tolerance measures were in general lower for Euclidean than non-Euclidean feature changes in lower visual areas, but increased over the course of ventral processing for all feature changes. These characteristics of tolerance, however, were absent in eight CNNs pretrained with ImageNet images with varying network architecture, depth, the presence/absence of recurrent processing, or whether a network was pretrained with the original or stylized ImageNet images that encouraged shape processing. CNNs do not appear to develop the same kind of tolerance as the human brain over the course of visual processing.
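Two of the three tolerance measures are easy to state in code: rank-order preservation (does a voxel order objects the same way across a feature change?) and representational consistency (do the population's object dissimilarity structures match across the change?). A sketch on simulated voxel data (illustrative only; cross-decoding, the third measure, follows the same train-on-one-state, test-on-the-other logic):

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_objects, n_voxels = 20, 100
resp_a = rng.standard_normal((n_objects, n_voxels))  # e.g., original images
resp_b = rng.standard_normal((n_objects, n_voxels))  # e.g., size-changed images

# 1) Rank-order preservation: per-voxel Spearman correlation of object
#    responses across the feature change, averaged over voxels.
rank_pres = np.mean([spearmanr(resp_a[:, v], resp_b[:, v])[0]
                     for v in range(n_voxels)])

# 2) Representational consistency: correlation between the two states'
#    object-by-object dissimilarity structures (RDMs).
consistency = np.corrcoef(pdist(resp_a, "correlation"),
                          pdist(resp_b, "correlation"))[0, 1]
print(f"rank-order preservation: {rank_pres:.3f}, consistency: {consistency:.3f}")
```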
Affiliation(s)
- Yaoda Xu
- Psychology Department, Yale University, New Haven, CT 06520, USA.
12. Adámek P, Langová V, Horáček J. Early-stage visual perception impairment in schizophrenia, bottom-up and back again. Schizophrenia (Heidelb) 2022; 8:27. [PMID: 35314712 PMCID: PMC8938488 DOI: 10.1038/s41537-022-00237-9]
Abstract
Visual perception is one of the basic tools for exploring the world. However, in schizophrenia, this modality is disrupted. So far, there has been no clear answer as to whether the disruption occurs primarily within the brain or in the precortical areas of visual perception (the retina, visual pathways, and lateral geniculate nucleus [LGN]). A web-based comprehensive search of peer-reviewed journals was conducted based on various keyword combinations including schizophrenia, saliency, visual cognition, visual pathways, retina, and LGN. Articles were chosen with respect to topic relevance. Searched databases included Google Scholar, PubMed, and Web of Science. This review describes the precortical circuit and the key changes in biochemistry and pathophysiology that affect the creation and characteristics of the retinal signal as well as its subsequent modulation and processing in other parts of this circuit. Changes in the characteristics of the signal and the misinterpretation of visual stimuli associated with them may, as a result, contribute to the development of schizophrenic disease.
Affiliation(s)
- Petr Adámek
- Third Faculty of Medicine, Charles University, Prague, Czech Republic
- Center for Advanced Studies of Brain and Consciousness, National Institute of Mental Health, Klecany, Czech Republic
- Veronika Langová
- Third Faculty of Medicine, Charles University, Prague, Czech Republic
- Center for Advanced Studies of Brain and Consciousness, National Institute of Mental Health, Klecany, Czech Republic
- Jiří Horáček
- Third Faculty of Medicine, Charles University, Prague, Czech Republic
- Center for Advanced Studies of Brain and Consciousness, National Institute of Mental Health, Klecany, Czech Republic
13. Vaziri-Pashkam M, Conway BR. How the visual system turns things the right way up. Cogn Neuropsychol 2022; 39:54-57. [PMID: 35624546 PMCID: PMC10759311 DOI: 10.1080/02643294.2022.2073808]
Affiliation(s)
- Maryam Vaziri-Pashkam
- Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, MD, USA
- Bevil R. Conway
- Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, MD, USA
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD, USA
14. Mocz V, Vaziri-Pashkam M, Chun MM, Xu Y. Predicting Identity-Preserving Object Transformations across the Human Ventral Visual Stream. J Neurosci 2021; 41:7403-7419. [PMID: 34253629 PMCID: PMC8412993 DOI: 10.1523/jneurosci.2137-20.2021]
Abstract
In everyday life, we have no trouble categorizing objects varying in position, size, and orientation. Previous fMRI research shows that higher-level object processing regions in the human lateral occipital cortex may link object responses from different affine states (i.e., size and viewpoint) through a general linear mapping function capable of predicting responses to novel objects. In this study, we extended this approach to examine the mapping for both Euclidean (e.g., position and size) and non-Euclidean (e.g., image statistics and spatial frequency) transformations across the human ventral visual processing hierarchy, including areas V1, V2, V3, V4, ventral occipitotemporal cortex, and lateral occipitotemporal cortex. The predicted pattern generated from a linear mapping function could capture a significant amount of the changes associated with the transformations throughout the ventral visual stream. The derived linear mapping functions were not category independent as performance was better for the categories included than those not included in training and better between two similar versus two dissimilar categories in both lower and higher visual regions. Consistent with object representations being stronger in higher than in lower visual regions, pattern selectivity and object category representational structure were somewhat better preserved in the predicted patterns in higher than in lower visual regions. There were no notable differences between Euclidean and non-Euclidean transformations. These findings demonstrate a near-orthogonal representation of object identity and these nonidentity features throughout the human ventral visual processing pathway with these nonidentity features largely untangled from the identity features early in visual processing.

SIGNIFICANCE STATEMENT Presently we still do not fully understand how object identity and nonidentity (e.g., position, size) information are simultaneously represented in the primate ventral visual system to form invariant representations. Previous work suggests that the human lateral occipital cortex may be linking different affine states of object representations through general linear mapping functions. Here, we show that across the entire human ventral processing pathway, we could link object responses in different states of nonidentity transformations through linear mapping functions for both Euclidean and non-Euclidean transformations. These mapping functions are not identity independent, suggesting that object identity and nonidentity features are represented in a near rather than a completely orthogonal manner.
Affiliation(s)
- Viola Mocz
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, Connecticut 06520
- Maryam Vaziri-Pashkam
- Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, Maryland 20892
- Marvin M Chun
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, Connecticut 06520
- Department of Neuroscience, Yale School of Medicine, New Haven, Connecticut 06520
- Yaoda Xu
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, Connecticut 06520
15. Xu Y, Vaziri-Pashkam M. Limits to visual representational correspondence between convolutional neural networks and the human brain. Nat Commun 2021; 12:2065. [PMID: 33824315 PMCID: PMC8024324 DOI: 10.1038/s41467-021-22244-7]
Abstract
Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses. Here we evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis. Despite the presence of some CNN-brain correspondence and CNNs' impressive ability to fully capture lower level visual representation of real-world objects, we show that CNNs do not fully capture higher level visual representations of real-world objects, nor those of artificial objects, either at lower or higher levels of visual representations. The latter is particularly critical, as the processing of both real-world and artificial visual stimuli engages the same neural circuits. We report similar results regardless of differences in CNN architecture, training, or the presence of recurrent processing. This indicates some fundamental differences exist in how the brain and CNNs represent visual information.
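Representational similarity analysis puts CNN layers and fMRI regions on common ground by reducing each to an object-by-object representational dissimilarity matrix (RDM). A stripped-down sketch (random images and stand-in fMRI patterns; a real analysis would load the actual stimuli, use pretrained weights such as weights="IMAGENET1K_V1", and repeat per layer and region):

```python
import numpy as np
import torch
from torchvision.models import resnet18
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

imgs = torch.randn(50, 3, 224, 224)      # stand-in stimulus batch
model = resnet18(weights=None).eval()    # untrained stand-in network

feats = {}
def grab(module, inputs, output):        # forward hook: save layer activations
    feats["layer3"] = output.flatten(1).numpy()

handle = model.layer3.register_forward_hook(grab)
with torch.no_grad():
    model(imgs)
handle.remove()

rdm_cnn = pdist(feats["layer3"], metric="correlation")
rdm_brain = pdist(np.random.default_rng(0).standard_normal((50, 300)),
                  metric="correlation")  # stand-in ROI voxel patterns
print("CNN-brain RDM correlation:", spearmanr(rdm_cnn, rdm_brain)[0])
```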
Affiliation(s)
- Yaoda Xu
- Psychology Department, Yale University, New Haven, CT, USA.
- Maryam Vaziri-Pashkam
- Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, MD, USA
16. Examining the Coding Strength of Object Identity and Nonidentity Features in Human Occipito-Temporal Cortex and Convolutional Neural Networks. J Neurosci 2021; 41:4234-4252. [PMID: 33789916 DOI: 10.1523/jneurosci.1993-20.2021]
Abstract
A visual object is characterized by multiple visual features, including its identity, position and size. Despite the usefulness of identity and nonidentity features in vision and their joint coding throughout the primate ventral visual processing pathway, they have so far been studied relatively independently. Here in both female and male human participants, the coding of identity and nonidentity features was examined together across the human ventral visual pathway. The nonidentity features tested included two Euclidean features (position and size) and two non-Euclidean features (image statistics and spatial frequency (SF) content of an image). Overall, identity representation increased and nonidentity feature representation decreased along the ventral visual pathway, with identity outweighing the non-Euclidean but not the Euclidean features at higher levels of visual processing. In 14 convolutional neural networks (CNNs) pretrained for object categorization with varying architecture, depth, and with/without recurrent processing, nonidentity feature representation showed an initial large increase from early to mid-stage of processing, followed by a decrease at later stages of processing, different from brain responses. Additionally, from lower to higher levels of visual processing, position became more underrepresented and image statistics and SF became more overrepresented compared with identity in CNNs than in the human brain. Similar results were obtained in a CNN trained with stylized images that emphasized shape representations. Overall, by measuring the coding strength of object identity and nonidentity features together, our approach provides a new tool for characterizing feature coding in the human brain and the correspondence between the brain and CNNs.

SIGNIFICANCE STATEMENT This study examined the coding strength of object identity and four types of nonidentity features along the human ventral visual processing pathway and compared brain responses with those of 14 convolutional neural networks (CNNs) pretrained to perform object categorization. Overall, identity representation increased and nonidentity feature representation decreased along the ventral visual pathway, with some notable differences among the different nonidentity features. CNNs differed from the brain in a number of aspects in their representations of identity and nonidentity features over the course of visual processing. Our approach provides a new tool for characterizing feature coding in the human brain and the correspondence between the brain and CNNs.
17.
Abstract
Recent work has highlighted the role of early visual areas in visual working memory (VWM) storage and put forward a sensory storage account of VWM. Using a distractor interference paradigm, however, we previously showed that the contribution of early visual areas to VWM storage may not be essential. Instead, higher cortical regions such as the posterior parietal cortex may play a more significant role in VWM storage. This is consistent with reviews of other available behavioral, neuroimaging, and neurophysiological results. Recently, a number of studies brought forward new evidence regarding this debate. Here I review these new pieces of evidence in detail and show that there is still no strong and definitive evidence supporting an essential role of the early visual areas in VWM storage. Instead, converging evidence suggests that early visual areas may contribute to the decision stage of a VWM task by facilitating target and probe comparison. Aside from further clarifying this debate, it is also important to note that whether or not VWM storage uses a sensory code depends on how it is defined, and that behavioral interactions between VWM and perception tasks do not necessarily support the involvement of sensory regions in VWM storage.
18. Roles of Category, Shape, and Spatial Frequency in Shaping Animal and Tool Selectivity in the Occipitotemporal Cortex. J Neurosci 2020; 40:5644-5657. [PMID: 32527983 PMCID: PMC7363473 DOI: 10.1523/jneurosci.3064-19.2020]
Abstract
Does the nature of representation in the category-selective regions in the occipitotemporal cortex reflect visual or conceptual properties? Previous research showed that natural variability in visual features across categories, quantified by image gist statistics, is highly correlated with the different neural responses observed in the occipitotemporal cortex. Using fMRI, we examined whether category selectivity for animals and tools would remain, when image gist statistics were comparable across categories. Critically, we investigated how category, shape, and spatial frequency may contribute to the category selectivity in the animal- and tool-selective regions. Female and male human observers viewed low- or high-passed images of round or elongated animals and tools that shared comparable gist statistics in the main experiment, and animal and tool images of naturally varied gist statistics in a separate localizer. Univariate analysis revealed robust category-selective responses for images with comparable gist statistics across categories. Successful classification for category (animals/tools), shape (round/elongated), and spatial frequency (low/high) was also observed, with highest classification accuracy for category. Representational similarity analyses further revealed that the activation patterns in the animal-selective regions were most correlated with a model that represents only animal information, whereas the activation patterns in the tool-selective regions were most correlated with a model that represents only tool information, suggesting that these regions selectively represent information of only animals or tools. Together, in addition to visual features, the distinction between animal and tool representations in the occipitotemporal cortex is likely shaped by higher-level conceptual influences such as categorization or interpretation of visual inputs.

SIGNIFICANCE STATEMENT Since different categories often vary systematically in both visual and conceptual features, it remains unclear what kinds of information determine category-selective responses in the occipitotemporal cortex. To minimize the influences of low- and mid-level visual features, here we used a diverse image set of animals and tools that shared comparable gist statistics. We manipulated category (animals/tools), shape (round/elongated), and spatial frequency (low/high), and found that the representational content of the animal- and tool-selective regions is primarily determined by their preferred categories only, regardless of shape or spatial frequency. Our results show that category-selective responses in the occipitotemporal cortex are influenced by higher-level processing such as categorization or interpretation of visual inputs, and highlight the specificity in these category-selective regions.
19. Xu Y, Vaziri-Pashkam M. Task modulation of the 2-pathway characterization of occipitotemporal and posterior parietal visual object representations. Neuropsychologia 2019; 132:107140. [PMID: 31301350 PMCID: PMC6857731 DOI: 10.1016/j.neuropsychologia.2019.107140]
Abstract
Recent studies have reported the existence of rich non-spatial visual object representations in both human and monkey posterior parietal cortex (PPC), similar to those found in occipito-temporal cortex (OTC). Despite this similarity, we recently showed that visual object representations still differ between OTC and PPC in two aspects. In one study, by manipulating whether object shape or color was task relevant, we showed that visual object representations were under greater top-down attention and task control in PPC than in OTC (Vaziri-Pashkam & Xu, 2017, J Neurosci). In another study, using a bottom-up data driven approach, we showed that there exists a large separation between PPC and OTC regions in the representational space, with OTC regions lining up hierarchically along an OTC pathway and PPC regions lining up hierarchically along an orthogonal PPC pathway (Vaziri-Pashkam & Xu, 2019, Cereb Cortex). To understand the interaction of goal-driven visual processing and the two-pathway structure in the representational space, here we performed a set of new analyses of the data from the three experiments of Vaziri-Pashkam and Xu (2017) and directly compared the two-pathway separation of OTC and PPC regions when object shapes were attended and task relevant and when they were not. We found that in all three experiments the correlation of visual object representational structure between superior IPS (a key PPC visual region) and lateral and ventral occipito-temporal regions (higher OTC visual regions) became greater when object shapes were attended than when they were not. This modified the two-pathway structure, with PPC regions moving closer to higher OTC regions and a compression of the PPC pathway towards the OTC pathway in the representational space when shapes were attended. Consistent with this observation, the correlation between neural and behavioral measures of visual representational structure was also higher in superior IPS when shapes were attended than when they were not. By comparing representational structures across experiments and tasks, we further showed that attention to object shape resulted in the formation of more similar object representations in superior IPS across experiments than between the two tasks within the same experiment despite noise and stimulus differences across the experiments. Overall, these results demonstrated that, despite the separation of the OTC and PPC pathways in the representational space, the visual representational structure of PPC is flexible and can be modulated by the task demand. This reaffirms the adaptive nature of visual processing in PPC and further distinguishes it from the more invariant nature of visual processing in OTC.
20. Yildirim I, Wu J, Kanwisher N, Tenenbaum J. An integrative computational architecture for object-driven cortex. Curr Opin Neurobiol 2019; 55:73-81. [PMID: 30825704 PMCID: PMC6548583 DOI: 10.1016/j.conb.2019.01.010]
Abstract
Objects in motion activate multiple cortical regions in every lobe of the human brain. Do these regions represent a collection of independent systems, or is there an overarching functional architecture spanning all of object-driven cortex? Inspired by recent work in artificial intelligence (AI), machine learning, and cognitive science, we consider the hypothesis that these regions can be understood as a coherent network implementing an integrative computational system that unifies the functions needed to perceive, predict, reason about, and plan with physical objects, as in the paradigmatic case of using or making tools. Our proposal draws on a modeling framework that combines multiple AI methods, including causal generative models, hybrid symbolic-continuous planning algorithms, and neural recognition networks, with object-centric, physics-based representations. We review evidence relating specific components of our proposal to the specific regions that comprise object-driven cortex, and lay out future research directions with the goal of building a complete functional and mechanistic account of this system.
Affiliation(s)
- Ilker Yildirim
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA 02138, United States; Department of Brain & Cognitive Science, MIT, Cambridge, MA 02138, United States.
- Jiajun Wu
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA 02138, United States; Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02138, United States
- Nancy Kanwisher
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA 02138, United States; McGovern Institute for Brain Research, MIT, Cambridge, MA 02138, United States; Department of Brain & Cognitive Science, MIT, Cambridge, MA 02138, United States
- Joshua Tenenbaum
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA 02138, United States; McGovern Institute for Brain Research, MIT, Cambridge, MA 02138, United States; Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02138, United States; Department of Brain & Cognitive Science, MIT, Cambridge, MA 02138, United States