1. Koc AN, Urgen BA, Afacan Y. Task-modulated neural responses in scene-selective regions of the human brain. Vision Res 2025;227:108539. PMID: 39733756. DOI: 10.1016/j.visres.2024.108539.
Abstract
The study of scene perception is crucial to understanding how one interprets and interacts with their environment, and how the environment impacts various cognitive functions. The literature so far has mainly focused on the impact of low-level and categorical properties of scenes and how they are represented in the scene-selective regions of the brain: PPA, RSC, and OPA. However, higher-level scene perception and the impact of behavioral goals remain a developing research area. Moreover, the selection of stimuli has not been systematic and has mainly focused on outdoor environments. In this fMRI experiment, we adopted multiple behavioral tasks, selected real-life indoor stimuli with a systematic categorization approach, and used various multivariate analysis techniques to explain the neural modulation of scene perception in the scene-selective regions of the human brain. Participants (N = 21) performed categorization and approach-avoidance tasks during fMRI scans while viewing scenes from built-environment categories based on different affordances ((i) access and (ii) circulation elements, (iii) restrooms, and (iv) eating/seating areas). ROI-based classification analysis revealed that the OPA was significantly successful in decoding scene category regardless of the task, and that the task condition affected the category decoding performance of all the scene-selective regions. Model-based representational similarity analysis (RSA) revealed that the activity patterns in scene-selective regions are best explained by task. These results contribute to the literature by extending the task and stimulus content of scene perception research, and by uncovering the impact of behavioral goals on the scene-selective regions of the brain.
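
A minimal sketch of the model-based RSA step, with random arrays standing in for the ROI patterns (all sizes, labels, and the injected task structure below are assumptions for illustration, not the study's data):

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    n_conditions, n_voxels = 8, 200            # e.g., 4 scene categories x 2 tasks
    task = np.repeat([0, 1], 4)                # hypothetical task labels
    category = np.tile(np.arange(4), 2)        # hypothetical category labels

    # Synthetic ROI patterns in which, purely for illustration, task dominates.
    roi_patterns = (task[:, None] * rng.normal(size=n_voxels)
                    + 0.3 * rng.normal(size=(n_conditions, n_voxels)))

    # Neural RDM: pairwise correlation distance between condition patterns.
    neural_rdm = pdist(roi_patterns, metric="correlation")

    # Model RDMs: 1 where two conditions differ on the factor, 0 where they match.
    task_rdm = pdist(task[:, None], metric="hamming")
    category_rdm = pdist(category[:, None], metric="hamming")

    # Compare each model with the neural RDM (Spearman, as in most RSA work).
    for name, model_rdm in [("task", task_rdm), ("category", category_rdm)]:
        print(name, "model: rho = %.2f" % spearmanr(model_rdm, neural_rdm)[0])
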
Affiliation(s)
- Aysu Nur Koc
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany; Interdisciplinary Neuroscience Program, Bilkent University, Ankara, Turkey.
- Burcu A Urgen
- Interdisciplinary Neuroscience Program, Bilkent University, Ankara, Turkey; Department of Psychology, Bilkent University, Ankara, Turkey; Aysel Sabuncu Brain Research Center and National Magnetic Resonance Imaging Center, Bilkent University, Ankara, Turkey.
- Yasemin Afacan
- Interdisciplinary Neuroscience Program, Bilkent University, Ankara, Turkey; Department of Interior Architecture and Environmental Design, Bilkent University, Ankara, Turkey; Aysel Sabuncu Brain Research Center and National Magnetic Resonance Imaging Center, Bilkent University, Ankara, Turkey.

2. Yao JK, Choo J, Finzi D, Grill-Spector K. Visuospatial computations vary by category and stream and continue to develop in adolescence. bioRxiv 2025:2025.01.14.633067. PMID: 39868259. PMCID: PMC11761743. DOI: 10.1101/2025.01.14.633067.
Abstract
Reading, face recognition, and navigation are supported by visuospatial computations in category-selective regions across ventral, lateral, and dorsal visual streams. However, the nature of visuospatial computations across streams and their development in adolescence remain unknown. Using fMRI and population receptive field (pRF) modeling in adolescents and adults, we estimate pRFs in high-level visual cortex and determine their development. Results reveal that pRF location, size, and visual field coverage vary across category, stream, and hemisphere in both adolescents and adults. While pRF location is mature by adolescence, pRF size and visual field coverage continue to develop - increasing in face-selective and decreasing in place-selective regions - alongside similar development of category selectivity. These findings provide a timeline for differential development of visual functions and suggest that visuospatial computations in high-level visual cortex continue to be optimized to accommodate both category and stream demands through adolescence.
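
The pRF idea can be sketched in a few lines: a voxel's predicted response is the overlap between a stimulus aperture and a 2D Gaussian over visual space. The grid, bar stimulus, and parameter values below are illustrative assumptions, not the paper's fitting procedure (which optimizes x, y, and sigma against measured responses):

    import numpy as np

    grid = np.linspace(-10, 10, 101)                   # degrees of visual angle
    xx, yy = np.meshgrid(grid, grid)

    def prf_response(stim, x0, y0, sigma):
        """Predicted response of a pRF centered at (x0, y0) with size sigma."""
        gauss = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * sigma ** 2))
        return (stim * gauss).sum()

    # A bar aperture sweeping left to right; larger pRFs respond at more positions.
    responses_small, responses_large = [], []
    for bar_x in np.linspace(-8, 8, 17):
        stim = (np.abs(xx - bar_x) < 1).astype(float)  # vertical bar, 2 deg wide
        responses_small.append(prf_response(stim, 2.0, 0.0, 1.0))
        responses_large.append(prf_response(stim, 2.0, 0.0, 3.0))

    print("bar positions driving the small pRF:",
          (np.array(responses_small) > 1).sum())
    print("bar positions driving the large pRF:",
          (np.array(responses_large) > 1).sum())
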
Affiliation(s)
- Jewelia K Yao
- Department of Psychology, Stanford University, Stanford, CA 94305
- Justin Choo
- Department of Symbolic Systems, Stanford University, Stanford, CA 94305
- Dawn Finzi
- Department of Psychology, Stanford University, Stanford, CA 94305
- Kalanit Grill-Spector
- Department of Psychology, Stanford University, Stanford, CA 94305
- Wu Tsai Neuroscience Institute, Stanford University, Stanford, CA 94305

3. Tian X, Song Y, Liu J. Decoding face identity: A reverse-correlation approach using deep learning. Cognition 2024;254:106008. PMID: 39550877. DOI: 10.1016/j.cognition.2024.106008.
Abstract
Face recognition is crucial for social interactions. Traditional approaches primarily rely on subjective judgment, utilizing a pre-selected set of facial features based on literature or intuition to identify the facial features critical for face recognition. In this study, we adopted a reverse-correlation approach, aligning responses of a deep convolutional neural network (DCNN) with its internal representations to objectively identify facial features pivotal for face recognition. Specifically, we trained a DCNN, namely VGG-FD, to possess human-like capability in discriminating facial identities. A representational similarity analysis (RSA) was employed to characterize VGG-FD's performance metrics, which were subsequently reverse-correlated with its representations in layers capable of discriminating facial identities. Our analysis revealed a higher likelihood of face pairs being perceived as different identities when their representations differed significantly in areas such as the eyes, eyebrows, or central facial region, suggesting the significance of the eyes as facial parts and of the central facial region as an integral part of the face configuration in face recognition. In summary, our study leveraged DCNNs to identify critical facial features for face discrimination in a hypothesis-neutral, data-driven manner, thereby advocating for the adoption of this new paradigm to explore critical facial features across various face recognition tasks.
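
The reverse-correlation logic can be sketched as follows, with synthetic difference maps standing in for VGG-FD's layer representations; the decision rule, array sizes, and the central-region structure are assumptions made so the sketch has something to recover:

    import numpy as np

    rng = np.random.default_rng(1)
    n_pairs, h, w = 500, 32, 32
    diff_maps = rng.random((n_pairs, h, w))            # per-pair difference maps
    # Hypothetical decision rule: "different" when central-region difference is big.
    center = diff_maps[:, 12:20, 12:20].mean(axis=(1, 2))
    different = center > np.median(center)

    # Classification image: where differences drive "different identity" responses.
    classification_image = (diff_maps[different].mean(0)
                            - diff_maps[~different].mean(0))
    peak = np.unravel_index(np.abs(classification_image).argmax(), (h, w))
    print("region most diagnostic for identity decisions:", peak)
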
Affiliation(s)
- Xue Tian
- Faculty of Psychology, Tianjin Normal University, Tianjin 300387, China
- Yiying Song
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China.
- Jia Liu
- Department of Psychology and Tsinghua Laboratory of Brain & Intelligence, Tsinghua University, Beijing, China.

4. Das S, Mangun GR, Ding M. Perceptual Expertise and Attention: An Exploration using Deep Neural Networks. bioRxiv 2024:2024.10.15.617743. PMID: 39464001. PMCID: PMC11507720. DOI: 10.1101/2024.10.15.617743.
Abstract
Perceptual expertise and attention are two important factors that enable superior object recognition and task performance. While expertise enhances knowledge and provides a holistic understanding of the environment, attention allows us to selectively focus on task-related information and suppress distraction. It has been suggested that attention operates differently in experts and in novices, but much remains unknown. This study investigates the relationship between perceptual expertise and attention using convolutional neural networks (CNNs), which have been shown to be good models of primate visual pathways. Two CNN models were trained to become experts in either face or scene recognition, and the effect of attention on performance was evaluated in tasks involving complex stimuli, such as images containing superimposed faces and scenes. The goal was to explore how feature-based attention (FBA) influences recognition within and outside the models' domain of expertise. We found that each model performed better in its area of expertise, and that FBA further enhanced task performance, but only within the domain of expertise, increasing performance by up to 35% in scene recognition and 15% in face recognition. However, attention had reduced or negative effects when applied outside the models' expertise domain. Neural unit-level analysis revealed that expertise led to stronger tuning towards category-specific features and sharper tuning curves, as reflected in greater representational dissimilarity between targets and distractors, which, in line with the biased competition model of attention, leads to enhanced performance by reducing competition. These findings highlight the critical role of neural tuning, at the single-unit as well as the network level, in distinguishing the effects of attention in experts and in novices, and demonstrate that CNNs can be used fruitfully as computational models for addressing neuroscience questions that are not practical to address with empirical methods.
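
A minimal sketch of feature-based attention as multiplicative gain on category-tuned units, in the spirit of the manipulation described above; the tuning values, gain, and readout are illustrative assumptions, not the trained networks:

    import numpy as np

    rng = np.random.default_rng(2)
    n_units = 100
    face_tuning = rng.random(n_units)          # each unit's preference for faces
    scene_tuning = 1 - face_tuning             # and for scenes

    def response(face_drive, scene_drive, attend_faces=False, gain=1.5):
        # FBA as gain scaled by how face-tuned each unit is.
        g = 1 + (gain - 1) * face_tuning if attend_faces else np.ones(n_units)
        return g * (face_tuning * face_drive + scene_tuning * scene_drive)

    # Superimposed face + scene: attending to faces boosts the face readout.
    baseline = response(1.0, 1.0)
    attended = response(1.0, 1.0, attend_faces=True)
    face_readout = lambda r: (r * face_tuning).sum()
    print("face readout gain: %.2fx"
          % (face_readout(attended) / face_readout(baseline)))
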
Affiliation(s)
- Soukhin Das
- Center for Mind and Brain, University of California, Davis
- Department of Psychology, University of California, Davis
- G R Mangun
- Center for Mind and Brain, University of California, Davis
- Department of Psychology, University of California, Davis
- Department of Neurology, University of California, Davis
- Mingzhou Ding
- Department of Neurology, University of California, Davis

5. Dwivedi K, Sadiya S, Balode MP, Roig G, Cichy RM. Visual features are processed before navigational affordances in the human brain. Sci Rep 2024;14:5573. PMID: 38448446. PMCID: PMC10917749. DOI: 10.1038/s41598-024-55652-y.
Abstract
To navigate through their immediate environment, humans process scene information rapidly. How does the cascade of neural processing elicited by scene viewing unfold over time to facilitate navigational planning? To investigate, we recorded human brain responses to visual scenes with electroencephalography and related those responses to computational models that operationalize three aspects of scene processing (2D, 3D, and semantic information), as well as to a behavioral model capturing navigational affordances. We found a temporal processing hierarchy: navigational affordance is processed later than the other scene features (2D, 3D, and semantic) investigated. This reveals the temporal order in which the human brain computes complex scene information and suggests that the brain leverages these pieces of information to plan navigation.
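
Time-resolved RSA of this kind can be sketched as follows with synthetic EEG data: at each time point a neural RDM is correlated with a model RDM, and the latency of the peak correlation indexes when that information emerges. The signal injected around index 40 is an assumption so the sketch has a recoverable latency:

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(3)
    n_scenes, n_channels, n_times = 20, 64, 100
    features = rng.normal(size=(n_scenes, 5))        # hypothetical model features
    model_rdm = pdist(features)

    eeg = rng.normal(size=(n_scenes, n_channels, n_times))
    # For illustration, make the EEG express the model features at indices 40-49.
    proj = rng.normal(size=(5, n_channels))
    eeg[:, :, 40:50] += (features @ proj)[:, :, None]

    corr_timecourse = []
    for t in range(n_times):
        neural_rdm = pdist(eeg[:, :, t], metric="correlation")
        corr_timecourse.append(spearmanr(neural_rdm, model_rdm)[0])

    print("peak model correlation at time index:", int(np.argmax(corr_timecourse)))
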
Affiliation(s)
- Kshitij Dwivedi
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Department of Computer Science, Goethe University Frankfurt, Frankfurt, Germany
- Sari Sadiya
- Department of Computer Science, Goethe University Frankfurt, Frankfurt, Germany.
- Frankfurt Institute for Advanced Studies (FIAS), Frankfurt, Germany.
- Marta P Balode
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Institute of Neuroinformatics, ETH Zurich and University of Zurich, Zurich, Switzerland
- Gemma Roig
- Department of Computer Science, Goethe University Frankfurt, Frankfurt, Germany
- The Hessian Center for Artificial Intelligence (hessian.AI), Darmstadt, Germany
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany

6. McMahon E, Bonner MF, Isik L. Hierarchical organization of social action features along the lateral visual pathway. Curr Biol 2023;33:5035-5047.e8. PMID: 37918399. PMCID: PMC10841461. DOI: 10.1016/j.cub.2023.10.015.
Abstract
Recent theoretical work has argued that in addition to the classical ventral (what) and dorsal (where/how) visual streams, there is a third visual stream on the lateral surface of the brain specialized for processing social information. Like visual representations in the ventral and dorsal streams, representations in the lateral stream are thought to be hierarchically organized. However, no prior studies have comprehensively investigated the organization of naturalistic, social visual content in the lateral stream. To address this question, we curated a naturalistic stimulus set of 250 3-s videos of two people engaged in everyday actions. Each clip was richly annotated for its low-level visual features, mid-level scene and object properties, visual social primitives (including the distance between people and the extent to which they were facing each other), and high-level information about social interactions and affective content. Using a condition-rich fMRI experiment and a within-subject encoding model approach, we found that low-level visual features are represented in early visual cortex (EVC) and middle temporal (MT) area, mid-level visual social features in extrastriate body area (EBA) and lateral occipital complex (LOC), and high-level social interaction information along the superior temporal sulcus (STS). Communicative interactions, in particular, explained unique variance in regions of the STS after accounting for variance explained by all other labeled features. Taken together, these results provide support for representation of increasingly abstract social visual content, consistent with hierarchical organization, along the lateral visual stream and suggest that recognizing communicative actions may be a key computational goal of the lateral visual pathway.
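
A minimal sketch of a voxel-wise encoding model of this kind, using ridge regression with synthetic stand-ins for the clip annotations and voxel responses (all sizes and the noise level are assumptions):

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(4)
    n_clips, n_features, n_voxels = 250, 30, 500
    X = rng.normal(size=(n_clips, n_features))       # per-clip feature annotations
    true_w = rng.normal(size=(n_features, n_voxels))
    Y = X @ true_w + rng.normal(scale=5.0, size=(n_clips, n_voxels))

    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
    model = Ridge(alpha=10.0).fit(X_tr, Y_tr)
    pred = model.predict(X_te)

    # Per-voxel prediction accuracy (correlation between predicted and observed).
    r = [np.corrcoef(pred[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
    print("median encoding accuracy r = %.2f" % np.median(r))
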
Affiliation(s)
- Emalie McMahon
- Department of Cognitive Science, Zanvyl Krieger School of Arts & Sciences, Johns Hopkins University, 237 Krieger Hall, 3400 N. Charles Street, Baltimore, MD 21218, USA.
- Michael F Bonner
- Department of Cognitive Science, Zanvyl Krieger School of Arts & Sciences, Johns Hopkins University, 237 Krieger Hall, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Leyla Isik
- Department of Cognitive Science, Zanvyl Krieger School of Arts & Sciences, Johns Hopkins University, 237 Krieger Hall, 3400 N. Charles Street, Baltimore, MD 21218, USA; Department of Biomedical Engineering, Whiting School of Engineering, Johns Hopkins University, Suite 400 West, Wyman Park Building, 3400 N. Charles Street, Baltimore, MD 21218, USA

7. Zhuang T, Kabulska Z, Lingnau A. The Representation of Observed Actions at the Subordinate, Basic, and Superordinate Level. J Neurosci 2023;43:8219-8230. PMID: 37798129. PMCID: PMC10697398. DOI: 10.1523/jneurosci.0700-22.2023.
Abstract
Actions can be planned and recognized at different hierarchical levels, ranging from very specific (e.g., to swim backstroke) to very broad (e.g., locomotion). Understanding the corresponding neural representation is an important prerequisite to reveal how our brain flexibly assigns meaning to the world around us. To address this question, we conducted an event-related fMRI study in male and female human participants in which we examined distinct representations of observed actions at the subordinate, basic, and superordinate level. Using multiple regression representational similarity analysis (RSA) in predefined regions of interest, we found that the three different taxonomic levels were best captured by patterns of activations in bilateral lateral occipitotemporal cortex (LOTC), showing the highest similarity with the basic level model. A whole-brain multiple regression RSA revealed that information unique to the basic level was captured by patterns of activation in dorsal and ventral portions of the LOTC and in parietal regions. By contrast, the unique information for the subordinate level was limited to bilateral occipitotemporal cortex, while no single cluster was obtained that captured unique information for the superordinate level. The behaviorally established action space was best captured by patterns of activation in the LOTC and superior parietal cortex, and the corresponding neural patterns of activation showed the highest similarity with patterns of activation corresponding to the basic level model. Together, our results suggest that occipitotemporal cortex shows a preference for the basic level model, with flexible access across the subordinate and the basic level.

SIGNIFICANCE STATEMENT: The human brain captures information at varying levels of abstraction. It is debated which brain regions host representations across different hierarchical levels, with some studies emphasizing parietal and premotor regions, while other studies highlight the role of the lateral occipitotemporal cortex (LOTC). To shed light on this debate, here we examined the representation of observed actions at the three taxonomic levels suggested by Rosch et al. (1976). Our results highlight the role of the LOTC, which hosts a shared representation across the subordinate and the basic level, with the highest similarity with the basic level model. These results shed new light on the hierarchical organization of observed actions and provide insights into the neural basis underlying the basic level advantage.
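
Multiple-regression RSA can be sketched as follows: vectorized model RDMs for the three taxonomic levels jointly predict a neural RDM, and the fitted betas index each level's contribution. The nested labels and the generating weights below are illustrative assumptions, not the paper's action taxonomy:

    import numpy as np
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(5)
    n_actions = 16
    subordinate = np.repeat(np.arange(8), 2)         # 8 subordinate classes of 2
    basic = np.repeat(np.arange(4), 4)               # 4 basic-level classes of 4
    superordinate = np.repeat(np.arange(2), 8)       # 2 superordinate classes of 8

    # Binary model RDMs: 1 where two actions differ at that level, 0 otherwise.
    models = np.column_stack([
        pdist(lv[:, None], metric="hamming")
        for lv in (subordinate, basic, superordinate)
    ])
    # Synthetic neural RDM generated with a known basic-level advantage.
    neural_rdm = models @ np.array([0.2, 0.6, 0.2]) + rng.normal(0, 0.1, len(models))

    design = np.column_stack([np.ones(len(models)), models])
    betas, *_ = np.linalg.lstsq(design, neural_rdm, rcond=None)
    print("betas (subordinate, basic, superordinate):", np.round(betas[1:], 2))
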
Affiliation(s)
- Tonghe Zhuang
- Faculty of Human Sciences, Institute of Psychology, Chair of Cognitive Neuroscience, University of Regensburg, 93053 Regensburg, Germany
- Zuzanna Kabulska
- Faculty of Human Sciences, Institute of Psychology, Chair of Cognitive Neuroscience, University of Regensburg, 93053 Regensburg, Germany
- Angelika Lingnau
- Faculty of Human Sciences, Institute of Psychology, Chair of Cognitive Neuroscience, University of Regensburg, 93053 Regensburg, Germany

8. Orima T, Motoyoshi I. Spatiotemporal cortical dynamics for visual scene processing as revealed by EEG decoding. Front Neurosci 2023;17:1167719. PMID: 38027518. PMCID: PMC10646306. DOI: 10.3389/fnins.2023.1167719.
Abstract
The human visual system rapidly recognizes the categories and global properties of complex natural scenes. The present study investigated the spatiotemporal dynamics of neural signals involved in visual scene processing using electroencephalography (EEG) decoding. We recorded visual evoked potentials from 11 human observers for 232 natural scenes, each of which belonged to one of 13 natural scene categories (e.g., a bedroom or open country) and had three global properties (naturalness, openness, and roughness). We trained a deep convolutional classification model of the natural scene categories and global properties using EEGNet. Having confirmed that the model successfully classified natural scene categories and the three global properties, we applied Grad-CAM to the EEGNet model to visualize the EEG channels and time points that contributed to the classification. The analysis showed that EEG signals in the occipital electrodes at short latencies (approximately 80 ms) contributed to the classifications, whereas those in the frontal electrodes at relatively long latencies (approximately 200 ms) contributed to the classification of naturalness and the individual scene category. These results suggest that different global properties are encoded in different cortical areas and with different timings, and that the combination of the EEGNet model and Grad-CAM can be a tool to investigate both the temporal and spatial distribution of natural scene processing in the human brain.
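
The Grad-CAM step can be sketched independently of the trained network: given a convolutional layer's forward activations and backpropagated class-score gradients (synthetic arrays here, standing in for what would be extracted from EEGNet), channel weights are the spatially averaged gradients and the map is the ReLU of the weighted activation sum:

    import numpy as np

    rng = np.random.default_rng(6)
    n_filters, n_eeg_channels, n_times = 8, 64, 128
    activations = rng.normal(size=(n_filters, n_eeg_channels, n_times))
    gradients = rng.normal(size=(n_filters, n_eeg_channels, n_times))

    weights = gradients.mean(axis=(1, 2))               # per-filter importance
    cam = np.maximum((weights[:, None, None] * activations).sum(0), 0.0)

    # Which electrodes/time points drive the classification, per Grad-CAM.
    ch, t = np.unravel_index(cam.argmax(), cam.shape)
    print(f"strongest contribution: electrode {ch}, time index {t}")
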
Affiliation(s)
- Taiki Orima
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
- Japan Society for the Promotion of Science, Tokyo, Japan
- Isamu Motoyoshi
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan

9. Jiahui G, Feilong M, Visconti di Oleggio Castello M, Nastase SA, Haxby JV, Gobbini MI. Modeling naturalistic face processing in humans with deep convolutional neural networks. Proc Natl Acad Sci U S A 2023;120:e2304085120. PMID: 37847731. PMCID: PMC10614847. DOI: 10.1073/pnas.2304085120.
Abstract
Deep convolutional neural networks (DCNNs) trained for face identification can rival and even exceed human-level performance. The ways in which the internal face representations in DCNNs relate to human cognitive representations and brain activity are not well understood. Nearly all previous studies focused on static face image processing with rapid display times and ignored the processing of naturalistic, dynamic information. To address this gap, we developed the largest naturalistic dynamic face stimulus set in human neuroimaging research (700+ naturalistic video clips of unfamiliar faces). We used this naturalistic dataset to compare representational geometries estimated from DCNNs, behavioral responses, and brain responses. We found that DCNN representational geometries were consistent across architectures, cognitive representational geometries were consistent across raters in a behavioral arrangement task, and neural representational geometries in face areas were consistent across brains. Representational geometries in late, fully connected DCNN layers, which are optimized for individuation, were much more weakly correlated with cognitive and neural geometries than were geometries in late-intermediate layers. The late-intermediate face-DCNN layers successfully matched cognitive representational geometries, as measured with a behavioral arrangement task that primarily reflected categorical attributes, and correlated with neural representational geometries in known face-selective topographies. Our study suggests that current DCNNs successfully capture neural and cognitive processes for categorical attributes of faces but less accurately capture individuation and dynamic features.
Affiliation(s)
- Guo Jiahui
- Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
- Ma Feilong
- Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
- Samuel A. Nastase
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- James V. Haxby
- Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
- M. Ida Gobbini
- Department of Medical and Surgical Sciences, University of Bologna, Bologna 40138, Italy
- Istituti di Ricovero e Cura a Carattere Scientifico, Istituto delle Scienze Neurologiche di Bologna, Bologna 40139, Italy

10. Magri C, Elmoznino E, Bonner MF. Scene context is predictive of unconstrained object similarity judgments. Cognition 2023;239:105535. PMID: 37481806. DOI: 10.1016/j.cognition.2023.105535.
Abstract
What makes objects alike in the human mind? Computational approaches for characterizing object similarity have largely focused on the visual forms of objects or their linguistic associations. However, intuitive notions of object similarity may depend heavily on contextual reasoning-that is, objects may be grouped together in the mind if they occur in the context of similar scenes or events. Using large-scale analyses of natural scene statistics and human behavior, we found that a computational model of the associations between objects and their scene contexts is strongly predictive of how humans spontaneously group objects by similarity. Specifically, we learned contextual prototypes for a diverse set of object categories by taking the average response of a convolutional neural network (CNN) to the scene contexts in which the objects typically occurred. In behavioral experiments, we found that contextual prototypes were strongly predictive of human similarity judgments for a large set of objects and rivaled the performance of models based on CNN representations of the objects themselves or word embeddings for their names. Together, our findings reveal the remarkable degree to which the natural statistics of context predict commonsense notions of object similarity.
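
The contextual-prototype computation can be sketched as follows, with random vectors standing in for the CNN's scene embeddings and the behavioral judgments (the close match between model and behavior below is built in for illustration):

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(7)
    n_objects, n_scenes_per_object, n_features = 10, 50, 128
    # CNN features of the scenes each object category typically occurs in.
    scene_features = rng.normal(size=(n_objects, n_scenes_per_object, n_features))

    prototypes = scene_features.mean(axis=1)         # one context vector per object
    model_dissim = pdist(prototypes, metric="cosine")

    # Hypothetical human dissimilarity judgments for the same object pairs.
    human_dissim = model_dissim + rng.normal(0, 0.05, model_dissim.shape)
    print("prototype-behavior correlation: rho = %.2f"
          % spearmanr(model_dissim, human_dissim)[0])
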
Affiliation(s)
- Caterina Magri
- Department of Cognitive Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, United States of America
- Eric Elmoznino
- Department of Cognitive Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, United States of America
- Michael F Bonner
- Department of Cognitive Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, United States of America.

11. Li Z, Dong Q, Hu B, Wu H. Every individual makes a difference: A trinity derived from linking individual brain morphometry, connectivity and mentalising ability. Hum Brain Mapp 2023;44:3343-3358. PMID: 37051692. PMCID: PMC10171537. DOI: 10.1002/hbm.26285.
Abstract
Mentalising ability, indexed as the ability to understand others' beliefs, feelings, intentions, thoughts and traits, is a pivotal and fundamental component of human social cognition. However, considering the multifaceted nature of mentalising ability, little research has focused on characterising individual differences in different mentalising components. And even less research has been devoted to investigating how the variance in the structural and functional patterns of the amygdala and hippocampus, two vital subcortical regions of the "social brain", are related to inter-individual variability in mentalising ability. Here, as a first step toward filling these gaps, we exploited inter-subject representational similarity analysis (IS-RSA) to assess relationships between amygdala and hippocampal morphometry (surface-based multivariate morphometry statistics, MMS), connectivity (resting-state functional connectivity, rs-FC) and mentalising ability (interactive mentalisation questionnaire [IMQ] scores) across the participants (N = 24). In IS-RSA, we proposed a novel pipeline, that is, computing patching and pooling operations-based surface distance (CPP-SD), to obtain a decent representation for high-dimensional MMS data. On this basis, we found significant correlations (i.e., second-order isomorphisms) between these three distinct modalities, indicating that a trinity existed in idiosyncratic patterns of brain morphometry, connectivity and mentalising ability. Notably, a region-related mentalising specificity emerged from these associations: self-self and self-other mentalisation are more related to the hippocampus, while other-self mentalisation shows a closer link with the amygdala. Furthermore, by utilising the dyadic regression analysis, we observed significant interactions such that subject pairs with similar morphometry had even greater mentalising similarity if they were also similar in rs-FC. Altogether, we demonstrated the feasibility and illustrated the promise of using IS-RSA to study individual differences, deepening our understanding of how individual brains give rise to their mentalising abilities.
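
The IS-RSA logic can be sketched with synthetic data: build a subject-by-subject distance matrix per modality and test second-order correspondence between the matrices. The sizes and the shared latent structure below are assumptions made so the sketch has a positive result to show:

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(8)
    n_subjects = 24
    latent = rng.normal(size=(n_subjects, 5))    # shared individual differences

    morphometry = latent @ rng.normal(size=(5, 300)) + rng.normal(size=(n_subjects, 300))
    connectivity = latent @ rng.normal(size=(5, 100)) + rng.normal(size=(n_subjects, 100))
    mentalising = latent @ rng.normal(size=(5, 3)) + rng.normal(size=(n_subjects, 3))

    # One subject-pair distance vector per modality.
    dm = {name: pdist(x) for name, x in
          [("morphometry", morphometry), ("connectivity", connectivity),
           ("mentalising", mentalising)]}

    for a, b in [("morphometry", "mentalising"), ("connectivity", "mentalising")]:
        print(a, "vs", b, ": rho = %.2f" % spearmanr(dm[a], dm[b])[0])
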
Affiliation(s)
- Zhaoning Li
- Centre for Cognitive and Brain Sciences and Department of Psychology, University of Macau, Taipa, China
- Qunxi Dong
- School of Medical Technology, Beijing Institute of Technology, Beijing, China
- Bin Hu
- School of Medical Technology, Beijing Institute of Technology, Beijing, China
- Haiyan Wu
- Centre for Cognitive and Brain Sciences and Department of Psychology, University of Macau, Taipa, China

12. Jozwik KM, Kietzmann TC, Cichy RM, Kriegeskorte N, Mur M. Deep Neural Networks and Visuo-Semantic Models Explain Complementary Components of Human Ventral-Stream Representational Dynamics. J Neurosci 2023;43:1731-1741. PMID: 36759190. PMCID: PMC10010451. DOI: 10.1523/jneurosci.1424-22.2022.
Abstract
Deep neural networks (DNNs) are promising models of the cortical computations supporting human object recognition. However, despite their ability to explain a significant portion of variance in neural data, the agreement between models and brain representational dynamics is far from perfect. We address this issue by asking which representational features are currently unaccounted for in neural time series data, estimated for multiple areas of the ventral stream via source-reconstructed magnetoencephalography data acquired in human participants (nine females, six males) during object viewing. We focus on the ability of visuo-semantic models, consisting of human-generated labels of object features and categories, to explain variance beyond the explanatory power of DNNs alone. We report a gradual reversal in the relative importance of DNN versus visuo-semantic features as ventral-stream object representations unfold over space and time. Although lower-level visual areas are better explained by DNN features starting early in time (at 66 ms after stimulus onset), higher-level cortical dynamics are best accounted for by visuo-semantic features starting later in time (at 146 ms after stimulus onset). Among the visuo-semantic features, object parts and basic categories drive the advantage over DNNs. These results show that a significant component of the variance unexplained by DNNs in higher-level cortical dynamics is structured and can be explained by readily nameable aspects of the objects. We conclude that current DNNs fail to fully capture dynamic representations in higher-level human visual cortex and suggest a path toward more accurate models of ventral-stream computations.

SIGNIFICANCE STATEMENT: When we view objects such as faces and cars in our visual environment, their neural representations dynamically unfold over time at a millisecond scale. These dynamics reflect the cortical computations that support fast and robust object recognition. DNNs have emerged as a promising framework for modeling these computations but cannot yet fully account for the neural dynamics. Using magnetoencephalography data acquired in human observers during object viewing, we show that readily nameable aspects of objects, such as 'eye', 'wheel', and 'face', can account for variance in the neural dynamics over and above DNNs. These findings suggest that DNNs and humans may in part rely on different object features for visual recognition and provide guidelines for model improvement.
Affiliation(s)
- Kamila M Jozwik
- Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
- Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, 49069 Osnabrück, Germany
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, 14195 Berlin, Germany
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York 10027
- Marieke Mur
- Department of Psychology, Western University, London, Ontario N6A 3K7, Canada
- Department of Computer Science, Western University, London, Ontario N6A 3K7, Canada

13.
Abstract
A schema refers to a structured body of prior knowledge that captures common patterns across related experiences. Schemas have been studied separately in the realms of episodic memory and spatial navigation across different species and have been grounded in theories of memory consolidation, but there has been little attempt to integrate our understanding across domains, particularly in humans. We propose that experiences during navigation with many similarly structured environments give rise to the formation of spatial schemas (for example, the expected layout of modern cities) that share properties with but are distinct from cognitive maps (for example, the memory of a modern city) and event schemas (such as expected events in a modern city) at both cognitive and neural levels. We describe earlier theoretical frameworks and empirical findings relevant to spatial schemas, along with more targeted investigations of spatial schemas in human and non-human animals. Consideration of architecture and urban analytics, including the influence of scale and regionalization, on different properties of spatial schemas may provide a powerful approach to advance our understanding of spatial schemas.

14. Bracci S, Op de Beeck HP. Understanding Human Object Vision: A Picture Is Worth a Thousand Representations. Annu Rev Psychol 2023;74:113-135. PMID: 36378917. DOI: 10.1146/annurev-psych-032720-041031.
Abstract
Objects are the core meaningful elements in our visual environment. Classic theories of object vision focus upon object recognition and are elegant and simple. Some of their proposals still stand, yet the simplicity is gone. Recent evolutions in behavioral paradigms, neuroscientific methods, and computational modeling have allowed vision scientists to uncover the complexity of the multidimensional representational space that underlies object vision. We review these findings and propose that the key to understanding this complexity is to relate object vision to the full repertoire of behavioral goals that underlie human behavior, running far beyond object recognition. There might be no such thing as core object recognition, and if it exists, then its importance is more limited than traditionally thought.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy
- Hans P Op de Beeck
- Leuven Brain Institute, Research Unit Brain & Cognition, KU Leuven, Leuven, Belgium

15. Tang K, Chin M, Chun M, Xu Y. The contribution of object identity and configuration to scene representation in convolutional neural networks. PLoS One 2022;17:e0270667. PMID: 35763531. PMCID: PMC9239439. DOI: 10.1371/journal.pone.0270667.
Abstract
Scene perception involves extracting the identities of the objects comprising a scene in conjunction with their configuration (the spatial layout of the objects in the scene). How object identity and configuration information is weighted during scene processing, and how this weighting evolves over the course of scene processing, however, is not fully understood. Recent developments in convolutional neural networks (CNNs) have demonstrated their aptitude at scene processing tasks and identified correlations between processing in CNNs and in the human brain. Here we examined four CNN architectures (AlexNet, ResNet-18, ResNet-50, DenseNet-161) and their sensitivity to changes in object and configuration information over the course of scene processing. Despite differences among the four CNN architectures, across all CNNs we observed a common pattern in the CNNs' responses to object identity and configuration changes. Each CNN demonstrated greater sensitivity to configuration changes in early stages of processing and stronger sensitivity to object identity changes in later stages. This pattern persists regardless of the spatial structure present in the image background, the accuracy of the CNN in classifying the scene, and even the task used to train the CNN. Importantly, CNNs' sensitivity to a configuration change is not the same as their sensitivity to any type of position change, such as that induced by a uniform translation of the objects without a configuration change. These results provide one of the first documentations of how object identity and configuration information are weighted in CNNs during scene processing.
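
The sensitivity analysis can be sketched assuming layer activations for each image version have already been extracted from a CNN. Synthetic arrays are used here, with noise scales deliberately chosen to mimic the reported early-configuration/late-identity pattern, purely for illustration:

    import numpy as np

    rng = np.random.default_rng(9)
    n_layers, n_units = 5, 1000
    depth = np.linspace(0.0, 1.0, n_layers)
    original = rng.normal(size=(n_layers, n_units))

    # Illustrative perturbations: identity changes matter more in late layers,
    # configuration changes in early layers (assumed, not measured).
    identity_change = original + rng.normal(size=(n_layers, n_units)) * (0.2 + 0.8 * depth)[:, None]
    config_change = original + rng.normal(size=(n_layers, n_units)) * (1.0 - 0.8 * depth)[:, None]

    def sensitivity(a, b):
        """1 - correlation between a layer's responses to two image versions."""
        return np.array([1 - np.corrcoef(a[l], b[l])[0, 1] for l in range(n_layers)])

    print("identity-change sensitivity by layer:",
          np.round(sensitivity(original, identity_change), 2))
    print("configuration-change sensitivity by layer:",
          np.round(sensitivity(original, config_change), 2))
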
Affiliation(s)
- Kevin Tang
- Department of Psychology, Yale University, New Haven, CT, United States of America
- Matthew Chin
- Department of Psychology, Yale University, New Haven, CT, United States of America
- Marvin Chun
- Department of Psychology, Yale University, New Haven, CT, United States of America
- Yaoda Xu
- Department of Psychology, Yale University, New Haven, CT, United States of America

16. Ayzenberg V, Kamps FS, Dilks DD, Lourenco SF. Skeletal representations of shape in the human visual cortex. Neuropsychologia 2022;164:108092. PMID: 34801519. PMCID: PMC9840386. DOI: 10.1016/j.neuropsychologia.2021.108092.
Abstract
Shape perception is crucial for object recognition. However, it remains unknown exactly how shape information is represented and used by the visual system. Here, we tested the hypothesis that the visual system represents object shape via a skeletal structure. Using functional magnetic resonance imaging (fMRI) and representational similarity analysis (RSA), we found that a model of skeletal similarity explained significant unique variance in the response profiles of V3 and LO. Moreover, the skeletal model remained predictive in these regions even when controlling for other models of visual similarity that approximate low- to high-level visual features (i.e., Gabor-jet, GIST, HMAX, and AlexNet), and across different surface forms, a manipulation that altered object contours while preserving the underlying skeleton. Together, these findings shed light on shape processing in human vision, as well as the computational properties of V3 and LO. We discuss how these regions may support two putative roles of shape skeletons: namely, perceptual organization and object recognition.
Affiliation(s)
- Vladislav Ayzenberg
- Department of Psychology, Carnegie Mellon University, USA
- Frederik S. Kamps
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, USA
- Stella F. Lourenco
- Department of Psychology, Emory University, USA

17. Harel A, Nador JD, Bonner MF, Epstein RA. Early Electrophysiological Markers of Navigational Affordances in Scenes. J Cogn Neurosci 2021;34:397-410. PMID: 35015877. DOI: 10.1162/jocn_a_01810.
Abstract
Scene perception and spatial navigation are interdependent cognitive functions, and there is increasing evidence that cortical areas that process perceptual scene properties also carry information about the potential for navigation in the environment (navigational affordances). However, the temporal stages by which visual information is transformed into navigationally relevant information are not yet known. We hypothesized that navigational affordances are encoded during perceptual processing and therefore should modulate early visually evoked ERPs, especially the scene-selective P2 component. To test this idea, we recorded ERPs from participants while they passively viewed computer-generated room scenes matched in visual complexity. By simply changing the number of doors (no doors, 1 door, 2 doors, 3 doors), we were able to systematically vary the number of pathways that afford movement in the local environment, while keeping the overall size and shape of the environment constant. We found that rooms with no doors evoked a higher P2 response than rooms with three doors, consistent with prior research reporting higher P2 amplitude to closed relative to open scenes. Moreover, we found P2 amplitude scaled linearly with the number of doors in the scenes. Navigability effects on the ERP waveform were also observed in a multivariate analysis, which showed significant decoding of the number of doors and their location at earlier time windows. Together, our results suggest that navigational affordances are represented in the early stages of scene perception. This complements research showing that the occipital place area automatically encodes the structure of navigable space and strengthens the link between scene perception and navigation.

18. Functional selectivity for social interaction perception in the human superior temporal sulcus during natural viewing. Neuroimage 2021;245:118741. PMID: 34800663. DOI: 10.1016/j.neuroimage.2021.118741.
Abstract
Recognizing others' social interactions is a crucial human ability. Using simple stimuli, previous studies have shown that social interactions are selectively processed in the superior temporal sulcus (STS), but prior work with movies has suggested that social interactions are processed in the medial prefrontal cortex (mPFC), part of the theory of mind network. It remains unknown to what extent social interaction selectivity is observed in real-world stimuli when controlling for other covarying perceptual and social information, such as faces, voices, and theory of mind. The current study utilizes a functional magnetic resonance imaging (fMRI) movie paradigm and advanced machine learning methods to uncover the brain mechanisms uniquely underlying naturalistic social interaction perception. We analyzed two publicly available fMRI datasets, collected while both male and female human participants (n = 17 and 18) watched two different commercial movies in the MRI scanner. By performing voxel-wise encoding and variance partitioning analyses, we found that broad social-affective features predict neural responses in social brain regions, including the STS and mPFC. However, only the STS showed robust and unique selectivity specifically to social interactions, independent from other covarying features. This selectivity was observed across two separate fMRI datasets. These findings suggest that naturalistic social interaction perception recruits dedicated neural circuitry in the STS, separate from the theory of mind network, and is a critical dimension of human social understanding.
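
Variance partitioning of this kind can be sketched with two synthetic feature banks: fit encoding models on each bank alone and on both together, and read off the unique variance as differences in R^2. All sizes and weights below are assumptions for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(10)
    n_timepoints = 1000
    social = rng.normal(size=(n_timepoints, 5))      # social-interaction features
    other = rng.normal(size=(n_timepoints, 10))      # faces, voices, ToM, etc.
    voxel = (social @ rng.normal(size=5)
             + 0.3 * (other @ rng.normal(size=10))
             + rng.normal(scale=2.0, size=n_timepoints))

    r2 = lambda X: LinearRegression().fit(X, voxel).score(X, voxel)
    r2_full = r2(np.hstack([social, other]))
    print("unique R^2 of social features: %.3f" % (r2_full - r2(other)))
    print("unique R^2 of other features:  %.3f" % (r2_full - r2(social)))
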

19. Groen IIA, Dekker TM, Knapen T, Silson EH. Visuospatial coding as ubiquitous scaffolding for human cognition. Trends Cogn Sci 2021;26:81-96. PMID: 34799253. DOI: 10.1016/j.tics.2021.10.011.
Abstract
For more than 100 years we have known that the visual field is mapped onto the surface of visual cortex, imposing an inherently spatial reference frame on visual information processing. Recent studies highlight visuospatial coding not only throughout visual cortex, but also brain areas not typically considered visual. Such widespread access to visuospatial coding raises important questions about its role in wider cognitive functioning. Here, we synthesise these recent developments and propose that visuospatial coding scaffolds human cognition by providing a reference frame through which neural computations interface with environmental statistics and task demands via perception-action loops.
Affiliation(s)
- Iris I A Groen
- Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
- Tessa M Dekker
- Institute of Ophthalmology, University College London, London, UK
- Tomas Knapen
- Behavioral and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Spinoza Centre for NeuroImaging, Royal Dutch Academy of Sciences, Amsterdam, The Netherlands
- Edward H Silson
- Department of Psychology, School of Philosophy, Psychology & Language Sciences, University of Edinburgh, Edinburgh, UK.

20. Direct comparison of contralateral bias and face/scene selectivity in human occipitotemporal cortex. Brain Struct Funct 2021;227:1405-1421. PMID: 34727232. PMCID: PMC9046350. DOI: 10.1007/s00429-021-02411-8.
Abstract
Human visual cortex is organised broadly according to two major principles: retinotopy (the spatial mapping of the retina in cortex) and category-selectivity (preferential responses to specific categories of stimuli). Historically, these principles were considered anatomically separate, with retinotopy restricted to the occipital cortex and category-selectivity emerging in the lateral-occipital and ventral-temporal cortex. However, recent studies show that category-selective regions exhibit systematic retinotopic biases, for example exhibiting stronger activation for stimuli presented in the contra- compared to the ipsilateral visual field. It is unclear, however, whether responses within category-selective regions are more strongly driven by retinotopic location or by category preference, and if there are systematic differences between category-selective regions in the relative strengths of these preferences. Here, we directly compare contralateral and category preferences by measuring fMRI responses to scene and face stimuli presented in the left or right visual field and computing two bias indices: a contralateral bias (response to the contralateral minus ipsilateral visual field) and a face/scene bias (preferred response to scenes compared to faces, or vice versa). We compare these biases within and between scene- and face-selective regions and across the lateral and ventral surfaces of the visual cortex more broadly. We find an interaction between surface and bias: lateral surface regions show a stronger contralateral than face/scene bias, whilst ventral surface regions show the opposite. These effects are robust across and within subjects, and appear to reflect large-scale, smoothly varying gradients. Together, these findings support distinct functional roles for the lateral and ventral visual cortex in terms of the relative importance of the spatial location of stimuli during visual information processing.

21.
Abstract
During natural vision, our brains are constantly exposed to complex, but regularly structured environments. Real-world scenes are defined by typical part-whole relationships, where the meaning of the whole scene emerges from configurations of localized information present in individual parts of the scene. Such typical part-whole relationships suggest that information from individual scene parts is not processed independently, but that there are mutual influences between the parts and the whole during scene analysis. Here, we review recent research that used a straightforward, but effective approach to study such mutual influences: By dissecting scenes into multiple arbitrary pieces, these studies provide new insights into how the processing of whole scenes is shaped by their constituent parts and, conversely, how the processing of individual parts is determined by their role within the whole scene. We highlight three facets of this research: First, we discuss studies demonstrating that the spatial configuration of multiple scene parts has a profound impact on the neural processing of the whole scene. Second, we review work showing that cortical responses to individual scene parts are shaped by the context in which these parts typically appear within the environment. Third, we discuss studies demonstrating that missing scene parts are interpolated from the surrounding scene context. Bridging these findings, we argue that efficient scene processing relies on an active use of the scene's part-whole structure, where the visual brain matches scene inputs with internal models of what the world should look like.
Affiliation(s)
- Daniel Kaiser
- Justus-Liebig-Universität Gießen, Germany; Philipps-Universität Marburg, Germany; University of York, United Kingdom
- Radoslaw M Cichy
- Freie Universität Berlin, Germany; Humboldt-Universität zu Berlin, Germany; Bernstein Centre for Computational Neuroscience Berlin, Germany

22. Chaisilprungraung T, Park S. "Scene" from inside: The representation of Observer's space in high-level visual cortex. Neuropsychologia 2021;161:108010. PMID: 34454940. DOI: 10.1016/j.neuropsychologia.2021.108010.
Abstract
Human observers are remarkably adept at perceiving and interacting with visual stimuli around them. Compared to visual stimuli like objects or faces, scenes are unique in that they provide enclosures for observers. An observer looks at a scene by being physically inside the scene. The current research explored this unique observer-scene relationship by studying the neural representation of scenes' spatial boundaries. Previous studies hypothesized that scenes' boundaries were processed in sets of high-level visual cortices. Notably, the parahippocampal place area (PPA), exhibited neural sensitivity to scenes that had closed vs. open spatial boundaries (Kravitz et al., 2011; Park et al., 2011). We asked whether this sensitivity reflected the openness of landscape (e.g., forest vs. beach), or the openness of the environment immediately surrounding the observer (i.e., whether a scene was viewed from inside vs. outside a room). Across two human fMRI experiments, we found that the PPA, as well as another well-known navigation-processing area, the occipital place area (OPA), processed scenes' boundaries according to the observer's space rather than the landscape. Moreover, we found that the PPA's activation pattern was susceptible to manipulations involving mid-level perceptual properties of scenes (e.g., rectilinear pattern of window frames), while the OPA's response was not. Our results have important implications for research in visual scene processing and suggest an important role of an observer's location in representing the spatial boundary, beyond the low-level visual input of a landscape.
Affiliation(s)
- Soojin Park
- Department of Psychology, Yonsei University, Seoul, South Korea.

23. Lindsay GW. Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future. J Cogn Neurosci 2021;33:2017-2031. DOI: 10.1162/jocn_a_01544.
Abstract
Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs and discusses emerging opportunities for the use of CNNs in vision research beyond basic object recognition.

24. Dwivedi K, Cichy RM, Roig G. Unraveling Representations in Scene-selective Brain Regions Using Scene-Parsing Deep Neural Networks. J Cogn Neurosci 2021;33:2032-2043. PMID: 32897121. PMCID: PMC7612022. DOI: 10.1162/jocn_a_01624.
Abstract
Visual scene perception is mediated by a set of cortical regions that respond preferentially to images of scenes, including the occipital place area (OPA) and parahippocampal place area (PPA). However, the differential contribution of OPA and PPA to scene perception remains an open research question. In this study, we take a deep neural network (DNN)-based computational approach to investigate the differences in OPA and PPA function. In a first step, we search for a computational model that predicts fMRI responses to scenes in OPA and PPA well. We find that DNNs trained to predict scene components (e.g., wall, ceiling, floor) explain higher variance uniquely in OPA and PPA than a DNN trained to predict scene category (e.g., bathroom, kitchen, office). This result is robust across several DNN architectures. On this basis, we then determine whether particular scene components predicted by DNNs differentially account for unique variance in OPA and PPA. We find that variance in OPA responses uniquely explained by the navigation-related floor component is higher compared to the variance explained by the wall and ceiling components. In contrast, PPA responses are better explained by the combination of wall and floor, that is, scene components that together contain the structure and texture of the scene. This differential sensitivity to scene components suggests differential functions of OPA and PPA in scene processing. Moreover, our results further highlight the potential of the proposed computational approach as a general tool in the investigation of the neural basis of human scene perception.
Affiliation(s)
- Kshitij Dwivedi
- Department of Education and Psychology, Free Universität Berlin, Germany
- Department of Computer Science, Goethe University, Frankfurt am Main, Germany
| | | | - Gemma Roig
- Department of Computer Science, Goethe University, Frankfurt am Main, Germany
| |
Collapse
|
25
|
Dwivedi K, Bonner MF, Cichy RM, Roig G. Unveiling functions of the visual cortex using task-specific deep neural networks. PLoS Comput Biol 2021; 17:e1009267. [PMID: 34388161 PMCID: PMC8407579 DOI: 10.1371/journal.pcbi.1009267] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Revised: 08/31/2021] [Accepted: 07/11/2021] [Indexed: 11/20/2022] Open
Abstract
The human visual cortex enables visual perception through a cascade of hierarchical computations in cortical regions with distinct functionalities. Here, we introduce an AI-driven approach to discover the functional mapping of the visual cortex. We related human brain responses to scene images measured with functional MRI (fMRI) systematically to a diverse set of deep neural networks (DNNs) optimized to perform different scene perception tasks. We found a structured mapping between DNN tasks and brain regions along the ventral and dorsal visual streams. Low-level visual tasks mapped onto early brain regions, 3-dimensional scene perception tasks mapped onto the dorsal stream, and semantic tasks mapped onto the ventral stream. This mapping was of high fidelity, with more than 60% of the explainable variance in nine key regions being explained. Together, our results provide a novel functional mapping of the human visual cortex and demonstrate the power of the computational approach.
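A note on the "explainable variance" reported above: model performance is expressed relative to a noise ceiling estimated from the reliability of the data themselves. A minimal sketch of one common split-half estimator; the paper's exact ceiling procedure may differ, and all data here are simulated:

```python
# Sketch: estimate a noise ceiling from split-half reliability and report
# the fraction of explainable variance a model accounts for.
import numpy as np

rng = np.random.default_rng(1)
signal = rng.normal(size=100)
half1 = signal + rng.normal(scale=0.7, size=100)  # two independent data halves
half2 = signal + rng.normal(scale=0.7, size=100)
model_pred = signal + rng.normal(scale=0.9, size=100)

measured = (half1 + half2) / 2
r_split = np.corrcoef(half1, half2)[0, 1]
# Spearman-Brown correction for averaging the two halves; a noiseless model
# can correlate with the averaged data at most sqrt of that reliability.
ceiling_r = np.sqrt(2 * r_split / (1 + r_split))
model_r = np.corrcoef(model_pred, measured)[0, 1]

print(f"fraction of explainable variance explained: "
      f"{(model_r / ceiling_r) ** 2:.2f}")
```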
Affiliation(s)
- Kshitij Dwivedi
- Department of Education and Psychology, Freie Universität Berlin, Germany
- Department of Computer Science, Goethe University, Frankfurt am Main, Germany
- Michael F. Bonner
- Department of Cognitive Science, Johns Hopkins University, Baltimore, Maryland, United States of America
- Gemma Roig
- Department of Computer Science, Goethe University, Frankfurt am Main, Germany

26
Li J, Zhang R, Liu S, Liang Q, Zheng S, He X, Huang R. Human spatial navigation: Neural representations of spatial scales and reference frames obtained from an ALE meta-analysis. Neuroimage 2021; 238:118264. [PMID: 34129948 DOI: 10.1016/j.neuroimage.2021.118264] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2020] [Revised: 05/26/2021] [Accepted: 05/27/2021] [Indexed: 11/16/2022] Open
Abstract
Humans use different spatial reference frames (allocentric or egocentric) to navigate successfully toward their destination in different spatial scale spaces (environmental or vista). However, it remains unclear how the brain represents different spatial scales and different spatial reference frames. Thus, we conducted an activation likelihood estimation (ALE) meta-analysis of 47 fMRI articles involving human spatial navigation. We found that both the environmental and vista spaces activated the parahippocampal place area (PPA), retrosplenial complex (RSC), and occipital place area in the right hemisphere. The environmental space showed stronger activation than the vista space in the occipital and frontal regions. No brain region exhibited stronger activation for the vista space than for the environmental space. The allocentric and egocentric reference frames activated the bilateral PPA and right RSC. The allocentric frame showed stronger activations than the egocentric frame in the right culmen, left middle frontal gyrus, and precuneus. No brain region displayed stronger activation for egocentric than for allocentric navigation. Our findings suggest that navigation in different spatial scale spaces evokes both specific and common brain regions, and that the brain regions representing spatial reference frames are not absolutely separated.
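The core of an ALE meta-analysis can be stated compactly: each reported activation focus becomes a 3D Gaussian probability blob, blobs within an experiment are combined into one modeled-activation (MA) map, and MA maps are combined across experiments by probabilistic union. A minimal sketch on a toy grid; the grid size, kernel width, and foci are illustrative, not taken from the 47 analyzed articles:

```python
# Sketch of the core ALE computation on a coarse brain grid.
import numpy as np

shape = (20, 20, 20)  # toy brain grid for illustration
grid = np.stack(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"), -1)

def ma_map(foci, sigma=2.0):
    # Modeled-activation map: voxelwise max over this experiment's foci.
    blobs = [np.exp(-((grid - f) ** 2).sum(-1) / (2 * sigma**2)) for f in foci]
    return np.max(blobs, axis=0)

experiments = [
    [np.array([5, 5, 5]), np.array([12, 8, 6])],   # foci of experiment 1
    [np.array([6, 6, 5])],                          # foci of experiment 2
]
mas = [ma_map(foci) for foci in experiments]

# ALE value: probability that at least one experiment activates the voxel.
ale = 1.0 - np.prod([1.0 - m for m in mas], axis=0)
print("peak ALE:", ale.max(), "at", np.unravel_index(ale.argmax(), shape))
```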
Affiliation(s)
- Jinhui Li
- Key Laboratory of Brain, Cognition and Education Sciences (South China Normal University), Ministry of Education; School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong, 510631, China
- Ruibin Zhang
- Department of Psychology, School of Public Health, Southern Medical University (Guangdong Provincial Key Laboratory of Tropical Disease Research), Guangzhou, China; Department of Psychiatry, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Siqi Liu
- Key Laboratory of Brain, Cognition and Education Sciences (South China Normal University), Ministry of Education; School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong, 510631, China
- Qunjun Liang
- Key Laboratory of Brain, Cognition and Education Sciences (South China Normal University), Ministry of Education; School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong, 510631, China
- Senning Zheng
- Key Laboratory of Brain, Cognition and Education Sciences (South China Normal University), Ministry of Education; School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong, 510631, China
- Xianyou He
- Key Laboratory of Brain, Cognition and Education Sciences (South China Normal University), Ministry of Education; School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong, 510631, China
- Ruiwang Huang
- Key Laboratory of Brain, Cognition and Education Sciences (South China Normal University), Ministry of Education; School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong, 510631, China.

27
Lu Z, Ku Y. NeuroRA: A Python Toolbox of Representational Analysis From Multi-Modal Neural Data. Front Neuroinform 2021; 14:563669. [PMID: 33424573 PMCID: PMC7787009 DOI: 10.3389/fninf.2020.563669] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2020] [Accepted: 12/03/2020] [Indexed: 11/26/2022] Open
Abstract
In studies of cognitive neuroscience, multivariate pattern analysis (MVPA) is widely used because it offers richer information than traditional univariate analysis. Representational similarity analysis (RSA), one MVPA method, has become an effective way to characterize neural data by calculating the similarity between different representations in the brain under different conditions. Moreover, RSA allows researchers to compare data from different modalities and even to bridge data from different species. However, previous toolboxes have been made to fit specific datasets. Here, we develop NeuroRA, a novel and easy-to-use toolbox for representational analysis. Our toolbox aims at conducting cross-modal data analysis from multi-modal neural data (e.g., EEG, MEG, fNIRS, fMRI, and other sources of neuroelectrophysiological data), behavioral data, and computer-simulated data. Compared with previous software packages, our toolbox is more comprehensive and powerful. Using NeuroRA, users can not only calculate the representational dissimilarity matrix (RDM), which reflects the representational similarity among different task conditions, but also conduct representational analyses among different RDMs to achieve cross-modal comparisons. In addition, users can calculate neural pattern similarity (NPS), spatiotemporal pattern similarity (STPS), and inter-subject correlation (ISC) with this toolbox. NeuroRA also provides users with functions for statistical analysis, storage, and visualization of results. We introduce the structure, modules, features, and algorithms of NeuroRA in this paper, as well as examples applying the toolbox to published datasets.
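Rather than assuming NeuroRA's exact API, the sketch below shows the underlying computation such a toolbox packages: build an RDM per modality from condition-by-feature data, then rank-correlate the lower triangles for a cross-modal comparison. Assumes numpy/scipy; all data are random placeholders:

```python
# Sketch: compute RDMs for two modalities and compare them (core RSA step).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
fmri_patterns = rng.normal(size=(12, 500))  # 12 conditions x 500 voxels
eeg_patterns = rng.normal(size=(12, 64))    # 12 conditions x 64 channels

# RDM: pairwise correlation distance between condition patterns.
rdm_fmri = squareform(pdist(fmri_patterns, metric="correlation"))
rdm_eeg = squareform(pdist(eeg_patterns, metric="correlation"))

# Compare only the lower triangles (the diagonal is trivially zero).
tri = np.tril_indices(12, k=-1)
rho, p = spearmanr(rdm_fmri[tri], rdm_eeg[tri])
print(f"RDM similarity (Spearman rho): {rho:.3f}, p = {p:.3f}")
```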
Affiliation(s)
- Zitong Lu
- Guangdong Provincial Key Laboratory of Social Cognitive Neuroscience and Mental Health, Department of Psychology, Sun Yat-sen University, Guangzhou, China; Peng Cheng Laboratory, Shenzhen, China; Shanghai Key Laboratory of Brain Functional Genomics, Shanghai Changning-East China Normal University (ECNU) Mental Health Center, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
- Yixuan Ku
- Guangdong Provincial Key Laboratory of Social Cognitive Neuroscience and Mental Health, Department of Psychology, Sun Yat-sen University, Guangzhou, China; Peng Cheng Laboratory, Shenzhen, China

28
Abstract
Does the human mind resemble the machines that can behave like it? Biologically inspired machine-learning systems approach "human-level" accuracy in an astounding variety of domains, and even predict human brain activity, raising the exciting possibility that such systems represent the world as we do. However, even seemingly intelligent machines fail in strange and "unhumanlike" ways, threatening their status as models of our minds. How can we know when human-machine behavioral differences reflect deep disparities in their underlying capacities, versus when such failures are only superficial or peripheral? This article draws on a foundational insight from cognitive science, the distinction between performance and competence, to encourage "species-fair" comparisons between humans and machines. The performance/competence distinction urges us to consider whether the failure of a system to behave as ideally hypothesized, or the failure of one creature to behave like another, arises not because the system lacks the relevant knowledge or internal capacities ("competence"), but instead because of superficial constraints on demonstrating that knowledge ("performance"). I argue that this distinction has been neglected by research comparing human and machine behavior, and that it should be essential to any such comparison. Focusing on the domain of image classification, I identify three factors contributing to the species-fairness of human-machine comparisons, extracted from recent work that equates such constraints. Species-fair comparisons level the playing field between natural and artificial intelligence, so that we can separate more superficial differences from those that may be deep and enduring.
Affiliation(s)
- Chaz Firestone
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218

29
Castelhano MS, Krzyś K. Rethinking Space: A Review of Perception, Attention, and Memory in Scene Processing. Annu Rev Vis Sci 2020; 6:563-586. [PMID: 32491961 DOI: 10.1146/annurev-vision-121219-081745] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Scene processing is fundamentally influenced and constrained by spatial layout and spatial associations with objects. However, semantic information has played a vital role in propelling our understanding of real-world scene perception forward. In this article, we review recent advances in assessing how spatial layout and spatial relations influence scene processing. We examine the organization of the larger environment and how we take full advantage of spatial configurations independently of semantic information. We demonstrate that a clear differentiation of spatial from semantic information is necessary to advance research in the field of scene processing.
Affiliation(s)
- Monica S Castelhano
- Department of Psychology, Queen's University, Kingston, Ontario K7L 3N6, Canada
- Karolina Krzyś
- Department of Psychology, Queen's University, Kingston, Ontario K7L 3N6, Canada

30
Rehrig G, Peacock CE, Hayes TR, Henderson JM, Ferreira F. Where the action could be: Speakers look at graspable objects and meaningful scene regions when describing potential actions. J Exp Psychol Learn Mem Cogn 2020; 46:1659-1681. [PMID: 32271065 PMCID: PMC7483632 DOI: 10.1037/xlm0000837] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The world is visually complex, yet we can efficiently describe it by extracting the information that is most relevant to convey. How do the properties of real-world scenes help us decide where to look and what to say? Image salience has been the dominant explanation for what drives visual attention and production as we describe displays, but new evidence shows scene meaning predicts attention better than image salience. Here we investigated the relevance of one aspect of meaning, graspability (the grasping interactions objects in the scene afford), given that affordances have been implicated in both visual and linguistic processing. We quantified image salience, meaning, and graspability for real-world scenes. In 3 eyetracking experiments, native English speakers described possible actions that could be carried out in a scene. We hypothesized that graspability would preferentially guide attention due to its task-relevance. In 2 experiments using stimuli from a previous study, meaning explained visual attention better than graspability or salience did, and graspability explained attention better than salience. In a third experiment we quantified image salience, meaning, graspability, and reach-weighted graspability for scenes that depicted reachable spaces containing graspable objects. Graspability and meaning explained attention equally well in the third experiment, and both explained attention better than salience. We conclude that speakers use object graspability to allocate attention to plan descriptions when scenes depict graspable objects within reach, and otherwise rely more on general meaning. The results shed light on what aspects of meaning guide attention during scene viewing in language production tasks.
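The map-based comparison implied above can be sketched simply: correlate each candidate feature map (salience, meaning, graspability) with the observed fixation-density map and compare the variance each explains. This toy version, assuming numpy and with all maps as random placeholders, omits the paper's actual map construction and statistics:

```python
# Sketch: score how well each feature map predicts where observers fixate.
import numpy as np

rng = np.random.default_rng(3)
h, w = 48, 64
fixation_density = rng.random((h, w))  # smoothed fixation map (placeholder)
feature_maps = {
    "salience": rng.random((h, w)),
    "meaning": rng.random((h, w)),
    "graspability": rng.random((h, w)),
}

for name, fmap in feature_maps.items():
    r = np.corrcoef(fmap.ravel(), fixation_density.ravel())[0, 1]
    print(f"{name}: R^2 = {r**2:.3f}")  # variance in attention explained
```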

31
Disentangling the Independent Contributions of Visual and Conceptual Features to the Spatiotemporal Dynamics of Scene Categorization. J Neurosci 2020; 40:5283-5299. [PMID: 32467356 DOI: 10.1523/jneurosci.2088-19.2020] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2019] [Revised: 04/18/2020] [Accepted: 04/23/2020] [Indexed: 11/21/2022] Open
Abstract
Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we used a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization. Participants (both sexes) viewed 2250 full-color scene images drawn from 30 different scene categories while having their brain activity measured through 256-channel EEG. We examined the variance explained at each electrode and time point of visual event-related potential (vERP) data from nine different whitened encoding models. These ranged from low-level features obtained from filter outputs to high-level conceptual features requiring human annotation. The amount of category information in the vERPs was assessed through multivariate decoding methods. Behavioral similarity measures were obtained in separate crowdsourced experiments. We found that all nine models together contributed 78% of the variance of human scene similarity assessments and were within the noise ceiling of the vERP data. Low-level models explained earlier vERP variability (88 ms after image onset), whereas high-level models explained later variance (169 ms). Critically, only high-level models shared vERP variability with behavior. Together, these results suggest that scene categorization is primarily a high-level process, but reliant on previously extracted low-level features.
Significance Statement: In a single fixation, we glean enough information to describe a general scene category. Many types of features are associated with scene categories, ranging from low-level properties, such as colors and contours, to high-level properties, such as objects and attributes. Because these properties are correlated, it is difficult to understand each property's unique contributions to scene categorization. This work uses a whitening transformation to remove the correlations between features and examines the extent to which each feature contributes to visual event-related potentials over time. We found that low-level visual features contributed first but were not correlated with categorization behavior. High-level features followed 80 ms later, providing key insights into how the brain makes sense of a complex visual world.
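The whitening transformation above is a concrete linear-algebra step: rotate and rescale the feature matrix so its covariance becomes the identity, removing correlations between feature spaces. A minimal ZCA-style sketch, assuming numpy; the feature matrix is a random stand-in for the nine feature spaces:

```python
# Sketch: ZCA-style whitening built from the eigendecomposition of the
# feature covariance; afterwards the dimensions are decorrelated with
# unit variance.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(2250, 9))          # stimuli x correlated feature spaces
Xc = X - X.mean(axis=0)

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-8)) @ eigvecs.T  # ZCA matrix
Xw = Xc @ W

# Whitened covariance should now be (numerically) the identity matrix.
print(np.allclose(np.cov(Xw, rowvar=False), np.eye(9), atol=1e-6))
```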

32
Effects of Spatial Frequency Filtering Choices on the Perception of Filtered Images. Vision (Basel) 2020; 4:vision4020029. [PMID: 32466442 PMCID: PMC7355859 DOI: 10.3390/vision4020029] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2020] [Revised: 05/13/2020] [Accepted: 05/22/2020] [Indexed: 11/17/2022] Open
Abstract
The early visual system is composed of spatial frequency-tuned channels that break an image into its individual frequency components. Therefore, researchers commonly filter images for spatial frequencies to arrive at conclusions about the differential importance of high versus low spatial frequency image content. Here, we show how simple decisions about the filtering of the images, and how they are displayed on the screen, can result in drastically different behavioral outcomes. We show that jointly normalizing the contrast of the stimuli is critical in order to draw accurate conclusions about the influence of the different spatial frequencies, as images of the real world naturally have higher contrast energy at low than high spatial frequencies. Furthermore, the specific choice of filter shape can result in contradictory results about whether high or low spatial frequencies are more useful for understanding image content. Finally, we show that the manner in which the high spatial frequency content is displayed on the screen influences how recognizable an image is. Previous findings that make claims about the visual system's use of certain spatial frequency bands should be revisited, especially if their methods sections do not make clear what filtering choices were made.
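The filtering choices at issue are easy to state in code: a hard (ideal) cutoff versus a smooth filter in the Fourier domain, followed by joint contrast normalization so low- and high-pass images are matched. A minimal sketch assuming numpy; the cutoff and image are placeholders, and the paper compares more filter shapes than shown here:

```python
# Sketch: two low-pass filter shapes in the Fourier domain, plus joint
# RMS contrast normalization of the filtered images.
import numpy as np

rng = np.random.default_rng(5)
img = rng.random((256, 256))
fy = np.fft.fftfreq(256)[:, None]
fx = np.fft.fftfreq(256)[None, :]
radius = np.sqrt(fx**2 + fy**2)   # spatial frequency of each FFT component
cutoff = 0.05                      # cycles/pixel, illustrative

def filter_image(image, mask):
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))

hard_low = filter_image(img, (radius <= cutoff).astype(float))   # ideal cutoff
gauss_low = filter_image(img, np.exp(-(radius / cutoff) ** 2))   # smooth filter

def normalize_rms(image, target_rms=0.2):
    # Joint contrast normalization: same mean luminance and RMS contrast.
    image = image - image.mean()
    return image / image.std() * target_rms + 0.5

hard_low, gauss_low = normalize_rms(hard_low), normalize_rms(gauss_low)
print(hard_low.std(), gauss_low.std())  # both now ~0.2
```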

33
Coding of Navigational Distance and Functional Constraint of Boundaries in the Human Scene-Selective Cortex. J Neurosci 2020; 40:3621-3630. [PMID: 32209608 DOI: 10.1523/jneurosci.1991-19.2020] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2019] [Revised: 02/28/2020] [Accepted: 03/05/2020] [Indexed: 11/21/2022] Open
Abstract
For visually guided navigation, the use of environmental cues is essential. In particular, detecting local boundaries that impose limits on locomotion and estimating their location is crucial. In a series of three fMRI experiments, we investigated whether there is neural coding of navigational distance in the human visual cortex (both female and male participants). We used virtual reality software to systematically manipulate the distance from the viewer to different types of boundary. Using multivoxel pattern classification with a linear support vector machine, we found that the occipital place area (OPA) is sensitive to navigational distance even when locomotion is restricted by a transparent glass wall. Further, the OPA was sensitive to a non-crossable boundary only, suggesting the importance of the functional constraint of a boundary. Together, we propose the OPA as a perceptual source of the external environmental features relevant for navigation.
Significance Statement: One of the major goals in cognitive neuroscience has been to understand the nature of visual scene representation in the human ventral visual cortex. An aspect of scene perception that has been overlooked despite its ecological importance is the analysis of space for navigation. One critical computation necessary for navigation is the coding of distance to environmental boundaries that impose limits on a navigator's movements. This paper reports the first empirical evidence for coding of navigational distance in the human visual cortex and its striking sensitivity to the functional constraint of environmental boundaries. This finding links the paper to previous neurological and behavioral work that emphasized distance to boundaries as a crucial geometric property for the reorientation behavior of children and other animal species.
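A minimal sketch of the decoding analysis named above, i.e., linear-SVM multivoxel pattern classification with leave-one-run-out cross-validation; assumes scikit-learn, and the simulated "OPA patterns" and condition labels are placeholders:

```python
# Sketch: decode a distance condition from ROI voxel patterns with a
# linear SVM and leave-one-run-out cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
n_runs, trials_per_run, n_voxels = 8, 12, 300
X = rng.normal(size=(n_runs * trials_per_run, n_voxels))  # OPA patterns
y = np.tile([0, 1], n_runs * trials_per_run // 2)         # near vs. far
runs = np.repeat(np.arange(n_runs), trials_per_run)       # run labels

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X, y, groups=runs, cv=LeaveOneGroupOut())
print(f"decoding accuracy: {scores.mean():.2f} (chance = 0.50)")
```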

34
Mohsenzadeh Y, Mullin C, Lahner B, Oliva A. Emergence of Visual Center-Periphery Spatial Organization in Deep Convolutional Neural Networks. Sci Rep 2020; 10:4638. [PMID: 32170209 PMCID: PMC7070097 DOI: 10.1038/s41598-020-61409-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2019] [Accepted: 02/26/2020] [Indexed: 12/02/2022] Open
Abstract
Research at the intersection of computer vision and neuroscience has revealed a hierarchical correspondence between the layers of deep convolutional neural networks (DCNNs) and the cascade of regions along the human ventral visual cortex. Recently, studies have uncovered the emergence of human-interpretable concepts within DCNN layers trained to identify visual objects and scenes. Here, we asked whether an artificial neural network (with convolutional structure) trained for visual categorization would demonstrate spatial correspondences with human brain regions showing central/peripheral biases. Using representational similarity analysis, we compared activations of convolutional layers of a DCNN trained for object and scene categorization with neural representations in human brain visual regions. Results reveal a brain-like topographical organization in the layers of the DCNN, such that activations of layer units with a central bias were associated with brain regions with foveal tendencies (e.g., fusiform gyrus), and activations of layer units with selectivity for image backgrounds were associated with cortical regions showing a peripheral preference (e.g., parahippocampal cortex). The emergence of a categorical topographical correspondence between DCNNs and brain regions suggests these models are a good approximation of the perceptual representation generated by biological neural networks.
Affiliation(s)
- Yalda Mohsenzadeh
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA.
- Department of Computer Science, The University of Western Ontario, London, ON, Canada.
- The Brain and Mind Institute, The University of Western Ontario, London, ON, Canada.
- Caitlin Mullin
- Department of Psychology, Center for Vision Research, York University, Toronto, ON, Canada
- Benjamin Lahner
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Aude Oliva
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA

35
Wu W, Wang X, Wei T, He C, Bi Y. Object parsing in the left lateral occipitotemporal cortex: Whole shape, part shape, and graspability. Neuropsychologia 2020; 138:107340. [DOI: 10.1016/j.neuropsychologia.2020.107340] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Revised: 11/26/2019] [Accepted: 01/10/2020] [Indexed: 11/27/2022]

36
Tucciarelli R, Wurm M, Baccolo E, Lingnau A. The representational space of observed actions. eLife 2019; 8:47686. [PMID: 31804177 PMCID: PMC6894926 DOI: 10.7554/elife.47686] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2019] [Accepted: 11/21/2019] [Indexed: 11/25/2022] Open
Abstract
Categorizing and understanding other people’s actions is a key human capability. Whereas there exists a growing literature regarding the organization of objects, the representational space underlying the organization of observed actions remains largely unexplored. Here we examined the organizing principles of a large set of actions and the corresponding neural representations. Using multiple regression representational similarity analysis of fMRI data, in which we accounted for variability due to major action components (body parts, scenes, movements, objects, sociality, transitivity) and three control models (distance between observer and actor, number of people, HMAX-C1), we found that the semantic dissimilarity structure was best captured by patterns of activation in the lateral occipitotemporal cortex (LOTC). Together, our results demonstrate that the organization of observed actions in the LOTC resembles the organizing principles used by participants to classify actions behaviorally, in line with the view that this region is crucial for accessing the meaning of actions.
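Multiple regression RSA, as used above, regresses the neural RDM on several model RDMs simultaneously, so each component's beta reflects its contribution with the others held constant. A minimal sketch assuming scikit-learn/scipy; the action components are named after the abstract, but all RDMs are random placeholders:

```python
# Sketch: multiple regression RSA over vectorized (lower-triangle) RDMs.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.spatial.distance import pdist
from scipy.stats import rankdata

rng = np.random.default_rng(7)
n_actions = 28
neural_rdm = pdist(rng.normal(size=(n_actions, 100)))  # vectorized lower tri
model_names = ["body parts", "scenes", "movements", "objects", "sociality"]
model_rdms = [pdist(rng.normal(size=(n_actions, 10))) for _ in model_names]

# Rank-transform to make the regression robust to monotonic distortions.
X = np.column_stack([rankdata(m) for m in model_rdms])
y = rankdata(neural_rdm)

betas = LinearRegression().fit(X, y).coef_
for name, b in zip(model_names, betas):
    print(f"{name}: beta = {b:.3f}")
```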
Affiliation(s)
- Raffaele Tucciarelli
- Department of Psychology, Royal Holloway University of London, Egham, United Kingdom
- Moritz Wurm
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Rovereto, Italy
- Elisa Baccolo
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Rovereto, Italy
- Angelika Lingnau
- Department of Psychology, Royal Holloway University of London, Egham, United Kingdom; Center for Mind/Brain Sciences (CIMeC), University of Trento, Rovereto, Italy; Institute of Psychology, University of Regensburg, Regensburg, Germany

37
Julian JB, Keinath AT, Marchette SA, Epstein RA. The Neurocognitive Basis of Spatial Reorientation. Curr Biol 2019; 28:R1059-R1073. [PMID: 30205055 DOI: 10.1016/j.cub.2018.04.057] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
The ability to recover one's bearings when lost is a skill that is fundamental for spatial navigation. We review the cognitive and neural mechanisms that underlie this ability, with the aim of linking together previously disparate findings from animal behavior, human psychology, electrophysiology, and cognitive neuroscience. Behavioral work suggests that reorientation involves two key abilities: first, the recovery of a spatial reference frame (a cognitive map) that is appropriate to the current environment; and second, the determination of one's heading and location relative to that reference frame. Electrophysiological recording studies, primarily in rodents, have revealed potential correlates of these operations in place, grid, border/boundary, and head-direction cells in the hippocampal formation. Cognitive neuroscience studies, primarily in humans, suggest that the perceptual inputs necessary for these operations are processed by neocortical regions such as the retrosplenial complex, occipital place area and parahippocampal place area, with the retrosplenial complex mediating spatial transformations between the local environment and the recovered spatial reference frame, the occipital place area supporting perception of local boundaries, and the parahippocampal place area processing visual information that is essential for identification of the local spatial context. By combining results across these various literatures, we converge on a unified account of reorientation that bridges the cognitive and neural domains.
Affiliation(s)
- Joshua B Julian
- University of Pennsylvania, Department of Psychology, 3710 Hamilton Walk, Philadelphia, PA 19104, USA; Kavli Institute for Systems Neuroscience, Centre for Neural Computation, NTNU, Norwegian University of Science and Technology, Trondheim, Norway.
- Alexandra T Keinath
- University of Pennsylvania, Department of Psychology, 3710 Hamilton Walk, Philadelphia, PA 19104, USA; McGill University, Douglas Mental Health University Institute, 6875 Boulevard LaSalle, Verdun, QC, Canada
- Steven A Marchette
- University of Pennsylvania, Department of Psychology, 3710 Hamilton Walk, Philadelphia, PA 19104, USA
- Russell A Epstein
- University of Pennsylvania, Department of Psychology, 3710 Hamilton Walk, Philadelphia, PA 19104, USA.

38
Peer M, Ron Y, Monsa R, Arzy S. Processing of different spatial scales in the human brain. eLife 2019; 8:47492. [PMID: 31502539 PMCID: PMC6739872 DOI: 10.7554/elife.47492] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2019] [Accepted: 08/05/2019] [Indexed: 11/13/2022] Open
Abstract
Humans navigate across a range of spatial scales, from rooms to continents, but the brain systems underlying spatial cognition are usually investigated only in small-scale environments. Do the same brain systems represent and process larger spaces? Here we asked subjects to compare distances between real-world items at six different spatial scales (room, building, neighborhood, city, country, continent) during functional MRI. Cortical activity showed a gradual progression from small- to large-scale processing, along three gradients extending anteriorly from the parahippocampal place area (PPA), retrosplenial complex (RSC), and occipital place area (OPA), and along the hippocampal posterior-anterior axis. Each of the cortical gradients overlapped with the visual system posteriorly and the default-mode network (DMN) anteriorly. These results suggest a progression from concrete to abstract processing with increasing spatial scale, and offer a new organizational framework for the brain's spatial system that may also apply to conceptual spaces beyond the spatial domain.
Affiliation(s)
- Michael Peer
- Department of Medical Neurosciences, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel; Department of Neurology, Hadassah Hebrew University Medical School, Jerusalem, Israel; Department of Psychology, University of Pennsylvania, Philadelphia, United States
- Yorai Ron
- Department of Medical Neurosciences, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel; Department of Neurology, Hadassah Hebrew University Medical School, Jerusalem, Israel
- Rotem Monsa
- Department of Medical Neurosciences, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel; Department of Neurology, Hadassah Hebrew University Medical School, Jerusalem, Israel
- Shahar Arzy
- Department of Medical Neurosciences, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel; Department of Neurology, Hadassah Hebrew University Medical School, Jerusalem, Israel

39
King ML, Groen IIA, Steel A, Kravitz DJ, Baker CI. Similarity judgments and cortical visual responses reflect different properties of object and scene categories in naturalistic images. Neuroimage 2019; 197:368-382. [PMID: 31054350 PMCID: PMC6591094 DOI: 10.1016/j.neuroimage.2019.04.079] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2018] [Revised: 03/26/2019] [Accepted: 04/29/2019] [Indexed: 12/20/2022] Open
Abstract
Numerous factors have been reported to underlie the representation of complex images in high-level human visual cortex, including categories (e.g., faces, objects, scenes), animacy, and real-world size, but the extent to which this organization reflects behavioral judgments of real-world stimuli is unclear. Here, we compared representations derived from explicit behavioral similarity judgments and ultra-high-field (7T) fMRI of human visual cortex for multiple exemplars of a diverse set of naturalistic images from 48 object and scene categories. While there was a significant correlation between similarity judgments and fMRI responses, there were striking differences between the two representational spaces. Behavioral judgments primarily revealed a coarse division between man-made (including humans) and natural (including animals) images, with clear groupings of conceptually related categories (e.g., transportation, animals), while these conceptual groupings were largely absent in the fMRI representations. Instead, fMRI responses primarily seemed to reflect a separation of both human and non-human faces/bodies from all other categories. Further, comparison of the behavioral and fMRI representational spaces with those derived from the layers of a deep neural network (DNN) showed a strong correspondence with behavior in the top-most layer and with fMRI in the mid-level layers. These results suggest a complex relationship between localized responses in high-level visual cortex and behavioral similarity judgments: each domain reflects different properties of the images, and responses in high-level visual cortex may correspond to intermediate stages of processing between basic visual features and the conceptual categories that dominate the behavioral response.
Affiliation(s)
- Marcie L King
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, 20892, USA; Department of Psychological and Brain Sciences, University of Iowa, W311 Seashore Hall, Iowa City, IA, 52242, USA
- Iris I A Groen
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, 20892, USA; Department of Psychology, New York University, 6 Washington Place, New York, NY, 10003, USA
- Adam Steel
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, 20892, USA
- Dwight J Kravitz
- Department of Psychology, George Washington University, 2125 G St. NW, Washington, DC, 20008, USA
- Chris I Baker
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, 20892, USA.

40
Representation of human spatial navigation responding to input spatial information and output navigational strategies: An ALE meta-analysis. Neurosci Biobehav Rev 2019; 103:60-72. [DOI: 10.1016/j.neubiorev.2019.06.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2018] [Revised: 05/22/2019] [Accepted: 06/11/2019] [Indexed: 12/23/2022]

41
Williams CC, Castelhano MS. The Changing Landscape: High-Level Influences on Eye Movement Guidance in Scenes. Vision (Basel) 2019; 3:E33. [PMID: 31735834 PMCID: PMC6802790 DOI: 10.3390/vision3030033] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2019] [Revised: 06/20/2019] [Accepted: 06/24/2019] [Indexed: 11/16/2022] Open
Abstract
The use of eye movements to explore scene processing has exploded over the last decade. Eye movements provide distinct advantages when examining scene processing because they are both fast and spatially measurable. By using eye movements, researchers have investigated many questions about scene processing. Our review will focus on research performed in the last decade examining: (1) attention and eye movements; (2) where you look; (3) influence of task; (4) memory and scene representations; and (5) dynamic scenes and eye movements. Although typically addressed as separate issues, we argue that these distinctions are now holding back research progress. Instead, it is time to examine the intersections of these seemingly separate influences and examine the intersectionality of how these influences interact to more completely understand what eye movements can tell us about scene processing.
Affiliation(s)
- Carrick C. Williams
- Department of Psychology, California State University San Marcos, San Marcos, CA 92069, USA

42
Ayzenberg V, Lourenco SF. Skeletal descriptions of shape provide unique perceptual information for object recognition. Sci Rep 2019; 9:9359. [PMID: 31249321 PMCID: PMC6597715 DOI: 10.1038/s41598-019-45268-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Accepted: 05/29/2019] [Indexed: 11/17/2022] Open
Abstract
With seemingly little effort, humans can both identify an object across large changes in orientation and extend category membership to novel exemplars. Although researchers argue that object shape is crucial in these cases, there are open questions as to how shape is represented for object recognition. Here we tested whether the human visual system incorporates a three-dimensional skeletal descriptor of shape to determine an object's identity. Skeletal models not only provide a compact description of an object's global shape structure, but also provide a quantitative metric by which to compare the visual similarity between shapes. Our results showed that a model of skeletal similarity explained the greatest amount of variance in participants' object dissimilarity judgments when compared with other computational models of visual similarity (Experiment 1). Moreover, parametric changes to an object's skeleton led to proportional changes in perceived similarity, even when controlling for another model of structure (Experiment 2). Importantly, participants preferentially categorized objects by their skeletons across changes to local shape contours and non-accidental properties (Experiment 3). Our findings highlight the importance of skeletal structure in vision, not only as a shape descriptor, but also as a diagnostic cue of object identity.
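To illustrate the idea of skeletal similarity in a simplified 2D setting (the study used three-dimensional skeletons, so this is an analogy, not the authors' metric): extract each shape's medial-axis skeleton, then score similarity by the mean nearest-neighbor distance between skeleton point sets. Assumes scikit-image and scipy; the shapes are toy placeholders:

```python
# Sketch: 2D medial-axis skeletons and a simple skeletal dissimilarity score.
import numpy as np
from skimage.morphology import medial_axis
from scipy.spatial.distance import cdist

def make_blob(offset):
    # Toy binary shape: an ellipse, optionally shifted horizontally.
    yy, xx = np.mgrid[:64, :64]
    return ((yy - 32) ** 2 / 400 + (xx - 32 - offset) ** 2 / 100) < 1

def skeleton_points(shape_mask):
    return np.argwhere(medial_axis(shape_mask))

a, b = skeleton_points(make_blob(0)), skeleton_points(make_blob(4))
d = cdist(a, b)
# Symmetric mean nearest-neighbor distance between the two skeletons.
score = (d.min(axis=1).mean() + d.min(axis=0).mean()) / 2
print(f"skeletal dissimilarity: {score:.2f} pixels")
```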

43
Abstract
Humans are remarkably adept at perceiving and understanding complex real-world scenes. Uncovering the neural basis of this ability is an important goal of vision science. Neuroimaging studies have identified three cortical regions that respond selectively to scenes: parahippocampal place area, retrosplenial complex/medial place area, and occipital place area. Here, we review what is known about the visual and functional properties of these brain areas. Scene-selective regions exhibit retinotopic properties and sensitivity to low-level visual features that are characteristic of scenes. They also mediate higher-level representations of layout, objects, and surface properties that allow individual scenes to be recognized and their spatial structure ascertained. Challenges for the future include developing computational models of information processing in scene regions, investigating how these regions support scene perception under ecologically realistic conditions, and understanding how they operate in the context of larger brain networks.
Affiliation(s)
- Russell A Epstein
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Chris I Baker
- Section on Learning and Plasticity, Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, Maryland 20892, USA

44
Henriksson L, Mur M, Kriegeskorte N. Rapid Invariant Encoding of Scene Layout in Human OPA. Neuron 2019; 103:161-171.e3. [PMID: 31097360 DOI: 10.1016/j.neuron.2019.04.014] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2018] [Revised: 03/13/2019] [Accepted: 04/05/2019] [Indexed: 01/30/2023]
Abstract
Successful visual navigation requires a sense of the geometry of the local environment. How do our brains extract this information from retinal images? Here we visually presented scenes with all possible combinations of five scene-bounding elements (left, right, and back walls; ceiling; floor) to human subjects during functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). The fMRI response patterns in the scene-responsive occipital place area (OPA) reflected scene layout with invariance to changes in surface texture. This result contrasted sharply with the primary visual cortex (V1), which reflected low-level image features of the stimuli, and the parahippocampal place area (PPA), which showed better texture than layout decoding. MEG indicated that the texture-invariant scene layout representation is computed from visual input within ∼100 ms, suggesting a rapid computational mechanism. Taken together, these results suggest that the cortical representation underlying our instant sense of the environmental geometry is located in the OPA.
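The texture-invariance claim above rests on cross-decoding: train a layout classifier on patterns evoked by one surface texture and test it on another; above-chance transfer implies a layout code that generalizes across texture. A minimal sketch with simulated patterns, assuming scikit-learn; all data are placeholders:

```python
# Sketch: cross-condition decoding as a test of invariance (train on
# texture A, test on texture B).
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(8)
n_layouts, reps, n_voxels = 4, 20, 250
layout_code = rng.normal(size=(n_layouts, n_voxels))  # shared layout signal

def simulate(texture_shift):
    # Each texture adds its own voxelwise offset on top of the layout code.
    X = np.repeat(layout_code, reps, axis=0) + texture_shift
    return X + rng.normal(scale=2.0, size=X.shape)

y = np.repeat(np.arange(n_layouts), reps)
X_texA = simulate(rng.normal(size=n_voxels))  # texture A trials
X_texB = simulate(rng.normal(size=n_voxels))  # texture B trials

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
acc = clf.fit(X_texA, y).score(X_texB, y)
print(f"cross-texture decoding accuracy: {acc:.2f} "
      f"(chance = {1 / n_layouts:.2f})")
```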
Affiliation(s)
- Linda Henriksson
- Department of Neuroscience and Biomedical Engineering, Aalto University, 02150 Espoo, Finland; AMI Centre, MEG Core, ABL, Aalto NeuroImaging, Aalto University, 02150 Espoo, Finland.
- Marieke Mur
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; Department of Psychology, Brain and Mind Institute, Western University, London, ON N6A 3K7, Canada
- Nikolaus Kriegeskorte
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; Department of Psychology, Department of Neuroscience, and Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10032, USA

45
Abstract
Our research has previously shown that scene categories can be predicted from observers' eye movements when they view photographs of real-world scenes. The time course of category predictions reveals the differential influences of bottom-up and top-down information. Here we used these known differences to determine to what extent image features at different representational levels contribute toward guiding gaze in a category-specific manner. Participants viewed grayscale photographs and line drawings of real-world scenes while their gaze was tracked. Scene categories could be predicted from fixation density at all times over a 2-s time course in both photographs and line drawings. We replicated the shape of the prediction curve found previously, with an initial steep decrease in prediction accuracy from 300 to 500 ms, representing the contribution of bottom-up information, followed by a steady increase, representing top-down knowledge of category-specific information. We then computed the low-level features (luminance contrasts and orientation statistics), mid-level features (local symmetry and contour junctions), and Deep Gaze II output from the images, and used that information as a reference in our category predictions in order to assess their respective contributions to category-specific guidance of gaze. We observed that, as expected, low-level salience contributes mostly to the initial bottom-up peak of gaze guidance. Conversely, the mid-level features that describe scene structure (i.e., local symmetry and junctions) split their contributions between bottom-up and top-down attentional guidance, with symmetry contributing to both bottom-up and top-down guidance, while junctions play a more prominent role in the top-down guidance of gaze.

46
Glaser JI, Benjamin AS, Farhoodi R, Kording KP. The roles of supervised machine learning in systems neuroscience. Prog Neurobiol 2019; 175:126-137. [PMID: 30738835 PMCID: PMC8454059 DOI: 10.1016/j.pneurobio.2019.01.008] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2018] [Revised: 01/23/2019] [Accepted: 01/28/2019] [Indexed: 01/18/2023]
Abstract
Over the last several years, the use of machine learning (ML) in neuroscience has been rapidly increasing. Here, we review ML's contributions, both realized and potential, across several areas of systems neuroscience. We describe four primary roles of ML within neuroscience: (1) creating solutions to engineering problems, (2) identifying predictive variables, (3) setting benchmarks for simple models of the brain, and (4) serving itself as a model for the brain. The breadth and ease of its applicability suggests that machine learning should be in the toolbox of most systems neuroscientists.
Affiliation(s)
- Joshua I Glaser
- Department of Bioengineering, University of Pennsylvania, United States.
- Ari S Benjamin
- Department of Bioengineering, University of Pennsylvania, United States.
- Roozbeh Farhoodi
- Department of Bioengineering, University of Pennsylvania, United States.
- Konrad P Kording
- Department of Bioengineering, University of Pennsylvania, United States; Department of Neuroscience, University of Pennsylvania, United States; Canadian Institute for Advanced Research, Canada.

47
O'Connell TP, Chun MM. Predicting eye movement patterns from fMRI responses to natural scenes. Nat Commun 2018; 9:5159. [PMID: 30514836 PMCID: PMC6279768 DOI: 10.1038/s41467-018-07471-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2017] [Accepted: 11/02/2018] [Indexed: 12/03/2022] Open
Abstract
Eye tracking has long been used to measure overt spatial attention, and computational models of spatial attention reliably predict eye movements to natural images. However, researchers lack techniques to noninvasively access spatial representations in the human brain that guide eye movements. Here, we use functional magnetic resonance imaging (fMRI) to predict eye movement patterns from reconstructed spatial representations evoked by natural scenes. First, we reconstruct fixation maps to directly predict eye movement patterns from fMRI activity. Next, we use a model-based decoding pipeline that aligns fMRI activity to deep convolutional neural network activity to reconstruct spatial priority maps and predict eye movements in a zero-shot fashion. We predict human eye movement patterns from fMRI responses to natural scenes, provide evidence that visual representations of scenes and objects map onto neural representations that predict eye movements, and find a novel three-way link between brain activity, deep neural network models, and behavior.
Affiliation(s)
- Marvin M Chun
- Department of Psychology, Yale University, New Haven, 06520, USA
- Department of Neuroscience, Yale School of Medicine, New Haven, 06520, USA

48
Lescroart MD, Gallant JL. Human Scene-Selective Areas Represent 3D Configurations of Surfaces. Neuron 2018; 101:178-192.e7. [PMID: 30497771 DOI: 10.1016/j.neuron.2018.11.004] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2017] [Revised: 08/01/2018] [Accepted: 11/02/2018] [Indexed: 10/27/2022]
Abstract
It has been argued that scene-selective areas in the human brain represent both the 3D structure of the local visual environment and low-level 2D features (such as spatial frequency) that provide cues for 3D structure. To evaluate the degree to which each of these hypotheses explains variance in scene-selective areas, we develop an encoding model of 3D scene structure and test it against a model of low-level 2D features. We fit the models to fMRI data recorded while subjects viewed visual scenes. The fit models reveal that scene-selective areas represent the distance to and orientation of large surfaces, at least partly independent of low-level features. Principal component analysis of the model weights reveals that the most important dimensions of 3D structure are distance and openness. Finally, reconstructions of the stimuli based on the model weights demonstrate that our model captures unprecedented detail about the local visual environment from scene-selective areas.
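An encoding model in this sense is a regularized regression from a feature space to each voxel's response, with competing models compared by held-out prediction accuracy. A minimal single-voxel sketch, assuming scikit-learn; the feature matrices and responses are random placeholders rather than the paper's stimuli:

```python
# Sketch: compare two encoding models (3D structure vs. 2D features) for
# one voxel via ridge regression and held-out prediction accuracy.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n_stim = 300
feat_3d = rng.normal(size=(n_stim, 40))   # distance/orientation of surfaces
feat_2d = rng.normal(size=(n_stim, 40))   # low-level 2D features
# Simulated voxel that truly depends on the 3D features.
voxel = feat_3d @ rng.normal(size=40) + rng.normal(scale=2.0, size=n_stim)

for name, X in [("3D structure", feat_3d), ("2D features", feat_2d)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, voxel, random_state=0)
    pred = Ridge(alpha=10.0).fit(X_tr, y_tr).predict(X_te)
    r = np.corrcoef(pred, y_te)[0, 1]
    print(f"{name}: held-out prediction r = {r:.3f}")
```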
Affiliation(s)
- Mark D Lescroart
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
- Jack L Gallant
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA.

49
Dima DC, Perry G, Singh KD. Spatial frequency supports the emergence of categorical representations in visual cortex during natural scene perception. Neuroimage 2018; 179:102-116. [PMID: 29902586 PMCID: PMC6057270 DOI: 10.1016/j.neuroimage.2018.06.033] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2017] [Revised: 06/01/2018] [Accepted: 06/09/2018] [Indexed: 11/22/2022] Open
Abstract
In navigating our environment, we rapidly process and extract meaning from visual cues. However, the relationship between visual features and categorical representations in natural scene perception is still not well understood. Here, we used natural scene stimuli from different categories, filtered at different spatial frequencies, to address this question in a passive viewing paradigm. Using representational similarity analysis (RSA) and cross-decoding of magnetoencephalography (MEG) data, we show that categorical representations emerge in human visual cortex at ∼180 ms and are linked to spatial frequency processing. Furthermore, dorsal and ventral stream areas reveal temporally and spatially overlapping representations of low- and high-level layer activations extracted from a feedforward neural network. Our results suggest that neural patterns in extrastriate visual cortex switch from low-level to categorical representations within 200 ms, highlighting the rapid cascade of processing stages essential to human visual perception.
Affiliation(s)
- Diana C Dima
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, CF24 4HQ, United Kingdom.
- Gavin Perry
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, CF24 4HQ, United Kingdom
- Krish D Singh
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, CF24 4HQ, United Kingdom