1. Papale P, Wang F, Self MW, Roelfsema PR. An extensive dataset of spiking activity to reveal the syntax of the ventral stream. Neuron 2025; 113:539-553.e5. PMID: 39809277; DOI: 10.1016/j.neuron.2024.12.003.
Abstract
Visual neuroscience benefits from high-quality datasets with neuronal responses to many images. Several neuroimaging datasets have been published in recent years, but no comparable dataset with spiking activity exists. Here, we introduce the THINGS ventral stream spiking dataset (TVSD). We extensively sampled neuronal activity in response to >25,000 natural images from the THINGS database in macaques, using high-channel-count implants in three key cortical regions: primary visual cortex (V1), V4, and the inferotemporal cortex. We showcase the utility of TVSD by using an artificial neural network to visualize the tuning of neurons. We also characterize the correlated fluctuations in activity within and between areas and demonstrate that these noise correlations are strongest between neurons with similar tuning. The TVSD allows researchers to answer many questions about neuronal tuning, analyze the interactions within and between cortical regions, and compare spiking activity in monkeys to human neuroimaging data.
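The tuning-dependent noise correlations described above follow from two standard quantities: signal correlations computed from trial-averaged tuning curves, and noise correlations computed from single-trial residuals. The sketch below illustrates the idea on hypothetical spike-count data with NumPy; it is not the authors' analysis code, and the array sizes and the median split on tuning similarity are arbitrary assumptions.

```python
# Minimal sketch (not the authors' code): signal vs noise correlations
# for a hypothetical array of spike counts.
import numpy as np

rng = np.random.default_rng(0)
# responses: n_images x n_repeats x n_neurons spike counts (hypothetical data)
responses = rng.poisson(5.0, size=(200, 20, 50)).astype(float)

# Signal correlation: correlate trial-averaged tuning curves across images.
tuning = responses.mean(axis=1)                        # n_images x n_neurons
signal_corr = np.corrcoef(tuning.T)                    # n_neurons x n_neurons

# Noise correlation: correlate single-trial residuals after removing
# each neuron's mean response to each image.
residuals = responses - tuning[:, None, :]
residuals = residuals.reshape(-1, responses.shape[2])  # trials x n_neurons
noise_corr = np.corrcoef(residuals.T)

# Relate the two: noise correlations for similarly vs dissimilarly tuned pairs.
iu = np.triu_indices_from(noise_corr, k=1)
similar = signal_corr[iu] > np.median(signal_corr[iu])
print("mean noise corr, similar tuning:   ", noise_corr[iu][similar].mean())
print("mean noise corr, dissimilar tuning:", noise_corr[iu][~similar].mean())
```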
Affiliation(s)
- Paolo Papale
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), 1105 BA Amsterdam, the Netherlands.
- Feng Wang
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), 1105 BA Amsterdam, the Netherlands
- Matthew W Self
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), 1105 BA Amsterdam, the Netherlands
- Pieter R Roelfsema
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), 1105 BA Amsterdam, the Netherlands; Department of Integrative Neurophysiology, VU University, De Boelelaan 1085, 1081 HV Amsterdam, the Netherlands; Department of Neurosurgery, Academic Medical Centre, Postbus 22660, 1100 DD Amsterdam, the Netherlands; Laboratory of Visual Brain Therapy, Sorbonne Université, INSERM, CNRS, Institut de la Vision, 17 rue Moreau, 75012 Paris, France.
2. Badwal MW, Bergmann J, Roth JHR, Doeller CF, Hebart MN. The Scope and Limits of Fine-Grained Image and Category Information in the Ventral Visual Pathway. J Neurosci 2025; 45:e0936242024. PMID: 39505406; PMCID: PMC11735656; DOI: 10.1523/jneurosci.0936-24.2024.
Abstract
Humans can easily abstract incoming visual information into discrete semantic categories. Previous research employing functional MRI (fMRI) in humans has identified cortical organizing principles that allow not only for coarse-scale distinctions, such as animate versus inanimate objects, but also for more fine-grained distinctions at the level of individual objects. This suggests that fMRI carries rather fine-grained information about individual objects. However, most previous work investigating fine-grained category representations either additionally included coarse-scale category comparisons of objects, which confounds fine-grained and coarse-scale distinctions, or used only a single exemplar of each object, which confounds visual and semantic information. To address these challenges, here we used multisession human fMRI (female and male participants) paired with a broad yet homogeneous stimulus class of 48 terrestrial mammals, with two exemplars per mammal. Multivariate decoding and representational similarity analysis revealed high image-specific reliability in low- and high-level visual regions, indicating stable representational patterns at the image level. In contrast, analyses across exemplars of the same animal yielded only small effects in the lateral occipital complex (LOC), indicating rather subtle category effects in this region. Variance partitioning with a deep neural network and a shape model showed that across-exemplar effects in the early visual cortex were largely explained by low-level visual appearance, while representations in LOC appeared to also contain higher category-specific information. These results suggest that representations typically measured with fMRI are dominated by image-specific visual or coarse-grained category information but indicate that commonly employed fMRI protocols may reveal subtle yet reliable distinctions between individual objects.
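Variance partitioning of the kind mentioned above can be carried out by comparing the explained variance of full and reduced regression models. The following sketch is a generic illustration on simulated data, not the authors' pipeline; the predictor names and the use of ordinary least squares are assumptions.

```python
# Minimal variance-partitioning sketch (assumed, not the authors' pipeline):
# how much variance in a neural dissimilarity vector is uniquely explained by
# a DNN-based predictor vs a shape-based predictor, via full/reduced regressions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_pairs = 1000                                  # e.g., lower-triangle entries of an RDM
dnn_pred = rng.normal(size=(n_pairs, 1))        # hypothetical DNN dissimilarities
shape_pred = 0.5 * dnn_pred + rng.normal(size=(n_pairs, 1))   # correlated shape model
neural = 0.6 * dnn_pred[:, 0] + 0.2 * shape_pred[:, 0] + rng.normal(size=n_pairs)

def r2(X, y):
    # In-sample R^2 of an ordinary least-squares fit.
    return LinearRegression().fit(X, y).score(X, y)

r2_full = r2(np.hstack([dnn_pred, shape_pred]), neural)
r2_dnn = r2(dnn_pred, neural)
r2_shape = r2(shape_pred, neural)

print("unique to DNN:  ", r2_full - r2_shape)
print("unique to shape:", r2_full - r2_dnn)
print("shared:         ", r2_dnn + r2_shape - r2_full)
```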
Affiliation(s)
- Markus W Badwal
- Department of Psychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Vision & Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Department of Neurosurgery, University of Leipzig Medical Center, Leipzig 04103, Germany
- Johanna Bergmann
- Department of Psychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Johannes H R Roth
- Vision & Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Department of Medicine, Justus Liebig University, Giessen 35390, Germany
- Christian F Doeller
- Department of Psychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Kavli Institute for Systems Neuroscience, Norwegian University of Science and Technology, Trondheim 7030, Norway
- Martin N Hebart
- Vision & Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Department of Medicine, Justus Liebig University, Giessen 35390, Germany
- Center for Mind, Brain and Behavior, Universities of Marburg, Giessen, and Darmstadt, Marburg 35032, Germany
3. Mukherjee K, Rogers TT. Using drawings and deep neural networks to characterize the building blocks of human visual similarity. Mem Cognit 2025; 53:219-241. PMID: 38814385; DOI: 10.3758/s13421-024-01580-1.
Abstract
Early in life and without special training, human beings discern resemblance between abstract visual stimuli, such as drawings, and the real-world objects they represent. We used this capacity for visual abstraction as a tool for evaluating deep neural networks (DNNs) as models of human visual perception. Contrasting five contemporary DNNs, we evaluated how well each explains human similarity judgments among line drawings of recognizable and novel objects. For object sketches, human judgments were dominated by semantic category information; DNN representations contributed little additional information. In contrast, DNN representations explained significant unique variance in the perceived similarity of abstract drawings. In both cases, a vision transformer trained to blend representations of images and their natural language descriptions showed the greatest ability to explain human perceptual similarity, an observation consistent with contemporary views of semantic representation and processing in the human mind and brain. Together, the results suggest that the building blocks of visual similarity may arise within systems that learn to use visual information, not for specific classification, but in service of generating semantic representations of objects.
Affiliation(s)
- Kushin Mukherjee
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA.
- Timothy T Rogers
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
4. Contier O, Baker CI, Hebart MN. Distributed representations of behaviour-derived object dimensions in the human visual system. Nat Hum Behav 2024; 8:2179-2193. PMID: 39251723; PMCID: PMC11576512; DOI: 10.1038/s41562-024-01980-y.
Abstract
Object vision is commonly thought to involve a hierarchy of brain regions processing increasingly complex image features, with high-level visual cortex supporting object recognition and categorization. However, object vision supports diverse behavioural goals, suggesting basic limitations of this category-centric framework. To address these limitations, we mapped a series of dimensions derived from a large-scale analysis of human similarity judgements directly onto the brain. Our results reveal broadly distributed representations of behaviourally relevant information, demonstrating selectivity to a wide variety of novel dimensions while capturing known selectivities for visual features and categories. Behaviour-derived dimensions were superior to categories at predicting brain responses, yielding mixed selectivity in much of visual cortex and sparse selectivity in category-selective clusters. This framework reconciles seemingly disparate findings regarding regional specialization by explaining category selectivity as a special case of sparse response profiles among representational dimensions, suggesting a more expansive view of visual processing in the human brain.
Affiliation(s)
- Oliver Contier
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
- Max Planck School of Cognition, Leipzig, Germany.
- Chris I Baker
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
- Martin N Hebart
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Department of Medicine, Justus Liebig University Giessen, Giessen, Germany
5. Pacheco-Estefan D, Fellner MC, Kunz L, Zhang H, Reinacher P, Roy C, Brandt A, Schulze-Bonhage A, Yang L, Wang S, Liu J, Xue G, Axmacher N. Maintenance and transformation of representational formats during working memory prioritization. Nat Commun 2024; 15:8234. PMID: 39300141; DOI: 10.1038/s41467-024-52541-w.
Abstract
Visual working memory (VWM) depends both on material-specific brain areas in the ventral visual stream (VVS) that support the maintenance of stimulus representations and on regions in the prefrontal cortex (PFC) that control these representations. How executive control prioritizes working memory contents and whether this affects their representational formats remain open questions, however. Here, we analyzed intracranial EEG (iEEG) recordings in epilepsy patients with electrodes in VVS and PFC who performed a multi-item working memory task involving a retro-cue. We employed representational similarity analysis (RSA) with various deep neural network (DNN) architectures to investigate the representational format of prioritized VWM content. While recurrent DNN representations matched PFC representations in the beta band (15-29 Hz) following the retro-cue, they corresponded to VVS representations in a lower frequency range (3-14 Hz) towards the end of the maintenance period. Our findings highlight the distinct coding schemes and representational formats of prioritized content in VVS and PFC.
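Model-based RSA of the sort described here typically reduces to correlating representational dissimilarity matrices (RDMs). The snippet below is a minimal, hypothetical illustration of that step; the array shapes, the correlation-distance metric, and the Spearman comparison are assumptions rather than the authors' exact choices.

```python
# Minimal RSA sketch (assumed, not the authors' analysis): correlate an RDM
# built from neural patterns with an RDM built from one DNN layer.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_items, n_channels, n_units = 48, 60, 512
neural_patterns = rng.normal(size=(n_items, n_channels))  # hypothetical band-limited power per item
dnn_activations = rng.normal(size=(n_items, n_units))     # hypothetical layer activations per item

neural_rdm = pdist(neural_patterns, metric="correlation")  # condensed RDM (item pairs)
dnn_rdm = pdist(dnn_activations, metric="correlation")

rho, p = spearmanr(neural_rdm, dnn_rdm)
print(f"model-brain RDM correlation: rho={rho:.3f}, p={p:.3f}")
```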
Affiliation(s)
- Daniel Pacheco-Estefan
- Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany.
- Marie-Christin Fellner
- Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
- Lukas Kunz
- Department of Epileptology, University Hospital Bonn, Bonn, Germany
- Hui Zhang
- Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
- Peter Reinacher
- Department of Stereotactic and Functional Neurosurgery, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Fraunhofer Institute for Laser Technology, Aachen, Germany
- Charlotte Roy
- Epilepsy Center, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Armin Brandt
- Epilepsy Center, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Andreas Schulze-Bonhage
- Epilepsy Center, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Linglin Yang
- Department of Psychiatry, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Shuang Wang
- Department of Neurology, Epilepsy Center, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Jing Liu
- Department of Applied Social Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong SAR
- Gui Xue
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, PR China
- Nikolai Axmacher
- Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, PR China
6. Dima DC, Janarthanan S, Culham JC, Mohsenzadeh Y. Shared representations of human actions across vision and language. Neuropsychologia 2024; 202:108962. PMID: 39047974; DOI: 10.1016/j.neuropsychologia.2024.108962.
Abstract
Humans can recognize and communicate about many actions performed by others. How are actions organized in the mind, and is this organization shared across vision and language? We collected similarity judgments of human actions depicted through naturalistic videos and sentences, and tested four models of action categorization, defining actions at different levels of abstraction ranging from specific (action verb) to broad (action target: whether an action is directed towards an object, another person, or the self). The similarity judgments reflected a shared organization of action representations across videos and sentences, determined mainly by the target of actions, even after accounting for other semantic features. Furthermore, language model embeddings predicted the behavioral similarity of action videos and sentences, and captured information about the target of actions alongside unique semantic information. Together, our results show that action concepts are similarly organized in the mind across vision and language, and that this organization reflects socially relevant goals.
Affiliation(s)
- Diana C Dima
- Dept of Computer Science, Western University, London, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada.
- Jody C Culham
- Dept of Psychology, Western University, London, Ontario, Canada
- Yalda Mohsenzadeh
- Dept of Computer Science, Western University, London, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
7. Ritchie JB, Andrews ST, Vaziri-Pashkam M, Baker CI. Graspable foods and tools elicit similar responses in visual cortex. Cereb Cortex 2024; 34:bhae383. PMID: 39319569; DOI: 10.1093/cercor/bhae383.
Abstract
The extrastriatal visual cortex is known to exhibit distinct response profiles to complex stimuli of varying ecological importance (e.g. faces, scenes, and tools). Although food is primarily distinguished from other objects by its edibility, not its appearance, recent evidence suggests that there is also food selectivity in human visual cortex. Food is also associated with a common behavior, eating, and food consumption typically also involves the manipulation of food, often with hands. In this context, food items share many properties with tools: they are graspable objects that we manipulate in self-directed and stereotyped forms of action. Thus, food items may be preferentially represented in extrastriatal visual cortex in part because of these shared affordance properties, rather than because they reflect a wholly distinct kind of category. We conducted functional MRI and behavioral experiments to test this hypothesis. We found that graspable food items and tools were judged to be similar in their action-related properties and that the location, magnitude, and patterns of neural responses for images of graspable food items were similar in profile to the responses for tool stimuli. Our findings suggest that food selectivity may reflect the behavioral affordances of food items rather than a distinct form of category selectivity.
Affiliation(s)
- John Brendan Ritchie
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, 10 Center Drive, Bethesda, MD 20982, United States
- Spencer T Andrews
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, 10 Center Drive, Bethesda, MD 20982, United States
- Harvard Law School, Harvard University, 1585 Massachusetts Ave, Cambridge, MA 02138, United States
- Maryam Vaziri-Pashkam
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, 10 Center Drive, Bethesda, MD 20982, United States
- Department of Psychological and Brain Sciences, University of Delaware, 434 Wolf Hall, Newark, DE 19716, United States
- Chris I Baker
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, 10 Center Drive, Bethesda, MD 20982, United States
8. Walbrin J, Sossounov N, Mahdiani M, Vaz I, Almeida J. Fine-grained knowledge about manipulable objects is well-predicted by contrastive language image pre-training. iScience 2024; 27:110297. PMID: 39040066; PMCID: PMC11261149; DOI: 10.1016/j.isci.2024.110297.
Abstract
Object recognition is an important ability that relies on distinguishing between similar objects (e.g., deciding which utensil(s) to use at different stages of meal preparation). Recent work describes the fine-grained organization of knowledge about manipulable objects via the study of the constituent dimensions that are most relevant to human behavior, for example, vision, manipulation, and function-based properties. A logical extension of this work concerns whether or not these dimensions are uniquely human, or can be approximated by deep learning. Here, we show that behavioral dimensions are generally well predicted by CLIP-ViT, a multimodal network trained on a large and diverse set of image-text pairs. Moreover, this model outperforms comparison networks pre-trained on smaller, image-only datasets. These results demonstrate the impressive capacity of CLIP-ViT to approximate fine-grained object knowledge. We discuss the possible sources of this benefit relative to other models (e.g., multimodal vs. image-only pre-training, dataset size, architecture).
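Predicting behavioral object dimensions from network embeddings, as described above, can be framed as a cross-validated regression problem. The sketch below illustrates this with simulated embeddings and ratings; the feature dimensionality, ridge penalty grid, and scoring choice are assumptions and do not reproduce the authors' analysis.

```python
# Minimal sketch (assumed, not the authors' code): predict one behavior-derived
# object dimension from image embeddings with cross-validated ridge regression.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_objects, n_features = 300, 768
embeddings = rng.normal(size=(n_objects, n_features))  # stand-in for CLIP-ViT image features
# Hypothetical dimension ratings that partly depend on the embeddings.
dimension = embeddings[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=n_objects)

model = RidgeCV(alphas=np.logspace(-3, 3, 13))
scores = cross_val_score(model, embeddings, dimension, cv=5, scoring="r2")
print("cross-validated R^2 per fold:", np.round(scores, 3))
```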
Affiliation(s)
- Jon Walbrin
- Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Nikita Sossounov
- Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Igor Vaz
- Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Jorge Almeida
- Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
9. Ritchie JB, Andrews S, Vaziri-Pashkam M, Baker CI. Graspable foods and tools elicit similar responses in visual cortex. bioRxiv [Preprint] 2024:2024.02.20.581258. PMID: 38529495; PMCID: PMC10962699; DOI: 10.1101/2024.02.20.581258.
Abstract
Extrastriatal visual cortex is known to exhibit distinct response profiles to complex stimuli of varying ecological importance (e.g., faces, scenes, and tools). The dominant interpretation of these effects is that they reflect activation of distinct "category-selective" brain regions specialized to represent these and other stimulus categories. We sought to explore an alternative perspective: that the response to these stimuli is determined less by whether they form distinct categories, and more by their relevance to different forms of natural behavior. In this regard, food is an interesting test case, since it is primarily distinguished from other objects by its edibility, not its appearance, and there is evidence of food-selectivity in human visual cortex. Food is also associated with a common behavior, eating, and food consumption typically also involves the manipulation of food, often with the hands. In this context, food items share many properties in common with tools: they are graspable objects that we manipulate in self-directed and stereotyped forms of action. Thus, food items may be preferentially represented in extrastriatal visual cortex in part because of these shared affordance properties, rather than because they reflect a wholly distinct kind of category. We conducted fMRI and behavioral experiments to test this hypothesis. We found that behaviorally graspable food items and tools were judged to be similar in their action-related properties, and that the location, magnitude, and patterns of neural responses for images of graspable food items were similar in profile to the responses for tool stimuli. Our findings suggest that food-selectivity may reflect the behavioral affordances of food items rather than a distinct form of category-selectivity.
Affiliation(s)
- J. Brendan Ritchie
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, MD, USA
- Spencer Andrews
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, MD, USA
- Maryam Vaziri-Pashkam
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, MD, USA
- Department of Psychological and Brain Sciences, University of Delaware, Newark, DE, USA
- Christopher I. Baker
- The Laboratory of Brain and Cognition, The National Institute of Mental Health, MD, USA
10. Milne GA, Lisi M, McLean A, Zheng R, Groen II, Dekker TM. Perceptual reorganization from prior knowledge emerges late in childhood. iScience 2024; 27:108787. PMID: 38303715; PMCID: PMC10831247; DOI: 10.1016/j.isci.2024.108787.
Abstract
Human vision relies heavily on prior knowledge. Here, we show for the first time that prior-knowledge-induced reshaping of visual inputs emerges gradually in late childhood. To isolate the effects of prior knowledge on perception, we presented 4- to 12-year-olds and adults with two-tone images: hard-to-recognize degraded photos. In adults, seeing the original photo triggers perceptual reorganization, causing mandatory recognition of the two-tone version. This involves top-down signaling from higher-order brain areas to early visual cortex. We show that children younger than 7-9 years do not experience this knowledge-guided shift, despite viewing the original photo immediately before each two-tone image. To assess the computations underlying this development, we compared human performance to three neural networks with varying architectures. The best-performing model behaved much like 4- to 5-year-olds, displaying feature-based rather than holistic processing strategies. The reconciliation of prior knowledge with sensory input undergoes a striking age-related shift, which may underpin the development of many perceptual abilities.
Affiliation(s)
- Georgia A. Milne
- Institute of Ophthalmology, University College London, EC1V 9EL London, UK
- Division of Psychology and Language Sciences, University College London, WC1H 0AP London, UK
- Matteo Lisi
- Department of Psychology, Royal Holloway, University of London, TW20 0EX London, UK
- Aisha McLean
- Institute of Ophthalmology, University College London, EC1V 9EL London, UK
- Rosie Zheng
- Informatics Institute, University of Amsterdam, 1098 XH Amsterdam, the Netherlands
- Iris I.A. Groen
- Informatics Institute, University of Amsterdam, 1098 XH Amsterdam, the Netherlands
- Tessa M. Dekker
- Institute of Ophthalmology, University College London, EC1V 9EL London, UK
- Division of Psychology and Language Sciences, University College London, WC1H 0AP London, UK
11. Naspi L, Stensholt C, Karlsson AE, Monge ZA, Cabeza R. Effects of Aging on Successful Object Encoding: Enhanced Semantic Representations Compensate for Impaired Visual Representations. J Neurosci 2023; 43:7337-7350. PMID: 37673674; PMCID: PMC10621770; DOI: 10.1523/jneurosci.2265-22.2023.
Abstract
Although episodic memory and visual processing decline substantially with healthy aging, semantic knowledge is generally spared. There is evidence that older adults' spared semantic knowledge can support episodic memory. Here, we used functional magnetic resonance imaging (fMRI) combined with representational similarity analyses (RSAs) to examine how novel visual and preexisting semantic representations at encoding predict subjective memory vividness at retrieval. Eighteen young and seventeen older adults (female and male participants) encoded images of objects during fMRI scanning and recalled these images while rating the vividness of their memories. After scanning, participants discriminated between studied images and similar lures. RSA based on a deep convolutional neural network and normative concept feature data were used to link patterns of neural activity during encoding to visual and semantic representations. Relative to young adults, the specificity of activation patterns for visual features was reduced in older adults, consistent with dedifferentiation. However, the specificity of activation patterns for semantic features was enhanced in older adults, consistent with hyperdifferentiation. Despite dedifferentiation, visual representations in early visual cortex (EVC) predicted high memory vividness in both age groups. In contrast, semantic representations in lingual gyrus (LG) and fusiform gyrus (FG) were associated with high memory vividness only in the older adults. Intriguingly, the data suggest that older adults with lower specificity of visual representations in combination with higher specificity of semantic representations tended to rate their memories as more vivid. Our findings suggest that memory vividness in aging relies more on semantic representations over anterior regions, potentially compensating for age-related dedifferentiation of visual information in posterior regions.
SIGNIFICANCE STATEMENT: Normal aging is associated with impaired memory for events, while semantic knowledge might even improve. We investigated the effects of aging on the specificity of visual and semantic information in the brain when viewing common objects and how this information enables subsequent memory vividness for these objects. Using functional magnetic resonance imaging (fMRI) combined with modeling of the stimuli, we found that visual information was represented with less specificity in older than young adults while still supporting memory vividness. In contrast, semantic information supported memory vividness only in older adults, especially in those individuals who had the lowest specificity of visual information. These findings provide evidence for a spared semantic memory system increasingly recruited to compensate for degraded visual representations in older age.
Affiliation(s)
- Loris Naspi
- Department of Psychology, Humboldt University of Berlin, Berlin 10117, Germany
- Charlotte Stensholt
- Department of Psychology, Humboldt University of Berlin, Berlin 10117, Germany
- Anna E Karlsson
- Department of Psychology, Humboldt University of Berlin, Berlin 10117, Germany
- Zachary A Monge
- Center for Cognitive Neuroscience, Duke University, Durham, North Carolina 27708
- Roberto Cabeza
- Department of Psychology, Humboldt University of Berlin, Berlin 10117, Germany
- Center for Cognitive Neuroscience, Duke University, Durham, North Carolina 27708
12. Taylor J, Kriegeskorte N. Extracting and visualizing hidden activations and computational graphs of PyTorch models with TorchLens. Sci Rep 2023; 13:14375. PMID: 37658079; PMCID: PMC10474256; DOI: 10.1038/s41598-023-40807-0.
Abstract
Deep neural network models (DNNs) are essential to modern AI and provide powerful models of information processing in biological neural networks. Researchers in both neuroscience and engineering are pursuing a better understanding of the internal representations and operations that undergird the successes and failures of DNNs. Neuroscientists additionally evaluate DNNs as models of brain computation by comparing their internal representations to those found in brains. It is therefore essential to have a method to easily and exhaustively extract and characterize the results of the internal operations of any DNN. Many models are implemented in PyTorch, the leading framework for building DNN models. Here we introduce TorchLens, a new open-source Python package for extracting and characterizing hidden-layer activations in PyTorch models. Uniquely among existing approaches to this problem, TorchLens has the following features: (1) it exhaustively extracts the results of all intermediate operations, not just those associated with PyTorch module objects, yielding a full record of every step in the model's computational graph, (2) it provides an intuitive visualization of the model's complete computational graph along with metadata about each computational step in a model's forward pass for further analysis, (3) it contains a built-in validation procedure to algorithmically verify the accuracy of all saved hidden-layer activations, and (4) the approach it uses can be automatically applied to any PyTorch model with no modifications, including models with conditional (if-then) logic in their forward pass, recurrent models, branching models where layer outputs are fed into multiple subsequent layers in parallel, and models with internally generated tensors (e.g., injections of noise). Furthermore, using TorchLens requires minimal additional code, making it easy to incorporate into existing pipelines for model development and analysis, and useful as a pedagogical aid when teaching deep learning concepts. We hope this contribution will help researchers in AI and neuroscience understand the internal representations of DNNs.
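For context, the snippet below sketches the manual approach that TorchLens automates: registering forward hooks to capture the outputs of named PyTorch modules. It uses only standard PyTorch calls on a toy model and is not TorchLens's own API; unlike TorchLens, hooks only see module outputs rather than every functional operation in the computational graph.

```python
# Minimal sketch of manual hidden-activation extraction with forward hooks
# (the approach TorchLens generalizes and automates).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Store a detached copy of this module's output.
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if name:  # skip the top-level container itself
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(torch.randn(1, 3, 32, 32))

for name, act in activations.items():
    print(f"{name}: {tuple(act.shape)}")
```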
Affiliation(s)
- JohnMark Taylor
- Zuckerman Mind Brain Behavior Institute, Columbia University, 3227 Broadway, New York, NY, 10027, USA.
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, 3227 Broadway, New York, NY, 10027, USA
13. Nicholls VI, Alsbury-Nealy B, Krugliak A, Clarke A. Context effects on object recognition in real-world environments: A study protocol. Wellcome Open Res 2023; 7:165. PMID: 37274451; PMCID: PMC10238820; DOI: 10.12688/wellcomeopenres.17856.2.
Abstract
Background: The environments that we live in impact our ability to recognise objects, with recognition being facilitated when objects appear in expected locations (congruent) compared to unexpected locations (incongruent). However, these findings are based on experiments where the object is isolated from its environment. Moreover, it is not clear which components of the recognition process are impacted by the environment. In this experiment, we seek to examine the impact that real-world environments have on object recognition. Specifically, we will use mobile electroencephalography (mEEG) and augmented reality (AR) to investigate how the visual and semantic processing aspects of object recognition are changed by the environment.
Methods: We will use AR to place congruent and incongruent virtual objects around indoor and outdoor environments. During the experiment, a total of 34 participants will walk around the environments and find these objects while we record their eye movements and neural signals. We will perform two primary analyses. First, we will analyse the event-related potential (ERP) data using paired-samples t-tests in the N300/400 time windows in an attempt to replicate congruency effects on the N300/400. Second, we will use representational similarity analysis (RSA) and computational models of vision and semantics to determine how visual and semantic processes are changed by congruency.
Conclusions: Based on previous literature, we hypothesise that scene-object congruence would facilitate object recognition. For ERPs, we predict a congruency effect in the N300/N400, and for RSA we predict that higher-level visual and semantic information will be represented earlier for congruent scenes than incongruent scenes. By collecting mEEG data while participants are exploring a real-world environment, we will be able to determine the impact of a natural context on object recognition and on its different processing stages.
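The planned N300/N400 analysis amounts to a paired-samples t-test on window-averaged ERP amplitudes. The sketch below shows that comparison on simulated data; the sampling grid, the 300-500 ms window, and the condition labels are assumptions, not the registered analysis code.

```python
# Minimal sketch (assumed, not the registered analysis): paired-samples t-test
# on mean ERP amplitude in an N300/N400-like window, congruent vs incongruent.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(4)
n_subjects, n_times = 34, 500
times = np.linspace(-0.2, 0.8, n_times)          # seconds relative to object onset
erp_congruent = rng.normal(size=(n_subjects, n_times))    # hypothetical subject-level ERPs
erp_incongruent = rng.normal(size=(n_subjects, n_times))

window = (times >= 0.30) & (times <= 0.50)       # assumed N300/N400 window
cong_mean = erp_congruent[:, window].mean(axis=1)
incong_mean = erp_incongruent[:, window].mean(axis=1)

t, p = ttest_rel(cong_mean, incong_mean)
print(f"paired t({n_subjects - 1}) = {t:.2f}, p = {p:.3f}")
```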
Affiliation(s)
- Alexandra Krugliak
- Department of Psychology, University of Cambridge, Cambridge, CB2 3EB, UK
- Alex Clarke
- Department of Psychology, University of Cambridge, Cambridge, CB2 3EB, UK
14. Taylor J, Kriegeskorte N. TorchLens: A Python package for extracting and visualizing hidden activations of PyTorch models. bioRxiv [Preprint] 2023:2023.03.16.532916. PMID: 36993311; PMCID: PMC10055035; DOI: 10.1101/2023.03.16.532916.
Abstract
Deep neural network models (DNNs) are essential to modern AI and provide powerful models of information processing in biological neural networks. Researchers in both neuroscience and engineering are pursuing a better understanding of the internal representations and operations that undergird the successes and failures of DNNs. Neuroscientists additionally evaluate DNNs as models of brain computation by comparing their internal representations to those found in brains. It is therefore essential to have a method to easily and exhaustively extract and characterize the results of the internal operations of any DNN. Many models are implemented in PyTorch, the leading framework for building DNN models. Here we introduce TorchLens, a new open-source Python package for extracting and characterizing hidden-layer activations in PyTorch models. Uniquely among existing approaches to this problem, TorchLens has the following features: (1) it exhaustively extracts the results of all intermediate operations, not just those associated with PyTorch module objects, yielding a full record of every step in the model's computational graph, (2) it provides an intuitive visualization of the model's complete computational graph along with metadata about each computational step in a model's forward pass for further analysis, (3) it contains a built-in validation procedure to algorithmically verify the accuracy of all saved hidden-layer activations, and (4) the approach it uses can be automatically applied to any PyTorch model with no modifications, including models with conditional (if-then) logic in their forward pass, recurrent models, branching models where layer outputs are fed into multiple subsequent layers in parallel, and models with internally generated tensors (e.g., injections of noise). Furthermore, using TorchLens requires minimal additional code, making it easy to incorporate into existing pipelines for model development and analysis, and useful as a pedagogical aid when teaching deep learning concepts. We hope this contribution will help researchers in AI and neuroscience understand the internal representations of DNNs.
Affiliation(s)
- JohnMark Taylor
- Zuckerman Mind Brain Behavior Institute, Columbia University (10027)
15. Nicholls VI, Alsbury-Nealy B, Krugliak A, Clarke A. Context effects on object recognition in real-world environments: A study protocol. Wellcome Open Res 2022. DOI: 10.12688/wellcomeopenres.17856.1.
Abstract
Background: The environments that we live in impact our ability to recognise objects, with recognition being facilitated when objects appear in expected locations (congruent) compared to unexpected locations (incongruent). However, these findings are based on experiments where the object is isolated from its environment. Moreover, it is not clear which components of the recognition process are impacted by the environment. In this experiment, we seek to examine the impact that real-world environments have on object recognition. Specifically, we will use mobile electroencephalography (mEEG) and augmented reality (AR) to investigate how the visual and semantic processing aspects of object recognition are changed by the environment.
Methods: We will use AR to place congruent and incongruent virtual objects around indoor and outdoor environments. During the experiment, a total of 34 participants will walk around the environments and find these objects while we record their eye movements and neural signals. We will perform two primary analyses. First, we will analyse the event-related potential (ERP) data using paired-samples t-tests in the N300/400 time windows in an attempt to replicate congruency effects on the N300/400. Second, we will use representational similarity analysis (RSA) and computational models of vision and semantics to determine how visual and semantic processes are changed by congruency.
Conclusions: Based on previous literature, we hypothesise that scene-object congruence would facilitate object recognition. For ERPs, we predict a congruency effect in the N300/N400, and for RSA we predict that higher-level visual and semantic information will be represented earlier for congruent scenes than incongruent scenes. By collecting mEEG data while participants are exploring a real-world environment, we will be able to determine the impact of a natural context on object recognition and on its different processing stages.