1
Tuckute G, Kanwisher N, Fedorenko E. Language in Brains, Minds, and Machines. Annu Rev Neurosci 2024; 47:277-301. [PMID: 38669478 DOI: 10.1146/annurev-neuro-120623-101142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
It has long been argued that only humans could produce and understand language. But now, for the first time, artificial language models (LMs) achieve this feat. Here we survey the new purchase LMs are providing on the question of how language is implemented in the brain. We discuss why, a priori, LMs might be expected to share similarities with the human language system. We then summarize evidence that LMs represent linguistic information similarly enough to humans to enable relatively accurate brain encoding and decoding during language processing. Finally, we examine which LM properties (their architecture, task performance, or training) are critical for capturing human neural responses to language and review studies using LMs as in silico model organisms for testing hypotheses about language. These ongoing investigations bring us closer to understanding the representations and processes that underlie our ability to comprehend sentences and express thoughts in language.
Affiliation(s)
- Greta Tuckute
  - Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Nancy Kanwisher
  - Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Evelina Fedorenko
  - Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
2
Wang T, Lee TS, Yao H, Hong J, Li Y, Jiang H, Andolina IM, Tang S. Large-scale calcium imaging reveals a systematic V4 map for encoding natural scenes. Nat Commun 2024; 15:6401. [PMID: 39080309 PMCID: PMC11289446 DOI: 10.1038/s41467-024-50821-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 07/22/2024] [Indexed: 08/02/2024] Open
Abstract
Biological visual systems have evolved to process natural scenes. A full understanding of visual cortical functions requires a comprehensive characterization of how neuronal populations in each visual area encode natural scenes. Here, we utilized widefield calcium imaging to record V4 cortical response to tens of thousands of natural images in male macaques. Using this large dataset, we developed a deep-learning digital twin of V4 that allowed us to map the natural image preferences of the neural population at 100-µm scale. This detailed map revealed a diverse set of functional domains in V4, each encoding distinct natural image features. We validated these model predictions using additional widefield imaging and single-cell resolution two-photon imaging. Feature attribution analysis revealed that these domains lie along a continuum from preferring spatially localized shape features to preferring spatially dispersed surface features. These results provide insights into the organizing principles that govern natural scene encoding in V4.
Affiliation(s)
- Tianye Wang
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Tai Sing Lee
  - Computer Science Department and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Haoxuan Yao
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Jiayi Hong
  - Peking University School of Life Sciences, Beijing, 100871, China
- Yang Li
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Hongfei Jiang
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Ian Max Andolina
  - The Center for Excellence in Brain Science and Intelligence Technology, State Key Laboratory of Neuroscience, Key Laboratory of Primate Neurobiology, Institute of Neuroscience, Chinese Academy of Sciences, Shanghai, 200031, China
- Shiming Tang
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
3
Dipani A, McNeal N, Ratan Murty NA. Linking faces to social cognition: The temporal pole as a potential social switch. Proc Natl Acad Sci U S A 2024; 121:e2411735121. [PMID: 39024106 PMCID: PMC11295026 DOI: 10.1073/pnas.2411735121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024] Open
Affiliation(s)
- Alish Dipani
  - Cognition and Brain Science, School of Psychology, Georgia Institute of Technology, Atlanta, GA 30332
  - Center of Excellence in Computational Cognition, Georgia Institute of Technology, Atlanta, GA 30332
- Nikolas McNeal
  - School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332
- N. Apurva Ratan Murty
  - Cognition and Brain Science, School of Psychology, Georgia Institute of Technology, Atlanta, GA 30332
  - Center of Excellence in Computational Cognition, Georgia Institute of Technology, Atlanta, GA 30332
4
Lahner B, Dwivedi K, Iamshchinina P, Graumann M, Lascelles A, Roig G, Gifford AT, Pan B, Jin S, Ratan Murty NA, Kay K, Oliva A, Cichy R. Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nat Commun 2024; 15:6241. [PMID: 39048577 PMCID: PMC11269733 DOI: 10.1038/s41467-024-50310-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 07/04/2024] [Indexed: 07/27/2024] Open
Abstract
Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos' extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex. Furthermore, we reveal a match in hierarchical processing between cortical regions of interest and video-computable deep neural networks, and we showcase that BMD successfully captures temporal dynamics of visual events at second resolution. With its rich metadata, BMD offers new perspectives and accelerates research on the human brain basis of visual event perception.
Affiliation(s)
- Benjamin Lahner
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Kshitij Dwivedi
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
  - Department of Computer Science, Goethe University Frankfurt, Frankfurt am Main, Germany
- Polina Iamshchinina
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Monika Graumann
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Alex Lascelles
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Gemma Roig
  - Department of Computer Science, Goethe University Frankfurt, Frankfurt am Main, Germany
  - The Hessian Center for AI (hessian.AI), Darmstadt, Germany
- Bowen Pan
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- SouYoung Jin
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- N Apurva Ratan Murty
  - Department of Brain and Cognitive Science, MIT, Cambridge, MA, USA
  - School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Kendrick Kay
  - Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, MN, USA
- Aude Oliva
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Radoslaw Cichy
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
5
Liu X, He D, Zhu M, Li Y, Lin L, Cai Q. Hemispheric dominance in reading system alters contribution to face processing lateralization across development. Dev Cogn Neurosci 2024; 69:101418. [PMID: 39059053 PMCID: PMC11331717 DOI: 10.1016/j.dcn.2024.101418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 07/07/2024] [Accepted: 07/21/2024] [Indexed: 07/28/2024] Open
Abstract
Face processing dominates the right hemisphere. This lateralization can be affected by co-lateralization within the same system and influence between different systems, such as neural competition from reading acquisition. Yet, how the relationship pattern changes through development remains unknown. This study examined the lateralization of core face processing and word processing in different age groups. By comparing fMRI data from 36 school-aged children and 40 young adults, we investigated whether there are age and regional effects on lateralization, and how relationships between lateralization within and between systems change across development. Our results showed significant right hemispheric lateralization in the core face system and left hemispheric lateralization in reading-related areas for both age groups when viewing faces and texts passively. While all participants showed stronger lateralization in brain regions of higher functional hierarchy when viewing faces, only adults exhibited this lateralization when viewing texts. In both age cohorts, there was intra-system co-lateralization for face processing, whereas an inter-system relationship was only found in adults. Specifically, functional lateralization of Broca's area during reading negatively predicted functional asymmetry in the FFA during face perception. This study initially provides neuroimaging evidence for the reading-induced neural competition theory from a maturational perspective in Chinese cohorts.
Affiliation(s)
- Xinyang Liu
  - Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
- Danni He
  - Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
- Miaomiao Zhu
  - Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
- Yinghui Li
  - Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
- Longnian Lin
  - Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China; Shanghai Center for Brain Science and Brain-Inspired Technology, East China Normal University, China; NYU-ECNU Institute of Brain and Cognitive Science, New York University, Shanghai, China; School of Life Science Department, East China Normal University, Shanghai 200062, China
- Qing Cai
  - Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China; Shanghai Changning Mental Health Center, Shanghai 200335, China; Shanghai Center for Brain Science and Brain-Inspired Technology, East China Normal University, China; NYU-ECNU Institute of Brain and Cognitive Science, New York University, Shanghai, China
6
Ren Y, Bashivan P. How well do models of visual cortex generalize to out of distribution samples? PLoS Comput Biol 2024; 20:e1011145. [PMID: 38820563 PMCID: PMC11216589 DOI: 10.1371/journal.pcbi.1011145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 07/01/2024] [Accepted: 04/29/2024] [Indexed: 06/02/2024] Open
Abstract
Unit activity in particular deep neural networks (DNNs) is remarkably similar to the neuronal population responses to static images along the primate ventral visual cortex. Linear combinations of DNN unit activities are widely used to build predictive models of neuronal activity in the visual cortex. Nevertheless, prediction performance in these models is often investigated on stimulus sets consisting of everyday objects under naturalistic settings. Recent work has revealed a generalization gap in predicting neuronal responses to synthetically generated out-of-distribution (OOD) stimuli. Here, we investigated how the recent progress in improving DNNs' object recognition generalization, as well as various DNN design choices such as architecture, learning algorithm, and datasets, have impacted the generalization gap in neural predictivity. We came to a surprising conclusion that the performance on none of the common computer vision OOD object recognition benchmarks is predictive of OOD neural predictivity performance. Furthermore, we found that adversarially robust models often yield substantially higher generalization in neural predictivity, although the degree of robustness itself was not predictive of neural predictivity score. These results suggest that improving object recognition behavior on current benchmarks alone may not lead to more general models of neurons in the primate ventral visual cortex.
Affiliation(s)
- Yifei Ren
  - Department of Computer Science, McGill University, Montreal, Canada
- Pouya Bashivan
  - Department of Computer Science, McGill University, Montreal, Canada
  - Department of Physiology, McGill University, Montreal, Canada
  - Mila, Université de Montréal, Montreal, Canada
7
Garlichs A, Blank H. Prediction error processing and sharpening of expected information across the face-processing hierarchy. Nat Commun 2024; 15:3407. [PMID: 38649694 PMCID: PMC11035707 DOI: 10.1038/s41467-024-47749-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 04/10/2024] [Indexed: 04/25/2024] Open
Abstract
The perception and neural processing of sensory information are strongly influenced by prior expectations. The integration of prior and sensory information can manifest through distinct underlying mechanisms: focusing on unexpected input, denoted as prediction error (PE) processing, or amplifying anticipated information via sharpened representation. In this study, we employed computational modeling using deep neural networks combined with representational similarity analyses of fMRI data to investigate these two processes during face perception. Participants were cued to see face images, some generated by morphing two faces, leading to ambiguity in face identity. We show that expected faces were identified faster and perception of ambiguous faces was shifted towards priors. Multivariate analyses uncovered evidence for PE processing across and beyond the face-processing hierarchy from the occipital face area (OFA), via the fusiform face area, to the anterior temporal lobe, and suggest sharpened representations in the OFA. Our findings support the proposition that the brain represents faces grounded in prior expectations.
Affiliation(s)
- Annika Garlichs
  - Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246, Hamburg, Germany
- Helen Blank
  - Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246, Hamburg, Germany
8
Lahner B, Mohsenzadeh Y, Mullin C, Oliva A. Visual perception of highly memorable images is mediated by a distributed network of ventral visual regions that enable a late memorability response. PLoS Biol 2024; 22:e3002564. [PMID: 38557761 PMCID: PMC10984539 DOI: 10.1371/journal.pbio.3002564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 02/26/2024] [Indexed: 04/04/2024] Open
Abstract
Behavioral and neuroscience studies in humans and primates have shown that memorability is an intrinsic property of an image that predicts its strength of encoding into and retrieval from memory. While previous work has independently probed when or where this memorability effect may occur in the human brain, a description of its spatiotemporal dynamics is missing. Here, we used representational similarity analysis (RSA) to combine functional magnetic resonance imaging (fMRI) with source-estimated magnetoencephalography (MEG) to simultaneously measure when and where the human cortex is sensitive to differences in image memorability. Results reveal that visual perception of High Memorable images, compared to Low Memorable images, recruits a set of regions of interest (ROIs) distributed throughout the ventral visual cortex: a late memorability response (from around 300 ms) in early visual cortex (EVC), inferior temporal cortex, lateral occipital cortex, fusiform gyrus, and banks of the superior temporal sulcus. Image memorability magnitude results are represented after high-level feature processing in visual regions and reflected in classical memory regions in the medial temporal lobe (MTL). Our results present, to our knowledge, the first unified spatiotemporal account of visual memorability effect across the human cortex, further supporting the levels-of-processing theory of perception and memory.
Affiliation(s)
- Benjamin Lahner
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Yalda Mohsenzadeh
  - The Brain and Mind Institute, The University of Western Ontario, London, Canada
  - Department of Computer Science, The University of Western Ontario, London, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Caitlin Mullin
  - Vision: Science to Application (VISTA), York University, Toronto, Ontario, Canada
- Aude Oliva
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
9
Jain S, Vo VA, Wehbe L, Huth AG. Computational Language Modeling and the Promise of In Silico Experimentation. Neurobiology of Language (Cambridge, Mass.) 2024; 5:80-106. [PMID: 38645624 PMCID: PMC11025654 DOI: 10.1162/nol_a_00101] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/28/2022] [Accepted: 01/18/2023] [Indexed: 04/23/2024]
Abstract
Language neuroscience currently relies on two major experimental paradigms: controlled experiments using carefully hand-designed stimuli, and natural stimulus experiments. These approaches have complementary advantages which allow them to address distinct aspects of the neurobiology of language, but each approach also comes with drawbacks. Here we discuss a third paradigm, in silico experimentation using deep learning-based encoding models, that has been enabled by recent advances in cognitive computational neuroscience. This paradigm promises to combine the interpretability of controlled experiments with the generalizability and broad scope of natural stimulus experiments. We show four examples of simulating language neuroscience experiments in silico and then discuss both the advantages and caveats of this approach.
Affiliation(s)
- Shailee Jain
  - Department of Computer Science, University of Texas at Austin, Austin, TX, USA
- Vy A. Vo
  - Brain-Inspired Computing Lab, Intel Labs, Hillsboro, OR, USA
- Leila Wehbe
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Alexander G. Huth
  - Department of Computer Science, University of Texas at Austin, Austin, TX, USA
  - Department of Neuroscience, University of Texas at Austin, Austin, TX, USA
10
Walbrin J, Downing PE, Sotero FD, Almeida J. Characterizing the discriminability of visual categorical information in strongly connected voxels. Neuropsychologia 2024; 195:108815. [PMID: 38311112 DOI: 10.1016/j.neuropsychologia.2024.108815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 01/06/2024] [Accepted: 02/01/2024] [Indexed: 02/06/2024]
Abstract
Functional brain responses are strongly influenced by connectivity. Recently, we demonstrated a major example of this: category discriminability within occipitotemporal cortex (OTC) is enhanced for voxel sets that share strong functional connectivity to distal brain areas, relative to those that share lesser connectivity. That is, within OTC regions, sets of 'most-connected' voxels show improved multivoxel pattern discriminability for tool-, face-, and place stimuli relative to voxels with weaker connectivity to the wider brain. However, understanding whether these effects generalize to other domains (e.g. body perception network), and across different levels of the visual processing streams (e.g. dorsal as well as ventral stream areas) is an important extension of this work. Here, we show that this so-called connectivity-guided decoding (CGD) effect broadly generalizes across a wide range of categories (tools, faces, bodies, hands, places). This effect is robust across dorsal stream areas, but less consistent in earlier ventral stream areas. In the latter regions, category discriminability is generally very high, suggesting that extraction of category-relevant visual properties is less reliant on connectivity to downstream areas. Further, CGD effects are primarily expressed in a category-specific manner: For example, within the network of tool regions, discriminability of tool information is greater than non-tool information. The connectivity-guided decoding approach shown here provides a novel demonstration of the crucial relationship between wider brain connectivity and complex local-level functional responses at different levels of the visual processing streams. Further, this approach generates testable new hypotheses about the relationships between connectivity and local selectivity.
Affiliation(s)
- Jon Walbrin
  - Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal; CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal
- Paul E Downing
  - School of Human and Behavioural Sciences, Bangor University, Bangor, Wales
- Filipa Dourado Sotero
  - Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal; CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal
- Jorge Almeida
  - Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal; CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal
11
Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. PLoS Comput Biol 2024; 20:e1011943. [PMID: 38547053 PMCID: PMC10977720 DOI: 10.1371/journal.pcbi.1011943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 02/24/2024] [Indexed: 04/02/2024] Open
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: they are intrinsic to the visual system versus they arise through reentry from frontal emotion processing structures such as the amygdala. We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that in all layers of the CNN models, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images and lesioning these neurons by setting their output to zero or enhancing these neurons by increasing their gain led to decreased or increased emotion recognition performance respectively. These results support the idea that the visual system may have the intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
Affiliation(s)
- Peng Liu
  - J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
  - Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Ke Bo
  - Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Mingzhou Ding
  - J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Ruogu Fang
  - J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
  - Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, Florida, United States of America
12
Stoinski LM, Perkuhn J, Hebart MN. THINGSplus: New norms and metadata for the THINGS database of 1854 object concepts and 26,107 natural object images. Behav Res Methods 2024; 56:1583-1603. [PMID: 37095326 PMCID: PMC10991023 DOI: 10.3758/s13428-023-02110-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/13/2023] [Indexed: 04/26/2023]
Abstract
To study visual and semantic object representations, the need for well-curated object concepts and images has grown significantly over the past years. To address this, we have previously developed THINGS, a large-scale database of 1854 systematically sampled object concepts with 26,107 high-quality naturalistic images of these concepts. With THINGSplus, we significantly extend THINGS by adding concept- and image-specific norms and metadata for all 1854 concepts and one copyright-free image example per concept. Concept-specific norms were collected for the properties of real-world size, manmadeness, preciousness, liveliness, heaviness, naturalness, ability to move or be moved, graspability, holdability, pleasantness, and arousal. Further, we provide 53 superordinate categories as well as typicality ratings for all their members. Image-specific metadata includes a nameability measure, based on human-generated labels of the objects depicted in the 26,107 images. Finally, we identified one new public domain image per concept. Property (M = 0.97, SD = 0.03) and typicality ratings (M = 0.97, SD = 0.01) demonstrate excellent consistency, with the subsequently collected arousal ratings as the only exception (r = 0.69). Our property (M = 0.85, SD = 0.11) and typicality (r = 0.72, 0.74, 0.88) data correlated strongly with external norms, again with the lowest validity for arousal (M = 0.41, SD = 0.08). To summarize, THINGSplus provides a large-scale, externally validated extension to existing object norms and an important extension to THINGS, allowing detailed selection of stimuli and control variables for a wide range of research interested in visual object processing, language, and semantic memory.
Affiliation(s)
- Laura M Stoinski
  - Max Planck Institute for Human Cognitive & Brain Sciences, Leipzig, Germany
- Jonas Perkuhn
  - Max Planck Institute for Human Cognitive & Brain Sciences, Leipzig, Germany
- Martin N Hebart
  - Max Planck Institute for Human Cognitive & Brain Sciences, Leipzig, Germany
  - Justus Liebig University, Gießen, Germany
13
Op de Beeck H, Bracci S. Going after the bigger picture: Using high-capacity models to understand mind and brain. Behav Brain Sci 2023; 46:e404. [PMID: 38054291 DOI: 10.1017/s0140525x2300153x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Deep neural networks (DNNs) provide a unique opportunity to move towards a generic modelling framework in psychology. The high representational capacity of these models combined with the possibility for further extensions has already allowed us to investigate the forest, namely the complex landscape of representations and processes that underlie human cognition, without forgetting about the trees, which include individual psychological phenomena.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy ://webapps.unitn.it/du/en/Persona/PER0076943/Curriculum
|
14
|
Tuckute G, Feather J, Boebinger D, McDermott JH. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023; 21:e3002366. [PMID: 38091351 PMCID: PMC10718467 DOI: 10.1371/journal.pbio.3002366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 10/06/2023] [Indexed: 12/18/2023] Open
Abstract
Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectrotemporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
| | - Jenelle Feather
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
| | - Dana Boebinger
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
- University of Rochester Medical Center, Rochester, New York, United States of America
| | - Josh H. McDermott
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
|
15
|
Tuckute G, Sathe A, Srikant S, Taliaferro M, Wang M, Schrimpf M, Kay K, Fedorenko E. Driving and suppressing the human language network using large language models. bioRxiv 2023:2023.04.16.537080. [PMID: 37090673 PMCID: PMC10120732 DOI: 10.1101/2023.04.16.537080] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 04/25/2023]
Abstract
Transformer models such as GPT generate human-like language and are highly predictive of human brain responses to language. Here, using fMRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of brain response associated with each sentence. Then, we use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also noninvasively control neural activity in higher-level cortical areas, like the language network.
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Shashank Srikant
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- MIT-IBM Watson AI Lab, Cambridge, MA 02142, USA
| | - Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Mingye Wang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Martin Schrimpf
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Quest for Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Kendrick Kay
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455 USA
| | - Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- The Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138 USA
|
16
|
Jiahui G, Feilong M, Visconti di Oleggio Castello M, Nastase SA, Haxby JV, Gobbini MI. Modeling naturalistic face processing in humans with deep convolutional neural networks. Proc Natl Acad Sci U S A 2023; 120:e2304085120. [PMID: 37847731 PMCID: PMC10614847 DOI: 10.1073/pnas.2304085120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 09/11/2023] [Indexed: 10/19/2023] Open
Abstract
Deep convolutional neural networks (DCNNs) trained for face identification can rival and even exceed human-level performance. The ways in which the internal face representations in DCNNs relate to human cognitive representations and brain activity are not well understood. Nearly all previous studies focused on static face image processing with rapid display times and ignored the processing of naturalistic, dynamic information. To address this gap, we developed the largest naturalistic dynamic face stimulus set in human neuroimaging research (700+ naturalistic video clips of unfamiliar faces). We used this naturalistic dataset to compare representational geometries estimated from DCNNs, behavioral responses, and brain responses. We found that DCNN representational geometries were consistent across architectures, cognitive representational geometries were consistent across raters in a behavioral arrangement task, and neural representational geometries in face areas were consistent across brains. Representational geometries in late, fully connected DCNN layers, which are optimized for individuation, were much more weakly correlated with cognitive and neural geometries than were geometries in late-intermediate layers. The late-intermediate face-DCNN layers successfully matched cognitive representational geometries, as measured with a behavioral arrangement task that primarily reflected categorical attributes, and correlated with neural representational geometries in known face-selective topographies. Our study suggests that current DCNNs successfully capture neural cognitive processes for categorical attributes of faces but less accurately capture individuation and dynamic features.
Affiliation(s)
- Guo Jiahui
- Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
| | - Ma Feilong
- Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
| | | | - Samuel A. Nastase
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
| | - James V. Haxby
- Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
| | - M. Ida Gobbini
- Department of Medical and Surgical Sciences, University of Bologna, Bologna 40138, Italy
- Istituti di Ricovero e Cura a Carattere Scientifico, Istituto delle Scienze Neurologiche di Bologna, Bologna 40139, Italy
|
17
|
Gu Z, Jamison K, Sabuncu MR, Kuceyeski A. Human brain responses are modulated when exposed to optimized natural images or synthetically generated images. Commun Biol 2023; 6:1076. [PMID: 37872319 PMCID: PMC10593916 DOI: 10.1038/s42003-023-05440-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 10/10/2023] [Indexed: 10/25/2023] Open
Abstract
Understanding how human brains interpret and process information is important. Here, we investigated the selectivity and inter-individual differences in human brain responses to images via functional MRI. In our first experiment, we found that images predicted to achieve maximal activations using a group level encoding model evoke higher responses than images predicted to achieve average activations, and the activation gain is positively associated with the encoding model accuracy. Furthermore, anterior temporal lobe face area (aTLfaces) and fusiform body area 1 had higher activation in response to maximal synthetic images compared to maximal natural images. In our second experiment, we found that synthetic images derived using a personalized encoding model elicited higher responses compared to synthetic images from group-level or other subjects' encoding models. The finding that aTLfaces favors synthetic images over natural images was also replicated. Our results indicate the possibility of using data-driven and generative approaches to modulate macro-scale brain region responses and probe inter-individual differences in and functional specialization of the human visual system.
Affiliation(s)
- Zijin Gu
- School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA
| | - Keith Jamison
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
| | - Mert R Sabuncu
- School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
| | - Amy Kuceyeski
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
|
18
|
Ma G, Yan R, Tang H. Exploiting noise as a resource for computation and learning in spiking neural networks. Patterns (New York, N.Y.) 2023; 4:100831. [PMID: 37876899 PMCID: PMC10591140 DOI: 10.1016/j.patter.2023.100831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 07/06/2023] [Accepted: 08/07/2023] [Indexed: 10/26/2023]
Abstract
Networks of spiking neurons underpin the extraordinary information-processing capabilities of the brain and have become pillar models in neuromorphic artificial intelligence. Despite extensive research on spiking neural networks (SNNs), most studies are established on deterministic models, overlooking the inherent non-deterministic, noisy nature of neural computations. This study introduces the noisy SNN (NSNN) and the noise-driven learning (NDL) rule by incorporating noisy neuronal dynamics to exploit the computational advantages of noisy neural processing. The NSNN provides a theoretical framework that yields scalable, flexible, and reliable computation and learning. We demonstrate that this framework leads to spiking neural models with competitive performance, improved robustness against challenging perturbations compared with deterministic SNNs, and better reproduction of probabilistic computation in neural coding. Generally, this study offers a powerful and easy-to-use tool for machine learning, neuromorphic intelligence practitioners, and computational neuroscience researchers.
Affiliation(s)
- Gehua Ma
- College of Computer Science and Technology, Zhejiang University, Hangzhou, PRC
| | - Rui Yan
- College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, PRC
| | - Huajin Tang
- College of Computer Science and Technology, Zhejiang University, Hangzhou, PRC
- State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, PRC
|
19
|
van Dyck LE, Gruber WR. Modeling Biological Face Recognition with Deep Convolutional Neural Networks. J Cogn Neurosci 2023; 35:1521-1537. [PMID: 37584587 DOI: 10.1162/jocn_a_02040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2023]
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground, and recent efforts have started to transfer this achievement to research on biological face recognition. In this regard, face detection can be investigated by comparing face-selective biological neurons and brain areas to artificial neurons and model layers. Similarly, face identification can be examined by comparing in vivo and in silico multidimensional "face spaces." In this review, we summarize the first studies that use DCNNs to model biological face recognition. On the basis of a broad spectrum of behavioral and computational evidence, we conclude that DCNNs are useful models that closely resemble the general hierarchical organization of face recognition in the ventral visual pathway and the core face network. In two exemplary spotlights, we emphasize the unique scientific contributions of these models. First, studies on face detection in DCNNs indicate that elementary face selectivity emerges automatically through feedforward processing even in the absence of visual experience. Second, studies on face identification in DCNNs suggest that identity-specific experience and generative mechanisms facilitate this particular challenge. Taken together, as this novel modeling approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), it may be suited to inform long-standing debates on the substrates of biological face recognition.
|
20
|
Yao M, Wen B, Yang M, Guo J, Jiang H, Feng C, Cao Y, He H, Chang L. High-dimensional topographic organization of visual features in the primate temporal lobe. Nat Commun 2023; 14:5931. [PMID: 37739988 PMCID: PMC10517140 DOI: 10.1038/s41467-023-41584-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 09/07/2023] [Indexed: 09/24/2023] Open
Abstract
The inferotemporal cortex supports our supreme object recognition ability. Numerous studies have been conducted to elucidate the functional organization of this brain area, but there are still important questions that remain unanswered, including how this organization differs between humans and non-human primates. Here, we use deep neural networks trained on object categorization to construct a 25-dimensional space of visual features, and systematically measure the spatial organization of feature preference in both male monkey brains and human brains using fMRI. These feature maps allow us to predict the selectivity of a previously unknown region in monkey brains, which is corroborated by additional fMRI and electrophysiology experiments. These maps also enable quantitative analyses of the topographic organization of the temporal lobe, demonstrating the existence of a pair of orthogonal gradients that differ in spatial scale and revealing significant differences in the functional organization of high-level visual areas between monkey and human brains.
Affiliation(s)
- Mengna Yao
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Bincheng Wen
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
| | - Mingpo Yang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Jiebin Guo
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Haozhou Jiang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Chao Feng
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Yilei Cao
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Huiguang He
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
| | - Le Chang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
|
21
|
Ozcelik F, VanRullen R. Natural scene reconstruction from fMRI signals using generative latent diffusion. Sci Rep 2023; 13:15666. [PMID: 37731047 PMCID: PMC10511448 DOI: 10.1038/s41598-023-42891-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/23/2023] [Accepted: 09/15/2023] [Indexed: 09/22/2023] Open
Abstract
In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images based on fMRI signals. Previous studies have succeeded in re-creating different aspects of the visuals, such as low-level properties (shape, texture, layout) or high-level features (category of objects, descriptive semantics of scenes) but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating high-complexity images. Here, we investigate how to take advantage of this innovative technology for brain decoding. We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion) conditioned on predicted multimodal (text and visual) features, to generate final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling "ROI-optimal" scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g. brain-computer interface) and fundamental neuroscience.
Affiliation(s)
- Furkan Ozcelik
- CerCo, CNRS UMR5549, Toulouse, France
- Universite de Toulouse, Toulouse, France
| | - Rufin VanRullen
- CerCo, CNRS UMR5549, Toulouse, France
- Universite de Toulouse, Toulouse, France
- ANITI, Toulouse, France
|
22
|
Vinken K, Prince JS, Konkle T, Livingstone MS. The neural code for "face cells" is not face-specific. Sci Adv 2023; 9:eadg1736. [PMID: 37647400 PMCID: PMC10468123 DOI: 10.1126/sciadv.adg1736] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/07/2022] [Accepted: 07/27/2023] [Indexed: 09/01/2023]
Abstract
Face cells are neurons that respond more to faces than to non-face objects. They are found in clusters in the inferotemporal cortex, thought to process faces specifically, and, hence, studied using faces almost exclusively. Analyzing neural responses in and around macaque face patches to hundreds of objects, we found graded response profiles for non-face objects that predicted the degree of face selectivity and provided information on face-cell tuning beyond that from actual faces. This relationship between non-face and face responses was not predicted by color and simple shape properties but by information encoded in deep neural networks trained on general objects rather than face classification. These findings contradict the long-standing assumption that face versus non-face selectivity emerges from face-specific features and challenge the practice of focusing on only the most effective stimulus. They provide evidence instead that category-selective neurons are best understood by their tuning directions in a domain-general object space.
Affiliation(s)
- Kasper Vinken
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Jacob S. Prince
- Department of Psychology, Harvard University, Cambridge, MA 02478, USA
| | - Talia Konkle
- Department of Psychology, Harvard University, Cambridge, MA 02478, USA
|
23
|
Yang Y, Liu X, Li W, Li C, Ma G, Yang G, Ren J, Ge S. Detection of Hindwing Landmarks Using Transfer Learning and High-Resolution Networks. Biology 2023; 12:1006. [PMID: 37508435 PMCID: PMC10376506 DOI: 10.3390/biology12071006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/25/2023] [Accepted: 07/11/2023] [Indexed: 07/30/2023]
Abstract
Hindwing venation is one of the most important morphological features for the functional and evolutionary analysis of beetles, as it is one of the key features used for the analysis of beetle flight performance and the design of beetle-like flapping wing micro aerial vehicles. However, manual landmark annotation for hindwing morphological analysis is a time-consuming process hindering the development of wing morphology research. In this paper, we present a novel approach for the detection of landmarks on the hindwings of leaf beetles (Coleoptera, Chrysomelidae) using a limited number of samples. The proposed method entails the transfer of a pre-existing model, trained on a large natural image dataset, to the specific domain of leaf beetle hindwings. This is achieved by using a deep high-resolution network as the backbone. The low-stage network parameters are frozen, while the high-stage parameters are re-trained to construct a leaf beetle hindwing landmark detection model. A leaf beetle hindwing landmark dataset was constructed, and the network was trained on varying numbers of randomly selected hindwing samples. The results demonstrate that the average detection normalized mean error for specific landmarks of leaf beetle hindwings (100 samples) remains below 0.02 and only reached 0.045 when using a mere three samples for training. Comparative analyses reveal that the proposed approach outperforms a prevalently used method (i.e., a deep residual network). This study showcases the practicability of employing natural images (specifically, those in ImageNet) for the purpose of pre-training leaf beetle hindwing landmark detection models in particular, providing a promising approach for insect wing venation digitization.
Affiliation(s)
- Yi Yang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xiaokun Liu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wenjie Li
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Congqiao Li
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Ge Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guangqin Yang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jing Ren
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
| | - Siqin Ge
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
|
24
|
Huber LS, Geirhos R, Wichmann FA. The developmental trajectory of object recognition robustness: Children are like small adults but unlike big deep neural networks. J Vis 2023; 23:4. [PMID: 37410494 PMCID: PMC10337805 DOI: 10.1167/jov.23.7.4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/03/2022] [Accepted: 05/10/2023] [Indexed: 07/07/2023] Open
Abstract
In laboratory object recognition tasks based on undistorted photographs, both adult humans and deep neural networks (DNNs) perform close to ceiling. Unlike adults, whose object recognition performance is robust against a wide range of image distortions, DNNs trained on standard ImageNet (1.3M images) perform poorly on distorted images. However, the last 2 years have seen impressive gains in DNN distortion robustness, predominantly achieved through ever-increasing large-scale datasets that are orders of magnitude larger than ImageNet. Although this simple brute-force approach is very effective in achieving human-level robustness in DNNs, it raises the question of whether human robustness, too, is simply due to extensive experience with (distorted) visual input during childhood and beyond. Here we investigate this question by comparing the core object recognition performance of 146 children (aged 4-15 years) against adults and against DNNs. We find, first, that already 4- to 6-year-olds show remarkable robustness to image distortions and outperform DNNs trained on ImageNet. Second, we estimated the number of images children had been exposed to during their lifetime. Compared with various DNNs, children's high robustness requires relatively little data. Third, when recognizing objects, children (like adults but unlike DNNs) rely heavily on shape but not on texture cues. Together our results suggest that the remarkable robustness to distortions emerges early in the developmental trajectory of human object recognition and is unlikely the result of a mere accumulation of experience with distorted visual input. Even though current DNNs match human performance regarding robustness, they seem to rely on different and more data-hungry strategies to do so.
Affiliation(s)
- Lukas S Huber
- Department of Psychology, University of Bern, Bern, Switzerland
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
- https://orcid.org/0000-0002-7755-6926
| | - Robert Geirhos
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
- https://orcid.org/0000-0001-7698-3187
| | - Felix A Wichmann
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
- https://orcid.org/0000-0002-2592-634X
|
25
|
Doshi FR, Konkle T. Cortical topographic motifs emerge in a self-organized map of object space. Sci Adv 2023; 9:eade8187. [PMID: 37343093 DOI: 10.1126/sciadv.ade8187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 05/17/2023] [Indexed: 06/23/2023]
Abstract
The human ventral visual stream has a highly systematic organization of object information, but the causal pressures driving these topographic motifs are highly debated. Here, we use self-organizing principles to learn a topographic representation of the data manifold of a deep neural network representational space. We find that a smooth mapping of this representational space showed many brain-like motifs, with a large-scale organization by animacy and real-world object size, supported by mid-level feature tuning, with naturally emerging face- and scene-selective regions. While some theories of the object-selective cortex posit that these differently tuned regions of the brain reflect a collection of distinctly specified functional modules, the present work provides computational support for an alternate hypothesis that the tuning and topography of the object-selective cortex reflect a smooth mapping of a unified representational space.
Affiliation(s)
- Fenil R Doshi
- Department of Psychology and Center for Brain Sciences, Harvard University, Cambridge, MA, USA
| | - Talia Konkle
- Department of Psychology and Center for Brain Sciences, Harvard University, Cambridge, MA, USA
| |
|
26
|
Marrazzo G, De Martino F, Lage-Castellanos A, Vaessen MJ, de Gelder B. Voxelwise encoding models of body stimuli reveal a representational gradient from low-level visual features to postural features in occipitotemporal cortex. Neuroimage 2023:120240. [PMID: 37348622 DOI: 10.1016/j.neuroimage.2023.120240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 06/16/2023] [Accepted: 06/19/2023] [Indexed: 06/24/2023] Open
Abstract
Research on body representation in the brain has focused on category-specific representation, using fMRI to investigate the response pattern to body stimuli in occipitotemporal cortex. So far, it has not addressed the specific computations involved in body-selective regions, which are defined only by higher-order category selectivity. This study used ultra-high field fMRI and banded ridge regression to investigate the coding of body images, by comparing the performance of three encoding models in predicting brain activity in occipitotemporal cortex and specifically the extrastriate body area (EBA). Our results suggest that bodies are encoded in occipitotemporal cortex and in the EBA according to a combination of low-level visual features and postural features.
Affiliation(s)
- Giuseppe Marrazzo
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands
| | - Federico De Martino
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands; Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN 55455, United States and Department of NeuroInformatics
| | - Agustin Lage-Castellanos
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands; Cuban Center for Neuroscience, Street 190 e/25 and 27 Cubanacán Playa Havana, CP 11600, Cuba
| | - Maarten J Vaessen
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands
| | - Beatrice de Gelder
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands.
| |
|
27
|
Doerig A, Sommers RP, Seeliger K, Richards B, Ismael J, Lindsay GW, Kording KP, Konkle T, van Gerven MAJ, Kriegeskorte N, Kietzmann TC. The neuroconnectionist research programme. Nat Rev Neurosci 2023:10.1038/s41583-023-00705-w. [PMID: 37253949 DOI: 10.1038/s41583-023-00705-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Accepted: 04/21/2023] [Indexed: 06/01/2023]
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.
Affiliation(s)
- Adrien Doerig
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany.
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands.
| | - Rowan P Sommers
- Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| | - Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Blake Richards
- Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
- School of Computer Science, McGill University, Montréal, QC, Canada
- Mila, Montréal, QC, Canada
- Montréal Neurological Institute, Montréal, QC, Canada
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
| | | | | | - Konrad P Kording
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Bioengineering, Neuroscience, University of Pennsylvania, Pennsylvania, PA, USA
| | | | | | | | - Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
| |
|
28
|
Bracci S, Mraz J, Zeman A, Leys G, Op de Beeck H. The representational hierarchy in human and artificial visual systems in the presence of object-scene regularities. PLoS Comput Biol 2023; 19:e1011086. [PMID: 37115763 PMCID: PMC10171658 DOI: 10.1371/journal.pcbi.1011086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 05/10/2023] [Accepted: 04/09/2023] [Indexed: 04/29/2023] Open
Abstract
Human vision is still largely unexplained. Computer vision has made impressive progress on this front, but it is still unclear to what extent artificial neural networks approximate human object vision at the behavioral and neural levels. Here, we investigated whether machine object vision mimics the representational hierarchy of human object vision with an experimental design that allows testing within-domain representations for animals and scenes, as well as across-domain representations reflecting their real-world contextual regularities, such as animal-scene pairs that often co-occur in the visual environment. We found that deep convolutional neural networks (DCNNs) trained in object recognition acquire representations, in their late processing stage, that closely capture human conceptual judgements about the co-occurrence of animals and their typical scenes. Likewise, the DCNNs' representational hierarchy shows surprising similarities with the representational transformations emerging in domain-specific ventrotemporal areas up to domain-general frontoparietal areas. Despite these remarkable similarities, the underlying information processing differs. The ability of neural networks to learn a human-like high-level conceptual representation of object-scene co-occurrence depends upon the amount of object-scene co-occurrence present in the image set, thus highlighting the fundamental role of training history. Further, although mid/high-level DCNN layers represent the category division for animals and scenes as observed in ventrotemporal cortex (VTC), their information content shows reduced domain-specific representational richness. To conclude, by testing within- and between-domain selectivity while manipulating contextual regularities, we reveal unknown similarities and differences in the information processing strategies employed by human and artificial visual systems.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences-CIMeC, University of Trento, Rovereto, Italy
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Jakob Mraz
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Astrid Zeman
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Gaëlle Leys
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Hans Op de Beeck
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| |
|
29
|
Nakai T, Nishimoto S. Artificial neural network modelling of the neural population code underlying mathematical operations. Neuroimage 2023; 270:119980. [PMID: 36848969 DOI: 10.1016/j.neuroimage.2023.119980] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/05/2022] [Revised: 02/10/2023] [Accepted: 02/23/2023] [Indexed: 02/28/2023] Open
Abstract
Mathematical operations have long been regarded as a sparse, symbolic process in neuroimaging studies. In contrast, advances in artificial neural networks (ANN) have enabled extracting distributed representations of mathematical operations. Recent neuroimaging studies have compared distributed representations of the visual, auditory and language domains in ANNs and biological neural networks (BNNs). However, such a relationship has not yet been examined in mathematics. Here we hypothesise that ANN-based distributed representations can explain brain activity patterns of symbolic mathematical operations. We used the fMRI data of a series of mathematical problems with nine different combinations of operators to construct voxel-wise encoding/decoding models using both sparse operator and latent ANN features. Representational similarity analysis demonstrated shared representations between ANN and BNN, an effect particularly evident in the intraparietal sulcus. Feature-brain similarity (FBS) analysis served to reconstruct a sparse representation of mathematical operations based on distributed ANN features in each cortical voxel. Such reconstruction was more efficient when using features from deeper ANN layers. Moreover, latent ANN features allowed the decoding of novel operators not used during model training from brain activity. The current study provides novel insights into the neural code underlying mathematical thought.
Affiliation(s)
- Tomoya Nakai
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan; Lyon Neuroscience Research Center (CRNL), INSERM U1028 - CNRS UMR5292, University of Lyon, Bron, France.
| | - Shinji Nishimoto
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan; Graduate School of Frontier Biosciences, Osaka University, Suita, Japan; Graduate School of Medicine, Osaka University, Suita, Japan
| |
|
30
|
Bracci S, Op de Beeck HP. Understanding Human Object Vision: A Picture Is Worth a Thousand Representations. Annu Rev Psychol 2023; 74:113-135. [PMID: 36378917 DOI: 10.1146/annurev-psych-032720-041031] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]
Abstract
Objects are the core meaningful elements in our visual environment. Classic theories of object vision focus upon object recognition and are elegant and simple. Some of their proposals still stand, yet the simplicity is gone. Recent evolutions in behavioral paradigms, neuroscientific methods, and computational modeling have allowed vision scientists to uncover the complexity of the multidimensional representational space that underlies object vision. We review these findings and propose that the key to understanding this complexity is to relate object vision to the full repertoire of behavioral goals that underlie human behavior, running far beyond object recognition. There might be no such thing as core object recognition, and if it exists, then its importance is more limited than traditionally thought.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy;
| | - Hans P Op de Beeck
- Leuven Brain Institute, Research Unit Brain & Cognition, KU Leuven, Leuven, Belgium;
| |
|
31
|
Kanwisher N, Gupta P, Dobs K. CNNs reveal the computational implausibility of the expertise hypothesis. iScience 2023; 26:105976. [PMID: 36794151 PMCID: PMC9923184 DOI: 10.1016/j.isci.2023.105976] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/13/2022] [Revised: 11/07/2022] [Accepted: 01/11/2023] [Indexed: 01/15/2023] Open
Abstract
Face perception has long served as a classic example of domain specificity of mind and brain. But an alternative "expertise" hypothesis holds that putatively face-specific mechanisms are actually domain-general, and can be recruited for the perception of other objects of expertise (e.g., cars for car experts). Here, we demonstrate the computational implausibility of this hypothesis: Neural network models optimized for generic object categorization provide a better foundation for expert fine-grained discrimination than do models optimized for face recognition.
Affiliation(s)
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Pranjul Gupta
- Department of Psychology, Justus-Liebig University Giessen, 35394 Giessen, Germany
| | - Katharina Dobs
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Psychology, Justus-Liebig University Giessen, 35394 Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus-Liebig University, 35032 Marburg, Germany; corresponding author
| |
|
32
|
Gu Z, Jamison K, Sabuncu M, Kuceyeski A. Personalized visual encoding model construction with small data. Commun Biol 2022; 5:1382. [PMID: 36528715 PMCID: PMC9759560 DOI: 10.1038/s42003-022-04347-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022] Open
Abstract
Quantifying population heterogeneity in brain stimuli-response mapping may allow insight into variability in bottom-up neural systems that can in turn be related to an individual's behavior or pathological state. Encoding models that predict brain responses to stimuli are one way to capture this relationship. However, they generally need a large amount of fMRI data to achieve optimal accuracy. Here, we propose an ensemble approach to create encoding models for novel individuals with relatively little data by modeling each subject's predicted response vector as a linear combination of the other subjects' predicted response vectors. We show that these ensemble encoding models, trained with hundreds of image-response pairs, achieve accuracy not different from models trained on 20,000 image-response pairs. Importantly, the ensemble encoding models preserve patterns of inter-individual differences in the image-response relationship. We also show that the proposed approach is robust against domain shift by validating on data with a different scanner and experimental setup. Additionally, we show that the ensemble encoding models are able to discover the inter-individual differences in various face areas' responses to images of animal vs human faces using a recently developed NeuroGen framework. Our approach shows the potential to use existing densely-sampled data, i.e. large amounts of data collected from a single individual, to efficiently create accurate, personalized encoding models and, subsequently, personalized optimal synthetic images for new individuals scanned under different experimental conditions.
Affiliation(s)
- Zijin Gu
- School of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA
| | - Keith Jamison
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
| | - Mert Sabuncu
- School of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
| | - Amy Kuceyeski
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA.
| |
|
33
|
Dissociation and hierarchy of human visual pathways for simultaneously coding facial identity and expression. Neuroimage 2022; 264:119769. [PMID: 36435341 DOI: 10.1016/j.neuroimage.2022.119769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 11/14/2022] [Accepted: 11/22/2022] [Indexed: 11/25/2022] Open
Abstract
Humans have an extraordinary ability to recognize facial expression and identity from a single face simultaneously and effortlessly; however, the underlying neural computation is not well understood. Here, we optimized a multi-task deep neural network to classify facial expression and identity simultaneously. Under various optimization training strategies, the best-performing model consistently showed 'share-separate' organization. The two separate branches of the best-performing model also exhibited distinct abilities to categorize facial expression and identity, and these abilities increased along the facial expression or identity branches toward high layers. When we compared the representational similarities between the best-performing model and functional magnetic resonance imaging (fMRI) responses in the human visual cortex to the same face stimuli, the face-selective posterior superior temporal sulcus (pSTS) in the dorsal visual cortex was significantly correlated with layers in the expression branch of the model, and the anterior inferotemporal cortex (aIT) and anterior fusiform face area (aFFA) in the ventral visual cortex were significantly correlated with layers in the identity branch of the model. In addition, the aFFA and aIT better matched the high layers of the model, while the posterior FFA (pFFA) and occipital facial area (OFA) better matched the middle and early layers of the model, respectively. Overall, our study provides a task-optimization computational model to better understand the neural mechanism underlying face recognition, which suggests that, similar to the best-performing model, the human visual system exhibits both dissociated and hierarchical neuroanatomical organization when simultaneously coding facial identity and expression.
|
34
|
Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022; 46:e385. [PMID: 36453586 DOI: 10.1017/s0140525x22002813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Milton Llera Montero
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Guillermo Puebla
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Federico Adolfi
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
| | - John E Hummel
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Rachel F Heaton
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
| |
|
35
|
Ayzenberg V, Behrmann M. Does the brain's ventral visual pathway compute object shape? Trends Cogn Sci 2022; 26:1119-1132. [PMID: 36272937 DOI: 10.1016/j.tics.2022.09.019] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 06/23/2022] [Revised: 09/22/2022] [Accepted: 09/26/2022] [Indexed: 11/11/2022]
Abstract
A rich behavioral literature has shown that human object recognition is supported by a representation of shape that is tolerant to variations in an object's appearance. Such 'global' shape representations are achieved by describing objects via the spatial arrangement of their local features, or structure, rather than by the appearance of the features themselves. However, accumulating evidence suggests that the ventral visual pathway - the primary substrate underlying object recognition - may not represent global shape. Instead, ventral representations may be better described as a basis set of local image features. We suggest that this evidence forces a reevaluation of the role of the ventral pathway in object perception and posits a broader network for shape perception that encompasses contributions from the dorsal pathway.
Affiliation(s)
- Vladislav Ayzenberg
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
| | - Marlene Behrmann
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; The Department of Ophthalmology, University of Pittsburgh, Pittsburgh, PA 15260, USA.
| |
|
36
|
Khosla M, Ratan Murty NA, Kanwisher N. A highly selective response to food in human visual cortex revealed by hypothesis-free voxel decomposition. Curr Biol 2022; 32:4159-4171.e9. [PMID: 36027910 PMCID: PMC9561032 DOI: 10.1016/j.cub.2022.08.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 06/23/2022] [Revised: 08/03/2022] [Accepted: 08/05/2022] [Indexed: 12/14/2022]
Abstract
Prior work has identified cortical regions selectively responsive to specific categories of visual stimuli. However, this hypothesis-driven work cannot reveal how prominent these category selectivities are in the overall functional organization of the visual cortex, or what others might exist that scientists have not thought to look for. Furthermore, standard voxel-wise tests cannot detect distinct neural selectivities that coexist within voxels. To overcome these limitations, we used data-driven voxel decomposition methods to identify the main components underlying fMRI responses to thousands of complex photographic images. Our hypothesis-neutral analysis rediscovered components selective for faces, places, bodies, and words, validating our method and showing that these selectivities are dominant features of the ventral visual pathway. The analysis also revealed an unexpected component with a distinct anatomical distribution that responded highly selectively to images of food. Alternative accounts based on low- to mid-level visual features, such as color, shape, or texture, failed to account for the food selectivity of this component. High-throughput testing and control experiments with matched stimuli on a highly accurate computational model of this component confirm its selectivity for food. We registered our methods and hypotheses before replicating them on held-out participants and in a novel dataset. These findings demonstrate the power of data-driven methods and show that the dominant neural responses of the ventral visual pathway include not only selectivities for faces, scenes, bodies, and words but also the visually heterogeneous category of food, thus constraining accounts of when and why functional specialization arises in the cortex.
Affiliation(s)
- Meenakshi Khosla
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - N Apurva Ratan Murty
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Nancy Kanwisher
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
|
37
|
Sp A. Trailblazers in Neuroscience: Using compositionality to understand how parts combine in whole objects. Eur J Neurosci 2022; 56:4378-4392. [PMID: 35760552 PMCID: PMC10084036 DOI: 10.1111/ejn.15746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 06/09/2022] [Accepted: 06/16/2022] [Indexed: 11/27/2022]
Abstract
A fundamental question for any visual system is whether its image representation can be understood in terms of its components. Decomposing any image into components is challenging because there are many possible decompositions with no common dictionary, and enumerating them leads to a combinatorial explosion. Even in perception, many objects are readily seen as containing parts, but there are many exceptions. These exceptions include objects that are not perceived as containing parts, properties like symmetry that cannot be localized to any single part, and also special categories like words and faces whose perception is widely believed to be holistic. Here, I describe a novel approach we have used to address these issues and evaluate compositionality at the behavioral and neural levels. The key design principle is to create a large number of objects by combining a small number of pre-defined components in all possible ways. This allows for building component-based models that explain whole objects using a combination of these components. Importantly, any systematic error in model fits can be used to detect the presence of emergent or holistic properties. Using this approach, we have found that whole object representations are surprisingly predictable from their components, that some components are preferred to others in perception, and that emergent properties can be discovered or explained using compositional models. Thus, compositionality is a powerful approach for understanding how whole objects relate to their parts.
Affiliation(s)
- Arun Sp
- Centre for Neuroscience, Indian Institute of Science Bangalore
| |
|
38
|
Kamps FS, Richardson H, Murty NAR, Kanwisher N, Saxe R. Using child-friendly movie stimuli to study the development of face, place, and object regions from age 3 to 12 years. Hum Brain Mapp 2022; 43:2782-2800. [PMID: 35274789 PMCID: PMC9120553 DOI: 10.1002/hbm.25815] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/26/2021] [Revised: 02/11/2022] [Accepted: 02/13/2022] [Indexed: 01/21/2023] Open
Abstract
Scanning young children while they watch short, engaging, commercially-produced movies has emerged as a promising approach for increasing data retention and quality. Movie stimuli also evoke a richer variety of cognitive processes than traditional experiments, allowing the study of multiple aspects of brain development simultaneously. However, because these stimuli are uncontrolled, it is unclear how effectively distinct profiles of brain activity can be distinguished from the resulting data. Here we develop an approach for identifying multiple distinct subject-specific Regions of Interest (ssROIs) using fMRI data collected during movie-viewing. We focused on the test case of higher-level visual regions selective for faces, scenes, and objects. Adults (N = 13) were scanned while viewing a 5.6-min child-friendly movie, as well as a traditional localizer experiment with blocks of faces, scenes, and objects. We found that just 2.7 min of movie data could identify subject-specific face, scene, and object regions. While successful, movie-defined ssROIs still showed weaker domain selectivity than traditional ssROIs. Having validated our approach in adults, we then used the same methods on movie data collected from 3- to 12-year-old children (N = 122). Movie response timecourses in 3-year-old children's face, scene, and object regions were already significantly and specifically predicted by timecourses from the corresponding regions in adults. We also found evidence of continued developmental change, particularly in the face-selective posterior superior temporal sulcus. Taken together, our results reveal both early maturity and functional change in face, scene, and object regions, and more broadly highlight the promise of short, child-friendly movies for developmental cognitive neuroscience.
Affiliation(s)
- Frederik S. Kamps
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Hilary Richardson
- School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, UK
| | - N. Apurva Ratan Murty
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Rebecca Saxe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
39
|
Abstract
Visual representations of bodies, in addition to those of faces, contribute to the recognition of con- and heterospecifics, to action recognition, and to nonverbal communication. Despite its importance, the neural basis of the visual analysis of bodies has been less studied than that of faces. In this article, I review what is known about the neural processing of bodies, focusing on the macaque temporal visual cortex. Early single-unit recording work suggested that the temporal visual cortex contains representations of body parts and bodies, with the dorsal bank of the superior temporal sulcus representing bodily actions. Subsequent functional magnetic resonance imaging studies in both humans and monkeys showed several temporal cortical regions that are strongly activated by bodies. Single-unit recordings in the macaque body patches suggest that these represent mainly body shape features. More anterior patches show a greater viewpoint-tolerant selectivity for body features, which may reflect a processing principle shared with other object categories, including faces. Expected final online publication date for the Annual Review of Vision Science, Volume 8 is September 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Rufin Vogels
- Laboratorium voor Neuro- en Psychofysiologie, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
| |
|
40
|
Abstract
Significance: Face neurons, which fire more strongly in response to images of faces than to other objects, are a paradigmatic example of object selectivity in the visual cortex. We asked whether such neurons represent the semantic concept of faces or, rather, visual features that are present in faces but do not necessarily count as a face. We created synthetic stimuli that strongly activated face neurons and showed that these stimuli were perceived as clearly distinct from real faces. At the same time, these synthetic stimuli were slightly more often associated with faces than other objects were. These results suggest that so-called face neurons do not represent a semantic category but, rather, represent visual features that correlate with faces.
|
41
|
Gu Z, Jamison KW, Khosla M, Allen EJ, Wu Y, St-Yves G, Naselaris T, Kay K, Sabuncu MR, Kuceyeski A. NeuroGen: Activation optimized image synthesis for discovery neuroscience. Neuroimage 2022; 247:118812. [PMID: 34936922 PMCID: PMC8845078 DOI: 10.1016/j.neuroimage.2021.118812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Revised: 10/11/2021] [Accepted: 12/12/2021] [Indexed: 11/24/2022] Open
Abstract
Functional MRI (fMRI) is a powerful technique that has allowed us to characterize visual cortex responses to stimuli, yet such experiments are by nature constructed based on a priori hypotheses, limited to the set of images presented to the individual while they are in the scanner, are subject to noise in the observed brain responses, and may vary widely across individuals. In this work, we propose a novel computational strategy, which we call NeuroGen, to overcome these limitations and develop a powerful tool for human vision neuroscience discovery. NeuroGen combines an fMRI-trained neural encoding model of human vision with a deep generative network to synthesize images predicted to achieve a target pattern of macro-scale brain activation. We demonstrate that the reduction of noise that the encoding model provides, coupled with the generative network's ability to produce images of high fidelity, results in a robust discovery architecture for visual neuroscience. By using only a small number of synthetic images created by NeuroGen, we demonstrate that we can detect and amplify differences in regional and individual human brain response patterns to visual stimuli. We then verify that these discoveries are reflected in the several thousand observed image responses measured with fMRI. We further demonstrate that NeuroGen can create synthetic images predicted to achieve regional response patterns not achievable by the best-matching natural images. The NeuroGen framework extends the utility of brain encoding models and opens up a new avenue for exploring, and possibly precisely controlling, the human visual system.
Affiliation(s)
- Zijin Gu
- School of Electrical and Computer Engineering, Cornell University, Ithaca, New York, USA
| | | | - Meenakshi Khosla
- School of Electrical and Computer Engineering, Cornell University, Ithaca, New York, USA
| | - Emily J Allen
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA; Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, USA
| | - Yihan Wu
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA
| | - Ghislain St-Yves
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA; Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, USA
| | - Thomas Naselaris
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA; Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, USA
| | - Kendrick Kay
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA
| | - Mert R Sabuncu
- School of Electrical and Computer Engineering, Cornell University, Ithaca, New York, USA
| | - Amy Kuceyeski
- Department of Radiology, Weill Cornell Medicine, New York, New York, USA.
| |
|
42
|
Big Data in Cognitive Neuroscience: Opportunities and Challenges. BIG DATA ANALYTICS 2022. [DOI: 10.1007/978-3-031-24094-2_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023] Open
|