1. Pacheco-Estefan D, Fellner MC, Kunz L, Zhang H, Reinacher P, Roy C, Brandt A, Schulze-Bonhage A, Yang L, Wang S, Liu J, Xue G, Axmacher N. Maintenance and transformation of representational formats during working memory prioritization. Nat Commun 2024; 15:8234. PMID: 39300141; DOI: 10.1038/s41467-024-52541-w.
Abstract
Visual working memory (VWM) depends both on material-specific brain areas in the ventral visual stream (VVS) that support the maintenance of stimulus representations and on regions in the prefrontal cortex (PFC) that control these representations. How executive control prioritizes working memory contents, and whether this affects their representational formats, remains an open question, however. Here, we analyzed intracranial EEG (iEEG) recordings in epilepsy patients with electrodes in VVS and PFC who performed a multi-item working memory task involving a retro-cue. We employed Representational Similarity Analysis (RSA) with various Deep Neural Network (DNN) architectures to investigate the representational format of prioritized VWM content. While recurrent DNN representations matched PFC representations in the beta band (15-29 Hz) following the retro-cue, they corresponded to VVS representations in a lower frequency range (3-14 Hz) towards the end of the maintenance period. Our findings highlight the distinct coding schemes and representational formats of prioritized content in VVS and PFC.
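The core RSA step described in this abstract can be sketched in a few lines: build a representational dissimilarity matrix (RDM) from a DNN layer's stimulus features and another from neural activity patterns, then rank-correlate the two. The data below are random placeholders, not the study's recordings or its exact pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli = 20

dnn_features = rng.normal(size=(n_stimuli, 512))    # features from one DNN layer
neural_patterns = rng.normal(size=(n_stimuli, 64))  # e.g. band-limited power per contact

# Representational dissimilarity matrices (condensed upper triangles)
model_rdm = pdist(dnn_features, metric="correlation")
neural_rdm = pdist(neural_patterns, metric="correlation")

# Model-brain correspondence: rank correlation between the two RDMs
rho, p = spearmanr(model_rdm, neural_rdm)
print(f"RSA model-brain fit: rho={rho:.3f}")
```

In the study this comparison is repeated across time windows and frequency bands, which is how band-specific effects like the beta-band PFC match can be resolved.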
Affiliation(s)
- Daniel Pacheco-Estefan
  - Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
- Marie-Christin Fellner
  - Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
- Lukas Kunz
  - Department of Epileptology, University Hospital Bonn, Bonn, Germany
- Hui Zhang
  - Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
- Peter Reinacher
  - Department of Stereotactic and Functional Neurosurgery, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
  - Fraunhofer Institute for Laser Technology, Aachen, Germany
- Charlotte Roy
  - Epilepsy Center, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Armin Brandt
  - Epilepsy Center, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Andreas Schulze-Bonhage
  - Epilepsy Center, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Linglin Yang
  - Department of Psychiatry, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Shuang Wang
  - Department of Neurology, Epilepsy Center, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Jing Liu
  - Department of Applied Social Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong SAR
- Gui Xue
  - State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, PR China
- Nikolai Axmacher
  - Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
  - State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, PR China
2. Pelech P, Navarro PP, Vettiger A, Chao LH, Allolio C. Stress-mediated growth determines E. coli division site morphogenesis. bioRxiv 2024:2024.09.11.612282. PMID: 39314472; PMCID: PMC11419054; DOI: 10.1101/2024.09.11.612282.
Abstract
In order to proliferate, bacteria must remodel their cell wall at the division site. The division process is driven by the enzymatic activity of peptidoglycan (PG) synthases and hydrolases around the constricting Z-ring. PG remodelling is regulated by de- and re-crosslinking enzymes, and by the directing constrictive force of the Z-ring. We introduce a model that correctly reproduces the shape of the division site during the constriction and septation phases of E. coli. The model represents mechanochemical coupling within the mathematical framework of morphoelasticity. It contains only two parameters, associated with volumetric growth and PG remodelling, that are coupled to the mechanical stress in the bacterial wall. Different morphologies, corresponding to either mutant or wild-type cells, were recovered as a function of the remodelling parameter. In addition, a plausible range for the cell stiffness and turgor pressure was determined by comparing numerical simulations with bacterial cell lysis data.
3. Wang EY, Fahey PG, Ding Z, Papadopoulos S, Ponder K, Weis MA, Chang A, Muhammad T, Patel S, Ding Z, Tran D, Fu J, Schneider-Mizell CM, Reid RC, Collman F, da Costa NM, Franke K, Ecker AS, Reimer J, Pitkow X, Sinz FH, Tolias AS. Foundation model of neural activity predicts response to new stimulus types and anatomy. bioRxiv 2024:2023.03.21.533548. PMID: 36993435; PMCID: PMC10055288; DOI: 10.1101/2023.03.21.533548.
Abstract
The complexity of neural circuits makes it challenging to decipher the brain's algorithms of intelligence. Recent breakthroughs in deep learning have produced models that accurately simulate brain activity, enhancing our understanding of the brain's computational objectives and neural coding. However, these models struggle to generalize beyond their training distribution, limiting their utility. The emergence of foundation models, trained on vast datasets, has introduced a new AI paradigm with remarkable generalization capabilities. We collected large amounts of neural activity from visual cortices of multiple mice and trained a foundation model to accurately predict neuronal responses to arbitrary natural videos. This model generalized to new mice with minimal training and successfully predicted responses across various new stimulus domains, such as coherent motion and noise patterns. It could also be adapted to new tasks beyond neural prediction, accurately predicting anatomical cell types, dendritic features, and neuronal connectivity within the MICrONS functional connectomics dataset. Our work is a crucial step toward building foundation brain models. As neuroscience accumulates larger, multi-modal datasets, foundation models will uncover statistical regularities, enabling rapid adaptation to new tasks and accelerating research.
4. Latham AP, Tempkin JOB, Otsuka S, Zhang W, Ellenberg J, Sali A. Integrative spatiotemporal modeling of biomolecular processes: application to the assembly of the Nuclear Pore Complex. bioRxiv 2024:2024.08.06.606842. PMID: 39149317; PMCID: PMC11326192; DOI: 10.1101/2024.08.06.606842.
Abstract
Dynamic processes involving biomolecules are essential for the function of the cell. Here, we introduce an integrative method for computing models of these processes based on multiple heterogeneous sources of information, including time-resolved experimental data and physical models of dynamic processes. We first compute integrative structure models at fixed time points and then optimally select and connect these snapshots into a series of trajectories that optimize the likelihood of both the snapshots and transitions between them. The method is demonstrated by application to the assembly process of the human Nuclear Pore Complex in the context of the reforming nuclear envelope during mitotic cell division, based on live-cell correlated electron tomography, bulk fluorescence correlation spectroscopy-calibrated quantitative live imaging, and a structural model of the fully-assembled Nuclear Pore Complex. Modeling of the assembly process improves the model precision over static integrative structure modeling alone. The method is applicable to a wide range of time-dependent systems in cell biology, and is available to the broader scientific community through an implementation in the open source Integrative Modeling Platform software.
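The "select and connect snapshots" idea in this abstract can be illustrated with a toy dynamic program: given log-likelihood scores for candidate structural models ("snapshots") at each time point, and scores for transitions between consecutive time points, pick the trajectory that maximizes the total score. The numbers below are illustrative placeholders, not real modeling output.

```python
# snapshot_logl[t][i]: log-likelihood of candidate snapshot i at time t
snapshot_logl = [[-1.0, -2.0], [-0.5, -3.0], [-2.0, -0.2]]
# trans_logl[t][i][j]: log-likelihood of transition snapshot i (time t) -> j (time t+1)
trans_logl = [[[-0.1, -2.0], [-2.0, -0.1]],
              [[-0.1, -2.0], [-2.0, -0.1]]]

def best_trajectory(snap, trans):
    """Viterbi-style dynamic program over the snapshot graph."""
    n_states = len(snap[0])
    score = list(snap[0])  # best total score ending in each snapshot
    back = []              # backpointers for trajectory reconstruction
    for t in range(1, len(snap)):
        new_score, ptr = [], []
        for j in range(n_states):
            cands = [score[i] + trans[t - 1][i][j] for i in range(n_states)]
            i_best = max(range(n_states), key=lambda i: cands[i])
            new_score.append(cands[i_best] + snap[t][j])
            ptr.append(i_best)
        score = new_score
        back.append(ptr)
    j_best = max(range(n_states), key=lambda j: score[j])
    path = [j_best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], score[j_best]

path, logl = best_trajectory(snapshot_logl, trans_logl)
print(path, logl)
```

The actual method scores snapshots against time-resolved experimental data and physical models; this sketch only shows the trajectory-selection skeleton.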
Affiliation(s)
- Andrew P Latham
  - Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94143, USA
- Jeremy O B Tempkin
  - Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94143, USA
- Shotaro Otsuka
  - Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Wanlu Zhang
  - Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Jan Ellenberg
  - Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Andrej Sali
  - Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94143, USA
5. Zhang J, Huang L, Ma Z, Zhou H. Predicting the temporal-dynamic trajectories of cortical neuronal responses in non-human primates based on deep spiking neural network. Cogn Neurodyn 2024; 18:1977-1988. PMID: 39104695; PMCID: PMC11297849; DOI: 10.1007/s11571-023-09989-1.
Abstract
Deep convolutional neural networks (CNNs) are commonly used as computational models of the primate ventral stream, whereas deep spiking neural networks (SNNs), which incorporate both temporal and spatial spiking information, remain underexplored. We compared the performance of an SNN and a CNN in predicting visual responses to naturalistic stimuli in area V4, inferior temporal cortex (IT), and orbitofrontal cortex (OFC). Prediction accuracies based on the SNN were significantly higher than those of the CNN for both the temporal-dynamic trajectory and the averaged firing rate of visual responses in V4 and IT. The SNN captured temporal dynamics for neurons with diverse temporal profiles and category selectivities, most sensitively around the time of peak response in each brain region. Consistently, SNN activities showed significantly stronger correlations with IT, V4, and OFC responses. Within the SNN, correlations with neural activities were stronger for late time-step features than for early time-step features. Temporal-dynamic prediction was also significantly improved by taking preceding neural activities into account. Our study thus demonstrates that SNNs are powerful temporal-dynamic models of cortical responses to complex naturalistic stimuli.
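A common way to run this kind of model comparison (a generic sketch, not the authors' pipeline) is to fit a linear readout from each model's features to the recorded responses and compare held-out prediction accuracy. Here everything is synthetic: a toy "neuron" is driven by model A's features, so model A should win.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_feat = 80, 20, 30

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def held_out_corr(X_tr, y_tr, X_te, y_te):
    """Correlation between readout predictions and held-out responses."""
    pred = X_te @ fit_ridge(X_tr, y_tr)
    return float(np.corrcoef(pred, y_te)[0, 1])

# Two candidate feature spaces ("models") for the same stimuli
feats_a = rng.normal(size=(n_train + n_test, n_feat))
feats_b = rng.normal(size=(n_train + n_test, n_feat))
# Synthetic neuron driven by model A's features plus a little noise
rates = feats_a @ rng.normal(size=n_feat) + 0.1 * rng.normal(size=n_train + n_test)

corr_a = held_out_corr(feats_a[:n_train], rates[:n_train], feats_a[n_train:], rates[n_train:])
corr_b = held_out_corr(feats_b[:n_train], rates[:n_train], feats_b[n_train:], rates[n_train:])
print(f"model A r={corr_a:.2f}, model B r={corr_b:.2f}")
```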
Affiliation(s)
- Jie Zhang
  - The Brain Cognition and Brain Disease Institute, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
  - Peng Cheng Laboratory, Shenzhen, 518000 China
- Liwei Huang
  - Peng Cheng Laboratory, Shenzhen, 518000 China
  - Peking University, Beijing, 100871 China
- Zhengyu Ma
  - Peng Cheng Laboratory, Shenzhen, 518000 China
- Huihui Zhou
  - The Brain Cognition and Brain Disease Institute, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055 China
  - Peng Cheng Laboratory, Shenzhen, 518000 China
6. Chen Y, Beech P, Yin Z, Jia S, Zhang J, Yu Z, Liu JK. Decoding dynamic visual scenes across the brain hierarchy. PLoS Comput Biol 2024; 20:e1012297. PMID: 39093861; PMCID: PMC11324145; DOI: 10.1371/journal.pcbi.1012297.
Abstract
Understanding the computational mechanisms that underlie the encoding and decoding of environmental stimuli is a central question in neuroscience. Key to this pursuit is understanding how the brain represents visual information across its hierarchical architecture, and a prominent challenge is discerning the neural underpinnings of dynamic natural visual scene processing. Although considerable research has characterized individual components of the visual pathway, a systematic understanding of the distinctive neural coding associated with visual stimuli as they traverse this hierarchy remains elusive. In this study, we leverage the comprehensive Allen Visual Coding-Neuropixels dataset and deep learning neural network models to study neural coding in response to dynamic natural visual scenes across an expansive array of brain regions. Our decoding model adeptly deciphers visual scenes from neural spiking patterns within each distinct brain area. Comparing decoding performances reveals notable encoding proficiency within the visual cortex and subcortical nuclei, in contrast to relatively reduced encoding within hippocampal neurons. Strikingly, our decoding metrics correlate robustly with well-established anatomical and functional hierarchy indexes. These findings corroborate existing knowledge of visual coding based on artificial visual stimuli and illuminate the functional role of deeper brain regions under dynamic stimuli. Our results thus suggest decoding neural network models as a metric for quantifying how well neural responses encode dynamic natural visual scenes, advancing our comprehension of visual coding within the brain's complex hierarchy.
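The decoding logic can be illustrated with a toy population: Poisson "neurons" with stimulus-dependent rates stand in for the recordings, and a nearest-centroid decoder recovers the stimulus identity from spike counts. This is a didactic stand-in, not the paper's deep-learning decoder.

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_stim, n_trials = 50, 4, 30

tuning = rng.uniform(1.0, 10.0, size=(n_stim, n_neurons))  # mean rate per (stimulus, neuron)
labels = np.repeat(np.arange(n_stim), n_trials)
counts = rng.poisson(tuning[labels])                       # spike counts per trial

# Train on the first 20 trials of each stimulus, test on the last 10
train_mask = np.tile(np.arange(n_trials) < 20, n_stim)
centroids = np.stack([counts[train_mask & (labels == s)].mean(axis=0)
                      for s in range(n_stim)])

test_counts = counts[~train_mask]
test_labels = labels[~train_mask]
dists = ((test_counts[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
accuracy = float((dists.argmin(axis=1) == test_labels).mean())
print(f"decoding accuracy: {accuracy:.2f} (chance = {1 / n_stim:.2f})")
```

Comparing such decoding accuracies across recorded regions is what produces the hierarchy-tracking metric described above.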
Affiliation(s)
- Ye Chen
  - School of Computer Science, Peking University, Beijing, China
  - Institute for Artificial Intelligence, Peking University, Beijing, China
- Peter Beech
  - School of Computing, University of Leeds, Leeds, United Kingdom
- Ziwei Yin
  - School of Computer Science, Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Shanshan Jia
  - School of Computer Science, Peking University, Beijing, China
  - Institute for Artificial Intelligence, Peking University, Beijing, China
- Jiayi Zhang
  - Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, MOE Frontiers Center for Brain Science and Institute for Medical and Engineering Innovation, Eye & ENT Hospital, Fudan University, Shanghai, China
- Zhaofei Yu
  - School of Computer Science, Peking University, Beijing, China
  - Institute for Artificial Intelligence, Peking University, Beijing, China
- Jian K. Liu
  - School of Computing, University of Leeds, Leeds, United Kingdom
  - School of Computer Science, Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
7. Quaia C, Krauzlis RJ. Object recognition in primates: what can early visual areas contribute? Front Behav Neurosci 2024; 18:1425496. PMID: 39070778; PMCID: PMC11272660; DOI: 10.3389/fnbeh.2024.1425496.
Abstract
Introduction: If neuroscientists were asked which brain area is responsible for object recognition in primates, most would probably answer infero-temporal (IT) cortex. While IT is likely responsible for fine discriminations, and it is accordingly dominated by foveal visual inputs, there is more to object recognition than fine discrimination. Importantly, foveation of an object of interest usually requires recognizing, with reasonable confidence, its presence in the periphery. Arguably, IT plays a secondary role in such peripheral recognition, and other visual areas might instead be more critical.
Methods: To investigate how signals carried by early visual processing areas (such as LGN and V1) could be used for object recognition in the periphery, we focused here on the task of distinguishing faces from non-faces. We tested how sensitive various models were to nuisance parameters, such as changes in scale and orientation of the image, and the type of image background.
Results: We found that a model of V1 simple or complex cells could provide quite reliable information, resulting in performance better than 80% in realistic scenarios. An LGN model performed considerably worse.
Discussion: Because peripheral recognition is both crucial to enable fine recognition (by bringing an object of interest on the fovea), and probably sufficient to account for a considerable fraction of our daily recognition-guided behavior, we think that the current focus on area IT and foveal processing is too narrow. We propose that rather than a hierarchical system with IT-like properties as its primary aim, object recognition should be seen as a parallel process, with high-accuracy foveal modules operating in parallel with lower-accuracy and faster modules that can operate across the visual field.
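V1 complex-cell models of the kind referred to above are typically built from "energy" units: squared outputs of a quadrature pair of Gabor filters, which respond to oriented structure regardless of its phase. A minimal sketch with illustrative parameters (not the paper's exact model):

```python
import numpy as np

def gabor(size, freq, theta, phase):
    """2-D Gabor filter: Gaussian envelope times an oriented sinusoidal carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * (size / 4) ** 2))
    return envelope * np.cos(2 * np.pi * freq * xr + phase)

def complex_cell_response(patch, freq=0.15, theta=0.0):
    """Energy model: squared responses of a quadrature (0/90 degree phase) pair."""
    even = gabor(patch.shape[0], freq, theta, 0.0)
    odd = gabor(patch.shape[0], freq, theta, np.pi / 2)
    return (patch * even).sum() ** 2 + (patch * odd).sum() ** 2

# Phase invariance: shifting a matched grating barely changes the energy response
size = 21
_, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
r0 = complex_cell_response(np.cos(2 * np.pi * 0.15 * x))
r1 = complex_cell_response(np.cos(2 * np.pi * 0.15 * x + np.pi / 2))
print(r0, r1)
```

Banks of such units at multiple orientations and scales, pooled over the periphery, are the kind of feature the face/non-face classifiers in this study operate on.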
Affiliation(s)
- Christian Quaia
  - Laboratory of Sensorimotor Research, National Eye Institute, NIH, Bethesda, MD, United States
8. Turishcheva P, Fahey PG, Vystrčilová M, Hansel L, Froebe R, Ponder K, Qiu Y, Willeke KF, Bashiri M, Baikulov R, Zhu Y, Ma L, Yu S, Huang T, Li BM, Wulf WD, Kudryashova N, Hennig MH, Rochefort NL, Onken A, Wang E, Ding Z, Tolias AS, Sinz FH, Ecker AS. Retrospective for the Dynamic Sensorium Competition for predicting large-scale mouse primary visual cortex activity from videos. arXiv 2024:arXiv:2407.09100v1. PMID: 39040641; PMCID: PMC11261979.
Abstract
Understanding how biological visual systems process information is challenging because of the nonlinear relationship between visual input and neuronal responses. Artificial neural networks allow computational neuroscientists to create predictive models that connect biological and machine vision. Machine learning has benefited tremendously from benchmarks that compare different models on the same task under standardized conditions. However, there was no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we established the SENSORIUM 2023 Benchmark Competition with dynamic input, featuring a new large-scale dataset from the primary visual cortex of ten mice. This dataset includes responses from 78,853 neurons to 2 hours of dynamic stimuli per neuron, together with behavioral measurements such as running speed, pupil dilation, and eye movements. The competition ranked models in two tracks based on predictive performance for neuronal responses on a held-out test set: one focusing on predicting in-domain natural stimuli and another on out-of-distribution (OOD) stimuli to assess model generalization. As part of the NeurIPS 2023 competition track, we received more than 160 model submissions from 22 teams. Several new architectures for predictive models were proposed, and the winning teams improved on the previous state-of-the-art model by 50%. Access to the dataset as well as the benchmarking infrastructure will remain online at www.sensorium-competition.net.
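Predictive performance in benchmarks of this kind is commonly scored as the correlation between predicted and observed responses, computed per neuron and averaged (the competition's exact metric may differ in detail). A sketch with synthetic numbers:

```python
import numpy as np

def avg_per_neuron_corr(pred, obs):
    """pred, obs: (n_timepoints, n_neurons); mean Pearson r across neurons."""
    pc = pred - pred.mean(axis=0)
    oc = obs - obs.mean(axis=0)
    num = (pc * oc).sum(axis=0)
    den = np.sqrt((pc**2).sum(axis=0) * (oc**2).sum(axis=0))
    return float((num / den).mean())

rng = np.random.default_rng(3)
obs = rng.normal(size=(200, 5))                 # "observed" responses
good = obs + 0.5 * rng.normal(size=obs.shape)   # informative model prediction
bad = rng.normal(size=obs.shape)                # uninformative model prediction

print(avg_per_neuron_corr(good, obs), avg_per_neuron_corr(bad, obs))
```

Ranking submissions on a held-out test set, with a separate out-of-distribution split, follows directly from applying such a score to the two tracks described above.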
Affiliation(s)
- Polina Turishcheva
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Paul G. Fahey
  - Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
  - Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
  - Stanford Bio-X, Stanford University, Stanford, CA, US
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Michaela Vystrčilová
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Laura Hansel
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Rachel Froebe
  - Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
  - Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
  - Stanford Bio-X, Stanford University, Stanford, CA, US
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Kayla Ponder
  - Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- Yongrong Qiu
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
  - Stanford Bio-X, Stanford University, Stanford, CA, US
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Konstantin F. Willeke
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - International Max Planck Research School for Intelligent Systems, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, Tübingen University, Germany
- Mohammad Bashiri
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - International Max Planck Research School for Intelligent Systems, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, Tübingen University, Germany
- Yu Zhu
  - Institute of Automation, Chinese Academy of Sciences, China
  - Beijing Academy of Artificial Intelligence, China
- Lei Ma
  - Beijing Academy of Artificial Intelligence, China
- Shan Yu
  - Institute of Automation, Chinese Academy of Sciences, China
- Tiejun Huang
  - Beijing Academy of Artificial Intelligence, China
- Bryan M. Li
  - The Alan Turing Institute, UK
  - School of Informatics, University of Edinburgh, UK
- Wolf De Wulf
  - School of Informatics, University of Edinburgh, UK
- Nathalie L. Rochefort
  - Centre for Discovery Brain Sciences, University of Edinburgh, UK
  - Simons Initiative for the Developing Brain, University of Edinburgh, UK
- Arno Onken
  - School of Informatics, University of Edinburgh, UK
- Eric Wang
  - Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- Zhiwei Ding
  - Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- Andreas S. Tolias
  - Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
  - Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
  - Stanford Bio-X, Stanford University, Stanford, CA, US
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
  - Department of Electrical Engineering, Stanford University, Stanford, CA, US
- Fabian H. Sinz
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
  - International Max Planck Research School for Intelligent Systems, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, Tübingen University, Germany
- Alexander S Ecker
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
9. Turishcheva P, Fahey PG, Vystrčilová M, Hansel L, Froebe R, Ponder K, Qiu Y, Willeke KF, Bashiri M, Wang E, Ding Z, Tolias AS, Sinz FH, Ecker AS. The Dynamic Sensorium competition for predicting large-scale mouse visual cortex activity from videos. arXiv 2024:arXiv:2305.19654v2. PMID: 37396602; PMCID: PMC10312815.
Abstract
Understanding how biological visual systems process information is challenging due to the complex nonlinear relationship between neuronal responses and high-dimensional visual input. Artificial neural networks have already improved our understanding of this system by allowing computational neuroscientists to create predictive models and bridge biological and machine vision. During the Sensorium 2022 competition, we introduced benchmarks for vision models with static input (i.e. images). However, animals operate and excel in dynamic environments, making it crucial to study and understand how the brain functions under these conditions. Moreover, many biological theories, such as predictive coding, suggest that previous input is crucial for current input processing. Currently, there is no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we propose the Sensorium 2023 Benchmark Competition with dynamic input (https://www.sensorium-competition.net/). This competition includes the collection of a new large-scale dataset from the primary visual cortex of ten mice, containing responses from over 78,000 neurons to over 2 hours of dynamic stimuli per neuron. Participants in the main benchmark track will compete to identify the best predictive models of neuronal responses for dynamic input (i.e. video). We will also host a bonus track in which submission performance will be evaluated on out-of-domain input, using withheld neuronal responses to dynamic input stimuli whose statistics differ from the training set. Both tracks will offer behavioral data along with video stimuli. As before, we will provide code, tutorials, and strong pre-trained baseline models to encourage participation. 
We hope this competition will continue to strengthen the accompanying Sensorium benchmarks collection as a standard tool to measure progress in large-scale neural system identification models of the entire mouse visual hierarchy and beyond.
Affiliation(s)
- Polina Turishcheva
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Paul G Fahey
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
  - Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
  - Stanford Bio-X, Stanford University, Stanford, CA, US
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Michaela Vystrčilová
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Laura Hansel
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Rachel Froebe
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
  - Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
  - Stanford Bio-X, Stanford University, Stanford, CA, US
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Kayla Ponder
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Yongrong Qiu
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
  - Stanford Bio-X, Stanford University, Stanford, CA, US
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Konstantin F Willeke
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
  - Stanford Bio-X, Stanford University, Stanford, CA, US
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
  - International Max Planck Research School for Intelligent Systems, University of Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany
- Mohammad Bashiri
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - International Max Planck Research School for Intelligent Systems, University of Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany
- Eric Wang
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Zhiwei Ding
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Andreas S Tolias
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
  - Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
  - Stanford Bio-X, Stanford University, Stanford, CA, US
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
  - Department of Electrical Engineering, Stanford University, Stanford, CA, US
- Fabian H Sinz
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
  - International Max Planck Research School for Intelligent Systems, University of Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany
- Alexander S Ecker
  - Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
  - Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
Collapse
|
10
|
Lindsey JW, Issa EB. Factorized visual representations in the primate visual system and deep neural networks. eLife 2024; 13:RP91685. [PMID: 38968311 PMCID: PMC11226229 DOI: 10.7554/elife.91685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/07/2024] Open
Abstract
Object classification has been proposed as a principal objective of the primate ventral visual stream and has been used as an optimization target for deep neural network models (DNNs) of the visual system. However, visual brain areas represent many different types of information, and optimizing for classification of object identity alone does not constrain how other information may be encoded in visual representations. Information about different scene parameters may be discarded altogether ('invariance'), represented in non-interfering subspaces of population activity ('factorization') or encoded in an entangled fashion. In this work, we provide evidence that factorization is a normative principle of biological visual representations. In the monkey ventral visual hierarchy, we found that factorization of object pose and background information from object identity increased in higher-level regions and strongly contributed to improving object identity decoding performance. We then conducted a large-scale analysis of factorization of individual scene parameters - lighting, background, camera viewpoint, and object pose - in a diverse library of DNN models of the visual system. Models which best matched neural, fMRI, and behavioral data from both monkeys and humans across 12 datasets tended to be those which factorized scene parameters most strongly. Notably, invariance to these parameters was not as consistently associated with matches to neural and behavioral data, suggesting that maintaining non-class information in factorized activity subspaces is often preferred to dropping it altogether. Thus, we propose that factorization of visual scene information is a widely used strategy in brains and DNN models thereof.
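The factorization idea above can be made concrete with a toy metric: estimate the subspace of population activity driven by one scene parameter and ask how much it overlaps the subspace driven by another. The sketch below is a minimal numpy illustration of that logic, not the authors' exact metric or code; the function names are invented for this example.

```python
import numpy as np

def top_pcs(X, k):
    # principal axes of the variation in X (conditions x units)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]

def factorization_index(resp_a, resp_b, k=1):
    """Toy factorization score: 1 when the subspace carrying variation
    in scene parameter A is orthogonal to the parameter-B subspace,
    0 when the two subspaces coincide."""
    Va, Vb = top_pcs(resp_a, k), top_pcs(resp_b, k)
    overlap = np.linalg.norm(Va @ Vb.T) ** 2 / k  # mean squared cosine
    return 1.0 - overlap
```

On this score, fully factorized parameters (separate activity subspaces) give 1, while entangled parameters (shared subspace) give 0; invariance would instead show up as no variance along the nuisance parameter at all.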
Collapse
Affiliation(s)
- Jack W Lindsey
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
| | - Elias B Issa
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
| |
Collapse
|
11
|
Zhu H, Ge Y, Bratch A, Yuille A, Kay K, Kersten D. Natural scenes reveal diverse representations of 2D and 3D body pose in the human brain. Proc Natl Acad Sci U S A 2024; 121:e2317707121. [PMID: 38830105 PMCID: PMC11181088 DOI: 10.1073/pnas.2317707121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Accepted: 04/25/2024] [Indexed: 06/05/2024] Open
Abstract
Human pose, defined as the spatial relationships between body parts, carries instrumental information supporting the understanding of motion and action of a person. A substantial body of previous work has identified cortical areas responsive to images of bodies and different body parts. However, the neural basis underlying the visual perception of body part relationships has received less attention. To broaden our understanding of body perception, we analyzed high-resolution fMRI responses to a wide range of poses from over 4,000 complex natural scenes. Using ground-truth annotations and an application of three-dimensional (3D) pose reconstruction algorithms, we compared similarity patterns of cortical activity with similarity patterns built from human pose models with different levels of depth availability and viewpoint dependency. Targeting the challenge of explaining variance in complex natural image responses with interpretable models, we achieved statistically significant correlations between pose models and cortical activity patterns (though performance levels are substantially lower than the noise ceiling). We found that the 3D view-independent pose model, compared with two-dimensional models, better captures the activation from distinct cortical areas, including the right posterior superior temporal sulcus (pSTS). These areas, together with other pose-selective regions in the LOTC, form a broader, distributed cortical network with greater view-tolerance in more anterior patches. We interpret these findings in light of the computational complexity of natural body images, the wide range of visual tasks supported by pose structures, and possible shared principles for view-invariant processing between articulated objects and ordinary, rigid objects.
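The comparison between pose models and cortical activity patterns described above is an instance of representational similarity analysis: build a dissimilarity matrix over conditions for each system, then correlate the matrices. A minimal numpy sketch of that comparison follows (plain Pearson correlation for simplicity; the function names are illustrative, not from the paper):

```python
import numpy as np

def rdm(patterns):
    # representational dissimilarity matrix over condition patterns
    # (conditions x features): 1 - Pearson correlation
    return 1.0 - np.corrcoef(patterns)

def rsa_score(patterns_brain, patterns_model):
    """Correlate the upper triangles of two RDMs (Pearson here; rank
    correlation is the more common choice in practice)."""
    iu = np.triu_indices(patterns_brain.shape[0], k=1)
    return np.corrcoef(rdm(patterns_brain)[iu], rdm(patterns_model)[iu])[0, 1]
```

In the study's terms, `patterns_model` would come from a 2D or 3D pose model and `patterns_brain` from voxel responses in a candidate region such as pSTS.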
Collapse
Affiliation(s)
- Hongru Zhu
- Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218
| | - Yijun Ge
- Department of Psychology, University of Minnesota, Minneapolis, MN 55455
- Laboratory for Consciousness, RIKEN Center for Brain Science, Wako, Saitama 351-0198, Japan
| | - Alexander Bratch
- Department of Psychology, University of Minnesota, Minneapolis, MN 55455
| | - Alan Yuille
- Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218
| | - Kendrick Kay
- Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN 55455
| | - Daniel Kersten
- Department of Psychology, University of Minnesota, Minneapolis, MN 55455
| |
Collapse
|
12
|
Srinath R, Ni AM, Marucci C, Cohen MR, Brainard DH. Orthogonal neural representations support perceptual judgements of natural stimuli. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.14.580134. [PMID: 38464018 PMCID: PMC10925131 DOI: 10.1101/2024.02.14.580134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/12/2024]
Abstract
In natural behavior, observers must separate relevant information from a barrage of irrelevant information. Many studies have investigated the neural underpinnings of this ability using artificial stimuli presented on simple backgrounds. Natural viewing, however, carries a set of challenges that are inaccessible using artificial stimuli, including neural responses to background objects that are task-irrelevant. An emerging body of evidence suggests that the visual abilities of humans and animals can be modeled through the linear decoding of task-relevant information from visual cortex. This idea suggests the hypothesis that irrelevant features of a natural scene should impair performance on a visual task only if their neural representations intrude on the linear readout of the task-relevant feature, as would occur if the representations of task-relevant and irrelevant features are not orthogonal in the underlying neural population. We tested this hypothesis using human psychophysics and monkey neurophysiology, in response to parametrically variable naturalistic stimuli. We demonstrate that 1) the neural representation of one feature (the position of a central object) in visual area V4 is orthogonal to those of several background features, 2) the ability of human observers to precisely judge object position was largely unaffected by task-irrelevant variation in those background features, and 3) many features of the object and the background are orthogonally represented by V4 neural responses. Our observations are consistent with the hypothesis that orthogonal neural representations can support stable perception of objects and features despite the tremendous richness of natural visual scenes.
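The orthogonality test at the heart of this abstract can be sketched directly: fit a linear readout for the task-relevant feature and another for a background feature, and compare the angle between the two readout axes. The numpy example below is a simplified illustration under invented names, not the paper's analysis pipeline:

```python
import numpy as np

def readout_axis(X, y):
    # least-squares linear readout of feature y from population X
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return w / np.linalg.norm(w)

def readout_alignment(X, y_task, y_nuisance):
    """|cosine| between the task-relevant and task-irrelevant readout
    axes; values near 0 mean the nuisance feature cannot intrude on
    the task readout (orthogonality in the abstract's sense)."""
    return abs(readout_axis(X, y_task) @ readout_axis(X, y_nuisance))
```

When distinct units carry the two features, the alignment is near zero and variation in the nuisance feature leaves the task readout untouched; when a shared unit carries both, the readouts collide.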
Collapse
Affiliation(s)
- Ramanujan Srinath
- equal contribution
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Amy M. Ni
- equal contribution
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Claire Marucci
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Marlene R. Cohen
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- equal contribution
| | - David H. Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- equal contribution
| |
Collapse
|
13
|
Ahn S, Adeli H, Zelinsky GJ. The attentive reconstruction of objects facilitates robust object recognition. PLoS Comput Biol 2024; 20:e1012159. [PMID: 38870125 PMCID: PMC11175536 DOI: 10.1371/journal.pcbi.1012159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Accepted: 05/11/2024] [Indexed: 06/15/2024] Open
Abstract
Humans are extremely robust in our ability to perceive and recognize objects-we see faces in tea stains and can recognize friends on dark streets. Yet, neurocomputational models of primate object recognition have focused on the initial feed-forward pass of processing through the ventral stream and less on the top-down feedback that likely underlies robust object perception and recognition. Aligned with the generative approach, we propose that the visual system actively facilitates recognition by reconstructing the object hypothesized to be in the image. Top-down attention then uses this reconstruction as a template to bias feedforward processing to align with the most plausible object hypothesis. Building on auto-encoder neural networks, our model makes detailed hypotheses about the appearance and location of the candidate objects in the image by reconstructing a complete object representation from potentially incomplete visual input due to noise and occlusion. The model then leverages the best object reconstruction, measured by reconstruction error, to direct the bottom-up process of selectively routing low-level features, a top-down biasing that captures a core function of attention. We evaluated our model using the MNIST-C (handwritten digits under corruptions) and ImageNet-C (real-world objects under corruptions) datasets. Not only did our model achieve superior performance on these challenging tasks designed to approximate real-world noise and occlusion viewing conditions, but also better accounted for human behavioral reaction times and error patterns than a standard feedforward Convolutional Neural Network. Our model suggests that a complete understanding of object perception and recognition requires integrating top-down and attention feedback, which we propose is an object reconstruction.
Collapse
Affiliation(s)
- Seoyoung Ahn
- Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
| | - Hossein Adeli
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York City, New York, United States of America
| | - Gregory J. Zelinsky
- Department of Psychology, Stony Brook University, Stony Brook, New York, United States of America
- Department of Computer Science, Stony Brook University, Stony Brook, New York, United States of America
| |
Collapse
|
14
|
Mukherjee K, Rogers TT. Using drawings and deep neural networks to characterize the building blocks of human visual similarity. Mem Cognit 2024:10.3758/s13421-024-01580-1. [PMID: 38814385 DOI: 10.3758/s13421-024-01580-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/22/2024] [Indexed: 05/31/2024]
Abstract
Early in life and without special training, human beings discern resemblance between abstract visual stimuli, such as drawings, and the real-world objects they represent. We used this capacity for visual abstraction as a tool for evaluating deep neural networks (DNNs) as models of human visual perception. Contrasting five contemporary DNNs, we evaluated how well each explains human similarity judgments among line drawings of recognizable and novel objects. For object sketches, human judgments were dominated by semantic category information; DNN representations contributed little additional information. In contrast, such features explained significant unique variance in the perceived similarity of abstract drawings. In both cases, a vision transformer trained to blend representations of images and their natural language descriptions showed the greatest ability to explain human perceptual similarity-an observation consistent with contemporary views of semantic representation and processing in the human mind and brain. Together, the results suggest that the building blocks of visual similarity may arise within systems that learn to use visual information, not for specific classification, but in service of generating semantic representations of objects.
Collapse
Affiliation(s)
- Kushin Mukherjee
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA.
| | - Timothy T Rogers
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
| |
Collapse
|
15
|
Serrano RA, Smeltz AM. The Promise of Artificial Intelligence-Assisted Point-of-Care Ultrasonography in Perioperative Care. J Cardiothorac Vasc Anesth 2024; 38:1244-1250. [PMID: 38402063 DOI: 10.1053/j.jvca.2024.01.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Accepted: 01/29/2024] [Indexed: 02/26/2024]
Abstract
The role of point-of-care ultrasonography in the perioperative setting has expanded rapidly over recent years. Revolutionizing this technology further is integrating artificial intelligence to assist clinicians in optimizing images, identifying anomalies, performing automated measurements and calculations, and facilitating diagnoses. Artificial intelligence can increase point-of-care ultrasonography efficiency and accuracy, making it an even more valuable point-of-care tool. Given this topic's importance and ever-changing landscape, this review discusses the latest trends to serve as an introduction and update in this area.
Collapse
Affiliation(s)
| | - Alan M Smeltz
- University of North Carolina School of Medicine, Chapel Hill, NC
| |
Collapse
|
16
|
Dado T, Papale P, Lozano A, Le L, Wang F, van Gerven M, Roelfsema P, Güçlütürk Y, Güçlü U. Brain2GAN: Feature-disentangled neural encoding and decoding of visual perception in the primate brain. PLoS Comput Biol 2024; 20:e1012058. [PMID: 38709818 PMCID: PMC11098503 DOI: 10.1371/journal.pcbi.1012058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2023] [Revised: 05/16/2024] [Accepted: 04/08/2024] [Indexed: 05/08/2024] Open
Abstract
A challenging goal of neural coding is to characterize the neural representations underlying visual perception. To this end, multi-unit activity (MUA) of macaque visual cortex was recorded in a passive fixation task upon presentation of faces and natural images. We analyzed the relationship between MUA and latent representations of state-of-the-art deep generative models, including the conventional and feature-disentangled representations of generative adversarial networks (GANs) (i.e., z- and w-latents of StyleGAN, respectively) and language-contrastive representations of latent diffusion networks (i.e., CLIP-latents of Stable Diffusion). A mass univariate neural encoding analysis of the latent representations showed that feature-disentangled w representations outperform both z and CLIP representations in explaining neural responses. Further, w-latent features were found to be positioned at the higher end of the complexity gradient which indicates that they capture visual information relevant to high-level neural activity. Subsequently, a multivariate neural decoding analysis of the feature-disentangled representations resulted in state-of-the-art spatiotemporal reconstructions of visual perception. Taken together, our results not only highlight the important role of feature-disentanglement in shaping high-level neural representations underlying visual perception but also serve as an important benchmark for the future of neural coding.
Collapse
Affiliation(s)
- Thirza Dado
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Paolo Papale
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
| | - Antonio Lozano
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
| | - Lynn Le
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Feng Wang
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
| | - Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Pieter Roelfsema
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Laboratory of Visual Brain Therapy, Sorbonne University, Paris, France
- Department of Integrative Neurophysiology, VU Amsterdam, Amsterdam, Netherlands
- Department of Psychiatry, Amsterdam UMC, Amsterdam, Netherlands
| | - Yağmur Güçlütürk
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Umut Güçlü
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| |
Collapse
|
17
|
Ren Y, Bashivan P. How well do models of visual cortex generalize to out of distribution samples? PLoS Comput Biol 2024; 20:e1011145. [PMID: 38820563 PMCID: PMC11216589 DOI: 10.1371/journal.pcbi.1011145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Revised: 07/01/2024] [Accepted: 04/29/2024] [Indexed: 06/02/2024] Open
Abstract
Unit activity in particular deep neural networks (DNNs) is remarkably similar to the neuronal population responses to static images along the primate ventral visual cortex. Linear combinations of DNN unit activities are widely used to build predictive models of neuronal activity in the visual cortex. Nevertheless, prediction performance in these models is often investigated on stimulus sets consisting of everyday objects under naturalistic settings. Recent work has revealed a generalization gap in predicting neuronal responses to synthetically generated out-of-distribution (OOD) stimuli. Here, we investigated how recent progress in improving DNNs' object recognition generalization, as well as various DNN design choices such as architecture, learning algorithm, and dataset, has impacted the generalization gap in neural predictivity. We came to the surprising conclusion that performance on none of the common computer vision OOD object recognition benchmarks is predictive of OOD neural predictivity performance. Furthermore, we found that adversarially robust models often yield substantially higher generalization in neural predictivity, although the degree of robustness itself was not predictive of the neural predictivity score. These results suggest that improving object recognition behavior on current benchmarks alone may not lead to more general models of neurons in the primate ventral visual cortex.
Collapse
Affiliation(s)
- Yifei Ren
- Department of Computer Science, McGill University, Montreal, Canada
| | - Pouya Bashivan
- Department of Computer Science, McGill University, Montreal, Canada
- Department of Physiology, McGill University, Montreal, Canada
- Mila, Université de Montréal, Montreal, Canada
| |
Collapse
|
18
|
Zhang Q, Zhang Y, Liu N, Sun X. Understanding of facial features in face perception: insights from deep convolutional neural networks. Front Comput Neurosci 2024; 18:1209082. [PMID: 38655070 PMCID: PMC11035738 DOI: 10.3389/fncom.2024.1209082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 03/18/2024] [Indexed: 04/26/2024] Open
Abstract
Introduction: Face recognition has been a longstanding subject of interest in the fields of cognitive neuroscience and computer vision research. One key focus has been to understand the relative importance of different facial features in identifying individuals. Previous studies in humans have demonstrated the crucial role of eyebrows in face recognition, potentially even surpassing the importance of the eyes. However, eyebrows are not only vital for face recognition but also play a significant role in recognizing facial expressions and intentions, which might occur simultaneously and influence the face recognition process.
Methods: To address these challenges, our current study aimed to leverage the power of deep convolutional neural networks (DCNNs), an artificial face recognition system, which can be specifically tailored for face recognition tasks. In this study, we investigated the relative importance of various facial features in face recognition by selectively blocking feature information from the input to the DCNN. Additionally, we conducted experiments in which we systematically blurred the information related to eyebrows to varying degrees.
Results: Our findings aligned with previous human research, revealing that eyebrows are the most critical feature for face recognition, followed by eyes, mouth, and nose, in that order. The results demonstrated that the presence of eyebrows was more crucial than their specific high-frequency details, such as edges and textures, compared to other facial features, where the details also played a significant role. Furthermore, our results revealed that, unlike other facial features, the activation map indicated that the significance of eyebrow areas could not be readily adjusted to compensate for the absence of eyebrow information. This finding explains why masking eyebrows led to more significant deficits in face recognition performance. Additionally, we observed a synergistic relationship among facial features, providing evidence for holistic processing of faces within the DCNN.
Discussion: Overall, our study sheds light on the underlying mechanisms of face recognition and underscores the potential of using DCNNs as valuable tools for further exploration in this field.
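The feature-blocking manipulation described in the Methods reduces, in code, to occluding a rectangular image region before the image reaches the network. The sketch below is a minimal numpy illustration of that preprocessing step (the function name and mean-fill default are this example's assumptions, not the paper's implementation):

```python
import numpy as np

def block_region(image, top, left, height, width, fill=None):
    """Occlude one rectangular facial region (e.g. the eyebrows) before
    the image reaches the network, so that feature cannot inform
    recognition; mean fill is one common neutral choice."""
    out = image.copy()
    out[top:top + height, left:left + width] = (
        image.mean() if fill is None else fill)
    return out
```

Running the same recognition model on the original and blocked images, feature by feature, yields the importance ranking (eyebrows, eyes, mouth, nose) the Results report.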
Collapse
Affiliation(s)
- Qianqian Zhang
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
| | - Yueyi Zhang
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
| | - Ning Liu
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Xiaoyan Sun
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
| |
Collapse
|
19
|
Heinen R, Bierbrauer A, Wolf OT, Axmacher N. Representational formats of human memory traces. Brain Struct Funct 2024; 229:513-529. [PMID: 37022435 PMCID: PMC10978732 DOI: 10.1007/s00429-023-02636-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 03/28/2023] [Indexed: 04/07/2023]
Abstract
Neural representations are internal brain states that constitute the brain's model of the external world or some of its features. In the presence of sensory input, a representation may reflect various properties of this input. When perceptual information is no longer available, the brain can still activate representations of previously experienced episodes due to the formation of memory traces. In this review, we aim at characterizing the nature of neural memory representations and how they can be assessed with cognitive neuroscience methods, mainly focusing on neuroimaging. We discuss how multivariate analysis techniques such as representational similarity analysis (RSA) and deep neural networks (DNNs) can be leveraged to gain insights into the structure of neural representations and their different representational formats. We provide several examples of recent studies which demonstrate that we are able to not only measure memory representations using RSA but are also able to investigate their multiple formats using DNNs. We demonstrate that in addition to slow generalization during consolidation, memory representations are subject to semantization already during short-term memory, by revealing a shift from visual to semantic format. In addition to perceptual and conceptual formats, we describe the impact of affective evaluations as an additional dimension of episodic memories. Overall, these studies illustrate how the analysis of neural representations may help us gain a deeper understanding of the nature of human memory.
Collapse
Affiliation(s)
- Rebekka Heinen
- Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, Universitätsstraße 150, 44801, Bochum, Germany.
| | - Anne Bierbrauer
- Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, Universitätsstraße 150, 44801, Bochum, Germany
- Institute for Systems Neuroscience, Medical Center Hamburg-Eppendorf, Martinistraße 52, 20251, Hamburg, Germany
| | - Oliver T Wolf
- Department of Cognitive Psychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, Universitätsstraße 150, 44801, Bochum, Germany
| | - Nikolai Axmacher
- Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, Universitätsstraße 150, 44801, Bochum, Germany
| |
Collapse
|
20
|
Deng K, Schwendeman PS, Guan Y. Predicting Single Neuron Responses of the Primary Visual Cortex with Deep Learning Model. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2305626. [PMID: 38350735 PMCID: PMC11022733 DOI: 10.1002/advs.202305626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Revised: 01/03/2024] [Indexed: 02/15/2024]
Abstract
Modeling neuron responses to stimuli can shed light on next-generation technologies such as brain-chip interfaces. Furthermore, high-performing models can serve to help formulate hypotheses and reveal the mechanisms underlying neural responses. Here, the state-of-the-art computational model is presented for predicting single-neuron responses to natural stimuli in the primary visual cortex (V1) of mice. The algorithm incorporates object positions and assembles multiple models with different train-validation data, resulting in a 15%-30% improvement over the existing models in cross-subject predictions and ranking first in the SENSORIUM 2022 Challenge, which benchmarks methods for neuron-specific prediction based on thousands of images. Importantly, the model reveals evidence that the spatial organizations of V1 are conserved across mice. This model will serve as an important noninvasive tool for understanding and utilizing the response patterns of primary visual cortex neurons.
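The model-assembling step mentioned above, combining models fit on different train-validation splits, is at its core an averaging of predictions. The sketch below illustrates that step only, for any list of prediction callables; it is a generic ensembling pattern, not the authors' architecture:

```python
import numpy as np

def ensemble_predict(models, stimuli):
    """Average the per-neuron predictions of models fit on different
    train-validation splits; averaging tends to cancel split-specific
    fitting noise and improve cross-subject generalization."""
    return np.mean([m(stimuli) for m in models], axis=0)
```

In practice each entry of `models` would be a trained network's forward pass; here simple callables stand in for them.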
Collapse
Affiliation(s)
- Kaiwen Deng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA
| | | | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA
| |
Collapse
|
21
|
Pan X, Coen-Cagli R, Schwartz O. Probing the Structure and Functional Properties of the Dropout-Induced Correlated Variability in Convolutional Neural Networks. Neural Comput 2024; 36:621-644. [PMID: 38457752 PMCID: PMC11164410 DOI: 10.1162/neco_a_01652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 12/04/2023] [Indexed: 03/10/2024]
Abstract
Computational neuroscience studies have shown that the structure of neural variability to an unchanged stimulus affects the amount of information encoded. Some artificial deep neural networks, such as those with Monte Carlo dropout layers, also have variable responses when the input is fixed. However, the structure of the trial-by-trial neural covariance in neural networks with dropout has not been studied, and its role in decoding accuracy is unknown. We studied the above questions in a convolutional neural network model with dropout in both the training and testing phases. We found that trial-by-trial correlation between neurons (i.e., noise correlation) is positive and low dimensional. Neurons that are close in a feature map have larger noise correlation. These properties are surprisingly similar to the findings in the visual cortex. We further analyzed the alignment of the main axes of the covariance matrix. We found that different images share a common trial-by-trial noise covariance subspace, and they are aligned with the global signal covariance. This evidence that the noise covariance is aligned with signal covariance suggests that noise covariance in dropout neural networks reduces network accuracy, which we further verified directly with a trial-shuffling procedure commonly used in neuroscience. These findings highlight a previously overlooked aspect of dropout layers that can affect network performance. Such dropout networks could also potentially be a computational model of neural variability.
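The basic measurement in this study, trial-by-trial noise correlations induced by test-time dropout, can be reproduced in miniature with a linear layer whose input is hit by a fresh dropout mask on every trial. The numpy sketch below is a toy stand-in for Monte Carlo dropout in a real network; all names are invented for this example:

```python
import numpy as np

def dropout_trials(x, W, p=0.5, trials=500, seed=0):
    """Repeated responses of a linear layer with inference-time dropout
    on its input, the stimulus x held fixed across trials."""
    rng = np.random.default_rng(seed)
    masks = (rng.random((trials, x.size)) >= p) / (1.0 - p)  # inverted dropout
    return (masks * x) @ W.T  # trials x units

def noise_correlations(R):
    # correlation between units across repeated trials of one stimulus
    return np.corrcoef(R.T)
```

Units that read overlapping inputs with same-sign weights inherit positive correlations from the shared mask, a toy analogue of the positive, low-dimensional noise correlations the paper reports.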
Collapse
Affiliation(s)
- Xu Pan
- Department of Computer Science, University of Miami, Coral Gables, FL 33146, U.S.A.
| | - Ruben Coen-Cagli
- Department of Systems and Computational Biology, Dominick Purpura Department of Neuroscience, and Department of Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY 10461, U.S.A.
| | - Odelia Schwartz
- Department of Computer Science, University of Miami, Coral Gables, FL 33146, U.S.A.
| |
Collapse
|
22
|
Lippl S, Peters B, Kriegeskorte N. Can neural networks benefit from objectives that encourage iterative convergent computations? A case study of ResNets and object classification. PLoS One 2024; 19:e0293440. [PMID: 38512838 PMCID: PMC10956829 DOI: 10.1371/journal.pone.0293440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Accepted: 03/05/2024] [Indexed: 03/23/2024] Open
Abstract
Recent work has suggested that feedforward residual neural networks (ResNets) approximate iterative recurrent computations. Iterative computations are useful in many domains, so they might provide good solutions for neural networks to learn. However, principled methods for measuring and manipulating iterative convergence in neural networks remain lacking. Here we address this gap by 1) quantifying the degree to which ResNets learn iterative solutions and 2) introducing a regularization approach that encourages the learning of iterative solutions. Iterative methods are characterized by two properties: iteration and convergence. To quantify these properties, we define three indices of iterative convergence. Consistent with previous work, we show that, even though ResNets can express iterative solutions, they do not learn them when trained conventionally on computer-vision tasks. We then introduce regularizations to encourage iterative convergent computation and test whether this provides a useful inductive bias. To make the networks more iterative, we manipulate the degree of weight sharing across layers using soft gradient coupling. This new method provides a form of recurrence regularization and can interpolate smoothly between an ordinary ResNet and a "recurrent" ResNet (i.e., one that uses identical weights across layers and thus could be physically implemented with a recurrent network computing the successive stages iteratively across time). To make the networks more convergent we impose a Lipschitz constraint on the residual functions using spectral normalization. The three indices of iterative convergence reveal that the gradient coupling and the Lipschitz constraint succeed at making the networks iterative and convergent, respectively. 
To showcase the practicality of our approach, we study how iterative convergence impacts generalization on standard visual recognition tasks (MNIST, CIFAR-10, CIFAR-100) and on challenging recognition tasks with partial occlusions (Digitclutter). We find that iterative convergent computation does not provide a useful inductive bias for ResNets in these tasks. Importantly, our approach may be useful for investigating other network architectures and tasks as well, and we hope that our study provides a useful starting point for investigating the broader question of whether iterative convergence can help neural networks generalize.
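The two ingredients, weight sharing across iterations and a spectral (Lipschitz) constraint, can be illustrated with a weight-shared "recurrent" update whose transition matrix is spectrally normalized. This is a minimal numpy sketch under simplifying assumptions (a tanh map with input injection rather than the paper's ResNet blocks; the convergence index here is simply the norm of successive updates):

```python
import numpy as np

rng = np.random.default_rng(1)

def spectral_normalize(W, target=0.9, n_iter=50):
    """Scale W so its largest singular value is at most `target`
    (power iteration, as in standard spectral normalization)."""
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v                     # estimated spectral norm
    return W * (target / max(sigma, target))

d = 64
W = spectral_normalize(rng.standard_normal((d, d)), target=0.9)
U = rng.standard_normal((d, d)) * 0.1
x = rng.standard_normal(d)                # fixed input injection

# Weight-shared ("recurrent") iteration z <- tanh(Wz + Ux). With ||W||_2 < 1
# and tanh 1-Lipschitz, the map is a contraction in z, so successive updates
# shrink geometrically: a simple convergence index.
z = np.zeros(d)
steps = []
for _ in range(30):
    z_next = np.tanh(W @ z + U @ x)
    steps.append(np.linalg.norm(z_next - z))
    z = z_next

print(steps[0] > steps[10] > steps[-1])   # updates keep shrinking
```

Relaxing the hard weight sharing into a soft penalty between layer weights would give something in the spirit of the paper's soft gradient coupling, which interpolates between an ordinary and a fully recurrent ResNet.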
Affiliation(s)
- Samuel Lippl
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States of America
- Department of Neuroscience, Columbia University, New York, NY, United States of America
- Center for Theoretical Neuroscience, Columbia University, New York, NY, United States of America
- Benjamin Peters
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States of America
- School of Psychology and Neuroscience, University of Glasgow, Glasgow, United Kingdom
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States of America
- Department of Neuroscience, Columbia University, New York, NY, United States of America
- Department of Psychology, Columbia University, New York, NY, United States of America
- Affiliated member, Electrical Engineering, Columbia University, New York, NY, United States of America

23
Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. bioRxiv 2024:2023.04.16.537079. [PMID: 37163104] [PMCID: PMC10168209] [DOI: 10.1101/2023.04.16.537079]
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that (1) in all layers of the CNN models, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images, and (2) lesioning these neurons by setting their output to 0, or enhancing them by increasing their gain, led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have the intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
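The lesion-versus-enhancement logic translates directly into code. Below is a toy numpy sketch, not the authors' CNN pipeline: the "emotion-selective units" and the prototype readout are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sketch of the lesioning logic: a linear readout classifies three
# "affect" classes from unit activity, and each class has a small population
# of hypothetical selective units (units 0-9 prefer class 0, etc.).
n_units, n_classes, n_stim = 50, 3, 300
labels = rng.integers(0, n_classes, n_stim)
activity = rng.random((n_stim, n_units))
for c in range(n_classes):
    activity[:, 10 * c:10 * (c + 1)] += (labels == c)[:, None]

# Prototype readout: one mean activity pattern per class, centered per unit.
W = np.stack([activity[labels == c].mean(0) for c in range(n_classes)], axis=1)
W -= W.mean(axis=1, keepdims=True)

def accuracy(acts):
    return np.mean(np.argmax(acts @ W, axis=1) == labels)

base = accuracy(activity)

lesioned = activity.copy()
lesioned[:, 0:10] = 0.0                   # "lesion": zero class-0 units
enhanced = activity.copy()
enhanced[:, 0:10] *= 2.0                  # "enhancement": raise their gain

print(accuracy(lesioned) < base)          # selective lesion hurts
```

In this toy, the intact readout is near ceiling, so the gain increase mainly shows that enhancement does no harm; the paper's enhancement effect was measured against a sub-ceiling baseline.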
Affiliation(s)
- Peng Liu
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Ke Bo
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Mingzhou Ding
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Ruogu Fang
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, FL, USA

24
Loke J, Seijdel N, Snoek L, Sörensen LKA, van de Klundert R, van der Meer M, Quispel E, Cappaert N, Scholte HS. Human Visual Cortex and Deep Convolutional Neural Network Care Deeply about Object Background. J Cogn Neurosci 2024; 36:551-566. [PMID: 38165735] [DOI: 10.1162/jocn_a_02098]
Abstract
Deep convolutional neural networks (DCNNs) are able to partially predict brain activity during object categorization tasks, but the factors contributing to this predictive power are not fully understood. Our study aimed to investigate these factors. We compared the activity of four DCNN architectures with EEG recordings obtained from 62 human participants during an object categorization task. Previous physiological studies on object categorization have highlighted the importance of figure-ground segregation, the ability to distinguish objects from their backgrounds. Therefore, we investigated whether figure-ground segregation could explain the predictive power of DCNNs. Using a stimulus set consisting of identical target objects embedded in different backgrounds, we examined the influence of object background versus object category within both EEG and DCNN activity. Crucially, the recombination of naturalistic objects and experimentally controlled backgrounds creates a challenging and naturalistic task while retaining experimental control. Our results showed that early EEG activity (< 100 msec) and early DCNN layers represent object background rather than object category. We also found that the ability of DCNNs to predict EEG activity is primarily influenced by how both systems process object backgrounds rather than object categories. By contrasting the activations of trained and untrained (i.e., random weights) DCNNs, we demonstrated the role of figure-ground segregation as a potential prerequisite for the recognition of object features. These findings suggest that both human visual cortex and DCNNs prioritize the segregation of object backgrounds and target objects to perform object categorization. Altogether, our study provides new insights into the mechanisms underlying object categorization, as we demonstrated that both human visual cortex and DCNNs care deeply about object background.
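A background-versus-category comparison of this kind can be sketched with representational similarity analysis. The simulation below is illustrative only (synthetic "early" responses stand in for real EEG or DCNN activations), but it shows the logic of testing which model structure dominates a representational dissimilarity matrix (RDM):

```python
import numpy as np

rng = np.random.default_rng(3)

# 8 backgrounds x 8 object categories = 64 stimuli. The simulated "early"
# response carries mostly background information, as the abstract reports
# for early EEG activity and early DCNN layers.
n_bg, n_cat, d = 8, 8, 40
bg_id, cat_id = np.meshgrid(np.arange(n_bg), np.arange(n_cat), indexing="ij")
bg_id, cat_id = bg_id.ravel(), cat_id.ravel()
bg_emb = rng.standard_normal((n_bg, d))
cat_emb = rng.standard_normal((n_cat, d))
early = bg_emb[bg_id] + 0.2 * cat_emb[cat_id] + 0.1 * rng.standard_normal((64, d))

def rdm(X):
    return 1.0 - np.corrcoef(X)           # correlation-distance RDM

def upper(M):
    return M[np.triu_indices_from(M, k=1)]

def rdm_corr(a, b):
    return np.corrcoef(upper(a), upper(b))[0, 1]

# Model RDMs: 0 if two stimuli share a background (resp. category), 1 otherwise.
bg_model = (bg_id[:, None] != bg_id[None, :]).astype(float)
cat_model = (cat_id[:, None] != cat_id[None, :]).astype(float)

r_bg = rdm_corr(rdm(early), bg_model)
r_cat = rdm_corr(rdm(early), cat_model)
print(r_bg > r_cat)   # early responses align with the background model
```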
25
Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. PLoS Comput Biol 2024; 20:e1011943. [PMID: 38547053] [PMCID: PMC10977720] [DOI: 10.1371/journal.pcbi.1011943]
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that in all layers of the CNN models, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images, and that lesioning these neurons by setting their output to zero, or enhancing them by increasing their gain, led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have the intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
Affiliation(s)
- Peng Liu
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Ke Bo
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Mingzhou Ding
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Ruogu Fang
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, Florida, United States of America

26
Mikhailova A, Lightfoot S, Santos-Victor J, Coco MI. Differential effects of intrinsic properties of natural scenes and interference mechanisms on recognition processes in long-term visual memory. Cogn Process 2024; 25:173-187. [PMID: 37831320] [DOI: 10.1007/s10339-023-01164-y]
Abstract
Humans display remarkable long-term visual memory (LTVM) processes. Even though images may be intrinsically memorable, the fidelity of their visual representations, and consequently the likelihood of successfully retrieving them, hinges on their similarity to other images concurrently held in LTVM. In this debate, it is still unclear whether the intrinsic (perceptual and semantic) features of images are mediated by mechanisms of interference generated at encoding or during retrieval, and how these factors impinge on recognition processes. In the current study, participants (N = 32) studied a stream of 120 natural scenes from 8 semantic categories, which varied in frequency (4, 8, 16, or 32 exemplars per category) to generate different levels of category interference, in preparation for a recognition test. They were then asked to indicate which of two images, presented side by side (i.e., two-alternative forced choice), they remembered. The two images belonged to the same semantic category but varied in their perceptual similarity (similar or dissimilar). Participants also expressed their confidence (sure/not sure) about their recognition response, enabling us to tap into their metacognitive efficacy (meta-d'). Additionally, we extracted the activation of perceptual and semantic features in the images (i.e., their informational richness) through deep neural network modelling and examined their impact on recognition processes. Corroborating previous literature, we found that category interference and perceptual similarity negatively impact recognition accuracy, as well as response times and metacognitive efficacy. Moreover, semantically rich images were less likely to be remembered, an effect that trumped the positive memorability boost from perceptual information. Critically, we did not observe any significant interaction between the intrinsic features of images and interference generated either at encoding or during retrieval. All in all, our study calls for a more integrative understanding of the representational dynamics of encoding and recognition that enable us to form, maintain, and access visual information.
Affiliation(s)
- Anastasiia Mikhailova
- Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- José Santos-Victor
- Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Moreno I Coco
- Sapienza, University of Rome, Rome, Italy
- I.R.C.C.S. Santa Lucia, Fondazione Santa Lucia, Roma, Italy

27
Suzuki K, Seth AK, Schwartzman DJ. Modelling phenomenological differences in aetiologically distinct visual hallucinations using deep neural networks. Front Hum Neurosci 2024; 17:1159821. [PMID: 38234594] [PMCID: PMC10791985] [DOI: 10.3389/fnhum.2023.1159821]
Abstract
Visual hallucinations (VHs) are perceptions of objects or events in the absence of the sensory stimulation that would normally support such perceptions. Although all VHs share this core characteristic, there are substantial phenomenological differences between VHs with different aetiologies, such as those arising from neurodegenerative conditions, visual loss, or psychedelic compounds. Here, we examine the potential mechanistic basis of these differences by leveraging recent advances in visualising the learned representations of a coupled classifier and generative deep neural network, an approach we call 'computational (neuro)phenomenology'. Examining three aetiologically distinct populations in which VHs occur, namely neurodegenerative conditions (Parkinson's disease and Lewy body dementia), visual loss (Charles Bonnet syndrome, CBS), and psychedelics, we identified three dimensions relevant to distinguishing these classes of VHs: realism (veridicality), dependence on sensory input (spontaneity), and complexity. By selectively tuning the parameters of the visualisation algorithm to reflect influence along each of these phenomenological dimensions, we were able to generate 'synthetic VHs' characteristic of the VHs experienced in each aetiology. We verified the validity of this approach experimentally in two studies that examined the phenomenology of VHs in neurodegenerative and CBS patients, and in people with recent psychedelic experience. These studies confirmed the existence of phenomenological differences across these three dimensions between groups and, crucially, found that the appropriate synthetic VHs were rated as representative of each group's hallucinatory phenomenology. Together, our findings highlight the phenomenological diversity of VHs associated with distinct causal factors and demonstrate how a neural network model of visual phenomenology can successfully capture the distinctive visual characteristics of hallucinatory experience.
Affiliation(s)
- Keisuke Suzuki
- Sussex Centre for Consciousness Science, University of Sussex, Brighton, United Kingdom
- Department of Informatics, University of Sussex, Brighton, United Kingdom
- Center for Human Nature, Artificial Intelligence and Neuroscience (CHAIN), Hokkaido University, Sapporo, Japan
- Anil K. Seth
- Sussex Centre for Consciousness Science, University of Sussex, Brighton, United Kingdom
- Department of Informatics, University of Sussex, Brighton, United Kingdom
- Program on Brain, Mind, and Consciousness, Canadian Institute for Advanced Research, Toronto, ON, Canada
- David J. Schwartzman
- Sussex Centre for Consciousness Science, University of Sussex, Brighton, United Kingdom
- Department of Informatics, University of Sussex, Brighton, United Kingdom

28
Kim G, Kim DK, Jeong H. Spontaneous emergence of rudimentary music detectors in deep neural networks. Nat Commun 2024; 15:148. [PMID: 38168097] [PMCID: PMC10761941] [DOI: 10.1038/s41467-023-44516-0]
Abstract
Music exists in almost every society, has universal acoustic features, and is processed by distinct neural circuits even in humans with no musical training. However, it remains unclear how these innate characteristics emerge and what functions they serve. Here, using an artificial deep neural network that models the auditory information processing of the brain, we show that units tuned to music can emerge spontaneously through learning natural sound detection, even without learning music. The music-selective units encoded the temporal structure of music on multiple timescales, matching the population-level response characteristics observed in the brain. We found that the process of generalization is critical for the emergence of music selectivity and that music selectivity can serve as a functional basis for the generalization of natural sound, thereby elucidating its origin. These findings suggest that evolutionary adaptation to process natural sounds can provide an initial blueprint for our sense of music.
Affiliation(s)
- Gwangsu Kim
- Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Korea
- Dong-Kyum Kim
- Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Korea
- Hawoong Jeong
- Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Korea
- Center for Complex Systems, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Korea

29
Raman R, Bognár A, Nejad GG, Taubert N, Giese M, Vogels R. Bodies in motion: Unraveling the distinct roles of motion and shape in dynamic body responses in the temporal cortex. Cell Rep 2023; 42:113438. [PMID: 37995183] [PMCID: PMC10783614] [DOI: 10.1016/j.celrep.2023.113438]
Abstract
The temporal cortex represents social stimuli, including bodies. We examine and compare the contributions of dynamic and static features to single-unit responses to moving monkey bodies, within and between a patch in the anterior dorsal bank of the superior temporal sulcus (dorsal patch [DP]) and patches in the anterior inferotemporal cortex (ventral patch [VP]), using fMRI guidance in macaques. The response to dynamics varies within both regions and is higher in DP. The dynamic body selectivity of VP neurons correlates with static features derived from convolutional neural networks and with motion. DP neurons' dynamic body selectivity is not predicted by static features but is dominated by motion. Whereas these data support the dominance of motion in the newly proposed "dynamic social perception" stream, they challenge the traditional view that distinguishes DP and VP processing in terms of motion versus static features, underscoring the role of inferotemporal neurons in representing body dynamics.
Affiliation(s)
- Rajani Raman
- Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium
- Anna Bognár
- Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium
- Ghazaleh Ghamkhari Nejad
- Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium
- Nick Taubert
- Hertie Institute for Clinical Brain Research and Center for Integrative Neuroscience, University Clinic Tuebingen, 72074 Tuebingen, Germany
- Martin Giese
- Hertie Institute for Clinical Brain Research and Center for Integrative Neuroscience, University Clinic Tuebingen, 72074 Tuebingen, Germany
- Rufin Vogels
- Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium

30
Shi Y, Bi D, Hesse JK, Lanfranchi FF, Chen S, Tsao DY. Rapid, concerted switching of the neural code in inferotemporal cortex. bioRxiv 2023:2023.12.06.570341. [PMID: 38106108] [PMCID: PMC10723419] [DOI: 10.1101/2023.12.06.570341]
Abstract
A fundamental paradigm in neuroscience is the concept of neural coding through tuning functions. According to this idea, neurons encode stimuli through fixed mappings of stimulus features to firing rates. Here, we report that the tuning of visual neurons can rapidly and coherently change across a population to attend to a whole and its parts. We set out to investigate a longstanding debate concerning whether inferotemporal (IT) cortex uses a specialized code for representing specific types of objects or a general code that applies to any object. We found that face cells in macaque IT cortex initially adopted a general code optimized for face detection. But following a rapid, concerted population event lasting < 20 ms, the neural code transformed into a face-specific one with two striking properties: (i) response gradients to principal detection-related dimensions reversed direction, and (ii) new tuning developed to multiple higher feature-space dimensions supporting fine face discrimination. These dynamics were face specific and did not occur in response to objects. Overall, these results show that, for faces, face cells shift from detection to discrimination by switching from an object-general code to a face-specific code. More broadly, our results suggest a novel mechanism for neural representation: concerted, stimulus-dependent switching of the neural code used by a cortical area.
31
Schnell AE, Leemans M, Vinken K, Op de Beeck H. A computationally informed comparison between the strategies of rodents and humans in visual object recognition. eLife 2023; 12:RP87719. [PMID: 38079481] [PMCID: PMC10712954] [DOI: 10.7554/elife.87719]
Abstract
Many species are able to recognize objects, but it has proven difficult to pinpoint and compare how different species solve this task. Recent research has suggested combining computational and animal modelling to obtain a more systematic understanding of task complexity and to compare strategies between species. In this study, we created a large multidimensional stimulus set and designed a visual discrimination task partially based on modelling with a convolutional deep neural network (CNN). Experiments included rats (N = 11; 1115 daily sessions in total) and humans (N = 45). Each species was able to master the task and generalize to a variety of new images. Nevertheless, rats and humans showed very little convergence in which object pairs were associated with high and low performance, suggesting the use of different strategies. There was an interaction between species and whether stimulus pairs favoured early or late processing in a CNN. A direct comparison with CNN representations and visual feature analyses revealed that rat performance was best captured by late convolutional layers and partially by visual features such as brightness and pixel-level similarity, while human performance related more to the higher fully connected layers. These findings highlight the additional value of a computational approach for the design of object recognition tasks. Overall, this computationally informed investigation of object recognition behaviour reveals a strong discrepancy in strategies between rodent and human vision.
Affiliation(s)
- Maarten Leemans
- Department of Brain and Cognition & Leuven Brain Institute, Leuven, Belgium
- Kasper Vinken
- Department of Neurobiology, Harvard Medical School, Boston, United States
- Hans Op de Beeck
- Department of Brain and Cognition & Leuven Brain Institute, Leuven, Belgium

32
Ligeralde A, Kuang Y, Yerxa TE, Pitcher MN, Feller M, Chung S. Unsupervised learning on spontaneous retinal activity leads to efficient neural representation geometry. arXiv 2023:arXiv:2312.02791v1. [PMID: 38106456] [PMCID: PMC10723543]
Abstract
Prior to the onset of vision, neurons in the developing mammalian retina spontaneously fire in correlated activity patterns known as retinal waves. Experimental evidence suggests that retinal waves strongly influence the emergence of sensory representations before visual experience. We aim to model this early stage of functional development by using movies of neurally active developing retinas as pre-training data for neural networks. Specifically, we pre-train a ResNet-18 with an unsupervised contrastive learning objective (SimCLR) on both simulated and experimentally-obtained movies of retinal waves, then evaluate its performance on image classification tasks. We find that pre-training on retinal waves significantly improves performance on tasks that test object invariance to spatial translation, while slightly improving performance on more complex tasks like image classification. Notably, these performance boosts are realized on held-out natural images even though the pre-training procedure does not include any natural image data. We then propose a geometrical explanation for the increase in network performance, namely that the spatiotemporal characteristics of retinal waves facilitate the formation of separable feature representations. In particular, we demonstrate that networks pre-trained on retinal waves are more effective at separating image manifolds than randomly initialized networks, especially for manifolds defined by sets of spatial translations. These findings indicate that the broad spatiotemporal properties of retinal waves prepare networks for higher order feature extraction.
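The contrastive objective used for pre-training (SimCLR's NT-Xent loss) is compact enough to state directly. Below is a minimal numpy version, with random embeddings standing in for encoder outputs on pairs of retinal-wave crops; it is a sketch of the loss only, not the paper's ResNet-18 training setup.

```python
import numpy as np

rng = np.random.default_rng(4)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent (SimCLR) contrastive loss for a batch of embedding pairs.
    z1[i] and z2[i] are two 'views' of the same sample; every other
    embedding in the batch serves as a negative."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # partner index
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

# Well-aligned view pairs score a lower loss than unrelated ones.
base = rng.standard_normal((16, 32))
aligned = nt_xent(base, base + 0.05 * rng.standard_normal((16, 32)))
random_pairs = nt_xent(base, rng.standard_normal((16, 32)))
print(aligned < random_pairs)
```

Minimizing this loss pulls the two views of each wave frame together while pushing apart unrelated frames, which is the mechanism the abstract credits with producing separable feature representations.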
Affiliation(s)
- Andrew Ligeralde
- Biophysics Graduate Group, University of California, Berkeley
- Center for Computational Neuroscience, Flatiron Institute
- Yilun Kuang
- Center for Computational Neuroscience, Flatiron Institute
- Courant Inst. of Mathematical Sciences, New York University
- Thomas Edward Yerxa
- Center for Computational Neuroscience, Flatiron Institute
- Center for Neural Science, New York University
- Miah N Pitcher
- Helen Wills Neuroscience Institute, University of California, Berkeley
- Marla Feller
- Helen Wills Neuroscience Institute, University of California, Berkeley
- Department of Molecular and Cell Biology, University of California, Berkeley
- SueYeon Chung
- Center for Computational Neuroscience, Flatiron Institute
- Center for Neural Science, New York University

33
Fang Z, Bloem IM, Olsson C, Ma WJ, Winawer J. Normalization by orientation-tuned surround in human V1-V3. PLoS Comput Biol 2023; 19:e1011704. [PMID: 38150484] [PMCID: PMC10793941] [DOI: 10.1371/journal.pcbi.1011704]
Abstract
An influential account of neuronal responses in primary visual cortex is the normalized energy model. This model is often implemented as a multi-stage computation. The first stage is linear filtering. The second stage is the extraction of contrast energy, whereby a complex cell computes the squared and summed outputs of a pair of linear filters in quadrature phase. The third stage is normalization, in which a local population of complex cells mutually inhibit one another. Because the population includes cells tuned to a range of orientations and spatial frequencies, the responses are effectively normalized by the local stimulus contrast. Here, using evidence from human functional MRI, we show that the classical model fails to account for the relative responses to two classes of stimuli: straight, parallel, band-passed contours (gratings) and curved, band-passed contours (snakes). The snakes elicit fMRI responses that are about twice as large as the gratings, yet a traditional divisive normalization model predicts responses that are about the same. Motivated by these observations and others from the literature, we implement a divisive normalization model in which cells matched in orientation tuning ("tuned normalization") preferentially inhibit each other. We first show that this model accounts for the differential responses to these two classes of stimuli. We then show that the model successfully generalizes to other band-pass textures, both in V1 and in extrastriate cortex (V2 and V3). We conclude that even in primary visual cortex, complex features of images, such as the degree of heterogeneity, can have large effects on neural responses.
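The key modelling idea, weighting the normalization pool by orientation similarity, can be sketched in a few lines. The weighting function and parameter values below are illustrative assumptions, not the fitted model from the paper:

```python
import numpy as np

def tuned_normalization(energy, theta, kappa=2.0, sigma=0.1):
    """Divisive normalization with an orientation-tuned pool.
    energy[i]: contrast energy of unit i; theta[i]: preferred orientation.
    Unit j contributes to unit i's pool with a von Mises-like weight on the
    orientation difference, so iso-oriented units suppress each other most."""
    d = theta[:, None] - theta[None, :]
    w = np.exp(kappa * (np.cos(2 * d) - 1))   # peaks at same orientation
    pool = w @ energy / w.sum(axis=1)
    return energy / (sigma ** 2 + pool)

theta = np.linspace(0, np.pi, 16, endpoint=False)

# "Grating": all contrast energy at one orientation -> strong tuned pool.
grating = np.zeros(16)
grating[0] = 1.0
# "Snake": the same total energy spread over many orientations -> a weaker
# pool for each unit, hence larger normalized responses.
snake = np.ones(16) / 16.0

r_grating = tuned_normalization(grating, theta).sum()
r_snake = tuned_normalization(snake, theta).sum()
print(r_snake > r_grating)
```

With total contrast energy matched, the orientation-heterogeneous "snake" input leaves each unit with a weaker iso-orientation pool and therefore a larger summed response, qualitatively reproducing the snakes-versus-gratings effect described above.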
Affiliation(s)
- Zeming Fang
- Department of Psychology and Center for Neural Science, New York University, New York City, New York, United States of America
- Department of Cognitive Science, Rensselaer Polytechnic Institute, Troy, New York, United States of America
- Ilona M. Bloem
- Department of Psychology and Center for Neural Science, New York University, New York City, New York, United States of America
- Catherine Olsson
- Department of Psychology and Center for Neural Science, New York University, New York City, New York, United States of America
- Wei Ji Ma
- Department of Psychology and Center for Neural Science, New York University, New York City, New York, United States of America
- Jonathan Winawer
- Department of Psychology and Center for Neural Science, New York University, New York City, New York, United States of America

34
Li Y, Anumanchipalli GK, Mohamed A, Chen P, Carney LH, Lu J, Wu J, Chang EF. Dissecting neural computations in the human auditory pathway using deep neural networks for speech. Nat Neurosci 2023; 26:2213-2225. [PMID: 37904043 PMCID: PMC10689246 DOI: 10.1038/s41593-023-01468-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Accepted: 09/13/2023] [Indexed: 11/01/2023]
Abstract
The human auditory system extracts rich linguistic abstractions from speech signals. Traditional approaches to understanding this complex process have used linear feature-encoding models, with limited success. Artificial neural networks excel in speech recognition tasks and offer promising computational models of speech processing. We used speech representations in state-of-the-art deep neural network (DNN) models to investigate neural coding from the auditory nerve to the speech cortex. Representations in hierarchical layers of the DNN correlated well with the neural activity throughout the ascending auditory system. Unsupervised speech models performed at least as well as other purely supervised or fine-tuned models. Deeper DNN layers were better correlated with the neural activity in the higher-order auditory cortex, with computations aligned with phonemic and syllabic structures in speech. Accordingly, DNN models trained on either English or Mandarin predicted cortical responses in native speakers of each language. These results reveal convergence between DNN model representations and the biological auditory pathway, offering new approaches for modeling neural coding in the auditory cortex.
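The layer-to-region comparison at the heart of this study can be sketched with a minimal linear encoding analysis. This is an assumed simplification of the authors' pipeline: a plain least-squares fit from one DNN layer's features to a neural response, scored by held-out Pearson correlation, with the best-correlated layer assigned to each recording site.

```python
import numpy as np

def layer_brain_correlation(layer_feats, neural_resp, n_train):
    # Fit a linear encoding model from one layer's stimulus features to
    # a neural response on training stimuli, then score the held-out
    # prediction with a Pearson correlation.
    X_tr, X_te = layer_feats[:n_train], layer_feats[n_train:]
    y_tr, y_te = neural_resp[:n_train], neural_resp[n_train:]
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    pred = X_te @ w
    p, y = pred - pred.mean(), y_te - y_te.mean()
    return float(p @ y / (np.linalg.norm(p) * np.linalg.norm(y) + 1e-12))
```

Applying this score across layers and sites gives the layer-depth-to-hierarchy mapping the abstract describes: the layer with the highest held-out correlation is taken as the best model of that site.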
Collapse
Affiliation(s)
- Yuanning Li
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
| | - Gopala K Anumanchipalli
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
| | | | - Peili Chen
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
| | - Laurel H Carney
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
| | - Junfeng Lu
- Neurologic Surgery Department, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China
- Brain Function Laboratory, Neurosurgical Institute, Fudan University, Shanghai, China
| | - Jinsong Wu
- Neurologic Surgery Department, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China
- Brain Function Laboratory, Neurosurgical Institute, Fudan University, Shanghai, China
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA.
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
| |
Collapse
|
35
|
Moore JA, Wilms M, Gutierrez A, Ismail Z, Fakhar K, Hadaeghi F, Hilgetag CC, Forkert ND. Simulation of neuroplasticity in a CNN-based in-silico model of neurodegeneration of the visual system. Front Comput Neurosci 2023; 17:1274824. [PMID: 38105786 PMCID: PMC10722164 DOI: 10.3389/fncom.2023.1274824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Accepted: 11/08/2023] [Indexed: 12/19/2023] Open
Abstract
The aim of this work was to enhance the biological feasibility of a deep convolutional neural network-based in-silico model of neurodegeneration of the visual system by equipping it with a mechanism to simulate neuroplasticity. Therefore, deep convolutional networks of multiple sizes were trained for object recognition tasks and progressively lesioned to simulate neurodegeneration of the visual cortex. More specifically, the injured parts of the network remained injured while we investigated how the added retraining steps were able to recover some of the model's object recognition baseline performance. The results showed that, with retraining, the model's object recognition abilities decline more smoothly and gradually with increasing injury levels than without retraining, and are therefore more similar to the longitudinal cognitive impairments of patients diagnosed with Alzheimer's disease (AD). Moreover, with retraining, the injured model's internal activation patterns are more similar to those of the healthy baseline model than are those of the injured model without retraining. Furthermore, we conducted this analysis on a network that had been extensively pruned, resulting in an optimized number of parameters or synapses. Our findings show that this pruned network retained a remarkably similar capability to recover task performance despite having fewer viable pathways through the network. In conclusion, adding a retraining step to the in-silico setup that simulates neuroplasticity improves the model's biological feasibility considerably and could prove valuable to test different rehabilitation approaches in-silico.
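The lesion-then-retrain logic can be illustrated on a deliberately tiny stand-in for the CNN: a linear model with redundant input features, trained by gradient descent. This is a toy sketch, not the authors' network; the point it demonstrates is the same, though: if injured weights are clamped at zero (the lesion persists) while the survivors are retrained, redundancy lets the model recover much of its pre-lesion performance.

```python
import numpy as np

def train(X, y, w, mask, lr=0.01, steps=5000):
    # Plain gradient descent on squared error. The lesion mask clamps
    # injured weights ("synapses") at zero, so retraining can only
    # reroute function through the surviving weights.
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(X)
        w = (w - lr * grad) * mask
    return w

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 6))
X = latent @ rng.normal(size=(6, 20))       # 20 redundant input features
y = latent @ rng.normal(size=6)
healthy = train(X, y, np.zeros(20), np.ones(20))   # healthy baseline model
lesion = np.ones(20)
lesion[:8] = 0.0                                    # injure 40% of the weights
mse_injured = np.mean((X @ (healthy * lesion) - y) ** 2)
retrained = train(X, y, healthy * lesion, lesion)   # simulated plasticity
mse_retrained = np.mean((X @ retrained - y) ** 2)
```

Because the 20 features are redundant projections of a 6-dimensional latent signal, the surviving weights can absorb the function of the lesioned ones, so retraining sharply reduces the error that the lesion introduced.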
Collapse
Affiliation(s)
- Jasmine A. Moore
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada
| | - Matthias Wilms
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Alejandro Gutierrez
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada
| | - Zahinoor Ismail
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada
| | - Kayson Fakhar
- Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Fatemeh Hadaeghi
- Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Claus C. Hilgetag
- Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Department of Health Sciences, Boston University, Boston, MA, United States
| | - Nils D. Forkert
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| |
Collapse
|
36
|
Karapetian A, Boyanova A, Pandaram M, Obermayer K, Kietzmann TC, Cichy RM. Empirically Identifying and Computationally Modeling the Brain-Behavior Relationship for Human Scene Categorization. J Cogn Neurosci 2023; 35:1879-1897. [PMID: 37590093 PMCID: PMC10586810 DOI: 10.1162/jocn_a_02043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/19/2023]
Abstract
Humans effortlessly make quick and accurate perceptual decisions about the nature of their immediate visual environment, such as the category of the scene they face. Previous research has revealed a rich set of cortical representations potentially underlying this feat. However, it remains unknown which of these representations are suitably formatted for decision-making. Here, we approached this question empirically and computationally, using neuroimaging and computational modeling. For the empirical part, we collected EEG data and RTs from human participants during a scene categorization task (natural vs. man-made). We then related the EEG data to behavior using a multivariate extension of signal detection theory. We observed a correlation between neural data and behavior specifically between ∼100 msec and ∼200 msec after stimulus onset, suggesting that the neural scene representations in this time period are suitably formatted for decision-making. For the computational part, we evaluated a recurrent convolutional neural network (RCNN) as a model of brain and behavior. Unifying our previous observations in an image-computable model, the RCNN predicted well the neural representations, the behavioral scene categorization data, and the relationship between them. Our results identify and computationally characterize the neural and behavioral correlates of scene categorization in humans.
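The multivariate signal-detection idea used here (relating neural patterns to RTs via distance from the decision boundary) can be sketched in a few lines. This toy, with simulated patterns and hypothetical RTs, is only an assumption-level illustration of the analysis logic, not the authors' EEG pipeline.

```python
import numpy as np

def distance_to_bound(patterns, labels):
    # Project single-trial patterns onto the discriminant between the
    # two classes; the projection's magnitude is the trial's distance
    # from the decision boundary.
    mu0 = patterns[labels == 0].mean(axis=0)
    mu1 = patterns[labels == 1].mean(axis=0)
    w = mu1 - mu0
    b = (mu0 + mu1) @ w / 2.0
    return patterns @ w - b

rng = np.random.default_rng(0)
n, d = 400, 10
labels = rng.integers(0, 2, size=n)
signal = rng.normal(size=d)
patterns = np.outer(2 * labels - 1, signal) + 0.5 * rng.normal(size=(n, d))
dist = np.abs(distance_to_bound(patterns, labels))
# Hypothetical RTs: trials whose neural pattern sits far from the
# boundary are answered faster, giving a negative distance-RT correlation.
rts = 600.0 - 20.0 * dist + 5.0 * rng.normal(size=n)
r = np.corrcoef(dist, rts)[0, 1]
```

Computed per timepoint on real EEG patterns, this correlation is the quantity whose ∼100-200 msec window the abstract reports.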
Collapse
Affiliation(s)
- Agnessa Karapetian
- Freie Universität Berlin, Germany
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
| | | | | | - Klaus Obermayer
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
- Technische Universität Berlin, Germany
- Humboldt-Universität zu Berlin, Germany
| | | | - Radoslaw M Cichy
- Freie Universität Berlin, Germany
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
- Humboldt-Universität zu Berlin, Germany
| |
Collapse
|
37
|
Velarde OM, Makse HA, Parra LC. Architecture of the brain's visual system enhances network stability and performance through layers, delays, and feedback. PLoS Comput Biol 2023; 19:e1011078. [PMID: 37948463 PMCID: PMC10664920 DOI: 10.1371/journal.pcbi.1011078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2023] [Revised: 11/22/2023] [Accepted: 10/19/2023] [Indexed: 11/12/2023] Open
Abstract
In the visual system of primates, image information propagates across successive cortical areas, and there is also local feedback within an area and long-range feedback across areas. Recent findings suggest that the resulting temporal dynamics of neural activity are crucial in several vision tasks. In contrast, artificial neural network models of vision are typically feedforward and do not capitalize on the benefits of temporal dynamics, partly due to concerns about stability and computational costs. In this study, we focus on recurrent networks with feedback connections for visual tasks with static input corresponding to a single fixation. We demonstrate mathematically that a network's dynamics can be stabilized by four key features of biological networks: layer-ordered structure, temporal delays between layers, longer-distance feedback across layers, and nonlinear neuronal responses. Conversely, when feedback has a fixed distance, one can omit delays in feedforward connections to achieve more efficient artificial implementations. We also evaluated the effect of feedback connections on object detection and classification performance using standard benchmarks, specifically the COCO and CIFAR10 datasets. Our findings indicate that feedback connections improved the detection of small objects, and classification performance became more robust to noise. We found that performance increased with the temporal dynamics, not unlike what is observed in core vision of primates. These results suggest that delays and layered organization are crucial features for stability and performance in both biological and artificial recurrent neural networks.
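The stability question at the center of this paper can be made concrete with a linear caricature: a feedforward chain of layers plus long-range feedback, whose dynamics x[t+1] = W x[t] are stable exactly when the spectral radius of W is below one. This sketch is an assumption of ours for illustration (it omits the delay and nonlinearity analysis of the paper); it shows the basic trade-off that weak feedback preserves stability while strong feedback destroys it.

```python
import numpy as np

def layered_weights(n_layers, ff=1.0, fb=0.0, fb_dist=2):
    # Linear caricature of the hierarchy: a feedforward chain on the
    # sub-diagonal plus long-range feedback that skips fb_dist layers
    # on the super-diagonal.
    W = np.zeros((n_layers, n_layers))
    for i in range(n_layers - 1):
        W[i + 1, i] = ff
    for i in range(n_layers - fb_dist):
        W[i, i + fb_dist] = fb
    return W

def spectral_radius(W):
    # x[t+1] = W @ x[t] is stable iff the spectral radius is below one.
    return float(np.max(np.abs(np.linalg.eigvals(W))))
```

A purely feedforward chain is nilpotent (spectral radius zero, unconditionally stable); adding feedback creates loops whose gain determines whether the radius crosses one.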
Collapse
Affiliation(s)
- Osvaldo Matias Velarde
- Biomedical Engineering Department, The City College of New York, New York, New York, United States of America
| | - Hernán A. Makse
- Levich Institute and Physics Department, The City College of New York, New York, New York, United States of America
| | - Lucas C. Parra
- Biomedical Engineering Department, The City College of New York, New York, New York, United States of America
| |
Collapse
|
38
|
van Dyck LE, Gruber WR. Modeling Biological Face Recognition with Deep Convolutional Neural Networks. J Cogn Neurosci 2023; 35:1521-1537. [PMID: 37584587 DOI: 10.1162/jocn_a_02040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/17/2023]
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground, and recent efforts have started to transfer this achievement to research on biological face recognition. In this regard, face detection can be investigated by comparing face-selective biological neurons and brain areas to artificial neurons and model layers. Similarly, face identification can be examined by comparing in vivo and in silico multidimensional "face spaces." In this review, we summarize the first studies that use DCNNs to model biological face recognition. On the basis of a broad spectrum of behavioral and computational evidence, we conclude that DCNNs are useful models that closely resemble the general hierarchical organization of face recognition in the ventral visual pathway and the core face network. In two exemplary spotlights, we emphasize the unique scientific contributions of these models. First, studies on face detection in DCNNs indicate that elementary face selectivity emerges automatically through feedforward processing even in the absence of visual experience. Second, studies on face identification in DCNNs suggest that identity-specific experience and generative mechanisms facilitate this particular challenge. Taken together, as this novel modeling approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), it may be suited to inform long-standing debates on the substrates of biological face recognition.
Collapse
|
39
|
Farahat A, Effenberger F, Vinck M. A novel feature-scrambling approach reveals the capacity of convolutional neural networks to learn spatial relations. Neural Netw 2023; 167:400-414. [PMID: 37673027 DOI: 10.1016/j.neunet.2023.08.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2022] [Revised: 07/07/2023] [Accepted: 08/13/2023] [Indexed: 09/08/2023]
Abstract
Convolutional neural networks (CNNs) are one of the most successful computer vision systems to solve object recognition. Furthermore, CNNs have major applications in understanding the nature of visual representations in the human brain. Yet it remains poorly understood how CNNs actually make their decisions, what the nature of their internal representations is, and how their recognition strategies differ from humans. Specifically, there is a major debate about whether CNNs primarily rely on surface regularities of objects, or whether, like humans, they are capable of exploiting the spatial arrangement of features. Here, we develop a novel feature-scrambling approach to explicitly test whether CNNs use the spatial arrangement of features (i.e. object parts) to classify objects. We combine this approach with a systematic manipulation of effective receptive field sizes of CNNs as well as minimal recognizable configurations (MIRCs) analysis. In contrast to much previous literature, we provide evidence that CNNs are in fact capable of using relatively long-range spatial relationships for object classification. Moreover, the extent to which CNNs use spatial relationships depends heavily on the dataset, e.g. texture vs. sketch. In fact, CNNs even use different strategies for different classes within heterogeneous datasets (ImageNet), suggesting CNNs have a continuous spectrum of classification strategies. Finally, we show that CNNs learn the spatial arrangement of features only up to an intermediate level of granularity, which suggests that intermediate rather than global shape features provide the optimal trade-off between sensitivity and specificity in object classification. These results provide novel insights into the nature of CNN representations and the extent to which they rely on the spatial arrangement of features for object classification.
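The core manipulation, scrambling feature positions at a chosen granularity while leaving the features themselves intact, can be sketched as a block-shuffling operation. This is a generic patch-scramble sketch under our own assumptions, not the authors' exact stimulus-generation code.

```python
import numpy as np

def scramble_features(img, patch, order):
    # Cut the image into patch x patch blocks and rearrange them with the
    # given permutation: local features survive, but their spatial
    # arrangement is destroyed at the chosen granularity (patch size).
    h, w = img.shape
    gh, gw = h // patch, w // patch
    blocks = [img[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
              for i in range(gh) for j in range(gw)]
    out = np.empty_like(img)
    for k, src in enumerate(order):
        i, j = divmod(k, gw)
        out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = blocks[src]
    return out
```

Varying `patch` sweeps the granularity axis the abstract refers to: large patches preserve intermediate-scale arrangement, small patches destroy it, and comparing CNN accuracy across this sweep reveals how much spatial arrangement the network exploits.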
Collapse
Affiliation(s)
- Amr Farahat
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Donders Centre for Neuroscience, Department of Neuroinformatics, Radboud University, Nijmegen, The Netherlands.
| | - Felix Effenberger
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Frankfurt Institute for Advanced Studies, Frankfurt, Germany
| | - Martin Vinck
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Donders Centre for Neuroscience, Department of Neuroinformatics, Radboud University, Nijmegen, The Netherlands
| |
Collapse
|
40
|
Li B, Zhang C, Cao L, Chen P, Liu T, Gao H, Wang L, Yan B, Tong L. Brain Functional Representation of Highly Occluded Object Recognition. Brain Sci 2023; 13:1387. [PMID: 37891756 PMCID: PMC10605645 DOI: 10.3390/brainsci13101387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2023] [Revised: 09/23/2023] [Accepted: 09/27/2023] [Indexed: 10/29/2023] Open
Abstract
Recognizing highly occluded objects is believed to arise from the interaction between the brain's vision and cognition-controlling areas, although supporting neuroimaging data are currently limited. To explore the neural mechanism during this activity, we conducted an occlusion object recognition experiment using functional magnetic resonance imaging (fMRI). During magnetic resonance examinations, 66 subjects engaged in object recognition tasks with three different occlusion degrees. Generalized linear model (GLM) analysis showed that the activation degree of the occipital lobe (inferior occipital gyrus, middle occipital gyrus, and occipital fusiform gyrus) and dorsal anterior cingulate cortex (dACC) was related to the occlusion degree of the objects. Multivariate pattern analysis (MVPA) further unearthed a considerable surge in classification precision when dACC activation was incorporated as a feature. This suggested the combined role of dACC and the occipital lobe in occluded object recognition tasks. Moreover, psychophysiological interaction (PPI) analysis disclosed that functional connectivity (FC) between the dACC and the occipital lobe was enhanced with increased occlusion, highlighting the necessity of FC between these two brain regions in effectively identifying exceedingly occluded objects. In conclusion, these findings contribute to understanding the neural mechanisms of highly occluded object recognition, augmenting our appreciation of how the brain manages incomplete visual data.
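The PPI analysis mentioned here rests on one construction: a regressor formed by multiplying the seed region's timecourse with the psychological condition, whose GLM weight indexes condition-dependent connectivity. The following simulation, with entirely made-up timecourses and effect sizes, sketches that construction; it is not the authors' fMRI pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
occlusion = np.repeat([0.0, 1.0], n // 2)     # low vs. high occlusion blocks
seed_ts = rng.normal(size=n)                  # seed-region (e.g. dACC) timecourse
# Simulated target region: it couples with the seed more strongly under
# high occlusion, i.e. a pure functional-connectivity change.
target = 0.2 * occlusion + (0.1 + 0.8 * occlusion) * seed_ts \
         + 0.1 * rng.normal(size=n)
# GLM with the PPI interaction regressor (condition x seed) as last column.
design = np.column_stack([np.ones(n), occlusion, seed_ts,
                          occlusion * seed_ts])
beta, *_ = np.linalg.lstsq(design, target, rcond=None)
```

A reliably positive weight on the interaction column (here, the coupling increase of 0.8 that the simulation built in) is the statistical signature of occlusion-enhanced functional connectivity.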
Collapse
Affiliation(s)
| | | | | | | | | | | | | | | | - Li Tong
- Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China; (B.L.); (C.Z.); (T.L.)
| |
Collapse
|
41
|
Yao M, Wen B, Yang M, Guo J, Jiang H, Feng C, Cao Y, He H, Chang L. High-dimensional topographic organization of visual features in the primate temporal lobe. Nat Commun 2023; 14:5931. [PMID: 37739988 PMCID: PMC10517140 DOI: 10.1038/s41467-023-41584-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Accepted: 09/07/2023] [Indexed: 09/24/2023] Open
Abstract
The inferotemporal cortex supports our supreme object recognition ability. Numerous studies have been conducted to elucidate the functional organization of this brain area, but there are still important questions that remain unanswered, including how this organization differs between humans and non-human primates. Here, we use deep neural networks trained on object categorization to construct a 25-dimensional space of visual features, and systematically measure the spatial organization of feature preference in both male monkey brains and human brains using fMRI. These feature maps allow us to predict the selectivity of a previously unknown region in monkey brains, which is corroborated by additional fMRI and electrophysiology experiments. These maps also enable quantitative analyses of the topographic organization of the temporal lobe, demonstrating the existence of a pair of orthogonal gradients that differ in spatial scale and revealing significant differences in the functional organization of high-level visual areas between monkey and human brains.
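The low-dimensional feature space the study builds from DNN activations can be sketched with plain PCA via SVD. This is a generic stand-in under our own assumptions, not the authors' exact construction of the 25-dimensional object space.

```python
import numpy as np

def build_feature_space(acts, k):
    # PCA via SVD of the centered unit-activation matrix: the top k right
    # singular vectors serve as feature axes, and each stimulus gets a
    # k-dimensional coordinate along them.
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:k]
    return centered @ axes.T, axes
```

Regressing voxel responses onto such coordinates, separately per cortical location, yields the feature-preference maps whose spatial gradients the abstract analyzes.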
Collapse
Affiliation(s)
- Mengna Yao
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Bincheng Wen
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
| | - Mingpo Yang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Jiebin Guo
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Haozhou Jiang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Chao Feng
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Yilei Cao
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Huiguang He
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
| | - Le Chang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
| |
Collapse
|
42
|
Abstract
Perception and memory are traditionally thought of as separate cognitive functions, supported by distinct brain regions. The canonical perspective is that perceptual processing of visual information is supported by the ventral visual stream, whereas long-term declarative memory is supported by the medial temporal lobe. However, this modular framework cannot account for the increasingly large body of evidence that reveals a role for early visual areas in long-term recognition memory and a role for medial temporal lobe structures in high-level perceptual processing. In this article, we review relevant research conducted in humans, nonhuman primates, and rodents. We conclude that the evidence is largely inconsistent with theoretical proposals that draw sharp functional boundaries between perceptual and memory systems in the brain. Instead, the weight of the empirical findings is best captured by a representational-hierarchical model that emphasizes differences in content, rather than in cognitive processes within the ventral visual stream and medial temporal lobe.
Collapse
Affiliation(s)
- Chris B Martin
- Department of Psychology, Florida State University, Tallahassee, Florida, USA;
| | - Morgan D Barense
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada;
- Rotman Research Institute, Baycrest Hospital, Toronto, Ontario, Canada
| |
Collapse
|
43
|
Veerabadran V, Goldman J, Shankar S, Cheung B, Papernot N, Kurakin A, Goodfellow I, Shlens J, Sohl-Dickstein J, Mozer MC, Elsayed GF. Subtle adversarial image manipulations influence both human and machine perception. Nat Commun 2023; 14:4933. [PMID: 37582834 PMCID: PMC10427626 DOI: 10.1038/s41467-023-40499-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Accepted: 08/01/2023] [Indexed: 08/17/2023] Open
Abstract
Although artificial neural networks (ANNs) were inspired by the brain, ANNs exhibit a brittleness not generally observed in human perception. One shortcoming of ANNs is their susceptibility to adversarial perturbations-subtle modulations of natural images that result in changes to classification decisions, such as confidently mislabelling an image of an elephant, initially classified correctly, as a clock. In contrast, a human observer might well dismiss the perturbations as an innocuous imaging artifact. This phenomenon may point to a fundamental difference between human and machine perception, but it drives one to ask whether human sensitivity to adversarial perturbations might be revealed with appropriate behavioral measures. Here, we find that adversarial perturbations that fool ANNs similarly bias human choice. We further show that the effect is more likely driven by higher-order statistics of natural images to which both humans and ANNs are sensitive, rather than by the detailed architecture of the ANN.
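The kind of adversarial perturbation discussed here can be illustrated with the classic fast-gradient-sign construction on the simplest differentiable classifier, a one-layer logistic model. This toy (with hand-picked weights and input) is our own illustrative assumption, not the perturbation procedure or models used in the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    # Fast-gradient-sign attack on a one-layer logistic "network": move
    # every input dimension by eps in the direction that increases the
    # cross-entropy loss for the true label y.
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w            # gradient of the loss w.r.t. the input
    return x + eps * np.sign(grad_x)
```

A bounded per-dimension change (here eps = 0.5) suffices to flip the model's decision, the machine-side brittleness whose human-side counterpart the paper probes behaviorally.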
Collapse
Affiliation(s)
- Vijay Veerabadran
- Google, Mountain View, CA, USA
- Department of Cognitive Science, University of California, San Diego, CA, USA
| | | | - Shreya Shankar
- Google, Mountain View, CA, USA
- University of California, Berkeley, CA, USA
| | - Brian Cheung
- Google, Mountain View, CA, USA
- MIT Brain and Cognitive Sciences, Cambridge, MA, USA
| | | | | | | | | | | | | | | |
Collapse
|
44
|
Baek S, Park Y, Paik SB. Species-specific wiring of cortical circuits for small-world networks in the primary visual cortex. PLoS Comput Biol 2023; 19:e1011343. [PMID: 37540638 PMCID: PMC10403141 DOI: 10.1371/journal.pcbi.1011343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Accepted: 07/10/2023] [Indexed: 08/06/2023] Open
Abstract
Long-range horizontal connections (LRCs) are conspicuous anatomical structures in the primary visual cortex (V1) of mammals, yet their detailed functions in relation to visual processing are not fully understood. Here, we show that LRCs are key components to organize a "small-world network" optimized for the size of the visual cortex, enabling the cost-efficient integration of visual information. Using computational simulations of a biologically inspired model neural network, we found that sparse LRCs added to networks, combined with dense local connections, compose a small-world network and significantly enhance image classification performance. We confirmed that the performance of the network appeared to be strongly correlated with the small-world coefficient of the model network under various conditions. Our theoretical model demonstrates that the number of LRCs needed to build a small-world network depends on the size of the cortex and that LRCs are beneficial only when the size of the network exceeds a certain threshold. Our model simulation of various sizes of cortices validates this prediction and provides an explanation of the species-specific existence of LRCs in animal data. Our results provide insight into a biological strategy of the brain to balance functional performance and resource cost.
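The small-world property invoked here, high local clustering combined with short path lengths once a few sparse long-range connections are added, can be computed directly on a toy graph. The following self-contained sketch (ring-lattice "local connectivity" plus random shortcuts, our own illustrative setup) measures both ingredients of the small-world coefficient without any graph library.

```python
import numpy as np
from collections import deque

def ring_lattice(n, k):
    # Dense local connectivity: each node links to its k nearest
    # neighbours on each side of a ring.
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for d in range(1, k + 1):
            A[i, (i + d) % n] = A[(i + d) % n, i] = True
    return A

def avg_path_length(A):
    # Mean shortest-path length over all ordered node pairs, via BFS.
    n = len(A)
    total = 0
    for s in range(n):
        dist = np.full(n, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u]):
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += int(dist.sum())
    return total / (n * (n - 1))

def clustering_coefficient(A):
    # Mean fraction of each node's neighbour pairs that are themselves linked.
    vals = []
    for i in range(len(A)):
        nb = np.flatnonzero(A[i])
        if nb.size < 2:
            vals.append(0.0)
            continue
        links = A[np.ix_(nb, nb)].sum() / 2.0
        vals.append(links / (nb.size * (nb.size - 1) / 2.0))
    return float(np.mean(vals))
```

Sprinkling a handful of long-range connections onto the lattice sharply shortens paths while leaving clustering nearly intact, which is exactly the regime in which the paper reports classification benefits.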
Collapse
Affiliation(s)
- Seungdae Baek
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Youngjin Park
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Se-Bum Paik
- Department of Brain and Cognitive Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| |
Collapse
|
45
|
Pierzchlewicz PA, Willeke KF, Nix AF, Elumalai P, Restivo K, Shinn T, Nealley C, Rodriguez G, Patel S, Franke K, Tolias AS, Sinz FH. Energy Guided Diffusion for Generating Neurally Exciting Images. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.18.541176. [PMID: 37292670 PMCID: PMC10245650 DOI: 10.1101/2023.05.18.541176] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In recent years, most exciting inputs (MEIs) synthesized from encoding models of neuronal activity have become an established method to study tuning properties of biological and artificial visual systems. However, as we move up the visual hierarchy, the complexity of neuronal computations increases. Consequently, it becomes more challenging to model neuronal activity, requiring more complex models. In this study, we introduce a new attention readout for a convolutional data-driven core for neurons in macaque V4 that outperforms the state-of-the-art task-driven ResNet model in predicting neuronal responses. However, as the predictive network becomes deeper and more complex, synthesizing MEIs via straightforward gradient ascent (GA) can struggle to produce qualitatively good results and overfit to idiosyncrasies of a more complex model, potentially decreasing the MEI's model-to-brain transferability. To solve this problem, we propose a diffusion-based method for generating MEIs via Energy Guidance (EGG). We show that for models of macaque V4, EGG generates single neuron MEIs that generalize better across architectures than the state-of-the-art GA while preserving within-architecture activation and requiring 4.7x less compute time. Furthermore, EGG diffusion can be used to generate other neurally exciting images, like most exciting natural images that are on par with a selection of highly activating natural images, or image reconstructions that generalize better across architectures. Finally, EGG is simple to implement, requires no retraining of the diffusion model, and can easily be generalized to provide other characterizations of the visual system, such as invariances. Thus EGG provides a general and flexible framework to study coding properties of the visual system in the context of natural images.
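The gradient-ascent baseline that EGG is compared against can be illustrated on a toy "model neuron" whose preferred stimulus is known by construction. This sketch is our own simplification (a quadratic response function, not the authors' V4 encoding model or the diffusion-based EGG method): repeatedly stepping the input along the response gradient recovers the neuron's most exciting input.

```python
import numpy as np

def toy_response(x, w):
    # Toy model neuron: a linear drive minus an energy penalty, which is
    # maximized exactly when the input equals the preferred template w.
    return float(w @ x - 0.5 * x @ x)

def mei_gradient_ascent(w, steps=300, lr=0.1, seed=0):
    # Straightforward gradient ascent in input ("pixel") space: the
    # baseline MEI synthesis strategy that EGG improves upon.
    x = np.random.default_rng(seed).normal(size=w.shape)
    for _ in range(steps):
        grad = w - x                # gradient of toy_response w.r.t. x
        x = x + lr * grad
    return x
```

For a deep, complex predictive model the same loop tends to exploit model idiosyncrasies, which is the failure mode motivating the diffusion-prior guidance in EGG.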
Collapse
Affiliation(s)
- Paweł A Pierzchlewicz
- Institute for Bioinformatics and Medical Informatics, Tübingen University, Tübingen, Germany
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
| | - Konstantin F Willeke
- Institute for Bioinformatics and Medical Informatics, Tübingen University, Tübingen, Germany
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
| | - Arne F Nix
- Institute for Bioinformatics and Medical Informatics, Tübingen University, Tübingen, Germany
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
| | - Pavithra Elumalai
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
| | - Kelli Restivo
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
| | - Tori Shinn
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
| | - Cate Nealley
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
| | - Gabrielle Rodriguez
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
| | - Saumil Patel
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
| | - Katrin Franke
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
| | - Andreas S Tolias
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
| | - Fabian H Sinz
- Institute for Bioinformatics and Medical Informatics, Tübingen University, Tübingen, Germany
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
| |
Collapse
|
46
|
Pennington JR, David SV. A convolutional neural network provides a generalizable model of natural sound coding by neural populations in auditory cortex. PLoS Comput Biol 2023; 19:e1011110. [PMID: 37146065 DOI: 10.1371/journal.pcbi.1011110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2022] [Revised: 05/17/2023] [Accepted: 04/17/2023] [Indexed: 05/07/2023] Open
Abstract
Convolutional neural networks (CNNs) can provide powerful and flexible models of neural sensory processing. However, the utility of CNNs in studying the auditory system has been limited by their requirement for large datasets and the complex response properties of single auditory neurons. To address these limitations, we developed a population encoding model: a CNN that simultaneously predicts activity of several hundred neurons recorded during presentation of a large set of natural sounds. This approach defines a shared spectro-temporal space and pools statistical power across neurons. Population models of varying architecture performed consistently and substantially better than traditional linear-nonlinear models on data from primary and non-primary auditory cortex. Moreover, population models were highly generalizable. The output layer of a model pre-trained on one population of neurons could be fit to data from novel single units, achieving performance equivalent to that of neurons in the original fit data. This ability to generalize suggests that population encoding models capture a complete representational space across neurons in an auditory cortical field.
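A minimal sketch of the population-model idea described above: a shared "core" maps a spectrogram frame into a common spectro-temporal feature space, and each neuron gets only a small per-neuron readout on top of it. The shapes, names, and the one-layer linear-relu core are illustrative assumptions, not the authors' CNN architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

n_freq, n_feat, n_neurons = 32, 8, 100
# Shared core: one set of weights pooled across all recorded neurons.
W_core = rng.normal(size=(n_feat, n_freq)) / np.sqrt(n_freq)
# Output layer: one small readout row per neuron.
W_read = rng.normal(size=(n_neurons, n_feat)) / np.sqrt(n_feat)

def predict_rates(frame):
    features = np.maximum(W_core @ frame, 0.0)   # shared nonlinear feature space
    return np.log1p(np.exp(W_read @ features))   # softplus keeps rates positive

frame = rng.normal(size=n_freq)
rates = predict_rates(frame)
```

The generalization result in the abstract corresponds to freezing `W_core` and estimating only a new row of `W_read` (here, 8 parameters) for a novel single unit, which is why a pre-trained population model can be fit to new neurons from comparatively little data.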
Collapse
Affiliation(s)
- Jacob R Pennington
- Washington State University, Vancouver, Washington, United States of America
| | - Stephen V David
- Oregon Hearing Research Center, Oregon Health and Science University, Oregon, United States of America
| |
Collapse
|
47
|
Akbarinia A, Morgenstern Y, Gegenfurtner KR. Contrast sensitivity function in deep networks. Neural Netw 2023; 164:228-244. [PMID: 37156217 DOI: 10.1016/j.neunet.2023.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2023] [Revised: 03/14/2023] [Accepted: 04/18/2023] [Indexed: 05/10/2023]
Abstract
The contrast sensitivity function (CSF) is a fundamental signature of the visual system that has been measured extensively in several species. It is defined by the visibility threshold for sinusoidal gratings at all spatial frequencies. Here, we investigated the CSF in deep neural networks using the same 2AFC contrast detection paradigm as in human psychophysics. We examined 240 networks pretrained on several tasks. To obtain their corresponding CSFs, we trained a linear classifier on top of the features extracted from frozen pretrained networks. The linear classifier is trained exclusively on a contrast discrimination task with natural images: it has to find which of two input images has higher contrast. The network's CSF is then measured by detecting which of two images contains a sinusoidal grating of varying orientation and spatial frequency. Our results demonstrate that characteristics of the human CSF are manifested in deep networks both in the luminance channel (a band-limited inverted U-shaped function) and in the chromatic channels (two low-pass functions with similar properties). The exact shape of the networks' CSF appears to be task-dependent. The human CSF is better captured by networks trained on low-level visual tasks such as image denoising or autoencoding. However, a human-like CSF also emerges in mid- and high-level tasks such as edge detection and object recognition. Our analysis shows that a human-like CSF appears in all architectures but at different depths of processing: in some at early layers, in others at intermediate and final layers. Overall, these results suggest that (i) deep networks model the human CSF faithfully, making them suitable candidates for applications in image quality and compression; (ii) efficient, purposeful processing of the natural world drives the CSF shape; and (iii) visual representations from all levels of the visual hierarchy contribute to the tuning curve of the CSF, in turn implying that a function we intuitively think of as modulated by low-level visual features may arise as a consequence of pooling from a larger set of neurons at all levels of the visual system.
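A toy sketch of the 2AFC contrast-detection procedure described above, estimating one point on a CSF. A noisy matched-filter detector stands in for "frozen network features + linear classifier"; the grating size, internal-noise level, and 75%-correct criterion are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

SIZE, PX_PER_DEG = 64, 32  # 64 samples spanning 2 degrees of visual angle

def grating_row(freq_cpd, contrast):
    # One row of a horizontal sinusoidal grating around mean luminance 0.5.
    x = np.arange(SIZE) / PX_PER_DEG
    return 0.5 + 0.5 * contrast * np.sin(2 * np.pi * freq_cpd * x)

def detector(row, freq_cpd, noise_sd=0.02):
    # Matched-filter response at the probed frequency, plus internal noise.
    x = np.arange(SIZE) / PX_PER_DEG
    template = np.sin(2 * np.pi * freq_cpd * x)
    return np.mean((row - 0.5) * template) + rng.normal(0.0, noise_sd)

def percent_correct(freq_cpd, contrast, n_trials=200):
    # 2AFC: the interval containing the grating should give the larger response.
    hits = sum(
        detector(grating_row(freq_cpd, contrast), freq_cpd)
        > detector(grating_row(freq_cpd, 0.0), freq_cpd)
        for _ in range(n_trials)
    )
    return hits / n_trials

def contrast_threshold(freq_cpd, criterion=0.75):
    # Sweep contrast upward; threshold = first contrast reaching criterion.
    for c in np.geomspace(1e-3, 1.0, 25):
        if percent_correct(freq_cpd, c) >= criterion:
            return c
    return 1.0

thr = contrast_threshold(4.0)   # threshold at 4 cycles/degree
sensitivity = 1.0 / thr         # one point on the CSF
```

Repeating `contrast_threshold` across spatial frequencies (and in the luminance vs. chromatic channels) traces out the full CSF curve the abstract refers to.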
Collapse
Affiliation(s)
- Arash Akbarinia
- Department of Experimental Psychology, University of Giessen, Germany.
| | - Yaniv Morgenstern
- Department of Experimental Psychology, University of Giessen, Germany; Faculty of Psychology and Educational Sciences, KU Leuven, Belgium
| | - Karl R Gegenfurtner
- Department of Experimental Psychology, University of Giessen, Germany
| |
Collapse
|
48
|
Beguš G, Zhou A, Zhao TC. Encoding of speech in convolutional layers and the brain stem based on language experience. Sci Rep 2023; 13:6480. [PMID: 37081119 PMCID: PMC10119295 DOI: 10.1038/s41598-023-33384-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2022] [Accepted: 04/12/2023] [Indexed: 04/22/2023] Open
Abstract
Comparing artificial neural networks with outputs of neuroimaging techniques has recently seen substantial advances in (computer) vision and text-based language models. Here, we propose a framework to compare biological and artificial neural computations of spoken language representations and propose several new challenges to this paradigm. The proposed technique is based on a principle similar to the one that underlies electroencephalography (EEG): averaging of neural (artificial or biological) activity across neurons in the time domain. It allows the encoding of any acoustic property to be compared between the brain and intermediate convolutional layers of an artificial neural network. Our approach allows a direct comparison of responses to a phonetic property in the brain and in deep neural networks that requires no linear transformations between the signals. We argue that the complex auditory brainstem response (cABR) and the response in intermediate convolutional layers to the exact same stimulus are highly similar without applying any transformations, and we quantify this observation. The proposed technique not only reveals similarities but also allows for analysis of the encoding of actual acoustic properties in the two signals: we compare peak latency (i) in the cABR relative to the stimulus and (ii) in intermediate convolutional layers relative to the input/output. We also examine and compare the effect of prior language exposure on the peak latency in the cABR and in intermediate convolutional layers. Substantial similarities in peak latency encoding between the human brain and intermediate convolutional layers emerge based on results from eight trained networks (including a replication experiment). The proposed technique can be used to compare encoding between the human brain and intermediate convolutional layers for any acoustic property and for other neuroimaging techniques.
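A minimal sketch of the averaging principle described above: EEG reflects electrical activity summed over many neurons, so an analogous "artificial EEG" can be read out of a convolutional layer by averaging its channel activations over the channel dimension, leaving one time series whose peak latency can then be compared with the cABR. The layer activations below are synthetic toy data, not outputs of the authors' trained networks.

```python
import numpy as np

rng = np.random.default_rng(3)

T, N_CHANNELS, ONSET, TRUE_LAG = 200, 64, 50, 12
t = np.arange(T)

# Hypothetical convolutional-layer activations (channels x time): each
# channel responds ~TRUE_LAG samples after stimulus onset, with small
# per-channel jitter and additive noise.
acts = np.stack([
    np.exp(-0.5 * ((t - (ONSET + TRUE_LAG + rng.integers(-2, 3))) / 4.0) ** 2)
    + 0.1 * rng.normal(size=T)
    for _ in range(N_CHANNELS)
])

# "Artificial EEG": average across the channel dimension, in the time domain.
artificial_eeg = acts.mean(axis=0)

# Peak latency relative to stimulus onset, the quantity compared with the
# cABR's peak latency in the framework above.
peak_latency = int(np.argmax(artificial_eeg)) - ONSET
```

Note that averaging suppresses the per-channel noise (roughly by the square root of the channel count) while preserving the shared response timing, which is what makes the latency of the averaged trace a stable quantity to compare across signals.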
Collapse
Affiliation(s)
- Gašper Beguš
- Department of Linguistics, University of California, Berkeley, USA.
| | - Alan Zhou
- Department of Cognitive Science, Johns Hopkins University, Baltimore, USA
| | - T Christina Zhao
- Institute for Learning and Brain Sciences, University of Washington, Seattle, USA
- Department of Speech and Hearing Sciences, University of Washington, Seattle, USA
| |
Collapse
|
49
|
Gombolay GY, Gopalan N, Bernasconi A, Nabbout R, Megerian JT, Siegel B, Hallman-Cooper J, Bhalla S, Gombolay MC. Review of Machine Learning and Artificial Intelligence (ML/AI) for the Pediatric Neurologist. Pediatr Neurol 2023; 141:42-51. [PMID: 36773406 PMCID: PMC10040433 DOI: 10.1016/j.pediatrneurol.2023.01.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/22/2022] [Revised: 01/03/2023] [Accepted: 01/09/2023] [Indexed: 01/15/2023]
Abstract
Artificial intelligence (AI) and a popular branch of AI known as machine learning (ML) are increasingly being utilized in medicine and to inform medical research. This review provides an overview of AI and ML (AI/ML), including definitions of common terms. We discuss the history of AI and provide instances of how AI/ML can be applied to pediatric neurology. Examples include imaging in neuro-oncology, autism diagnosis, diagnosis from charts, epilepsy, cerebral palsy, and neonatal neurology. Topics such as supervised learning, unsupervised learning, and reinforcement learning are discussed.
Collapse
Affiliation(s)
- Grace Y Gombolay
- Division of Neurology, Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia; Division of Pediatric Neurology, Children's Healthcare of Atlanta, Atlanta, Georgia.
| | - Nakul Gopalan
- Georgia Institute of Technology, Interactive Computing, Atlanta, Georgia
| | - Andrea Bernasconi
- Neuroimaging of Epilepsy Laboratory, McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University, Montreal, Canada
| | - Rima Nabbout
- Department of Pediatric Neurology, Necker Enfants Malades Hospital, Reference Centre for Rare Epilepsies and Member of the ERN EpiCARE, Imagine Institute UMR1163, Paris Descartes University, Paris, France
| | - Jonathan T Megerian
- Department of Pediatrics, CHOC Children's, University of California, Irvine School of Medicine, Orange, California
| | - Benjamin Siegel
- Division of Neurology, Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia; Division of Pediatric Neurology, Children's Healthcare of Atlanta, Atlanta, Georgia
| | - Jamika Hallman-Cooper
- Division of Neurology, Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia; Division of Pediatric Neurology, Children's Healthcare of Atlanta, Atlanta, Georgia
| | - Sonam Bhalla
- Division of Neurology, Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia; Division of Pediatric Neurology, Children's Healthcare of Atlanta, Atlanta, Georgia
| | - Matthew C Gombolay
- Georgia Institute of Technology, Interactive Computing, Atlanta, Georgia
| |
Collapse
|
50
|
Frisby SL, Halai AD, Cox CR, Lambon Ralph MA, Rogers TT. Decoding semantic representations in mind and brain. Trends Cogn Sci 2023; 27:258-281. [PMID: 36631371 DOI: 10.1016/j.tics.2022.12.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2022] [Revised: 12/12/2022] [Accepted: 12/13/2022] [Indexed: 01/11/2023]
Abstract
A key goal for cognitive neuroscience is to understand the neurocognitive systems that support semantic memory. Recent multivariate analyses of neuroimaging data have contributed greatly to this effort, but the rapid development of these novel approaches has made it difficult to track the diversity of findings and to understand how and why they sometimes lead to contradictory conclusions. We address this challenge by reviewing cognitive theories of semantic representation and their neural instantiation. We then consider contemporary approaches to neural decoding and assess which types of representation each can possibly detect. The analysis suggests why the results are heterogeneous and identifies crucial links between cognitive theory, data collection, and analysis that can help to better connect neuroimaging to mechanistic theories of semantic cognition.
Collapse
Affiliation(s)
- Saskia L Frisby
- Medical Research Council (MRC) Cognition and Brain Sciences Unit, Chaucer Road, Cambridge CB2 7EF, UK.
| | - Ajay D Halai
- Medical Research Council (MRC) Cognition and Brain Sciences Unit, Chaucer Road, Cambridge CB2 7EF, UK
| | - Christopher R Cox
- Department of Psychology, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Matthew A Lambon Ralph
- Medical Research Council (MRC) Cognition and Brain Sciences Unit, Chaucer Road, Cambridge CB2 7EF, UK
| | - Timothy T Rogers
- Department of Psychology, University of Wisconsin-Madison, 1202 West Johnson Street, Madison, WI 53706, USA.
| |
Collapse
|