1. Cheng S. Distinct mechanisms and functions of episodic memory. Philos Trans R Soc Lond B Biol Sci 2024; 379:20230411. PMID: 39278239. DOI: 10.1098/rstb.2023.0411.
Abstract
The concept of episodic memory (EM) is challenged by two claims: that EM might not be a distinct memory system, and that EM might be an epiphenomenon of a more general capacity for mental time travel (MTT). Nevertheless, the observations behind these arguments do not preclude the existence of a mechanistically and functionally distinct EM system. First, modular systems, like cognition, can have distinct subsystems that may not be distinguishable in the system's final output. EM could be such a subsystem, even though its effects may be difficult to distinguish from those of other subsystems. Second, EM could have a distinct and consistent low-level function that is used in diverse high-level functions such as MTT. This article introduces the scenario construction framework, proposing that EM crucially rests on memory traces containing the gist of an episodic experience. During retrieval, EM traces trigger the reconstruction of the semantic representations that were active during the remembered episode; these are further enriched with semantic information to generate a scenario of the past experience. This conceptualization of EM is consistent with studies on the neural basis of EM and resolves the two challenges while retaining the key properties associated with EM. This article is part of the theme issue 'Elements of episodic memory: lessons from 40 years of research'.
Affiliation(s)
- Sen Cheng
- Institute for Neural Computation Faculty of Computer Science, Ruhr University Bochum , Bochum 44780, Germany
2. Chow JK, Palmeri TJ. Manipulating and measuring variation in deep neural network (DNN) representations of objects. Cognition 2024; 252:105920. PMID: 39163818. DOI: 10.1016/j.cognition.2024.105920.
Abstract
We explore how DNNs can be used to develop a computational understanding of individual differences in high-level visual cognition, given their ability to generate rich, meaningful object representations informed by their architecture, experience, and training protocols. As a first step toward quantifying individual differences in DNN representations, we systematically explored the robustness of a variety of representational similarity measures: Representational Similarity Analysis (RSA), Centered Kernel Alignment (CKA), and Projection-Weighted Canonical Correlation Analysis (PWCCA), with an eye to how these measures are used in cognitive science, cognitive neuroscience, and vision science. To manipulate object representations, we next created a large set of models varying in random initial weights and random training image order, training image frequencies, training category frequencies, and model size and architecture, and measured the representational variation caused by each manipulation. We examined both small (All-CNN-C) and commonly used large (VGG and ResNet) DNN architectures. To provide a comparison for the magnitude of representational differences, we established a baseline based on the representational variation caused by the image-augmentation techniques used to train those DNNs. We found that variation due to model randomization and model size never exceeded baseline. By contrast, differences in training image frequency and training category frequency caused representational variation that exceeded baseline, with training category frequency manipulations exceeding baseline earlier in the networks. These findings provide insight into the magnitude of representational variation that can be expected with a range of manipulations and provide a springboard for further exploration of systematic model variations aimed at modeling individual differences in high-level visual cognition.
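For readers unfamiliar with the first of these measures, RSA compares two systems at the level of pairwise stimulus geometry: each system is summarized by a representational dissimilarity matrix (RDM), and the RDMs are then correlated. A minimal, generic sketch follows — not the authors' code; Pearson correlation is used throughout for simplicity, where Spearman rank correlation is also common in practice, and the toy activation values are made up.

```python
# Minimal RSA sketch: build RDMs from two sets of activation vectors
# (one vector per stimulus, assumed non-constant) and correlate their
# upper triangles. Illustrative only, not the study's implementation.
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def rdm(acts):
    """Representational dissimilarity matrix: 1 - r for every stimulus pair."""
    n = len(acts)
    return [[1.0 - pearson(acts[i], acts[j]) for j in range(n)] for i in range(n)]

def rsa(acts_a, acts_b):
    """Second-order similarity: correlation between the two RDMs' upper triangles."""
    ut = lambda m: [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]
    return pearson(ut(rdm(acts_a)), ut(rdm(acts_b)))

# Four toy stimuli, three units per model; identical models give RSA = 1.
acts = [[0.2, 1.1, 3.0], [1.4, 0.3, 2.2], [2.0, 2.1, 0.5], [0.1, 0.2, 0.9]]
print(round(rsa(acts, acts), 6))  # 1.0
```

Because RSA operates on the dissimilarity structure rather than raw activations, it is insensitive to rotations and unit reordering, which is one reason it is popular for comparing representations across models and brains.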
Affiliation(s)
- Jason K Chow
- Department of Psychology, Vanderbilt University, USA.
3. Rafiei F, Shekhar M, Rahnev D. The neural network RTNet exhibits the signatures of human perceptual decision-making. Nat Hum Behav 2024; 8:1752-1770. PMID: 38997452. DOI: 10.1038/s41562-024-01914-8.
Abstract
Convolutional neural networks show promise as models of biological vision. However, their decision behaviour, including the facts that they are deterministic and use equal numbers of computations for easy and difficult stimuli, differs markedly from human decision-making, thus limiting their applicability as models of human perceptual behaviour. Here we develop a new neural network, RTNet, that generates stochastic decisions and human-like response time (RT) distributions. We further performed comprehensive tests that showed RTNet reproduces all foundational features of human accuracy, RT and confidence and does so better than all current alternatives. To test RTNet's ability to predict human behaviour on novel images, we collected accuracy, RT and confidence data from 60 human participants performing a digit discrimination task. We found that the accuracy, RT and confidence produced by RTNet for individual novel images correlated with the same quantities produced by human participants. Critically, human participants who were more similar to the average human performance were also found to be closer to RTNet's predictions, suggesting that RTNet successfully captured average human behaviour. Overall, RTNet is a promising model of human RTs that exhibits the critical signatures of perceptual decision-making.
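The core idea behind a stochastic, RT-generating decision model can be caricatured as accumulation of noisy evidence to a threshold: harder stimuli carry weaker per-step evidence and therefore take more steps to decide. The sketch below is a generic accumulator, not RTNet itself (RTNet samples the weights of a Bayesian CNN and accumulates its class evidence across samples); all parameter values here are illustrative.

```python
# Toy accumulation-to-threshold model: each step adds noisy evidence to
# every class accumulator; the decision is the first class to reach the
# threshold, and the response time is the number of steps taken.
import random

def decide(signal, n_classes=10, target=0, threshold=15.0, noise=1.0, rng=None):
    """Return (choice, response_time). `signal` is the per-step evidence
    advantage of the target class; smaller signal = harder stimulus."""
    rng = rng or random.Random(0)
    acc = [0.0] * n_classes
    t = 0
    while max(acc) < threshold:
        t += 1
        for c in range(n_classes):
            drift = signal if c == target else 0.0
            acc[c] += drift + rng.gauss(0.0, noise)
    return acc.index(max(acc)), t

rng = random.Random(1)
easy = [decide(2.0, rng=rng)[1] for _ in range(200)]
hard = [decide(0.5, rng=rng)[1] for _ in range(200)]
# Easier stimuli are decided faster, reproducing the basic RT signature.
print(sum(easy) / len(easy) < sum(hard) / len(hard))
```

Because the evidence is sampled, repeated presentations of the same stimulus yield different choices and RTs — the two properties the abstract identifies as missing from standard deterministic CNNs.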
Affiliation(s)
- Farshad Rafiei
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA.
- Medha Shekhar
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA.
- Dobromir Rahnev
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA.
4. Hou G, Li R, Tian M, Ding J, Zhang X, Yang B, Chen C, Huang R, Yin Y. Improving Efficiency: Automatic Intelligent Weighing System as a Replacement for Manual Pig Weighing. Animals (Basel) 2024; 14:1614. PMID: 38891661. PMCID: PMC11171250. DOI: 10.3390/ani14111614.
Abstract
To verify the accuracy of the automatic intelligent weighing system (AIWS), we weighed 106 pen-housed growing-finishing pigs using both the manual and the AIWS methods. Accuracy was evaluated using MAE, MAPE, and RMSE. In the growth experiment, manual weighing was conducted every two weeks while AIWS-predicted weight data were recorded daily, and growth curves were then fitted. The MAE, MAPE, and RMSE values for 60 to 120 kg pigs were 3.48 kg, 3.71%, and 4.43 kg, respectively. The correlation coefficient r between the AIWS and manual methods was 0.9410 (R2 = 0.8854), a highly significant correlation (p < 0.001). In growth curve fitting, the AIWS method yielded lower AIC and BIC values than the manual method, and the logistic model fitted to the AIWS data was the best-fit model. The age and body weight at the inflection point of the best-fit model were 164.46 d and 93.45 kg, respectively, and the maximum growth rate was 831.66 g/d. In summary, AIWS can accurately predict pigs' body weights in actual production and fits the growth curves of growing-finishing pigs better than manual weighing. This study suggests that it is feasible for AIWS to replace manual weighing for 50 to 120 kg live pigs in large-scale farming.
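The three agreement metrics reported above (MAE, MAPE, RMSE) are standard; a short sketch with hypothetical weights — not the study's data — shows how each is computed against the manual weighings taken as ground truth.

```python
# Standard error metrics for comparing predicted weights against manual
# weighings. The example values below are made up for illustration.
import math

def mae(pred, true):
    """Mean absolute error, in the units of the measurements (kg here)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def mape(pred, true):
    """Mean absolute percentage error, relative to the true values."""
    return 100.0 * sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

manual = [80.0, 95.0, 110.0, 120.0]   # hypothetical manual weighings, kg
aiws   = [82.5, 93.0, 114.0, 118.0]   # hypothetical AIWS predictions, kg
print(mae(aiws, manual), round(mape(aiws, manual), 2), round(rmse(aiws, manual), 2))
# 2.625 2.63 2.75
```

RMSE is never smaller than MAE for the same data, so reporting both (as the study does) indicates how much of the error comes from occasional large misses versus uniform small ones.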
Affiliation(s)
- Gaifeng Hou
- CAS Key Laboratory of Agro-Ecological Processes in Subtropical Region, Hunan Provincial Key Laboratory of Animal Nutritional Physiology and Metabolic Process, Hunan Research Center of Livestock and Poultry Sciences, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, National Engineering Laboratory for Poultry Breeding Pollution Control and Resource Technology, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China; (G.H.); (R.L.); (M.T.); (J.D.)
- Rui Li
- CAS Key Laboratory of Agro-Ecological Processes in Subtropical Region, Hunan Provincial Key Laboratory of Animal Nutritional Physiology and Metabolic Process, Hunan Research Center of Livestock and Poultry Sciences, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, National Engineering Laboratory for Poultry Breeding Pollution Control and Resource Technology, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China; (G.H.); (R.L.); (M.T.); (J.D.)
- Mingzhou Tian
- CAS Key Laboratory of Agro-Ecological Processes in Subtropical Region, Hunan Provincial Key Laboratory of Animal Nutritional Physiology and Metabolic Process, Hunan Research Center of Livestock and Poultry Sciences, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, National Engineering Laboratory for Poultry Breeding Pollution Control and Resource Technology, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China; (G.H.); (R.L.); (M.T.); (J.D.)
- Jing Ding
- CAS Key Laboratory of Agro-Ecological Processes in Subtropical Region, Hunan Provincial Key Laboratory of Animal Nutritional Physiology and Metabolic Process, Hunan Research Center of Livestock and Poultry Sciences, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, National Engineering Laboratory for Poultry Breeding Pollution Control and Resource Technology, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China; (G.H.); (R.L.); (M.T.); (J.D.)
- Xingfu Zhang
- College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150050, China;
- Beijing Focused Loong Technology Co., Ltd., Beijing 100086, China
- Bin Yang
- Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, College of Electrical and Information Engineering, Hunan University, Changsha 410082, China;
- Chunyu Chen
- College of Information and Communication, Harbin Engineering University, Harbin 150001, China;
- Ruilin Huang
- CAS Key Laboratory of Agro-Ecological Processes in Subtropical Region, Hunan Provincial Key Laboratory of Animal Nutritional Physiology and Metabolic Process, Hunan Research Center of Livestock and Poultry Sciences, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, National Engineering Laboratory for Poultry Breeding Pollution Control and Resource Technology, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China; (G.H.); (R.L.); (M.T.); (J.D.)
- Yulong Yin
- CAS Key Laboratory of Agro-Ecological Processes in Subtropical Region, Hunan Provincial Key Laboratory of Animal Nutritional Physiology and Metabolic Process, Hunan Research Center of Livestock and Poultry Sciences, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, National Engineering Laboratory for Poultry Breeding Pollution Control and Resource Technology, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China; (G.H.); (R.L.); (M.T.); (J.D.)
5. Lu Z, Wang Y, Golomb JD. Achieving more human brain-like vision via human EEG representational alignment. arXiv 2024: arXiv:2401.17231v2. PMID: 38351926. PMCID: PMC10862929.
Abstract
Despite advancements in artificial intelligence, object recognition models still lag behind human brains in emulating visual information processing. Recent studies have highlighted the potential of using neural data to mimic brain processing; however, these often rely on invasive neural recordings from non-human subjects, leaving a critical gap in understanding human visual perception. Addressing this gap, we present, for the first time, 'Re(presentational)Al(ignment)net' (ReAlnet), a vision model aligned with human brain activity based on non-invasive EEG, demonstrating significantly higher similarity to human brain representations. Our image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers, enabling the model to efficiently learn and mimic the human brain's visual representational patterns across object categories and different modalities. Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision, paving the way for more brain-like artificial intelligence systems.
Affiliation(s)
- Zitong Lu
- Department of Psychology, The Ohio State University
- Yile Wang
- Department of Neuroscience, The University of Texas at Dallas
6. Lippl S, Peters B, Kriegeskorte N. Can neural networks benefit from objectives that encourage iterative convergent computations? A case study of ResNets and object classification. PLoS One 2024; 19:e0293440. PMID: 38512838. PMCID: PMC10956829. DOI: 10.1371/journal.pone.0293440.
Abstract
Recent work has suggested that feedforward residual neural networks (ResNets) approximate iterative recurrent computations. Iterative computations are useful in many domains, so they might provide good solutions for neural networks to learn. However, principled methods for measuring and manipulating iterative convergence in neural networks remain lacking. Here we address this gap by 1) quantifying the degree to which ResNets learn iterative solutions and 2) introducing a regularization approach that encourages the learning of iterative solutions. Iterative methods are characterized by two properties: iteration and convergence. To quantify these properties, we define three indices of iterative convergence. Consistent with previous work, we show that, even though ResNets can express iterative solutions, they do not learn them when trained conventionally on computer-vision tasks. We then introduce regularizations to encourage iterative convergent computation and test whether this provides a useful inductive bias. To make the networks more iterative, we manipulate the degree of weight sharing across layers using soft gradient coupling. This new method provides a form of recurrence regularization and can interpolate smoothly between an ordinary ResNet and a "recurrent" ResNet (i.e., one that uses identical weights across layers and thus could be physically implemented with a recurrent network computing the successive stages iteratively across time). To make the networks more convergent we impose a Lipschitz constraint on the residual functions using spectral normalization. The three indices of iterative convergence reveal that the gradient coupling and the Lipschitz constraint succeed at making the networks iterative and convergent, respectively. 
To showcase the practicality of our approach, we study how iterative convergence impacts generalization on standard visual recognition tasks (MNIST, CIFAR-10, CIFAR-100) and on challenging recognition tasks with partial occlusions (Digitclutter). We find that iterative convergent computation, in these tasks, does not provide a useful inductive bias for ResNets. Importantly, our approach may be useful for investigating other network architectures and tasks as well, and we hope that our study provides a useful starting point for investigating the broader question of whether iterative convergence can help neural networks generalize.
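The notion of iterative convergence can be made concrete by tracking the norms of successive residual updates: if repeated application of the same residual step shrinks the update, the computation is converging toward a fixed point. The toy sketch below is not the paper's exact indices; a hand-picked damping factor stands in for the spectral-norm (Lipschitz) constraint the authors impose.

```python
# Toy convergence check for a residual update x <- x + f(x): record the
# norm of each step. A contractive residual function (Lipschitz constant
# below 1, here enforced by a 0.5 damping factor) yields shrinking steps.
import math

def step_norms(f, x, n_steps=10):
    """Apply the residual update n_steps times; return each step's norm."""
    norms = []
    for _ in range(n_steps):
        dx = f(x)
        norms.append(math.sqrt(sum(d * d for d in dx)))
        x = [xi + di for xi, di in zip(x, dx)]
    return norms

# Residual map pulling x halfway toward a fixed point at [1, -2].
target = [1.0, -2.0]
contractive = lambda x: [0.5 * (t - xi) for xi, t in zip(x, target)]
norms = step_norms(contractive, [4.0, 3.0])
print(all(b < a for a, b in zip(norms, norms[1:])))  # True: steps shrink
```

A "recurrent" ResNet in the paper's sense shares one such residual function across layers, so monotonically shrinking step norms across depth are exactly what an iterative convergent solution would look like.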
Affiliation(s)
- Samuel Lippl
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States of America
- Department of Neuroscience, Columbia University, New York, NY, United States of America
- Center for Theoretical Neuroscience, Columbia University, New York, NY, United States of America
- Benjamin Peters
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States of America
- School of Psychology and Neuroscience, University of Glasgow, Glasgow, United Kingdom
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States of America
- Department of Neuroscience, Columbia University, New York, NY, United States of America
- Department of Psychology, Columbia University, New York, NY, United States of America
- Affiliated member, Electrical Engineering, Columbia University, New York, NY, United States of America
7. Lu Z, Ku Y. Bridging the gap between EEG and DCNNs reveals a fatigue mechanism of facial repetition suppression. iScience 2023; 26:108501. PMID: 38089588. PMCID: PMC10711494. DOI: 10.1016/j.isci.2023.108501.
Abstract
Facial repetition suppression, a well-studied phenomenon characterized by decreased neural responses to repeated faces in visual cortices, remains a subject of ongoing debate regarding its underlying neural mechanisms. Our research harnesses advanced multivariate analysis techniques and the face-recognition prowess of deep convolutional neural networks (DCNNs) to bridge the gap between human electroencephalogram (EEG) data and DCNNs in the context of facial repetition suppression. Using a reverse-engineering approach, we manipulated neuronal activity in DCNNs and compared the resulting representations with brain activations derived from human EEG, providing insight into the mechanisms underlying facial repetition suppression. Significantly, our findings advocate the fatigue mechanism as the dominant force behind the facial repetition suppression effect. Broadly, this integrative framework, bridging the human brain and DCNNs, offers a promising tool for simulating brain activity and making inferences about the neural mechanisms underpinning complex human behaviors.
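The fatigue account can be illustrated with a toy version of the kind of manipulation described above: on a repeated presentation, scale each unit's response down in proportion to how strongly it fired the first time, so the most active units are suppressed the most. This is illustrative only; the study's actual DCNN manipulation is more involved, and the activation values and fatigue parameter below are made up.

```python
# Toy fatigue model of repetition suppression: units that responded more
# strongly to the first presentation are attenuated more on the repeat.
def fatigued_response(first_response, beta=0.5):
    """Second-presentation activations under fatigue; beta in [0, 1] sets
    how strongly peak-normalized first responses suppress the repeat."""
    peak = max(first_response) or 1.0   # avoid division by zero if all-zero
    return [r * (1.0 - beta * r / peak) for r in first_response]

first = [0.9, 0.4, 0.1, 0.0]           # toy layer activations to a face
second = fatigued_response(first)
print([round(v, 3) for v in second])   # [0.45, 0.311, 0.094, 0.0]
```

The hallmark of this mechanism, visible even in the toy version, is that suppression scales with initial drive: the strongest unit drops by half while weakly driven units are barely affected, reshaping the population pattern rather than scaling it uniformly.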
Affiliation(s)
- Zitong Lu
- Department of Psychology, The Ohio State University, Columbus, OH, USA
- Yixuan Ku
- Guangdong Provincial Key Laboratory of Brain Function and Disease, Center for Brain and Mental Well-being, Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Peng Cheng Laboratory, Shenzhen, China
8. Khan S, Wong A, Tripp B. Modeling the Role of Contour Integration in Visual Inference. Neural Comput 2023; 36:33-74. PMID: 38052088. DOI: 10.1162/neco_a_01625.
Abstract
Under difficult viewing conditions, the brain's visual system uses a variety of recurrent modulatory mechanisms to augment feedforward processing. One resulting phenomenon is contour integration, which occurs in the primary visual (V1) cortex and strengthens neural responses to edges if they belong to a larger smooth contour. Computational models have contributed to an understanding of the circuit mechanisms of contour integration, but less is known about its role in visual perception. To address this gap, we embedded a biologically grounded model of contour integration in a task-driven artificial neural network and trained it using a gradient-descent variant. We used this model to explore how brain-like contour integration may be optimized for high-level visual objectives as well as its potential roles in perception. When the model was trained to detect contours in a background of random edges, a task commonly used to examine contour integration in the brain, it closely mirrored the brain in terms of behavior, neural responses, and lateral connection patterns. When trained on natural images, the model enhanced weaker contours and distinguished whether two points lay on the same versus different contours. The model learned robust features that generalized well to out-of-training-distribution stimuli. Surprisingly, and in contrast with the synthetic task, a parameter-matched control network without recurrence performed the same as or better than the model on the natural-image tasks. Thus, a contour integration mechanism is not essential to perform these more naturalistic contour-related tasks. Finally, the best performance in all tasks was achieved by a modified contour integration model that did not distinguish between excitatory and inhibitory neurons.
Affiliation(s)
- Salman Khan
- Centre for Theoretical Neuroscience, Department of System Design Engineering
- Vision and Image Processing Group, Department of System Design Engineering
- Waterloo Artificial Intelligence Institute: University of Waterloo, Waterloo, ON, Canada, N2L 3G1
- Alexander Wong
- Vision and Image Processing Group, Department of System Design Engineering
- Waterloo Artificial Intelligence Institute: University of Waterloo, Waterloo, ON, Canada, N2L 3G1
- Bryan Tripp
- Centre for Theoretical Neuroscience, Department of System Design Engineering
- Vision and Image Processing Group, Department of System Design Engineering
- Waterloo Artificial Intelligence Institute: University of Waterloo, Waterloo, ON, Canada, N2L 3G1
9. Golan T, Taylor J, Schütt H, Peters B, Sommers RP, Seeliger K, Doerig A, Linton P, Konkle T, van Gerven M, Kording K, Richards B, Kietzmann TC, Lindsay GW, Kriegeskorte N. Deep neural networks are not a single hypothesis but a language for expressing computational hypotheses. Behav Brain Sci 2023; 46:e392. PMID: 38054329. DOI: 10.1017/s0140525x23001553.
Abstract
An ideal vision model accounts for behavior and neurophysiology in both naturalistic conditions and designed lab experiments. Unlike psychological theories, artificial neural networks (ANNs) actually perform visual tasks and generate testable predictions for arbitrary inputs. These advantages enable ANNs to engage the entire spectrum of the evidence. Failures of particular models drive progress in a vibrant ANN research program of human vision.
Affiliation(s)
- Tal Golan
- Department of Cognitive and Brain Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- JohnMark Taylor
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Heiko Schütt
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Center for Neural Science, New York University, New York, NY, USA
- Benjamin Peters
- School of Psychology & Neuroscience, University of Glasgow, Glasgow, UK
- Rowan P Sommers
- Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Adrien Doerig
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Paul Linton
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Presidential Scholars in Society and Neuroscience, Center for Science and Society, Columbia University, New York, NY, USA
- Italian Academy for Advanced Studies in America, Columbia University, New York, NY, USA
- Talia Konkle
- Department of Psychology and Center for Brain Sciences, Harvard University, Cambridge, MA, USA
- Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- Konrad Kording
- Departments of Bioengineering and Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Blake Richards
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Mila, Montreal, QC, Canada
- School of Computer Science, McGill University, Montreal, QC, Canada
- Department of Neurology & Neurosurgery, McGill University, Montreal, QC, Canada
- Montreal Neurological Institute, Montreal, QC, Canada
- Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Grace W Lindsay
- Department of Psychology and Center for Data Science, New York University, New York, NY, USA
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Departments of Psychology, Neuroscience, and Electrical Engineering, Columbia University, New York, NY, USA
10. von Seth J, Nicholls VI, Tyler LK, Clarke A. Recurrent connectivity supports higher-level visual and semantic object representations in the brain. Commun Biol 2023; 6:1207. PMID: 38012301. PMCID: PMC10682037. DOI: 10.1038/s42003-023-05565-9.
Abstract
Visual object recognition has traditionally been conceptualised as a predominantly feedforward process through the ventral visual pathway. While feedforward artificial neural networks (ANNs) can achieve human-level classification on some image-labelling tasks, it is unclear whether computational models of vision alone can accurately capture the evolving spatiotemporal neural dynamics. Here, we probe these dynamics using a combination of representational similarity and connectivity analyses of fMRI and MEG data recorded during the recognition of familiar, unambiguous objects. Modelling the visual and semantic properties of our stimuli using an artificial neural network as well as a semantic feature model, we find that unique aspects of the neural architecture and connectivity dynamics relate to visual and semantic object properties. Critically, we show that recurrent processing between the anterior and posterior ventral temporal cortex relates to higher-level visual properties prior to semantic object properties, in addition to semantic-related feedback from the frontal lobe to the ventral temporal lobe between 250 and 500 ms after stimulus onset. These results demonstrate the distinct contributions made by semantic object properties in explaining neural activity and connectivity, highlighting them as a core part of object recognition not fully accounted for by current biologically inspired neural networks.
Affiliation(s)
- Jacqueline von Seth
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Lorraine K Tyler
- Department of Psychology, University of Cambridge, Cambridge, UK
- Cambridge Centre for Ageing and Neuroscience (Cam-CAN), University of Cambridge and MRC Cognition and Brain Sciences Unit, Cambridge, UK
- Alex Clarke
- Department of Psychology, University of Cambridge, Cambridge, UK.
11. Velarde OM, Makse HA, Parra LC. Architecture of the brain's visual system enhances network stability and performance through layers, delays, and feedback. PLoS Comput Biol 2023; 19:e1011078. PMID: 37948463. PMCID: PMC10664920. DOI: 10.1371/journal.pcbi.1011078.
Abstract
In the visual system of primates, image information propagates across successive cortical areas, and there is also local feedback within an area and long-range feedback across areas. Recent findings suggest that the resulting temporal dynamics of neural activity are crucial in several vision tasks. In contrast, artificial neural network models of vision are typically feedforward and do not capitalize on the benefits of temporal dynamics, partly due to concerns about stability and computational costs. In this study, we focus on recurrent networks with feedback connections for visual tasks with static input corresponding to a single fixation. We demonstrate mathematically that a network's dynamics can be stabilized by four key features of biological networks: layer-ordered structure, temporal delays between layers, longer distance feedback across layers, and nonlinear neuronal responses. Conversely, when feedback has a fixed distance, one can omit delays in feedforward connections to achieve more efficient artificial implementations. We also evaluated the effect of feedback connections on object detection and classification performance using standard benchmarks, specifically the COCO and CIFAR10 datasets. Our findings indicate that feedback connections improved the detection of small objects, and classification performance became more robust to noise. We found that performance increased with the temporal dynamics, not unlike what is observed in core vision of primates. These results suggest that delays and layered organization are crucial features for stability and performance in both biological and artificial recurrent neural networks.
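The stability question the authors analyze can be caricatured with a two-area linear loop: a feedforward drive into the first area, a one-step delay into the second, and feedback with gain g back to the first. For |g| < 1 activity settles to a fixed point; for |g| > 1 it grows without bound. This is a drastically simplified sketch, not the paper's model, and the gain values are arbitrary.

```python
# Two-area linear feedback loop: x1 receives constant drive plus feedback
# g * x2, while x2 is a one-step-delayed copy of x1. Stability hinges on g.
def simulate(g, steps=200, drive=1.0):
    """Run the loop and return the final magnitude of area-1 activity."""
    x1 = x2 = 0.0
    for _ in range(steps):
        x1, x2 = drive + g * x2, x1   # x2 lags x1 by one step (the delay)
    return abs(x1)

print(simulate(0.5) < 10.0)   # True: gain below 1 stays bounded (settles near 2)
print(simulate(1.2) > 1e6)    # True: gain above 1 diverges
```

Even this caricature shows why feedback demands care in artificial implementations: the nonlinear saturating responses and layered delays the paper identifies act, roughly, to keep the effective loop gain in the stable regime.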
Affiliation(s)
- Osvaldo Matias Velarde
- Biomedical Engineering Department, The City College of New York, New York, New York, United States of America
- Hernán A. Makse
- Levich Institute and Physics Department, The City College of New York, New York, New York, United States of America
- Lucas C. Parra
- Biomedical Engineering Department, The City College of New York, New York, New York, United States of America
12. Toosi T, Issa EB. Brain-like Flexible Visual Inference by Harnessing Feedback-Feedforward Alignment. arXiv 2023: arXiv:2310.20599v1. PMID: 37961740. PMCID: PMC10635293.
Abstract
In natural vision, feedback connections support versatile visual inference capabilities such as making sense of the occluded or noisy bottom-up sensory information or mediating pure top-down processes such as imagination. However, the mechanisms by which the feedback pathway learns to give rise to these capabilities flexibly are not clear. We propose that top-down effects emerge through alignment between feedforward and feedback pathways, each optimizing its own objectives. To achieve this co-optimization, we introduce Feedback-Feedforward Alignment (FFA), a learning algorithm that leverages feedback and feedforward pathways as mutual credit assignment computational graphs, enabling alignment. In our study, we demonstrate the effectiveness of FFA in co-optimizing classification and reconstruction tasks on widely used MNIST and CIFAR10 datasets. Notably, the alignment mechanism in FFA endows feedback connections with emergent visual inference functions, including denoising, resolving occlusions, hallucination, and imagination. Moreover, FFA offers bio-plausibility compared to traditional back-propagation (BP) methods in implementation. By repurposing the computational graph of credit assignment into a goal-driven feedback pathway, FFA alleviates weight transport problems encountered in BP, enhancing the bio-plausibility of the learning algorithm. Our study presents FFA as a promising proof-of-concept for the mechanisms underlying how feedback connections in the visual cortex support flexible visual functions. This work also contributes to the broader field of visual inference underlying perceptual phenomena and has implications for developing more biologically inspired learning algorithms.
Affiliation(s)
- Tahereh Toosi: Center for Theoretical Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY
- Elias B. Issa: Department of Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY
13. Berezutskaya J, Freudenburg ZV, Vansteensel MJ, Aarnoutse EJ, Ramsey NF, van Gerven MAJ. Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models. J Neural Eng 2023; 20:056010. [PMID: 37467739] [PMCID: PMC10510111] [DOI: 10.1088/1741-2552/ace8be]
Abstract
Objective. Development of brain-computer interface (BCI) technology is key for enabling communication in individuals who have lost the faculty of speech due to severe motor paralysis. A BCI control strategy that is gaining attention employs speech decoding from neural data. Recent studies have shown that a combination of direct neural recordings and advanced computational models can provide promising results. Understanding which decoding strategies deliver the best and most directly applicable results is crucial for advancing the field. Approach. In this paper, we optimized and validated a decoding approach based on speech reconstruction directly from high-density electrocorticography recordings from sensorimotor cortex during a speech production task. Main results. We show that (1) dedicated machine learning optimization of reconstruction models is key for achieving the best reconstruction performance; (2) individual word decoding in reconstructed speech achieves 92%-100% accuracy (chance level is 8%); (3) direct reconstruction from sensorimotor brain activity produces intelligible speech. Significance. These results underline the need for model optimization in achieving the best speech decoding results and highlight the potential that reconstruction-based speech decoding from sensorimotor cortex can offer for the development of next-generation BCI technology for communication.
Affiliation(s)
- Julia Berezutskaya: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands; Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands
- Zachary V Freudenburg: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Mariska J Vansteensel: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Erik J Aarnoutse: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Nick F Ramsey: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Marcel A J van Gerven: Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands
14. Pan X, DeForge A, Schwartz O. Generalizing biological surround suppression based on center surround similarity via deep neural network models. PLoS Comput Biol 2023; 19:e1011486. [PMID: 37738258] [PMCID: PMC10550176] [DOI: 10.1371/journal.pcbi.1011486]
Abstract
Sensory perception is dramatically influenced by the context. Models of contextual neural surround effects in vision have mostly accounted for Primary Visual Cortex (V1) data, via nonlinear computations such as divisive normalization. However, surround effects are not well understood within a hierarchy, for neurons with more complex stimulus selectivity beyond V1. We utilized feedforward deep convolutional neural networks and developed a gradient-based technique to visualize the most suppressive and excitatory surround. We found that deep neural networks exhibited a key signature of surround effects in V1, highlighting center stimuli that visually stand out from the surround and suppressing responses when the surround stimulus is similar to the center. We found that in some neurons, especially in late layers, when the center stimulus was altered, the most suppressive surround surprisingly can follow the change. Through the visualization approach, we generalized previous understanding of surround effects to more complex stimuli, in ways that have not been revealed in visual cortices. In contrast, the suppression based on center surround similarity was not observed in an untrained network. We identified further successes and mismatches of the feedforward CNNs to the biology. Our results provide a testable hypothesis of surround effects in higher visual cortices, and the visualization approach could be adopted in future biological experimental designs.
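The gradient-based visualization idea can be sketched on a hand-built toy unit rather than a trained CNN. Everything below, including the divisive-suppression response function, is an illustrative assumption and not the paper's model: gradient descent on the surround pattern finds the surround that most suppresses the unit, and for a divisively normalized unit this most suppressive surround matches the center stimulus.

```python
import math
import random

random.seed(1)
n = 8
# toy "center" stimulus; the unit is tuned to exactly this pattern
center = [math.sin(2 * math.pi * k / n) for k in range(n)]

def response(surround):
    drive = sum(c * c for c in center)               # feedforward drive from the center
    sim = sum(c * s for c, s in zip(center, surround))
    return drive / (1.0 + sim * sim)                 # divisive suppression grows with similarity

# gradient descent (finite differences) on the surround to minimize the response
s = [random.gauss(0, 0.1) for _ in range(n)]
eps, lr = 1e-4, 0.5
for _ in range(200):
    for k in range(n):
        s_plus = list(s)
        s_plus[k] += eps
        g = (response(s_plus) - response(s)) / eps
        s[k] -= lr * g
    norm = math.sqrt(sum(v * v for v in s)) or 1.0   # constrain surround energy
    s = [v / norm for v in s]

# the most suppressive surround aligns (up to sign) with the center pattern
c_norm = math.sqrt(sum(c * c for c in center))
cosine = abs(sum(c * v for c, v in zip(center, s))) / c_norm
```

The descent converges to a surround collinear with the center, mirroring the paper's finding that the most suppressive surround resembles the center stimulus.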
Affiliation(s)
- Xu Pan: Department of Computer Science, University of Miami, Coral Gables, FL, United States of America
- Annie DeForge: School of Information, University of California, Berkeley, CA, United States of America; Bentley University, Waltham, MA, United States of America
- Odelia Schwartz: Department of Computer Science, University of Miami, Coral Gables, FL, United States of America
15. Baek S, Park Y, Paik SB. Species-specific wiring of cortical circuits for small-world networks in the primary visual cortex. PLoS Comput Biol 2023; 19:e1011343. [PMID: 37540638] [PMCID: PMC10403141] [DOI: 10.1371/journal.pcbi.1011343]
Abstract
Long-range horizontal connections (LRCs) are conspicuous anatomical structures in the primary visual cortex (V1) of mammals, yet their detailed functions in relation to visual processing are not fully understood. Here, we show that LRCs are key components to organize a "small-world network" optimized for each size of the visual cortex, enabling the cost-efficient integration of visual information. Using computational simulations of a biologically inspired model neural network, we found that sparse LRCs added to networks, combined with dense local connections, compose a small-world network and significantly enhance image classification performance. We confirmed that the performance of the network appeared to be strongly correlated with the small-world coefficient of the model network under various conditions. Our theoretical model demonstrates that the amount of LRCs to build a small-world network depends on each size of cortex and that LRCs are beneficial only when the size of the network exceeds a certain threshold. Our model simulation of various sizes of cortices validates this prediction and provides an explanation of the species-specific existence of LRCs in animal data. Our results provide insight into a biological strategy of the brain to balance functional performance and resource cost.
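The small-world effect of sparse long-range connections can be reproduced with a toy graph model. This is a generic Watts-Strogatz-style illustration, not the paper's image-classification network: starting from a ring lattice of dense local connections, a handful of long-range edges collapses the characteristic path length while leaving local clustering largely intact.

```python
import random
from collections import deque

random.seed(0)

def ring_lattice(n, k):
    # each node connects to its k nearest neighbours on each side (dense local wiring)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            adj[i].add((i + d) % n)
            adj[i].add((i - d) % n)
    return adj

def add_long_range(adj, m):
    # sparse random long-range connections (the toy analogue of LRCs)
    n, added = len(adj), 0
    while added < m:
        a, b = random.randrange(n), random.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b); adj[b].add(a); added += 1

def avg_path_length(adj):
    n, total = len(adj), 0
    for src in adj:                       # BFS from every node
        dist, q = {src: 0}, deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))

def clustering(adj):
    cs = []
    for u, nbrs in adj.items():
        nb, k = list(nbrs), len(nbrs)
        if k < 2:
            cs.append(0.0); continue
        links = sum(1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]])
        cs.append(2 * links / (k * (k - 1)))
    return sum(cs) / len(cs)

lattice = ring_lattice(120, 3)
L0, C0 = avg_path_length(lattice), clustering(lattice)

sw = ring_lattice(120, 3)
add_long_range(sw, 20)                    # ~3% extra edges
L1, C1 = avg_path_length(sw), clustering(sw)
# L1 is far below L0 while C1 stays close to C0: a small-world network
```

With only 20 shortcuts on top of 360 local edges, the average path length roughly halves while the clustering coefficient drops only slightly, which is the trade-off the paper quantifies with the small-world coefficient.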
Affiliation(s)
- Seungdae Baek: Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Youngjin Park: Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Se-Bum Paik: Department of Brain and Cognitive Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
16. Doerig A, Sommers RP, Seeliger K, Richards B, Ismael J, Lindsay GW, Kording KP, Konkle T, van Gerven MAJ, Kriegeskorte N, Kietzmann TC. The neuroconnectionist research programme. Nat Rev Neurosci 2023. [PMID: 37253949] [DOI: 10.1038/s41583-023-00705-w]
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.
Affiliation(s)
- Adrien Doerig: Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany; Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- Rowan P Sommers: Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Katja Seeliger: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Blake Richards: Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada; School of Computer Science, McGill University, Montréal, QC, Canada; Mila, Montréal, QC, Canada; Montréal Neurological Institute, Montréal, QC, Canada; Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Konrad P Kording: Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada; Bioengineering, Neuroscience, University of Pennsylvania, Pennsylvania, PA, USA
- Tim C Kietzmann: Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
17. Jozwik KM, Kietzmann TC, Cichy RM, Kriegeskorte N, Mur M. Deep Neural Networks and Visuo-Semantic Models Explain Complementary Components of Human Ventral-Stream Representational Dynamics. J Neurosci 2023; 43:1731-1741. [PMID: 36759190] [PMCID: PMC10010451] [DOI: 10.1523/jneurosci.1424-22.2022]
Abstract
Deep neural networks (DNNs) are promising models of the cortical computations supporting human object recognition. However, despite their ability to explain a significant portion of variance in neural data, the agreement between models and brain representational dynamics is far from perfect. We address this issue by asking which representational features are currently unaccounted for in neural time series data, estimated for multiple areas of the ventral stream via source-reconstructed magnetoencephalography data acquired in human participants (nine females, six males) during object viewing. We focus on the ability of visuo-semantic models, consisting of human-generated labels of object features and categories, to explain variance beyond the explanatory power of DNNs alone. We report a gradual reversal in the relative importance of DNN versus visuo-semantic features as ventral-stream object representations unfold over space and time. Although lower-level visual areas are better explained by DNN features starting early in time (at 66 ms after stimulus onset), higher-level cortical dynamics are best accounted for by visuo-semantic features starting later in time (at 146 ms after stimulus onset). Among the visuo-semantic features, object parts and basic categories drive the advantage over DNNs. These results show that a significant component of the variance unexplained by DNNs in higher-level cortical dynamics is structured and can be explained by readily nameable aspects of the objects. We conclude that current DNNs fail to fully capture dynamic representations in higher-level human visual cortex and suggest a path toward more accurate models of ventral-stream computations.SIGNIFICANCE STATEMENT When we view objects such as faces and cars in our visual environment, their neural representations dynamically unfold over time at a millisecond scale. These dynamics reflect the cortical computations that support fast and robust object recognition. 
DNNs have emerged as a promising framework for modeling these computations but cannot yet fully account for the neural dynamics. Using magnetoencephalography data acquired in human observers during object viewing, we show that readily nameable aspects of objects, such as 'eye', 'wheel', and 'face', can account for variance in the neural dynamics over and above DNNs. These findings suggest that DNNs and humans may in part rely on different object features for visual recognition and provide guidelines for model improvement.
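The variance-partitioning logic behind the study (does a visuo-semantic model explain variance beyond the DNN features?) can be sketched with a nested-regression toy on synthetic data. The simulated "neural" signal, the single-predictor-per-model setup, and all numbers below are assumptions for illustration; the study itself uses full feature sets and source-reconstructed MEG data.

```python
import random

random.seed(0)
n = 500
# synthetic per-stimulus predictors and a simulated neural signal driven by both
dnn = [random.gauss(0, 1) for _ in range(n)]
sem = [random.gauss(0, 1) for _ in range(n)]
y = [d + 0.7 * s + random.gauss(0, 0.5) for d, s in zip(dnn, sem)]

def r2_simple(x, y):
    # R^2 of the simple regression y ~ x (squared Pearson correlation)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def r2_joint(x1, x2, y):
    # R^2 of y ~ x1 + x2 via 2x2 normal equations on centered data
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    a = [v - m1 for v in x1]; b = [v - m2 for v in x2]; c = [v - my for v in y]
    s11 = sum(v * v for v in a); s22 = sum(v * v for v in b)
    s12 = sum(p * q for p, q in zip(a, b))
    s1y = sum(p * q for p, q in zip(a, c)); s2y = sum(p * q for p, q in zip(b, c))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    ss_res = sum((ci - b1 * ai - b2 * bi) ** 2 for ai, bi, ci in zip(a, b, c))
    return 1 - ss_res / sum(v * v for v in c)

r2_dnn = r2_simple(dnn, y)
r2_full = r2_joint(dnn, sem, y)
unique_sem = r2_full - r2_dnn   # variance explained beyond the DNN predictor
```

A positive `unique_sem` is the toy analogue of the paper's result: visuo-semantic features account for structured variance that the DNN predictor leaves unexplained.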
Affiliation(s)
- Kamila M Jozwik: Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
- Tim C Kietzmann: Institute of Cognitive Science, University of Osnabrück, 49069 Osnabrück, Germany
- Radoslaw M Cichy: Department of Education and Psychology, Freie Universität Berlin, 14195 Berlin, Germany
- Nikolaus Kriegeskorte: Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York 10027
- Marieke Mur: Department of Psychology, Western University, London, Ontario N6A 3K7, Canada; Department of Computer Science, Western University, London, Ontario N6A 3K7, Canada
18. Brain-inspired multisensory integration neural network for cross-modal recognition through spatiotemporal dynamics and deep learning. Cogn Neurodyn 2023. [DOI: 10.1007/s11571-023-09932-4]
19. Thivierge JP, Giraud É, Lynn M. Toward a Brain-Inspired Theory of Artificial Learning. Cognit Comput 2023. [DOI: 10.1007/s12559-023-10121-y]
20. Momennejad I. A rubric for human-like agents and NeuroAI. Philos Trans R Soc Lond B Biol Sci 2023; 378:20210446. [PMID: 36511409] [PMCID: PMC9745874] [DOI: 10.1098/rstb.2021.0446]
Abstract
Researchers across cognitive, neuro- and computer sciences increasingly reference 'human-like' artificial intelligence and 'neuroAI'. However, the scope and use of the terms are often inconsistent. Contributed research ranges widely from mimicking behaviour, to testing machine learning methods as neurally plausible hypotheses at the cellular or functional levels, to solving engineering problems. However, it cannot be assumed nor expected that progress on one of these three goals will automatically translate to progress in the others. Here, a simple rubric is proposed to clarify the scope of individual contributions, grounded in their commitments to human-like behaviour, neural plausibility or benchmark/engineering/computer science goals. This is clarified using examples of weak and strong neuroAI and human-like agents, and by discussing the generative, corroborative and corrective ways in which the three dimensions interact with one another. The author maintains that future progress in artificial intelligence will need strong interactions across the disciplines, with iterative feedback loops and meticulous validity tests, leading to both known and yet-unknown advances that may span decades to come. This article is part of a discussion meeting issue 'New approaches to 3D vision'.
Affiliation(s)
- Ida Momennejad: Microsoft Research NYC, Reinforcement Learning Station, 300 Lafayette, New York, NY 10012, USA
21. Mokari-Mahallati M, Ebrahimpour R, Bagheri N, Karimi-Rouzbahani H. Deeper neural network models better reflect how humans cope with contrast variation in object recognition. Neurosci Res 2023. [PMID: 36681154] [DOI: 10.1016/j.neures.2023.01.007]
Abstract
Visual inputs are far from ideal in everyday situations, such as in fog, where the contrast of input stimuli is low. However, human perception remains relatively robust to contrast variations. To provide insights into the underlying mechanisms of contrast invariance, we addressed two questions. Do contrast effects disappear along the visual hierarchy? Do later stages of the visual hierarchy contribute to contrast invariance? We ran a behavioral experiment in which we manipulated the level of stimulus contrast and the involvement of higher-level visual areas through immediate and delayed backward masking of the stimulus. Backward masking led to a significant drop in performance in our visual categorization task, supporting the role of higher-level visual areas in contrast invariance. To obtain mechanistic insights, we ran the same categorization task on three state-of-the-art computational models of human vision, each with a different depth of visual hierarchy. We found contrast effects all along the visual hierarchy, no matter how deep. Moreover, the final layers of deeper hierarchical models, which have been shown to best model the final stages of the visual system, coped with contrast effects more effectively. These results suggest that, while contrast effects reach the final stages of the hierarchy, those stages play a significant role in compensating for contrast variations in the visual system.
Affiliation(s)
- Masoumeh Mokari-Mahallati: Department of Electrical Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
- Reza Ebrahimpour: Center for Cognitive Science, Institute for Convergence Science and Technology (ICST), Sharif University of Technology, Tehran P.O. Box 11155-1639, Islamic Republic of Iran; Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran; School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Islamic Republic of Iran
- Nasour Bagheri: Department of Electrical Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
- Hamid Karimi-Rouzbahani: MRC Cognition & Brain Sciences Unit, University of Cambridge, UK; Mater Research Institute, Faculty of Medicine, University of Queensland, Australia
22. Ali A, Ahmad N, de Groot E, van Gerven MAJ, Kietzmann TC. Predictive coding is a consequence of energy efficiency in recurrent neural networks. Patterns (N Y) 2022; 3:100639. [PMID: 36569556] [PMCID: PMC9768680] [DOI: 10.1016/j.patter.2022.100639]
Abstract
Predictive coding is a promising framework for understanding brain function. It postulates that the brain continuously inhibits predictable sensory input, ensuring preferential processing of surprising elements. A central aspect of this view is its hierarchical connectivity, involving recurrent message passing between excitatory bottom-up signals and inhibitory top-down feedback. Here we use computational modeling to demonstrate that such architectural hardwiring is not necessary. Rather, predictive coding is shown to emerge as a consequence of energy efficiency. When training recurrent neural networks to minimize their energy consumption while operating in predictive environments, the networks self-organize into prediction and error units with appropriate inhibitory and excitatory interconnections and learn to inhibit predictable sensory input. Moving beyond the view of purely top-down-driven predictions, we demonstrate, via virtual lesioning experiments, that networks perform predictions on two timescales: fast lateral predictions among sensory units and slower prediction cycles that integrate evidence over time.
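The paper's central claim, that error units emerge from minimizing activity rather than from hardwired predictive-coding circuitry, has a one-neuron caricature. The sketch below is an illustrative assumption, not the authors' RNN setup: a unit receives a predictable input plus a top-down channel with learnable weight `w`, and subgradient descent on mean absolute activity (an energy proxy) drives `w` toward -1, so the unit ends up signaling only prediction errors.

```python
import random

random.seed(0)
w = 0.0      # weight on the top-down prediction channel, initially inert
lr = 0.05
for _ in range(400):
    x = random.gauss(1.0, 0.2)   # predictable sensory input
    p = x                        # a perfect top-down prediction of that input
    h = x + w * p                # unit activity
    # subgradient of the energy |h| with respect to w
    g = (1 if h > 0 else -1 if h < 0 else 0) * p
    w -= lr * g
# w has moved toward -1: the predictable input is inhibited and activity is near zero,
# i.e. the unit now behaves like a prediction-error unit
```

No error-unit structure was specified anywhere; the inhibitory prediction weight is purely a consequence of the energy objective, which is the intuition the full recurrent model scales up.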
Affiliation(s)
- Abdullahi Ali: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
- Nasir Ahmad: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
- Elgar de Groot: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands; Department of Experimental Psychology, Utrecht University, Utrecht, the Netherlands
- Tim Christian Kietzmann: Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
23. Cohen Y, Engel TA, Langdon C, Lindsay GW, Ott T, Peters MAK, Shine JM, Breton-Provencher V, Ramaswamy S. Recent Advances at the Interface of Neuroscience and Artificial Neural Networks. J Neurosci 2022; 42:8514-8523. [PMID: 36351830] [PMCID: PMC9665920] [DOI: 10.1523/jneurosci.1503-22.2022]
Abstract
Biological neural networks adapt and learn in diverse behavioral contexts. Artificial neural networks (ANNs) have exploited biological properties to solve complex problems. However, despite their effectiveness for specific tasks, ANNs are yet to realize the flexibility and adaptability of biological cognition. This review highlights recent advances in computational and experimental research to advance our understanding of biological and artificial intelligence. In particular, we discuss critical mechanisms from the cellular, systems, and cognitive neuroscience fields that have contributed to refining the architecture and training algorithms of ANNs. Additionally, we discuss how recent work used ANNs to understand complex neuronal correlates of cognition and to process high throughput behavioral data.
Affiliation(s)
- Yarden Cohen: Department of Brain Sciences, Weizmann Institute of Science, Rehovot, 76100, Israel
- Tatiana A Engel: Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, NY 11724
- Grace W Lindsay: Department of Psychology, Center for Data Science, New York University, New York, NY 10003
- Torben Ott: Bernstein Center for Computational Neuroscience Berlin, Institute of Biology, Humboldt University of Berlin, 10117, Berlin, Germany
- Megan A K Peters: Department of Cognitive Sciences, University of California-Irvine, Irvine, CA 92697
- James M Shine: Brain and Mind Centre, University of Sydney, Sydney, NSW 2006, Australia
- Srikanth Ramaswamy: Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, United Kingdom
24. Ingrosso A, Goldt S. Data-driven emergence of convolutional structure in neural networks. Proc Natl Acad Sci U S A 2022; 119:e2201854119. [PMID: 36161906] [PMCID: PMC9546588] [DOI: 10.1073/pnas.2201854119]
Abstract
Exploiting data invariances is crucial for efficient learning in both artificial and biological neural circuits. Understanding how neural networks can discover appropriate representations capable of harnessing the underlying symmetries of their inputs is thus crucial in machine learning and neuroscience. Convolutional neural networks, for example, were designed to exploit translation symmetry, and their capabilities triggered the first wave of deep learning successes. However, learning convolutions directly from translation-invariant data with a fully connected network has so far proven elusive. Here we show how initially fully connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs, resulting in localized, space-tiling receptive fields. These receptive fields match the filters of a convolutional network trained on the same task. By carefully designing data models for the visual scene, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs, which has long been recognized as the hallmark of natural images. We provide an analytical and numerical characterization of the pattern formation mechanism responsible for this phenomenon in a simple model and find an unexpected link between receptive field formation and tensor decomposition of higher-order input correlations. These results provide a perspective on the development of low-level feature detectors in various sensory modalities and pave the way for studying the impact of higher-order statistics on learning in neural networks.
Affiliation(s)
- Alessandro Ingrosso: Quantitative Life Sciences, The Abdus Salam International Centre for Theoretical Physics, 34151 Trieste, Italy
- Sebastian Goldt: Department of Physics, International School of Advanced Studies, 34136 Trieste, Italy
25. Baker N, Elder JH. Deep learning models fail to capture the configural nature of human shape perception. iScience 2022; 25:104913. [PMID: 36060067] [PMCID: PMC9429800] [DOI: 10.1016/j.isci.2022.104913]
26. Matsumoto N, Eldridge MAG, Fredericks JM, Lowe KA, Richmond BJ. Comparing performance between a deep neural network and monkeys with bilateral removals of visual area TE in categorizing feature-ambiguous stimuli. J Comput Neurosci 2022; 51:381-387. [PMID: 37195295] [DOI: 10.1007/s10827-023-00854-y]
Abstract
In the canonical view of visual processing the neural representation of complex objects emerges as visual information is integrated through a set of convergent, hierarchically organized processing stages, ending in the primate inferior temporal lobe. It seems reasonable to infer that visual perceptual categorization requires the integrity of anterior inferior temporal cortex (area TE). Many deep neural networks (DNNs) are structured to simulate the canonical view of hierarchical processing within the visual system. However, there are some discrepancies between DNNs and the primate brain. Here we evaluated the performance of a simulated hierarchical model of vision in discriminating the same categorization problems presented to monkeys with TE removals. The model was able to simulate the performance of monkeys with TE removals in the categorization task but performed poorly when challenged with visually degraded stimuli. We conclude that further development of the model is required to match the level of visual flexibility present in the monkey visual system.
Affiliation(s)
- Narihisa Matsumoto: Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Mark A G Eldridge: National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
- J Megan Fredericks: National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
- Kaleb A Lowe: National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
- Barry J Richmond: National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
27. Nayebi A, Sagastuy-Brena J, Bear DM, Kar K, Kubilius J, Ganguli S, Sussillo D, DiCarlo JJ, Yamins DLK. Recurrent Connections in the Primate Ventral Visual Stream Mediate a Trade-Off Between Task Performance and Network Size During Core Object Recognition. Neural Comput 2022; 34:1652-1675. [PMID: 35798321] [PMCID: PMC10870835] [DOI: 10.1162/neco_a_01506]
Abstract
The ventral visual stream enables humans and nonhuman primates to effortlessly recognize objects across a multitude of viewing conditions, yet the computational role of its abundant feedback connections is unclear. Prior studies have augmented feedforward convolutional neural networks (CNNs) with recurrent connections to study their role in visual processing; however, often these recurrent networks are optimized directly on neural data, or the comparative metrics used are undefined for standard feedforward networks that lack these connections. In this work, we develop task-optimized convolutional recurrent (ConvRNN) network models that more correctly mimic the timing and gross neuroanatomy of the ventral pathway. Properly chosen intermediate-depth ConvRNN circuit architectures, which incorporate mechanisms of feedforward bypassing and recurrent gating, can achieve high performance on a core recognition task, comparable to that of much deeper feedforward networks. We then develop methods that allow us to compare both CNNs and ConvRNNs to finely grained measurements of primate categorization behavior and neural response trajectories across thousands of stimuli. We find that high-performing ConvRNNs provide a better match to these data than feedforward networks of any depth, predicting the precise timings at which each stimulus is behaviorally decoded from neural activation patterns. Moreover, these ConvRNN circuits consistently produce quantitatively accurate predictions of neural dynamics from V4 and IT across the entire stimulus presentation. In fact, we find that the highest-performing ConvRNNs, which best match neural and behavioral data, also achieve a strong Pareto trade-off between task performance and overall network size. Taken together, our results suggest that the functional purpose of recurrence in the ventral pathway is to fit a high-performing network in cortex, attaining computational power through temporal rather than spatial complexity.
Affiliation(s)
- Aran Nayebi
- Stanford University, Stanford, CA 94305, U.S.A.
- Jonas Kubilius
- MIT, Cambridge, MA 02139, U.S.A.
- KU Leuven, Leuven 3000, Belgium
28
Wang J, Hu X. Convolutional Neural Networks With Gated Recurrent Connections. IEEE Trans Pattern Anal Mach Intell 2022; 44:3421-3435. [PMID: 33497326 DOI: 10.1109/tpami.2021.3054614] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
The convolutional neural network (CNN) has become a basic model for solving many computer vision problems. In recent years, a new class of CNNs, the recurrent convolutional neural network (RCNN), inspired by the abundant recurrent connections in the visual systems of animals, was proposed. The critical element of the RCNN is the recurrent convolutional layer (RCL), which incorporates recurrent connections between neurons in the standard convolutional layer. As the number of recurrent computations increases, the receptive fields (RFs) of neurons in the RCL expand unboundedly, which is inconsistent with biological facts. We propose to modulate the RFs of neurons by introducing gates to the recurrent connections. The gates control the amount of context information input to the neurons, so the neurons' RFs become adaptive. The resulting layer is called the gated recurrent convolutional layer (GRCL). Multiple GRCLs constitute a deep model called the gated RCNN (GRCNN). The GRCNN was evaluated on several computer vision tasks, including object recognition, scene text recognition and object detection, and obtained much better results than the RCNN. In addition, when combined with other adaptive RF techniques, the GRCNN demonstrated performance competitive with state-of-the-art models on benchmark datasets for these tasks.
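The receptive-field argument can be illustrated in one dimension: each ungated recurrent convolution widens the set of inputs that can influence a unit, while a closed gate freezes the RF at the feedforward size. A minimal NumPy sketch (a constant scalar gate stands in for the GRCL's learned, input-dependent gates):

```python
import numpy as np

def grcl_step(x, h, k_ff, k_rec, gate):
    # feedforward drive plus gated recurrent context (gate in [0, 1])
    ff = np.convolve(x, k_ff, mode="same")
    rec = np.convolve(h, k_rec, mode="same")
    return np.maximum(ff + gate * rec, 0.0)

def receptive_field_size(gate, steps, n=41):
    """Count input positions that influence the centre unit after
    'steps' recurrent iterations with size-3 kernels."""
    k = np.ones(3)
    size = 0
    for i in range(n):
        x = np.zeros(n)
        x[i] = 1.0                      # probe with a single impulse
        h = np.zeros(n)
        for _ in range(steps):
            h = grcl_step(x, h, k, k, gate)
        if h[n // 2] > 0:
            size += 1
    return size

print(receptive_field_size(gate=1.0, steps=1))  # 3: feedforward RF only
print(receptive_field_size(gate=1.0, steps=3))  # 7: RF grows each step
print(receptive_field_size(gate=0.0, steps=3))  # 3: closed gate stops growth
```

With the gate open the RF grows by one kernel radius per recurrent step; closing it pins the RF to the feedforward size, which is the adaptivity the GRCL exploits.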
29
Nicholson DA, Prinz AA. Could simplified stimuli change how the brain performs visual search tasks? A deep neural network study. J Vis 2022; 22:3. [PMID: 35675057 PMCID: PMC9187944 DOI: 10.1167/jov.22.7.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2021] [Accepted: 05/04/2022] [Indexed: 11/24/2022] Open
Abstract
Visual search is a complex behavior influenced by many factors. To control for these factors, many studies use highly simplified stimuli. However, the statistics of these stimuli are very different from the statistics of the natural images that the human visual system is optimized by evolution and experience to perceive. Could this difference change search behavior? If so, simplified stimuli may contribute to effects typically attributed to cognitive processes, such as selective attention. Here we use deep neural networks to test how optimizing models for the statistics of one distribution of images constrains performance on a task using images from a different distribution. We train four deep neural network architectures on one of three source datasets-natural images, faces, and x-ray images-and then adapt them to a visual search task using simplified stimuli. This adaptation produces models that exhibit performance limitations similar to humans, whereas models trained on the search task alone exhibit no such limitations. However, we also find that deep neural networks trained to classify natural images exhibit similar limitations when adapted to a search task that uses a different set of natural images. Therefore, the distribution of data alone cannot explain this effect. We discuss how future work might integrate an optimization-based approach into existing models of visual search behavior.
Affiliation(s)
- David A Nicholson
- Emory University, Department of Biology, O. Wayne Rollins Research Center, Atlanta, Georgia
- Astrid A Prinz
- Emory University, Department of Biology, O. Wayne Rollins Research Center, Atlanta, Georgia
30
Naumann LB, Keijser J, Sprekeler H. Invariant neural subspaces maintained by feedback modulation. eLife 2022; 11:e76096. [PMID: 35442191 PMCID: PMC9106332 DOI: 10.7554/elife.76096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2021] [Accepted: 04/06/2022] [Indexed: 11/13/2022] Open
Abstract
Sensory systems reliably process incoming stimuli in spite of changes in context. Most recent models attribute this context invariance to the extraction of increasingly complex sensory features in hierarchical feedforward networks. Here, we study how context-invariant representations can be established by feedback rather than feedforward processing. We show that feedforward neural networks modulated by feedback can dynamically generate invariant sensory representations. The required feedback can be implemented as a slow and spatially diffuse gain modulation. The invariance is not present on the level of individual neurons, but emerges only on the population level. Mechanistically, the feedback modulation dynamically reorients the manifold of neural activity and thereby maintains an invariant neural subspace in spite of contextual variations. Our results highlight the importance of population-level analyses for understanding the role of feedback in flexible sensory processing.
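The core mechanism, a slow diffuse gain driven by feedback that compensates for a contextual change, can be caricatured with a single scalar gain (a much simpler setup than the paper's population-manifold mechanism; weights, learning rate and the multiplicative context are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.random((6, 4)) + 0.5          # fixed feedforward weights
x = rng.random(4) + 0.5               # stimulus
target = np.linalg.norm(w @ x)        # activity level to maintain

def respond(x, context, g):
    # the context scales the input; feedback supplies a diffuse gain g
    return g * (w @ (context * x))

g = 1.0
context = 3.0                          # sudden contextual change
for _ in range(500):                   # slow, error-driven gain adaptation
    err = target - np.linalg.norm(respond(x, context, g))
    g += 0.01 * err

print(abs(np.linalg.norm(respond(x, context, g)) - target) < 1e-6)  # True
```

The gain settles near 1/context, restoring the pre-change activity level; in the paper the analogous compensation reorients a population subspace rather than a single scalar.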
Affiliation(s)
- Laura B Naumann
- Modelling of Cognitive Processes, Technical University of Berlin, Berlin, Germany
- Bernstein Center for Computational Neuroscience, Berlin, Germany
- Joram Keijser
- Modelling of Cognitive Processes, Technical University of Berlin, Berlin, Germany
- Henning Sprekeler
- Modelling of Cognitive Processes, Technical University of Berlin, Berlin, Germany
- Bernstein Center for Computational Neuroscience, Berlin, Germany
31
The spatiotemporal neural dynamics of object location representations in the human brain. Nat Hum Behav 2022; 6:796-811. [PMID: 35210593 PMCID: PMC9225954 DOI: 10.1038/s41562-022-01302-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2021] [Accepted: 01/14/2022] [Indexed: 12/30/2022]
Abstract
To interact with objects in complex environments, we must know what they are and where they are in spite of challenging viewing conditions. Here, we investigated where, how and when representations of object location and category emerge in the human brain when objects appear on cluttered natural scene images using a combination of functional magnetic resonance imaging, electroencephalography and computational models. We found location representations to emerge along the ventral visual stream towards lateral occipital complex, mirrored by gradual emergence in deep neural networks. Time-resolved analysis suggested that computing object location representations involves recurrent processing in high-level visual cortex. Object category representations also emerged gradually along the ventral visual stream, with evidence for recurrent computations. These results resolve the spatiotemporal dynamics of the ventral visual stream that give rise to representations of where and what objects are present in a scene under challenging viewing conditions.
32
Matsumoto N, Taguchi Y, Shimizu M, Katakami S, Okada M, Sugase-Miyamoto Y. Recurrent Connections Might Be Important for Hierarchical Categorization. Front Syst Neurosci 2022; 16:805990. [PMID: 35283736 PMCID: PMC8911877 DOI: 10.3389/fnsys.2022.805990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2021] [Accepted: 01/12/2022] [Indexed: 11/13/2022] Open
Abstract
Visual short-term memory is an important ability of primates and is thought to be stored in area TE. We previously reported that the initial transient responses of neurons in area TE represented information about a global category of faces, e.g., monkey faces vs. human faces vs. simple shapes, and the latter part of the responses represented information about fine categories, e.g., facial expression. The neuronal mechanisms of hierarchical categorization in area TE remain unknown. For this study, we constructed a combined model that consisted of a deep neural network (DNN) and a recurrent neural network and investigated whether this model can replicate the time course of hierarchical categorization. The visual images were stored in the recurrent connections of the model. When the visual images with noise were input to the model, the model outputted the time course of the hierarchical categorization. This result indicates that recurrent connections in the model are important not only for visual short-term memory but also for hierarchical categorization, suggesting that recurrent connections in area TE are important for hierarchical categorization.
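Storing visual patterns "in the recurrent connections" has a classic minimal form: a Hopfield-style network whose Hebbian recurrent weights turn noisy inputs back into stored patterns. A toy NumPy sketch of that idea (far simpler than the paper's DNN-plus-RNN model; network size and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# store three binary patterns in the recurrent weights (Hebbian rule)
patterns = rng.choice([-1.0, 1.0], size=(3, 64))
W = sum(np.outer(p, p) for p in patterns) / 64.0
np.fill_diagonal(W, 0.0)

def recall(cue, steps=10):
    # recurrent dynamics: repeatedly threshold the recurrent input
    s = cue.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1.0, -1.0)
    return s

# corrupt a stored pattern, then let the recurrence clean it up
noisy = patterns[0].copy()
flip = rng.choice(64, size=8, replace=False)
noisy[flip] *= -1
restored = recall(noisy)
print(np.mean(restored == patterns[0]))  # close to 1.0
```

The noisy cue is pulled back toward the stored pattern over iterations, which is the sense in which recurrent connections implement short-term memory here.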
Affiliation(s)
- Narihisa Matsumoto
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
- Yusuke Taguchi
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Masaumi Shimizu
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Shun Katakami
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Masato Okada
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Yasuko Sugase-Miyamoto
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
33
Mei N, Santana R, Soto D. Informative neural representations of unseen contents during higher-order processing in human brains and deep artificial networks. Nat Hum Behav 2022; 6:720-731. [PMID: 35115676 DOI: 10.1038/s41562-021-01274-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Accepted: 12/08/2021] [Indexed: 11/09/2022]
Abstract
A framework to pinpoint the scope of unconscious processing is critical to improve models of visual consciousness. Previous research observed brain signatures of unconscious processing in visual cortex, but these were not reliably identified. Further, whether unconscious contents are represented in high-level stages of the ventral visual stream and linked parieto-frontal areas remains unknown. Using a within-subject, high-precision functional magnetic resonance imaging approach, we show that unconscious contents can be decoded from multi-voxel patterns that are highly distributed along the ventral visual pathway and also involve parieto-frontal substrates. Classifiers trained with multi-voxel patterns of conscious items generalized to predict the unconscious counterparts, indicating that their neural representations overlap. These findings suggest revisions to models of consciousness such as the neuronal global workspace. We then provide a computational simulation of visual processing/representation without perceptual sensitivity by using deep neural networks performing a similar visual task. The work provides a framework for pinpointing the representation of unconscious knowledge across different task domains.
Affiliation(s)
- Ning Mei
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- Roberto Santana
- Computer Science and Artificial Intelligence Department, University of Basque Country, San Sebastian, Spain
- David Soto
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
34
Singer JJD, Seeliger K, Kietzmann TC, Hebart MN. From photos to sketches - how humans and deep neural networks process objects across different levels of visual abstraction. J Vis 2022; 22:4. [PMID: 35129578 PMCID: PMC8822363 DOI: 10.1167/jov.22.2.4] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Line drawings convey meaning with just a few strokes. Despite strong simplifications, humans can recognize objects depicted in such abstracted images without effort. To what degree do deep convolutional neural networks (CNNs) mirror this human ability to generalize to abstracted object images? While CNNs trained on natural images have been shown to exhibit poor classification performance on drawings, other work has demonstrated highly similar latent representations in the networks for abstracted and natural images. Here, we address these seemingly conflicting findings by analyzing the activation patterns of a CNN trained on natural images across a set of photographs, drawings, and sketches of the same objects and comparing them to human behavior. We find a highly similar representational structure across levels of visual abstraction in early and intermediate layers of the network. This similarity, however, does not translate to later stages in the network, resulting in low classification performance for drawings and sketches. We found that texture bias in CNNs contributes to the dissimilar representational structure in late layers and the poor performance on drawings. Finally, by fine-tuning late network layers with object drawings, we show that performance can be largely restored, demonstrating the general utility of features learned on natural images in early and intermediate layers for the recognition of drawings. In conclusion, generalization to abstracted images, such as drawings, seems to be an emergent property of CNNs trained on natural images, which is, however, suppressed by domain-related biases that arise during later processing stages in the network.
Affiliation(s)
- Johannes J D Singer
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Department of Psychology, Ludwig Maximilian University, Munich, Germany
- Katja Seeliger
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Tim C Kietzmann
- Donders Institute for Brain, Cognition and Behavior, Nijmegen, The Netherlands
- Martin N Hebart
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
35
Blauch NM, Behrmann M, Plaut DC. A connectivity-constrained computational account of topographic organization in primate high-level visual cortex. Proc Natl Acad Sci U S A 2022; 119:2112566119. [PMID: 35027449 DOI: 10.1073/pnas.2112566119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/30/2021] [Indexed: 05/25/2023] Open
Abstract
Inferotemporal (IT) cortex in humans and other primates is topographically organized, containing multiple hierarchically organized areas selective for particular domains, such as faces and scenes. This organization is commonly viewed in terms of evolved domain-specific visual mechanisms. Here, we develop an alternative, domain-general and developmental account of IT cortical organization. The account is instantiated in interactive topographic networks (ITNs), a class of computational models in which a hierarchy of model IT areas, subject to biologically plausible connectivity-based constraints, learns high-level visual representations optimized for multiple domains. We find that minimizing a wiring cost on spatially organized feedforward and lateral connections, alongside realistic constraints on the sign of neuronal connectivity within model IT, results in a hierarchical, topographic organization. This organization replicates a number of key properties of primate IT cortex, including the presence of domain-selective spatial clusters preferentially involved in the representation of faces, objects, and scenes; columnar responses across separate excitatory and inhibitory units; and generic spatial organization whereby the response correlation of pairs of units falls off with their distance. We thus argue that topographic domain selectivity is an emergent property of a visual system optimized to maximize behavioral performance under generic connectivity-based constraints.
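The wiring cost at the heart of the ITN account can be written as an L1 weight penalty scaled by the physical distance between units, so spatially local connectivity is cheaper than connectivity of equal total strength spread across the sheet. A small NumPy sketch (the grid size, the exponential weight profile and the penalty scale are illustrative, not the paper's):

```python
import numpy as np

def wiring_cost(W, coords, lam=1e-3):
    # |w_ij| weighted by Euclidean distance between unit positions
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return lam * np.sum(np.abs(W) * d)

# units laid out on a 4x4 cortical sheet
side = 4
coords = np.array([(i, j) for i in range(side) for j in range(side)], float)
n = side * side

# local connectivity: weight falls off with distance
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
W_local = np.exp(-d)

# same total connection strength, spread uniformly across the sheet
W_uniform = np.full((n, n), W_local.sum() / n**2)

print(wiring_cost(W_local, coords) < wiring_cost(W_uniform, coords))  # True
```

Minimizing this term alongside a task loss favors the spatially clustered, topographic solutions the paper reports.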
Affiliation(s)
- Nicholas M Blauch
- Program in Neural Computation, Carnegie Mellon University, Pittsburgh, PA 15213
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Marlene Behrmann
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213
- David C Plaut
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213
36
A connectivity-constrained computational account of topographic organization in primate high-level visual cortex. Proc Natl Acad Sci U S A 2022; 119:2112566119. [PMID: 35027449 PMCID: PMC8784138 DOI: 10.1073/pnas.2112566119] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/30/2021] [Indexed: 12/20/2022] Open
37
Bertoni F, Montobbio N, Sarti A, Citti G. Emergence of Lie Symmetries in Functional Architectures Learned by CNNs. Front Comput Neurosci 2021; 15:694505. [PMID: 34880740 PMCID: PMC8645966 DOI: 10.3389/fncom.2021.694505] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2021] [Accepted: 10/21/2021] [Indexed: 11/30/2022] Open
Abstract
In this paper we study the spontaneous development of symmetries in the early layers of a Convolutional Neural Network (CNN) during learning on natural images. Our architecture is built in such a way as to mimic some properties of the early stages of biological visual systems. In particular, it contains a pre-filtering step ℓ0 defined in analogy with the Lateral Geniculate Nucleus (LGN). Moreover, the first convolutional layer is equipped with lateral connections defined as a propagation driven by a learned connectivity kernel, in analogy with the horizontal connectivity of the primary visual cortex (V1). We first show that the ℓ0 filter evolves during the training to reach a radially symmetric pattern well approximated by a Laplacian of Gaussian (LoG), which is a well-known model of the receptive profiles of LGN cells. In line with previous work on CNNs, the learned convolutional filters in the first layer can be approximated by Gabor functions, in agreement with well-established models for the receptive profiles of V1 simple cells. Here, we focus on the geometric properties of the learned lateral connectivity kernel of this layer, showing the emergence of orientation selectivity with respect to the tuning of the learned filters. We also examine the short-range connectivity and association fields induced by this connectivity kernel, and show qualitative and quantitative comparisons with known group-based models of V1 horizontal connections. These geometric properties arise spontaneously during the training of the CNN architecture, analogously to the emergence of symmetries in visual systems thanks to brain plasticity driven by external stimuli.
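The Laplacian-of-Gaussian profile that the learned pre-filter converges to has a standard closed form; a NumPy sketch of the textbook LoG kernel (the kernel size and sigma are arbitrary choices, not values from the paper):

```python
import numpy as np

def log_kernel(size=9, sigma=1.4):
    # centre-surround Laplacian-of-Gaussian, normalized to zero mean
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    g = np.exp(-r2 / (2.0 * sigma**2))
    log = (r2 - 2.0 * sigma**2) / sigma**4 * g
    return log - log.mean()            # zero mean: no response to flat input

k = log_kernel()
# radially symmetric, with a (here negative) centre and opposing surround
print(np.allclose(k, k.T), np.allclose(k, k[::-1, ::-1]), k[4, 4] < 0)  # True True True
```

Zero mean and radial symmetry are exactly the centre-surround properties the paper uses to identify the learned ℓ0 filter with LGN receptive profiles.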
Affiliation(s)
- Federico Bertoni
- Sorbonne Université, Paris, France
- Dipartimento di Matematica, Università di Bologna, Bologna, Italy
- Centre d'analyses et de Mathematiques Sociales, CNRS, EHESS, Paris, France
- Noemi Montobbio
- Neural Computation Laboratory, Center for Human Technologies, Istituto Italiano di Tecnologia, Genova, Italy
- Alessandro Sarti
- Centre d'analyses et de Mathematiques Sociales, CNRS, EHESS, Paris, France
- Giovanna Citti
- Dipartimento di Matematica, Università di Bologna, Bologna, Italy
- Centre d'analyses et de Mathematiques Sociales, CNRS, EHESS, Paris, France
38
Thompson JAF. Forms of explanation and understanding for neuroscience and artificial intelligence. J Neurophysiol 2021; 126:1860-1874. [PMID: 34644128 DOI: 10.1152/jn.00195.2021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Much of the controversy evoked by the use of deep neural networks as models of biological neural systems amounts to debates over what constitutes scientific progress in neuroscience. To discuss what constitutes scientific progress, one must have a goal in mind (progress toward what?). One such long-term goal is to produce scientific explanations of intelligent capacities (e.g., object recognition, relational reasoning). I argue that the most pressing philosophical questions at the intersection of neuroscience and artificial intelligence are ultimately concerned with defining the phenomena to be explained and with what constitute valid explanations of such phenomena. I propose that a foundation in the philosophy of scientific explanation and understanding can scaffold future discussions about how an integrated science of intelligence might progress. Toward this vision, I review relevant theories of scientific explanation and discuss strategies for unifying the scientific goals of neuroscience and AI.
Affiliation(s)
- Jessica A F Thompson
- Human Information Processing Lab, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
39
Ernst MR, Burwick T, Triesch J. Recurrent processing improves occluded object recognition and gives rise to perceptual hysteresis. J Vis 2021; 21:6. [PMID: 34905052 PMCID: PMC8684313 DOI: 10.1167/jov.21.13.6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
Over the past decades, object recognition has been predominantly studied and modelled as a feedforward process. This notion was supported by the fast response times in psychophysical and neurophysiological experiments and the recent success of deep feedforward neural networks for object recognition. Recently, however, this prevalent view has shifted and recurrent connectivity in the brain is now believed to contribute significantly to object recognition, especially under challenging conditions, including the recognition of partially occluded objects. Moreover, recurrent dynamics might be the key to understanding perceptual phenomena such as perceptual hysteresis. In this work we investigate if and how artificial neural networks can benefit from recurrent connections. We systematically compare architectures comprising bottom-up, lateral, and top-down connections. To evaluate the impact of recurrent connections for occluded object recognition, we introduce three stereoscopic occluded object datasets, which span the range from classifying partially occluded hand-written digits to recognizing three-dimensional objects. We find that recurrent architectures perform significantly better than parameter-matched feedforward models. An analysis of the hidden representation of the models suggests that occluders are progressively discounted in later time steps of processing. We demonstrate that feedback can correct the initial misclassifications over time and that the recurrent dynamics lead to perceptual hysteresis. Overall, our results emphasize the importance of recurrent feedback for object recognition in difficult situations.
Affiliation(s)
- Markus R Ernst
- Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany
- Goethe-Universität Frankfurt, Frankfurt am Main, Germany
- Thomas Burwick
- Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany
- Goethe-Universität Frankfurt, Frankfurt am Main, Germany
- Jochen Triesch
- Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany
- Goethe-Universität Frankfurt, Frankfurt am Main, Germany
- https://www.fias.science/en/fellows/detail/triesch-jochen/
40
Wilson HR, Cowan JD. Evolution of the Wilson-Cowan equations. Biol Cybern 2021; 115:643-653.
Abstract
The Wilson-Cowan equations were developed to provide a simplified yet powerful description of neural network dynamics. As such, they embraced nonlinear dynamics, but in an interpretable form. Most importantly, it was the first mathematical formulation to emphasize the significance of interactions between excitatory and inhibitory neural populations, thereby incorporating both cooperation and competition. Subsequent research by many has documented the Wilson-Cowan significance in such diverse fields as visual hallucinations, memory, binocular rivalry, and epilepsy. The fact that these equations are still being used to elucidate a wide range of phenomena attests to their validity as a dynamical approximation to more detailed descriptions of complex neural computations.
Affiliation(s)
- Hugh R Wilson
- Centre for Vision Research, York University, Toronto, Canada
- Jack D Cowan
- Department of Mathematics, University of Chicago, Chicago, USA
41
Jang H, Tong F. Convolutional neural networks trained with a developmental sequence of blurry to clear images reveal core differences between face and object processing. J Vis 2021; 21:6. [PMID: 34767621 PMCID: PMC8590164 DOI: 10.1167/jov.21.12.6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Although convolutional neural networks (CNNs) provide a promising model for understanding human vision, most CNNs lack robustness to challenging viewing conditions, such as image blur, whereas human vision is much more reliable. Might robustness to blur be attributable to vision during infancy, given that acuity is initially poor but improves considerably over the first several months of life? Here, we evaluated the potential consequences of such early experiences by training CNN models on face and object recognition tasks while gradually reducing the amount of blur applied to the training images. For CNNs trained on blurry to clear faces, we observed sustained robustness to blur, consistent with a recent report by Vogelsang and colleagues (2018). By contrast, CNNs trained with blurry to clear objects failed to retain robustness to blur. Further analyses revealed that the spatial frequency tuning of the two CNNs was profoundly different. The blurry to clear face-trained network successfully retained a preference for low spatial frequencies, whereas the blurry to clear object-trained CNN exhibited a progressive shift toward higher spatial frequencies. Our findings provide novel computational evidence showing how face recognition, unlike object recognition, allows for more holistic processing. Moreover, our results suggest that blurry vision during infancy is insufficient to account for the robustness of adult vision to blurry objects.
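The training regime in this study is a simple curriculum: start with heavily blurred images and reduce the blur as "development" proceeds. A minimal 1-D sketch of the blur operation and a linear schedule (the schedule's shape and the sigma values are illustrative, not the paper's):

```python
import numpy as np

def gaussian_blur_1d(row, sigma):
    # blur a 1-D signal; a 2-D image would apply this along both axes
    if sigma <= 0:
        return row.copy()
    radius = max(1, int(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    k = np.exp(-ax**2 / (2.0 * sigma**2))
    k /= k.sum()
    return np.convolve(row, k, mode="same")

def blur_schedule(epoch, n_epochs=10, sigma0=4.0):
    # acuity improves over development, so sigma shrinks toward zero
    return sigma0 * (1.0 - epoch / (n_epochs - 1))

edge = np.r_[np.zeros(16), np.ones(16)]           # a sharp luminance edge
early = gaussian_blur_1d(edge, blur_schedule(0))  # blurry 'infant' view
late = gaussian_blur_1d(edge, blur_schedule(9))   # clear 'adult' view

# blurring reduces the steepest local contrast at the edge
print(np.abs(np.diff(early)).max() < np.abs(np.diff(edge)).max())  # True
print(np.allclose(late, edge))                                     # True
```

Early epochs therefore expose the network mainly to low spatial frequencies, which is the property the paper links to the face-trained network's sustained blur robustness.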
Affiliation(s)
- Hojin Jang
- Department of Psychology and Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA
- Frank Tong
- Department of Psychology and Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA
42
Prokott KE, Tamura H, Fleming RW. Gloss perception: Searching for a deep neural network that behaves like humans. J Vis 2021; 21:14. [PMID: 34817568 PMCID: PMC8626854 DOI: 10.1167/jov.21.12.14] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2020] [Accepted: 08/14/2021] [Indexed: 11/24/2022] Open
Abstract
The visual computations underlying human gloss perception remain poorly understood, and to date there is no image-computable model that reproduces human gloss judgments independent of shape and viewing conditions. Such a model could provide a powerful platform for testing hypotheses about the detailed workings of surface perception. Here, we made use of recent developments in artificial neural networks to test how well we could recreate human responses in a high-gloss versus low-gloss discrimination task. We rendered >70,000 scenes depicting familiar objects made of either mirror-like or near-matte textured materials. We trained numerous classifiers to distinguish the two materials in our images, ranging from linear classifiers using simple pixel statistics to convolutional neural networks (CNNs) with up to 12 layers, and compared their classifications with human judgments. To determine which classifiers made the same kinds of errors as humans, we painstakingly identified a set of 60 images in which human judgments are consistently decoupled from ground truth. We then conducted a Bayesian hyperparameter search to identify which out of several thousand CNNs most resembled humans. We found that, although architecture has only a relatively weak effect, high correlations with humans are somewhat more typical in networks of shallower to intermediate depths (three to five layers). We also trained deep convolutional generative adversarial networks (DCGANs) of different depths to recreate images based on our high- and low-gloss database. Responses from human observers show that two layers in a DCGAN can recreate gloss recognizably for human observers. Together, our results indicate that human gloss classification can best be explained by computations resembling early to mid-level vision.
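The simplest classifiers in the comparison operate on pixel statistics alone. A hedged NumPy sketch of such a feature extractor, using the classic observation that sparse specular highlights positively skew the luminance histogram (the toy "images" below are synthetic stand-ins, not the paper's renderings):

```python
import numpy as np

def pixel_stats(img):
    # mean, spread and skewness of the luminance histogram
    v = img.ravel()
    mu, sd = v.mean(), v.std()
    skew = ((v - mu) ** 3).mean() / sd**3
    return np.array([mu, sd, skew])

rng = np.random.default_rng(4)
matte = rng.normal(0.5, 0.1, (32, 32)).clip(0.0, 1.0)

glossy = matte.copy()
idx = rng.choice(32 * 32, size=20, replace=False)
glossy.ravel()[idx] = 1.0          # add sparse specular highlights

# highlights push the skewness feature up, separating the two classes
print(pixel_stats(glossy)[2] > pixel_stats(matte)[2])  # True
```

A linear classifier on such features is one of the weak baselines the paper compares against the CNNs and against human error patterns.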
Affiliation(s)
- Konrad Eugen Prokott
- Department of Experimental Psychology, Justus-Liebig-University Giessen, Giessen, Germany
- Hideki Tamura
- Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan
- Japan Society for Promotion of Sciences, Chiyoda, Tokyo, Japan
- Roland W Fleming
- Department of Experimental Psychology, Justus-Liebig-University Giessen, Giessen, Germany
- Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany
43
Zheng Y, Jia S, Yu Z, Liu JK, Huang T. Unraveling neural coding of dynamic natural visual scenes via convolutional recurrent neural networks. Patterns (N Y) 2021; 2:100350. [PMID: 34693375] [PMCID: PMC8515013] [DOI: 10.1016/j.patter.2021.100350] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/03/2021] [Revised: 06/22/2021] [Accepted: 08/23/2021] [Indexed: 11/18/2022]
Abstract
Traditional models of retinal system identification analyze the neural response to artificial stimuli using models consisting of predefined components. Such model designs are limited by prior knowledge, and the artificial stimuli are far simpler than the natural stimuli the retina actually processes. To fill this gap with an explainable model that reveals how a population of neurons works together to encode the larger field of natural scenes, here we used a deep-learning model for identifying the computational elements of the retinal circuit that contribute to learning the dynamics of natural scenes. Experimental results verify that the recurrent connection plays a key role in encoding complex dynamic visual scenes, while the model also learns biological computational underpinnings of the retinal circuit. In addition, the proposed models reveal both the shapes and the locations of the spatiotemporal receptive fields of ganglion cells.
Affiliation(s)
- Yajing Zheng
- Department of Computer Science and Technology, National Engineering Laboratory for Video Technology, Peking University, Beijing 100871, China
- Shanshan Jia
- Department of Computer Science and Technology, National Engineering Laboratory for Video Technology, Peking University, Beijing 100871, China
- Zhaofei Yu
- Department of Computer Science and Technology, National Engineering Laboratory for Video Technology, Peking University, Beijing 100871, China
- Institute for Artificial Intelligence, Peking University, Beijing 100871, China
- Jian K. Liu
- School of Computing, University of Leeds, Leeds LS2 9JT, UK
- Tiejun Huang
- Department of Computer Science and Technology, National Engineering Laboratory for Video Technology, Peking University, Beijing 100871, China
- Institute for Artificial Intelligence, Peking University, Beijing 100871, China
44
Nonaka S, Majima K, Aoki SC, Kamitani Y. Brain hierarchy score: Which deep neural networks are hierarchically brain-like? iScience 2021; 24:103013. [PMID: 34522856] [PMCID: PMC8426272] [DOI: 10.1016/j.isci.2021.103013] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 07/23/2020] [Revised: 12/31/2020] [Accepted: 08/18/2021] [Indexed: 11/16/2022] Open
Abstract
Achievement of human-level image recognition by deep neural networks (DNNs) has spurred interest in whether and how DNNs are brain-like. Both DNNs and the visual cortex perform hierarchical processing, and correspondence has been shown between hierarchical visual areas and DNN layers in representing visual features. Here, we propose the brain hierarchy (BH) score as a metric to quantify the degree of hierarchical correspondence based on neural decoding and encoding analyses where DNN unit activations and human brain activity are predicted from each other. We find that BH scores for 29 pre-trained DNNs with various architectures are negatively correlated with image recognition performance, thus indicating that recently developed high-performance DNNs are not necessarily brain-like. Experimental manipulations of DNN models suggest that single-path sequential feedforward architecture with broad spatial integration is critical to brain-like hierarchy. Our method may provide new ways to design DNNs in light of their representational homology to the brain. Highlights: a measure of brain-like hierarchy is proposed to characterize DNNs; encoding/decoding with human fMRI quantifies the hierarchical correspondence; among representative DNN models, high-performance models are not brain-like; critical factors for brain-like hierarchy are explored.
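The core idea of such an encoding-based hierarchy score can be sketched in a few lines of NumPy. Everything below is illustrative: simulated "layers" and "areas" (each area a noisy linear mixture of one layer), in-sample fits, and a toy correspondence measure, not the paper's actual BH pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_units, n_vox = 200, 30, 40
# Four simulated DNN "layers" and four "brain areas"; area i is a noisy
# linear readout of layer i, so the hierarchies correspond by construction.
layers = [rng.normal(size=(n_stim, n_units)) for _ in range(4)]
areas = [l @ rng.normal(size=(n_units, n_vox))
         + 0.5 * rng.normal(size=(n_stim, n_vox)) for l in layers]

def encoding_fit(X, Y):
    """R^2 of predicting Y from X by ordinary least squares (in-sample
    here for brevity; the study evaluates on held-out data)."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    return 1.0 - resid.var() / Y.var()

# Fit matrix: rows = layers, columns = areas.
fit = np.array([[encoding_fit(l, a) for a in areas] for l in layers])
best_layer = fit.argmax(axis=0)              # best-predicting layer per area
# Toy hierarchy score: does the best layer rise with the area's position?
bh_like = np.corrcoef(best_layer, np.arange(len(areas)))[0, 1]
```

With this construction each area is best predicted by its own layer, so the toy score is maximal; a shuffled layer-area assignment would drive it toward zero.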
Affiliation(s)
- Soma Nonaka
- Faculty of Integrated Human Studies, Kyoto University, Kyoto 606-8501, Japan
- Kei Majima
- Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
- Shuntaro C Aoki
- Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
- Yukiyasu Kamitani
- Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
- ATR Computational Neuroscience Laboratories, Seika, Kyoto 619-0288, Japan
45
Pang Z, O'May CB, Choksi B, VanRullen R. Predictive coding feedback results in perceived illusory contours in a recurrent neural network. Neural Netw 2021; 144:164-175. [PMID: 34500255] [DOI: 10.1016/j.neunet.2021.08.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/01/2021] [Revised: 08/16/2021] [Accepted: 08/17/2021] [Indexed: 10/20/2022]
Abstract
Modern feedforward convolutional neural networks (CNNs) can now solve some computer vision tasks at super-human levels. However, these networks only roughly mimic human visual perception. One difference from human vision is that they do not appear to perceive illusory contours (e.g. Kanizsa squares) in the same way humans do. Physiological evidence from visual cortex suggests that the perception of illusory contours could involve feedback connections. Would recurrent feedback neural networks perceive illusory contours like humans? In this work, we equip a deep feedforward convolutional network with brain-inspired recurrent dynamics. The network was first pretrained with an unsupervised reconstruction objective on a natural image dataset, to expose it to natural object contour statistics. Then, a classification decision head was added and the model was finetuned on a form discrimination task: squares vs. randomly oriented inducer shapes (no illusory contour). Finally, the model was tested with the unfamiliar "illusory contour" configuration: inducer shapes oriented to form an illusory square. Compared with feedforward baselines, the iterative "predictive coding" feedback resulted in more illusory contours being classified as physical squares. The perception of the illusory contour was measurable in the luminance profile of the image reconstructions produced by the model, demonstrating that the model really "sees" the illusion. Ablation studies revealed that natural image pretraining and feedback error correction are both critical to the perception of the illusion. Finally, we validated our conclusions in a deeper network (VGG): adding the same predictive coding feedback dynamics again leads to the perception of illusory contours.
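The iterative feedback error-correction dynamic at the heart of predictive coding can be illustrated with a toy linear model in the spirit of Rao and Ballard: a latent representation is repeatedly updated so that its top-down prediction reconstructs the input. This is a generic sketch with made-up dimensions, not the recurrent network used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n_input, n_latent = 20, 5
W = rng.normal(size=(n_input, n_latent))   # generative (feedback) weights
x = W @ rng.normal(size=n_latent)          # an input the model can fully explain

r = np.zeros(n_latent)                     # latent representation, starts empty
lr = 0.01                                  # update step size
errors = []
for _ in range(200):
    pred_error = x - W @ r                 # bottom-up prediction error
    r += lr * W.T @ pred_error             # feedback correction of the latent
    errors.append(float(np.linalg.norm(pred_error)))
```

Across iterations the prediction error shrinks as the feedback loop "settles" on a latent explanation of the input, which is the qualitative behavior the paper exploits to fill in illusory contours.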
Affiliation(s)
- Rufin VanRullen
- CerCO, CNRS UMR5549, Toulouse, France
- ANITI, Toulouse, France
46
Storrs KR, Kietzmann TC, Walther A, Mehrer J, Kriegeskorte N. Diverse Deep Neural Networks All Predict Human Inferior Temporal Cortex Well, After Training and Fitting. J Cogn Neurosci 2021; 33:2044-2064. [PMID: 34272948] [DOI: 10.1101/2020.05.07.082743] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 05/28/2023]
Abstract
Deep neural networks (DNNs) trained on object recognition provide the best current models of high-level visual cortex. What remains unclear is how strongly experimental choices, such as network architecture, training, and fitting to brain data, contribute to the observed similarities. Here, we compare a diverse set of nine DNN architectures on their ability to explain the representational geometry of 62 object images in human inferior temporal cortex (hIT), as measured with fMRI. We compare untrained networks to their task-trained counterparts and assess the effect of cross-validated fitting to hIT, by taking a weighted combination of the principal components of features within each layer and, subsequently, a weighted combination of layers. For each combination of training and fitting, we test all models for their correlation with the hIT representational dissimilarity matrix, using independent images and subjects. Trained models outperform untrained models (accounting for 57% more of the explainable variance), suggesting that structured visual features are important for explaining hIT. Model fitting further improves the alignment of DNN and hIT representations (by 124%), suggesting that the relative prevalence of different features in hIT does not readily emerge from the ImageNet object-recognition task used to train the networks. The same models can also explain the disparate representations in primary visual cortex (V1), where stronger weights are given to earlier layers. In each region, all architectures achieved equivalently high performance once trained and fitted. The models' shared properties (deep feedforward hierarchies of spatially restricted nonlinear filters) seem more important than their differences, when modeling human visual representations.
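The representational-dissimilarity comparison used here can be sketched with a few lines of NumPy: build a representational dissimilarity matrix (RDM) for each system and correlate their upper triangles. The random "model" and "brain" feature matrices below, with 40 shared latent dimensions, are illustrative stand-ins for DNN layer activations and hIT response patterns, not the study's data.

```python
import numpy as np

def rdm(features):
    """RDM: 1 - Pearson correlation between every pair of stimulus patterns.
    `features` is (n_stimuli, n_features)."""
    return 1.0 - np.corrcoef(features)

def rdm_similarity(rdm_a, rdm_b):
    """Correlate the upper triangles of two RDMs (the usual RSA score)."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

rng = np.random.default_rng(0)
n_stim = 62
shared = rng.normal(size=(n_stim, 40))               # structure both systems share
model_feats = np.hstack([shared, rng.normal(size=(n_stim, 20))])  # "DNN layer"
brain_feats = np.hstack([shared, rng.normal(size=(n_stim, 10))])  # "hIT patterns"

score = rdm_similarity(rdm(model_feats), rdm(brain_feats))
```

Because the two feature sets share latent structure, the RDM correlation comes out clearly positive; with fully independent features it would hover near zero.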
Affiliation(s)
- Katherine R Storrs
- Justus Liebig University Giessen, Germany
- Centre for Mind, Brain and Behaviour (CMBB), Research Campus Central Hessen
- Tim C Kietzmann
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
- Johannes Mehrer
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
47
Lindsay GW. Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future. J Cogn Neurosci 2021; 33:2017-2031. [DOI: 10.1162/jocn_a_01544] [Citation(s) in RCA: 96] [Impact Index Per Article: 32.0] [Indexed: 12/24/2022]
Abstract
Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs and discusses emerging opportunities for the use of CNNs in vision research beyond basic object recognition.
48
Peters B, Kriegeskorte N. Capturing the objects of vision with neural networks. Nat Hum Behav 2021; 5:1127-1144. [PMID: 34545237] [DOI: 10.1038/s41562-021-01194-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/21/2019] [Accepted: 08/06/2021] [Indexed: 01/31/2023]
Abstract
Human visual perception carves a scene at its physical joints, decomposing the world into objects, which are selectively attended, tracked and predicted as we engage our surroundings. Object representations emancipate perception from the sensory input, enabling us to keep in mind that which is out of sight and to use perceptual content as a basis for action and symbolic cognition. Human behavioural studies have documented how object representations emerge through grouping, amodal completion, proto-objects and object files. By contrast, deep neural network models of visual object recognition remain largely tethered to sensory input, despite achieving human-level performance at labelling objects. Here, we review related work in both fields and examine how these fields can help each other. The cognitive literature provides a starting point for the development of new experimental tasks that reveal mechanisms of human object perception and serve as benchmarks driving the development of deep neural network models that will put the object into object recognition.
Affiliation(s)
- Benjamin Peters
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Nikolaus Kriegeskorte
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Department of Psychology, Columbia University, New York, NY, USA
- Department of Neuroscience, Columbia University, New York, NY, USA
- Department of Electrical Engineering, Columbia University, New York, NY, USA
49
Svanera M, Morgan AT, Petro LS, Muckli L. A self-supervised deep neural network for image completion resembles early visual cortex fMRI activity patterns for occluded scenes. J Vis 2021; 21:5. [PMID: 34259828] [PMCID: PMC8288063] [DOI: 10.1167/jov.21.7.5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/09/2020] [Accepted: 05/14/2021] [Indexed: 11/24/2022] Open
Abstract
The promise of artificial intelligence in understanding biological vision relies on the comparison of computational models with brain data with the goal of capturing functional principles of visual information processing. Convolutional neural networks (CNN) have successfully matched the transformations in hierarchical processing occurring along the brain's feedforward visual pathway, extending into ventral temporal cortex. However, it remains to be seen whether CNNs can successfully describe feedback processes in early visual cortex. Here, we investigated similarities between human early visual cortex and a CNN with encoder/decoder architecture, trained with self-supervised learning to fill occlusions and reconstruct an unseen image. Using representational similarity analysis (RSA), we compared 3T functional magnetic resonance imaging (fMRI) data from a nonstimulated patch of early visual cortex in human participants viewing partially occluded images, with the different CNN layer activations from the same images. Results show that our self-supervised image-completion network outperforms a classical object-recognition supervised network (VGG16) in terms of similarity to fMRI data. This work provides additional evidence that optimal models of the visual system might come from less feedforward architectures trained with less supervision. We also find that CNN decoder pathway activations are more similar to brain processing compared to encoder activations, suggesting an integration of mid- and low/middle-level features in early visual cortex. Challenging an artificial intelligence model to learn natural image representations via self-supervised learning and comparing them with brain data can help us to constrain our understanding of information processing, such as neuronal predictive coding.
Affiliation(s)
- Michele Svanera
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, UK
- Andrew T Morgan
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, UK
- Lucy S Petro
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, UK
- Lars Muckli
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, UK
50
De Cesarei A, Cavicchi S, Cristadoro G, Lippi M. Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes? Cogn Sci 2021; 45:e13009. [PMID: 34170027] [PMCID: PMC8365760] [DOI: 10.1111/cogs.13009] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/24/2020] [Revised: 05/19/2021] [Accepted: 05/31/2021] [Indexed: 11/28/2022]
Abstract
The investigation of visual categorization has recently been aided by the introduction of deep convolutional neural networks (CNNs), which achieve unprecedented accuracy in picture classification after extensive training. Even though the architecture of CNNs is inspired by the organization of the visual brain, the similarity between CNN and human visual processing remains unclear. Here, we investigated this issue by engaging humans and CNNs in a two-class visual categorization task. To this end, pictures containing animals or vehicles were modified to contain only low spatial frequency (LSF) or high spatial frequency (HSF) information, or were scrambled in the phase of the spatial frequency spectrum. For all types of degradation, accuracy increased as degradation was reduced for both humans and CNNs; however, the thresholds for accurate categorization differed between humans and CNNs. More remarkable differences were observed for HSF information than for the other two types of degradation, both in terms of overall accuracy and image-level agreement between humans and CNNs. The CNNs' difficulty in categorizing high-pass-filtered natural scenes was reduced by picture whitening, a procedure inspired by how visual systems process natural images. The results are discussed in terms of adaptation to regularities in the visual environment (scene statistics): if the visual characteristics of the environment are not learned by CNNs, their visual categorization may depend on only a subset of the visual information on which humans rely, for example, low spatial frequency information.
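The three stimulus manipulations described (low-pass filtering, high-pass filtering, and phase scrambling) can be sketched with NumPy FFTs. The cutoff frequency and the random test image below are arbitrary stand-ins, not the stimuli or parameters used in the study.

```python
import numpy as np

def fft_filter(img, cutoff, mode="low"):
    """Keep only spatial frequencies below ('low') or above ('high') the
    cutoff radius (in cycles per image), zeroing the rest in Fourier space."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt(xx**2 + yy**2)
    mask = radius <= cutoff if mode == "low" else radius > cutoff
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

def phase_scramble(img, rng):
    """Randomize the Fourier phase while keeping the amplitude spectrum."""
    f = np.fft.fft2(img)
    random_phase = np.angle(np.fft.fft2(rng.normal(size=img.shape)))
    return np.real(np.fft.ifft2(np.abs(f) * np.exp(1j * random_phase)))

rng = np.random.default_rng(1)
img = rng.normal(size=(64, 64))            # stand-in for a grayscale scene
low = fft_filter(img, cutoff=8, mode="low")
high = fft_filter(img, cutoff=8, mode="high")
scrambled = phase_scramble(img, rng)
```

Because the low-pass and high-pass masks partition the frequency plane, the two filtered images sum back to the original, which is a convenient sanity check for this kind of decomposition.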
Affiliation(s)
- Marco Lippi
- Department of Sciences and Methods for Engineering, University of Modena and Reggio Emilia