51. Manos T, Diaz-Pier S, Fortel I, Driscoll I, Zhan L, Leow A. Enhanced simulations of whole-brain dynamics using hybrid resting-state structural connectomes. bioRxiv 2023:2023.02.16.528836. PMID: 36824821; PMCID: PMC9948985; DOI: 10.1101/2023.02.16.528836.
Abstract
The human brain, composed of billions of neurons and synaptic connections, is an intricate network coordinating a sophisticated balance of excitatory and inhibitory activity between brain regions. The dynamical balance between excitation and inhibition is vital for adjusting neural input/output relationships in cortical networks and regulating the dynamic range of their responses to stimuli. To infer this balance using connectomics, we recently introduced a computational framework based on the Ising model, first developed to explain phase transitions in ferromagnets, and proposed a novel hybrid resting-state structural connectome (rsSC). Here, we show that a generative model based on the Kuramoto phase oscillator can be used to simulate static and dynamic functional connectomes (FC) with the rsSC as the coupling weight coefficients, such that the simulated FC aligns more closely with the observed FC than FC simulated with the traditional structural connectome. Simulations were performed using the open-source framework The Virtual Brain on high-performance computing infrastructure.
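The Kuramoto dynamics the abstract refers to can be sketched in a few lines. The network size, coupling strength, frequencies, and random symmetric weights below are toy stand-ins for the rsSC and the parameters tuned in The Virtual Brain, not the study's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                    # toy number of brain regions
W = rng.random((N, N))
W = (W + W.T) / 2                        # symmetric stand-in for the rsSC coupling weights
np.fill_diagonal(W, 0)
omega = rng.normal(1.0, 0.1, size=N)     # intrinsic frequencies of the regional oscillators
K, dt, steps = 0.5, 0.01, 20_000

theta = rng.uniform(0, 2 * np.pi, size=N)
signal = np.empty((steps, N))
for t in range(steps):
    # Kuramoto update: dtheta_i/dt = omega_i + K * sum_j W_ij * sin(theta_j - theta_i)
    coupling = (W * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
    theta = theta + dt * (omega + K * coupling)
    signal[t] = np.sin(theta)            # observable treated as a BOLD-like time series

# static functional connectivity: correlations between regional time series
FC_sim = np.corrcoef(signal[steps // 2:].T)
```

A dynamic FC would repeat the correlation over sliding windows of `signal`; correlating the upper triangle of `FC_sim` with an empirical FC gives the kind of alignment measure the abstract describes.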
52. Dorahy G, Chen JZ, Balle T. Computer-Aided Drug Design towards New Psychotropic and Neurological Drugs. Molecules 2023;28:1324. PMID: 36770990; PMCID: PMC9921936; DOI: 10.3390/molecules28031324.
Abstract
Central nervous system (CNS) disorders are a therapeutic area in drug discovery where demand for new treatments greatly exceeds approved treatment options. This is complicated by the high failure rate in late-stage clinical trials, resulting in exorbitant costs associated with bringing new CNS drugs to market. Computer-aided drug design (CADD) techniques minimise the time and cost burdens associated with drug research and development by ensuring an advantageous starting point for pre-clinical and clinical assessments. The key elements of CADD are divided into ligand-based and structure-based methods. Ligand-based methods encompass techniques including pharmacophore modelling and quantitative structure-activity relationships (QSARs), which use the relationship between biological activity and chemical structure to ascertain suitable lead molecules. In contrast, structure-based methods use information about the binding site architecture from an established protein structure to select suitable molecules for further investigation. In recent years, deep learning techniques have been applied in drug design and present an exciting addition to CADD workflows. Despite the difficulties associated with CNS drug discovery, advances towards new pharmaceutical treatments continue to be made, and CADD has supported these findings. This review explores various CADD techniques and discusses applications in CNS drug discovery from 2018 to November 2022.
Affiliation(s)
- Georgia Dorahy
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
- Brain and Mind Centre, The University of Sydney, Camperdown, NSW 2050, Australia
- Jake Zheng Chen
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
- Brain and Mind Centre, The University of Sydney, Camperdown, NSW 2050, Australia
- Thomas Balle
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
- Brain and Mind Centre, The University of Sydney, Camperdown, NSW 2050, Australia
53. Neural mechanisms underlying the hierarchical construction of perceived aesthetic value. Nat Commun 2023;14:127. PMID: 36693833; PMCID: PMC9873760; DOI: 10.1038/s41467-022-35654-y.
Abstract
Little is known about how the brain computes the perceived aesthetic value of complex stimuli such as visual art. Here, we used computational methods in combination with functional neuroimaging to provide evidence that the aesthetic value of a visual stimulus is computed in a hierarchical manner via a weighted integration over both low and high level stimulus features contained in early and late visual cortex, extending into parietal and lateral prefrontal cortices. Feature representations in parietal and lateral prefrontal cortex may in turn be utilized to produce an overall aesthetic value in the medial prefrontal cortex. Such brain-wide computations are not only consistent with a feature-based mechanism for value construction, but also resemble computations performed by a deep convolutional neural network. Our findings thus shed light on the existence of a general neurocomputational mechanism for rapidly and flexibly producing value judgements across an array of complex novel stimuli and situations.
54. Lee J, Jung M, Lustig N, Lee J. Neural representations of the perception of handwritten digits and visual objects from a convolutional neural network compared to humans. Hum Brain Mapp 2023;44:2018-2038. PMID: 36637109; PMCID: PMC9980894; DOI: 10.1002/hbm.26189.
Abstract
We investigated neural representations for visual perception of 10 handwritten digits and six visual objects from a convolutional neural network (CNN) and humans using functional magnetic resonance imaging (fMRI). Once our CNN model was fine-tuned using a pre-trained VGG16 model to recognize the visual stimuli from the digit and object categories, representational similarity analysis (RSA) was conducted using neural activations from fMRI and feature representations from the CNN model across all 16 classes. The encoded representations of the CNN model mirrored the hierarchical topographic organization of the human visual system. The feature representations in the lower convolutional (Conv) layers showed greater similarity with the neural representations in the early visual areas and parietal cortices, including the posterior cingulate cortex. The feature representations in the higher Conv layers were encoded in the higher-order visual areas, including the ventral/medial/dorsal stream and middle temporal complex. The neural representations in the classification layers were observed mainly in the ventral stream visual cortex (including the inferior temporal cortex), superior parietal cortex, and prefrontal cortex. There was a surprising similarity between the neural representations from the CNN model and the neural representations for human visual perception in the context of the perception of digits versus objects, particularly in the primary visual and associated areas. This study also illustrates the uniqueness of human visual perception. Unlike the CNN model, the neural representation of digits and objects for humans is more widely distributed across the whole brain, including the frontal and temporal areas.
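The RSA comparison described here can be sketched with synthetic data: build a representational dissimilarity matrix (RDM) per measurement, then correlate the RDMs' upper triangles. The condition counts match the study's 16 classes, but the feature dimensions, noise levels, and Pearson-based RDM comparison are illustrative assumptions (published RSA pipelines often use Spearman correlation):

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between condition patterns."""
    return 1.0 - np.corrcoef(patterns)

def rsa_similarity(a, b):
    """Second-order similarity: correlate the upper triangles of two RDMs."""
    iu = np.triu_indices(a.shape[0], k=1)
    return np.corrcoef(rdm(a)[iu], rdm(b)[iu])[0, 1]

rng = np.random.default_rng(1)
latent = rng.normal(size=(16, 50))   # shared structure across 16 classes (10 digits + 6 objects)
fmri = latent @ rng.normal(size=(50, 200)) + 0.5 * rng.normal(size=(16, 200))  # voxel patterns
cnn = latent @ rng.normal(size=(50, 120)) + 0.5 * rng.normal(size=(16, 120))   # layer features

score = rsa_similarity(fmri, cnn)    # high when both measurements carry the same structure
```

Running `rsa_similarity` between each fMRI region and each CNN layer yields the region-by-layer correspondence map the abstract reports.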
Affiliation(s)
- Juhyeon Lee
- Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
- Minyoung Jung
- Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
- Niv Lustig
- Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
- Jong‐Hwan Lee
- Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
55.
Abstract
Models of object recognition have mostly focused upon the hierarchical processing of objects from local edges up to more complex shape features. An alternative strategy that might be involved in pattern recognition centres around coarse-level contrast features. In humans and monkeys, the use of such features is most documented in the domain of face perception. Given prior suggestions that, generally, rodents might rely upon contrast features for object recognition, we hypothesized that they would pick up the typical contrast features relevant for face detection. We trained rats in a face-nonface categorization task with stimuli previously used in computer vision and tested for generalization with new, unseen stimuli by including manipulations of the presence and strength of a range of contrast features previously identified to be relevant for face detection. Although overall generalization performance was low, it was significantly modulated by contrast features. A model taking into account the summed strength of contrast features predicted the variation in accuracy across stimuli. Finally, with deep neural networks, we further investigated and quantified the performance and representations of the animals. The findings suggest that rat behaviour in visual pattern recognition tasks is partially explained by contrast feature processing.
56. Makino H. Arithmetic value representation for hierarchical behavior composition. Nat Neurosci 2023;26:140-149. PMID: 36550292; PMCID: PMC9829535; DOI: 10.1038/s41593-022-01211-5.
Abstract
The ability to compose new skills from a preacquired behavior repertoire is a hallmark of biological intelligence. Although artificial agents extract reusable skills from past experience and recombine them in a hierarchical manner, whether the brain similarly composes a novel behavior is largely unknown. In the present study, I show that deep reinforcement learning agents learn to solve a novel composite task by additively combining representations of prelearned action values of constituent subtasks. Learning efficacy in the composite task was further augmented by the introduction of stochasticity in behavior during pretraining. These theoretical predictions were empirically tested in mice, where subtask pretraining enhanced learning of the composite task. Cortex-wide, two-photon calcium imaging revealed analogous neural representations of combined action values, with improved learning when the behavior variability was amplified. Together, these results suggest that the brain composes a novel behavior with a simple arithmetic operation of preacquired action-value representations with stochastic policies.
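The additive-composition idea can be illustrated directly: sum pre-learned action values of the constituent subtasks and act on the composite values, optionally through a softmax whose temperature supplies the behavioral stochasticity the study found helpful. The state/action counts and temperature below are arbitrary, not the study's task:

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 5, 3
Q_sub1 = rng.random((n_states, n_actions))   # action values learned in subtask 1
Q_sub2 = rng.random((n_states, n_actions))   # action values learned in subtask 2

# Composition hypothesis: composite-task values are the element-wise sum
Q_comp = Q_sub1 + Q_sub2
greedy_policy = Q_comp.argmax(axis=1)        # deterministic choice per state

# Stochastic (softmax) policy: higher temperature tau -> more variable behavior
tau = 0.5
logits = Q_comp / tau
policy = np.exp(logits - logits.max(axis=1, keepdims=True))
policy /= policy.sum(axis=1, keepdims=True)  # rows are per-state action probabilities
```

Raising `tau` during pretraining is one way to inject the behavioral variability that improved composite-task learning in both the agents and the mice.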
Affiliation(s)
- Hiroshi Makino
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
57. Jensen CA, Sumanthiran D, Kirkorian HL, Travers BG, Rosengren KS, Rogers TT. Human perception and machine vision reveal rich latent structure in human figure drawings. Front Psychol 2023;14:1029808. PMID: 36910741; PMCID: PMC9996750; DOI: 10.3389/fpsyg.2023.1029808.
Abstract
For over a hundred years, children's drawings have been used to assess children's intellectual, emotional, and physical development, characterizing children on the basis of intuitively derived checklists to identify the presence or absence of features within children's drawings. The current study investigates whether contemporary data science tools, including deep neural network models of vision and crowd-based similarity ratings, can reveal latent structure in human figure drawings beyond that captured by checklists, and whether such structure can aid in understanding aspects of the child's cognitive, perceptual, and motor competencies. We introduce three new metrics derived from innovations in machine vision and crowd-sourcing of human judgments and show that they capture a wealth of information about the participant beyond that expressed by standard measures, including age, gender, motor abilities, personal/social behaviors, and communicative skills. Machine- and human-derived metrics captured somewhat different aspects of structure across drawings, and each were independently useful for predicting some participant characteristics. For example, machine embeddings seemed sensitive to the magnitude of the drawing on the page and stroke density, while human-derived embeddings appeared sensitive to the overall shape and parts of a drawing. Both metrics, however, independently explained variation on some outcome measures. Machine embeddings explained more variation than human embeddings on all subscales of the Ages and Stages Questionnaire (a parent report of developmental milestones) and on measures of grip and pinch strength, while each metric accounted for unique variance in models predicting the participant's gender. This research thus suggests that children's drawings may provide a richer basis for characterizing aspects of cognitive, behavioral, and motor development than previously thought.
Affiliation(s)
- Clint A Jensen
- Department of Psychology, University of Wisconsin-Madison, Madison, WI, United States
- Dillanie Sumanthiran
- Department of Brain and Cognitive Science, University of Rochester, Rochester, NY, United States
- Department of Psychology, University of Rochester, Rochester, NY, United States
- Heather L Kirkorian
- Department of Human Development and Family Studies, University of Wisconsin-Madison, Madison, WI, United States
- Brittany G Travers
- Occupational Therapy Program, Department of Kinesiology, Waisman Center, University of Wisconsin-Madison, Madison, WI, United States
- Karl S Rosengren
- Department of Brain and Cognitive Science, University of Rochester, Rochester, NY, United States
- Department of Psychology, University of Rochester, Rochester, NY, United States
- Timothy T Rogers
- Department of Psychology, University of Wisconsin-Madison, Madison, WI, United States
58. Moore JA, Tuladhar A, Ismail Z, Mouches P, Wilms M, Forkert ND. Dementia in Convolutional Neural Networks: Using Deep Learning Models to Simulate Neurodegeneration of the Visual System. Neuroinformatics 2023;21:45-55. PMID: 36083416; DOI: 10.1007/s12021-022-09602-6.
Abstract
Although current research aims to improve deep learning networks by applying knowledge about the healthy human brain and vice versa, the potential of using such networks to model and study neurodegenerative diseases remains largely unexplored. In this work, we present an in-depth feasibility study modeling progressive dementia in silico with deep convolutional neural networks. To this end, networks were trained to perform visual object recognition and then progressively injured by applying neuronal as well as synaptic injury. After each iteration of injury, network object recognition accuracy, saliency map similarity between the intact and injured networks, and internal activations of the degenerating models were evaluated. The evaluation revealed that cognitive function of the network progressively decreased with increasing injury load, with the effect much more pronounced for synaptic damage. The effects of neurodegeneration found for the in silico model closely resemble the loss of visual cognition seen in patients with posterior cortical atrophy.
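The two injury types can be sketched as weight-level operations on a single layer. The layer shape and injury fraction below are arbitrary; a full replication in the spirit of the study would apply these masks iteratively to a trained object-recognition network and re-measure accuracy after each step:

```python
import numpy as np

def synaptic_injury(W, fraction, rng):
    """Synaptic damage: zero a random fraction of individual connections."""
    return W * (rng.random(W.shape) >= fraction)

def neuronal_injury(W, fraction, rng):
    """Neuronal damage: silence whole units by zeroing all of a unit's weights."""
    alive = rng.random(W.shape[0]) >= fraction
    return W * alive[:, None]

rng = np.random.default_rng(3)
W = rng.normal(size=(64, 32))            # stand-in for one trained layer (64 in, 32 out)
W_syn = synaptic_injury(W, 0.25, rng)    # 25% of synapses removed
W_neu = neuronal_injury(W, 0.25, rng)    # ~25% of input units removed entirely

# forward pass through the synaptically injured layer
x = rng.normal(size=64)
h_injured = np.maximum(0.0, W_syn.T @ x)  # ReLU activations after injury
```

Note that at equal injury fractions the two operations remove the same expected number of weights; the study's finding is that the *pattern* of removal (diffuse synaptic vs. unit-level) changes how quickly recognition accuracy collapses.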
Affiliation(s)
- Jasmine A Moore
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada
- Anup Tuladhar
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Zahinoor Ismail
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, University of Calgary, Calgary, AB, Canada
- Department of Psychiatry, University of Calgary, Calgary, AB, Canada
- O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
- Pauline Mouches
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada
- Matthias Wilms
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Nils D Forkert
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Department of Electrical and Software Engineering, University of Calgary, Calgary, AB, Canada
59. Wingfield C, Zhang C, Devereux B, Fonteneau E, Thwaites A, Liu X, Woodland P, Marslen-Wilson W, Su L. On the similarities of representations in artificial and brain neural networks for speech recognition. Front Comput Neurosci 2022;16:1057439. PMID: 36618270; PMCID: PMC9811675; DOI: 10.3389/fncom.2022.1057439.
Abstract
Introduction: In recent years, machines powered by deep learning have achieved near-human levels of performance in speech recognition. Artificial systems and the human brain have thus reached a similar level of performance despite their huge differences in implementation, so deep learning models can, in principle, serve as candidates for mechanistic models of the human auditory system.
Methods: Utilizing high-performance automatic speech recognition systems together with advanced non-invasive human neuroimaging technology, such as magnetoencephalography and multivariate pattern-information analysis, the current study aimed to relate machine-learned representations of speech to recorded human brain representations of the same speech.
Results: In one direction, we found a quasi-hierarchical functional organization in human auditory cortex that qualitatively matched the hidden layers of deep artificial neural networks trained as part of an automatic speech recognizer. In the reverse direction, we modified the hidden-layer organization of the artificial neural network based on neural activation patterns in human brains. The result was a substantial improvement in word recognition accuracy and in the learned speech representations.
Discussion: We have demonstrated that artificial and brain neural networks can be mutually informative in the domain of speech recognition.
Affiliation(s)
- Cai Wingfield
- Department of Psychology, Lancaster University, Lancaster, United Kingdom
- Chao Zhang
- Department of Engineering, University of Cambridge, Cambridge, United Kingdom
- Barry Devereux
- School of Electronics, Electrical Engineering and Computer Science, Queens University Belfast, Belfast, United Kingdom
- Elisabeth Fonteneau
- Department of Psychology, University Paul Valéry Montpellier, Montpellier, France
- Andrew Thwaites
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Xunying Liu
- Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Phil Woodland
- Department of Engineering, University of Cambridge, Cambridge, United Kingdom
- Li Su
- Department of Neuroscience, Neuroscience Institute, Insigneo Institute for in silico Medicine, University of Sheffield, Sheffield, United Kingdom
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
60. Bordelon B, Pehlevan C. Population codes enable learning from few examples by shaping inductive bias. eLife 2022;11:e78606. PMID: 36524716; PMCID: PMC9839349; DOI: 10.7554/elife.78606.
Abstract
Learning from a limited number of experiences requires suitable inductive biases. To identify how inductive biases are implemented in and shaped by neural codes, we analyze sample-efficient learning of arbitrary stimulus-response maps from arbitrary neural codes with biologically-plausible readouts. We develop an analytical theory that predicts the generalization error of the readout as a function of the number of observed examples. Our theory illustrates in a mathematically precise way how the structure of population codes shapes inductive bias, and how a match between the code and the task is crucial for sample-efficient learning. It elucidates a bias to explain observed data with simple stimulus-response maps. Using recordings from the mouse primary visual cortex, we demonstrate the existence of an efficiency bias towards low-frequency orientation discrimination tasks for grating stimuli and low spatial frequency reconstruction tasks for natural images. We reproduce the discrimination bias in a simple model of primary visual cortex, and further show how invariances in the code to certain stimulus variations alter learning performance. We extend our methods to time-dependent neural codes and predict the sample efficiency of readouts from recurrent networks. We observe that many different codes can support the same inductive bias. By analyzing recordings from the mouse primary visual cortex, we demonstrate that biological codes have lower total activity than other codes with identical bias. Finally, we discuss implications of our theory in the context of recent developments in neuroscience and artificial intelligence. Overall, our study provides a concrete method for elucidating inductive biases of the brain and promotes sample-efficient learning as a general normative coding principle.
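The central claim, that sample-efficient learning requires a match between code and task, can be illustrated with a toy ridge readout: two population codes represent the same circular stimulus variable, but only one is tuned to the (low-frequency) target function. The tuning functions, training-set size, and ridge penalty are invented for this sketch, not taken from the paper's theory:

```python
import numpy as np

rng = np.random.default_rng(4)
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)   # circular stimulus variable
y = np.sin(theta)                                        # low-frequency target function

# two population codes for the same stimuli
code_low = np.stack([np.sin(theta), np.cos(theta)], axis=1)          # low-frequency tuning
code_high = np.stack([np.sin(7 * theta), np.cos(7 * theta)], axis=1) # high-frequency tuning

def readout_error(code, n_train, lam=1e-3):
    """Fit a ridge readout on n_train random examples; return test MSE on the rest."""
    idx = rng.permutation(len(theta))
    tr, te = idx[:n_train], idx[n_train:]
    X, Y = code[tr], y[tr]
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
    return ((code[te] @ w - y[te]) ** 2).mean()

err_low = readout_error(code_low, n_train=5)    # matched code: generalizes from 5 examples
err_high = readout_error(code_high, n_train=5)  # mismatched code: near-chance error
```

Sweeping `n_train` traces out the learning curves whose analytical form the paper derives; the mismatched code's error stays high until far more examples are seen.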
Affiliation(s)
- Blake Bordelon
- John A Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
- Center for Brain Science, Harvard University, Cambridge, United States
- Cengiz Pehlevan
- John A Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
- Center for Brain Science, Harvard University, Cambridge, United States
61. Chen Y, Wei Z, Gou H, Liu H, Gao L, He X, Zhang X. How far is brain-inspired artificial intelligence away from brain? Front Neurosci 2022;16:1096737. PMID: 36570836; PMCID: PMC9783913; DOI: 10.3389/fnins.2022.1096737.
Abstract
Fueled by the development of neuroscience and artificial intelligence (AI), recent advances in brain-inspired AI mark a tipping point in the collaboration between the two fields. AI began with inspiration from neuroscience but has evolved to achieve remarkable performance with little dependence on it. Recently, however, research into the neurobiological explainability of AI models has found that highly accurate models can resemble the brain's representation of the same computational processes, even though the models were developed without such neuroscientific references. In this perspective, we review the cooperation and separation between neuroscience and AI, and emphasize the current advance: a new form of cooperation, the neurobiological explainability of AI. Under the intertwined development of the two fields, we propose a practical framework to evaluate the brain-likeness of AI models, paving the way for their further improvement.
Affiliation(s)
- Yucan Chen
- Hefei National Research Center for Physical Sciences at the Microscale, and Department of Radiology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science & Technology of China, Hefei, China
- Zhengde Wei
- Department of Psychology, School of Humanities and Social Sciences, University of Science and Technology of China, Hefei, Anhui, China
- Huixing Gou
- Division of Life Sciences and Medicine, School of Life Sciences, University of Science and Technology of China, Hefei, Anhui, China
- Haiyi Liu
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Li Gao
- SILC Business School, Shanghai University, Shanghai, China
- Xiaosong He
- Department of Psychology, School of Humanities and Social Sciences, University of Science and Technology of China, Hefei, Anhui, China
- Xiaochu Zhang
- Hefei National Research Center for Physical Sciences at the Microscale, and Department of Radiology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science & Technology of China, Hefei, China
- Department of Psychology, School of Humanities and Social Sciences, University of Science and Technology of China, Hefei, Anhui, China
- Application Technology Center of Physical Therapy to Brain Disorders, Institute of Advanced Technology, University of Science and Technology of China, Hefei, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, China
62. Gifford AT, Dwivedi K, Roig G, Cichy RM. A large and rich EEG dataset for modeling human visual object recognition. Neuroimage 2022;264:119754. PMID: 36400378; PMCID: PMC9771828; DOI: 10.1016/j.neuroimage.2022.119754.
Abstract
The human brain achieves visual object recognition through multiple stages of linear and nonlinear transformations operating at a millisecond scale. To predict and explain these rapid transformations, computational neuroscientists employ machine learning modeling techniques. However, state-of-the-art models require massive amounts of data to properly train, and to the present day there is a lack of vast brain datasets which extensively sample the temporal dynamics of visual object recognition. Here we collected a large and rich dataset of high temporal resolution EEG responses to images of objects on a natural background. This dataset includes 10 participants, each with 82,160 trials spanning 16,740 image conditions. Through computational modeling we established the quality of this dataset in five ways. First, we trained linearizing encoding models that successfully synthesized the EEG responses to arbitrary images. Second, we correctly identified the recorded EEG data image conditions in a zero-shot fashion, using EEG synthesized responses to hundreds of thousands of candidate image conditions. Third, we show that both the high number of conditions as well as the trial repetitions of the EEG dataset contribute to the trained models' prediction accuracy. Fourth, we built encoding models whose predictions well generalize to novel participants. Fifth, we demonstrate full end-to-end training of randomly initialized DNNs that output EEG responses for arbitrary input images. We release this dataset as a tool to foster research in visual neuroscience and computer vision.
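A linearizing encoding model with zero-shot identification, two of the validation steps above, can be sketched on synthetic data. Here a ridge regression maps image features to responses, and held-out responses are identified by matching them against synthesized candidates; the feature and response dimensions, the planted linear ground truth, and the noise level are all toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n_train, n_feat, n_out = 200, 100, 85      # images, image features, EEG channels x time points
X = rng.normal(size=(n_train, n_feat))     # stand-in for DNN features of training images
B_true = rng.normal(size=(n_feat, n_out))
Y = X @ B_true + 0.1 * rng.normal(size=(n_train, n_out))   # "recorded" EEG responses

# linearizing encoding model: ridge regression from features to responses
lam = 1.0
B = np.linalg.solve(X.T @ X + lam * np.eye(n_feat), X.T @ Y)

# zero-shot identification: match each held-out response to its synthesized candidate
X_test = rng.normal(size=(10, n_feat))
Y_test = X_test @ B_true + 0.1 * rng.normal(size=(10, n_out))
Y_synth = X_test @ B                       # synthesized EEG for the candidate images
corr = np.corrcoef(Y_test, Y_synth)[:10, 10:]   # recorded vs. synthesized, all pairs
identified = corr.argmax(axis=1)           # best-matching candidate per recorded response
```

In the paper the candidate set reaches hundreds of thousands of images; identification accuracy then degrades gracefully with candidate-set size rather than staying at ceiling as in this small sketch.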
Affiliation(s)
- Alessandro T Gifford
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Einstein Center for Neurosciences Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
- Kshitij Dwivedi
- Department of Computer Science, Goethe Universität, Frankfurt am Main, Germany
- Gemma Roig
- Department of Computer Science, Goethe Universität, Frankfurt am Main, Germany
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Einstein Center for Neurosciences Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
- Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany
63. Zarkeshian P, Kergan T, Ghobadi R, Nicola W, Simon C. Photons guided by axons may enable backpropagation-based learning in the brain. Sci Rep 2022;12:20720. PMID: 36456619; PMCID: PMC9715721; DOI: 10.1038/s41598-022-24871-6.
Abstract
Despite great advances in explaining synaptic plasticity and neuron function, a complete understanding of the brain's learning algorithms is still missing. Artificial neural networks provide a powerful learning paradigm through the backpropagation algorithm which modifies synaptic weights by using feedback connections. Backpropagation requires extensive communication of information back through the layers of a network. This has been argued to be biologically implausible and it is not clear whether backpropagation can be realized in the brain. Here we suggest that biophotons guided by axons provide a potential channel for backward transmission of information in the brain. Biophotons have been experimentally shown to be produced in the brain, yet their purpose is not understood. We propose that biophotons can propagate from each post-synaptic neuron to its pre-synaptic one to carry the required information backward. To reflect the stochastic character of biophoton emissions, our model includes the stochastic backward transmission of teaching signals. We demonstrate that a three-layered network of neurons can learn the MNIST handwritten digit classification task using our proposed backpropagation-like algorithm with stochastic photonic feedback. We model realistic restrictions and show that our system still learns the task for low rates of biophoton emission, information-limited (one bit per photon) backward transmission, and in the presence of noise photons. Our results suggest a new functionality for biophotons and provide an alternate mechanism for backward transmission in the brain.
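The flavor of backpropagation-like learning with stochastic, one-bit backward signals can be sketched with a single hidden layer (the paper uses a three-layer network on MNIST). The task, architecture, transmission probability, and learning rate below are all invented for the sketch; only the hidden layer's error requires backward transmission, and it is gated by a Bernoulli "photon arrival" and quantized to its sign:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy task: classify whether the two inputs sum to more than 1
X = rng.random((256, 2))
y = (X.sum(axis=1) > 1.0).astype(float)[:, None]

W1 = rng.normal(0, 1, size=(2, 16))    # input -> hidden
W2 = rng.normal(0, 1, size=(16, 1))    # hidden -> output
lr, p_photon = 0.5, 0.8                # p_photon: chance a backward "photon" arrives

def forward(W1, W2):
    h = np.tanh(X @ W1)
    return h, 1.0 / (1.0 + np.exp(-(h @ W2)))

def loss(out):
    p = np.clip(out, 1e-7, 1 - 1e-7)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

h, out = forward(W1, W2)
loss_before = loss(out)

for _ in range(2000):
    h, out = forward(W1, W2)
    err = out - y
    # output-layer update is local: no inter-layer backward transmission needed
    W2 -= lr / len(X) * (h.T @ err)
    # backward transmission: each hidden unit receives only the SIGN of its error
    # (one bit per photon), and only when a photon is actually emitted
    gate = rng.random(h.shape) < p_photon
    delta_h = np.sign(err @ W2.T) * gate * (1.0 - h ** 2)
    W1 -= lr / len(X) * (X.T @ delta_h)

h, out = forward(W1, W2)
loss_after = loss(out)
```

Despite the degraded feedback channel, the loss still decreases, which is the qualitative point of the paper's stochastic-photonic-feedback results; their model additionally handles noise photons and very low emission rates.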
Collapse
Affiliation(s)
- Parisa Zarkeshian
- Department of Physics & Astronomy, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada; Institute for Quantum Science and Technology, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada; 1QB Information Technologies (1QBit), Vancouver, BC, Canada
| | - Taylor Kergan
- Department of Physics & Astronomy, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada
| | - Roohollah Ghobadi
- Department of Physics & Astronomy, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada; Institute for Quantum Science and Technology, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada
| | - Wilten Nicola
- Department of Physics & Astronomy, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada; Department of Cell Biology and Anatomy, Cumming School of Medicine, University of Calgary, 3330 Hospital Drive NW, Calgary, AB, Canada
| | - Christoph Simon
- Department of Physics & Astronomy, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada; Institute for Quantum Science and Technology, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada
| |
Collapse
|
64
|
Zafirova Y, Cui D, Raman R, Vogels R. Keep the head in the right place: Face-body interactions in inferior temporal cortex. Neuroimage 2022; 264:119676. [PMID: 36216293 DOI: 10.1016/j.neuroimage.2022.119676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 09/23/2022] [Accepted: 10/06/2022] [Indexed: 11/05/2022] Open
Abstract
In primates, faces and bodies activate distinct regions in the inferior temporal (IT) cortex and are typically studied separately. Yet, primates interact with whole agents and not with random concatenations of faces and bodies. Despite its social importance, it is still poorly understood how faces and bodies interact in IT. Here, we addressed this gap by measuring fMRI activations to whole agents and to unnatural face-body configurations in which the head was mislocated with respect to the body, and examined how these relate to the sum of the activations to their corresponding faces and bodies. First, we mapped patches in the IT of awake macaques that were activated more by images of whole monkeys compared to objects and found that these mostly overlapped with body and face patches. In a second fMRI experiment, we obtained no evidence for superadditive responses in these "monkey patches", with the activation to the monkeys being less or equal to the summed face-body activations. However, monkey patches in the anterior IT were activated more by natural compared to unnatural configurations. The stronger activations to natural configurations could not be explained by the summed face-body activations. These univariate results were supported by regression analyses in which we modeled the activations to both configurations as a weighted linear combination of the activations to the faces and bodies, showing higher regression coefficients for the natural compared to the unnatural configurations. Deeper layers of trained convolutional neural networks also contained units that responded more to natural compared to unnatural monkey configurations. Unlike the monkey fMRI patches, these units showed substantial superadditive responses to the natural configurations. 
Our monkey fMRI data suggest configuration-sensitive face-body interactions in anterior IT, adding to the evidence for integrated face-body processing in the primate ventral visual stream, and open the way for mechanistic studies using single-unit recordings in these patches.
Collapse
Affiliation(s)
- Yordanka Zafirova
- Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
| | - Ding Cui
- Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
| | - Rajani Raman
- Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
| | - Rufin Vogels
- Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium.
| |
Collapse
|
65
|
Lele AS, Fang Y, Anwar A, Raychowdhury A. Bio-mimetic high-speed target localization with fused frame and event vision for edge application. Front Neurosci 2022; 16:1010302. [PMID: 36507348 PMCID: PMC9732385 DOI: 10.3389/fnins.2022.1010302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2022] [Accepted: 10/24/2022] [Indexed: 11/26/2022] Open
Abstract
Evolution has honed predatory skills in the natural world, where localizing and intercepting fast-moving prey is required. The current generation of robotic systems mimics these biological systems using deep learning. High-speed processing of camera frames with convolutional neural networks (CNNs) (the frame pipeline) is resource-constrained on such aerial edge robots. Even with additional compute resources, throughput is ultimately capped at the frame rate of the camera, and traditional frame-only systems fail to capture the detailed temporal dynamics of the environment. Bio-inspired event cameras paired with spiking neural networks (SNNs) provide an asynchronous sensor-processor pair (the event pipeline) that captures the continuous temporal details of the scene at high speed but lags in accuracy. In this work, we propose a target localization system that combines event-camera and SNN-based high-speed target estimation with frame-camera and CNN-driven reliable object detection, fusing the complementary spatio-temporal strengths of the event and frame pipelines. One of our main contributions is the design of an SNN filter inspired by the neural mechanism for ego-motion cancelation in houseflies: it fuses vestibular signals with vision to cancel the activity corresponding to the predator's self-motion. We also integrate this neuro-inspired multi-pipeline processing with the task-optimized multi-neuronal pathway structure found in primates and insects. The system is validated to outperform CNN-only processing using prey-predator drone simulations in realistic 3D virtual environments, and is then demonstrated in a real-world multi-drone set-up with emulated event data. Subsequently, we use actual recorded sensory data from a multi-camera and inertial measurement unit (IMU) assembly to show the desired behavior while tolerating realistic noise in the vision and IMU sensors. We analyze the design space to identify optimal parameters for the spiking neurons and CNN models and to assess their effect on the performance metrics of the fused system. Finally, we map the throughput-controlling SNN and fusion network onto an edge-compatible Zynq-7000 FPGA to show a potential 264 outputs per second even under constrained resource availability. This work may open new research directions by coupling multiple sensing and processing modalities inspired by discoveries in neuroscience to break fundamental trade-offs in frame-based computer vision.
Collapse
Affiliation(s)
- Ashwin Sanjay Lele
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
| | - Yan Fang
- Department of Electrical and Computer Engineering, Kennesaw State University, Marietta, GA, United States
| | - Aqeel Anwar
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
| | - Arijit Raychowdhury
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
| |
Collapse
|
66
|
Sleep prevents catastrophic forgetting in spiking neural networks by forming a joint synaptic weight representation. PLoS Comput Biol 2022; 18:e1010628. [PMID: 36399437 PMCID: PMC9674146 DOI: 10.1371/journal.pcbi.1010628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2022] [Accepted: 10/03/2022] [Indexed: 11/19/2022] Open
Abstract
Artificial neural networks overwrite previously learned tasks when trained sequentially, a phenomenon known as catastrophic forgetting. In contrast, the brain learns continuously, and typically learns best when new training is interleaved with periods of sleep for memory consolidation. Here we used a spiking network to study the mechanisms behind catastrophic forgetting and the role of sleep in preventing it. The network could be trained to learn a complex foraging task but exhibited catastrophic forgetting when trained sequentially on different tasks. In synaptic weight space, new task training moved the synaptic weight configuration away from the manifold representing the old task, leading to forgetting. Interleaving new task training with periods of off-line reactivation, mimicking biological sleep, mitigated catastrophic forgetting by constraining the network's synaptic weight state to the previously learned manifold, while allowing the weight configuration to converge towards the intersection of the manifolds representing the old and new tasks. The study reveals a possible strategy of synaptic weight dynamics that the brain applies during sleep to prevent forgetting and optimize learning.
Collapse
|
67
|
Philippsen A, Tsuji S, Nagai Y. Quantifying developmental and individual differences in spontaneous drawing completion among children. Front Psychol 2022; 13:783446. [DOI: 10.3389/fpsyg.2022.783446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2021] [Accepted: 10/19/2022] [Indexed: 11/13/2022] Open
Abstract
This study investigated how children's drawings can provide insights into their cognitive development. It can be challenging to quantify the diversity of children's drawings across their developmental stages as well as between individuals. This study observed children's representational drawing ability by conducting a completion task where children could freely draw on partially drawn objects, and quantitatively analyzed differences in children's drawing tendencies across age and between individuals. First, we conducted preregistered analyses, based on crowd-sourced adult ratings, to investigate the differences of drawing style with the age and autistic traits of the children, where the latter was inspired by reports of atypical drawing among children with autism spectrum disorder (ASD). Additionally, the drawings were quantified using feature representations extracted with a deep convolutional neural network (CNN), which allowed an analysis of the drawings at different perceptual levels (i.e., local or global). Findings revealed a decrease in scribbling and an increase in completion behavior with increasing age. However, no correlation between drawing behavior and autistic traits was found. The network analysis demonstrated that older children adapted to the presented stimuli in a more adult-like manner than younger children. Furthermore, ways to quantify individual differences in how children adapt to the presented stimuli are explored. Based on the predictive coding theory as a unified theory of how perception and behavior might emerge from integrating sensations and predictions, we suggest that our analyses may open up new possibilities for investigating children's cognitive development.
Collapse
|
68
|
Cheon J, Baek S, Paik SB. Invariance of object detection in untrained deep neural networks. Front Comput Neurosci 2022; 16:1030707. [DOI: 10.3389/fncom.2022.1030707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022] Open
Abstract
The ability to perceive visual objects with various types of transformations, such as rotation, translation, and scaling, is crucial for consistent object recognition. In machine learning, invariant object detection for a network is often implemented by augmentation with a massive number of training images, but the mechanism of invariant object detection in biological brains—how invariance arises initially and whether it requires visual experience—remains elusive. Here, using a model neural network of the hierarchical visual pathway of the brain, we show that invariance of object detection can emerge spontaneously in the complete absence of learning. First, we found that units selective to a particular object class arise in randomly initialized networks even before visual training. Intriguingly, these units show robust tuning to images of each object class under a wide range of image transformation types, such as viewpoint rotation. We confirmed that this “innate” invariance of object selectivity enables untrained networks to perform an object-detection task robustly, even with images that have been significantly modulated. Our computational model predicts that invariant object tuning originates from combinations of non-invariant units via random feedforward projections, and we confirmed that the predicted profile of feedforward projections is observed in untrained networks. Our results suggest that invariance of object detection is an innate characteristic that can emerge spontaneously in random feedforward networks.
Collapse
|
69
|
Functional Network: A Novel Framework for Interpretability of Deep Neural Networks. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
70
|
Xu Y, Vaziri-Pashkam M. Understanding transformation tolerant visual object representations in the human brain and convolutional neural networks. Neuroimage 2022; 263:119635. [PMID: 36116617 PMCID: PMC11283825 DOI: 10.1016/j.neuroimage.2022.119635] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 09/12/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022] Open
Abstract
Forming transformation-tolerant object representations is critical to high-level primate vision. Despite its significance, many details of tolerance in the human brain remain unknown. Likewise, despite the ability of convolutional neural networks (CNNs) to exhibit human-like object categorization performance, whether CNNs form tolerance similar to that of the human brain is unknown. Here we provide the first comprehensive documentation and comparison of three tolerance measures in the human brain and CNNs. We measured fMRI responses from human ventral visual areas to real-world objects across both Euclidean and non-Euclidean feature changes. In single fMRI voxels in higher visual areas, we observed robust object response rank-order preservation across feature changes. This is indicative of functional smoothness in tolerance at the fMRI meso-scale level that has never been reported before. At the voxel population level, we found highly consistent object representational structure across feature changes towards the end of ventral processing. Rank-order preservation, consistency, and a third tolerance measure, cross-decoding success (i.e., a linear classifier's ability to generalize performance across feature changes) showed an overall tight coupling. These tolerance measures were in general lower for Euclidean than non-Euclidean feature changes in lower visual areas, but increased over the course of ventral processing for all feature changes. These characteristics of tolerance, however, were absent in eight CNNs pretrained with ImageNet images with varying network architecture, depth, the presence/absence of recurrent processing, or whether a network was pretrained with the original or stylized ImageNet images that encouraged shape processing. CNNs do not appear to develop the same kind of tolerance as the human brain over the course of visual processing.
Collapse
Affiliation(s)
- Yaoda Xu
- Psychology Department, Yale University, New Haven, CT 06520, USA.
| | | |
Collapse
|
71
|
Geller HA, Bartho R, Thömmes K, Redies C. Statistical image properties predict aesthetic ratings in abstract paintings created by neural style transfer. Front Neurosci 2022; 16:999720. [PMID: 36312022 PMCID: PMC9606769 DOI: 10.3389/fnins.2022.999720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 09/26/2022] [Indexed: 11/13/2022] Open
Abstract
Artificial intelligence has emerged as a powerful computational tool to create artworks. One application is Neural Style Transfer (NST), which makes it possible to transfer the style of one image, such as a painting, onto the content of another image, such as a photograph. In the present study, we ask how Neural Style Transfer affects objective image properties and how beholders perceive the novel (style-transferred) stimuli. In order to focus on the subjective perception of artistic style, we minimized the confounding effect of cognitive processing by eliminating all representational content from the input images. To this aim, we transferred the styles of 25 diverse abstract paintings onto 150 colored random-phase patterns with six different Fourier spectral slopes. This procedure resulted in 150 style-transferred stimuli. We then computed eight statistical image properties (complexity, self-similarity, edge-orientation entropy, variances of neural network features, and color statistics) for each image. In a rating study, we asked participants to evaluate the images along three aesthetic dimensions (Pleasing, Harmonious, and Interesting). Results demonstrate that not only objective image properties, but also subjective aesthetic preferences, transferred from the original artworks onto the style-transferred images. The image properties of the style-transferred images explain 50–69% of the variance in the ratings. In the multidimensional space of statistical image properties, participants considered style-transferred images to be more Pleasing and Interesting if they were closer to a “sweet spot” where traditional Western paintings (JenAesthetics dataset) are represented. We conclude that NST is a useful tool to create novel artistic stimuli that preserve the image properties of the input style images. In the novel stimuli, we found a strong relationship between statistical image properties and subjective ratings, suggesting a prominent role of perceptual processing in the aesthetic evaluation of abstract images.
Collapse
|
72
|
Valeriani D, Santoro F, Ienca M. The present and future of neural interfaces. Front Neurorobot 2022; 16:953968. [PMID: 36304780 PMCID: PMC9592849 DOI: 10.3389/fnbot.2022.953968] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2022] [Accepted: 07/13/2022] [Indexed: 11/18/2022] Open
Abstract
The 2020s will likely witness an unprecedented development and deployment of neurotechnologies for human rehabilitation, personalized use, and cognitive or other enhancement. New materials and algorithms are already enabling active brain monitoring and are allowing the development of biohybrid and neuromorphic systems that can adapt to the brain. Novel brain-computer interfaces (BCIs) have been proposed to tackle a variety of enhancement and therapeutic challenges, from improving decision-making to modulating mood disorders. While these BCIs have generally been developed in an open-loop modality to optimize their internal neural decoders, this decade will increasingly witness their validation in closed-loop systems that are able to continuously adapt to the user's mental states. Therefore, a proactive ethical approach is needed to ensure that these new technological developments go hand in hand with the development of a sound ethical framework. In this perspective article, we summarize recent developments in neural interfaces, ranging from neurohybrid synapses to closed-loop BCIs, and thereby identify the most promising macro-trends in BCI research, such as simulating vs. interfacing the brain, brain recording vs. brain stimulation, and hardware vs. software technology. Particular attention is devoted to central nervous system interfaces, especially those with application in healthcare and human enhancement. Finally, we critically assess the possible futures of neural interfacing and analyze the short- and long-term implications of such neurotechnologies.
Collapse
Affiliation(s)
| | - Francesca Santoro
- Institute for Biological Information Processing - Bioelectronics, IBI-3, Forschungszentrum Juelich, Juelich, Germany
- Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
| | - Marcello Ienca
- College of Humanities, Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland
- *Correspondence: Marcello Ienca
| |
Collapse
|
73
|
Wang MB, Halassa MM. Thalamocortical contribution to flexible learning in neural systems. Netw Neurosci 2022; 6:980-997. [PMID: 36875011 PMCID: PMC9976647 DOI: 10.1162/netn_a_00235] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2021] [Accepted: 01/19/2022] [Indexed: 11/04/2022] Open
Abstract
Animal brains evolved to optimize behavior in dynamic environments, flexibly selecting actions that maximize future rewards in different contexts. A large body of experimental work indicates that such optimization changes the wiring of neural circuits, appropriately mapping environmental input onto behavioral outputs. A major unsolved scientific question is how optimal wiring adjustments, which must target the connections responsible for rewards, can be accomplished when the relation between sensory inputs, action taken, and environmental context with rewards is ambiguous. The credit assignment problem can be categorized into context-independent structural credit assignment and context-dependent continual learning. In this perspective, we survey prior approaches to these two problems and advance the notion that the brain's specialized neural architectures provide efficient solutions. Within this framework, the thalamus with its cortical and basal ganglia interactions serves as a systems-level solution to credit assignment. Specifically, we propose that thalamocortical interaction is the locus of meta-learning where the thalamus provides cortical control functions that parametrize the cortical activity association space. By selecting among these control functions, the basal ganglia hierarchically guide thalamocortical plasticity across two timescales to enable meta-learning. The faster timescale establishes contextual associations to enable behavioral flexibility, while the slower one enables generalization to new contexts.
Collapse
Affiliation(s)
- Mien Brabeeba Wang
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Michael M. Halassa
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
Collapse
|
74
|
Wagatsuma N, Hidaka A, Tamura H. Analysis based on neural representation of natural object surfaces to elucidate the mechanisms of a trained AlexNet model. Front Comput Neurosci 2022; 16:979258. [PMID: 36249483 PMCID: PMC9564108 DOI: 10.3389/fncom.2022.979258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2022] [Accepted: 09/12/2022] [Indexed: 11/22/2022] Open
Abstract
Analysis and understanding of trained deep neural networks (DNNs) can deepen our understanding of the visual mechanisms involved in primate visual perception. However, due to the limited availability of neural activity data recorded from various cortical areas, the correspondence between the characteristics of artificial and biological neural responses for visually recognizing objects remains unclear at the layer level of DNNs. In the current study, we investigated the relationships between the artificial representations in each layer of a trained AlexNet model (based on a DNN) for object classification and the neural representations in various levels of visual cortices such as the primary visual (V1), intermediate visual (V4), and inferior temporal cortices. Furthermore, we analyzed the profiles of the artificial representations at a single channel level for each layer of the AlexNet model. We found that the artificial representations in the lower-level layers of the trained AlexNet model were strongly correlated with the neural representation in V1, whereas the responses of model neurons in layers at the intermediate and higher-intermediate levels of the trained object classification model exhibited characteristics similar to those of neural activity in V4 neurons. These results suggest that the trained AlexNet model may gradually establish artificial representations for object classification through the hierarchy of its network, in a similar manner to the neural mechanisms by which afferent transmission beginning in the low-level features gradually establishes object recognition as signals progress through the hierarchy of the ventral visual pathway.
Collapse
Affiliation(s)
- Nobuhiko Wagatsuma
- Department of Information Science, Faculty of Science, Toho University, Funabashi, Japan
- *Correspondence: Nobuhiko Wagatsuma,
| | - Akinori Hidaka
- School of Science and Engineering, Tokyo Denki University, Hatoyama-machi, Japan
| | - Hiroshi Tamura
- Graduate School of Frontier Biosciences, Osaka University, Suita, Japan
- Center for Information and Neural Networks (CiNet), Suita, Japan
| |
Collapse
|
75
|
Kim SG. On the encoding of natural music in computational models and human brains. Front Neurosci 2022; 16:928841. [PMID: 36203808 PMCID: PMC9531138 DOI: 10.3389/fnins.2022.928841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022] Open
Abstract
This article discusses recent developments and advances in the neuroscience of music to understand the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses and test their predictions against independent unseen data. The new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music.
Collapse
|
76
|
van Dyck LE, Denzler SJ, Gruber WR. Guiding visual attention in deep convolutional neural networks based on human eye movements. Front Neurosci 2022; 16:975639. [PMID: 36177359 PMCID: PMC9514055 DOI: 10.3389/fnins.2022.975639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Accepted: 08/25/2022] [Indexed: 11/13/2022] Open
Abstract
Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision, have evolved into the best current computational models of object recognition, and consequently show strong architectural and functional parallels with the ventral visual pathway in comparisons with neuroimaging and neural time series data. As recent advances in deep learning seem to decrease this similarity, computational neuroscience is challenged to reverse-engineer biological plausibility to obtain useful models. While previous studies have shown that biologically inspired architectures can amplify the human-likeness of the models, in this study we investigate a purely data-driven approach. We use human eye tracking data to directly modify training examples and thereby guide the models’ visual attention during object recognition in natural images, either toward or away from the focus of human fixations. We compare and validate the different manipulation types (i.e., standard, human-like, and non-human-like attention) through GradCAM saliency maps against human participant eye tracking data. Our results demonstrate that the proposed guided focus manipulation works as intended in the negative direction: non-human-like models focus on significantly dissimilar image parts compared to humans. The observed effects were highly category-specific, enhanced by animacy and face presence, developed only after feedforward processing was completed, and indicated a strong influence on face detection. With this approach, however, no significantly increased human-likeness was found. Possible applications of overt visual attention in DCNNs and further implications for theories of face detection are discussed.
Collapse
Affiliation(s)
- Leonard Elia van Dyck
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Centre for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- *Correspondence: Leonard Elia van Dyck,
| | | | - Walter Roland Gruber
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Centre for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
| |
Collapse
|
77
|
Baker N, Elder JH. Deep learning models fail to capture the configural nature of human shape perception. iScience 2022; 25:104913. [PMID: 36060067 PMCID: PMC9429800 DOI: 10.1016/j.isci.2022.104913] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2022] [Revised: 05/06/2022] [Accepted: 08/08/2022] [Indexed: 11/26/2022] Open
78
Rolls ET, Deco G, Huang CC, Feng J. Multiple cortical visual streams in humans. Cereb Cortex 2022; 33:3319-3349. [PMID: 35834308 DOI: 10.1093/cercor/bhac276] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Revised: 06/16/2022] [Accepted: 06/17/2022] [Indexed: 11/14/2022] Open
Abstract
The effective connectivity between 55 visual cortical regions and 360 cortical regions was measured in 171 HCP participants using the HCP-MMP atlas, and complemented with functional connectivity and diffusion tractography. A Ventrolateral Visual "What" Stream for object and face recognition projects hierarchically to the inferior temporal visual cortex, which projects to the orbitofrontal cortex for reward value and emotion, and to the hippocampal memory system. A Ventromedial Visual "Where" Stream for scene representations connects to the parahippocampal gyrus and hippocampus. An Inferior STS (superior temporal sulcus) cortex Semantic Stream receives from the Ventrolateral Visual Stream, from visual inferior parietal PGi, and from the ventromedial-prefrontal reward system and connects to language systems. A Dorsal Visual Stream connects via V2 and V3A to MT+ Complex regions (including MT and MST), which connect to intraparietal regions (including LIP, VIP and MIP) involved in visual motion and actions in space. It performs coordinate transforms for idiothetic update of Ventromedial Stream scene representations. A Superior STS cortex Semantic Stream receives visual inputs from the Inferior STS Visual Stream, PGi, and STV, and auditory inputs from A5, is activated by face expression, motion and vocalization, and is important in social behaviour, and connects to language systems.
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, United Kingdom; Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom; Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
- Gustavo Deco
- Computational Neuroscience Group, Department of Information and Communication Technologies, Center for Brain and Cognition, Universitat Pompeu Fabra, Roc Boronat 138, Barcelona 08018, Spain; Brain and Cognition, Pompeu Fabra University, Barcelona 08018, Spain; Institució Catalana de la Recerca i Estudis Avançats (ICREA), Universitat Pompeu Fabra, Passeig Lluís Companys 23, Barcelona 08010, Spain
- Chu-Chung Huang
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200602, China; Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 200602, China
- Jianfeng Feng
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom; Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
79
Zhou H, Deng J, Cai D, Lv X, Wu BM. Effects of Image Dataset Configuration on the Accuracy of Rice Disease Recognition Based on Convolution Neural Network. FRONTIERS IN PLANT SCIENCE 2022; 13:910878. [PMID: 35865283 PMCID: PMC9295741 DOI: 10.3389/fpls.2022.910878] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Accepted: 05/10/2022] [Indexed: 06/02/2023]
Abstract
In recent years, the convolutional neural network has been the most widely used deep learning algorithm in plant disease diagnosis and has performed well in classification. In practice, however, some specific issues have not received adequate attention. For instance, the same pathogen may cause similar or different symptoms when infecting plant leaves, and it may likewise cause similar or disparate symptoms on different parts of the plant. Questions therefore arise naturally: should images showing different symptoms of the same disease be placed in one class or in two separate classes in the image database? And how do the different classification schemes affect the results of image recognition? In this study, taking rice leaf blast and neck blast caused by Magnaporthe oryzae, and rice sheath blight caused by Rhizoctonia solani as examples, three experiments were designed to explore how database configuration affects recognition accuracy when recognizing different symptoms of the same disease on the same plant part, similar symptoms of the same disease on different parts, and different symptoms on different parts. The results suggested that when symptoms of the same disease were the same or similar, whether on the same plant part or not, training combined classes of these images gave better performance than training them separately. When the difference between symptoms was obvious, classification was relatively easy, and both separate and combined training achieved relatively high recognition accuracy. The results also indicated, to a certain extent, that the greater the number of images in the training dataset, the higher the average classification accuracy.
80
Face identity coding in the deep neural network and primate brain. Commun Biol 2022; 5:611. [PMID: 35725902 PMCID: PMC9209415 DOI: 10.1038/s42003-022-03557-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2021] [Accepted: 06/01/2022] [Indexed: 01/01/2023] Open
Abstract
A central challenge in face perception research is to understand how neurons encode face identities. This challenge has not been met largely due to the lack of simultaneous access to the entire face processing neural network and the lack of a comprehensive multifaceted model capable of characterizing a large number of facial features. Here, we addressed this challenge by conducting in silico experiments using a pre-trained face recognition deep neural network (DNN) with a diverse array of stimuli. We identified a subset of DNN units selective to face identities, and these identity-selective units demonstrated generalized discriminability to novel faces. Visualization and manipulation of the network revealed the importance of identity-selective units in face recognition. Importantly, using our monkey and human single-neuron recordings, we directly compared the response of artificial units with real primate neurons to the same stimuli and found that artificial units shared a similar representation of facial features as primate neurons. We also observed a region-based feature coding mechanism in DNN units as in human neurons. Together, by directly linking between artificial and primate neural systems, our results shed light on how the primate brain performs face recognition tasks.
81
Nicholson DA, Prinz AA. Could simplified stimuli change how the brain performs visual search tasks? A deep neural network study. J Vis 2022; 22:3. [PMID: 35675057 PMCID: PMC9187944 DOI: 10.1167/jov.22.7.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2021] [Accepted: 05/04/2022] [Indexed: 11/24/2022] Open
Abstract
Visual search is a complex behavior influenced by many factors. To control for these factors, many studies use highly simplified stimuli. However, the statistics of these stimuli are very different from the statistics of the natural images that the human visual system is optimized by evolution and experience to perceive. Could this difference change search behavior? If so, simplified stimuli may contribute to effects typically attributed to cognitive processes, such as selective attention. Here we use deep neural networks to test how optimizing models for the statistics of one distribution of images constrains performance on a task using images from a different distribution. We train four deep neural network architectures on one of three source datasets-natural images, faces, and x-ray images-and then adapt them to a visual search task using simplified stimuli. This adaptation produces models that exhibit performance limitations similar to humans, whereas models trained on the search task alone exhibit no such limitations. However, we also find that deep neural networks trained to classify natural images exhibit similar limitations when adapted to a search task that uses a different set of natural images. Therefore, the distribution of data alone cannot explain this effect. We discuss how future work might integrate an optimization-based approach into existing models of visual search behavior.
Affiliation(s)
- David A Nicholson
- Emory University, Department of Biology, O. Wayne Rollins Research Center, Atlanta, Georgia
- Astrid A Prinz
- Emory University, Department of Biology, O. Wayne Rollins Research Center, Atlanta, Georgia
82
Malhotra G, Dujmović M, Bowers JS. Feature blindness: A challenge for understanding and modelling visual object recognition. PLoS Comput Biol 2022; 18:e1009572. [PMID: 35560155 PMCID: PMC9132323 DOI: 10.1371/journal.pcbi.1009572] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 05/25/2022] [Accepted: 03/19/2022] [Indexed: 12/02/2022] Open
Abstract
Humans rely heavily on the shape of objects to recognise them. Recently, it has been argued that Convolutional Neural Networks (CNNs) can also show a shape-bias, provided their learning environment contains this bias. This has led to the proposal that CNNs provide good mechanistic models of shape-bias and, more generally, human visual processing. However, it is also possible that humans and CNNs show a shape-bias for very different reasons, namely, shape-bias in humans may be a consequence of architectural and cognitive constraints whereas CNNs show a shape-bias as a consequence of learning the statistics of the environment. We investigated this question by exploring shape-bias in humans and CNNs when they learn in a novel environment. We observed that, in this new environment, humans (i) focused on shape and overlooked many non-shape features, even when non-shape features were more diagnostic, (ii) learned based on only one out of multiple predictive features, and (iii) failed to learn when global features, such as shape, were absent. This behaviour contrasted with the predictions of a statistical inference model with no priors, showing the strong role that shape-bias plays in human feature selection. It also contrasted with CNNs that (i) preferred to categorise objects based on non-shape features, and (ii) increased reliance on these non-shape features as they became more predictive. This was the case even when the CNN was pre-trained to have a shape-bias and the convolutional backbone was frozen. These results suggest that shape-bias has a different source in humans and CNNs: while learning in CNNs is driven by the statistical properties of the environment, humans are highly constrained by their previous biases, which suggests that cognitive constraints play a key role in how humans learn to recognise novel objects.

Any object consists of hundreds of visual features that can be used to recognise it. How do humans select which feature to use? Do we always choose features that are best at predicting the object? In a series of experiments using carefully designed stimuli, we find that humans frequently ignore many features that are clearly visible and highly predictive. This behaviour is statistically inefficient and we show that it contrasts with statistical inference models such as state-of-the-art neural networks. Unlike humans, these models learn to rely on the most predictive feature when trained on the same data. We argue that the reason underlying human behaviour may be a bias to look for features that are less hungry for cognitive resources and generalise better to novel instances. Models that incorporate cognitive constraints may not only allow us to better understand human vision but also help us develop machine learning models that are more robust to changes in incidental features of objects.
Affiliation(s)
- Gaurav Malhotra
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
- Marin Dujmović
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
- Jeffrey S. Bowers
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
83
Abstract
Three decades ago, Atick et al. suggested that human frequency sensitivity may emerge from the enhancement required for a more efficient analysis of retinal images. Here we reassess the relevance of low-level vision tasks in the explanation of the contrast sensitivity functions (CSFs) in light of (1) the current trend of using artificial neural networks for studying vision, and (2) the current knowledge of retinal image representations. As a first contribution, we show that a very popular type of convolutional neural network (CNN), the autoencoder, may develop human-like CSFs in the spatiotemporal and chromatic dimensions when trained to perform some basic low-level vision tasks (like retinal noise and optical blur removal), but not others (like chromatic adaptation or pure reconstruction after simple bottlenecks). As an illustrative example, the best CNN (in the considered set of simple architectures for enhancement of the retinal signal) reproduces the CSFs with a root mean square error of 11% of the maximum sensitivity. As a second contribution, we provide experimental evidence that, for some functional goals (at low abstraction level), deeper CNNs that are better at reaching the quantitative goal are actually worse at replicating human-like phenomena (such as the CSFs). This low-level result (for the explored networks) is not necessarily in contradiction with other works that report advantages of deeper nets in modeling higher-level vision goals. However, in line with a growing body of literature, our results suggest another word of caution about CNNs in vision science, because the use of simplified units or unrealistic architectures in goal optimization may be a limitation for the modeling and understanding of human vision.
Affiliation(s)
- Qiang Li
- Image Processing Lab, Parc Científic, Universitat de València, Spain
- Alex Gomez-Villa
- Computer Vision Center, Universitat Autònoma de Barcelona, Spain
- Marcelo Bertalmío
- Instituto de Óptica, Spanish National Research Council (CSIC), Spain
- Jesús Malo
- Image Processing Lab, Parc Científic, Universitat de València, Spain. http://isp.uv.es
84
Xu Q, Shen J, Ran X, Tang H, Pan G, Liu JK. Robust Transcoding Sensory Information With Neural Spikes. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:1935-1946. [PMID: 34665741 DOI: 10.1109/tnnls.2021.3107449] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Neural coding, including encoding and decoding, is one of the key problems in neuroscience for understanding how the brain uses neural signals to relate sensory perception and motor behaviors to neural systems. However, most existing studies deal only with the continuous signals of neural systems, neglecting a unique feature of biological neurons, the spike, which is the fundamental information unit of neural computation as well as a building block for brain-machine interfaces. To address these limitations, we propose a transcoding framework to encode multi-modal sensory information into neural spikes and then reconstruct stimuli from spikes. Sensory information can be compressed to about 10% of its original volume in terms of neural spikes, yet 100% of the information can be re-extracted by reconstruction. Our framework can not only feasibly and accurately reconstruct dynamical visual and auditory scenes, but also rebuild stimulus patterns from functional magnetic resonance imaging (fMRI) brain activity. More importantly, it shows strong noise immunity for various types of artificial noise and background signals. The proposed framework provides efficient ways to perform multimodal feature representation and reconstruction in a high-throughput fashion, with potential usage for efficient neuromorphic computing in noisy environments.
85
Charles Leek E, Leonardis A, Heinke D. Deep neural networks and image classification in biological vision. Vision Res 2022; 197:108058. [PMID: 35487146 DOI: 10.1016/j.visres.2022.108058] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2021] [Revised: 04/12/2022] [Accepted: 04/13/2022] [Indexed: 10/18/2022]
Abstract
In this paper we consider recent advances in the use of deep convolutional neural networks for understanding biological vision. We focus on claims about the plausibility of feedforward deep convolutional neural networks (fDCNNs) as models of image classification in the biological system. Despite the putative similarity of these networks to some properties of the biological vision system, and the remarkable levels of performance accuracy of some fDCNNs, we argue that their plausibility as a framework for understanding image classification remains unclear. We highlight two key issues that we suggest are relevant to the evaluation of any form of DNN used to examine biological vision: (1) network transparency under analysis, that is, the challenge of understanding what networks do and how they do it; and (2) identifying appropriate benchmarks for comparing network performance and the biological system using both quantitative and qualitative performance measures. We show that there are important divergences between fDCNNs and biological vision that reflect fundamental differences in the computational architectures and representational structures supporting image classification in these networks and in the biological system.
Affiliation(s)
- Dietmar Heinke
- School of Computer Science, University of Birmingham, UK
86
Caucheteux C, King JR. Brains and algorithms partially converge in natural language processing. Commun Biol 2022; 5:134. [PMID: 35173264 PMCID: PMC8850612 DOI: 10.1038/s42003-022-03036-1] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2021] [Accepted: 12/29/2021] [Indexed: 11/29/2022] Open
Abstract
Deep learning algorithms trained to predict masked words from large amounts of text have recently been shown to generate activations similar to those of the human brain. However, what drives this similarity remains currently unknown. Here, we systematically compare a variety of deep language models to identify the computational principles that lead them to generate brain-like representations of sentences. Specifically, we analyze the brain responses to 400 isolated sentences in a large cohort of 102 subjects, each recorded for two hours with functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). We then test where and when each of these algorithms maps onto the brain responses. Finally, we estimate how the architecture, training, and performance of these models independently account for the generation of brain-like representations. Our analyses reveal two main findings. First, the similarity between the algorithms and the brain primarily depends on their ability to predict words from context. Second, this similarity reveals the rise and maintenance of perceptual, lexical, and compositional representations within each cortical region. Overall, this study shows that modern language algorithms partially converge towards brain-like solutions, and thus delineates a promising path to unravel the foundations of natural language processing.
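Mappings between model activations and brain responses in studies like this one are typically estimated with a regularized linear encoding model scored by held-out prediction accuracy. A hedged, generic sketch of that analysis; the array shapes, names, and single regularization value are assumptions, not the authors' exact pipeline:

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression mapping activations X (n, d) to responses Y (n, v)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def brain_score(X_tr, Y_tr, X_te, Y_te, lam=1.0):
    """Mean per-voxel/sensor correlation between predicted and held-out responses."""
    pred = X_te @ ridge_fit(X_tr, Y_tr, lam)
    r = [np.corrcoef(pred[:, v], Y_te[:, v])[0, 1] for v in range(Y_te.shape[1])]
    return float(np.mean(r))
```

Comparing such scores across models (and across layers within a model) is what allows "similarity to the brain" to be related to each model's ability to predict words from context.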
Affiliation(s)
- Charlotte Caucheteux
- Facebook AI Research, Paris, France.
- Université Paris-Saclay, Inria, CEA, Palaiseau, France.
- Jean-Rémi King
- Facebook AI Research, Paris, France.
- École normale supérieure, PSL University, CNRS, Paris, France.
87
Alipour A, Beggs JM, Brown JW, James TW. A computational examination of the two-streams hypothesis: which pathway needs a longer memory? Cogn Neurodyn 2022; 16:149-165. [PMID: 35126775 PMCID: PMC8807798 DOI: 10.1007/s11571-021-09703-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Revised: 06/26/2021] [Accepted: 07/14/2021] [Indexed: 02/03/2023] Open
Abstract
The two visual streams hypothesis is a robust example of neural functional specialization that has inspired countless studies over the past four decades. According to one prominent version of the theory, the fundamental goal of the dorsal visual pathway is the transformation of retinal information for visually-guided motor behavior. To that end, the dorsal stream processes input using absolute (or veridical) metrics only when the movement is initiated, necessitating very little, or no, memory. Conversely, because the ventral visual pathway does not involve motor behavior (its output does not influence the real world), the ventral stream processes input using relative (or illusory) metrics and can accumulate or integrate sensory evidence over long time constants, which provides a substantial capacity for memory. In this study, we tested these relations between functional specialization, processing metrics, and memory by training identical recurrent neural networks to perform either a viewpoint-invariant object classification task or an orientation/size determination task. The former task relies on relative metrics, benefits from accumulating sensory evidence, and is usually attributed to the ventral stream. The latter task relies on absolute metrics, can be computed accurately in the moment, and is usually attributed to the dorsal stream. To quantify the amount of memory required for each task, we chose two types of neural network models. Using a long short-term memory (LSTM) recurrent network, we found that viewpoint-invariant object categorization (object task) required a longer memory than orientation/size determination (orientation task). Additionally, to dissect this memory effect, we considered factors that contributed to longer memory in object tasks. First, we used two different sets of objects, one with self-occlusion of features and one without. Second, we defined object classes either strictly by visual feature similarity or (more liberally) by semantic label. The models required greater memory when features were self-occluded and when object classes were defined by visual feature similarity, showing that self-occlusion and visual similarity among object-task samples contribute to requiring a longer memory. The same set of tasks modeled using modified leaky-integrator echo state recurrent networks (LiESN), however, did not replicate the results, except under some conditions. This may be because LiESNs cannot perform fine-grained memory adjustments due to their network-wide memory coefficient and fixed recurrent weights. In sum, the LSTM simulations suggest that longer memory is advantageous for performing viewpoint-invariant object classification (a putative ventral stream function) because it allows for interpolation of features across viewpoints. The results further suggest that orientation/size determination (a putative dorsal stream function) does not benefit from longer memory. These findings are consistent with the two visual streams theory of functional specialization. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s11571-021-09703-z.
Affiliation(s)
- Abolfazl Alipour
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN USA
- Program in Neuroscience, Indiana University, Bloomington, IN USA
- John M Beggs
- Program in Neuroscience, Indiana University, Bloomington, IN USA
- Department of Physics, Indiana University, Bloomington, IN USA
- Joshua W Brown
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN USA
- Program in Neuroscience, Indiana University, Bloomington, IN USA
- Thomas W James
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN USA
- Program in Neuroscience, Indiana University, Bloomington, IN USA
88
89
Dado T, Güçlütürk Y, Ambrogioni L, Ras G, Bosch S, van Gerven M, Güçlü U. Hyperrealistic neural decoding for reconstructing faces from fMRI activations via the GAN latent space. Sci Rep 2022; 12:141. [PMID: 34997012 PMCID: PMC8741893 DOI: 10.1038/s41598-021-03938-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Accepted: 11/16/2021] [Indexed: 11/24/2022] Open
Abstract
Neural decoding can be conceptualized as the problem of mapping brain responses back to sensory stimuli via a feature space. We introduce (i) a novel experimental paradigm that uses well-controlled yet highly naturalistic stimuli with a priori known feature representations and (ii) an implementation thereof for HYPerrealistic reconstruction of PERception (HYPER) of faces from brain recordings. To this end, we embrace the use of generative adversarial networks (GANs) at the earliest step of our neural decoding pipeline by acquiring fMRI data as participants perceive face images synthesized by the generator network of a GAN. We show that the latent vectors used for generation effectively capture the same defining stimulus properties as the fMRI measurements. As such, these latents (conditioned on the GAN) are used as the in-between feature representations underlying the perceived images that can be predicted in neural decoding for (re-)generation of the originally perceived stimuli, leading to the most accurate reconstructions of perception to date.
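The core decoding step described above (predicting GAN latent vectors from fMRI patterns, which the frozen generator then turns back into images) can be illustrated as plain linear regression. A hypothetical sketch with assumed shapes and names, not the authors' implementation:

```python
import numpy as np

def fit_latent_decoder(brain, latents):
    """Least-squares map W from fMRI patterns (n, voxels) to latent vectors (n, z)."""
    W, *_ = np.linalg.lstsq(brain, latents, rcond=None)
    return W

def decode_latents(brain, W):
    """Predicted latents; in a HYPER-style pipeline these would be passed to the
    GAN's generator network to reconstruct the perceived face."""
    return brain @ W
```

The key design point the abstract emphasizes is that the latents are known a priori for every training stimulus (the stimuli were generated from them), so no intermediate feature model has to be fitted.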
Affiliation(s)
- Thirza Dado
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
- Yağmur Güçlütürk
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Luca Ambrogioni
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Gabriëlle Ras
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Sander Bosch
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Umut Güçlü
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
90
Ayzenberg V, Kamps FS, Dilks DD, Lourenco SF. Skeletal representations of shape in the human visual cortex. Neuropsychologia 2022; 164:108092. [PMID: 34801519 PMCID: PMC9840386 DOI: 10.1016/j.neuropsychologia.2021.108092] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2021] [Revised: 11/07/2021] [Accepted: 11/17/2021] [Indexed: 01/17/2023]
Abstract
Shape perception is crucial for object recognition. However, it remains unknown exactly how shape information is represented and used by the visual system. Here, we tested the hypothesis that the visual system represents object shape via a skeletal structure. Using functional magnetic resonance imaging (fMRI) and representational similarity analysis (RSA), we found that a model of skeletal similarity explained significant unique variance in the response profiles of V3 and LO. Moreover, the skeletal model remained predictive in these regions even when controlling for other models of visual similarity that approximate low- to high-level visual features (i.e., Gabor-jet, GIST, HMAX, and AlexNet), and across different surface forms, a manipulation that altered object contours while preserving the underlying skeleton. Together, these findings shed light on shape processing in human vision, as well as the computational properties of V3 and LO. We discuss how these regions may support two putative roles of shape skeletons: namely, perceptual organization and object recognition.
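Representational similarity analysis, as used in this study, has a compact standard form: build a representational dissimilarity matrix (RDM) per model or brain region, then rank-correlate the RDMs. A minimal illustrative sketch (real RSA pipelines add cross-validation, noise ceilings, and partial correlations to isolate unique variance):

```python
import numpy as np

def rdm(acts):
    """Condensed RDM: 1 - Pearson r between every pair of condition patterns (rows)."""
    z = (acts - acts.mean(axis=1, keepdims=True)) / acts.std(axis=1, keepdims=True)
    corr = (z @ z.T) / acts.shape[1]
    iu = np.triu_indices(len(acts), k=1)
    return 1.0 - corr[iu]

def _ranks(x):
    """Simple ranking helper (assumes no ties)."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(len(x))
    return r

def rsa_score(acts_a, acts_b):
    """Spearman correlation between two condensed RDMs."""
    return float(np.corrcoef(_ranks(rdm(acts_a)), _ranks(rdm(acts_b)))[0, 1])
```

Here `acts_a` might be voxel patterns from a region such as V3 or LO and `acts_b` the skeletal-similarity model's predicted dissimilarities for the same stimuli.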
Affiliation(s)
- Vladislav Ayzenberg
- Department of Psychology, Carnegie Mellon University, USA. Corresponding author: V. Ayzenberg
- Frederik S. Kamps
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, USA
- Stella F. Lourenco
- Department of Psychology, Emory University, USA. Corresponding author: S.F. Lourenco
91
Wammes J, Norman KA, Turk-Browne N. Increasing stimulus similarity drives nonmonotonic representational change in hippocampus. eLife 2022; 11:e68344. [PMID: 34989336 PMCID: PMC8735866 DOI: 10.7554/elife.68344] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2021] [Accepted: 08/09/2021] [Indexed: 12/16/2022] Open
Abstract
Studies of hippocampal learning have obtained seemingly contradictory results, with manipulations that increase coactivation of memories sometimes leading to differentiation of these memories, but sometimes not. These results could potentially be reconciled using the nonmonotonic plasticity hypothesis, which posits that representational change (memories moving apart or together) is a U-shaped function of the coactivation of these memories during learning. Testing this hypothesis requires manipulating coactivation over a wide enough range to reveal the full U-shape. To accomplish this, we used a novel neural network image synthesis procedure to create pairs of stimuli that varied parametrically in their similarity in high-level visual regions that provide input to the hippocampus. Sequences of these pairs were shown to human participants during high-resolution fMRI. As predicted, learning changed the representations of paired images in the dentate gyrus as a U-shaped function of image similarity, with neural differentiation occurring only for moderately similar images.
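The nonmonotonic plasticity hypothesis tested here predicts that representational change is U-shaped in coactivation: no change at low coactivation, differentiation (memories moving apart) at moderate coactivation, and integration at high coactivation. A toy parametrization of that curve, purely illustrative; the breakpoints and shape are assumptions, not fitted values from the study:

```python
import numpy as np

def nmph_change(coactivation):
    """Representational change as a function of coactivation in [0, 1].
    Negative = differentiation, positive = integration, 0 = no change."""
    c = np.asarray(coactivation, dtype=float)
    dip = -np.sin(np.pi * (c - 0.25) / 0.5)   # U-shaped dip at moderate coactivation
    rise = (c - 0.75) / 0.25                  # integration at strong coactivation
    return np.where(c < 0.25, 0.0, np.where(c < 0.75, dip, rise))
```

Mapping out the full curve is exactly why the study needed stimulus pairs whose similarity, and hence coactivation, varied parametrically over a wide range.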
Affiliation(s)
- Jeffrey Wammes
- Department of Psychology, Yale University, New Haven, United States
- Department of Psychology, Queen’s University, Kingston, Canada
- Kenneth A Norman
- Department of Psychology, Princeton University, Princeton, United States
- Princeton Neuroscience Institute, Princeton University, Princeton, United States
92
Pramod RT, Arun SP. Improving Machine Vision Using Human Perceptual Representations: The Case of Planar Reflection Symmetry for Object Classification. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:228-241. [PMID: 32750809 PMCID: PMC7611439 DOI: 10.1109/tpami.2020.3008107] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Achieving human-like visual abilities is a holy grail for machine vision, yet precisely how insights from human vision can improve machines has remained unclear. Here, we demonstrate two key conceptual advances: First, we show that most machine vision models are systematically different from human object perception. To do so, we collected a large dataset of perceptual distances between isolated objects in humans and asked whether these perceptual data can be predicted by many common machine vision algorithms. We found that while the best algorithms explain ∼ 70 percent of the variance in the perceptual data, all the algorithms we tested make systematic errors on several types of objects. In particular, machine algorithms underestimated distances between symmetric objects compared to human perception. Second, we show that fixing these systematic biases can lead to substantial gains in classification performance. In particular, augmenting a state-of-the-art convolutional neural network with planar/reflection symmetry scores along multiple axes produced significant improvements in classification accuracy (1-10 percent) across categories. These results show that machine vision can be improved by discovering and fixing systematic differences from human vision.
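The paper's exact symmetry features are not reproduced here, but a minimal planar reflection-symmetry score of the kind that could be appended to a CNN feature vector is the correlation of an image with its mirror image; the axis choice and normalization below are assumptions:

```python
import numpy as np

def reflection_symmetry_score(img):
    """Correlation of a grayscale image (2-D array) with its left-right
    mirror: ~1 for symmetric images, lower for asymmetric ones."""
    a = img.astype(float).ravel()
    b = img[:, ::-1].astype(float).ravel()        # mirror about vertical axis
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

Scores computed about several axes could then be concatenated with a network's penultimate-layer features before the classifier, which is the spirit of the augmentation described above.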
93
Kiat JE, Luck SJ, Beckner AG, Hayes TR, Pomaranski KI, Henderson JM, Oakes LM. Linking patterns of infant eye movements to a neural network model of the ventral stream using representational similarity analysis. Dev Sci 2022; 25:e13155. [PMID: 34240787 PMCID: PMC8639751 DOI: 10.1111/desc.13155] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Revised: 06/23/2021] [Accepted: 07/01/2021] [Indexed: 01/03/2023]
Abstract
Little is known about the development of higher-level areas of visual cortex during infancy, and even less is known about how the development of visually guided behavior is related to the different levels of the cortical processing hierarchy. As a first step toward filling these gaps, we used representational similarity analysis (RSA) to assess links between gaze patterns and a neural network model that captures key properties of the ventral visual processing stream. We recorded the eye movements of 4- to 12-month-old infants (N = 54) as they viewed photographs of scenes. For each infant, we calculated the similarity of the gaze patterns for each pair of photographs. We also analyzed the images using a convolutional neural network model in which the successive layers correspond approximately to the sequence of areas along the ventral stream. For each layer of the network, we calculated the similarity of the activation patterns for each pair of photographs, which was then compared with the infant gaze data. We found that the network layers corresponding to lower-level areas of visual cortex accounted for gaze patterns better in younger infants than in older infants, whereas the network layers corresponding to higher-level areas of visual cortex accounted for gaze patterns better in older infants than in younger infants. Thus, between 4 and 12 months, gaze becomes increasingly controlled by more abstract, higher-level representations. These results also demonstrate the feasibility of using RSA to link infant gaze behavior to neural network models. A video abstract of this article can be viewed at https://youtu.be/K5mF2Rw98Is.
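The core RSA computation, building a representational dissimilarity matrix (RDM) per system and correlating their off-diagonals, can be sketched as follows; Pearson correlation is used throughout for simplicity, whereas the study may use other distance and comparison measures:

```python
import numpy as np

def rdm(patterns):
    """RDM for an (items x features) matrix: 1 - Pearson r per item pair."""
    return 1.0 - np.corrcoef(patterns)

def rsa_score(patterns_a, patterns_b):
    """Second-order similarity: correlate the two RDMs' upper triangles."""
    ra, rb = rdm(patterns_a), rdm(patterns_b)
    iu = np.triu_indices_from(ra, k=1)            # off-diagonal pairs only
    return float(np.corrcoef(ra[iu], rb[iu])[0, 1])
```

Here `patterns_a` might hold per-photograph gaze descriptors and `patterns_b` per-photograph activations from one network layer; repeating `rsa_score` layer by layer yields the kind of layer-by-age profile described above. The variable names and pipeline are illustrative, not the authors' code.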
94
Abstract
Face-selective neurons are observed in the primate visual pathway and are considered as the basis of face detection in the brain. However, it has been debated as to whether this neuronal selectivity can arise innately or whether it requires training from visual experience. Here, using a hierarchical deep neural network model of the ventral visual stream, we suggest a mechanism in which face-selectivity arises in the complete absence of training. We found that units selective to faces emerge robustly in randomly initialized networks and that these units reproduce many characteristics observed in monkeys. This innate selectivity also enables the untrained network to perform face-detection tasks. Intriguingly, we observed that units selective to various non-face objects can also arise innately in untrained networks. Our results imply that the random feedforward connections in early, untrained deep neural networks may be sufficient for initializing primitive visual selectivity.
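A toy version of the measurement, computing a class-selectivity index for units of a randomly initialized layer, might look like this; the stimulus statistics, layer size, and random-projection "network" are all assumptions for illustration, not the hierarchical model used in the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def selectivity_index(pref, nonpref):
    """(mu_pref - mu_nonpref) / (mu_pref + mu_nonpref); lies in [-1, 1]
    for nonnegative (post-ReLU) responses."""
    mp, mn = float(np.mean(pref)), float(np.mean(nonpref))
    return (mp - mn) / (mp + mn + 1e-12)

# Untrained "layer": a fixed random projection followed by ReLU.
W = rng.normal(size=(50, 64))                     # 50 units, 64 input dims
relu = lambda x: np.maximum(x, 0.0)
faces = rng.normal(loc=0.5, size=(20, 64))        # stand-in stimulus sets
objects = rng.normal(loc=0.0, size=(20, 64))
resp_f = relu(faces @ W.T)                        # (stimuli x units)
resp_o = relu(objects @ W.T)
si = np.array([selectivity_index(resp_f[:, u], resp_o[:, u])
               for u in range(W.shape[0])])
```

Even with purely random weights, some units land well away from zero selectivity, which is the flavor of the result: stimulus selectivity can exist before any training.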
95
Tuladhar A, Moore JA, Ismail Z, Forkert ND. Modeling Neurodegeneration in silico With Deep Learning. Front Neuroinform 2021; 15:748370. [PMID: 34867256 PMCID: PMC8640525 DOI: 10.3389/fninf.2021.748370] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Accepted: 10/21/2021] [Indexed: 11/13/2022] Open
Abstract
Deep neural networks, inspired by information processing in the brain, can achieve human-like performance for various tasks. However, research efforts to use these networks as models of the brain have primarily focused on modeling healthy brain function so far. In this work, we propose a paradigm for modeling neural diseases in silico with deep learning and demonstrate its use in modeling posterior cortical atrophy (PCA), an atypical form of Alzheimer’s disease affecting the visual cortex. We simulated PCA in deep convolutional neural networks (DCNNs) trained for visual object recognition by randomly injuring connections between artificial neurons. Results showed that injured networks progressively lost their object recognition capability. Simulated PCA impacted learned representations hierarchically, as networks lost object-level representations before category-level representations. Incorporating this paradigm in computational neuroscience will be essential for developing in silico models of the brain and neurological diseases. The paradigm can be expanded to incorporate elements of neural plasticity and extended to other cognitive domains such as motor control, auditory cognition, language processing, and decision making.
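The injury paradigm, ablating a random fraction of connections and re-testing, can be sketched on a toy linear "network"; the synthetic data, least-squares readout, and injury levels here are illustrative assumptions rather than the paper's DCNN setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "trained network": a least-squares linear readout on separable data.
X = rng.normal(size=(200, 30))
w_true = rng.normal(size=30)
y = X @ w_true > 0                                # binary labels
w, *_ = np.linalg.lstsq(X, np.where(y, 1.0, -1.0), rcond=None)

def injured_accuracy(frac):
    """Zero a random fraction of weights ('synapses'), then re-test."""
    w_inj = w.copy()
    k = int(round(frac * w.size))
    idx = rng.choice(w.size, size=k, replace=False)
    w_inj[idx] = 0.0
    return float(np.mean((X @ w_inj > 0) == y))
```

Sweeping `frac` from 0 toward 1 produces the progressive performance loss described above; in the paper the same idea is applied to the connections of a DCNN trained for object recognition.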
Affiliation(s)
- Anup Tuladhar
- Department of Radiology, University of Calgary, Calgary, AB, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Jasmine A Moore
- Department of Radiology, University of Calgary, Calgary, AB, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada; Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada
- Zahinoor Ismail
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada; Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, University of Calgary, Calgary, AB, Canada; Department of Psychiatry, University of Calgary, Calgary, AB, Canada; O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
- Nils D Forkert
- Department of Radiology, University of Calgary, Calgary, AB, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada; Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
96
Thompson JAF. Forms of explanation and understanding for neuroscience and artificial intelligence. J Neurophysiol 2021; 126:1860-1874. [PMID: 34644128 DOI: 10.1152/jn.00195.2021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Much of the controversy evoked by the use of deep neural networks as models of biological neural systems amounts to debates over what constitutes scientific progress in neuroscience. To discuss what constitutes scientific progress, one must have a goal in mind (progress toward what?). One such long-term goal is to produce scientific explanations of intelligent capacities (e.g., object recognition, relational reasoning). I argue that the most pressing philosophical questions at the intersection of neuroscience and artificial intelligence are ultimately concerned with defining the phenomena to be explained and with what constitute valid explanations of such phenomena. I propose that a foundation in the philosophy of scientific explanation and understanding can scaffold future discussions about how an integrated science of intelligence might progress. Toward this vision, I review relevant theories of scientific explanation and discuss strategies for unifying the scientific goals of neuroscience and AI.
Affiliation(s)
- Jessica A F Thompson
- Human Information Processing Lab, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
97
Ernst MR, Burwick T, Triesch J. Recurrent processing improves occluded object recognition and gives rise to perceptual hysteresis. J Vis 2021; 21:6. [PMID: 34905052 PMCID: PMC8684313 DOI: 10.1167/jov.21.13.6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
Over the past decades, object recognition has been predominantly studied and modelled as a feedforward process. This notion was supported by the fast response times in psychophysical and neurophysiological experiments and the recent success of deep feedforward neural networks for object recognition. Recently, however, this prevalent view has shifted and recurrent connectivity in the brain is now believed to contribute significantly to object recognition — especially under challenging conditions, including the recognition of partially occluded objects. Moreover, recurrent dynamics might be the key to understanding perceptual phenomena such as perceptual hysteresis. In this work we investigate if and how artificial neural networks can benefit from recurrent connections. We systematically compare architectures comprised of bottom-up, lateral, and top-down connections. To evaluate the impact of recurrent connections for occluded object recognition, we introduce three stereoscopic occluded object datasets, which span the range from classifying partially occluded hand-written digits to recognizing three-dimensional objects. We find that recurrent architectures perform significantly better than parameter-matched feedforward models. An analysis of the hidden representation of the models suggests that occluders are progressively discounted in later time steps of processing. We demonstrate that feedback can correct the initial misclassifications over time and that the recurrent dynamics lead to perceptual hysteresis. Overall, our results emphasize the importance of recurrent feedback for object recognition in difficult situations.
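Perceptual hysteresis of the kind reported here can be illustrated with a single recurrent unit with self-excitation; the gain, threshold, and input sweep below are illustrative assumptions, not the paper's network:

```python
import numpy as np

def settle(inp, x0, gain=8.0, theta=4.0, steps=200):
    """Iterate x <- sigmoid(gain * x + inp - theta) to a steady state."""
    x = x0
    for _ in range(steps):
        x = 1.0 / (1.0 + np.exp(-(gain * x + inp - theta)))
    return x

# Sweep the input up, then back down, carrying the state along the sweep.
inputs = np.linspace(-2.0, 2.0, 41)
x, up, down = 0.0, [], []
for i in inputs:
    x = settle(i, x)
    up.append(x)
for i in inputs[::-1]:
    x = settle(i, x)
    down.append(x)
down = down[::-1]
```

At intermediate inputs the unit's state depends on its history: the upward sweep stays low where the downward sweep stays high, a hysteresis loop, which is the qualitative behavior the recurrent models above exhibit for ambiguous (e.g. partially occluded) stimuli.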
Collapse
Affiliation(s)
- Markus R Ernst
- Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany; Goethe-Universität Frankfurt, Frankfurt am Main, Germany
- Thomas Burwick
- Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany; Goethe-Universität Frankfurt, Frankfurt am Main, Germany
- Jochen Triesch
- Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany; Goethe-Universität Frankfurt, Frankfurt am Main, Germany. https://www.fias.science/en/fellows/detail/triesch-jochen/
98
Hennig JA, Oby ER, Losey DM, Batista AP, Yu BM, Chase SM. How learning unfolds in the brain: toward an optimization view. Neuron 2021; 109:3720-3735. [PMID: 34648749 PMCID: PMC8639641 DOI: 10.1016/j.neuron.2021.09.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2021] [Revised: 08/25/2021] [Accepted: 09/02/2021] [Indexed: 12/17/2022]
Abstract
How do changes in the brain lead to learning? To answer this question, consider an artificial neural network (ANN), where learning proceeds by optimizing a given objective or cost function. This "optimization framework" may provide new insights into how the brain learns, as many idiosyncratic features of neural activity can be recapitulated by an ANN trained to perform the same task. Nevertheless, there are key features of how neural population activity changes throughout learning that cannot be readily explained in terms of optimization and are not typically features of ANNs. Here we detail three of these features: (1) the inflexibility of neural variability throughout learning, (2) the use of multiple learning processes even during simple tasks, and (3) the presence of large task-nonspecific activity changes. We propose that understanding the role of these features in the brain will be key to describing biological learning using an optimization framework.
Collapse
Affiliation(s)
- Jay A Hennig
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA, USA; Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA.
- Emily R Oby
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Darby M Losey
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA, USA; Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Aaron P Batista
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Byron M Yu
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA, USA; Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Steven M Chase
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA, USA; Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA, USA; Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
99
Battleday RM, Peterson JC, Griffiths TL. From convolutional neural networks to models of higher-level cognition (and back again). Ann N Y Acad Sci 2021; 1505:55-78. [PMID: 33754368 PMCID: PMC9292363 DOI: 10.1111/nyas.14593] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2020] [Revised: 02/12/2021] [Accepted: 02/26/2021] [Indexed: 11/29/2022]
Abstract
The remarkable successes of convolutional neural networks (CNNs) in modern computer vision are by now well known, and they are increasingly being explored as computational models of the human visual system. In this paper, we ask whether CNNs might also provide a basis for modeling higher-level cognition, focusing on the core phenomena of similarity and categorization. The most important advance comes from the ability of CNNs to learn high-dimensional representations of complex naturalistic images, substantially extending the scope of traditional cognitive models that were previously only evaluated with simple artificial stimuli. In all cases, the most successful combinations arise when CNN representations are used with cognitive models that have the capacity to transform them to better fit human behavior. One consequence of these insights is a toolkit for the integration of cognitively motivated constraints back into CNN training paradigms in computer vision and machine learning, and we review cases where this leads to improved performance. A second consequence is a roadmap for how CNNs and cognitive models can be more fully integrated in the future, allowing for flexible end-to-end algorithms that can learn representations from data while still retaining the structured behavior characteristic of human cognition.
Collapse
Affiliation(s)
- Thomas L. Griffiths
- Department of Computer Science, Princeton University, Princeton, New Jersey
- Department of Psychology, Princeton University, Princeton, New Jersey
100
van Dyck LE, Kwitt R, Denzler SJ, Gruber WR. Comparing Object Recognition in Humans and Deep Convolutional Neural Networks-An Eye Tracking Study. Front Neurosci 2021; 15:750639. [PMID: 34690686 PMCID: PMC8526843 DOI: 10.3389/fnins.2021.750639] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Accepted: 09/16/2021] [Indexed: 11/30/2022] Open
Abstract
Deep convolutional neural networks (DCNNs) and the ventral visual pathway share vast architectural and functional similarities in visual challenges such as object recognition. Recent insights have demonstrated that both hierarchical cascades can be compared in terms of both exerted behavior and underlying activation. However, these approaches ignore key differences in spatial priorities of information processing. In this proof-of-concept study, we demonstrate a comparison of human observers (N = 45) and three feedforward DCNNs through eye tracking and saliency maps. The results reveal fundamentally different resolutions in both visualization methods that need to be considered for an insightful comparison. Moreover, we provide evidence that a DCNN with biologically plausible receptive field sizes called vNet reveals higher agreement with human viewing behavior as contrasted with a standard ResNet architecture. We find that image-specific factors such as category, animacy, arousal, and valence have a direct link to the agreement of spatial object recognition priorities in humans and DCNNs, while other measures such as difficulty and general image properties do not. With this approach, we try to open up new perspectives at the intersection of biological and computer vision research.
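One simple way to handle the resolution mismatch the authors highlight is to pool both maps to a common coarse grid before comparing them; the block-averaging and Pearson correlation below are assumptions for illustration, not the study's exact pipeline:

```python
import numpy as np

def block_reduce(m, factor):
    """Downsample a 2-D map by averaging non-overlapping factor x factor blocks."""
    h, w = m.shape
    m = m[:h - h % factor, :w - w % factor]       # crop to a multiple of factor
    return m.reshape(m.shape[0] // factor, factor,
                     m.shape[1] // factor, factor).mean(axis=(1, 3))

def map_agreement(fixation_map, saliency_map, factor=4):
    """Correlate a human fixation map with a model saliency map after
    pooling both to the same coarse grid."""
    a = block_reduce(np.asarray(fixation_map, float), factor).ravel()
    b = block_reduce(np.asarray(saliency_map, float), factor).ravel()
    return float(np.corrcoef(a, b)[0, 1])
```

Computed per image, such agreement scores can then be related to image-level factors (category, animacy, arousal, valence), which is the style of analysis described above.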
Collapse
Affiliation(s)
- Leonard Elia van Dyck
- Department of Psychology, University of Salzburg, Salzburg, Austria; Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Roland Kwitt
- Department of Computer Science, University of Salzburg, Salzburg, Austria
- Walter Roland Gruber
- Department of Psychology, University of Salzburg, Salzburg, Austria; Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria