1
Subramoney A, Bellec G, Scherr F, Legenstein R, Maass W. Fast learning without synaptic plasticity in spiking neural networks. Sci Rep 2024; 14:8557. PMID: 38609429; PMCID: PMC11015027; DOI: 10.1038/s41598-024-55769-0.
Abstract
Spiking neural networks are of high current interest, both from the perspective of modelling neural networks of the brain and for porting the brain's fast learning capability and energy efficiency into neuromorphic hardware. So far, however, the fast learning capabilities of the brain have not been reproduced in spiking neural networks. Biological data suggest that a synergy of synaptic plasticity on a slow time scale with network dynamics on a faster time scale is responsible for the fast learning capabilities of the brain. We show here that a suitable orchestration of this synergy between synaptic plasticity and network dynamics does in fact reproduce fast learning capabilities in generic recurrent networks of spiking neurons. This points to the important role of recurrent connections in spiking networks, since these are necessary for enabling salient network dynamics. We show more specifically that the proposed synergy enables synaptic weights to encode more general information such as priors and task structures, since moment-to-moment processing of new information can be delegated to the network dynamics.
Affiliation(s)
- Anand Subramoney: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria; Department of Computer Science, Royal Holloway University of London, Egham, UK
- Guillaume Bellec: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria; Laboratory of Computational Neuroscience, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Franz Scherr: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
- Robert Legenstein: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
- Wolfgang Maass: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
2
Scott DN, Frank MJ. Adaptive control of synaptic plasticity integrates micro- and macroscopic network function. Neuropsychopharmacology 2023; 48:121-144. PMID: 36038780; PMCID: PMC9700774; DOI: 10.1038/s41386-022-01374-6.
Abstract
Synaptic plasticity configures interactions between neurons and is therefore likely to be a primary driver of behavioral learning and development. How this microscopic-macroscopic interaction occurs is poorly understood, as researchers frequently examine models within particular ranges of abstraction and scale. Computational neuroscience and machine learning models offer theoretically powerful analyses of plasticity in neural networks, but results are often siloed and only coarsely linked to biology. In this review, we examine connections between these areas, asking how network computations change as a function of diverse features of plasticity and vice versa. We review how plasticity can be controlled at synapses by calcium dynamics and neuromodulatory signals, the manifestation of these changes in networks, and their impacts in specialized circuits. We conclude that metaplasticity, defined broadly as the adaptive control of plasticity, forges connections across scales by governing what groups of synapses can and can't learn about, when, and to what ends. The metaplasticity we discuss acts by co-opting Hebbian mechanisms, shifting network properties, and routing activity within and across brain systems. Asking how these operations can go awry should also be useful for understanding pathology, which we address in the context of autism, schizophrenia and Parkinson's disease.
Affiliation(s)
- Daniel N Scott: Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA; Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Michael J Frank: Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA; Carney Institute for Brain Science, Brown University, Providence, RI, USA
3
Whelan MT, Jimenez-Rodriguez A, Prescott TJ, Vasilaki E. A robotic model of hippocampal reverse replay for reinforcement learning. Bioinspir Biomim 2022; 18:015007. PMID: 36327454; DOI: 10.1088/1748-3190/ac9ffc.
Abstract
Hippocampal reverse replay, a phenomenon in which recently active hippocampal cells reactivate in reverse order, is thought to contribute to learning, particularly reinforcement learning (RL), in animals. Here, we present a novel computational model which exploits reverse replay to improve stability and performance on a homing task. The model takes inspiration from the hippocampal-striatal network, and learning occurs via a three-factor RL rule. To augment this model with hippocampal reverse replay, we derived a policy gradient learning rule that associates place-cell activity with responses in cells representing actions and a supervised learning rule of the same form, interpreting the replay activity as a 'target' frequency. We evaluated the model using a simulated robot spatial navigation task inspired by the Morris water maze. Results suggest that reverse replay can improve performance stability over multiple trials. Our model exploits reverse replay as an additional source for propagating information about desirable synaptic changes, reducing the need for long timescales in eligibility traces combined with low learning rates. We conclude that reverse replay can positively contribute to RL, although less stable learning is possible in its absence. Analogously, we postulate that reverse replay may enhance RL in the mammalian hippocampal-striatal system rather than provide its core mechanism.
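A three-factor rule of the kind described above can be sketched in a few lines: a Hebbian eligibility trace accumulates pre/post coincidences, and a reward signal (the third factor) converts the trace into an actual weight change. This is a minimal illustration, not the authors' implementation; the place-cell and action-cell vectors, decay constant, and learning rate are assumptions for the example.

```python
import numpy as np

def three_factor_update(w, pre, post, reward, trace, lr=0.1, decay=0.9):
    """One step of a generic three-factor rule: the eligibility trace
    accumulates pre*post coincidences, and the reward converts the
    trace into an actual weight change."""
    trace = decay * trace + np.outer(post, pre)  # Hebbian eligibility
    w = w + lr * reward * trace                  # reward-modulated consolidation
    return w, trace

w = np.zeros((2, 4))                  # action cells x place cells
trace = np.zeros_like(w)
pre = np.array([1.0, 0.0, 0.5, 0.0])  # place-cell activity
post = np.array([0.0, 1.0])           # chosen action cell
w, trace = three_factor_update(w, pre, post, reward=1.0, trace=trace)
```

Because the trace decays, coincidences long before the reward contribute little; replaying the trajectory in reverse near the reward is one way to relax that timescale constraint.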
Affiliation(s)
- Matthew T Whelan: Department of Computer Science, The University of Sheffield, Sheffield, United Kingdom; Sheffield Robotics, Sheffield, United Kingdom
- Alejandro Jimenez-Rodriguez: Department of Computer Science, The University of Sheffield, Sheffield, United Kingdom; Sheffield Robotics, Sheffield, United Kingdom
- Tony J Prescott: Department of Computer Science, The University of Sheffield, Sheffield, United Kingdom; Sheffield Robotics, Sheffield, United Kingdom
- Eleni Vasilaki: Department of Computer Science, The University of Sheffield, Sheffield, United Kingdom; Sheffield Robotics, Sheffield, United Kingdom
4
Manneschi L, Gigante G, Vasilaki E, Del Giudice P. Signal neutrality, scalar property, and collapsing boundaries as consequences of a learned multi-timescale strategy. PLoS Comput Biol 2022; 18:e1009393. PMID: 35930590; PMCID: PMC9462745; DOI: 10.1371/journal.pcbi.1009393.
Abstract
We postulate that three fundamental elements underlie a decision making process: perception of time passing, information processing in multiple timescales and reward maximisation. We build a simple reinforcement learning agent upon these principles that we train on a random dot-like task. Our results, similar to the experimental data, demonstrate three emerging signatures. (1) Signal neutrality: insensitivity to the signal coherence in the interval preceding the decision. (2) Scalar property: the mean of the response times varies widely for different signal coherences, yet the shape of the distributions stays almost unchanged. (3) Collapsing boundaries: the “effective” decision-making boundary changes over time in a manner reminiscent of the theoretical optimal. Removing the perception of time or the multiple timescales from the model does not preserve the distinguishing signatures. Our results suggest an alternative explanation for signal neutrality: rather than being part of motor planning, it is part of the decision-making process and emerges from information processing on multiple timescales.
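The "information processing in multiple timescales" ingredient can be illustrated with a bank of leaky integrators, each filtering the same evidence stream with its own time constant; a decision-maker reading all traces at once can weigh fast evidence against slow evidence. The time constants and the constant-coherence stimulus below are illustrative assumptions, not the paper's agent.

```python
import numpy as np

def multi_timescale_traces(signal, taus):
    """Filter one input stream with several leaky integrators,
    one per timescale."""
    traces = np.zeros(len(taus))
    history = []
    for x in signal:
        # each trace decays with its own tau while integrating the input
        traces = traces + (x - traces) / np.asarray(taus)
        history.append(traces.copy())
    return np.array(history)

stimulus = np.ones(100)  # constant-coherence evidence stream
h = multi_timescale_traces(stimulus, taus=[2.0, 10.0, 50.0])
```

The fast trace saturates quickly while the slow trace still carries information about how long the stimulus has been on, which is one way an agent can sense time passing.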
Affiliation(s)
- Luca Manneschi: Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Guido Gigante: Istituto Superiore di Sanità, Rome, Italy; INFN, Sezione di Roma, Rome, Italy
- Eleni Vasilaki: Department of Computer Science, University of Sheffield, Sheffield, United Kingdom; Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
- Paolo Del Giudice: Istituto Superiore di Sanità, Rome, Italy; INFN, Sezione di Roma, Rome, Italy
5
Yang S, Gao T, Wang J, Deng B, Azghadi MR, Lei T, Linares-Barranco B. SAM: A Unified Self-Adaptive Multicompartmental Spiking Neuron Model for Learning With Working Memory. Front Neurosci 2022; 16:850945. PMID: 35527819; PMCID: PMC9074872; DOI: 10.3389/fnins.2022.850945.
Abstract
Working memory is a fundamental feature of biological brains for perception, cognition, and learning. In addition, learning with working memory, which has been shown in conventional artificial intelligence systems through recurrent neural networks, is instrumental to advanced cognitive intelligence. However, it is hard to endow a simple neuron model with working memory, and to understand the biological mechanisms that have resulted in such a powerful ability at the neuronal level. This article presents a novel self-adaptive multicompartment spiking neuron model, referred to as SAM, for spike-based learning with working memory. SAM integrates four major biological principles including sparse coding, dendritic non-linearity, intrinsic self-adaptive dynamics, and spike-driven learning. We first describe SAM’s design and explore the impacts of critical parameters on its biological dynamics. We then use SAM to build spiking networks to accomplish several different tasks including supervised learning of the MNIST dataset using sequential spatiotemporal encoding, noisy spike pattern classification, sparse coding during pattern classification, spatiotemporal feature detection, meta-learning with working memory applied to a navigation task and the MNIST classification task, and working memory for spatiotemporal learning. Our experimental results highlight the energy efficiency and robustness of SAM across this wide range of challenging tasks. The effects of SAM model variations on its working memory are also explored, with the hope of offering insight into the biological mechanisms underlying working memory in the brain. The SAM model is the first attempt to integrate the capabilities of spike-driven learning and working memory in a unified single neuron with multiple timescale dynamics. The competitive performance of SAM could potentially contribute to the development of efficient adaptive neuromorphic computing systems for various applications from robotics to edge computing.
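SAM combines several mechanisms; the sketch below illustrates only one of them, the intrinsic self-adaptive dynamics, in the spirit of an adaptive-threshold LIF neuron whose slow threshold variable acts as a one-neuron memory trace of recent activity. All parameters and the constant-input stimulus are assumptions for the example, not the published model.

```python
def adaptive_lif(inputs, tau_m=10.0, tau_a=100.0, b0=1.0, beta=0.5):
    """Leaky integrate-and-fire neuron whose threshold rises after
    each spike and decays on a much slower timescale than the
    membrane, so the firing rate adapts to sustained input."""
    v, a, spikes = 0.0, 0.0, []
    for x in inputs:
        v += (x - v) / tau_m      # fast membrane leak-integration
        a -= a / tau_a            # slow threshold adaptation decays
        s = 1 if v > b0 + beta * a else 0
        if s:
            v = 0.0               # reset membrane after a spike
            a += 1.0              # raise the adaptive threshold
        spikes.append(s)
    return spikes

spikes = adaptive_lif([2.0] * 200)
```

Under constant drive the inter-spike interval lengthens as the slow variable accumulates, so the spike train itself carries information about stimulation history.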
Affiliation(s)
- Shuangming Yang: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Tian Gao: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Jiang Wang: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Bin Deng: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Tao Lei: School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an, China
6
Jordan J, Schmidt M, Senn W, Petrovici MA. Evolving interpretable plasticity for spiking networks. eLife 2021; 10:e66273. PMID: 34709176; PMCID: PMC8553337; DOI: 10.7554/eLife.66273.
Abstract
Continuous adaptation allows survival in an ever-changing world. Adjustments in the synaptic coupling strength between neurons are essential for this capability, setting us apart from simpler, hard-wired organisms. How these changes can be mathematically described at the phenomenological level, as so-called ‘plasticity rules’, is essential both for understanding biological information processing and for developing cognitively performant artificial systems. We suggest an automated approach for discovering biophysically plausible plasticity rules based on the definition of task families, associated performance measures and biophysical constraints. By evolving compact symbolic expressions, we ensure the discovered plasticity rules are amenable to intuitive understanding, fundamental for successful communication and human-guided generalization. We successfully apply our approach to typical learning scenarios and discover previously unknown mechanisms for learning efficiently from rewards, recover efficient gradient-descent methods for learning from target signals, and uncover various functionally equivalent STDP-like rules with tuned homeostatic mechanisms.

Our brains are incredibly adaptive. Every day we form memories, acquire new knowledge or refine existing skills. This stands in contrast to our current computers, which typically can only perform pre-programmed actions. Our own ability to adapt is the result of a process called synaptic plasticity, in which the strength of the connections between neurons can change. To better understand brain function and build adaptive machines, researchers in neuroscience and artificial intelligence (AI) are modeling the underlying mechanisms. So far, most work towards this goal was guided by human intuition – that is, by the strategies scientists think are most likely to succeed. Despite the tremendous progress, this approach has two drawbacks. First, human time is limited and expensive. And second, researchers have a natural – and reasonable – tendency to incrementally improve upon existing models, rather than starting from scratch. Jordan, Schmidt et al. have now developed a new approach based on ‘evolutionary algorithms’. These computer programs search for solutions to problems by mimicking the process of biological evolution, such as the concept of survival of the fittest. The approach exploits the increasing availability of cheap but powerful computers. Compared to its predecessors (or indeed human brains), it also uses search strategies that are less biased by previous models. The evolutionary algorithms were presented with three typical learning scenarios. In the first, the computer had to spot a repeating pattern in a continuous stream of input without receiving feedback on how well it was doing. In the second scenario, the computer received virtual rewards whenever it behaved in the desired manner – an example of reinforcement learning. Finally, in the third ‘supervised learning’ scenario, the computer was told exactly how much its behavior deviated from the desired behavior. For each of these scenarios, the evolutionary algorithms were able to discover mechanisms of synaptic plasticity to solve the new task successfully. Using evolutionary algorithms to study how computers ‘learn’ will provide new insights into how brains function in health and disease. It could also pave the way for developing intelligent machines that can better adapt to the needs of their users.
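The evolutionary-search idea can be illustrated at toy scale: a candidate plasticity rule is a small symbolic expression, and a (1+1) evolutionary strategy keeps whichever mutant scores better on a task family. Everything here (the rule template dw = a*pre*post + b*pre + c*post + d, the fitness, the mutation step) is an illustrative assumption, far simpler than the symbolic search used in the paper.

```python
import numpy as np

def apply_rule(coeffs, pre, post):
    """Candidate plasticity rule as a compact symbolic expression:
    dw = a*pre*post + b*pre + c*post + d, summed over a spike train."""
    a, b, c, d = coeffs
    return float(np.sum(a * pre * post + b * pre + c * post + d))

# Toy 'task family': a good rule potentiates correlated pre/post
# activity more than anti-correlated activity.
pre = np.tile([1.0, 0.0], 100)
corr_post = pre.copy()
anti_post = np.tile([0.0, 1.0], 100)

def fitness(coeffs):
    return apply_rule(coeffs, pre, corr_post) - apply_rule(coeffs, pre, anti_post)

# (1+1) evolutionary strategy: mutate the best rule, keep the fitter one
rng = np.random.default_rng(1)
best = np.zeros(4)
for _ in range(300):
    child = best + 0.1 * rng.normal(size=4)
    if fitness(child) > fitness(best):
        best = child
```

Because the evolved coefficients remain a readable expression, one can inspect them directly: selection drives the Hebbian pre*post term upward, which is the kind of interpretability the paper argues for.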
Affiliation(s)
- Jakob Jordan: Department of Physiology, University of Bern, Bern, Switzerland
- Maximilian Schmidt: Ascent Robotics, Tokyo, Japan; RIKEN Center for Brain Science, Tokyo, Japan
- Walter Senn: Department of Physiology, University of Bern, Bern, Switzerland
- Mihai A Petrovici: Department of Physiology, University of Bern, Bern, Switzerland; Kirchhoff-Institute for Physics, Heidelberg University, Heidelberg, Germany
7
Zambrano D, Roelfsema PR, Bohte S. Learning continuous-time working memory tasks with on-policy neural reinforcement learning. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.11.072.
8

9
Towards spike-based machine intelligence with neuromorphic computing. Nature 2019; 575:607-617. PMID: 31776490; DOI: 10.1038/s41586-019-1677-2.
Abstract
Guided by brain-like 'spiking' computational frameworks, neuromorphic computing (brain-inspired computing for machine intelligence) promises to realize artificial intelligence while reducing the energy requirements of computing platforms. This interdisciplinary field began with the implementation of silicon circuits for biological neural routines, but has evolved to encompass the hardware implementation of algorithms with spike-based encoding and event-driven representations. Here we provide an overview of the developments in neuromorphic computing for both algorithms and hardware and highlight the fundamentals of learning and hardware frameworks. We discuss the main challenges and the future prospects of neuromorphic computing, with emphasis on algorithm-hardware codesign.
10
Jordan J, Weidel P, Morrison A. A Closed-Loop Toolchain for Neural Network Simulations of Learning Autonomous Agents. Front Comput Neurosci 2019; 13:46. PMID: 31427939; PMCID: PMC6687756; DOI: 10.3389/fncom.2019.00046.
Abstract
Neural network simulation is an important tool for generating and evaluating hypotheses on the structure, dynamics, and function of neural circuits. For scientific questions addressing organisms operating autonomously in their environments, in particular where learning is involved, it is crucial to be able to operate such simulations in a closed-loop fashion. In such a set-up, the neural agent continuously receives sensory stimuli from the environment and provides motor signals that manipulate the environment or move the agent within it. So far, most studies requiring such functionality have been conducted with custom simulation scripts and manually implemented tasks. This makes it difficult for other researchers to reproduce and build upon previous work and nearly impossible to compare the performance of different learning architectures. In this work, we present a novel approach to solve this problem, connecting benchmark tools from the field of machine learning and state-of-the-art neural network simulators from computational neuroscience. The resulting toolchain enables researchers in both fields to make use of well-tested high-performance simulation software supporting biologically plausible neuron, synapse and network models and allows them to evaluate and compare their approach on the basis of standardized environments with various levels of complexity. We demonstrate the functionality of the toolchain by implementing a neuronal actor-critic architecture for reinforcement learning in the NEST simulator and successfully training it on two different environments from the OpenAI Gym. We compare its performance to a previously suggested neural network model of reinforcement learning in the basal ganglia and a generic Q-learning algorithm.
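The closed-loop contract the toolchain standardizes (agent acts, environment returns observation and reward, repeat) can be sketched with a stand-in environment. The real toolchain couples NEST models to OpenAI Gym environments; the ToyEnv and tabular actor-critic below are purely illustrative assumptions that expose the same reset/step interface.

```python
import random

class ToyEnv:
    """Minimal stand-in for a Gym-style environment: the agent is
    rewarded for emitting action 1. Real toolchains expose the same
    reset/step contract."""
    def reset(self):
        return 0.0                         # initial observation
    def step(self, action):
        reward = 1.0 if action == 1 else 0.0
        return 0.0, reward, False, {}      # obs, reward, done, info

class TabularActorCritic:
    """Tiny actor-critic stand-in for the neural agent in the loop."""
    def __init__(self, n_actions=2, lr=0.2):
        self.pref = [0.0] * n_actions      # actor: action preferences
        self.value, self.lr = 0.0, lr      # critic: state value
    def act(self):
        if random.random() < 0.1:          # epsilon-greedy exploration
            return random.randrange(len(self.pref))
        return max(range(len(self.pref)), key=self.pref.__getitem__)
    def learn(self, action, reward):
        td = reward - self.value           # TD error as third factor
        self.value += self.lr * td
        self.pref[action] += self.lr * td

random.seed(0)
env, agent = ToyEnv(), TabularActorCritic()
obs = env.reset()
for _ in range(500):
    a = agent.act()
    obs, r, done, _ = env.step(a)          # closed loop: act, observe
    agent.learn(a, r)
```

Swapping ToyEnv for a standardized benchmark environment without touching the agent is exactly the kind of comparability the toolchain is built to provide.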
Affiliation(s)
- Jakob Jordan: Department of Physiology, University of Bern, Bern, Switzerland; Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure Function Relationship (JBI 1/INM-10), Research Centre Jülich, Jülich, Germany
- Philipp Weidel: Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure Function Relationship (JBI 1/INM-10), Research Centre Jülich, Jülich, Germany; aiCTX, Zurich, Switzerland; Department of Computer Science, RWTH Aachen University, Aachen, Germany
- Abigail Morrison: Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure Function Relationship (JBI 1/INM-10), Research Centre Jülich, Jülich, Germany; Faculty of Psychology, Institute of Cognitive Neuroscience, Ruhr-University Bochum, Bochum, Germany
11
Bing Z, Baumann I, Jiang Z, Huang K, Cai C, Knoll A. Supervised Learning in SNN via Reward-Modulated Spike-Timing-Dependent Plasticity for a Target Reaching Vehicle. Front Neurorobot 2019; 13:18. PMID: 31130854; PMCID: PMC6509616; DOI: 10.3389/fnbot.2019.00018.
Abstract
Spiking neural networks (SNNs) offer many advantages over traditional artificial neural networks (ANNs) such as biological plausibility, fast information processing, and energy efficiency. Although SNNs have been used to solve a variety of control tasks using the Spike-Timing-Dependent Plasticity (STDP) learning rule, existing solutions usually involve hard-coded network architectures solving specific tasks rather than solving different kinds of tasks generally. This results in neglecting one of the biggest advantages of ANNs, i.e., being general-purpose and easy-to-use due to their simple network architecture, which usually consists of an input layer, one or multiple hidden layers and an output layer. This paper addresses the problem by introducing an end-to-end learning approach for spiking neural networks with one hidden layer and reward-modulated Spike-Timing-Dependent Plasticity (R-STDP) synapses connected in an all-to-all fashion. We use the supervised R-STDP learning rule to train two different SNN-based sub-controllers to replicate a desired obstacle avoiding and goal approaching behavior, provided by pre-generated datasets. Together they make up a target-reaching controller, which is used to control a simulated mobile robot to reach a target area while avoiding obstacles in its path. We demonstrate the performance and effectiveness of our trained SNNs to achieve target reaching tasks in different unknown scenarios.
Affiliation(s)
- Zhenshan Bing: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
- Ivan Baumann: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
- Zhuangyi Jiang: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
- Kai Huang: Department of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China; Peng Cheng Laboratory, Shenzhen, China
- Caixia Cai: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
- Alois Knoll: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
12
Wunderlich T, Kungl AF, Müller E, Hartel A, Stradmann Y, Aamir SA, Grübl A, Heimbrecht A, Schreiber K, Stöckel D, Pehle C, Billaudelle S, Kiene G, Mauch C, Schemmel J, Meier K, Petrovici MA. Demonstrating Advantages of Neuromorphic Computation: A Pilot Study. Front Neurosci 2019; 13:260. PMID: 30971881; PMCID: PMC6444279; DOI: 10.3389/fnins.2019.00260.
Abstract
Neuromorphic devices represent an attempt to mimic aspects of the brain's architecture and dynamics with the aim of replicating its hallmark functional capabilities in terms of computational power, robust learning and energy efficiency. We employ a single-chip prototype of the BrainScaleS 2 neuromorphic system to implement a proof-of-concept demonstration of reward-modulated spike-timing-dependent plasticity in a spiking network that learns to play a simplified version of the Pong video game by smooth pursuit. This system combines an electronic mixed-signal substrate for emulating neuron and synapse dynamics with an embedded digital processor for on-chip learning, which in this work also serves to simulate the virtual environment and learning agent. The analog emulation of neuronal membrane dynamics enables a 1000-fold acceleration with respect to biological real-time, with the entire chip operating on a power budget of 57 mW. Compared to an equivalent simulation using state-of-the-art software, the on-chip emulation is at least one order of magnitude faster and three orders of magnitude more energy-efficient. We demonstrate how on-chip learning can mitigate the effects of fixed-pattern noise, which is unavoidable in analog substrates, while making use of temporal variability for action exploration. Learning compensates for imperfections of the physical substrate, as manifested in neuronal parameter variability, by adapting synaptic weights to match the respective excitability of individual neurons.
Affiliation(s)
- Timo Wunderlich, Akos F Kungl, Eric Müller, Andreas Hartel, Yannik Stradmann, Syed Ahmed Aamir, Andreas Grübl, Arthur Heimbrecht, Korbinian Schreiber, David Stöckel, Christian Pehle, Sebastian Billaudelle, Gerd Kiene, Christian Mauch, Johannes Schemmel, Karlheinz Meier: Department of Physics, Kirchhoff Institute for Physics, Heidelberg University, Heidelberg, Germany
- Mihai A Petrovici: Department of Physics, Kirchhoff Institute for Physics, Heidelberg University, Heidelberg, Germany; Department of Physiology, University of Bern, Bern, Switzerland
13
Mozafari M, Kheradpisheh SR, Masquelier T, Nowzari-Dalini A, Ganjtabesh M. First-Spike-Based Visual Categorization Using Reward-Modulated STDP. IEEE Trans Neural Netw Learn Syst 2018; 29:6178-6190. PMID: 29993898; DOI: 10.1109/TNNLS.2018.2826721.
Abstract
Reinforcement learning (RL) has recently regained popularity with major achievements such as beating the European champion at the game of Go. Here, for the first time, we show that RL can be used efficiently to train a spiking neural network (SNN) to perform object recognition in natural images without using an external classifier. We used a feedforward convolutional SNN and a temporal coding scheme where the most strongly activated neurons fire first, while less activated ones fire later, or not at all. In the highest layers, each neuron was assigned to an object category, and it was assumed that the stimulus category was the category of the first neuron to fire. If this assumption was correct, the neuron was rewarded, i.e., spike-timing-dependent plasticity (STDP) was applied, which reinforced the neuron's selectivity. Otherwise, anti-STDP was applied, which encouraged the neuron to learn something else. As demonstrated on various image data sets (Caltech, ETH-80, and NORB), this reward-modulated STDP (R-STDP) approach has extracted particularly discriminative visual features, whereas classic unsupervised STDP extracts any feature that consistently repeats. As a result, R-STDP has outperformed STDP on these data sets. Furthermore, R-STDP is suitable for online learning and can adapt to drastic changes such as label permutations. Finally, it is worth mentioning that both feature extraction and classification were done with spikes, using at most one spike per neuron. Thus, the network is hardware friendly and energy efficient.
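The reward-modulation logic can be sketched compactly if the temporal race is abstracted to "the most strongly driven neuron fires first": a correct first spike triggers a potentiating (STDP-like) update on its active inputs, a wrong one triggers anti-STDP. The binary patterns, linear drive, and learning rate below are illustrative assumptions, not the convolutional SNN of the paper.

```python
import numpy as np

def r_stdp_step(w, x, label, lr=0.05):
    """One R-STDP step in the first-spike spirit: the neuron with the
    largest drive 'fires first' and is read out as the predicted
    category. Correct -> potentiate its active inputs (STDP);
    wrong -> depress them (anti-STDP)."""
    winner = int(np.argmax(w @ x))
    reward = 1.0 if winner == label else -1.0
    w[winner] += lr * reward * x       # reward-modulated Hebbian update
    return w, winner

rng = np.random.default_rng(0)
# Two binary input patterns, one per category
patterns = np.array([[1.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 1.0]])
w = 0.01 * rng.random((2, 4))
for _ in range(50):
    for label, x in enumerate(patterns):
        w, _ = r_stdp_step(w, x, label)
```

Note that no external classifier is needed: the identity of the first-spiking neuron is itself the decision, which is what makes the scheme hardware friendly.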
14
John RA, Tiwari N, Yaoyi C, Tiwari N, Kulkarni M, Nirmal A, Nguyen AC, Basu A, Mathews N. Ultralow Power Dual-Gated Subthreshold Oxide Neuristors: An Enabler for Higher Order Neuronal Temporal Correlations. ACS Nano 2018; 12:11263-11273. PMID: 30395439; DOI: 10.1021/acsnano.8b05903.
Abstract
Inspired by neural computing, the pursuit of ultralow power neuromorphic architectures with highly distributed memory and parallel processing capability has recently gained more traction. However, emulation of biological signal processing via artificial neuromorphic architectures does not exploit the immense interplay between local activities and global neuromodulations observed in biological neural networks and hence is unable to mimic complex biologically plausible adaptive functions like heterosynaptic plasticity and homeostasis. Here, we demonstrate emulation of complex neuronal behaviors like heterosynaptic plasticity, homeostasis, association, correlation, and coincidence in a single neuristor via a dual-gated architecture. This multiple gating approach allows one gate to capture the effect of local activity correlations and the second gate to represent global neuromodulations, allowing additional modulations which augment their plasticity, enabling higher order temporal correlations at a unitary level. Moreover, the dual-gate operation extends the available dynamic range of synaptic conductance while maintaining symmetry in the weight-update operation, expanding the number of accessible memory states. Finally, operating neuristors in the subthreshold regime enables synaptic weight changes with high gain while maintaining ultralow power consumption of the order of femtojoules.
Affiliation(s)
- Rohit Abraham John
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Nidhi Tiwari
- Energy Research Institute at NTU (ERI@N), Nanyang Technological University, Singapore 637553
| | - Chen Yaoyi
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Naveen Tiwari
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Mohit Kulkarni
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Amoolya Nirmal
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Anh Chien Nguyen
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Arindam Basu
- School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Nripan Mathews
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
- Energy Research Institute at NTU (ERI@N), Nanyang Technological University, Singapore 637553
| |
|
15
|
Richards BA, Lillicrap TP. Dendritic solutions to the credit assignment problem. Curr Opin Neurobiol 2018; 54:28-36. [PMID: 30205266 DOI: 10.1016/j.conb.2018.08.003] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3]
Abstract
Guaranteeing that synaptic plasticity leads to effective learning requires a means for assigning credit to each neuron for its contribution to behavior. The 'credit assignment problem' refers to the fact that credit assignment is non-trivial in hierarchical networks with multiple stages of processing. One difficulty is that if credit signals are integrated with other inputs, then it is hard for synaptic plasticity rules to distinguish credit-related activity from non-credit-related activity. A potential solution is to use the spatial layout and non-linear properties of dendrites to distinguish credit signals from other inputs. In cortical pyramidal neurons, evidence hints that top-down feedback signals are integrated in the distal apical dendrites and have a distinct impact on spike-firing and synaptic plasticity. This suggests that the distal apical dendrites of pyramidal neurons help the brain to solve the credit assignment problem.
Affiliation(s)
- Blake A Richards
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, ON, Canada; Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada; Learning in Machines and Brains Program, Canadian Institute for Advanced Research, Toronto, ON, Canada
| | | |
|
16
|
Cope AJ, Vasilaki E, Minors D, Sabo C, Marshall JAR, Barron AB. Abstract concept learning in a simple neural network inspired by the insect brain. PLoS Comput Biol 2018; 14:e1006435. [PMID: 30222735 PMCID: PMC6160224 DOI: 10.1371/journal.pcbi.1006435] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2]
Abstract
The capacity to learn abstract concepts such as 'sameness' and 'difference' is considered a higher-order cognitive function, typically thought to be dependent on top-down neocortical processing. It is therefore surprising that honey bees apparently have this capacity. Here we report a model of the structures of the honey bee brain that can learn sameness and difference, as well as a range of complex and simple associative learning tasks. Our model is constrained by the known connections and properties of the mushroom body, including the protocerebral tract, and provides a good fit to the learning rates and performances of real bees in all tasks, including learning sameness and difference. The model proposes a novel mechanism for learning the abstract concepts of 'sameness' and 'difference' that is compatible with the insect brain, and is not dependent on top-down or executive control processing.
Affiliation(s)
- Alex J. Cope
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Sheffield Robotics, University of Sheffield, Sheffield, UK
| | - Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Sheffield Robotics, University of Sheffield, Sheffield, UK
| | - Dorian Minors
- Department of Biological Sciences, Macquarie University, Sydney, Australia
| | - Chelsea Sabo
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Sheffield Robotics, University of Sheffield, Sheffield, UK
| | - James A. R. Marshall
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Sheffield Robotics, University of Sheffield, Sheffield, UK
| | - Andrew B. Barron
- Department of Biological Sciences, Macquarie University, Sydney, Australia
| |
|
17
|
Martinolli M, Gerstner W, Gilra A. Multi-Timescale Memory Dynamics Extend Task Repertoire in a Reinforcement Learning Network With Attention-Gated Memory. Front Comput Neurosci 2018; 12:50. [PMID: 30061819 PMCID: PMC6055065 DOI: 10.3389/fncom.2018.00050] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5]
Abstract
The interplay of reinforcement learning and memory is at the core of several recent neural network models, such as the Attention-Gated MEmory Tagging (AuGMEnT) model. While successful at various animal learning tasks, we find that the AuGMEnT network is unable to cope with some hierarchical tasks, where higher-level stimuli have to be maintained over a long time, while lower-level stimuli need to be remembered and forgotten over a shorter timescale. To overcome this limitation, we introduce a hybrid AuGMEnT, with leaky (short-timescale) and non-leaky (long-timescale) memory units, that allows the exchange of low-level information while maintaining high-level information. We test the performance of the hybrid AuGMEnT network on two cognitive reference tasks, sequence prediction and 12AX.
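The division of labor between the two memory pools can be sketched in a few lines. This is an illustrative toy, not the AuGMEnT implementation; the leak constant and the update form are assumptions.

```python
def memory_step(m_leaky, m_nonleaky, inp, leak=0.7):
    """Hybrid memory sketch: the leaky unit forgets a transient
    low-level stimulus over a short timescale, while the non-leaky
    unit integrates and holds higher-level context for the whole
    trial."""
    m_leaky = leak * m_leaky + inp
    m_nonleaky = m_nonleaky + inp
    return m_leaky, m_nonleaky

m_leaky, m_nonleaky = 0.0, 0.0
m_leaky, m_nonleaky = memory_step(m_leaky, m_nonleaky, 1.0)  # stimulus on
for _ in range(10):                                          # delay period
    m_leaky, m_nonleaky = memory_step(m_leaky, m_nonleaky, 0.0)
```

After the delay, the leaky trace has decayed to near zero while the non-leaky trace still holds the stimulus, which is the property a hierarchical task such as 12AX exploits.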
Affiliation(s)
- Marco Martinolli
- School of Computer and Communication Sciences, School of Life Sciences, Brain-Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Wulfram Gerstner
- School of Computer and Communication Sciences, School of Life Sciences, Brain-Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Aditya Gilra
- School of Computer and Communication Sciences, School of Life Sciences, Brain-Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| |
|
18
|
Gerstner W, Lehmann M, Liakoni V, Corneil D, Brea J. Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of NeoHebbian Three-Factor Learning Rules. Front Neural Circuits 2018; 12:53. [PMID: 30108488 PMCID: PMC6079224 DOI: 10.3389/fncir.2018.00053] [Citation(s) in RCA: 112] [Impact Index Per Article: 18.7]
Abstract
Most elementary behaviors such as moving the arm to grasp an object or walking into the next room to explore a museum evolve on the time scale of seconds; in contrast, neuronal action potentials occur on the time scale of a few milliseconds. Learning rules of the brain must therefore bridge the gap between these two different time scales. Modern theories of synaptic plasticity have postulated that the co-activation of pre- and postsynaptic neurons sets a flag at the synapse, called an eligibility trace, that leads to a weight change only if an additional factor is present while the flag is set. This third factor, signaling reward, punishment, surprise, or novelty, could be implemented by the phasic activity of neuromodulators or specific neuronal inputs signaling special events. While the theoretical framework has been developed over the last decades, experimental evidence in support of eligibility traces on the time scale of seconds has been collected only during the last few years. Here we review, in the context of three-factor rules of synaptic plasticity, four key experiments that support the role of synaptic eligibility traces in combination with a third factor as a biological implementation of neoHebbian three-factor learning rules.
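The eligibility-trace mechanism reviewed above can be written as a two-variable update. The code below is a generic sketch of a neoHebbian three-factor rule with illustrative constants; it is not a model from any of the four experiments discussed.

```python
def three_factor_step(w, e, pre, post, third_factor,
                      tau_e=2.0, dt=0.1, lr=0.1):
    """One Euler step: pre/post coactivity sets a synaptic eligibility
    trace that decays with time constant tau_e (seconds); the weight
    changes only while a third factor (e.g. phasic dopamine) is
    present."""
    e = e + dt * (-e / tau_e + pre * post)  # flag set by coactivation
    w = w + dt * lr * third_factor * e      # consolidation gated by factor
    return w, e

w, e = 0.5, 0.0
w, e = three_factor_step(w, e, pre=1.0, post=1.0, third_factor=0.0)
w_before_reward = w                         # trace is set, weight unchanged
for _ in range(5):                          # reward arrives after a delay
    w, e = three_factor_step(w, e, pre=0.0, post=0.0, third_factor=1.0)
```

The slow decay of `e` is what bridges the gap between millisecond spikes and a neuromodulatory signal arriving seconds later.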
Affiliation(s)
- Wulfram Gerstner
- School of Computer Science and School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | | | | | | | | |
|
19
|
Bing Z, Meschede C, Röhrbein F, Huang K, Knoll AC. A Survey of Robotics Control Based on Learning-Inspired Spiking Neural Networks. Front Neurorobot 2018; 12:35. [PMID: 30034334 PMCID: PMC6043678 DOI: 10.3389/fnbot.2018.00035] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3]
Abstract
Biological intelligence processes information using impulses or spikes, which makes living creatures able to perceive and act in the real world exceptionally well and to outperform state-of-the-art robots in almost every aspect of life. To make up for this deficit, emerging hardware technologies and software knowledge in the fields of neuroscience, electronics, and computer science have made it possible to design biologically realistic robots controlled by spiking neural networks (SNNs), inspired by the mechanisms of the brain. However, a comprehensive review of robot control based on SNNs is still missing. In this paper, we survey the developments of the past decade in the field of spiking neural networks for control tasks, with particular focus on fast-emerging robotics-related applications. We first highlight the primary impetuses of SNN-based robotics tasks in terms of speed, energy efficiency, and computational capability. We then classify those SNN-based robotic applications according to different learning rules and explicate each learning rule with its corresponding robotic applications. We also briefly present some existing platforms that offer an interaction between SNNs and robotics simulations for exploration and exploitation. Finally, we conclude the survey with a forecast of future challenges and associated potential research topics for controlling robots based on SNNs.
Affiliation(s)
- Zhenshan Bing
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Department of Informatics, Technical University of Munich, Munich, Germany
| | - Claus Meschede
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Department of Informatics, Technical University of Munich, Munich, Germany
| | - Florian Röhrbein
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Department of Informatics, Technical University of Munich, Munich, Germany
| | - Kai Huang
- Department of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
| | - Alois C. Knoll
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Department of Informatics, Technical University of Munich, Munich, Germany
| |
|
20
|
Zannone S, Brzosko Z, Paulsen O, Clopath C. Acetylcholine-modulated plasticity in reward-driven navigation: a computational study. Sci Rep 2018; 8:9486. [PMID: 29930322 PMCID: PMC6013476 DOI: 10.1038/s41598-018-27393-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3]
Abstract
Neuromodulation plays a fundamental role in the acquisition of new behaviours. In previous experimental work, we showed that acetylcholine biases hippocampal synaptic plasticity towards depression, and the subsequent application of dopamine can retroactively convert depression into potentiation. We also demonstrated that incorporating this sequentially neuromodulated Spike-Timing-Dependent Plasticity (STDP) rule in a network model of navigation yields effective learning of changing reward locations. Here, we employ computational modelling to further characterize the effects of cholinergic depression on behaviour. We find that acetylcholine, by allowing learning from negative outcomes, enhances exploration over the action space. We show that this results in a variety of effects, depending on the structure of the model, the environment and the task. Interestingly, sequentially neuromodulated STDP also yields flexible learning, surpassing the performance of other reward-modulated plasticity rules.
Affiliation(s)
- Sara Zannone
- Imperial College London, Department of Bioengineering, South Kensington Campus, London, United Kingdom
| | - Zuzanna Brzosko
- University of Cambridge, Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| | - Ole Paulsen
- University of Cambridge, Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| | - Claudia Clopath
- Imperial College London, Department of Bioengineering, South Kensington Campus, London, United Kingdom
| |
|
21
|
Gönner L, Vitay J, Hamker FH. Predictive Place-Cell Sequences for Goal-Finding Emerge from Goal Memory and the Cognitive Map: A Computational Model. Front Comput Neurosci 2017; 11:84. [PMID: 29075187 PMCID: PMC5643423 DOI: 10.3389/fncom.2017.00084] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1]
Abstract
Hippocampal place-cell sequences observed during awake immobility often represent previous experience, suggesting a role in memory processes. However, recent reports of goals being overrepresented in sequential activity suggest a role in short-term planning, although a detailed understanding of the origins of hippocampal sequential activity and of its functional role is still lacking. In particular, it is unknown which mechanism could support efficient planning by generating place-cell sequences biased toward known goal locations, in an adaptive and constructive fashion. To address these questions, we propose a model of spatial learning and sequence generation as interdependent processes, integrating cortical contextual coding, synaptic plasticity and neuromodulatory mechanisms into a map-based approach. Following goal learning, sequential activity emerges from continuous attractor network dynamics biased by goal memory inputs. We apply Bayesian decoding on the resulting spike trains, allowing a direct comparison with experimental data. Simulations show that this model (1) explains the generation of never-experienced sequence trajectories in familiar environments, without requiring virtual self-motion signals, (2) accounts for the bias in place-cell sequences toward goal locations, (3) highlights their utility in flexible route planning, and (4) provides specific testable predictions.
Affiliation(s)
- Lorenz Gönner
- Artificial Intelligence, Department of Computer Science, Technische Universität Chemnitz, Chemnitz, Germany
| | - Julien Vitay
- Artificial Intelligence, Department of Computer Science, Technische Universität Chemnitz, Chemnitz, Germany
| | - Fred H Hamker
- Artificial Intelligence, Department of Computer Science, Technische Universität Chemnitz, Chemnitz, Germany
- Bernstein Center Computational Neuroscience, Humboldt-Universität Berlin, Berlin, Germany
| |
|
22
|
Sanda P, Skorheim S, Bazhenov M. Multi-layer network utilizing rewarded spike time dependent plasticity to learn a foraging task. PLoS Comput Biol 2017; 13:e1005705. [PMID: 28961245 PMCID: PMC5636167 DOI: 10.1371/journal.pcbi.1005705] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3]
Abstract
Neural networks with a single plastic layer employing reward modulated spike time dependent plasticity (STDP) are capable of learning simple foraging tasks. Here we demonstrate advanced pattern discrimination and continuous learning in a network of spiking neurons with multiple plastic layers. The network utilized both reward modulated and non-reward modulated STDP and implemented multiple mechanisms for homeostatic regulation of synaptic efficacy, including heterosynaptic plasticity, gain control, output balancing, activity normalization of rewarded STDP and hard limits on synaptic strength. We found that addition of a hidden layer of neurons employing non-rewarded STDP created neurons that responded to specific combinations of inputs and thus performed basic classification of the input patterns. When combined with a following layer of neurons implementing rewarded STDP, the network was able to learn, despite the absence of labeled training data, discrimination between rewarding patterns and patterns designated as punishing. Synaptic noise allowed for trial-and-error learning that helped to identify the goal-oriented strategies that were effective in task solving. The study predicts a critical set of properties of a spiking neuronal network with STDP that is sufficient to solve a complex foraging task involving pattern classification and decision making. This study explores how intelligent behavior emerges from basic principles known at the cellular level of biological neuronal network dynamics. Compared to the approaches used in the artificial intelligence community, we applied biologically realistic modeling of neuronal dynamics and plasticity. The building blocks of the model are spiking neurons, spike-time dependent plasticity (STDP) and experimentally known homeostatic rules, which are shown to play a fundamental role both in keeping the network stable and in enabling continuous learning.
Our study predicts that a combination of these principles makes foraging behavior in a previously unknown environment possible, including pattern classification to distinguish between environment shapes that are rewarded and those that are punished, and decision making to select the optimal strategy for acquiring the maximal number of rewarded elements. To solve this complex task we used multi-layer neuronal processing that implemented pattern generalization by unsupervised STDP at the earlier processing step, as commonly observed in animal and human sensory processing, followed by reinforcement learning at the later steps. In the model, intelligent behavior emerged spontaneously from the network organization, which implemented both local unsupervised plasticity and reward feedback resulting from successful behavior in the environment.
Affiliation(s)
- Pavel Sanda
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| | - Steven Skorheim
- Information and Systems Sciences Lab, HRL Laboratories, LLC, Malibu, California, United States of America
| | - Maxim Bazhenov
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| |
|
23
|
Pande S, Morgan F, Krewer F, Harkin J, McDaid L, McGinley B. Rapid application prototyping for hardware modular spiking neural network architectures. Neural Comput Appl 2017. [DOI: 10.1007/s00521-015-2136-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
|
24
|
Brzosko Z, Zannone S, Schultz W, Clopath C, Paulsen O. Sequential neuromodulation of Hebbian plasticity offers mechanism for effective reward-based navigation. eLife 2017; 6. [PMID: 28691903 PMCID: PMC5546805 DOI: 10.7554/elife.27756] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7]
Abstract
Spike timing-dependent plasticity (STDP) is under neuromodulatory control, which is correlated with distinct behavioral states. Previously, we reported that dopamine, a reward signal, broadens the time window for synaptic potentiation and modulates the outcome of hippocampal STDP even when applied after the plasticity induction protocol (Brzosko et al., 2015). Here, we demonstrate that sequential neuromodulation of STDP by acetylcholine and dopamine offers an efficacious model of reward-based navigation. Specifically, our experimental data in mouse hippocampal slices show that acetylcholine biases STDP toward synaptic depression, whilst subsequent application of dopamine converts this depression into potentiation. Incorporating this bidirectional neuromodulation-enabled correlational synaptic learning rule into a computational model yields effective navigation toward changing reward locations, as in natural foraging behavior. Thus, temporally sequenced neuromodulation of STDP enables associations to be made between actions and outcomes and also provides a possible mechanism for aligning the time scales of cellular and behavioral learning.
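The sequential effect can be caricatured at the level of the sign of the weight change. The function below is an illustrative abstraction of the rule described above, with an arbitrary gain; it is not the fitted learning rule from the paper.

```python
def modulated_dw(hebbian_dw, ach_during, da_after, gain=1.5):
    """Sign-level sketch: acetylcholine present during induction biases
    the STDP outcome toward depression; dopamine applied afterwards
    retroactively converts that depression into potentiation."""
    dw = hebbian_dw
    if ach_during:
        dw = -abs(dw)        # ACh: depression-biased STDP
    if da_after:
        dw = gain * abs(dw)  # delayed DA: depression -> potentiation
    return dw

dw_ach = modulated_dw(0.1, ach_during=True, da_after=False)
dw_ach_da = modulated_dw(0.1, ach_during=True, da_after=True)
```

In a navigation model, the ACh-only branch depresses synapses along unrewarded routes, while dopamine arriving at the goal retroactively potentiates the path just taken.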
Affiliation(s)
- Zuzanna Brzosko
- Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| | - Sara Zannone
- Department of Bioengineering, Imperial College London, South Kensington Campus, London, United Kingdom
| | - Wolfram Schultz
- Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| | - Claudia Clopath
- Department of Bioengineering, Imperial College London, South Kensington Campus, London, United Kingdom
| | - Ole Paulsen
- Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| |
|
25
|
Rasmussen D, Voelker A, Eliasmith C. A neural model of hierarchical reinforcement learning. PLoS One 2017; 12:e0180234. [PMID: 28683111 PMCID: PMC5500327 DOI: 10.1371/journal.pone.0180234] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0]
Abstract
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.
Affiliation(s)
| | - Aaron Voelker
- Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, ON, Canada
| | - Chris Eliasmith
- Applied Brain Research, Inc., Waterloo, ON, Canada
- Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, ON, Canada
| |
|
26
|
A computational model of the integration of landmarks and motion in the insect central complex. PLoS One 2017; 12:e0172325. [PMID: 28241061 PMCID: PMC5328262 DOI: 10.1371/journal.pone.0172325] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7]
Abstract
The insect central complex (CX) is an enigmatic structure whose computational function has evaded inquiry, but has been implicated in a wide range of behaviours. Recent experimental evidence from the fruit fly (Drosophila melanogaster) and the cockroach (Blaberus discoidalis) has demonstrated the existence of neural activity corresponding to the animal's orientation within a virtual arena (a neural 'compass'), and this provides an insight into one component of the CX structure. There are two key features of the compass activity: an offset between the angle represented by the compass and the true angular position of visual features in the arena, and the remapping of the 270° visual arena onto an entire circle of neurons in the compass. Here we present a computational model which can reproduce this experimental evidence in detail, and predicts the computational mechanisms that underlie the data. We predict that both the offset and remapping of the fly's orientation onto the neural compass can be explained by plasticity in the synaptic weights between segments of the visual field and the neurons representing orientation. Furthermore, we predict that this learning is reliant on the existence of neural pathways that detect rotational motion across the whole visual field and uses this rotation signal to drive the rotation of activity in a neural ring attractor. Our model also reproduces the 'transitioning' between visual landmarks seen when rotationally symmetric landmarks are presented. This model can provide the basis for further investigation into the role of the central complex, which promises to be a key structure for understanding insect behaviour, as well as suggesting approaches towards creating fully autonomous robotic agents.
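The core prediction that rotational motion, rather than raw landmark position, drives the compass can be sketched with a shift operation on a ring of neurons. This is an illustrative caricature (shift-by-one dynamics, 16 neurons, learned landmark anchoring omitted), not the paper's attractor model.

```python
import numpy as np

N = 16                         # compass neurons arranged on a ring
bump = np.zeros(N)
bump[0] = 1.0                  # current heading estimate

def rotate(bump, rotation):
    """Whole-field rotational motion moves the activity bump around the
    ring, so heading is tracked by the bump's position; plastic weights
    from visual segments (not modeled here) would anchor the possibly
    offset mapping onto landmarks."""
    return np.roll(bump, rotation)

for _ in range(3):
    bump = rotate(bump, +1)    # three steps of rotation in one direction
heading_index = int(np.argmax(bump))
```

Because the bump moves only when the rotation signal is nonzero, the compass can represent an arbitrary, learned offset between the neural angle and the true angular position of arena features, as observed experimentally.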
|
27
|
Building functional networks of spiking model neurons. Nat Neurosci 2016; 19:350-5. [PMID: 26906501 DOI: 10.1038/nn.4241] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1]
Abstract
Most of the networks used by computer scientists and many of those studied by modelers in neuroscience represent unit activities as continuous variables. Neurons, however, communicate primarily through discontinuous spiking. We review methods for transferring our ability to construct interesting networks that perform relevant tasks from the artificial continuous domain to more realistic spiking network models. These methods raise a number of issues that warrant further theoretical and experimental study.
|
28
|
Frémaux N, Gerstner W. Neuromodulated Spike-Timing-Dependent Plasticity, and Theory of Three-Factor Learning Rules. Front Neural Circuits 2016; 9:85. [PMID: 26834568 PMCID: PMC4717313 DOI: 10.3389/fncir.2015.00085] [Citation(s) in RCA: 138] [Impact Index Per Article: 17.3]
Abstract
Classical Hebbian learning puts the emphasis on joint pre- and postsynaptic activity, but neglects the potential role of neuromodulators. Since neuromodulators convey information about novelty or reward, the influence of neuromodulators on synaptic plasticity is useful not just for action learning in classical conditioning, but also to decide "when" to create new memories in response to a flow of sensory stimuli. In this review, we focus on timing requirements for pre- and postsynaptic activity in conjunction with one or several phasic neuromodulatory signals. While the emphasis of the text is on conceptual models and mathematical theories, we also discuss some experimental evidence for neuromodulation of Spike-Timing-Dependent Plasticity. We highlight the importance of synaptic mechanisms in bridging the temporal gap between sensory stimulation and neuromodulatory signals, and develop a framework for a class of neo-Hebbian three-factor learning rules that depend on presynaptic activity, postsynaptic variables as well as the influence of neuromodulators.
Affiliation(s)
- Nicolas Frémaux
- School of Computer Science and Brain Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Wulfram Gerstner
- School of Computer Science and Brain Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| |
|
29
|
Brosch T, Neumann H, Roelfsema PR. Reinforcement Learning of Linking and Tracing Contours in Recurrent Neural Networks. PLoS Comput Biol 2015; 11:e1004489. [PMID: 26496502 PMCID: PMC4619762 DOI: 10.1371/journal.pcbi.1004489] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1]
Abstract
The processing of a visual stimulus can be subdivided into a number of stages. Upon stimulus presentation there is an early phase of feedforward processing where the visual information is propagated from lower to higher visual areas for the extraction of basic and complex stimulus features. This is followed by a later phase where horizontal connections within areas and feedback connections from higher areas back to lower areas come into play. In this later phase, image elements that are behaviorally relevant are grouped by Gestalt grouping rules and are labeled in the cortex with enhanced neuronal activity (object-based attention in psychology). Recent neurophysiological studies revealed that reward-based learning influences these recurrent grouping processes, but it is not well understood how rewards train recurrent circuits for perceptual organization. This paper examines the mechanisms for reward-based learning of new grouping rules. We derive a learning rule that can explain how rewards influence the information flow through feedforward, horizontal and feedback connections. We illustrate the efficiency with two tasks that have been used to study the neuronal correlates of perceptual organization in early visual cortex. The first task is called contour-integration and demands the integration of collinear contour elements into an elongated curve. We show how reward-based learning causes an enhancement of the representation of the to-be-grouped elements at early levels of a recurrent neural network, just as is observed in the visual cortex of monkeys. The second task is curve-tracing where the aim is to determine the endpoint of an elongated curve composed of connected image elements. If trained with the new learning rule, neural networks learn to propagate enhanced activity over the curve, in accordance with neurophysiological data. We close the paper with a number of model predictions that can be tested in future neurophysiological and computational studies. 
Our experience with the visual world allows us to group image elements that belong to the same perceptual object and to segregate them from other objects and the background. If subjects learn to group contour elements, this experience influences neuronal activity in early visual cortical areas, including the primary visual cortex (V1). Learning presumably depends on alterations in the pattern of connections within and between areas of the visual cortex. However, the processes that control changes in connectivity are not well understood. Here we present the first computational model that can train a neural network to integrate collinear contour elements into elongated curves and to trace a curve through the visual field. The new learning algorithm trains fully recurrent neural networks, provided the connectivity causes the networks to reach a stable state. The model reproduces the behavioral performance of monkeys trained in these tasks and explains the patterns of neuronal activity in the visual cortex that emerge during learning, which is remarkable because the only feedback for the model is a reward for successful trials. We discuss a number of the model predictions that can be tested in future neuroscientific work.
Affiliation(s)
- Tobias Brosch
- University of Ulm, Institute of Neural Information Processing, Ulm, Germany
- Heiko Neumann
- University of Ulm, Institute of Neural Information Processing, Ulm, Germany
- Pieter R. Roelfsema
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam, The Netherlands
- Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
- Psychiatry Department, Academic Medical Center, Amsterdam, The Netherlands
30
Gehring TV, Luksys G, Sandi C, Vasilaki E. Detailed classification of swimming paths in the Morris Water Maze: multiple strategies within one trial. Sci Rep 2015; 5:14562. [PMID: 26423140 PMCID: PMC4589698 DOI: 10.1038/srep14562] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2015] [Accepted: 08/26/2015] [Indexed: 10/29/2022] Open
Abstract
The Morris Water Maze is a widely used task in studies of spatial learning with rodents. Classical performance measures of animals in the Morris Water Maze include the escape latency and the cumulative distance to the platform. Other methods focus on classifying trajectory patterns into stereotypical classes representing different animal strategies. However, these approaches typically consider trajectories as a whole, and as a consequence they assign one full trajectory to one class, whereas animals often switch between strategies, and their corresponding classes, within a single trial. We therefore take a different approach: we look for segments of diverse animal behaviour within one trial and employ a semi-automated classification method for identifying the various strategies exhibited by the animals within a trial. Our method reveals significant and systematic differences in the exploration strategies of two animal groups (stressed, non-stressed) that would go undetected with earlier methods.
Affiliation(s)
- Tiago V Gehring
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Gediminas Luksys
- Division of Cognitive Neuroscience, University of Basel, Basel, Switzerland
- Carmen Sandi
- Laboratory of Behavioral Genetics, Brain Mind Institute, EPFL, Lausanne, Switzerland
- Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Theoretical Neurobiology and Neuroengineering Lab, University of Antwerp, Wilrijk, Belgium
- INSIGNEO Institute for in Silico Medicine, University of Sheffield, Sheffield, UK
31
Rasku J, Pyykkö I, Levo H, Kentala E, Manchaiah V. Disease Profiling for Computerized Peer Support of Ménière's Disease. JMIR Rehabil Assist Technol 2015; 2:e9. [PMID: 28582248 PMCID: PMC5454554 DOI: 10.2196/rehab.4109] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2014] [Revised: 06/16/2015] [Accepted: 07/12/2015] [Indexed: 01/09/2023] Open
Abstract
Background Peer support is an emerging form of person-driven active health care. Chronic conditions such as Ménière’s disease (a disorder of the inner ear) need continuing rehabilitation and support that is beyond the scope of routine clinical medical practice. Hence, peer-support programs can be helpful in supplementing some of the rehabilitation aspects. Objective The aim of this study was to design a computerized data collection system for the peer support of Ménière’s disease that is capable of profiling the subject for diagnosis and of assisting with problem solving. Methods The expert program comprises several data entries focusing on symptoms, activity limitations, participation restrictions, quality of life, attitude and personality trait, and an evaluation of disease-specific impact. Data were collected from 740 members of the Finnish Ménière’s Federation and utilized in the construction and evaluation of the program. Results The program verifies the diagnosis of a person by using an expert system, and the inference engine selects 50 cases with matched symptom severity by using a nearest neighbor algorithm. These cases are then used as a reference group to compare with the person’s attitude, sense of coherence, and anxiety. The program provides feedback for the person and uses this information to guide the person through the problem-solving process. Conclusions This computer-based peer-support program is the first example of an advanced computer-oriented approach using artificial intelligence, both in the profiling of the disease and in the profiling of the person’s complaints of hearing loss, tinnitus, and vertigo.
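The case-matching step described in the Results, selecting 50 stored cases with similar symptom severity as a reference group, can be sketched with a plain nearest-neighbour search (a generic illustration, not the program's actual code; the symptom names and scales are invented):

```python
import math

def nearest_cases(patient, cases, k=50):
    """Return the k stored cases closest to `patient` in symptom-severity space.

    `patient` and each case's "severity" entry are dicts of numeric scores,
    e.g. {"vertigo": 7, "tinnitus": 4} -- hypothetical symptom scales, not
    the questionnaire items used by the actual program.
    """
    def distance(a, b):
        # Euclidean distance over the symptom dimensions both records share
        shared = set(a) & set(b)
        return math.sqrt(sum((a[s] - b[s]) ** 2 for s in shared))

    ranked = sorted(cases, key=lambda c: distance(patient, c["severity"]))
    return ranked[:k]

# The k matched cases then serve as the reference group against which the
# person's attitude, sense of coherence and anxiety are compared.
```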
Affiliation(s)
- Jyrki Rasku
- School of Information Sciences, Tampere University, Tampere, Finland
- Hearing and Balance Research Unit, Department of Otorhinolaryngology, Tampere University, Tampere, Finland
- Ilmari Pyykkö
- Hearing and Balance Research Unit, Department of Otorhinolaryngology, Tampere University, Tampere, Finland
- Hilla Levo
- Department of Otolaryngology, University of Helsinki, Helsinki, Finland
- Erna Kentala
- Department of Otolaryngology, University of Helsinki, Helsinki, Finland
- Vinaya Manchaiah
- Department of Speech and Hearing Sciences, Lamar University, Beaumont, TX, United States
- The Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Audiology India, Mysore, India
32
Barron AB, Gurney KN, Meah LFS, Vasilaki E, Marshall JAR. Decision-making and action selection in insects: inspiration from vertebrate-based theories. Front Behav Neurosci 2015; 9:216. [PMID: 26347627 PMCID: PMC4539514 DOI: 10.3389/fnbeh.2015.00216] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2015] [Accepted: 07/30/2015] [Indexed: 11/13/2022] Open
Abstract
Effective decision-making, one of the most crucial functions of the brain, entails the analysis of sensory information and the selection of appropriate behavior in response to stimuli. Here, we consider the current state of knowledge on the mechanisms of decision-making and action selection in the insect brain, with emphasis on the olfactory processing system. Theoretical and computational models of decision-making emphasize the importance of using inhibitory connections to couple evidence-accumulating pathways; this coupling allows for effective discrimination between competing alternatives and thus enables a decision maker to reach a stable unitary decision. Theory also shows that the coupling of pathways can be implemented using a variety of different mechanisms and vastly improves the performance of decision-making systems. The vertebrate basal ganglia appear to resolve stable action selection by being a point of convergence for multiple excitatory and inhibitory inputs such that only one possible response is selected and all other alternatives are suppressed. Similar principles appear to operate within the insect brain. The insect lateral protocerebrum (LP) serves as a point of convergence for multiple excitatory and inhibitory channels of olfactory information to effect stable decision and action selection, at least for olfactory information. The LP is a rather understudied region of the insect brain, yet this premotor region may be key to the effective resolution of action selection. We argue that it may be beneficial to use models developed to explore the operation of the vertebrate brain as inspiration when considering action selection in the invertebrate domain. Such an approach may facilitate the proposal of new hypotheses and frame experimental studies of how decision-making and action selection might be achieved in insects.
Affiliation(s)
- Andrew B Barron
- Department of Biological Sciences, Macquarie University, North Ryde, NSW, Australia
- Kevin N Gurney
- Department of Psychology, The University of Sheffield, Sheffield, UK
- Lianne F S Meah
- Department of Computer Science, The University of Sheffield, Sheffield, UK
- Eleni Vasilaki
- Department of Computer Science, The University of Sheffield, Sheffield, UK
- James A R Marshall
- Department of Computer Science, The University of Sheffield, Sheffield, UK
33
Kocaturk M, Gulcur HO, Canbeyli R. Toward Building Hybrid Biological/in silico Neural Networks for Motor Neuroprosthetic Control. Front Neurorobot 2015; 9:8. [PMID: 26321943 PMCID: PMC4531252 DOI: 10.3389/fnbot.2015.00008] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2015] [Accepted: 07/15/2015] [Indexed: 11/13/2022] Open
Abstract
In this article, we introduce the Bioinspired Neuroprosthetic Design Environment (BNDE) as a practical platform for the development of novel brain–machine interface (BMI) controllers, which are based on spiking model neurons. We built the BNDE around a hard real-time system so that it is capable of creating simulated synapses from extracellularly recorded neurons to model neurons. In order to evaluate the practicality of the BNDE for neuroprosthetic control experiments, a novel, adaptive BMI controller was developed and tested using real-time closed-loop simulations. The present controller consists of two in silico medium spiny neurons, which receive simulated synaptic inputs from recorded motor cortical neurons. In the closed-loop simulations, the recordings from the cortical neurons were imitated using an external, hardware-based neural signal synthesizer. By implementing a reward-modulated spike-timing-dependent plasticity rule, the controller achieved perfect target-reach accuracy in a two-target reaching task in one-dimensional space. The BNDE combines the flexibility of software-based spiking neural network (SNN) simulations with powerful online data visualization tools and is a low-cost, PC-based, all-in-one solution for developing neurally inspired BMI controllers. We believe that the BNDE is the first implementation capable of creating hybrid biological/in silico neural networks for motor neuroprosthetic control and of utilizing multiple CPU cores for computationally intensive real-time SNN simulations.
Affiliation(s)
- Mehmet Kocaturk
- Institute of Biomedical Engineering, Bogazici University, Istanbul, Turkey
- Department of Biomedical Engineering, Istanbul Medipol University, Istanbul, Turkey
- Halil Ozcan Gulcur
- Institute of Biomedical Engineering, Bogazici University, Istanbul, Turkey
- Resit Canbeyli
- Department of Psychology, Bogazici University, Istanbul, Turkey
34
Choice-correlated activity fluctuations underlie learning of neuronal category representation. Nat Commun 2015; 6:6454. [PMID: 25759251 PMCID: PMC4382677 DOI: 10.1038/ncomms7454] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2014] [Accepted: 01/29/2015] [Indexed: 11/30/2022] Open
Abstract
The ability to categorize stimuli into discrete behaviourally relevant groups is an essential cognitive function. To elucidate the neural mechanisms underlying categorization, we constructed a cortical circuit model that is capable of learning a motion categorization task through reward-dependent plasticity. Here we show that stable category representations develop in neurons intermediate to sensory and decision layers if they exhibit choice-correlated activity fluctuations (choice probability). In the model, choice probability and task-specific interneuronal correlations emerge from plasticity of top-down projections from decision neurons. Specific model predictions are confirmed by analysis of single-neuron activity from the monkey parietal cortex, which reveals a mixture of directional and categorical tuning, and a positive correlation between category selectivity and choice probability. Beyond demonstrating a circuit mechanism for categorization, the present work suggests a key role of plastic top-down feedback in simultaneously shaping both neural tuning and correlated neural variability. The ability to categorize stimuli into discrete behaviourally relevant groups is an essential cognitive function. Here, the authors demonstrate a critical role for choice-correlated activity fluctuations in the emergence of stable cortical category representations.
35
Esposito U, Giugliano M, Vasilaki E. Adaptation of short-term plasticity parameters via error-driven learning may explain the correlation between activity-dependent synaptic properties, connectivity motifs and target specificity. Front Comput Neurosci 2015; 8:175. [PMID: 25688203 PMCID: PMC4310301 DOI: 10.3389/fncom.2014.00175] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2014] [Accepted: 12/31/2014] [Indexed: 01/09/2023] Open
Abstract
The anatomical connectivity among neurons has been experimentally found to be largely non-random across brain areas. This means that certain connectivity motifs occur at a higher frequency than would be expected by chance. Of particular interest, short-term synaptic plasticity properties were found to colocalize with specific motifs: an over-expression of bidirectional motifs has been found in neuronal pairs where short-term facilitation dominates synaptic transmission among the neurons, whereas an over-expression of unidirectional motifs has been observed in neuronal pairs where short-term depression dominates. In previous work we found that, given a network with fixed short-term properties, the interaction between short- and long-term plasticity of synaptic transmission is sufficient for the emergence of specific motifs. Here, we introduce an error-driven learning mechanism for short-term plasticity that may explain how such observed correspondences develop from randomly initialized dynamic synapses. By allowing synapses to change their properties, neurons are able to adapt their own activity depending on an error signal. This results in richer dynamics and, provided that the learning mechanism is target-specific, leads to specialized groups of synapses projecting onto functionally different targets, qualitatively replicating the experimental results of Wang and collaborators.
Affiliation(s)
- Umberto Esposito
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Michele Giugliano
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Theoretical Neurobiology and Neuroengineering Laboratory, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Laboratory of Neural Microcircuitry, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Theoretical Neurobiology and Neuroengineering Laboratory, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- INSIGNEO Institute for in Silico Medicine, University of Sheffield, Sheffield, UK
36
Shah A, Gurney KN. Finding minimal action sequences with a simple evaluation of actions. Front Comput Neurosci 2014; 8:151. [PMID: 25506326 PMCID: PMC4247113 DOI: 10.3389/fncom.2014.00151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2013] [Accepted: 11/03/2014] [Indexed: 11/13/2022] Open
Abstract
Animals are able to discover the minimal number of actions that achieves an outcome (the minimal action sequence). In most accounts of this, actions are associated with a measure of behavior that is higher for actions that lead to the outcome with a shorter action sequence, and learning mechanisms find the actions associated with the highest measure. In this sense, previous accounts focus on more than the simple binary signal of "was the outcome achieved?"; they focus on "how well was the outcome achieved?" However, such mechanisms may not govern all types of behavioral development. In particular, in the process of action discovery (Redgrave and Gurney, 2006), actions are reinforced if they simply lead to a salient outcome because biological reinforcement signals occur too quickly to evaluate the consequences of an action beyond an indication of the outcome's occurrence. Thus, action discovery mechanisms focus on the simple evaluation of "was the outcome achieved?" and not "how well was the outcome achieved?" Notwithstanding this impoverishment of information, can the process of action discovery find the minimal action sequence? We address this question by implementing computational mechanisms, referred to in this paper as no-cost learning rules, in which each action that leads to the outcome is associated with the same measure of behavior. No-cost rules focus on "was the outcome achieved?" and are consistent with action discovery. No-cost rules discover the minimal action sequence in simulated tasks and execute it for a substantial amount of time. Extensive training, however, results in extraneous actions, suggesting that a separate process (which has been proposed in action discovery) must attenuate learning if no-cost rules participate in behavioral development. We describe how no-cost rules develop behavior, what happens when attenuation is disrupted, and relate the new mechanisms to wider computational and biological context.
Affiliation(s)
- Ashvin Shah
- Department of Psychology, The University of Sheffield, Sheffield, UK
37
Esposito U, Giugliano M, van Rossum M, Vasilaki E. Measuring symmetry, asymmetry and randomness in neural network connectivity. PLoS One 2014; 9:e100805. [PMID: 25006663 PMCID: PMC4090069 DOI: 10.1371/journal.pone.0100805] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2013] [Accepted: 05/29/2014] [Indexed: 11/19/2022] Open
Abstract
Cognitive functions are stored in the connectome, the wiring diagram of the brain, which exhibits non-random features, so-called motifs. In this work, we focus on bidirectional, symmetric motifs, i.e. two neurons that project to each other via connections of equal strength, and unidirectional, non-symmetric motifs, i.e. within a pair of neurons only one neuron projects to the other. We hypothesise that such motifs have been shaped via activity-dependent synaptic plasticity processes. As a consequence, learning moves the distribution of the synaptic connections away from randomness. Our aim is to provide a global, macroscopic, single-parameter characterisation of the statistical occurrence of bidirectional and unidirectional motifs. To this end we define a symmetry measure that does not require any a priori thresholding of the weights or knowledge of their maximal value. We calculate its mean and variance for random uniform or Gaussian distributions, which allows us to introduce a confidence measure of how significantly symmetric or asymmetric a specific configuration is, i.e. how likely it is that the configuration is the result of chance. We demonstrate the discriminatory power of our symmetry measure by inspecting the eigenvalues of different types of connectivity matrices. We show that a Gaussian weight distribution biases the connectivity motifs to more symmetric configurations than a uniform distribution and that introducing random synaptic pruning, mimicking developmental regulation in synaptogenesis, biases the connectivity motifs to more asymmetric configurations, regardless of the distribution. We expect that our work will benefit the computational modelling community by providing a systematic way to characterise symmetry and asymmetry in network structures. Further, our symmetry measure will be of use to electrophysiologists who investigate the symmetry of network connectivity.
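As a rough illustration of a threshold-free symmetry score in the same spirit (one simple instance, not necessarily the authors' exact definition), one can average a normalised per-pair asymmetry over all neuron pairs:

```python
import numpy as np

def symmetry_measure(W, eps=1e-12):
    """Threshold-free symmetry score for a non-negative weight matrix W.

    For every neuron pair (i, j) with i < j, compute the normalised
    asymmetry |w_ij - w_ji| / (w_ij + w_ji); the score is 1 minus its mean
    over pairs. 1 means perfectly symmetric (w_ij == w_ji for all pairs);
    0 means perfectly asymmetric (one direction of every pair is zero).
    No weight threshold or maximal weight value is needed.
    """
    W = np.asarray(W, dtype=float)
    iu = np.triu_indices_from(W, k=1)            # all index pairs i < j
    forward, backward = W[iu], W.T[iu]           # w_ij and w_ji
    denom = forward + backward
    # pairs with no connection in either direction count as symmetric
    pair_asym = np.abs(forward - backward) / np.where(denom > eps, denom, 1.0)
    return 1.0 - pair_asym.mean()

rng = np.random.default_rng(0)
A = rng.uniform(size=(50, 50))
print(symmetry_measure((A + A.T) / 2))  # fully symmetric network: 1.0
print(symmetry_measure(np.triu(A, 1)))  # fully unidirectional network: 0.0
```

A random uniform matrix falls strictly between the two extremes, which is what makes a confidence measure against chance meaningful.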
Affiliation(s)
- Umberto Esposito
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Michele Giugliano
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Theoretical Neurobiology and Neuroengineering Laboratory, Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
- Laboratory of Neural Microcircuitry, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Mark van Rossum
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Theoretical Neurobiology and Neuroengineering Laboratory, Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
38
Friedrich J, Urbanczik R, Senn W. Code-specific learning rules improve action selection by populations of spiking neurons. Int J Neural Syst 2014; 24:1450002. [DOI: 10.1142/s0129065714500026] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning for both discrete classification and continuous regression tasks. The suggested learning rules also speed up learning as population size increases, in contrast to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation, as opposed to the classical weight- or node-perturbation, as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning compared to exploration in the neuron or weight space.
Affiliation(s)
- Johannes Friedrich
- Institute of Physiology, University of Bern, Bühlplatz 5, 3012 Bern, Switzerland
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
- Robert Urbanczik
- Institute of Physiology, University of Bern, Bühlplatz 5, 3012 Bern, Switzerland
- Center for Cognition, Learning and Memory, University of Bern, Factory Street 8, CH-3012 Bern, Switzerland
- Walter Senn
- Institute of Physiology, University of Bern, Bühlplatz 5, 3012 Bern, Switzerland
- Center for Cognition, Learning and Memory, University of Bern, Factory Street 8, CH-3012 Bern, Switzerland
39
Vasilaki E, Giugliano M. Emergence of connectivity motifs in networks of model neurons with short- and long-term plastic synapses. PLoS One 2014; 9:e84626. [PMID: 24454735 PMCID: PMC3893143 DOI: 10.1371/journal.pone.0084626] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2013] [Accepted: 11/16/2013] [Indexed: 11/29/2022] Open
Abstract
Recent experimental data from the rodent cerebral cortex and olfactory bulb indicate that specific connectivity motifs are correlated with short-term dynamics of excitatory synaptic transmission. It was observed that neurons with short-term facilitating synapses form predominantly reciprocal pairwise connections, while neurons with short-term depressing synapses form predominantly unidirectional pairwise connections. The cause of these structural differences in excitatory synaptic microcircuits is unknown. We show that these connectivity motifs emerge in networks of model neurons, from the interactions between short-term synaptic dynamics (SD) and long-term spike-timing dependent plasticity (STDP). While the impact of STDP on SD was shown in simultaneous neuronal pair recordings in vitro, the mutual interactions between STDP and SD in large networks are still the subject of intense research. Our approach combines an SD phenomenological model with an STDP model that faithfully captures long-term plasticity dependence on both spike times and frequency. As a proof of concept, we first simulate and analyze recurrent networks of spiking neurons with random initial connection efficacies and where synapses are either all short-term facilitating or all depressing. For identical external inputs to the network, and as a direct consequence of internally generated activity, we find that networks with depressing synapses evolve unidirectional connectivity motifs, while networks with facilitating synapses evolve reciprocal connectivity motifs. We then show that the same results hold for heterogeneous networks, including both facilitating and depressing synapses. This does not contradict a recent theory that proposes that motifs are shaped by external inputs, but rather complements it by examining the role of both the external inputs and the internally generated network activity. 
Our study highlights the conditions under which SD-STDP might explain the correlation between facilitation and reciprocal connectivity motifs, as well as between depression and unidirectional motifs.
Affiliation(s)
- Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
- Michele Giugliano
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
- Brain Mind Institute, Swiss Federal Institute of Technology of Lausanne, Lausanne, Switzerland
40
Mahmoudi B, Pohlmeyer EA, Prins NW, Geng S, Sanchez JC. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning. J Neural Eng 2013; 10:066005. [PMID: 24100047 DOI: 10.1088/1741-2560/10/6/066005] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
OBJECTIVE Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. APPROACH Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. MAIN RESULTS The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. SIGNIFICANCE By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
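The core of the approach, a Hebbian weight change gated by binary evaluative feedback, can be sketched in a toy discrete setting (a simplified illustration, not the paper's spiking controller; the task, patterns and parameters are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

def hrl_update(W, x, action, reward, lr=0.1):
    """One reward-gated Hebbian update (simplified sketch of the HRL idea).

    The Hebbian product of postsynaptic activity (a one-hot vector for the
    selected action unit) and the presynaptic neural state `x` is gated by
    the binary evaluative feedback: +1 strengthens the association that
    produced the action, -1 weakens it.
    """
    post = np.zeros(W.shape[0])
    post[action] = 1.0
    W += lr * reward * np.outer(post, x)
    return W

# Toy episodic task: map two neural-state patterns to two actions using
# only binary desirable/undesirable feedback.
patterns = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
W = rng.normal(scale=0.1, size=(2, 2))
for _ in range(100):
    for target, x in enumerate(patterns):
        scores = W @ x + rng.normal(scale=0.1, size=2)  # noisy selection
        action = int(np.argmax(scores))
        reward = 1.0 if action == target else -1.0      # binary feedback only
        W = hrl_update(W, x, action, reward)
# The actual controller additionally stops adapting once a satisfactory
# policy is reached; that mechanism is omitted in this sketch.
```

Because correct choices are reinforced and wrong ones punished by the same Hebbian term, the preference gap for the rewarded action grows on every episode, so the greedy mapping converges to the target assignment.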
Affiliation(s)
- Babak Mahmoudi
- Department of Neurosurgery, Emory University, Atlanta, GA, USA
41
Frémaux N, Sprekeler H, Gerstner W. Reinforcement learning using a continuous time actor-critic framework with spiking neurons. PLoS Comput Biol 2013; 9:e1003024. [PMID: 23592970 PMCID: PMC3623741 DOI: 10.1371/journal.pcbi.1003024] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2012] [Accepted: 02/22/2013] [Indexed: 11/26/2022] Open
Abstract
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity. As every dog owner knows, animals repeat behaviors that earn them rewards. 
But what is the brain machinery that underlies this reward-based learning? Experimental research points to plasticity of the synaptic connections between neurons, with an important role played by the neuromodulator dopamine, but the exact way synaptic activity and neuromodulation interact during learning is not precisely understood. Here we propose a model explaining how reward signals might interplay with synaptic plasticity, and use the model to solve a simulated maze navigation task. Our model extends an idea from the theory of reinforcement learning: one group of neurons forms an “actor,” responsible for choosing the direction of motion of the animal. Another group of neurons, the “critic,” whose role is to predict the rewards the actor will gain, uses the mismatch between actual and expected reward to teach the synapses feeding both groups. Our learning agent learns to reliably navigate its maze to find the reward. Remarkably, the synaptic learning rule that we derive from theoretical considerations is similar to previous rules based on experimental evidence.
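The core computation described above — a continuous-time TD error driving both critic and actor — can be illustrated with a small rate-based sketch. The block below is a hypothetical Euler discretisation of Doya-style continuous TD learning for the critic alone, with radial-basis state features standing in for place-cell-like input; all function names and parameter values are assumptions for illustration, not the paper's spiking implementation.

```python
import numpy as np

def gaussian_features(x, centers, width=0.2):
    """Radial-basis encoding of a 1-D state, loosely place-cell-like."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

def run_episode(w, states, rewards, centers, dt=0.01, tau_r=1.0, lr=0.1):
    """One pass along a trajectory, updating linear critic weights w.

    Continuous-time TD error (Euler-discretised):
        delta(t) = r(t) - V(t)/tau_r + dV/dt
    """
    v_prev = float(w @ gaussian_features(states[0], centers))
    deltas = []
    for x, r in zip(states, rewards):
        phi = gaussian_features(x, centers)
        v = float(w @ phi)
        delta = r - v / tau_r + (v - v_prev) / dt  # TD / neuromodulatory signal
        w = w + lr * delta * phi * dt              # critic weight update
        v_prev = float(w @ phi)                    # value after the update
        deltas.append(delta)
    return w, deltas
```

Repeating episodes with a reward at the end of the trajectory makes the predicted value ramp up towards the rewarded location, which is the behaviour the critic's TD signal is meant to produce.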
Affiliation(s)
- Nicolas Frémaux
- School of Computer and Communication Sciences and School of Life Sciences, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, 1015 Lausanne EPFL, Switzerland
| | - Henning Sprekeler
- School of Computer and Communication Sciences and School of Life Sciences, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, 1015 Lausanne EPFL, Switzerland
- Theoretical Neuroscience Lab, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Wulfram Gerstner
- School of Computer and Communication Sciences and School of Life Sciences, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, 1015 Lausanne EPFL, Switzerland
|
42
|
Soltoggio A, Lemme A, Reinhart F, Steil JJ. Rare neural correlations implement robotic conditioning with delayed rewards and disturbances. Front Neurorobot 2013; 7:6. [PMID: 23565092 PMCID: PMC3613617 DOI: 10.3389/fnbot.2013.00006] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2012] [Accepted: 03/06/2013] [Indexed: 11/13/2022] Open
Abstract
Neural conditioning associates cues and actions with subsequent rewards. The environments in which robots operate, however, are pervaded by a variety of disturbing stimuli and uncertain timing. In particular, variable reward delays make it difficult to reconstruct which previous actions are responsible for subsequent rewards. Such uncertainty is handled by biological neural networks, but represents a challenge for computational models, suggesting the lack of a satisfactory theory for robotic neural conditioning. The present study demonstrates the use of rare neural correlations in making correct associations between rewards and previous cues or actions. Rare correlations are functional in selecting sparse synapses to be eligible for later weight updates if a reward occurs. The repetition of this process singles out the associating and reward-triggering pathways, and thereby copes with distal rewards. The neural network displays macro-level classical and operant conditioning, which is demonstrated in a real-life human-robot interaction. The proposed mechanism models realistic conditioning in humans and animals and implements similar behaviors in neuro-robotic platforms.
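The rare-correlation idea can be caricatured in a few lines: only occasional pre-post coincidences tag a synapse with a decaying eligibility trace, and a reward arriving several time steps later converts surviving traces into weight changes. The sketch below is a deliberately simplified, hypothetical two-synapse version (all rates, probabilities, and time constants are assumptions, not the paper's model), showing that a cue synapse reliably preceding the reward outgrows an uncorrelated distractor.

```python
import numpy as np

def train(trials=500, delay=8, tau_e=20.0, lr=0.1, p_rare=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(2)                    # w[0]: cue synapse, w[1]: distractor
    for _ in range(trials):
        elig = np.zeros(2)
        for t in range(20):
            pre = np.array([
                1.0 if t == 5 else 0.0,              # cue precedes the reward
                1.0 if rng.random() < 0.1 else 0.0,  # unrelated distractor
            ])
            # postsynaptic spike: driven by the cue, otherwise background
            post = 1.0 if (pre[0] == 1.0 or rng.random() < 0.2) else 0.0
            rare = rng.random(2) < p_rare            # rare-correlation gate
            elig += rare * pre * post                # tag eligible synapses
            elig *= np.exp(-1.0 / tau_e)             # traces decay slowly
            if t == 5 + delay:                       # distal reward arrives
                w += lr * elig                       # modulated weight update
    return w
```

Because the distractor's coincidences are uncorrelated with the reward time, its trace at reward delivery is on average much smaller than the cue's, so repetition separates the two pathways.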
Affiliation(s)
- Andrea Soltoggio
- Faculty of Technology, Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University Bielefeld, Germany
|
43
|
Eliasmith C, Stewart TC, Choo X, Bekolay T, DeWolf T, Tang Y, Tang C, Rasmussen D. A large-scale model of the functioning brain. Science 2012. [PMID: 23197532 DOI: 10.1126/science.1225266] [Citation(s) in RCA: 330] [Impact Index Per Article: 27.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
A central challenge for cognitive and systems neuroscience is to relate the incredibly complex behavior of animals to the equally complex activity of their brains. Recently described large-scale neural models have not bridged this gap between neural activity and biological function. In this work, we present a 2.5-million-neuron model of the brain (called "Spaun") that bridges this gap by exhibiting many different behaviors. The model is presented only with visual image sequences, and it draws all of its responses with a physically modeled arm. Although simplified, the model captures many aspects of neuroanatomy, neurophysiology, and psychological behavior, which we demonstrate via eight diverse tasks.
Affiliation(s)
- Chris Eliasmith
- Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, ON N2J 3G1, Canada.
|
44
|
Probst D, Maass W, Markram H, Gewaltig MO. Liquid Computing in a Simplified Model of Cortical Layer IV: Learning to Balance a Ball. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING – ICANN 2012 2012. [DOI: 10.1007/978-3-642-33269-2_27] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/05/2022]
|
45
|
Abstract
A central criticism of standard theoretical approaches to constructing stable, recurrent model networks is that the synaptic connection weights need to be finely tuned. This criticism is severe because proposed rules for learning these weights have been shown to have various limitations to their biological plausibility. Hence it is unlikely that such rules are used to continuously fine-tune the network in vivo. We describe a learning rule that is able to tune synaptic weights in a biologically plausible manner. We demonstrate and test this rule in the context of the oculomotor integrator, showing that only known neural signals are needed to tune the weights. We demonstrate that the rule appropriately accounts for a wide variety of experimental results, and is robust under several kinds of perturbation. Furthermore, we show that the rule is able to achieve stability as good as or better than that provided by the linearly optimal weights often used in recurrent models of the integrator. Finally, we discuss how this rule can be generalized to tune a wide variety of recurrent attractor networks, such as those found in head direction and path integration systems, suggesting that it may be used to tune a wide variety of stable neural systems.
|
46
|
Friedrich J, Urbanczik R, Senn W. Spatio-temporal credit assignment in neuronal population learning. PLoS Comput Biol 2011; 7:e1002092. [PMID: 21738460 PMCID: PMC3127803 DOI: 10.1371/journal.pcbi.1002092] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2010] [Accepted: 05/02/2011] [Indexed: 01/27/2023] Open
Abstract
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain. The key mechanisms supporting memory and learning in the brain rely on changing the strength of synapses which control the transmission of information between neurons. But how are appropriate changes determined when animals learn from trial and error? Information on success or failure is likely signaled to synapses by neurotransmitters like dopamine. 
But interpreting this reward signal is difficult because the number of synaptic transmissions occurring during behavioral decision making is huge and each transmission may have contributed differently to the decision, or perhaps not at all. Extrapolating from experimental evidence on synaptic plasticity, we suggest a computational model where each synapse collects information about its contributions to the decision process by means of a cascade of transient memory traces. The final trace then remodulates the reward signal when the persistent change of the synaptic strength is triggered. Simulation results show that with the suggested synaptic plasticity rule a simple neural network can learn even difficult tasks by trial and error, e.g., when the decision–reward sequence is scrambled due to large delays in reward delivery.
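Collapsing the cascade to its essentials gives a policy-gradient-flavoured sketch: a trace first correlates presynaptic input with the deviation of the stochastic decision from its expectation, and the reward arriving after the decision converts the trace into a weight change. The task and all names below are illustrative assumptions, not the paper's integrate-and-fire population model.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train(trials=5000, n=20, lr=0.1, seed=1):
    """Learn a hidden majority rule from binary reward after each decision."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n)
    for _ in range(trials):
        x = rng.integers(0, 2, n).astype(float)          # presynaptic pattern
        target = float(x[: n // 2].sum() > x[n // 2:].sum())
        p = sigmoid(w @ x)                               # population drive
        decision = float(rng.random() < p)               # stochastic decision
        trace = x * (decision - p)       # trace: pre x (post - expectation)
        reward = 1.0 if decision == target else -1.0     # delivered afterwards
        w += lr * reward * trace         # reward converts trace to plasticity
    return w

def accuracy(w, samples=1000, seed=2):
    rng = np.random.default_rng(seed)
    n = len(w)
    hits = 0.0
    for _ in range(samples):
        x = rng.integers(0, 2, n).astype(float)
        target = float(x[: n // 2].sum() > x[n // 2:].sum())
        hits += float((sigmoid(w @ x) > 0.5) == target)
    return hits / samples
```

The trace here is the standard score-function (REINFORCE-style) term; the paper's cascade additionally interposes stages for postsynaptic spikes and delayed rewards, which this sketch compresses into one step.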
Affiliation(s)
| | | | - Walter Senn
- Department of Physiology, University of Bern, Bern, Switzerland
|
47
|
Neural mechanisms and computations underlying stress effects on learning and memory. Curr Opin Neurobiol 2011; 21:502-8. [DOI: 10.1016/j.conb.2011.03.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2011] [Revised: 02/08/2011] [Accepted: 03/25/2011] [Indexed: 11/22/2022]
|
48
|
An imperfect dopaminergic error signal can drive temporal-difference learning. PLoS Comput Biol 2011; 7:e1001133. [PMID: 21589888 PMCID: PMC3093351 DOI: 10.1371/journal.pcbi.1001133] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2010] [Accepted: 04/06/2011] [Indexed: 12/03/2022] Open
Abstract
An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards. 
What are the physiological changes that take place in the brain when we solve a problem or learn a new skill? It is commonly assumed that behavior adaptations are realized on the microscopic level by changes in synaptic efficacies. However, this is hard to verify experimentally due to the difficulties of identifying the relevant synapses and monitoring them over long periods during a behavioral task. To address this question computationally, we develop a spiking neuronal network model of actor-critic temporal-difference learning, a variant of reinforcement learning for which neural correlates have already been partially established. The network learns a complex task by means of an internally generated reward signal constrained by recent findings on the dopaminergic system. Our model combines top-down and bottom-up modelling approaches to bridge the gap between synaptic plasticity and system-level learning. It paves the way for further investigations of the dopaminergic system in reward learning in the healthy brain and in pathological conditions such as Parkinson's disease, and can be used as a module in functional models based on brain-scale circuitry.
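The consequence of an asymmetric error signal can be illustrated with ordinary TD(0) on a deterministic chain: clipping negative prediction errors at a small floor (a stand-in for the limited dynamic range below the dopaminergic baseline firing rate) barely affects learning from a positive terminal reward but strongly slows learning from a negative one. The chain task and clip value are assumptions for illustration, not the paper's spiking network.

```python
import numpy as np

def td_chain(reward=1.0, n_states=5, episodes=60, alpha=0.1, gamma=0.9,
             floor=-0.1):
    """TD(0) on a deterministic chain whose last transition pays `reward`."""
    v = np.zeros(n_states + 1)                 # index n_states: terminal, V=0
    for _ in range(episodes):
        for s in range(n_states):              # walk from state 0 to terminal
            r = reward if s == n_states - 1 else 0.0
            delta = r + gamma * v[s + 1] - v[s]
            v[s] += alpha * max(delta, floor)  # clipped, asymmetric TD signal
    return v[:n_states]
```

With a positive reward, the prediction errors are mostly positive and unaffected by the floor, so the values converge quickly; with a negative reward, every update is clipped to a small decrement, so the value estimate lags far behind its target for the same number of episodes.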
|
49
|
Democratic population decisions result in robust policy-gradient learning: a parametric study with GPU simulations. PLoS One 2011; 6:e18539. [PMID: 21572529 PMCID: PMC3087717 DOI: 10.1371/journal.pone.0018539] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2010] [Accepted: 03/03/2011] [Indexed: 11/28/2022] Open
Abstract
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task, and architectural limitations raise the question of whether investing effort in this direction is worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which this architecture and learning rule demonstrate the best performance. Our work indicates that networks featuring strong Mexican-hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a “non-democratic” mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons “vote” independently (“democratically”) for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult to carry out on a desktop computer without GPU programming. We present the routines developed for this purpose and show that they provide a speed-up of 5× to 42× over optimised Python code. The highest speed-up is achieved when we exploit the parallelism of the GPU in the search for learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated.
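The “democratic” readout referred to above is the classical population vector: each neuron votes for its preferred direction, weighted by its firing rate. A minimal rate-based sketch (the tuning curve and all numbers are illustrative assumptions, not the paper's spiking network):

```python
import numpy as np

def population_vector(rates, preferred_angles):
    """Circular 'vote': sum each neuron's preferred direction, rate-weighted."""
    x = np.sum(rates * np.cos(preferred_angles))
    y = np.sum(rates * np.sin(preferred_angles))
    return np.arctan2(y, x)

n = 100
prefs = np.linspace(-np.pi, np.pi, n, endpoint=False)  # uniform preferences
true_angle = 0.7
rates = np.exp(np.cos(prefs - true_angle) / 0.3)       # von-Mises-like tuning
decoded = population_vector(rates, prefs)
```

With uniformly spaced preferred directions and symmetric tuning, the decoded angle matches the encoded one essentially exactly; the paper's point is that this independent-vote readout also makes the policy-gradient learning signal more robust than a winner-take-all bump.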
|
50
|
Vassiliades V, Cleanthous A, Christodoulou C. Multiagent reinforcement learning: spiking and nonspiking agents in the iterated Prisoner's Dilemma. ACTA ACUST UNITED AC 2011; 22:639-53. [PMID: 21421435 DOI: 10.1109/tnn.2011.2111384] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
This paper investigates multiagent reinforcement learning (MARL) in a general-sum game where the payoff structure is such that the agents are required to exploit each other in a way that benefits all agents. The contradictory nature of these games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome. The spiking agents are neural networks with leaky integrate-and-fire neurons trained with two different learning algorithms: 1) reinforcement of stochastic synaptic transmission, or 2) reward-modulated spike-timing-dependent plasticity with eligibility trace. The nonspiking agents use a tabular representation and are trained with Q- and SARSA learning algorithms, with a novel reward transformation process also being applied to the Q-learning agents. According to the results, the cooperative outcome is enhanced by: 1) transformed internal reinforcement signals and a combination of a high learning rate and a low discount factor with an appropriate exploration schedule in the case of nonspiking agents, and 2) a longer eligibility trace time constant in the case of spiking agents. Moreover, it is shown that spiking and nonspiking agents behave similarly and can therefore be used equally well in a multiagent interaction setting. For training the spiking agents in the case where more than one output neuron competes for reinforcement, a novel and necessary modification that enhances competition is applied to the two learning algorithms, in order to avoid possible synaptic saturation. This is done by administering additional global reinforcement signals to the networks for every spike of the output neurons that were not "responsible" for the preceding decision.
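For the nonspiking side, the tabular agents can be sketched with standard Q-learning over memory-1 states (the last joint action). The code below is a generic, hypothetical implementation with the usual payoffs T=5, R=3, P=1, S=0 — not the paper's exact agents or its reward transformation.

```python
import numpy as np

# Standard Prisoner's Dilemma payoffs for the row player (T=5, R=3, P=1, S=0).
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ['C', 'D']

def q_learn_vs(opponent_policy, steps=5000, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning over memory-1 states (my last action, opponent's).

    States 0..3 encode the last joint action; state 4 is the start state.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((5, 2))
    state, last = 4, ('C', 'C')
    for _ in range(steps):
        if rng.random() < epsilon:
            a = int(rng.integers(2))         # epsilon-greedy exploration
        else:
            a = int(np.argmax(q[state]))
        my, opp = ACTIONS[a], opponent_policy(last)
        r = PAYOFF[(my, opp)]
        next_state = 2 * a + ACTIONS.index(opp)
        q[state, a] += alpha * (r + gamma * q[next_state].max() - q[state, a])
        state, last = next_state, (my, opp)
    return q
```

Against an unconditional defector, the learned Q-values should prefer defection in every visited state, which is the game-theoretic best response; the paper's interest is in the richer regimes (reward transformations, exploration schedules) where mutual cooperation can instead be sustained.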
|