1. Zajzon B, Duarte R, Morrison A. Toward reproducible models of sequence learning: replication and analysis of a modular spiking network with reward-based learning. Front Integr Neurosci 2023; 17:935177. [PMID: 37396571; PMCID: PMC10310927; DOI: 10.3389/fnint.2023.935177]
Abstract
To acquire statistical regularities from the world, the brain must reliably process, and learn from, spatio-temporally structured information. Although an increasing number of computational models have attempted to explain how such sequence learning may be implemented in the neural hardware, many remain limited in functionality or lack biophysical plausibility. If we are to harvest the knowledge within these models and arrive at a deeper mechanistic understanding of sequential processing in cortical circuits, it is critical that the models and their findings are accessible, reproducible, and quantitatively comparable. Here we illustrate the importance of these aspects by providing a thorough investigation of a recently proposed sequence learning model. We re-implement the modular columnar architecture and reward-based learning rule in the open-source NEST simulator, and successfully replicate the main findings of the original study. Building on these, we perform an in-depth analysis of the model's robustness to parameter settings and underlying assumptions, highlighting its strengths and weaknesses. We identify a limitation of the model, namely that the sequence order is hard-wired into the connectivity patterns, and suggest possible solutions. Finally, we show that the core functionality of the model is retained under more biologically plausible constraints.
Affiliation(s)
- Barna Zajzon
  - Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA-BRAIN Institute I, Jülich Research Centre, Jülich, Germany
  - Department of Computer Science 3—Software Engineering, RWTH Aachen University, Aachen, Germany
- Renato Duarte
  - Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, Netherlands
- Abigail Morrison
  - Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA-BRAIN Institute I, Jülich Research Centre, Jülich, Germany
  - Department of Computer Science 3—Software Engineering, RWTH Aachen University, Aachen, Germany

2. Weidel P, Duarte R, Morrison A. Unsupervised Learning and Clustered Connectivity Enhance Reinforcement Learning in Spiking Neural Networks. Front Comput Neurosci 2021; 15:543872. [PMID: 33746728; PMCID: PMC7970044; DOI: 10.3389/fncom.2021.543872]
Abstract
Reinforcement learning is a paradigm that can account for how organisms learn to adapt their behavior in complex environments with sparse rewards. To partition an environment into discrete states, implementations in spiking neuronal networks typically rely on input architectures involving place cells or receptive fields specified ad hoc by the researcher. This is problematic as a model for how an organism can learn appropriate behavioral sequences in unknown environments, as it fails to account for the unsupervised and self-organized nature of the required representations. Additionally, this approach presupposes knowledge on the part of the researcher on how the environment should be partitioned and represented and scales poorly with the size or complexity of the environment. To address these issues and gain insights into how the brain generates its own task-relevant mappings, we propose a learning architecture that combines unsupervised learning on the input projections with biologically motivated clustered connectivity within the representation layer. This combination allows input features to be mapped to clusters; thus the network self-organizes to produce clearly distinguishable activity patterns that can serve as the basis for reinforcement learning on the output projections. On the basis of the MNIST and Mountain Car tasks, we show that our proposed model performs better than either a comparable unclustered network or a clustered network with static input projections. We conclude that the combination of unsupervised learning and clustered connectivity provides a generic representational substrate suitable for further computation.
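For illustration, a rate-based Python sketch of this combination might look as follows. The layer sizes, clustered recurrent weights, and learning rates are hypothetical simplifications for illustration only; the original work uses spiking neurons simulated in NEST.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_rep, n_out = 64, 100, 4

# Hypothetical clustered connectivity: units within a cluster weakly excite
# each other, which sharpens cluster-level activity patterns.
clusters = np.repeat(np.arange(10), n_rep // 10)
W_rec = 0.1 * (clusters[:, None] == clusters[None, :]).astype(float)

W_in = 0.1 * rng.random((n_rep, n_in))   # plastic input projections (unsupervised)
W_out = np.zeros((n_out, n_rep))         # plastic output projections (reinforcement)

def representation(W_in, W_rec, x):
    """Rate-based stand-in for the clustered representation layer."""
    h = np.maximum(W_in @ x, 0.0)
    h = np.maximum(h + W_rec @ h, 0.0)   # recurrent boost within clusters
    return h / (np.linalg.norm(h) + 1e-9)

def train_step(W_in, W_out, W_rec, x, action, reward, lr_in=1e-3, lr_out=1e-2):
    h = representation(W_in, W_rec, x)
    # Unsupervised, Oja-like update: maps input features onto clusters without
    # using any reward information.
    W_in += lr_in * (np.outer(h, x) - (h[:, None] ** 2) * W_in)
    # Reward-modulated update on the readout projections only.
    W_out[action] += lr_out * reward * h
    return h

# Example: one update with a random input, chosen action 2, and reward +1.
h = train_step(W_in, W_out, W_rec, rng.random(n_in), action=2, reward=1.0)
```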
Affiliation(s)
- Philipp Weidel
  - Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure-Function Relationship (JBI-1 / INM-10), Research Centre Jülich, Jülich, Germany
  - Department of Computer Science 3 - Software Engineering, RWTH Aachen University, Aachen, Germany
- Renato Duarte
  - Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure-Function Relationship (JBI-1 / INM-10), Research Centre Jülich, Jülich, Germany
- Abigail Morrison
  - Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure-Function Relationship (JBI-1 / INM-10), Research Centre Jülich, Jülich, Germany
  - Department of Computer Science 3 - Software Engineering, RWTH Aachen University, Aachen, Germany

3. Huang J, Ruan X, Yu N, Fan Q, Li J, Cai J. A Cognitive Model Based on Neuromodulated Plasticity. Comput Intell Neurosci 2016; 2016:4296356. [PMID: 27872638; PMCID: PMC5107251; DOI: 10.1155/2016/4296356]
Abstract
Associative learning, including classical conditioning and operant conditioning, is regarded as the most fundamental type of learning for animals and human beings. Many models have been proposed for classical or operant conditioning individually, but a unified, integrated model explaining both types of conditioning has received much less attention. Here, a model based on neuromodulated synaptic plasticity is presented. The model is bio-inspired, comprising a multi-store memory module and simulated VTA dopaminergic neurons that produce a reward signal. The synaptic weights are modified according to this reward signal, which simulates the change of associative strengths in associative learning. Experimental results on real robots demonstrate the suitability and validity of the proposed model.
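A minimal sketch of such a reward-modulated (three-factor) weight update, assuming a simple rate-based formulation in which a dopamine-like signal gates the Hebbian term; the baseline and learning rate below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def neuromodulated_update(w, pre, post, reward, baseline=0.5, lr=0.01):
    """Three-factor rule: weight change = learning rate * dopamine * Hebbian term.

    The dopamine-like signal is modeled here simply as the deviation of the
    received reward from a fixed baseline (an assumption for illustration).
    """
    dopamine = reward - baseline
    return w + lr * dopamine * np.outer(post, pre)

# Example: co-active pre/post pairs strengthen when reward exceeds the baseline.
w = np.zeros((3, 4))
w = neuromodulated_update(w, pre=np.array([1.0, 0.0, 1.0, 0.0]),
                          post=np.array([0.0, 1.0, 0.0]), reward=1.0)
```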
Affiliation(s)
- Jing Huang
  - Institute of Artificial Intelligence and Robotics, Beijing University of Technology, Beijing 100124, China
  - Pilot College, Beijing University of Technology, Beijing 101101, China
- Xiaogang Ruan
  - Institute of Artificial Intelligence and Robotics, Beijing University of Technology, Beijing 100124, China
- Naigong Yu
  - Institute of Artificial Intelligence and Robotics, Beijing University of Technology, Beijing 100124, China
- Qingwu Fan
  - Pilot College, Beijing University of Technology, Beijing 101101, China
- Jiaming Li
  - Pilot College, Beijing University of Technology, Beijing 101101, China
- Jianxian Cai
  - Institute of Artificial Intelligence and Robotics, Beijing University of Technology, Beijing 100124, China

4. Soltoggio A, Lemme A, Reinhart F, Steil JJ. Rare neural correlations implement robotic conditioning with delayed rewards and disturbances. Front Neurorobot 2013; 7:6. [PMID: 23565092; PMCID: PMC3613617; DOI: 10.3389/fnbot.2013.00006]
Abstract
Neural conditioning associates cues and actions with subsequent rewards. The environments in which robots operate, however, are pervaded by a variety of disturbing stimuli and uncertain timing. In particular, variable reward delays make it difficult to reconstruct which previous actions are responsible for subsequent rewards. Such uncertainty is handled by biological neural networks, but represents a challenge for computational models, suggesting the lack of a satisfactory theory for robotic neural conditioning. The present study demonstrates the use of rare neural correlations in making correct associations between rewards and previous cues or actions. Rare correlations are functional in selecting sparse synapses to be eligible for later weight updates if a reward occurs. The repetition of this process singles out the associating and reward-triggering pathways, and thereby copes with distal rewards. The neural network displays macro-level classical and operant conditioning, which is demonstrated in a real-life interactive human-robot scenario. The proposed mechanism models realistic conditioning in humans and animals and implements similar behaviors in neuro-robotic platforms.
Affiliation(s)
- Andrea Soltoggio
  - Faculty of Technology, Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld, Germany

5.
Abstract
In the course of trial-and-error learning, the results of actions, manifested as rewards or punishments, occur often seconds after the actions that caused them. How can a reward be associated with an earlier action when the neural activity that caused that action is no longer present in the network? This problem is referred to as the distal reward problem. A recent computational study proposes a solution using modulated plasticity with spiking neurons and argues that precise firing patterns in the millisecond range are essential for such a solution. In contrast, the study reported in this letter shows that it is the rarity of correlating neural activity, and not the spike timing, that allows the network to solve the distal reward problem. In this study, rare correlations are detected in a standard rate-based computational model by means of a threshold-augmented Hebbian rule. The novel modulated plasticity rule allows a randomly connected network to learn in classical and instrumental conditioning scenarios with delayed rewards. The rarity of correlations is shown to be a pivotal factor in the learning and in handling various delays of the reward. This study additionally suggests the hypothesis that short-term synaptic plasticity may implement eligibility traces and thereby serve as a selection mechanism in promoting candidate synapses for long-term storage.
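The following Python fragment sketches the flavor of such a threshold-augmented Hebbian rule with eligibility traces; the threshold, decay constant, and function names are illustrative choices, not the letter's exact formulation. Only unusually strong (rare) pre/post correlations mark synapses as eligible, and marked synapses are consolidated only when a possibly delayed reward arrives.

```python
import numpy as np

def rare_correlations(pre, post, theta=0.9):
    """Threshold-augmented Hebbian term: only correlations exceeding the
    threshold (rare events) flag a synapse; everything else is ignored."""
    corr = np.outer(post, pre)
    return np.where(np.abs(corr) > theta, np.sign(corr), 0.0)

def plasticity_step(w, elig, pre, post, reward, lr=0.05, decay=0.95):
    # Short-lived eligibility trace: stores which synapses recently showed a
    # rare correlation and decays while waiting for a delayed reward.
    elig = decay * elig + rare_correlations(pre, post)
    # Weights change only when a (possibly delayed) reward signal is present.
    w = w + lr * reward * elig
    return w, elig
```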
Affiliation(s)
- Andrea Soltoggio
  - Research Institute for Cognition and Robotics and Faculty of Technology, Bielefeld University, Bielefeld 33615, Germany
- Jochen J. Steil
  - Research Institute for Cognition and Robotics and Faculty of Technology, Bielefeld University, Bielefeld 33615, Germany

6. Ungureanu M, Stoliar P, Llopis R, Casanova F, Hueso LE. Non-Hebbian learning implementation in light-controlled resistive memory devices. PLoS One 2012; 7:e52042. [PMID: 23251679; PMCID: PMC3522635; DOI: 10.1371/journal.pone.0052042]
Abstract
Non-Hebbian learning is often encountered in different bio-organisms. In these processes, the strength of a synapse connecting two neurons is controlled not only by the signals exchanged between the neurons, but also by an additional factor external to the synaptic structure. Here we show the implementation of non-Hebbian learning in a single solid-state resistive memory device. The output of our device is controlled not only by the applied voltages, but also by the illumination conditions under which it operates. We demonstrate that our metal/oxide/semiconductor device learns more efficiently at higher applied voltages and also when light, an external parameter, is present during the information-writing steps. Conversely, memory erasing is more efficient at higher applied voltages and in the dark. Translating neuronal activity into simple solid-state devices could provide a deeper understanding of complex brain processes and give insight into non-binary computing possibilities.
Affiliation(s)
- Pablo Stoliar
  - CIC nanoGUNE Consolider, Donostia - San Sebastian, Spain
  - LPS, CNRS - UPS, Bât. 510, Orsay, France
  - ECyT, UNSAM, San Martín, Buenos Aires, Argentina
- Roger Llopis
  - CIC nanoGUNE Consolider, Donostia - San Sebastian, Spain
- Fèlix Casanova
  - CIC nanoGUNE Consolider, Donostia - San Sebastian, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- Luis E. Hueso
  - CIC nanoGUNE Consolider, Donostia - San Sebastian, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, Spain

7. From modulated Hebbian plasticity to simple behavior learning through noise and weight saturation. Neural Netw 2012; 34:28-41. [DOI: 10.1016/j.neunet.2012.06.005]

8. An imperfect dopaminergic error signal can drive temporal-difference learning. PLoS Comput Biol 2011; 7:e1001133. [PMID: 21589888; PMCID: PMC3093351; DOI: 10.1371/journal.pcbi.1001133]
Abstract
An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards.

What are the physiological changes that take place in the brain when we solve a problem or learn a new skill? It is commonly assumed that behavior adaptations are realized on the microscopic level by changes in synaptic efficacies. However, this is hard to verify experimentally due to the difficulties of identifying the relevant synapses and monitoring them over long periods during a behavioral task. To address this question computationally, we develop a spiking neuronal network model of actor-critic temporal-difference learning, a variant of reinforcement learning for which neural correlates have already been partially established. The network learns a complex task by means of an internally generated reward signal constrained by recent findings on the dopaminergic system. Our model combines top-down and bottom-up modelling approaches to bridge the gap between synaptic plasticity and system-level learning. It paves the way for further investigations of the dopaminergic system in reward learning in the healthy brain and in pathological conditions such as Parkinson's disease, and can be used as a module in functional models based on brain-scale circuitry.
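For orientation, the classical discrete-time TD error referred to above has the standard form below; the rectified variant is only a schematic way of expressing the asymmetry of a low-baseline dopaminergic signal, not the study's exact equations.

```latex
% Standard discrete-time TD error, and a schematic rectified variant expressing
% the asymmetry of a low-baseline dopaminergic signal (illustrative form only).
\[
  \delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t), \qquad
  \hat{\delta}_t = \max\!\bigl(\delta_t,\; -d_{\mathrm{base}}\bigr)
\]
```

Here the floor term stands for the limited dynamic range below baseline firing, so strongly negative errors cannot be fully represented by the signal.
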
9. Duff A, Verschure PF. Unifying perceptual and behavioral learning with a correlative subspace learning rule. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2009.11.048]

10. Shimoda S, Kimura H. Biomimetic approach to tacit learning based on compound control. IEEE Trans Syst Man Cybern B Cybern 2009; 40:77-90. [PMID: 19651559; DOI: 10.1109/tsmcb.2009.2014470]
Abstract
The remarkable capability of living organisms to adapt to unknown environments is due to learning mechanisms that are totally different from the current artificial machine-learning paradigm. Computational media composed of identical elements that have simple activity rules play a major role in biological control, such as the activities of neurons in brains and the molecular interactions in intracellular control. As a result of the integration of the individual activities of the computational media, new behavioral patterns emerge to adapt to changing environments. We previously implemented this feature of biological control in a form of machine learning and succeeded in realizing bipedal walking without a robot model or trajectory planning. Despite the success of bipedal walking, it remained a puzzle why the individual activities of the computational media could achieve the global behavior. In this paper, we answer this question by taking a statistical approach that connects the individual activities of computational media to global network behaviors. We show that the individual activities can generate optimized behaviors from a particular global viewpoint, i.e., autonomous rhythm generation and learning of balanced postures, without using global performance indices.
Affiliation(s)
- Shingo Shimoda
  - RIKEN Brain Science Institute (BSI)-Toyota Collaboration Center, Nagoya 463-0003, Japan

11. Potjans W, Morrison A, Diesmann M. A spiking neural network model of an actor-critic learning agent. Neural Comput 2009; 21:301-339. [PMID: 19196231; DOI: 10.1162/neco.2008.08-07-593]
Abstract
The ability to adapt behavior to maximize reward as a result of interactions with the environment is crucial for the survival of any higher organism. In the framework of reinforcement learning, temporal-difference learning algorithms provide an effective strategy for such goal-directed adaptation, but it is unclear to what extent these algorithms are compatible with neural computation. In this article, we present a spiking neural network model that implements actor-critic temporal-difference learning by combining local plasticity rules with a global reward signal. The network is capable of solving a nontrivial gridworld task with sparse rewards. We derive a quantitative mapping of plasticity parameters and synaptic weights to the corresponding variables in the standard algorithmic formulation and demonstrate that the network learns with a similar speed to its discrete time counterpart and attains the same equilibrium performance.
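As a point of reference for the algorithmic counterpart mentioned above, a minimal discrete-time actor-critic TD learner on a toy state space might look as follows; the state/action sizes and learning rates are arbitrary illustrations, and this is the abstract algorithm rather than the spiking network itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 25, 4               # e.g., a small gridworld

V = np.zeros(n_states)                    # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))   # actor: action preferences

def select_action(state):
    """Softmax policy over the actor's preferences."""
    p = np.exp(prefs[state] - prefs[state].max())
    return rng.choice(n_actions, p=p / p.sum())

def td_update(state, action, reward, next_state, alpha=0.1, beta=0.1, gamma=0.9):
    """Actor-critic TD update: one global error signal gates both local updates."""
    delta = reward + gamma * V[next_state] - V[state]   # TD error
    V[state] += alpha * delta                           # critic update
    prefs[state, action] += beta * delta                # actor update
    return delta
```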
Affiliation(s)
- Wiebke Potjans
  - Computational Neuroscience Group, RIKEN Brain Science Institute, Wako City, Saitama 351-0198, Japan

12. Kolodziejski C, Porr B, Wörgötter F. On the Asymptotic Equivalence Between Differential Hebbian and Temporal Difference Learning. Neural Comput 2009; 21:1173-1202. [DOI: 10.1162/neco.2008.04-08-750]
Abstract
In this theoretical contribution, we provide mathematical proof that two of the most important classes of network learning—correlation-based differential Hebbian learning and reward-based temporal difference learning—are asymptotically equivalent when timing the learning with a modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation-based perspective more closely related to the biophysics of neurons.
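Schematically, and in standard textbook notation rather than the article's own, the two rule classes being related are:

```latex
% Temporal-difference learning with an eligibility trace e(t):
\[
  \Delta w_{\mathrm{TD}}(t) \;\propto\; \delta(t)\, e(t),
  \qquad \delta(t) = r(t) + \gamma\, V(t+1) - V(t)
\]
% Differential Hebbian learning timed by a modulatory signal m(t),
% with presynaptic activity u(t) and postsynaptic rate of change \dot{v}(t):
\[
  \Delta w_{\mathrm{DH}}(t) \;\propto\; m(t)\, u(t)\, \dot{v}(t)
\]
```

The asymptotic equivalence stated in the abstract concerns the case in which the modulatory signal provides the timing of the weight update.
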
Affiliation(s)
- Christoph Kolodziejski
  - Bernstein Center for Computational Neuroscience, University of Göttingen, 37073 Göttingen, Germany
- Bernd Porr
  - Department of Electronics and Electrical Engineering, University of Glasgow, Glasgow, Scotland
- Florentin Wörgötter
  - Bernstein Center for Computational Neuroscience, University of Göttingen, 37073 Göttingen, Germany

13. Kolodziejski C, Porr B, Wörgötter F. Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison. Biol Cybern 2008; 98:259-272. [PMID: 18196266; PMCID: PMC2798052; DOI: 10.1007/s00422-007-0209-6]
Abstract
A confusingly wide variety of temporally asymmetric learning rules exists, related to reinforcement learning and/or to spike-timing-dependent plasticity; many of them look exceedingly similar while displaying strongly different behavior. These rules often find their use in control tasks, for example in robotics, and for this rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them, in order to provide a better overview of their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, together with some transition cases. In general we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine learning (non-neuronal) context, a solid mathematical theory for TD learning has existed for several years. This can partly be transferred to a neuronal framework, too. On the other hand, a more complete theory has only now emerged for differential Hebbian rules as well. In general, rules differ by their convergence conditions and their numerical stability, which can lead to very undesirable behavior when one wants to apply them. For TD, convergence can be enforced with a certain output condition assuring that the delta-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time. Thus, it is necessary to remember the first stimulus to be able to relate it to the later-occurring second one. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here. Thus, this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, and to offer some guidance for possible applications.
Affiliation(s)
- Christoph Kolodziejski
  - Bernstein Center for Computational Neuroscience, University of Göttingen, Bunsenstr. 10, 37073 Göttingen, Germany
- Bernd Porr
  - Department of Electronics and Electrical Engineering, University of Glasgow, Glasgow G12 8LT, Scotland
- Florentin Wörgötter
  - Bernstein Center for Computational Neuroscience, University of Göttingen, Bunsenstr. 10, 37073 Göttingen, Germany