1. Zajzon B, Duarte R, Morrison A. Toward reproducible models of sequence learning: replication and analysis of a modular spiking network with reward-based learning. Front Integr Neurosci 2023; 17:935177. [PMID: 37396571; PMCID: PMC10310927; DOI: 10.3389/fnint.2023.935177]
Abstract
To acquire statistical regularities from the world, the brain must reliably process, and learn from, spatio-temporally structured information. Although an increasing number of computational models have attempted to explain how such sequence learning may be implemented in the neural hardware, many remain limited in functionality or lack biophysical plausibility. If we are to harvest the knowledge within these models and arrive at a deeper mechanistic understanding of sequential processing in cortical circuits, it is critical that the models and their findings are accessible, reproducible, and quantitatively comparable. Here we illustrate the importance of these aspects by providing a thorough investigation of a recently proposed sequence learning model. We re-implement the modular columnar architecture and reward-based learning rule in the open-source NEST simulator, and successfully replicate the main findings of the original study. Building on these, we perform an in-depth analysis of the model's robustness to parameter settings and underlying assumptions, highlighting its strengths and weaknesses. We identify a limitation of the model, namely that the sequence order is hard-wired into the connectivity patterns, and suggest possible solutions. Finally, we show that the core functionality of the model is retained under more biologically plausible constraints.
Affiliation(s)
- Barna Zajzon
  - Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA-BRAIN Institute I, Jülich Research Centre, Jülich, Germany
  - Department of Computer Science 3—Software Engineering, RWTH Aachen University, Aachen, Germany
- Renato Duarte
  - Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, Netherlands
- Abigail Morrison
  - Institute of Neuroscience and Medicine (INM-6) and Institute for Advanced Simulation (IAS-6) and JARA-BRAIN Institute I, Jülich Research Centre, Jülich, Germany
  - Department of Computer Science 3—Software Engineering, RWTH Aachen University, Aachen, Germany

2. Weidel P, Duarte R, Morrison A. Unsupervised Learning and Clustered Connectivity Enhance Reinforcement Learning in Spiking Neural Networks. Front Comput Neurosci 2021; 15:543872. [PMID: 33746728; PMCID: PMC7970044; DOI: 10.3389/fncom.2021.543872]
Abstract
Reinforcement learning is a paradigm that can account for how organisms learn to adapt their behavior in complex environments with sparse rewards. To partition an environment into discrete states, implementations in spiking neuronal networks typically rely on input architectures involving place cells or receptive fields specified ad hoc by the researcher. This is problematic as a model for how an organism can learn appropriate behavioral sequences in unknown environments, as it fails to account for the unsupervised and self-organized nature of the required representations. Additionally, this approach presupposes knowledge on the part of the researcher on how the environment should be partitioned and represented and scales poorly with the size or complexity of the environment. To address these issues and gain insights into how the brain generates its own task-relevant mappings, we propose a learning architecture that combines unsupervised learning on the input projections with biologically motivated clustered connectivity within the representation layer. This combination allows input features to be mapped to clusters; thus the network self-organizes to produce clearly distinguishable activity patterns that can serve as the basis for reinforcement learning on the output projections. On the basis of the MNIST and Mountain Car tasks, we show that our proposed model performs better than either a comparable unclustered network or a clustered network with static input projections. We conclude that the combination of unsupervised learning and clustered connectivity provides a generic representational substrate suitable for further computation.
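For illustration, a rate-based Python sketch of this combination might look as follows. The layer sizes, clustered recurrent weights, and learning rates are hypothetical simplifications for illustration only; the original work uses spiking neurons simulated in NEST.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_rep, n_out = 64, 100, 4

# Hypothetical clustered connectivity: units within a cluster weakly excite
# each other, which sharpens cluster-level activity patterns.
clusters = np.repeat(np.arange(10), n_rep // 10)
W_rec = 0.1 * (clusters[:, None] == clusters[None, :]).astype(float)

W_in = 0.1 * rng.random((n_rep, n_in))   # plastic input projections (unsupervised)
W_out = np.zeros((n_out, n_rep))         # plastic output projections (reinforcement)

def representation(W_in, W_rec, x):
    """Rate-based stand-in for the clustered representation layer."""
    h = np.maximum(W_in @ x, 0.0)
    h = np.maximum(h + W_rec @ h, 0.0)   # recurrent boost within clusters
    return h / (np.linalg.norm(h) + 1e-9)

def train_step(W_in, W_out, W_rec, x, action, reward, lr_in=1e-3, lr_out=1e-2):
    h = representation(W_in, W_rec, x)
    # Unsupervised, Oja-like update: maps input features onto clusters without
    # using any reward information.
    W_in += lr_in * (np.outer(h, x) - (h[:, None] ** 2) * W_in)
    # Reward-modulated update on the readout projections only.
    W_out[action] += lr_out * reward * h
    return h

# Example: one update with a random input, chosen action 2, and reward +1.
h = train_step(W_in, W_out, W_rec, rng.random(n_in), action=2, reward=1.0)
```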
Affiliation(s)
- Philipp Weidel
  - Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure-Function Relationship (JBI-1 / INM-10), Research Centre Jülich, Jülich, Germany
  - Department of Computer Science 3 - Software Engineering, RWTH Aachen University, Aachen, Germany
- Renato Duarte
  - Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure-Function Relationship (JBI-1 / INM-10), Research Centre Jülich, Jülich, Germany
- Abigail Morrison
  - Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure-Function Relationship (JBI-1 / INM-10), Research Centre Jülich, Jülich, Germany
  - Department of Computer Science 3 - Software Engineering, RWTH Aachen University, Aachen, Germany

3. Huang J, Ruan X, Yu N, Fan Q, Li J, Cai J. A Cognitive Model Based on Neuromodulated Plasticity. Comput Intell Neurosci 2016; 2016:4296356. [PMID: 27872638; PMCID: PMC5107251; DOI: 10.1155/2016/4296356]
Abstract
Associative learning, including classical conditioning and operant conditioning, is regarded as the most fundamental type of learning for animals and human beings. Many models have been proposed for classical or operant conditioning individually, but a unified, integrated model explaining both types of conditioning has received much less attention. Here, a model based on neuromodulated synaptic plasticity is presented. The model is bio-inspired, comprising a multi-store memory module and simulated VTA dopaminergic neurons that produce a reward signal. The synaptic weights are modified according to this reward signal, which simulates the change of associative strengths in associative learning. Experimental results on real robots demonstrate the suitability and validity of the proposed model.
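A minimal sketch of such a reward-modulated (three-factor) weight update, assuming a simple rate-based formulation in which a dopamine-like signal gates the Hebbian term; the baseline and learning rate below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def neuromodulated_update(w, pre, post, reward, baseline=0.5, lr=0.01):
    """Three-factor rule: weight change = learning rate * dopamine * Hebbian term.

    The dopamine-like signal is modeled here simply as the deviation of the
    received reward from a fixed baseline (an assumption for illustration).
    """
    dopamine = reward - baseline
    return w + lr * dopamine * np.outer(post, pre)

# Example: co-active pre/post pairs strengthen when reward exceeds the baseline.
w = np.zeros((3, 4))
w = neuromodulated_update(w, pre=np.array([1.0, 0.0, 1.0, 0.0]),
                          post=np.array([0.0, 1.0, 0.0]), reward=1.0)
```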
Affiliation(s)
- Jing Huang
  - Institute of Artificial Intelligence and Robotics, Beijing University of Technology, Beijing 100124, China
  - Pilot College, Beijing University of Technology, Beijing 101101, China
- Xiaogang Ruan
  - Institute of Artificial Intelligence and Robotics, Beijing University of Technology, Beijing 100124, China
- Naigong Yu
  - Institute of Artificial Intelligence and Robotics, Beijing University of Technology, Beijing 100124, China
- Qingwu Fan
  - Pilot College, Beijing University of Technology, Beijing 101101, China
- Jiaming Li
  - Pilot College, Beijing University of Technology, Beijing 101101, China
- Jianxian Cai
  - Institute of Artificial Intelligence and Robotics, Beijing University of Technology, Beijing 100124, China

4. Soltoggio A, Lemme A, Reinhart F, Steil JJ. Rare neural correlations implement robotic conditioning with delayed rewards and disturbances. Front Neurorobot 2013; 7:6. [PMID: 23565092; PMCID: PMC3613617; DOI: 10.3389/fnbot.2013.00006]
Abstract
Neural conditioning associates cues and actions with subsequent rewards. The environments in which robots operate, however, are pervaded by a variety of disturbing stimuli and uncertain timing. In particular, variable reward delays make it difficult to reconstruct which previous actions are responsible for subsequent rewards. Such uncertainty is handled by biological neural networks, but represents a challenge for computational models, suggesting the lack of a satisfactory theory for robotic neural conditioning. The present study demonstrates the use of rare neural correlations in making correct associations between rewards and previous cues or actions. Rare correlations are functional in selecting sparse synapses to be eligible for later weight updates if a reward occurs. The repetition of this process singles out the associating and reward-triggering pathways, and thereby copes with distal rewards. The neural network displays macro-level classical and operant conditioning, which is demonstrated in a real-life interactive human-robot scenario. The proposed mechanism models realistic conditioning in humans and animals and implements similar behaviors in neuro-robotic platforms.
Affiliation(s)
- Andrea Soltoggio
  - Faculty of Technology, Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld, Germany

5.
Abstract
In the course of trial-and-error learning, the results of actions, manifested as rewards or punishments, occur often seconds after the actions that caused them. How can a reward be associated with an earlier action when the neural activity that caused that action is no longer present in the network? This problem is referred to as the distal reward problem. A recent computational study proposes a solution using modulated plasticity with spiking neurons and argues that precise firing patterns in the millisecond range are essential for such a solution. In contrast, the study reported in this letter shows that it is the rarity of correlating neural activity, and not the spike timing, that allows the network to solve the distal reward problem. In this study, rare correlations are detected in a standard rate-based computational model by means of a threshold-augmented Hebbian rule. The novel modulated plasticity rule allows a randomly connected network to learn in classical and instrumental conditioning scenarios with delayed rewards. The rarity of correlations is shown to be a pivotal factor in the learning and in handling various delays of the reward. This study additionally suggests the hypothesis that short-term synaptic plasticity may implement eligibility traces and thereby serve as a selection mechanism in promoting candidate synapses for long-term storage.
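The following Python fragment sketches the flavor of such a threshold-augmented Hebbian rule with eligibility traces; the threshold, decay constant, and function names are illustrative choices, not the letter's exact formulation. Only unusually strong (rare) pre/post correlations mark synapses as eligible, and marked synapses are consolidated only when a possibly delayed reward arrives.

```python
import numpy as np

def rare_correlations(pre, post, theta=0.9):
    """Threshold-augmented Hebbian term: only correlations exceeding the
    threshold (rare events) flag a synapse; everything else is ignored."""
    corr = np.outer(post, pre)
    return np.where(np.abs(corr) > theta, np.sign(corr), 0.0)

def plasticity_step(w, elig, pre, post, reward, lr=0.05, decay=0.95):
    # Short-lived eligibility trace: stores which synapses recently showed a
    # rare correlation and decays while waiting for a delayed reward.
    elig = decay * elig + rare_correlations(pre, post)
    # Weights change only when a (possibly delayed) reward signal is present.
    w = w + lr * reward * elig
    return w, elig
```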
Affiliation(s)
- Andrea Soltoggio
  - Research Institute for Cognition and Robotics and Faculty of Technology, Bielefeld University, Bielefeld 33615, Germany
- Jochen J. Steil
  - Research Institute for Cognition and Robotics and Faculty of Technology, Bielefeld University, Bielefeld 33615, Germany

6. Ungureanu M, Stoliar P, Llopis R, Casanova F, Hueso LE. Non-Hebbian learning implementation in light-controlled resistive memory devices. PLoS One 2012; 7:e52042. [PMID: 23251679; PMCID: PMC3522635; DOI: 10.1371/journal.pone.0052042]
Abstract
Non-Hebbian learning is often encountered in different bio-organisms. In these processes, the strength of a synapse connecting two neurons is controlled not only by the signals exchanged between the neurons, but also by an additional factor external to the synaptic structure. Here we show the implementation of non-Hebbian learning in a single solid-state resistive memory device. The output of our device is controlled not only by the applied voltages, but also by the illumination conditions under which it operates. We demonstrate that our metal/oxide/semiconductor device learns more efficiently at higher applied voltages and also when light, an external parameter, is present during the information-writing steps. Conversely, memory erasing is more efficient at higher applied voltages and in the dark. Translating neuronal activity into simple solid-state devices could provide a deeper understanding of complex brain processes and give insight into non-binary computing possibilities.
Affiliation(s)
- Pablo Stoliar
  - CIC nanoGUNE Consolider, Donostia - San Sebastian, Spain
  - LPS, CNRS - UPS, Bât. 510, Orsay, France
  - ECyT, UNSAM, San Martín, Buenos Aires, Argentina
- Roger Llopis
  - CIC nanoGUNE Consolider, Donostia - San Sebastian, Spain
- Fèlix Casanova
  - CIC nanoGUNE Consolider, Donostia - San Sebastian, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- Luis E. Hueso
  - CIC nanoGUNE Consolider, Donostia - San Sebastian, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, Spain

7. From modulated Hebbian plasticity to simple behavior learning through noise and weight saturation. Neural Netw 2012; 34:28-41. [DOI: 10.1016/j.neunet.2012.06.005]

8. An imperfect dopaminergic error signal can drive temporal-difference learning. PLoS Comput Biol 2011; 7:e1001133. [PMID: 21589888; PMCID: PMC3093351; DOI: 10.1371/journal.pcbi.1001133]
Abstract
An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards.

What are the physiological changes that take place in the brain when we solve a problem or learn a new skill? It is commonly assumed that behavior adaptations are realized on the microscopic level by changes in synaptic efficacies. However, this is hard to verify experimentally due to the difficulties of identifying the relevant synapses and monitoring them over long periods during a behavioral task. To address this question computationally, we develop a spiking neuronal network model of actor-critic temporal-difference learning, a variant of reinforcement learning for which neural correlates have already been partially established. The network learns a complex task by means of an internally generated reward signal constrained by recent findings on the dopaminergic system. Our model combines top-down and bottom-up modelling approaches to bridge the gap between synaptic plasticity and system-level learning. It paves the way for further investigations of the dopaminergic system in reward learning in the healthy brain and in pathological conditions such as Parkinson's disease, and can be used as a module in functional models based on brain-scale circuitry.
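For orientation, the classical discrete-time TD error referred to above has the standard form below; the rectified variant is only a schematic way of expressing the asymmetry of a low-baseline dopaminergic signal, not the study's exact equations.

```latex
% Standard discrete-time TD error, and a schematic rectified variant expressing
% the asymmetry of a low-baseline dopaminergic signal (illustrative form only).
\[
  \delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t), \qquad
  \hat{\delta}_t = \max\!\bigl(\delta_t,\; -d_{\mathrm{base}}\bigr)
\]
```

Here the floor term stands for the limited dynamic range below baseline firing, so strongly negative errors cannot be fully represented by the signal.
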
9. Duff A, Verschure PF. Unifying perceptual and behavioral learning with a correlative subspace learning rule. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2009.11.048]

10. Shimoda S, Kimura H. Biomimetic approach to tacit learning based on compound control. IEEE Trans Syst Man Cybern B Cybern 2009; 40:77-90. [PMID: 19651559; DOI: 10.1109/tsmcb.2009.2014470]
Abstract
The remarkable capability of living organisms to adapt to unknown environments is due to learning mechanisms that are totally different from the current artificial machine-learning paradigm. Computational media composed of identical elements that have simple activity rules play a major role in biological control, such as the activities of neurons in brains and the molecular interactions in intracellular control. As a result of the integration of the individual activities of the computational media, new behavioral patterns emerge to adapt to changing environments. We previously implemented this feature of biological control in a form of machine learning and succeeded in realizing bipedal walking without a robot model or trajectory planning. Despite the success of bipedal walking, it remained a puzzle why the individual activities of the computational media could achieve the global behavior. In this paper, we answer this question by taking a statistical approach that connects the individual activities of computational media to global network behaviors. We show that the individual activities can generate optimized behaviors from a particular global viewpoint, i.e., autonomous rhythm generation and learning of balanced postures, without using global performance indices.
Affiliation(s)
- Shingo Shimoda
  - RIKEN Brain Science Institute (BSI)-Toyota Collaboration Center, Nagoya 463-0003, Japan

11. Potjans W, Morrison A, Diesmann M. A spiking neural network model of an actor-critic learning agent. Neural Comput 2009; 21:301-339. [PMID: 19196231; DOI: 10.1162/neco.2008.08-07-593]
Abstract
The ability to adapt behavior to maximize reward as a result of interactions with the environment is crucial for the survival of any higher organism. In the framework of reinforcement learning, temporal-difference learning algorithms provide an effective strategy for such goal-directed adaptation, but it is unclear to what extent these algorithms are compatible with neural computation. In this article, we present a spiking neural network model that implements actor-critic temporal-difference learning by combining local plasticity rules with a global reward signal. The network is capable of solving a nontrivial gridworld task with sparse rewards. We derive a quantitative mapping of plasticity parameters and synaptic weights to the corresponding variables in the standard algorithmic formulation and demonstrate that the network learns with a similar speed to its discrete time counterpart and attains the same equilibrium performance.
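As a point of reference for the algorithmic counterpart mentioned above, a minimal discrete-time actor-critic TD learner on a toy state space might look as follows; the state/action sizes and learning rates are arbitrary illustrations, and this is the abstract algorithm rather than the spiking network itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 25, 4               # e.g., a small gridworld

V = np.zeros(n_states)                    # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))   # actor: action preferences

def select_action(state):
    """Softmax policy over the actor's preferences."""
    p = np.exp(prefs[state] - prefs[state].max())
    return rng.choice(n_actions, p=p / p.sum())

def td_update(state, action, reward, next_state, alpha=0.1, beta=0.1, gamma=0.9):
    """Actor-critic TD update: one global error signal gates both local updates."""
    delta = reward + gamma * V[next_state] - V[state]   # TD error
    V[state] += alpha * delta                           # critic update
    prefs[state, action] += beta * delta                # actor update
    return delta
```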
Affiliation(s)
- Wiebke Potjans
  - Computational Neuroscience Group, RIKEN Brain Science Institute, Wako City, Saitama 351-0198, Japan

12. Kolodziejski C, Porr B, Wörgötter F. On the Asymptotic Equivalence Between Differential Hebbian and Temporal Difference Learning. Neural Comput 2009; 21:1173-1202. [DOI: 10.1162/neco.2008.04-08-750]
Abstract
In this theoretical contribution, we provide mathematical proof that two of the most important classes of network learning—correlation-based differential Hebbian learning and reward-based temporal difference learning—are asymptotically equivalent when timing the learning with a modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation-based perspective more closely related to the biophysics of neurons.
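Schematically, and in standard textbook notation rather than the article's own, the two rule classes being related are:

```latex
% Temporal-difference learning with an eligibility trace e(t):
\[
  \Delta w_{\mathrm{TD}}(t) \;\propto\; \delta(t)\, e(t),
  \qquad \delta(t) = r(t) + \gamma\, V(t+1) - V(t)
\]
% Differential Hebbian learning timed by a modulatory signal m(t),
% with presynaptic activity u(t) and postsynaptic rate of change \dot{v}(t):
\[
  \Delta w_{\mathrm{DH}}(t) \;\propto\; m(t)\, u(t)\, \dot{v}(t)
\]
```

The asymptotic equivalence stated in the abstract concerns the case in which the modulatory signal provides the timing of the weight update.
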
Affiliation(s)
- Christoph Kolodziejski
  - Bernstein Center for Computational Neuroscience, University of Göttingen, 37073 Göttingen, Germany
- Bernd Porr
  - Department of Electronics and Electrical Engineering, University of Glasgow, Glasgow, Scotland
- Florentin Wörgötter
  - Bernstein Center for Computational Neuroscience, University of Göttingen, 37073 Göttingen, Germany

13. Kolodziejski C, Porr B, Wörgötter F. Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison. Biol Cybern 2008; 98:259-272. [PMID: 18196266; PMCID: PMC2798052; DOI: 10.1007/s00422-007-0209-6]
Abstract
A confusingly wide variety of temporally asymmetric learning rules exists, related to reinforcement learning and/or to spike-timing-dependent plasticity; many of them look exceedingly similar while displaying strongly different behavior. These rules often find their use in control tasks, for example in robotics, and for this rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them, in order to provide a better overview of their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, together with some transition cases. In general we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine learning (non-neuronal) context, a solid mathematical theory for TD learning has existed for several years. This can partly be transferred to a neuronal framework, too. On the other hand, a more complete theory has only now emerged for differential Hebbian rules as well. In general, rules differ by their convergence conditions and their numerical stability, which can lead to very undesirable behavior when one wants to apply them. For TD, convergence can be enforced with a certain output condition assuring that the delta-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time. Thus, it is necessary to remember the first stimulus to be able to relate it to the later-occurring second one. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here. Thus, this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, and to offer some guidance for possible applications.
Affiliation(s)
- Christoph Kolodziejski
  - Bernstein Center for Computational Neuroscience, University of Göttingen, Bunsenstr. 10, 37073 Göttingen, Germany
- Bernd Porr
  - Department of Electronics and Electrical Engineering, University of Glasgow, Glasgow G12 8LT, Scotland
- Florentin Wörgötter
  - Bernstein Center for Computational Neuroscience, University of Göttingen, Bunsenstr. 10, 37073 Göttingen, Germany