1
Subramoney A, Bellec G, Scherr F, Legenstein R, Maass W. Fast learning without synaptic plasticity in spiking neural networks. Sci Rep 2024; 14:8557. PMID: 38609429; PMCID: PMC11015027; DOI: 10.1038/s41598-024-55769-0.
Abstract
Spiking neural networks are of high current interest, both from the perspective of modelling neural networks of the brain and for porting the brain's fast learning capability and energy efficiency into neuromorphic hardware. So far, however, the fast learning capabilities of the brain have not been reproduced in spiking neural networks. Biological data suggest that a synergy of synaptic plasticity on a slow time scale with network dynamics on a faster time scale is responsible for the fast learning capabilities of the brain. We show here that a suitable orchestration of this synergy between synaptic plasticity and network dynamics does in fact reproduce fast learning capabilities in generic recurrent networks of spiking neurons. This points to the important role of recurrent connections in spiking networks, since these are necessary for enabling salient network dynamics. We show more specifically that the proposed synergy enables synaptic weights to encode more general information such as priors and task structures, since moment-to-moment processing of new information can be delegated to the network dynamics.
Affiliation(s)
- Anand Subramoney: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria; Department of Computer Science, Royal Holloway University of London, Egham, UK
- Guillaume Bellec: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria; Laboratory of Computational Neuroscience, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Franz Scherr: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
- Robert Legenstein: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
- Wolfgang Maass: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
2
Scott DN, Frank MJ. Adaptive control of synaptic plasticity integrates micro- and macroscopic network function. Neuropsychopharmacology 2023; 48:121-144. PMID: 36038780; PMCID: PMC9700774; DOI: 10.1038/s41386-022-01374-6.
Abstract
Synaptic plasticity configures interactions between neurons and is therefore likely to be a primary driver of behavioral learning and development. How this microscopic-macroscopic interaction occurs is poorly understood, as researchers frequently examine models within particular ranges of abstraction and scale. Computational neuroscience and machine learning models offer theoretically powerful analyses of plasticity in neural networks, but results are often siloed and only coarsely linked to biology. In this review, we examine connections between these areas, asking how network computations change as a function of diverse features of plasticity and vice versa. We review how plasticity can be controlled at synapses by calcium dynamics and neuromodulatory signals, the manifestation of these changes in networks, and their impacts in specialized circuits. We conclude that metaplasticity, defined broadly as the adaptive control of plasticity, forges connections across scales by governing what groups of synapses can and can't learn about, when, and to what ends. The metaplasticity we discuss acts by co-opting Hebbian mechanisms, shifting network properties, and routing activity within and across brain systems. Asking how these operations can go awry should also be useful for understanding pathology, which we address in the context of autism, schizophrenia and Parkinson's disease.
Affiliation(s)
- Daniel N Scott: Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA; Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Michael J Frank: Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA; Carney Institute for Brain Science, Brown University, Providence, RI, USA
3
Whelan MT, Jimenez-Rodriguez A, Prescott TJ, Vasilaki E. A robotic model of hippocampal reverse replay for reinforcement learning. Bioinspir Biomim 2022; 18:015007. PMID: 36327454; DOI: 10.1088/1748-3190/ac9ffc.
Abstract
Hippocampal reverse replay, a phenomenon in which recently active hippocampal cells reactivate in reverse order, is thought to contribute to learning, particularly reinforcement learning (RL), in animals. Here, we present a novel computational model which exploits reverse replay to improve stability and performance on a homing task. The model takes inspiration from the hippocampal-striatal network, and learning occurs via a three-factor RL rule. To augment this model with hippocampal reverse replay, we derived a policy gradient learning rule that associates place-cell activity with responses in cells representing actions and a supervised learning rule of the same form, interpreting the replay activity as a 'target' frequency. We evaluated the model using a simulated robot spatial navigation task inspired by the Morris water maze. Results suggest that reverse replay can improve performance stability over multiple trials. Our model exploits reverse replay as an additional source for propagating information about desirable synaptic changes, reducing the need for long timescales in eligibility traces combined with low learning rates. We conclude that reverse replay can positively contribute to RL, although less stable learning is possible in its absence. Analogously, we postulate that reverse replay may enhance RL in the mammalian hippocampal-striatal system rather than provide its core mechanism.
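A three-factor rule of the kind described above can be sketched in a few lines: a Hebbian eligibility trace accumulates pre/post coincidences, and a reward signal (the third factor) converts the trace into an actual weight change. This is a minimal illustration, not the authors' implementation; the place-cell and action-cell vectors, decay constant, and learning rate are assumptions for the example.

```python
import numpy as np

def three_factor_update(w, pre, post, reward, trace, lr=0.1, decay=0.9):
    """One step of a generic three-factor rule: the eligibility trace
    accumulates pre*post coincidences, and the reward converts the
    trace into an actual weight change."""
    trace = decay * trace + np.outer(post, pre)  # Hebbian eligibility
    w = w + lr * reward * trace                  # reward-modulated consolidation
    return w, trace

w = np.zeros((2, 4))                  # action cells x place cells
trace = np.zeros_like(w)
pre = np.array([1.0, 0.0, 0.5, 0.0])  # place-cell activity
post = np.array([0.0, 1.0])           # chosen action cell
w, trace = three_factor_update(w, pre, post, reward=1.0, trace=trace)
```

Because the trace decays, coincidences long before the reward contribute little; replaying the trajectory in reverse near the reward is one way to relax that timescale constraint.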
Affiliation(s)
- Matthew T Whelan: Department of Computer Science, The University of Sheffield, Sheffield, United Kingdom; Sheffield Robotics, Sheffield, United Kingdom
- Alejandro Jimenez-Rodriguez: Department of Computer Science, The University of Sheffield, Sheffield, United Kingdom; Sheffield Robotics, Sheffield, United Kingdom
- Tony J Prescott: Department of Computer Science, The University of Sheffield, Sheffield, United Kingdom; Sheffield Robotics, Sheffield, United Kingdom
- Eleni Vasilaki: Department of Computer Science, The University of Sheffield, Sheffield, United Kingdom; Sheffield Robotics, Sheffield, United Kingdom
4
Manneschi L, Gigante G, Vasilaki E, Del Giudice P. Signal neutrality, scalar property, and collapsing boundaries as consequences of a learned multi-timescale strategy. PLoS Comput Biol 2022; 18:e1009393. PMID: 35930590; PMCID: PMC9462745; DOI: 10.1371/journal.pcbi.1009393.
Abstract
We postulate that three fundamental elements underlie a decision making process: perception of time passing, information processing in multiple timescales and reward maximisation. We build a simple reinforcement learning agent upon these principles that we train on a random dot-like task. Our results, similar to the experimental data, demonstrate three emerging signatures. (1) Signal neutrality: insensitivity to the signal coherence in the interval preceding the decision. (2) Scalar property: the mean of the response times varies widely for different signal coherences, yet the shape of the distributions stays almost unchanged. (3) Collapsing boundaries: the “effective” decision-making boundary changes over time in a manner reminiscent of the theoretical optimal. Removing the perception of time or the multiple timescales from the model does not preserve the distinguishing signatures. Our results suggest an alternative explanation for signal neutrality: rather than being part of motor planning, it is part of the decision-making process and emerges from information processing on multiple timescales.
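The "information processing in multiple timescales" ingredient can be illustrated with a bank of leaky integrators, each filtering the same evidence stream with its own time constant; a decision-maker reading all traces at once can weigh fast evidence against slow evidence. The time constants and the constant-coherence stimulus below are illustrative assumptions, not the paper's agent.

```python
import numpy as np

def multi_timescale_traces(signal, taus):
    """Filter one input stream with several leaky integrators,
    one per timescale."""
    traces = np.zeros(len(taus))
    history = []
    for x in signal:
        # each trace decays with its own tau while integrating the input
        traces = traces + (x - traces) / np.asarray(taus)
        history.append(traces.copy())
    return np.array(history)

stimulus = np.ones(100)  # constant-coherence evidence stream
h = multi_timescale_traces(stimulus, taus=[2.0, 10.0, 50.0])
```

The fast trace saturates quickly while the slow trace still carries information about how long the stimulus has been on, which is one way an agent can sense time passing.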
Affiliation(s)
- Luca Manneschi: Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Guido Gigante: Istituto Superiore di Sanità, Rome, Italy; INFN, Sezione di Roma, Rome, Italy
- Eleni Vasilaki: Department of Computer Science, University of Sheffield, Sheffield, United Kingdom; Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
- Paolo Del Giudice: Istituto Superiore di Sanità, Rome, Italy; INFN, Sezione di Roma, Rome, Italy
5
Yang S, Gao T, Wang J, Deng B, Azghadi MR, Lei T, Linares-Barranco B. SAM: A Unified Self-Adaptive Multicompartmental Spiking Neuron Model for Learning With Working Memory. Front Neurosci 2022; 16:850945. PMID: 35527819; PMCID: PMC9074872; DOI: 10.3389/fnins.2022.850945.
Abstract
Working memory is a fundamental feature of biological brains for perception, cognition, and learning. In addition, learning with working memory, which has been shown in conventional artificial intelligence systems through recurrent neural networks, is instrumental to advanced cognitive intelligence. However, it is hard to endow a simple neuron model with working memory, and to understand the biological mechanisms that have resulted in such a powerful ability at the neuronal level. This article presents a novel self-adaptive multicompartment spiking neuron model, referred to as SAM, for spike-based learning with working memory. SAM integrates four major biological principles including sparse coding, dendritic non-linearity, intrinsic self-adaptive dynamics, and spike-driven learning. We first describe SAM’s design and explore the impacts of critical parameters on its biological dynamics. We then use SAM to build spiking networks to accomplish several different tasks including supervised learning of the MNIST dataset using sequential spatiotemporal encoding, noisy spike pattern classification, sparse coding during pattern classification, spatiotemporal feature detection, meta-learning with working memory applied to a navigation task and the MNIST classification task, and working memory for spatiotemporal learning. Our experimental results highlight the energy efficiency and robustness of SAM across this wide range of challenging tasks. The effects of SAM model variations on its working memory are also explored, with the hope of offering insight into the biological mechanisms underlying working memory in the brain. The SAM model is the first attempt to integrate the capabilities of spike-driven learning and working memory in a unified single neuron with multiple timescale dynamics. The competitive performance of SAM could potentially contribute to the development of efficient adaptive neuromorphic computing systems for various applications from robotics to edge computing.
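SAM combines several mechanisms; the sketch below illustrates only one of them, the intrinsic self-adaptive dynamics, in the spirit of an adaptive-threshold LIF neuron whose slow threshold variable acts as a one-neuron memory trace of recent activity. All parameters and the constant-input stimulus are assumptions for the example, not the published model.

```python
def adaptive_lif(inputs, tau_m=10.0, tau_a=100.0, b0=1.0, beta=0.5):
    """Leaky integrate-and-fire neuron whose threshold rises after
    each spike and decays on a much slower timescale than the
    membrane, so the firing rate adapts to sustained input."""
    v, a, spikes = 0.0, 0.0, []
    for x in inputs:
        v += (x - v) / tau_m      # fast membrane leak-integration
        a -= a / tau_a            # slow threshold adaptation decays
        s = 1 if v > b0 + beta * a else 0
        if s:
            v = 0.0               # reset membrane after a spike
            a += 1.0              # raise the adaptive threshold
        spikes.append(s)
    return spikes

spikes = adaptive_lif([2.0] * 200)
```

Under constant drive the inter-spike interval lengthens as the slow variable accumulates, so the spike train itself carries information about stimulation history.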
Affiliation(s)
- Shuangming Yang: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Tian Gao: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Jiang Wang: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Bin Deng: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Tao Lei: School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an, China
6
Jordan J, Schmidt M, Senn W, Petrovici MA. Evolving interpretable plasticity for spiking networks. eLife 2021; 10:e66273. PMID: 34709176; PMCID: PMC8553337; DOI: 10.7554/eLife.66273.
Abstract
Continuous adaptation allows survival in an ever-changing world. Adjustments in the synaptic coupling strength between neurons are essential for this capability, setting us apart from simpler, hard-wired organisms. How these changes can be mathematically described at the phenomenological level, as so-called ‘plasticity rules’, is essential both for understanding biological information processing and for developing cognitively performant artificial systems. We suggest an automated approach for discovering biophysically plausible plasticity rules based on the definition of task families, associated performance measures and biophysical constraints. By evolving compact symbolic expressions, we ensure the discovered plasticity rules are amenable to intuitive understanding, fundamental for successful communication and human-guided generalization. We successfully apply our approach to typical learning scenarios and discover previously unknown mechanisms for learning efficiently from rewards, recover efficient gradient-descent methods for learning from target signals, and uncover various functionally equivalent STDP-like rules with tuned homeostatic mechanisms.

Our brains are incredibly adaptive. Every day we form memories, acquire new knowledge or refine existing skills. This stands in contrast to our current computers, which typically can only perform pre-programmed actions. Our own ability to adapt is the result of a process called synaptic plasticity, in which the strength of the connections between neurons can change. To better understand brain function and build adaptive machines, researchers in neuroscience and artificial intelligence (AI) are modeling the underlying mechanisms. So far, most work towards this goal was guided by human intuition – that is, by the strategies scientists think are most likely to succeed. Despite the tremendous progress, this approach has two drawbacks. First, human time is limited and expensive. And second, researchers have a natural – and reasonable – tendency to incrementally improve upon existing models, rather than starting from scratch. Jordan, Schmidt et al. have now developed a new approach based on ‘evolutionary algorithms’. These computer programs search for solutions to problems by mimicking the process of biological evolution, such as the concept of survival of the fittest. The approach exploits the increasing availability of cheap but powerful computers. Compared to its predecessors (or indeed human brains), it also uses search strategies that are less biased by previous models. The evolutionary algorithms were presented with three typical learning scenarios. In the first, the computer had to spot a repeating pattern in a continuous stream of input without receiving feedback on how well it was doing. In the second scenario, the computer received virtual rewards whenever it behaved in the desired manner – an example of reinforcement learning. Finally, in the third ‘supervised learning’ scenario, the computer was told exactly how much its behavior deviated from the desired behavior. For each of these scenarios, the evolutionary algorithms were able to discover mechanisms of synaptic plasticity to solve the new task successfully. Using evolutionary algorithms to study how computers ‘learn’ will provide new insights into how brains function in health and disease. It could also pave the way for developing intelligent machines that can better adapt to the needs of their users.
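The evolutionary-search idea can be illustrated at toy scale: a candidate plasticity rule is a small symbolic expression, and a (1+1) evolutionary strategy keeps whichever mutant scores better on a task family. Everything here (the rule template dw = a*pre*post + b*pre + c*post + d, the fitness, the mutation step) is an illustrative assumption, far simpler than the symbolic search used in the paper.

```python
import numpy as np

def apply_rule(coeffs, pre, post):
    """Candidate plasticity rule as a compact symbolic expression:
    dw = a*pre*post + b*pre + c*post + d, summed over a spike train."""
    a, b, c, d = coeffs
    return float(np.sum(a * pre * post + b * pre + c * post + d))

# Toy 'task family': a good rule potentiates correlated pre/post
# activity more than anti-correlated activity.
pre = np.tile([1.0, 0.0], 100)
corr_post = pre.copy()
anti_post = np.tile([0.0, 1.0], 100)

def fitness(coeffs):
    return apply_rule(coeffs, pre, corr_post) - apply_rule(coeffs, pre, anti_post)

# (1+1) evolutionary strategy: mutate the best rule, keep the fitter one
rng = np.random.default_rng(1)
best = np.zeros(4)
for _ in range(300):
    child = best + 0.1 * rng.normal(size=4)
    if fitness(child) > fitness(best):
        best = child
```

Because the evolved coefficients remain a readable expression, one can inspect them directly: selection drives the Hebbian pre*post term upward, which is the kind of interpretability the paper argues for.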
Affiliation(s)
- Jakob Jordan: Department of Physiology, University of Bern, Bern, Switzerland
- Maximilian Schmidt: Ascent Robotics, Tokyo, Japan; RIKEN Center for Brain Science, Tokyo, Japan
- Walter Senn: Department of Physiology, University of Bern, Bern, Switzerland
- Mihai A Petrovici: Department of Physiology, University of Bern, Bern, Switzerland; Kirchhoff-Institute for Physics, Heidelberg University, Heidelberg, Germany
7
Zambrano D, Roelfsema PR, Bohte S. Learning continuous-time working memory tasks with on-policy neural reinforcement learning. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.11.072.
8

9
Towards spike-based machine intelligence with neuromorphic computing. Nature 2019; 575:607-617. PMID: 31776490; DOI: 10.1038/s41586-019-1677-2.
Abstract
Guided by brain-like 'spiking' computational frameworks, neuromorphic computing (brain-inspired computing for machine intelligence) promises to realize artificial intelligence while reducing the energy requirements of computing platforms. This interdisciplinary field began with the implementation of silicon circuits for biological neural routines, but has evolved to encompass the hardware implementation of algorithms with spike-based encoding and event-driven representations. Here we provide an overview of the developments in neuromorphic computing for both algorithms and hardware and highlight the fundamentals of learning and hardware frameworks. We discuss the main challenges and the future prospects of neuromorphic computing, with emphasis on algorithm-hardware codesign.
10
Jordan J, Weidel P, Morrison A. A Closed-Loop Toolchain for Neural Network Simulations of Learning Autonomous Agents. Front Comput Neurosci 2019; 13:46. PMID: 31427939; PMCID: PMC6687756; DOI: 10.3389/fncom.2019.00046.
Abstract
Neural network simulation is an important tool for generating and evaluating hypotheses on the structure, dynamics, and function of neural circuits. For scientific questions addressing organisms operating autonomously in their environments, in particular where learning is involved, it is crucial to be able to operate such simulations in a closed-loop fashion. In such a set-up, the neural agent continuously receives sensory stimuli from the environment and provides motor signals that manipulate the environment or move the agent within it. So far, most studies requiring such functionality have been conducted with custom simulation scripts and manually implemented tasks. This makes it difficult for other researchers to reproduce and build upon previous work and nearly impossible to compare the performance of different learning architectures. In this work, we present a novel approach to solve this problem, connecting benchmark tools from the field of machine learning and state-of-the-art neural network simulators from computational neuroscience. The resulting toolchain enables researchers in both fields to make use of well-tested high-performance simulation software supporting biologically plausible neuron, synapse and network models and allows them to evaluate and compare their approach on the basis of standardized environments with various levels of complexity. We demonstrate the functionality of the toolchain by implementing a neuronal actor-critic architecture for reinforcement learning in the NEST simulator and successfully training it on two different environments from the OpenAI Gym. We compare its performance to a previously suggested neural network model of reinforcement learning in the basal ganglia and a generic Q-learning algorithm.
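The closed-loop contract the toolchain standardizes (agent acts, environment returns observation and reward, repeat) can be sketched with a stand-in environment. The real toolchain couples NEST models to OpenAI Gym environments; the ToyEnv and tabular actor-critic below are purely illustrative assumptions that expose the same reset/step interface.

```python
import random

class ToyEnv:
    """Minimal stand-in for a Gym-style environment: the agent is
    rewarded for emitting action 1. Real toolchains expose the same
    reset/step contract."""
    def reset(self):
        return 0.0                         # initial observation
    def step(self, action):
        reward = 1.0 if action == 1 else 0.0
        return 0.0, reward, False, {}      # obs, reward, done, info

class TabularActorCritic:
    """Tiny actor-critic stand-in for the neural agent in the loop."""
    def __init__(self, n_actions=2, lr=0.2):
        self.pref = [0.0] * n_actions      # actor: action preferences
        self.value, self.lr = 0.0, lr      # critic: state value
    def act(self):
        if random.random() < 0.1:          # epsilon-greedy exploration
            return random.randrange(len(self.pref))
        return max(range(len(self.pref)), key=self.pref.__getitem__)
    def learn(self, action, reward):
        td = reward - self.value           # TD error as third factor
        self.value += self.lr * td
        self.pref[action] += self.lr * td

random.seed(0)
env, agent = ToyEnv(), TabularActorCritic()
obs = env.reset()
for _ in range(500):
    a = agent.act()
    obs, r, done, _ = env.step(a)          # closed loop: act, observe
    agent.learn(a, r)
```

Swapping ToyEnv for a standardized benchmark environment without touching the agent is exactly the kind of comparability the toolchain is built to provide.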
Affiliation(s)
- Jakob Jordan: Department of Physiology, University of Bern, Bern, Switzerland; Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure Function Relationship (JBI 1/INM-10), Research Centre Jülich, Jülich, Germany
- Philipp Weidel: Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure Function Relationship (JBI 1/INM-10), Research Centre Jülich, Jülich, Germany; aiCTX, Zurich, Switzerland; Department of Computer Science, RWTH Aachen University, Aachen, Germany
- Abigail Morrison: Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure Function Relationship (JBI 1/INM-10), Research Centre Jülich, Jülich, Germany; Faculty of Psychology, Institute of Cognitive Neuroscience, Ruhr-University Bochum, Bochum, Germany
11
Bing Z, Baumann I, Jiang Z, Huang K, Cai C, Knoll A. Supervised Learning in SNN via Reward-Modulated Spike-Timing-Dependent Plasticity for a Target Reaching Vehicle. Front Neurorobot 2019; 13:18. PMID: 31130854; PMCID: PMC6509616; DOI: 10.3389/fnbot.2019.00018.
Abstract
Spiking neural networks (SNNs) offer many advantages over traditional artificial neural networks (ANNs) such as biological plausibility, fast information processing, and energy efficiency. Although SNNs have been used to solve a variety of control tasks using the Spike-Timing-Dependent Plasticity (STDP) learning rule, existing solutions usually involve hard-coded network architectures solving specific tasks rather than solving different kinds of tasks generally. This results in neglecting one of the biggest advantages of ANNs, i.e., being general-purpose and easy-to-use due to their simple network architecture, which usually consists of an input layer, one or multiple hidden layers and an output layer. This paper addresses the problem by introducing an end-to-end learning approach for spiking neural networks with one hidden layer and reward-modulated Spike-Timing-Dependent Plasticity (R-STDP) synapses connected in an all-to-all fashion. We use the supervised R-STDP learning rule to train two different SNN-based sub-controllers to replicate a desired obstacle avoiding and goal approaching behavior, provided by pre-generated datasets. Together they make up a target-reaching controller, which is used to control a simulated mobile robot to reach a target area while avoiding obstacles in its path. We demonstrate the performance and effectiveness of our trained SNNs to achieve target reaching tasks in different unknown scenarios.
Affiliation(s)
- Zhenshan Bing: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
- Ivan Baumann: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
- Zhuangyi Jiang: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
- Kai Huang: Department of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China; Peng Cheng Laboratory, Shenzhen, China
- Caixia Cai: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
- Alois Knoll: Chair of Robotics, Artificial Intelligence and Embedded Systems, Department of Informatics, Technical University of Munich, Munich, Germany
12
Wunderlich T, Kungl AF, Müller E, Hartel A, Stradmann Y, Aamir SA, Grübl A, Heimbrecht A, Schreiber K, Stöckel D, Pehle C, Billaudelle S, Kiene G, Mauch C, Schemmel J, Meier K, Petrovici MA. Demonstrating Advantages of Neuromorphic Computation: A Pilot Study. Front Neurosci 2019; 13:260. PMID: 30971881; PMCID: PMC6444279; DOI: 10.3389/fnins.2019.00260.
Abstract
Neuromorphic devices represent an attempt to mimic aspects of the brain's architecture and dynamics with the aim of replicating its hallmark functional capabilities in terms of computational power, robust learning and energy efficiency. We employ a single-chip prototype of the BrainScaleS 2 neuromorphic system to implement a proof-of-concept demonstration of reward-modulated spike-timing-dependent plasticity in a spiking network that learns to play a simplified version of the Pong video game by smooth pursuit. This system combines an electronic mixed-signal substrate for emulating neuron and synapse dynamics with an embedded digital processor for on-chip learning, which in this work also serves to simulate the virtual environment and learning agent. The analog emulation of neuronal membrane dynamics enables a 1000-fold acceleration with respect to biological real-time, with the entire chip operating on a power budget of 57 mW. Compared to an equivalent simulation using state-of-the-art software, the on-chip emulation is at least one order of magnitude faster and three orders of magnitude more energy-efficient. We demonstrate how on-chip learning can mitigate the effects of fixed-pattern noise, which is unavoidable in analog substrates, while making use of temporal variability for action exploration. Learning compensates for imperfections of the physical substrate, as manifested in neuronal parameter variability, by adapting synaptic weights to match the respective excitability of individual neurons.
Affiliation(s)
- Timo Wunderlich, Akos F Kungl, Eric Müller, Andreas Hartel, Yannik Stradmann, Syed Ahmed Aamir, Andreas Grübl, Arthur Heimbrecht, Korbinian Schreiber, David Stöckel, Christian Pehle, Sebastian Billaudelle, Gerd Kiene, Christian Mauch, Johannes Schemmel, Karlheinz Meier: Department of Physics, Kirchhoff Institute for Physics, Heidelberg University, Heidelberg, Germany
- Mihai A Petrovici: Department of Physics, Kirchhoff Institute for Physics, Heidelberg University, Heidelberg, Germany; Department of Physiology, University of Bern, Bern, Switzerland
13
Mozafari M, Kheradpisheh SR, Masquelier T, Nowzari-Dalini A, Ganjtabesh M. First-Spike-Based Visual Categorization Using Reward-Modulated STDP. IEEE Trans Neural Netw Learn Syst 2018; 29:6178-6190. PMID: 29993898; DOI: 10.1109/TNNLS.2018.2826721.
Abstract
Reinforcement learning (RL) has recently regained popularity with major achievements such as beating the European champion at the game of Go. Here, for the first time, we show that RL can be used efficiently to train a spiking neural network (SNN) to perform object recognition in natural images without using an external classifier. We used a feedforward convolutional SNN and a temporal coding scheme where the most strongly activated neurons fire first, while less activated ones fire later, or not at all. In the highest layers, each neuron was assigned to an object category, and it was assumed that the stimulus category was the category of the first neuron to fire. If this assumption was correct, the neuron was rewarded, i.e., spike-timing-dependent plasticity (STDP) was applied, which reinforced the neuron's selectivity. Otherwise, anti-STDP was applied, which encouraged the neuron to learn something else. As demonstrated on various image data sets (Caltech, ETH-80, and NORB), this reward-modulated STDP (R-STDP) approach has extracted particularly discriminative visual features, whereas classic unsupervised STDP extracts any feature that consistently repeats. As a result, R-STDP has outperformed STDP on these data sets. Furthermore, R-STDP is suitable for online learning and can adapt to drastic changes such as label permutations. Finally, it is worth mentioning that both feature extraction and classification were done with spikes, using at most one spike per neuron. Thus, the network is hardware friendly and energy efficient.
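The reward-modulation logic can be sketched compactly if the temporal race is abstracted to "the most strongly driven neuron fires first": a correct first spike triggers a potentiating (STDP-like) update on its active inputs, a wrong one triggers anti-STDP. The binary patterns, linear drive, and learning rate below are illustrative assumptions, not the convolutional SNN of the paper.

```python
import numpy as np

def r_stdp_step(w, x, label, lr=0.05):
    """One R-STDP step in the first-spike spirit: the neuron with the
    largest drive 'fires first' and is read out as the predicted
    category. Correct -> potentiate its active inputs (STDP);
    wrong -> depress them (anti-STDP)."""
    winner = int(np.argmax(w @ x))
    reward = 1.0 if winner == label else -1.0
    w[winner] += lr * reward * x       # reward-modulated Hebbian update
    return w, winner

rng = np.random.default_rng(0)
# Two binary input patterns, one per category
patterns = np.array([[1.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 1.0]])
w = 0.01 * rng.random((2, 4))
for _ in range(50):
    for label, x in enumerate(patterns):
        w, _ = r_stdp_step(w, x, label)
```

Note that no external classifier is needed: the identity of the first-spiking neuron is itself the decision, which is what makes the scheme hardware friendly.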
14
John RA, Tiwari N, Yaoyi C, Tiwari N, Kulkarni M, Nirmal A, Nguyen AC, Basu A, Mathews N. Ultralow Power Dual-Gated Subthreshold Oxide Neuristors: An Enabler for Higher Order Neuronal Temporal Correlations. ACS Nano 2018; 12:11263-11273. PMID: 30395439; DOI: 10.1021/acsnano.8b05903.
Abstract
Inspired by neural computing, the pursuit of ultralow power neuromorphic architectures with highly distributed memory and parallel processing capability has recently gained more traction. However, emulation of biological signal processing via artificial neuromorphic architectures does not exploit the immense interplay between local activities and global neuromodulations observed in biological neural networks and hence is unable to mimic complex biologically plausible adaptive functions like heterosynaptic plasticity and homeostasis. Here, we demonstrate emulation of complex neuronal behaviors like heterosynaptic plasticity, homeostasis, association, correlation, and coincidence in a single neuristor via a dual-gated architecture. This multiple gating approach allows one gate to capture the effect of local activity correlations and the second gate to represent global neuromodulations, allowing additional modulations which augment their plasticity, enabling higher order temporal correlations at a unitary level. Moreover, the dual-gate operation extends the available dynamic range of synaptic conductance while maintaining symmetry in the weight-update operation, expanding the number of accessible memory states. Finally, operating neuristors in the subthreshold regime enables synaptic weight changes with high gain while maintaining ultralow power consumption of the order of femtojoules.
Affiliation(s)
- Rohit Abraham John
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Nidhi Tiwari
- Energy Research Institute at NTU (ERI@N), Nanyang Technological University, Singapore 637553
| | - Chen Yaoyi
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Naveen Tiwari
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Mohit Kulkarni
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Amoolya Nirmal
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Anh Chien Nguyen
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Arindam Basu
- School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
| | - Nripan Mathews
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
- Energy Research Institute at NTU (ERI@N), Nanyang Technological University, Singapore 637553
| |
|
15
|
Richards BA, Lillicrap TP. Dendritic solutions to the credit assignment problem. Curr Opin Neurobiol 2018; 54:28-36. [PMID: 30205266 DOI: 10.1016/j.conb.2018.08.003] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3]
Abstract
Guaranteeing that synaptic plasticity leads to effective learning requires a means for assigning credit to each neuron for its contribution to behavior. The 'credit assignment problem' refers to the fact that credit assignment is non-trivial in hierarchical networks with multiple stages of processing. One difficulty is that if credit signals are integrated with other inputs, then it is hard for synaptic plasticity rules to distinguish credit-related activity from non-credit-related activity. A potential solution is to use the spatial layout and non-linear properties of dendrites to distinguish credit signals from other inputs. In cortical pyramidal neurons, evidence hints that top-down feedback signals are integrated in the distal apical dendrites and have a distinct impact on spike-firing and synaptic plasticity. This suggests that the distal apical dendrites of pyramidal neurons help the brain to solve the credit assignment problem.
Affiliation(s)
- Blake A Richards
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, ON, Canada; Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada; Learning in Machines and Brains Program, Canadian Institute for Advanced Research, Toronto, ON, Canada
| | | |
|
16
|
Cope AJ, Vasilaki E, Minors D, Sabo C, Marshall JAR, Barron AB. Abstract concept learning in a simple neural network inspired by the insect brain. PLoS Comput Biol 2018; 14:e1006435. [PMID: 30222735 PMCID: PMC6160224 DOI: 10.1371/journal.pcbi.1006435] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2]
Abstract
The capacity to learn abstract concepts such as 'sameness' and 'difference' is considered a higher-order cognitive function, typically thought to be dependent on top-down neocortical processing. It is therefore surprising that honey bees apparently have this capacity. Here we report a model of the structures of the honey bee brain that can learn sameness and difference, as well as a range of complex and simple associative learning tasks. Our model is constrained by the known connections and properties of the mushroom body, including the protocerebral tract, and provides a good fit to the learning rates and performances of real bees in all tasks, including learning sameness and difference. The model proposes a novel mechanism for learning the abstract concepts of 'sameness' and 'difference' that is compatible with the insect brain, and is not dependent on top-down or executive control processing.
Affiliation(s)
- Alex J. Cope
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Sheffield Robotics, University of Sheffield, Sheffield, UK
| | - Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Sheffield Robotics, University of Sheffield, Sheffield, UK
| | - Dorian Minors
- Department of Biological Sciences, Macquarie University, Sydney, Australia
| | - Chelsea Sabo
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Sheffield Robotics, University of Sheffield, Sheffield, UK
| | - James A. R. Marshall
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Sheffield Robotics, University of Sheffield, Sheffield, UK
| | - Andrew B. Barron
- Department of Biological Sciences, Macquarie University, Sydney, Australia
| |
|
17
|
Martinolli M, Gerstner W, Gilra A. Multi-Timescale Memory Dynamics Extend Task Repertoire in a Reinforcement Learning Network With Attention-Gated Memory. Front Comput Neurosci 2018; 12:50. [PMID: 30061819 PMCID: PMC6055065 DOI: 10.3389/fncom.2018.00050] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5]
Abstract
The interplay of reinforcement learning and memory is at the core of several recent neural network models, such as the Attention-Gated MEmory Tagging (AuGMEnT) model. While successful at various animal learning tasks, we find that the AuGMEnT network is unable to cope with some hierarchical tasks, where higher-level stimuli have to be maintained over a long time, while lower-level stimuli need to be remembered and forgotten over a shorter timescale. To overcome this limitation, we introduce a hybrid AuGMEnT, with leaky (short-timescale) and non-leaky (long-timescale) memory units, that allows the exchange of low-level information while maintaining high-level information. We test the performance of the hybrid AuGMEnT network on two cognitive reference tasks, sequence prediction and 12AX.
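The division of labor between the two memory pools can be sketched in a few lines. This is an illustrative toy, not the AuGMEnT implementation; the leak constant and the update form are assumptions.

```python
def memory_step(m_leaky, m_nonleaky, inp, leak=0.7):
    """Hybrid memory sketch: the leaky unit forgets a transient
    low-level stimulus over a short timescale, while the non-leaky
    unit integrates and holds higher-level context for the whole
    trial."""
    m_leaky = leak * m_leaky + inp
    m_nonleaky = m_nonleaky + inp
    return m_leaky, m_nonleaky

m_leaky, m_nonleaky = 0.0, 0.0
m_leaky, m_nonleaky = memory_step(m_leaky, m_nonleaky, 1.0)  # stimulus on
for _ in range(10):                                          # delay period
    m_leaky, m_nonleaky = memory_step(m_leaky, m_nonleaky, 0.0)
```

After the delay, the leaky trace has decayed to near zero while the non-leaky trace still holds the stimulus, which is the property a hierarchical task such as 12AX exploits.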
Affiliation(s)
- Marco Martinolli
- School of Computer and Communication Sciences, School of Life Sciences, Brain-Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Wulfram Gerstner
- School of Computer and Communication Sciences, School of Life Sciences, Brain-Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Aditya Gilra
- School of Computer and Communication Sciences, School of Life Sciences, Brain-Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| |
|
18
|
Gerstner W, Lehmann M, Liakoni V, Corneil D, Brea J. Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of NeoHebbian Three-Factor Learning Rules. Front Neural Circuits 2018; 12:53. [PMID: 30108488 PMCID: PMC6079224 DOI: 10.3389/fncir.2018.00053] [Citation(s) in RCA: 112] [Impact Index Per Article: 18.7]
Abstract
Most elementary behaviors such as moving the arm to grasp an object or walking into the next room to explore a museum evolve on the time scale of seconds; in contrast, neuronal action potentials occur on the time scale of a few milliseconds. Learning rules of the brain must therefore bridge the gap between these two different time scales. Modern theories of synaptic plasticity have postulated that the co-activation of pre- and postsynaptic neurons sets a flag at the synapse, called an eligibility trace, that leads to a weight change only if an additional factor is present while the flag is set. This third factor, signaling reward, punishment, surprise, or novelty, could be implemented by the phasic activity of neuromodulators or specific neuronal inputs signaling special events. While the theoretical framework has been developed over the last decades, experimental evidence in support of eligibility traces on the time scale of seconds has been collected only during the last few years. Here we review, in the context of three-factor rules of synaptic plasticity, four key experiments that support the role of synaptic eligibility traces in combination with a third factor as a biological implementation of neoHebbian three-factor learning rules.
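The eligibility-trace mechanism reviewed above can be written as a two-variable update. The code below is a generic sketch of a neoHebbian three-factor rule with illustrative constants; it is not a model from any of the four experiments discussed.

```python
def three_factor_step(w, e, pre, post, third_factor,
                      tau_e=2.0, dt=0.1, lr=0.1):
    """One Euler step: pre/post coactivity sets a synaptic eligibility
    trace that decays with time constant tau_e (seconds); the weight
    changes only while a third factor (e.g. phasic dopamine) is
    present."""
    e = e + dt * (-e / tau_e + pre * post)  # flag set by coactivation
    w = w + dt * lr * third_factor * e      # consolidation gated by factor
    return w, e

w, e = 0.5, 0.0
w, e = three_factor_step(w, e, pre=1.0, post=1.0, third_factor=0.0)
w_before_reward = w                         # trace is set, weight unchanged
for _ in range(5):                          # reward arrives after a delay
    w, e = three_factor_step(w, e, pre=0.0, post=0.0, third_factor=1.0)
```

The slow decay of `e` is what bridges the gap between millisecond spikes and a neuromodulatory signal arriving seconds later.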
Affiliation(s)
- Wulfram Gerstner
- School of Computer Science and School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | | | | | | | | |
|
19
|
Bing Z, Meschede C, Röhrbein F, Huang K, Knoll AC. A Survey of Robotics Control Based on Learning-Inspired Spiking Neural Networks. Front Neurorobot 2018; 12:35. [PMID: 30034334 PMCID: PMC6043678 DOI: 10.3389/fnbot.2018.00035] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3]
Abstract
Biological intelligence processes information using impulses or spikes, which makes living creatures able to perceive and act in the real world exceptionally well and to outperform state-of-the-art robots in almost every aspect of life. To make up for this deficit, emerging hardware technologies and software knowledge in the fields of neuroscience, electronics, and computer science have made it possible to design biologically realistic robots controlled by spiking neural networks (SNNs), inspired by the mechanisms of the brain. However, a comprehensive review of robot control based on SNNs is still missing. In this paper, we survey the developments of the past decade in the field of spiking neural networks for control tasks, with particular focus on fast-emerging robotics-related applications. We first highlight the primary impetuses of SNN-based robotics tasks in terms of speed, energy efficiency, and computational capability. We then classify those SNN-based robotic applications according to different learning rules and explicate each learning rule with its corresponding robotic applications. We also briefly present some existing platforms that offer an interaction between SNNs and robotics simulations for exploration and exploitation. Finally, we conclude the survey with a forecast of future challenges and associated potential research topics for controlling robots based on SNNs.
Affiliation(s)
- Zhenshan Bing
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Department of Informatics, Technical University of Munich, Munich, Germany
| | - Claus Meschede
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Department of Informatics, Technical University of Munich, Munich, Germany
| | - Florian Röhrbein
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Department of Informatics, Technical University of Munich, Munich, Germany
| | - Kai Huang
- Department of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
| | - Alois C. Knoll
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Department of Informatics, Technical University of Munich, Munich, Germany
| |
|
20
|
Zannone S, Brzosko Z, Paulsen O, Clopath C. Acetylcholine-modulated plasticity in reward-driven navigation: a computational study. Sci Rep 2018; 8:9486. [PMID: 29930322 PMCID: PMC6013476 DOI: 10.1038/s41598-018-27393-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3]
Abstract
Neuromodulation plays a fundamental role in the acquisition of new behaviours. In previous experimental work, we showed that acetylcholine biases hippocampal synaptic plasticity towards depression, and the subsequent application of dopamine can retroactively convert depression into potentiation. We also demonstrated that incorporating this sequentially neuromodulated Spike-Timing-Dependent Plasticity (STDP) rule in a network model of navigation yields effective learning of changing reward locations. Here, we employ computational modelling to further characterize the effects of cholinergic depression on behaviour. We find that acetylcholine, by allowing learning from negative outcomes, enhances exploration over the action space. We show that this results in a variety of effects, depending on the structure of the model, the environment and the task. Interestingly, sequentially neuromodulated STDP also yields flexible learning, surpassing the performance of other reward-modulated plasticity rules.
Affiliation(s)
- Sara Zannone
- Imperial College London, Department of Bioengineering, South Kensington Campus, London, United Kingdom
| | - Zuzanna Brzosko
- University of Cambridge, Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| | - Ole Paulsen
- University of Cambridge, Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| | - Claudia Clopath
- Imperial College London, Department of Bioengineering, South Kensington Campus, London, United Kingdom
| |
|
21
|
Gönner L, Vitay J, Hamker FH. Predictive Place-Cell Sequences for Goal-Finding Emerge from Goal Memory and the Cognitive Map: A Computational Model. Front Comput Neurosci 2017; 11:84. [PMID: 29075187 PMCID: PMC5643423 DOI: 10.3389/fncom.2017.00084] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1]
Abstract
Hippocampal place-cell sequences observed during awake immobility often represent previous experience, suggesting a role in memory processes. However, recent reports of goals being overrepresented in sequential activity suggest a role in short-term planning, although a detailed understanding of the origins of hippocampal sequential activity and of its functional role is still lacking. In particular, it is unknown which mechanism could support efficient planning by generating place-cell sequences biased toward known goal locations, in an adaptive and constructive fashion. To address these questions, we propose a model of spatial learning and sequence generation as interdependent processes, integrating cortical contextual coding, synaptic plasticity and neuromodulatory mechanisms into a map-based approach. Following goal learning, sequential activity emerges from continuous attractor network dynamics biased by goal memory inputs. We apply Bayesian decoding on the resulting spike trains, allowing a direct comparison with experimental data. Simulations show that this model (1) explains the generation of never-experienced sequence trajectories in familiar environments, without requiring virtual self-motion signals, (2) accounts for the bias in place-cell sequences toward goal locations, (3) highlights their utility in flexible route planning, and (4) provides specific testable predictions.
Affiliation(s)
- Lorenz Gönner
- Artificial Intelligence, Department of Computer Science, Technische Universität Chemnitz, Chemnitz, Germany
| | - Julien Vitay
- Artificial Intelligence, Department of Computer Science, Technische Universität Chemnitz, Chemnitz, Germany
| | - Fred H Hamker
- Artificial Intelligence, Department of Computer Science, Technische Universität Chemnitz, Chemnitz, Germany
- Bernstein Center Computational Neuroscience, Humboldt-Universität Berlin, Berlin, Germany
| |
|
22
|
Sanda P, Skorheim S, Bazhenov M. Multi-layer network utilizing rewarded spike time dependent plasticity to learn a foraging task. PLoS Comput Biol 2017; 13:e1005705. [PMID: 28961245 PMCID: PMC5636167 DOI: 10.1371/journal.pcbi.1005705] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3]
Abstract
Neural networks with a single plastic layer employing reward modulated spike time dependent plasticity (STDP) are capable of learning simple foraging tasks. Here we demonstrate advanced pattern discrimination and continuous learning in a network of spiking neurons with multiple plastic layers. The network utilized both reward modulated and non-reward modulated STDP and implemented multiple mechanisms for homeostatic regulation of synaptic efficacy, including heterosynaptic plasticity, gain control, output balancing, activity normalization of rewarded STDP and hard limits on synaptic strength. We found that addition of a hidden layer of neurons employing non-rewarded STDP created neurons that responded to specific combinations of inputs and thus performed basic classification of the input patterns. When combined with a following layer of neurons implementing rewarded STDP, the network was able to learn, despite the absence of labeled training data, discrimination between rewarding patterns and patterns designated as punishing. Synaptic noise allowed for trial-and-error learning that helped to identify the goal-oriented strategies that were effective in task solving. The study predicts a critical set of properties of a spiking neuronal network with STDP that is sufficient to solve a complex foraging task involving pattern classification and decision making. This study explores how intelligent behavior emerges from basic principles known at the cellular level of biological neuronal network dynamics. Compared to the approaches used in the artificial intelligence community, we applied biologically realistic modeling of neuronal dynamics and plasticity. The building blocks of the model are spiking neurons, spike-time dependent plasticity (STDP) and experimentally known homeostatic rules, which are shown to play a fundamental role both in keeping the network stable and in enabling continuous learning.
Our study predicts that a combination of these principles makes foraging behavior in a previously unknown environment possible, including pattern classification to distinguish between environment shapes that are rewarded and those that are punished, and decision making to select the optimal strategy for acquiring the maximal number of rewarded elements. To solve this complex task we used multi-layer neuronal processing that implemented pattern generalization by unsupervised STDP at the earlier processing step, as commonly observed in animal and human sensory processing, followed by reinforcement learning at the later steps. In the model, intelligent behavior emerged spontaneously from the network organization, which implemented both local unsupervised plasticity and reward feedback resulting from successful behavior in the environment.
Affiliation(s)
- Pavel Sanda
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| | - Steven Skorheim
- Information and Systems Sciences Lab, HRL Laboratories, LLC, Malibu, California, United States of America
| | - Maxim Bazhenov
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| |
|
23
|
Pande S, Morgan F, Krewer F, Harkin J, McDaid L, McGinley B. Rapid application prototyping for hardware modular spiking neural network architectures. Neural Comput Appl 2017. [DOI: 10.1007/s00521-015-2136-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
|
24
|
Brzosko Z, Zannone S, Schultz W, Clopath C, Paulsen O. Sequential neuromodulation of Hebbian plasticity offers mechanism for effective reward-based navigation. eLife 2017; 6. [PMID: 28691903 PMCID: PMC5546805 DOI: 10.7554/elife.27756] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7]
Abstract
Spike timing-dependent plasticity (STDP) is under neuromodulatory control, which is correlated with distinct behavioral states. Previously, we reported that dopamine, a reward signal, broadens the time window for synaptic potentiation and modulates the outcome of hippocampal STDP even when applied after the plasticity induction protocol (Brzosko et al., 2015). Here, we demonstrate that sequential neuromodulation of STDP by acetylcholine and dopamine offers an efficacious model of reward-based navigation. Specifically, our experimental data in mouse hippocampal slices show that acetylcholine biases STDP toward synaptic depression, whilst subsequent application of dopamine converts this depression into potentiation. Incorporating this bidirectional neuromodulation-enabled correlational synaptic learning rule into a computational model yields effective navigation toward changing reward locations, as in natural foraging behavior. Thus, temporally sequenced neuromodulation of STDP enables associations to be made between actions and outcomes and also provides a possible mechanism for aligning the time scales of cellular and behavioral learning.
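The sequential effect can be caricatured at the level of the sign of the weight change. The function below is an illustrative abstraction of the rule described above, with an arbitrary gain; it is not the fitted learning rule from the paper.

```python
def modulated_dw(hebbian_dw, ach_during, da_after, gain=1.5):
    """Sign-level sketch: acetylcholine present during induction biases
    the STDP outcome toward depression; dopamine applied afterwards
    retroactively converts that depression into potentiation."""
    dw = hebbian_dw
    if ach_during:
        dw = -abs(dw)        # ACh: depression-biased STDP
    if da_after:
        dw = gain * abs(dw)  # delayed DA: depression -> potentiation
    return dw

dw_ach = modulated_dw(0.1, ach_during=True, da_after=False)
dw_ach_da = modulated_dw(0.1, ach_during=True, da_after=True)
```

In a navigation model, the ACh-only branch depresses synapses along unrewarded routes, while dopamine arriving at the goal retroactively potentiates the path just taken.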
Affiliation(s)
- Zuzanna Brzosko
- Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| | - Sara Zannone
- Department of Bioengineering, Imperial College London, South Kensington Campus, London, United Kingdom
| | - Wolfram Schultz
- Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| | - Claudia Clopath
- Department of Bioengineering, Imperial College London, South Kensington Campus, London, United Kingdom
| | - Ole Paulsen
- Department of Physiology, Development and Neuroscience, Physiological Laboratory, Cambridge, United Kingdom
| |
|
25
|
Rasmussen D, Voelker A, Eliasmith C. A neural model of hierarchical reinforcement learning. PLoS One 2017; 12:e0180234. [PMID: 28683111 PMCID: PMC5500327 DOI: 10.1371/journal.pone.0180234] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0]
Abstract
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.
Affiliation(s)
| | - Aaron Voelker
- Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, ON, Canada
| | - Chris Eliasmith
- Applied Brain Research, Inc., Waterloo, ON, Canada
- Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, ON, Canada
| |
|
26
|
A computational model of the integration of landmarks and motion in the insect central complex. PLoS One 2017; 12:e0172325. [PMID: 28241061 PMCID: PMC5328262 DOI: 10.1371/journal.pone.0172325] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7]
Abstract
The insect central complex (CX) is an enigmatic structure whose computational function has evaded inquiry, but has been implicated in a wide range of behaviours. Recent experimental evidence from the fruit fly (Drosophila melanogaster) and the cockroach (Blaberus discoidalis) has demonstrated the existence of neural activity corresponding to the animal's orientation within a virtual arena (a neural 'compass'), and this provides an insight into one component of the CX structure. There are two key features of the compass activity: an offset between the angle represented by the compass and the true angular position of visual features in the arena, and the remapping of the 270° visual arena onto an entire circle of neurons in the compass. Here we present a computational model which can reproduce this experimental evidence in detail, and predicts the computational mechanisms that underlie the data. We predict that both the offset and remapping of the fly's orientation onto the neural compass can be explained by plasticity in the synaptic weights between segments of the visual field and the neurons representing orientation. Furthermore, we predict that this learning is reliant on the existence of neural pathways that detect rotational motion across the whole visual field and uses this rotation signal to drive the rotation of activity in a neural ring attractor. Our model also reproduces the 'transitioning' between visual landmarks seen when rotationally symmetric landmarks are presented. This model can provide the basis for further investigation into the role of the central complex, which promises to be a key structure for understanding insect behaviour, as well as suggesting approaches towards creating fully autonomous robotic agents.
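The core prediction that rotational motion, rather than raw landmark position, drives the compass can be sketched with a shift operation on a ring of neurons. This is an illustrative caricature (shift-by-one dynamics, 16 neurons, learned landmark anchoring omitted), not the paper's attractor model.

```python
import numpy as np

N = 16                         # compass neurons arranged on a ring
bump = np.zeros(N)
bump[0] = 1.0                  # current heading estimate

def rotate(bump, rotation):
    """Whole-field rotational motion moves the activity bump around the
    ring, so heading is tracked by the bump's position; plastic weights
    from visual segments (not modeled here) would anchor the possibly
    offset mapping onto landmarks."""
    return np.roll(bump, rotation)

for _ in range(3):
    bump = rotate(bump, +1)    # three steps of rotation in one direction
heading_index = int(np.argmax(bump))
```

Because the bump moves only when the rotation signal is nonzero, the compass can represent an arbitrary, learned offset between the neural angle and the true angular position of arena features, as observed experimentally.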
|
27
|
Building functional networks of spiking model neurons. Nat Neurosci 2016; 19:350-5. [PMID: 26906501 DOI: 10.1038/nn.4241] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1]
Abstract
Most of the networks used by computer scientists and many of those studied by modelers in neuroscience represent unit activities as continuous variables. Neurons, however, communicate primarily through discontinuous spiking. We review methods for transferring our ability to construct interesting networks that perform relevant tasks from the artificial continuous domain to more realistic spiking network models. These methods raise a number of issues that warrant further theoretical and experimental study.
|
28
|
Frémaux N, Gerstner W. Neuromodulated Spike-Timing-Dependent Plasticity, and Theory of Three-Factor Learning Rules. Front Neural Circuits 2016; 9:85. [PMID: 26834568 PMCID: PMC4717313 DOI: 10.3389/fncir.2015.00085] [Citation(s) in RCA: 138] [Impact Index Per Article: 17.3]
Abstract
Classical Hebbian learning puts the emphasis on joint pre- and postsynaptic activity, but neglects the potential role of neuromodulators. Since neuromodulators convey information about novelty or reward, the influence of neuromodulators on synaptic plasticity is useful not just for action learning in classical conditioning, but also to decide "when" to create new memories in response to a flow of sensory stimuli. In this review, we focus on timing requirements for pre- and postsynaptic activity in conjunction with one or several phasic neuromodulatory signals. While the emphasis of the text is on conceptual models and mathematical theories, we also discuss some experimental evidence for neuromodulation of Spike-Timing-Dependent Plasticity. We highlight the importance of synaptic mechanisms in bridging the temporal gap between sensory stimulation and neuromodulatory signals, and develop a framework for a class of neo-Hebbian three-factor learning rules that depend on presynaptic activity, postsynaptic variables as well as the influence of neuromodulators.
Affiliation(s)
- Nicolas Frémaux
- School of Computer Science and Brain Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Wulfram Gerstner
- School of Computer Science and Brain Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| |
|
29
|
Brosch T, Neumann H, Roelfsema PR. Reinforcement Learning of Linking and Tracing Contours in Recurrent Neural Networks. PLoS Comput Biol 2015; 11:e1004489. [PMID: 26496502 PMCID: PMC4619762 DOI: 10.1371/journal.pcbi.1004489] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1]
Abstract
The processing of a visual stimulus can be subdivided into a number of stages. Upon stimulus presentation there is an early phase of feedforward processing where the visual information is propagated from lower to higher visual areas for the extraction of basic and complex stimulus features. This is followed by a later phase where horizontal connections within areas and feedback connections from higher areas back to lower areas come into play. In this later phase, image elements that are behaviorally relevant are grouped by Gestalt grouping rules and are labeled in the cortex with enhanced neuronal activity (object-based attention in psychology). Recent neurophysiological studies revealed that reward-based learning influences these recurrent grouping processes, but it is not well understood how rewards train recurrent circuits for perceptual organization. This paper examines the mechanisms for reward-based learning of new grouping rules. We derive a learning rule that can explain how rewards influence the information flow through feedforward, horizontal and feedback connections. We illustrate the efficiency with two tasks that have been used to study the neuronal correlates of perceptual organization in early visual cortex. The first task is called contour-integration and demands the integration of collinear contour elements into an elongated curve. We show how reward-based learning causes an enhancement of the representation of the to-be-grouped elements at early levels of a recurrent neural network, just as is observed in the visual cortex of monkeys. The second task is curve-tracing where the aim is to determine the endpoint of an elongated curve composed of connected image elements. If trained with the new learning rule, neural networks learn to propagate enhanced activity over the curve, in accordance with neurophysiological data. We close the paper with a number of model predictions that can be tested in future neurophysiological and computational studies. 
Our experience with the visual world allows us to group image elements that belong to the same perceptual object and to segregate them from other objects and the background. If subjects learn to group contour elements, this experience influences neuronal activity in early visual cortical areas, including the primary visual cortex (V1). Learning presumably depends on alterations in the pattern of connections within and between areas of the visual cortex. However, the processes that control changes in connectivity are not well understood. Here we present the first computational model that can train a neural network to integrate collinear contour elements into elongated curves and to trace a curve through the visual field. The new learning algorithm trains fully recurrent neural networks, provided the connectivity causes the networks to reach a stable state. The model reproduces the behavioral performance of monkeys trained in these tasks and explains the patterns of neuronal activity in the visual cortex that emerge during learning, which is remarkable because the only feedback for the model is a reward for successful trials. We discuss a number of the model predictions that can be tested in future neuroscientific work.
Affiliation(s)
- Tobias Brosch
- University of Ulm, Institute of Neural Information Processing, Ulm, Germany
- Heiko Neumann
- University of Ulm, Institute of Neural Information Processing, Ulm, Germany
- Pieter R. Roelfsema
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam, The Netherlands
- Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
- Psychiatry Department, Academic Medical Center, Amsterdam, The Netherlands
30
Gehring TV, Luksys G, Sandi C, Vasilaki E. Detailed classification of swimming paths in the Morris Water Maze: multiple strategies within one trial. Sci Rep 2015; 5:14562. [PMID: 26423140 PMCID: PMC4589698 DOI: 10.1038/srep14562] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2015] [Accepted: 08/26/2015] [Indexed: 10/29/2022] Open
Abstract
The Morris Water Maze is a widely used task in studies of spatial learning with rodents. Classical performance measures of animals in the Morris Water Maze include the escape latency and the cumulative distance to the platform. Other methods focus on classifying trajectory patterns into stereotypical classes representing different animal strategies. However, these approaches typically consider trajectories as a whole, and as a consequence they assign one full trajectory to one class, whereas animals often switch between strategies, and their corresponding classes, within a single trial. We therefore take a different approach: we look for segments of diverse animal behaviour within one trial and employ a semi-automated classification method for identifying the various strategies exhibited by the animals within a trial. Our method reveals significant and systematic differences in the exploration strategies of two animal groups (stressed, non-stressed) that would go undetected with earlier methods.
Affiliation(s)
- Tiago V Gehring
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Gediminas Luksys
- Division of Cognitive Neuroscience, University of Basel, Basel, Switzerland
- Carmen Sandi
- Laboratory of Behavioral Genetics, Brain Mind Institute, EPFL, Lausanne, Switzerland
- Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Theoretical Neurobiology and Neuroengineering Lab, University of Antwerp, Wilrijk, Belgium
- INSIGNEO Institute for in Silico Medicine, University of Sheffield, Sheffield, UK
31
Rasku J, Pyykkö I, Levo H, Kentala E, Manchaiah V. Disease Profiling for Computerized Peer Support of Ménière's Disease. JMIR Rehabil Assist Technol 2015; 2:e9. [PMID: 28582248 PMCID: PMC5454554 DOI: 10.2196/rehab.4109] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2014] [Revised: 06/16/2015] [Accepted: 07/12/2015] [Indexed: 01/09/2023] Open
Abstract
Background Peer support is an emerging form of person-driven active health care. Chronic conditions such as Ménière’s disease (a disorder of the inner ear) need continuing rehabilitation and support that is beyond the scope of routine clinical medical practice. Hence, peer-support programs can be helpful in supplementing some of the rehabilitation aspects. Objective The aim of this study was to design a computerized data collection system for the peer support of Ménière’s disease that is capable of profiling the subject for diagnosis and of assisting with problem solving. Methods The expert program comprises several data entries focusing on symptoms, activity limitations, participation restrictions, quality of life, attitude and personality trait, and an evaluation of disease-specific impact. Data were collected from 740 members of the Finnish Ménière’s Federation and utilized in the construction and evaluation of the program. Results The program verifies the diagnosis of a person by using an expert system, and the inference engine selects 50 cases with matched symptom severity by using a nearest neighbor algorithm. These cases are then used as a reference group to compare with the person’s attitude, sense of coherence, and anxiety. The program provides feedback for the person and uses this information to guide the person through the problem-solving process. Conclusions This computer-based peer-support program is the first example of an advanced computer-oriented approach using artificial intelligence, both in the profiling of the disease and in the profiling of the person’s complaints of hearing loss, tinnitus, and vertigo.
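The case-matching step described in the Results, selecting 50 stored cases with similar symptom severity as a reference group, can be sketched with a plain nearest-neighbour search (a generic illustration, not the program's actual code; the symptom names and scales are invented):

```python
import math

def nearest_cases(patient, cases, k=50):
    """Return the k stored cases closest to `patient` in symptom-severity space.

    `patient` and each case's "severity" entry are dicts of numeric scores,
    e.g. {"vertigo": 7, "tinnitus": 4} -- hypothetical symptom scales, not
    the questionnaire items used by the actual program.
    """
    def distance(a, b):
        # Euclidean distance over the symptom dimensions both records share
        shared = set(a) & set(b)
        return math.sqrt(sum((a[s] - b[s]) ** 2 for s in shared))

    ranked = sorted(cases, key=lambda c: distance(patient, c["severity"]))
    return ranked[:k]

# The k matched cases then serve as the reference group against which the
# person's attitude, sense of coherence and anxiety are compared.
```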
Affiliation(s)
- Jyrki Rasku
- School of Information Sciences, Tampere University, Tampere, Finland
- Hearing and Balance Research Unit, Department of Otorhinolaryngology, Tampere University, Tampere, Finland
- Ilmari Pyykkö
- Hearing and Balance Research Unit, Department of Otorhinolaryngology, Tampere University, Tampere, Finland
- Hilla Levo
- Department of Otolaryngology, University of Helsinki, Helsinki, Finland
- Erna Kentala
- Department of Otolaryngology, University of Helsinki, Helsinki, Finland
- Vinaya Manchaiah
- Department of Speech and Hearing Sciences, Lamar University, Beaumont, TX, United States
- The Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Audiology India, Mysore, India
32
Barron AB, Gurney KN, Meah LFS, Vasilaki E, Marshall JAR. Decision-making and action selection in insects: inspiration from vertebrate-based theories. Front Behav Neurosci 2015; 9:216. [PMID: 26347627 PMCID: PMC4539514 DOI: 10.3389/fnbeh.2015.00216] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2015] [Accepted: 07/30/2015] [Indexed: 11/13/2022] Open
Abstract
Effective decision-making, one of the most crucial functions of the brain, entails the analysis of sensory information and the selection of appropriate behavior in response to stimuli. Here, we consider the current state of knowledge on the mechanisms of decision-making and action selection in the insect brain, with emphasis on the olfactory processing system. Theoretical and computational models of decision-making emphasize the importance of using inhibitory connections to couple evidence-accumulating pathways; this coupling allows for effective discrimination between competing alternatives and thus enables a decision maker to reach a stable unitary decision. Theory also shows that the coupling of pathways can be implemented using a variety of different mechanisms and vastly improves the performance of decision-making systems. The vertebrate basal ganglia appear to resolve stable action selection by being a point of convergence for multiple excitatory and inhibitory inputs such that only one possible response is selected and all other alternatives are suppressed. Similar principles appear to operate within the insect brain. The insect lateral protocerebrum (LP) serves as a point of convergence for multiple excitatory and inhibitory channels of olfactory information to effect stable decision and action selection, at least for olfactory information. The LP is a rather understudied region of the insect brain, yet this premotor region may be key to the effective resolution of action selection. We argue that it may be beneficial to use models developed to explore the operation of the vertebrate brain as inspiration when considering action selection in the invertebrate domain. Such an approach may facilitate the proposal of new hypotheses and frame experimental studies of how decision-making and action selection might be achieved in insects.
Affiliation(s)
- Andrew B Barron
- Department of Biological Sciences, Macquarie University, North Ryde, NSW, Australia
- Kevin N Gurney
- Department of Psychology, The University of Sheffield, Sheffield, UK
- Lianne F S Meah
- Department of Computer Science, The University of Sheffield, Sheffield, UK
- Eleni Vasilaki
- Department of Computer Science, The University of Sheffield, Sheffield, UK
- James A R Marshall
- Department of Computer Science, The University of Sheffield, Sheffield, UK
33
Kocaturk M, Gulcur HO, Canbeyli R. Toward Building Hybrid Biological/in silico Neural Networks for Motor Neuroprosthetic Control. Front Neurorobot 2015; 9:8. [PMID: 26321943 PMCID: PMC4531252 DOI: 10.3389/fnbot.2015.00008] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2015] [Accepted: 07/15/2015] [Indexed: 11/13/2022] Open
Abstract
In this article, we introduce the Bioinspired Neuroprosthetic Design Environment (BNDE) as a practical platform for the development of novel brain–machine interface (BMI) controllers, which are based on spiking model neurons. We built the BNDE around a hard real-time system so that it is capable of creating simulated synapses from extracellularly recorded neurons to model neurons. In order to evaluate the practicality of the BNDE for neuroprosthetic control experiments, a novel, adaptive BMI controller was developed and tested using real-time closed-loop simulations. The present controller consists of two in silico medium spiny neurons, which receive simulated synaptic inputs from recorded motor cortical neurons. In the closed-loop simulations, the recordings from the cortical neurons were imitated using an external, hardware-based neural signal synthesizer. By implementing a reward-modulated spike-timing-dependent plasticity rule, the controller achieved perfect target-reach accuracy in a two-target reaching task in one-dimensional space. The BNDE combines the flexibility of software-based spiking neural network (SNN) simulations with powerful online data visualization tools and is a low-cost, PC-based, all-in-one solution for developing neurally inspired BMI controllers. We believe that the BNDE is the first implementation capable of creating hybrid biological/in silico neural networks for motor neuroprosthetic control and of utilizing multiple CPU cores for computationally intensive real-time SNN simulations.
Affiliation(s)
- Mehmet Kocaturk
- Institute of Biomedical Engineering, Bogazici University, Istanbul, Turkey
- Department of Biomedical Engineering, Istanbul Medipol University, Istanbul, Turkey
- Halil Ozcan Gulcur
- Institute of Biomedical Engineering, Bogazici University, Istanbul, Turkey
- Resit Canbeyli
- Department of Psychology, Bogazici University, Istanbul, Turkey
34
Choice-correlated activity fluctuations underlie learning of neuronal category representation. Nat Commun 2015; 6:6454. [PMID: 25759251 PMCID: PMC4382677 DOI: 10.1038/ncomms7454] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2014] [Accepted: 01/29/2015] [Indexed: 11/30/2022] Open
Abstract
The ability to categorize stimuli into discrete behaviourally relevant groups is an essential cognitive function. To elucidate the neural mechanisms underlying categorization, we constructed a cortical circuit model that is capable of learning a motion categorization task through reward-dependent plasticity. Here we show that stable category representations develop in neurons intermediate to sensory and decision layers if they exhibit choice-correlated activity fluctuations (choice probability). In the model, choice probability and task-specific interneuronal correlations emerge from plasticity of top-down projections from decision neurons. Specific model predictions are confirmed by analysis of single-neuron activity from the monkey parietal cortex, which reveals a mixture of directional and categorical tuning, and a positive correlation between category selectivity and choice probability. Beyond demonstrating a circuit mechanism for categorization, the present work suggests a key role of plastic top-down feedback in simultaneously shaping both neural tuning and correlated neural variability. The ability to categorize stimuli into discrete behaviourally relevant groups is an essential cognitive function. Here, the authors demonstrate a critical role for choice-correlated activity fluctuations in the emergence of stable cortical category representations.
35
Esposito U, Giugliano M, Vasilaki E. Adaptation of short-term plasticity parameters via error-driven learning may explain the correlation between activity-dependent synaptic properties, connectivity motifs and target specificity. Front Comput Neurosci 2015; 8:175. [PMID: 25688203 PMCID: PMC4310301 DOI: 10.3389/fncom.2014.00175] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2014] [Accepted: 12/31/2014] [Indexed: 01/09/2023] Open
Abstract
The anatomical connectivity among neurons has been experimentally found to be largely non-random across brain areas. This means that certain connectivity motifs occur at a higher frequency than would be expected by chance. Of particular interest, short-term synaptic plasticity properties were found to colocalize with specific motifs: an over-expression of bidirectional motifs has been found in neuronal pairs where short-term facilitation dominates synaptic transmission among the neurons, whereas an over-expression of unidirectional motifs has been observed in neuronal pairs where short-term depression dominates. In previous work we found that, given a network with fixed short-term properties, the interaction between short- and long-term plasticity of synaptic transmission is sufficient for the emergence of specific motifs. Here, we introduce an error-driven learning mechanism for short-term plasticity that may explain how such observed correspondences develop from randomly initialized dynamic synapses. By allowing synapses to change their properties, neurons are able to adapt their own activity depending on an error signal. This results in richer dynamics and, provided that the learning mechanism is target-specific, leads to specialized groups of synapses projecting onto functionally different targets, qualitatively replicating the experimental results of Wang and collaborators.
Affiliation(s)
- Umberto Esposito
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Michele Giugliano
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Theoretical Neurobiology and Neuroengineering Laboratory, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Laboratory of Neural Microcircuitry, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Theoretical Neurobiology and Neuroengineering Laboratory, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- INSIGNEO Institute for in Silico Medicine, University of Sheffield, Sheffield, UK
36
Shah A, Gurney KN. Finding minimal action sequences with a simple evaluation of actions. Front Comput Neurosci 2014; 8:151. [PMID: 25506326 PMCID: PMC4247113 DOI: 10.3389/fncom.2014.00151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2013] [Accepted: 11/03/2014] [Indexed: 11/13/2022] Open
Abstract
Animals are able to discover the minimal number of actions that achieves an outcome (the minimal action sequence). In most accounts of this, actions are associated with a measure of behavior that is higher for actions that lead to the outcome with a shorter action sequence, and learning mechanisms find the actions associated with the highest measure. In this sense, previous accounts focus on more than the simple binary signal of "was the outcome achieved?"; they focus on "how well was the outcome achieved?" However, such mechanisms may not govern all types of behavioral development. In particular, in the process of action discovery (Redgrave and Gurney, 2006), actions are reinforced if they simply lead to a salient outcome because biological reinforcement signals occur too quickly to evaluate the consequences of an action beyond an indication of the outcome's occurrence. Thus, action discovery mechanisms focus on the simple evaluation of "was the outcome achieved?" and not "how well was the outcome achieved?" Notwithstanding this impoverishment of information, can the process of action discovery find the minimal action sequence? We address this question by implementing computational mechanisms, referred to in this paper as no-cost learning rules, in which each action that leads to the outcome is associated with the same measure of behavior. No-cost rules focus on "was the outcome achieved?" and are consistent with action discovery. No-cost rules discover the minimal action sequence in simulated tasks and execute it for a substantial amount of time. Extensive training, however, results in extraneous actions, suggesting that a separate process (which has been proposed in action discovery) must attenuate learning if no-cost rules participate in behavioral development. We describe how no-cost rules develop behavior, what happens when attenuation is disrupted, and relate the new mechanisms to wider computational and biological context.
Affiliation(s)
- Ashvin Shah
- Department of Psychology, The University of Sheffield, Sheffield, UK
37
Esposito U, Giugliano M, van Rossum M, Vasilaki E. Measuring symmetry, asymmetry and randomness in neural network connectivity. PLoS One 2014; 9:e100805. [PMID: 25006663 PMCID: PMC4090069 DOI: 10.1371/journal.pone.0100805] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2013] [Accepted: 05/29/2014] [Indexed: 11/19/2022] Open
Abstract
Cognitive functions are stored in the connectome, the wiring diagram of the brain, which exhibits non-random features, so-called motifs. In this work, we focus on bidirectional, symmetric motifs, i.e. two neurons that project to each other via connections of equal strength, and unidirectional, non-symmetric motifs, i.e. within a pair of neurons only one neuron projects to the other. We hypothesise that such motifs have been shaped via activity-dependent synaptic plasticity processes. As a consequence, learning moves the distribution of the synaptic connections away from randomness. Our aim is to provide a global, macroscopic, single-parameter characterisation of the statistical occurrence of bidirectional and unidirectional motifs. To this end we define a symmetry measure that does not require any a priori thresholding of the weights or knowledge of their maximal value. We calculate its mean and variance for random uniform or Gaussian distributions, which allows us to introduce a confidence measure of how significantly symmetric or asymmetric a specific configuration is, i.e. how likely it is that the configuration is the result of chance. We demonstrate the discriminatory power of our symmetry measure by inspecting the eigenvalues of different types of connectivity matrices. We show that a Gaussian weight distribution biases the connectivity motifs to more symmetric configurations than a uniform distribution and that introducing random synaptic pruning, mimicking developmental regulation in synaptogenesis, biases the connectivity motifs to more asymmetric configurations, regardless of the distribution. We expect that our work will benefit the computational modelling community by providing a systematic way to characterise symmetry and asymmetry in network structures. Further, our symmetry measure will be of use to electrophysiologists who investigate the symmetry of network connectivity.
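As a rough illustration of a threshold-free symmetry score in the same spirit (one simple instance, not necessarily the authors' exact definition), one can average a normalised per-pair asymmetry over all neuron pairs:

```python
import numpy as np

def symmetry_measure(W, eps=1e-12):
    """Threshold-free symmetry score for a non-negative weight matrix W.

    For every neuron pair (i, j) with i < j, compute the normalised
    asymmetry |w_ij - w_ji| / (w_ij + w_ji); the score is 1 minus its mean
    over pairs. 1 means perfectly symmetric (w_ij == w_ji for all pairs);
    0 means perfectly asymmetric (one direction of every pair is zero).
    No weight threshold or maximal weight value is needed.
    """
    W = np.asarray(W, dtype=float)
    iu = np.triu_indices_from(W, k=1)            # all index pairs i < j
    forward, backward = W[iu], W.T[iu]           # w_ij and w_ji
    denom = forward + backward
    # pairs with no connection in either direction count as symmetric
    pair_asym = np.abs(forward - backward) / np.where(denom > eps, denom, 1.0)
    return 1.0 - pair_asym.mean()

rng = np.random.default_rng(0)
A = rng.uniform(size=(50, 50))
print(symmetry_measure((A + A.T) / 2))  # fully symmetric network: 1.0
print(symmetry_measure(np.triu(A, 1)))  # fully unidirectional network: 0.0
```

A random uniform matrix falls strictly between the two extremes, which is what makes a confidence measure against chance meaningful.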
Affiliation(s)
- Umberto Esposito
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Michele Giugliano
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Theoretical Neurobiology and Neuroengineering Laboratory, Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
- Laboratory of Neural Microcircuitry, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Mark van Rossum
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Theoretical Neurobiology and Neuroengineering Laboratory, Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
38
Friedrich J, Urbanczik R, Senn W. Code-specific learning rules improve action selection by populations of spiking neurons. Int J Neural Syst 2014; 24:1450002. [DOI: 10.1142/s0129065714500026] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning for both discrete classification and continuous regression tasks. The suggested learning rules also speed up learning as population size increases, in contrast to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation, as opposed to the classical weight- or node-perturbation, as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning compared to exploration in the neuron or weight space.
Affiliation(s)
- Johannes Friedrich
- Institute of Physiology, University of Bern, Bühlplatz 5, 3012 Bern, Switzerland
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
- Robert Urbanczik
- Institute of Physiology, University of Bern, Bühlplatz 5, 3012 Bern, Switzerland
- Center for Cognition, Learning and Memory, University of Bern, Factory Street 8, CH-3012 Bern, Switzerland
- Walter Senn
- Institute of Physiology, University of Bern, Bühlplatz 5, 3012 Bern, Switzerland
- Center for Cognition, Learning and Memory, University of Bern, Factory Street 8, CH-3012 Bern, Switzerland
39
Vasilaki E, Giugliano M. Emergence of connectivity motifs in networks of model neurons with short- and long-term plastic synapses. PLoS One 2014; 9:e84626. [PMID: 24454735 PMCID: PMC3893143 DOI: 10.1371/journal.pone.0084626] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2013] [Accepted: 11/16/2013] [Indexed: 11/29/2022] Open
Abstract
Recent experimental data from the rodent cerebral cortex and olfactory bulb indicate that specific connectivity motifs are correlated with short-term dynamics of excitatory synaptic transmission. It was observed that neurons with short-term facilitating synapses form predominantly reciprocal pairwise connections, while neurons with short-term depressing synapses form predominantly unidirectional pairwise connections. The cause of these structural differences in excitatory synaptic microcircuits is unknown. We show that these connectivity motifs emerge in networks of model neurons, from the interactions between short-term synaptic dynamics (SD) and long-term spike-timing dependent plasticity (STDP). While the impact of STDP on SD was shown in simultaneous neuronal pair recordings in vitro, the mutual interactions between STDP and SD in large networks are still the subject of intense research. Our approach combines an SD phenomenological model with an STDP model that faithfully captures long-term plasticity dependence on both spike times and frequency. As a proof of concept, we first simulate and analyze recurrent networks of spiking neurons with random initial connection efficacies and where synapses are either all short-term facilitating or all depressing. For identical external inputs to the network, and as a direct consequence of internally generated activity, we find that networks with depressing synapses evolve unidirectional connectivity motifs, while networks with facilitating synapses evolve reciprocal connectivity motifs. We then show that the same results hold for heterogeneous networks, including both facilitating and depressing synapses. This does not contradict a recent theory that proposes that motifs are shaped by external inputs, but rather complements it by examining the role of both the external inputs and the internally generated network activity. 
Our study highlights the conditions under which SD-STDP might explain the correlation between facilitation and reciprocal connectivity motifs, as well as between depression and unidirectional motifs.
Affiliation(s)
- Eleni Vasilaki
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
- Michele Giugliano
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Department of Biomedical Sciences, University of Antwerp, Wilrijk, Belgium
- Brain Mind Institute, Swiss Federal Institute of Technology of Lausanne, Lausanne, Switzerland
40
Mahmoudi B, Pohlmeyer EA, Prins NW, Geng S, Sanchez JC. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning. J Neural Eng 2013; 10:066005. [PMID: 24100047 DOI: 10.1088/1741-2560/10/6/066005] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
OBJECTIVE Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. APPROACH Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. MAIN RESULTS The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. SIGNIFICANCE By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
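The core of the approach, a Hebbian weight change gated by binary evaluative feedback, can be sketched in a toy discrete setting (a simplified illustration, not the paper's spiking controller; the task, patterns and parameters are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

def hrl_update(W, x, action, reward, lr=0.1):
    """One reward-gated Hebbian update (simplified sketch of the HRL idea).

    The Hebbian product of postsynaptic activity (a one-hot vector for the
    selected action unit) and the presynaptic neural state `x` is gated by
    the binary evaluative feedback: +1 strengthens the association that
    produced the action, -1 weakens it.
    """
    post = np.zeros(W.shape[0])
    post[action] = 1.0
    W += lr * reward * np.outer(post, x)
    return W

# Toy episodic task: map two neural-state patterns to two actions using
# only binary desirable/undesirable feedback.
patterns = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
W = rng.normal(scale=0.1, size=(2, 2))
for _ in range(100):
    for target, x in enumerate(patterns):
        scores = W @ x + rng.normal(scale=0.1, size=2)  # noisy selection
        action = int(np.argmax(scores))
        reward = 1.0 if action == target else -1.0      # binary feedback only
        W = hrl_update(W, x, action, reward)
# The actual controller additionally stops adapting once a satisfactory
# policy is reached; that mechanism is omitted in this sketch.
```

Because correct choices are reinforced and wrong ones punished by the same Hebbian term, the preference gap for the rewarded action grows on every episode, so the greedy mapping converges to the target assignment.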
Affiliation(s)
- Babak Mahmoudi
- Department of Neurosurgery, Emory University, Atlanta, GA, USA
41
Frémaux N, Sprekeler H, Gerstner W. Reinforcement learning using a continuous time actor-critic framework with spiking neurons. PLoS Comput Biol 2013; 9:e1003024. [PMID: 23592970 PMCID: PMC3623741 DOI: 10.1371/journal.pcbi.1003024] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2012] [Accepted: 02/22/2013] [Indexed: 11/26/2022] Open
Abstract
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity. As every dog owner knows, animals repeat behaviors that earn them rewards. 
But what is the brain machinery that underlies this reward-based learning? Experimental research points to plasticity of the synaptic connections between neurons, with an important role played by the neuromodulator dopamine, but the exact way synaptic activity and neuromodulation interact during learning is not precisely understood. Here we propose a model explaining how reward signals might interplay with synaptic plasticity, and use the model to solve a simulated maze navigation task. Our model extends an idea from the theory of reinforcement learning: one group of neurons forms an “actor,” responsible for choosing the direction of motion of the animal. Another group of neurons, the “critic,” whose role is to predict the rewards the actor will gain, uses the mismatch between actual and expected reward to teach the synapses feeding both groups. Our learning agent learns to reliably navigate its maze to find the reward. Remarkably, the synaptic learning rule that we derive from theoretical considerations is similar to previous rules based on experimental evidence.
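The core computation described above — a continuous-time TD error driving both critic and actor — can be illustrated with a small rate-based sketch. The block below is a hypothetical Euler discretisation of Doya-style continuous TD learning for the critic alone, with radial-basis state features standing in for place-cell-like input; all function names and parameter values are assumptions for illustration, not the paper's spiking implementation.

```python
import numpy as np

def gaussian_features(x, centers, width=0.2):
    """Radial-basis encoding of a 1-D state, loosely place-cell-like."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

def run_episode(w, states, rewards, centers, dt=0.01, tau_r=1.0, lr=0.1):
    """One pass along a trajectory, updating linear critic weights w.

    Continuous-time TD error (Euler-discretised):
        delta(t) = r(t) - V(t)/tau_r + dV/dt
    """
    v_prev = float(w @ gaussian_features(states[0], centers))
    deltas = []
    for x, r in zip(states, rewards):
        phi = gaussian_features(x, centers)
        v = float(w @ phi)
        delta = r - v / tau_r + (v - v_prev) / dt  # TD / neuromodulatory signal
        w = w + lr * delta * phi * dt              # critic weight update
        v_prev = float(w @ phi)                    # value after the update
        deltas.append(delta)
    return w, deltas
```

Repeating episodes with a reward at the end of the trajectory makes the predicted value ramp up towards the rewarded location, which is the behaviour the critic's TD signal is meant to produce.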
Affiliation(s)
- Nicolas Frémaux
- School of Computer and Communication Sciences and School of Life Sciences, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, 1015 Lausanne EPFL, Switzerland
| | - Henning Sprekeler
- School of Computer and Communication Sciences and School of Life Sciences, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, 1015 Lausanne EPFL, Switzerland
- Theoretical Neuroscience Lab, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Wulfram Gerstner
- School of Computer and Communication Sciences and School of Life Sciences, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, 1015 Lausanne EPFL, Switzerland
|
42
|
Soltoggio A, Lemme A, Reinhart F, Steil JJ. Rare neural correlations implement robotic conditioning with delayed rewards and disturbances. Front Neurorobot 2013; 7:6. [PMID: 23565092 PMCID: PMC3613617 DOI: 10.3389/fnbot.2013.00006] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2012] [Accepted: 03/06/2013] [Indexed: 11/13/2022] Open
Abstract
Neural conditioning associates cues and actions with subsequent rewards. The environments in which robots operate, however, are pervaded by a variety of disturbing stimuli and uncertain timing. In particular, variable reward delays make it difficult to reconstruct which previous actions are responsible for subsequent rewards. Such uncertainty is handled by biological neural networks, but represents a challenge for computational models, suggesting the lack of a satisfactory theory for robotic neural conditioning. The present study demonstrates the use of rare neural correlations in making correct associations between rewards and previous cues or actions. Rare correlations are functional in selecting sparse synapses to be eligible for later weight updates if a reward occurs. The repetition of this process singles out the associating and reward-triggering pathways, and thereby copes with distal rewards. The neural network displays macro-level classical and operant conditioning, which is demonstrated in a real-life human-robot interaction. The proposed mechanism models realistic conditioning in humans and animals and implements similar behaviors in neuro-robotic platforms.
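The rare-correlation idea can be caricatured in a few lines: only occasional pre-post coincidences tag a synapse with a decaying eligibility trace, and a reward arriving several time steps later converts surviving traces into weight changes. The sketch below is a deliberately simplified, hypothetical two-synapse version (all rates, probabilities, and time constants are assumptions, not the paper's model), showing that a cue synapse reliably preceding the reward outgrows an uncorrelated distractor.

```python
import numpy as np

def train(trials=500, delay=8, tau_e=20.0, lr=0.1, p_rare=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(2)                    # w[0]: cue synapse, w[1]: distractor
    for _ in range(trials):
        elig = np.zeros(2)
        for t in range(20):
            pre = np.array([
                1.0 if t == 5 else 0.0,              # cue precedes the reward
                1.0 if rng.random() < 0.1 else 0.0,  # unrelated distractor
            ])
            # postsynaptic spike: driven by the cue, otherwise background
            post = 1.0 if (pre[0] == 1.0 or rng.random() < 0.2) else 0.0
            rare = rng.random(2) < p_rare            # rare-correlation gate
            elig += rare * pre * post                # tag eligible synapses
            elig *= np.exp(-1.0 / tau_e)             # traces decay slowly
            if t == 5 + delay:                       # distal reward arrives
                w += lr * elig                       # modulated weight update
    return w
```

Because the distractor's coincidences are uncorrelated with the reward time, its trace at reward delivery is on average much smaller than the cue's, so repetition separates the two pathways.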
Affiliation(s)
- Andrea Soltoggio
- Faculty of Technology, Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University Bielefeld, Germany
|
43
|
Eliasmith C, Stewart TC, Choo X, Bekolay T, DeWolf T, Tang Y, Tang C, Rasmussen D. A large-scale model of the functioning brain. Science 2012. [PMID: 23197532 DOI: 10.1126/science.1225266] [Citation(s) in RCA: 330] [Impact Index Per Article: 27.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
A central challenge for cognitive and systems neuroscience is to relate the incredibly complex behavior of animals to the equally complex activity of their brains. Recently described large-scale neural models have not bridged this gap between neural activity and biological function. In this work, we present a 2.5-million-neuron model of the brain (called "Spaun") that bridges this gap by exhibiting many different behaviors. The model is presented only with visual image sequences, and it draws all of its responses with a physically modeled arm. Although simplified, the model captures many aspects of neuroanatomy, neurophysiology, and psychological behavior, which we demonstrate via eight diverse tasks.
Affiliation(s)
- Chris Eliasmith
- Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, ON N2J 3G1, Canada.
|
44
|
Probst D, Maass W, Markram H, Gewaltig MO. Liquid Computing in a Simplified Model of Cortical Layer IV: Learning to Balance a Ball. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING – ICANN 2012 2012. [DOI: 10.1007/978-3-642-33269-2_27] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/05/2022]
|
45
|
Abstract
A central criticism of standard theoretical approaches to constructing stable, recurrent model networks is that the synaptic connection weights need to be finely tuned. This criticism is severe because proposed rules for learning these weights have been shown to have various limitations to their biological plausibility. Hence it is unlikely that such rules are used to continuously fine-tune the network in vivo. We describe a learning rule that is able to tune synaptic weights in a biologically plausible manner. We demonstrate and test this rule in the context of the oculomotor integrator, showing that only known neural signals are needed to tune the weights. We demonstrate that the rule appropriately accounts for a wide variety of experimental results, and is robust under several kinds of perturbation. Furthermore, we show that the rule is able to achieve stability as good as or better than that provided by the linearly optimal weights often used in recurrent models of the integrator. Finally, we discuss how this rule can be generalized to tune a wide variety of recurrent attractor networks, such as those found in head direction and path integration systems, suggesting that it may be used to tune a wide variety of stable neural systems.
|
46
|
Friedrich J, Urbanczik R, Senn W. Spatio-temporal credit assignment in neuronal population learning. PLoS Comput Biol 2011; 7:e1002092. [PMID: 21738460 PMCID: PMC3127803 DOI: 10.1371/journal.pcbi.1002092] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2010] [Accepted: 05/02/2011] [Indexed: 01/27/2023] Open
Abstract
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain. The key mechanisms supporting memory and learning in the brain rely on changing the strength of synapses which control the transmission of information between neurons. But how are appropriate changes determined when animals learn from trial and error? Information on success or failure is likely signaled to synapses by neurotransmitters like dopamine. 
But interpreting this reward signal is difficult because the number of synaptic transmissions occurring during behavioral decision making is huge and each transmission may have contributed differently to the decision, or perhaps not at all. Extrapolating from experimental evidence on synaptic plasticity, we suggest a computational model where each synapse collects information about its contributions to the decision process by means of a cascade of transient memory traces. The final trace then remodulates the reward signal when the persistent change of the synaptic strength is triggered. Simulation results show that with the suggested synaptic plasticity rule a simple neural network can learn even difficult tasks by trial and error, e.g., when the decision–reward sequence is scrambled due to large delays in reward delivery.
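Collapsing the cascade to its essentials gives a policy-gradient-flavoured sketch: a trace first correlates presynaptic input with the deviation of the stochastic decision from its expectation, and the reward arriving after the decision converts the trace into a weight change. The task and all names below are illustrative assumptions, not the paper's integrate-and-fire population model.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train(trials=5000, n=20, lr=0.1, seed=1):
    """Learn a hidden majority rule from binary reward after each decision."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n)
    for _ in range(trials):
        x = rng.integers(0, 2, n).astype(float)          # presynaptic pattern
        target = float(x[: n // 2].sum() > x[n // 2:].sum())
        p = sigmoid(w @ x)                               # population drive
        decision = float(rng.random() < p)               # stochastic decision
        trace = x * (decision - p)       # trace: pre x (post - expectation)
        reward = 1.0 if decision == target else -1.0     # delivered afterwards
        w += lr * reward * trace         # reward converts trace to plasticity
    return w

def accuracy(w, samples=1000, seed=2):
    rng = np.random.default_rng(seed)
    n = len(w)
    hits = 0.0
    for _ in range(samples):
        x = rng.integers(0, 2, n).astype(float)
        target = float(x[: n // 2].sum() > x[n // 2:].sum())
        hits += float((sigmoid(w @ x) > 0.5) == target)
    return hits / samples
```

The trace here is the standard score-function (REINFORCE-style) term; the paper's cascade additionally interposes stages for postsynaptic spikes and delayed rewards, which this sketch compresses into one step.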
Affiliation(s)
| | | | - Walter Senn
- Department of Physiology, University of Bern, Bern, Switzerland
|
47
|
Neural mechanisms and computations underlying stress effects on learning and memory. Curr Opin Neurobiol 2011; 21:502-8. [DOI: 10.1016/j.conb.2011.03.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2011] [Revised: 02/08/2011] [Accepted: 03/25/2011] [Indexed: 11/22/2022]
|
48
|
An imperfect dopaminergic error signal can drive temporal-difference learning. PLoS Comput Biol 2011; 7:e1001133. [PMID: 21589888 PMCID: PMC3093351 DOI: 10.1371/journal.pcbi.1001133] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2010] [Accepted: 04/06/2011] [Indexed: 12/03/2022] Open
Abstract
An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards. 
What are the physiological changes that take place in the brain when we solve a problem or learn a new skill? It is commonly assumed that behavior adaptations are realized on the microscopic level by changes in synaptic efficacies. However, this is hard to verify experimentally due to the difficulties of identifying the relevant synapses and monitoring them over long periods during a behavioral task. To address this question computationally, we develop a spiking neuronal network model of actor-critic temporal-difference learning, a variant of reinforcement learning for which neural correlates have already been partially established. The network learns a complex task by means of an internally generated reward signal constrained by recent findings on the dopaminergic system. Our model combines top-down and bottom-up modelling approaches to bridge the gap between synaptic plasticity and system-level learning. It paves the way for further investigations of the dopaminergic system in reward learning in the healthy brain and in pathological conditions such as Parkinson's disease, and can be used as a module in functional models based on brain-scale circuitry.
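The consequence of an asymmetric error signal can be illustrated with ordinary TD(0) on a deterministic chain: clipping negative prediction errors at a small floor (a stand-in for the limited dynamic range below the dopaminergic baseline firing rate) barely affects learning from a positive terminal reward but strongly slows learning from a negative one. The chain task and clip value are assumptions for illustration, not the paper's spiking network.

```python
import numpy as np

def td_chain(reward=1.0, n_states=5, episodes=60, alpha=0.1, gamma=0.9,
             floor=-0.1):
    """TD(0) on a deterministic chain whose last transition pays `reward`."""
    v = np.zeros(n_states + 1)                 # index n_states: terminal, V=0
    for _ in range(episodes):
        for s in range(n_states):              # walk from state 0 to terminal
            r = reward if s == n_states - 1 else 0.0
            delta = r + gamma * v[s + 1] - v[s]
            v[s] += alpha * max(delta, floor)  # clipped, asymmetric TD signal
    return v[:n_states]
```

With a positive reward, the prediction errors are mostly positive and unaffected by the floor, so the values converge quickly; with a negative reward, every update is clipped to a small decrement, so the value estimate lags far behind its target for the same number of episodes.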
|
49
|
Democratic population decisions result in robust policy-gradient learning: a parametric study with GPU simulations. PLoS One 2011; 6:e18539. [PMID: 21572529 PMCID: PMC3087717 DOI: 10.1371/journal.pone.0018539] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2010] [Accepted: 03/03/2011] [Indexed: 11/28/2022] Open
Abstract
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task, and architectural limitations raise the question of whether investing effort in this direction is worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which this architecture and learning rule demonstrate the best performance. Our work indicates that networks featuring strong Mexican-hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a “non-democratic” mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons “vote” independently (“democratically”) for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult to carry out on a desktop computer without GPU programming. We present the routines developed for this purpose and show that they provide a speed-up of 5× to 42× over optimised Python code. The highest speed-up is achieved when we exploit the parallelism of the GPU in the search for learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated.
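The “democratic” readout referred to above is the classical population vector: each neuron votes for its preferred direction, weighted by its firing rate. A minimal rate-based sketch (the tuning curve and all numbers are illustrative assumptions, not the paper's spiking network):

```python
import numpy as np

def population_vector(rates, preferred_angles):
    """Circular 'vote': sum each neuron's preferred direction, rate-weighted."""
    x = np.sum(rates * np.cos(preferred_angles))
    y = np.sum(rates * np.sin(preferred_angles))
    return np.arctan2(y, x)

n = 100
prefs = np.linspace(-np.pi, np.pi, n, endpoint=False)  # uniform preferences
true_angle = 0.7
rates = np.exp(np.cos(prefs - true_angle) / 0.3)       # von-Mises-like tuning
decoded = population_vector(rates, prefs)
```

With uniformly spaced preferred directions and symmetric tuning, the decoded angle matches the encoded one essentially exactly; the paper's point is that this independent-vote readout also makes the policy-gradient learning signal more robust than a winner-take-all bump.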
|
50
|
Vassiliades V, Cleanthous A, Christodoulou C. Multiagent reinforcement learning: spiking and nonspiking agents in the iterated Prisoner's Dilemma. ACTA ACUST UNITED AC 2011; 22:639-53. [PMID: 21421435 DOI: 10.1109/tnn.2011.2111384] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
This paper investigates multiagent reinforcement learning (MARL) in a general-sum game where the payoff structure is such that the agents are required to exploit each other in a way that benefits all agents. The contradictory nature of these games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome. The spiking agents are neural networks with leaky integrate-and-fire neurons trained with two different learning algorithms: 1) reinforcement of stochastic synaptic transmission, or 2) reward-modulated spike-timing-dependent plasticity with eligibility trace. The nonspiking agents use a tabular representation and are trained with Q- and SARSA learning algorithms, with a novel reward transformation process also being applied to the Q-learning agents. According to the results, the cooperative outcome is enhanced by: 1) transformed internal reinforcement signals and a combination of a high learning rate and a low discount factor with an appropriate exploration schedule in the case of nonspiking agents, and 2) a longer eligibility trace time constant in the case of spiking agents. Moreover, it is shown that spiking and nonspiking agents behave similarly and can therefore be used equally well in a multiagent interaction setting. For training the spiking agents in the case where more than one output neuron competes for reinforcement, a novel and necessary modification that enhances competition is applied to the two learning algorithms, in order to avoid possible synaptic saturation. This is done by administering additional global reinforcement signals to the networks for every spike of the output neurons that were not "responsible" for the preceding decision.
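For the nonspiking side, the tabular agents can be sketched with standard Q-learning over memory-1 states (the last joint action). The code below is a generic, hypothetical implementation with the usual payoffs T=5, R=3, P=1, S=0 — not the paper's exact agents or its reward transformation.

```python
import numpy as np

# Standard Prisoner's Dilemma payoffs for the row player (T=5, R=3, P=1, S=0).
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ['C', 'D']

def q_learn_vs(opponent_policy, steps=5000, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning over memory-1 states (my last action, opponent's).

    States 0..3 encode the last joint action; state 4 is the start state.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((5, 2))
    state, last = 4, ('C', 'C')
    for _ in range(steps):
        if rng.random() < epsilon:
            a = int(rng.integers(2))         # epsilon-greedy exploration
        else:
            a = int(np.argmax(q[state]))
        my, opp = ACTIONS[a], opponent_policy(last)
        r = PAYOFF[(my, opp)]
        next_state = 2 * a + ACTIONS.index(opp)
        q[state, a] += alpha * (r + gamma * q[next_state].max() - q[state, a])
        state, last = next_state, (my, opp)
    return q
```

Against an unconditional defector, the learned Q-values should prefer defection in every visited state, which is the game-theoretic best response; the paper's interest is in the richer regimes (reward transformations, exploration schedules) where mutual cooperation can instead be sustained.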
|