1
Regimes and mechanisms of transient amplification in abstract and biological neural networks. PLoS Comput Biol 2022; 18:e1010365. PMID: 35969604; PMCID: PMC9377633; DOI: 10.1371/journal.pcbi.1010365.
Abstract
Neuronal networks encode information through patterns of activity that define the networks’ function. The neurons’ activity relies on specific connectivity structures, yet the link between structure and function is not fully understood. Here, we tackle this structure-function problem with a new conceptual approach. Instead of manipulating the connectivity directly, we focus on upper triangular matrices, which represent the network dynamics in a given orthonormal basis obtained by the Schur decomposition. This abstraction allows us to independently manipulate the eigenspectrum and feedforward structures of a connectivity matrix. Using this method, we describe a diverse repertoire of non-normal transient amplification, and to complement the analysis of the dynamical regimes, we quantify the geometry of output trajectories through the effective rank of both the eigenvector and the dynamics matrices. Counter-intuitively, we find that shrinking the eigenspectrum’s imaginary distribution leads to highly amplifying regimes in linear and long-lasting dynamics in nonlinear networks. We also find a trade-off between amplification and dimensionality of neuronal dynamics, i.e., trajectories in neuronal state-space. Networks that can amplify a large number of orthogonal initial conditions produce neuronal trajectories that lie in the same subspace of the neuronal state-space. Finally, we examine networks of excitatory and inhibitory neurons. We find that the strength of global inhibition is directly linked with the amplitude of amplification, such that weakening inhibitory weights also decreases amplification, and that the eigenspectrum’s imaginary distribution grows with an increase in the ratio between excitatory-to-inhibitory and excitatory-to-excitatory connectivity strengths. Consequently, the strength of global inhibition reveals itself as a strong signature for amplification and a potential control mechanism to switch dynamical regimes. 
Our results shed light on how biological networks, i.e., networks constrained by Dale’s law, may be optimised for specific dynamical regimes. The architecture of a neuronal network lies at the heart of its dynamic behaviour, or in other words, the function of the system. However, the relationship between changes in the architecture and their effect on the dynamics, a structure-function problem, is still poorly understood. Here, we approach this problem by studying a rotated connectivity matrix that is easier to manipulate and interpret. We focus our analysis on a dynamical regime that arises from the biological property that neurons are usually not connected symmetrically, which may result in a non-normal connectivity matrix. Our techniques unveil distinct expressions of the dynamical regime of non-normal amplification. Moreover, we devise a way to analyse the geometry of the dynamics: we assign a single number to a network that quantifies how dissimilar its repertoire of behaviours can be. Finally, using our approach, we can close the loop back to the original neuronal architecture and find that biologically plausible networks use the strength of inhibition and the excitatory-to-inhibitory connectivity strength to navigate the different dynamical regimes of non-normal amplification.
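The Schur-based view described in this abstract can be sketched numerically. The following is a minimal illustration, not the paper's model: the 2x2 connectivity matrix and initial condition are invented, chosen so the eigenspectrum is stable while the feedforward (strictly upper-triangular) part of the Schur form drives transient amplification.

```python
import numpy as np
from scipy.linalg import schur, expm

# Illustrative connectivity (eigenvalues -1 and -2, i.e., a stable spectrum)
# with a strong effective feedforward link between Schur modes.
W = np.array([[0.0, 12.0],
              [-1.0 / 6.0, -3.0]])

# Schur decomposition: W = Q T Q^T with Q orthonormal and T upper triangular.
# The diagonal of T carries the eigenspectrum; the strictly upper-triangular
# part carries the feedforward structure manipulated in the paper.
T, Q = schur(W)

# Linear dynamics dx/dt = W x: despite the stable spectrum, the norm of x(t)
# transiently grows (non-normal amplification) before decaying to zero.
x0 = np.array([0.0, 1.0])
ts = np.linspace(0.0, 3.0, 200)
norms = [np.linalg.norm(expm(W * t) @ x0) for t in ts]
# max(norms) peaks near 3x the initial norm around t = ln(2).
```

The same construction extends to large networks: fixing the diagonal of `T` while reshaping its upper triangle manipulates feedforward structure independently of the eigenspectrum.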
2
Ladosz P, Ben-Iwhiwhu E, Dick J, Ketz N, Kolouri S, Krichmar JL, Pilly PK, Soltoggio A. Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:2045-2056. PMID: 34559664; DOI: 10.1109/tnnls.2021.3110281.
Abstract
In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems, which we term confounding POMDPs. In these POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, because the TD error cannot be easily derived from observations. We solve these problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a deep Q-network (DQN), which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, the DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. The proposed architecture thus combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved on DQN's results and even outperformed control tests with advantage actor-critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on the most difficult POMDPs with confounding stimuli and sparse rewards.
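The modulated-Hebbian half of this idea can be sketched as follows. This is a hedged toy, not the paper's MOHN equations: the layer sizes, trace rule, and learning rate are invented assumptions, and it shows only the core mechanism of eligibility traces bridging a delayed, sparse reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy modulated-Hebbian layer: pre/post coincidences leave a decaying
# eligibility trace; a delayed scalar reward converts accumulated traces
# into actual weight changes, bridging the action-reward temporal gap.
n_in, n_out = 6, 3
W = rng.normal(0.0, 0.1, (n_out, n_in))
W_start = W.copy()
trace = np.zeros_like(W)
decay, lr = 0.9, 0.05              # illustrative values

for step in range(20):
    pre = rng.random(n_in)
    post = W @ pre
    trace = decay * trace + np.outer(post, pre)  # Hebbian eligibility
    reward = 1.0 if step == 19 else 0.0          # sparse, delayed reward
    W += lr * reward * trace                     # modulated consolidation

# Until the reward arrives, the traces accumulate but W stays untouched.
```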
3
Triche A, Maida AS, Kumar A. Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration-exploitation balance with bio-inspired neural networks. Neural Netw 2022; 151:16-33. DOI: 10.1016/j.neunet.2022.03.021.
4
Calderon CB, Verguts T, Frank MJ. Thunderstruck: The ACDC model of flexible sequences and rhythms in recurrent neural circuits. PLoS Comput Biol 2022; 18:e1009854. PMID: 35108283; PMCID: PMC8843237; DOI: 10.1371/journal.pcbi.1009854.
Abstract
Adaptive sequential behavior is a hallmark of human cognition. In particular, humans can learn to produce precise spatiotemporal sequences given a certain context. For instance, musicians can not only reproduce learned action sequences in a context-dependent manner, but can also quickly and flexibly reapply them in any desired tempo or rhythm without overwriting previous learning. Existing neural network models fail to account for these properties. We argue that this limitation emerges from the fact that sequence information (i.e., the position of the action) and timing (i.e., the moment of response execution) are typically stored in the same neural network weights. Here, we augment a biologically plausible recurrent neural network of cortical dynamics to include a basal ganglia-thalamic module which uses reinforcement learning to dynamically modulate action. This “associative cluster-dependent chain” (ACDC) model modularly stores sequence and timing information in distinct loci of the network. This feature increases computational power and allows ACDC to display a wide range of temporal properties (e.g., multiple sequences, temporal shifting, rescaling, and compositionality), while still accounting for several behavioral and neurophysiological empirical observations. Finally, we apply this ACDC network to show how it can learn the famous “Thunderstruck” song intro and then flexibly play it in a “bossa nova” rhythm without further training.

How do humans flexibly adapt action sequences? For instance, musicians can learn a song and quickly speed up or slow down the tempo, or even play the song following a completely different rhythm (e.g., a rock song using a bossa nova rhythm). In this work, we build a biologically plausible network of cortico-basal ganglia interactions that explains how this temporal flexibility may emerge in the brain. Crucially, our model factorizes sequence order and action timing, respectively represented in cortical and basal ganglia dynamics. This factorization allows full temporal flexibility, i.e., the timing of a learned action sequence can be recomposed without interfering with the order of the sequence. As such, our model is capable of learning asynchronous action sequences and of flexibly shifting, rescaling, and recomposing them, while accounting for biological data.
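The factorization at the heart of this abstract can be shown in a few lines. This is only an illustrative sketch of the principle (the note names and onset times are invented; ACDC itself stores order in cortical clusters and timing in basal ganglia dynamics): because order and timing live in separate representations, the timing can be rescaled or replaced without rewriting the sequence.

```python
# Order and timing stored separately, mirroring the cortical-chain /
# basal-ganglia-gating split described in the abstract.
sequence = ["B", "E", "A", "D"]      # which action comes next (order)
onsets = [0.0, 0.5, 1.0, 1.5]        # when each action fires, in seconds

def replay(seq, times, tempo_scale=1.0, pattern=None):
    """Rescale or re-pattern timing without touching sequence order."""
    new_times = pattern if pattern is not None else [t * tempo_scale for t in times]
    return list(zip(new_times, seq))

double_tempo = replay(sequence, onsets, tempo_scale=0.5)       # same order, 2x speed
swung = replay(sequence, onsets, pattern=[0.0, 0.4, 1.0, 1.4])  # same order, new rhythm
```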
Affiliation(s)
- Cristian Buc Calderon
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, Rhode Island, United States of America
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Carney Institute for Brain Science, Brown University, Providence, Rhode Island, United States of America
- Tom Verguts
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Michael J. Frank
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, Rhode Island, United States of America
- Carney Institute for Brain Science, Brown University, Providence, Rhode Island, United States of America
5
6
Miconi T. Biologically plausible learning in recurrent neural networks reproduces neural dynamics observed during cognitive tasks. eLife 2017; 6. PMID: 28230528; PMCID: PMC5398889; DOI: 10.7554/elife.20899.
Abstract
Neural activity during cognitive tasks exhibits complex dynamics that flexibly encode task-relevant variables. Chaotic recurrent networks, which spontaneously generate rich dynamics, have been proposed as a model of cortical computation during cognitive tasks. However, existing methods for training these networks are either biologically implausible or require a continuous, real-time error signal to guide learning. Here we show that a biologically plausible learning rule can train such recurrent networks, guided solely by delayed, phasic rewards at the end of each trial. Networks endowed with this learning rule can successfully learn nontrivial tasks requiring flexible (context-dependent) associations, memory maintenance, nonlinear mixed selectivities, and coordination among multiple outputs. The resulting networks replicate complex dynamics previously observed in animal cortex, such as dynamic encoding of task features and selective integration of sensory inputs. We conclude that recurrent neural networks offer a plausible model of cortical dynamics during both learning and performance of flexible behavior.
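A reward-modulated rule of this flavor can be sketched as follows. This is a hedged simplification, not the paper's exact rule: the network size, noise scale, task, eligibility trace, and learning rate are all invented assumptions; what it preserves is that exploration comes from activity perturbations and that the only teaching signal is a delayed, end-of-trial reward compared against a running baseline.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 30
W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))  # recurrent weights
baseline, lr, target = 0.0, 1e-3, 0.5          # target output for unit 0

for trial in range(100):
    x = np.zeros(N)
    elig = np.zeros((N, N))
    for t in range(50):
        noise = 0.1 * rng.normal(size=N)       # exploratory perturbation
        r = np.tanh(x)                         # firing rates
        x = x + 0.1 * (-x + W @ r + noise)     # recurrent dynamics
        elig += np.outer(noise, r)             # perturbation x presynaptic rate
    reward = -(np.tanh(x[0]) - target) ** 2    # delayed, phasic, end-of-trial
    W += lr * (reward - baseline) * elig       # reward-gated weight update
    baseline = 0.9 * baseline + 0.1 * reward   # running reward baseline
```

No real-time error signal ever reaches the synapses; only the scalar `reward - baseline` at trial end gates the accumulated eligibility.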
Affiliation(s)
- Thomas Miconi
- The Neurosciences Institute, California, United States
7
Burms J, Caluwaerts K, Dambre J. Reward-Modulated Hebbian Plasticity as Leverage for Partially Embodied Control in Compliant Robotics. Front Neurorobot 2015; 9:9. PMID: 26347645; PMCID: PMC4538293; DOI: 10.3389/fnbot.2015.00009.
Abstract
In embodied computation (or morphological computation), part of the complexity of motor control is offloaded to the body dynamics. We demonstrate that a simple Hebbian-like learning rule can be used to train systems with (partial) embodiment, and can be extended outside of the scope of traditional neural networks. To this end, we apply the learning rule to optimize the connection weights of recurrent neural networks with different topologies and for various tasks. We then apply this learning rule to a simulated compliant tensegrity robot by optimizing static feedback controllers that directly exploit the dynamics of the robot body. This leads to partially embodied controllers, i.e., hybrid controllers that naturally integrate the computations performed by the robot body into a neural network architecture. Our results demonstrate the universal applicability of reward-modulated Hebbian learning, as well as the robustness of systems trained with it. This study strengthens our belief that compliant robots can be seen as computational units rather than as dumb hardware that needs a complex controller. This link between compliant robotics and neural networks is also the main reason for our search for simple universal learning rules for both neural networks and robotics.
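The idea of training a static feedback controller with reward-modulated exploration can be sketched on a toy plant. Everything here is an invented stand-in for the tensegrity setup (a damped point mass instead of a robot body, made-up gains and reward): perturbations of the controller weights that correlate with above-baseline reward are consolidated, in the reward-modulated Hebbian spirit.

```python
import numpy as np

rng = np.random.default_rng(2)

def episode(K):
    """Run a toy plant under static feedback u = tanh(K . state)."""
    x, v, cost = 1.0, 0.0, 0.0
    for _ in range(200):
        u = np.tanh(K[0] * x + K[1] * v)   # bounded static feedback control
        v += 0.05 * (u - 0.1 * v)          # invented damped-mass dynamics
        x += 0.05 * v
        cost += x * x                      # penalize distance from origin
    return -cost                           # reward = negative quadratic cost

K = np.zeros(2)                            # controller "weights"
baseline = episode(K)
for _ in range(200):
    noise = 0.1 * rng.normal(size=2)       # perturb the controller weights
    r = episode(K + noise)
    K += 0.05 * (r - baseline) * noise     # reward-modulated correlation
    baseline = 0.9 * baseline + 0.1 * r    # running baseline
```

The same loop applies unchanged whether `episode` wraps a neural network or a physical body, which is the sense in which the rule extends beyond traditional networks.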
Affiliation(s)
- Jeroen Burms
- Computing Systems Laboratory (Reservoir Team), Electronics and Information Systems Department (ELIS), Ghent University, Ghent, Belgium
- Ken Caluwaerts
- Computing Systems Laboratory (Reservoir Team), Electronics and Information Systems Department (ELIS), Ghent University, Ghent, Belgium; Intelligent Robotics Group, NASA Ames Research Center, Oak Ridge Associated Universities, Moffett Field, CA, USA
- Joni Dambre
- Computing Systems Laboratory (Reservoir Team), Electronics and Information Systems Department (ELIS), Ghent University, Ghent, Belgium
8
Chou TS, Bucci LD, Krichmar JL. Learning touch preferences with a tactile robot using dopamine modulated STDP in a model of insular cortex. Front Neurorobot 2015; 9:6. PMID: 26257639; PMCID: PMC4510776; DOI: 10.3389/fnbot.2015.00006.
Abstract
Neurorobots enable researchers to study how behaviors are produced by neural mechanisms in an uncertain, noisy, real-world environment. To investigate how the somatosensory system processes noisy, real-world touch inputs, we introduce a neurorobot called CARL-SJR, which has a full-body tactile sensory area. CARL-SJR is designed to encourage people to communicate with it through gentle touch, and it provides feedback to users by displaying bright colors on its surface. In the present study, we show that CARL-SJR is capable of learning associations between conditioned stimuli (CS; a color pattern on its surface) and unconditioned stimuli (US; a preferred touch pattern) by applying a spiking neural network (SNN) with neurobiologically inspired plasticity. Specifically, we modeled the primary somatosensory cortex, prefrontal cortex, striatum, and the insular cortex, which is important for hedonic touch, to process noisy data generated directly from CARL-SJR's tactile sensory area. To facilitate learning, we applied dopamine-modulated Spike-Timing-Dependent Plasticity (STDP) to our simulated prefrontal cortex, striatum, and insular cortex. To cope with noisy, varying inputs, the SNN was tuned to produce traveling waves of activity that carried spatiotemporal information. Despite the noisy tactile sensors, spike trains, and variations in subject hand swipes, learning was quite robust. Further, insular cortex activity in the incremental pathway of the dopaminergic reward system allowed us to control CARL-SJR's preference for touch direction without heavily pre-processed inputs. The behaviors that emerged in this model match animal behaviors, in which touch is preferred in particular areas and directions. Thus, these results could serve as an explanation of the underlying neural mechanisms for developing tactile preferences and hedonic touch.
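Dopamine-modulated STDP, the plasticity rule named above, can be sketched for a single synapse. The time constants and amplitudes here are illustrative assumptions, not the paper's parameters: a causal pre-before-post spike pair tags the synapse with a decaying eligibility trace, and the weight changes only when dopamine arrives, the sooner the stronger.

```python
import numpy as np

tau_elig, a_plus, dt = 1.0, 0.1, 0.001   # illustrative values (s, a.u., s)

def run(pre_step, post_step, da_step, n_steps=3000):
    """Single-synapse DA-modulated STDP with a decaying eligibility trace."""
    w, elig = 0.5, 0.0
    for k in range(n_steps):
        elig *= np.exp(-dt / tau_elig)    # eligibility decays continuously
        if k == post_step and post_step > pre_step:
            elig += a_plus                # causal pre->post pair tags synapse
        if k == da_step:
            w += elig                     # dopamine converts the tag into LTP
    return w

w_early_da = run(pre_step=100, post_step=110, da_step=1000)  # DA soon after pair
w_late_da = run(pre_step=100, post_step=110, da_step=2900)   # DA much later
```

Because the tag decays, the same spike pairing is credited strongly by prompt dopamine and only weakly by late dopamine, which is what lets delayed reward still shape touch preferences.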
Affiliation(s)
- Ting-Shuo Chou
- Department of Computer Sciences, University of California, Irvine, Irvine, CA, USA
- Liam D Bucci
- Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, USA
- Jeffrey L Krichmar
- Department of Computer Sciences, University of California, Irvine, Irvine, CA, USA; Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, USA
9
Aswolinskiy W, Pipa G. RM-SORN: a reward-modulated self-organizing recurrent neural network. Front Comput Neurosci 2015; 9:36. PMID: 25852533; PMCID: PMC4371712; DOI: 10.3389/fncom.2015.00036.
Abstract
Neural plasticity plays an important role in learning and memory. Reward-modulation of plasticity offers an explanation for the ability of the brain to adapt its neural activity to achieve a rewarded goal. Here, we define a neural network model that learns through the interaction of Intrinsic Plasticity (IP) and reward-modulated Spike-Timing-Dependent Plasticity (STDP). IP enables the network to explore possible output sequences, and STDP, modulated by reward, reinforces the creation of the rewarded output sequences. The model is tested on tasks for prediction, recall, non-linear computation, pattern recognition, and sequence generation. It achieves performance comparable to networks trained with supervised learning, while using simple, biologically motivated plasticity rules and reward strategies. The results confirm the importance of investigating the interaction of several plasticity rules in the context of reward-modulated learning, and raise the question of whether reward-modulated self-organization can explain the remarkable capabilities of the brain.
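The intrinsic-plasticity half of this interaction can be sketched in isolation. This is a simplified stand-in (unit count, rates, and the random drive are invented, and the reward-modulated STDP partner rule is omitted): IP nudges each unit's threshold so it fires near a target rate, sustaining the exploration that reward-modulated STDP then exploits.

```python
import numpy as np

rng = np.random.default_rng(3)

n, target_rate, eta_ip = 50, 0.1, 0.01
thresh = rng.random(n)                          # per-unit firing thresholds

rates = []
for step in range(2000):
    drive = rng.random(n)                       # stand-in for network input
    spikes = (drive > thresh).astype(float)
    thresh += eta_ip * (spikes - target_rate)   # IP: homeostatic threshold rule
    rates.append(spikes.mean())

late_rate = float(np.mean(rates[-500:]))        # settles near target_rate
```

The fixed point of the threshold update is exactly the target rate: units that fire too often raise their threshold, and units that fall silent lower it.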
Affiliation(s)
- Witali Aswolinskiy
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Gordon Pipa
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
10
Soltoggio A. Short-term plasticity as cause-effect hypothesis testing in distal reward learning. Biological Cybernetics 2015; 109:75-94. PMID: 25189158; DOI: 10.1007/s00422-014-0628-0.
Abstract
Asynchrony, overlaps, and delays in sensory-motor signals introduce ambiguity as to which stimuli, actions, and rewards are causally related. Only the repetition of reward episodes helps distinguish true cause-effect relationships from coincidental occurrences. In the model proposed here, a novel plasticity rule employs short- and long-term changes to evaluate hypotheses about cause-effect relationships. Transient weights represent hypotheses that are consolidated in long-term memory only when they consistently predict or cause future rewards. The main objective of the model is to preserve existing network topologies when learning with ambiguous information flows. Learning is further improved by biasing the exploration of the stimulus-response space toward actions that, in the past, occurred before rewards. The model indicates under which conditions beliefs can be consolidated in long-term memory, suggests a solution to the plasticity-stability dilemma, and proposes an interpretation of the role of short-term plasticity.
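The transient-weight mechanism can be sketched deterministically. The increments, decay factor, and consolidation criterion below are illustrative assumptions, not the paper's equations: a short-term weight encodes a cause-effect hypothesis, repeated confirmation consolidates it into the long-term weight, and disconfirmed hypotheses decay without ever disturbing long-term memory.

```python
def simulate(paired_episodes):
    """Track one synapse across episodes; each entry says whether the cue
    was actually followed by a reward in that episode."""
    w_long, w_short, streak = 0.0, 0.0, 0
    for paired in paired_episodes:
        if paired:                       # hypothesis confirmed this episode
            w_short = min(1.0, w_short + 0.3)
            streak += 1
        else:                            # hypothesis disconfirmed
            w_short *= 0.5               # transient weight decays fast
            streak = 0
        if streak >= 5:                  # consistent prediction: consolidate
            w_long = max(w_long, w_short)
    return w_long

causal = simulate([True] * 40)                      # true cause-effect pair
chance = simulate([i % 3 == 0 for i in range(40)])  # sporadic coincidences
```

Only the consistently confirmed pairing ever reaches long-term memory; coincidental pairings raise and lose the transient weight without touching the long-term topology.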
Affiliation(s)
- Andrea Soltoggio
- Computer Science Department, Loughborough University, Loughborough, LE11 3TU, UK
11
Soltoggio A, Lemme A, Reinhart F, Steil JJ. Rare neural correlations implement robotic conditioning with delayed rewards and disturbances. Front Neurorobot 2013; 7:6. PMID: 23565092; PMCID: PMC3613617; DOI: 10.3389/fnbot.2013.00006.
Abstract
Neural conditioning associates cues and actions with subsequent rewards. The environments in which robots operate, however, are pervaded by a variety of disturbing stimuli and uncertain timing. In particular, variable reward delays make it difficult to reconstruct which previous actions are responsible for subsequent rewards. Such uncertainty is handled by biological neural networks, but represents a challenge for computational models, suggesting the lack of a satisfactory theory for robotic neural conditioning. The present study demonstrates the use of rare neural correlations in making correct associations between rewards and previous cues or actions. Rare correlations are functional in selecting sparse synapses to be made eligible for later weight updates if a reward occurs. The repetition of this process singles out the associating and reward-triggering pathways, thereby coping with distal rewards. The neural network displays macro-level classical and operant conditioning, which is demonstrated in an interactive real-life human-robot interaction. The proposed mechanism models realistic conditioning in humans and animals and implements similar behaviors in neuro-robotic platforms.
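The rare-correlations principle can be sketched in a few lines. The sizes and the threshold below are illustrative assumptions: only unusually strong pre/post coincidences, a sparse minority, make synapses eligible, so a delayed reward credits a handful of candidate pathways instead of the whole network.

```python
import numpy as np

rng = np.random.default_rng(5)

n_syn = 200
pre = rng.random(n_syn)                    # presynaptic activities
post = rng.random(n_syn)                   # postsynaptic activities
coincidence = pre * post
theta = np.quantile(coincidence, 0.97)     # rare-event threshold (assumed)
eligible = coincidence > theta             # sparse eligibility flags

w = np.zeros(n_syn)
delayed_reward = 1.0                       # reward arrives some time later
w += 0.1 * delayed_reward * eligible       # only the rare synapses update
```

Repeating this over episodes leaves the consistently reward-preceding pathways with accumulated weight while chance coincidences spread their small updates thinly, which is how the mechanism copes with distal rewards.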
Affiliation(s)
- Andrea Soltoggio
- Faculty of Technology, Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld, Germany
12
From modulated Hebbian plasticity to simple behavior learning through noise and weight saturation. Neural Netw 2012; 34:28-41. DOI: 10.1016/j.neunet.2012.06.005.