151
Bermudez-Contreras E. Deep reinforcement learning to study spatial navigation, learning and memory in artificial and biological agents. Biol Cybern 2021; 115:131-134. [PMID: 33564968] [DOI: 10.1007/s00422-021-00862-0]
Abstract
Despite the recent advancements and popularity of deep learning that has resulted from the advent of numerous industrial applications, artificial neural networks (ANNs) still lack crucial features from their biological counterparts that could improve their performance and their potential to advance our understanding of how the brain works. One avenue that has been proposed to change this is to strengthen the interaction between artificial intelligence (AI) research and neuroscience. Since their historical beginnings, ANNs and AI, in general, have developed in close alignment with both neuroscience and psychology. In addition to deep learning, reinforcement learning (RL) is another approach that is strongly linked to AI and neuroscience to understand how learning is implemented in the brain. In a recently published article, Botvinick et al. (Neuron, 107:603-616, 2020) explain why deep reinforcement learning (DRL) is important for neuroscience as a framework to study learning, representations and decision making. Here, I summarise Botvinick et al.'s main arguments and frame them in the context of the study of learning, memory and spatial navigation. I believe that applying this approach to study spatial navigation can provide useful insights for the understanding of how the brain builds, processes and stores representations of the outside world to extract knowledge.

152
Raman DV, O'Leary T. Frozen algorithms: how the brain's wiring facilitates learning. Curr Opin Neurobiol 2021; 67:207-214. [PMID: 33508698] [PMCID: PMC8202511] [DOI: 10.1016/j.conb.2020.12.017]
Abstract
Synapses and neural connectivity are plastic and shaped by experience. But to what extent does connectivity itself influence the ability of a neural circuit to learn? Insights from optimization theory and AI shed light on how learning can be implemented in neural circuits. Though abstract in their nature, learning algorithms provide a principled set of hypotheses on the necessary ingredients for learning in neural circuits. These include the kinds of signals and circuit motifs that enable learning from experience, as well as an appreciation of the constraints that make learning challenging in a biological setting. Remarkably, some simple connectivity patterns can boost the efficiency of relatively crude learning rules, showing how the brain can use anatomy to compensate for the biological constraints of known synaptic plasticity mechanisms. Modern connectomics provides rich data for exploring this principle, and may reveal how brain connectivity is constrained by the requirement to learn efficiently.

Affiliation(s)
- Dhruva V Raman, Department of Engineering, University of Cambridge, United Kingdom
- Timothy O'Leary, Department of Engineering, University of Cambridge, United Kingdom

153
Starkweather CK, Uchida N. Dopamine signals as temporal difference errors: recent advances. Curr Opin Neurobiol 2021; 67:95-105. [PMID: 33186815] [PMCID: PMC8107188] [DOI: 10.1016/j.conb.2020.08.014]
Abstract
In the brain, dopamine is thought to drive reward-based learning by signaling temporal difference reward prediction errors (TD errors), a 'teaching signal' used to train computers. Recent studies using optogenetic manipulations have provided multiple pieces of evidence supporting that phasic dopamine signals function as TD errors. Furthermore, novel experimental results have indicated that when the current state of the environment is uncertain, dopamine neurons compute TD errors using 'belief states' or a probability distribution over potential states. It remains unclear how belief states are computed but emerging evidence suggests involvement of the prefrontal cortex and the hippocampus. These results refine our understanding of the role of dopamine in learning and the algorithms by which dopamine functions in the brain.
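
For readers unfamiliar with the formalism, the minimal Python sketch below illustrates the temporal difference (TD) error described above, first over a single state value and then over a "belief state" (a probability distribution across candidate hidden states). The numbers, state values and discount factor are invented for illustration; this is not code from the cited study.

```python
import numpy as np

def td_error(reward, value_next, value_current, gamma=0.95):
    """Temporal-difference reward prediction error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current

def td_error_belief(reward, belief_next, belief_current, values, gamma=0.95):
    """TD error over a belief state: state values are weighted by the probability
    assigned to each candidate hidden state (toy example)."""
    v_next = float(belief_next @ values)
    v_current = float(belief_current @ values)
    return reward + gamma * v_next - v_current

# Two hidden states ("reward is coming" vs. "no reward"), with values V = [1.0, 0.0]
values = np.array([1.0, 0.0])
delta = td_error_belief(reward=0.0,
                        belief_next=np.array([0.9, 0.1]),
                        belief_current=np.array([0.5, 0.5]),
                        values=values)
print(round(delta, 3))  # positive: the belief shifted toward the rewarding state
```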

Affiliation(s)
- Clara Kwon Starkweather, Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Naoshige Uchida, Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA

154

155

156

157
Banerjee A, Rikhye RV, Marblestone A. Reinforcement-guided learning in frontal neocortex: emerging computational concepts. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.02.019]

158
Patai EZ, Spiers HJ. The Versatile Wayfinder: Prefrontal Contributions to Spatial Navigation. Trends Cogn Sci 2021; 25:520-533. [PMID: 33752958] [DOI: 10.1016/j.tics.2021.02.010]
Abstract
The prefrontal cortex (PFC) supports decision-making, goal tracking, and planning. Spatial navigation is a behavior that taxes these cognitive processes, yet the role of the PFC in models of navigation has been largely overlooked. In humans, activity in dorsolateral PFC (dlPFC) and ventrolateral PFC (vlPFC) during detours, reveal a role in inhibition and replanning. Dorsal anterior cingulate cortex (dACC) is implicated in planning and spontaneous internally-generated changes of route. Orbitofrontal cortex (OFC) integrates representations of the environment with the value of actions, providing a 'map' of possible decisions. In rodents, medial frontal areas interact with hippocampus during spatial decisions and switching between navigation strategies. In reviewing these advances, we provide a framework for how different prefrontal regions may contribute to different stages of navigation.

Affiliation(s)
- Eva Zita Patai, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, UK; Institute of Behavioural Neuroscience, Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, UK
- Hugo J Spiers, Institute of Behavioural Neuroscience, Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, UK

159
Silva C, Porter BS, Hillman KL. Stimulation in the Rat Anterior Insula and Anterior Cingulate During an Effortful Weightlifting Task. Front Neurosci 2021; 15:643384. [PMID: 33716659] [PMCID: PMC7952617] [DOI: 10.3389/fnins.2021.643384]
Abstract
When performing tasks, animals must continually assess how much effort is being expended, and gage this against ever-changing physiological states. As effort costs mount, persisting in the task may be unwise. The anterior cingulate cortex (ACC) and the anterior insular cortex are implicated in this process of cost-benefit decision-making, yet their precise contributions toward driving effortful persistence are not well understood. Here we investigated whether electrical stimulation of the ACC or insular cortex would alter effortful persistence in a novel weightlifting task (WLT). In the WLT an animal is challenged to pull a rope 30 cm to trigger food reward dispensing. To make the action increasingly effortful, 45 g of weight is progressively added to the rope after every 10 successful pulls. The animal can quit the task at any point - with the rope weight at the time of quitting taken as the "break weight." Ten male Sprague-Dawley rats were implanted with stimulating electrodes in either the ACC [cingulate cortex area 1 (Cg1) in rodent] or anterior insula and then assessed in the WLT during stimulation. Low-frequency (10 Hz), high-frequency (130 Hz), and sham stimulations were performed. We predicted that low-frequency stimulation (LFS) of Cg1 in particular would increase persistence in the WLT. Contrary to our predictions, LFS of Cg1 resulted in shorter session duration, lower break weights, and fewer attempts on the break weight. High-frequency stimulation of Cg1 led to an increase in time spent off-task. LFS of the anterior insula was associated with a marginal increase in attempts on the break weight. Taken together our data suggest that stimulation of the rodent Cg1 during an effortful task alters certain aspects of effortful behavior, while insula stimulation has little effect.

Affiliation(s)
- Kristin L. Hillman, Department of Psychology, Brain Health Research Centre, University of Otago, Dunedin, New Zealand

160
Cross L, Cockburn J, Yue Y, O'Doherty JP. Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron 2021; 109:724-738.e7. [PMID: 33326755] [PMCID: PMC7897245] [DOI: 10.1016/j.neuron.2020.11.021]
Abstract
Humans possess an exceptional aptitude to efficiently make decisions from high-dimensional sensory observations. However, it is unknown how the brain compactly represents the current state of the environment to guide this process. The deep Q-network (DQN) achieves this by capturing highly nonlinear mappings from multivariate inputs to the values of potential actions. We deployed DQN as a model of brain activity and behavior in participants playing three Atari video games during fMRI. Hidden layers of DQN exhibited a striking resemblance to voxel activity in a distributed sensorimotor network, extending throughout the dorsal visual pathway into posterior parietal cortex. Neural state-space representations emerged from nonlinear transformations of the pixel space bridging perception to action and reward. These transformations reshape axes to reflect relevant high-level features and strip away information about task-irrelevant sensory features. Our findings shed light on the neural encoding of task representations for decision-making in real-world situations.
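
As a schematic of the kind of mapping a deep Q-network (DQN) performs, the toy Python sketch below passes a flattened "frame" through a hidden layer to obtain one value per action and selects an action epsilon-greedily. The layer sizes, random weights and flattened input are illustrative stand-ins; the network used in the study is a trained convolutional DQN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the DQN mapping: flattened "pixels" -> hidden layer -> action values.
n_pixels, n_hidden, n_actions = 84 * 84, 128, 4
W1 = rng.normal(0, 0.01, (n_hidden, n_pixels))
W2 = rng.normal(0, 0.01, (n_actions, n_hidden))

def q_values(frame):
    """Nonlinear mapping from a multivariate input to the values of candidate actions."""
    hidden = np.maximum(0.0, W1 @ frame)   # ReLU hidden layer ("state representation")
    return W2 @ hidden                      # one value estimate per action

def epsilon_greedy(q, epsilon=0.1):
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))    # occasional random exploration
    return int(np.argmax(q))                # otherwise take the highest-valued action

frame = rng.random(n_pixels)                # a fake flattened game frame
q = q_values(frame)
print(q.round(4), epsilon_greedy(q))
```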

Affiliation(s)
- Logan Cross, Computation and Neural Systems, California Institute of Technology, Pasadena, CA 91125, USA
- Jeff Cockburn, Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- Yisong Yue, Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- John P O'Doherty, Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA

161
Baram AB, Muller TH, Nili H, Garvert MM, Behrens TEJ. Entorhinal and ventromedial prefrontal cortices abstract and generalize the structure of reinforcement learning problems. Neuron 2021; 109:713-723.e7. [PMID: 33357385] [PMCID: PMC7889496] [DOI: 10.1016/j.neuron.2020.11.024]
Abstract
Knowledge of the structure of a problem, such as relationships between stimuli, enables rapid learning and flexible inference. Humans and other animals can abstract this structural knowledge and generalize it to solve new problems. For example, in spatial reasoning, shortest-path inferences are immediate in new environments. Spatial structural transfer is mediated by cells in entorhinal and (in humans) medial prefrontal cortices, which maintain their co-activation structure across different environments and behavioral states. Here, using fMRI, we show that entorhinal and ventromedial prefrontal cortex (vmPFC) representations perform a much broader role in generalizing the structure of problems. We introduce a task-remapping paradigm, where subjects solve multiple reinforcement learning (RL) problems differing in structural or sensory properties. We show that, as with space, entorhinal representations are preserved across different RL problems only if task structure is preserved. In vmPFC and ventral striatum, representations of prediction error also depend on task structure.

Affiliation(s)
- Alon Boaz Baram, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Timothy Howard Muller, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Hamed Nili, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Mona Maria Garvert, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK; Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany
- Timothy Edward John Behrens, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK; Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3AR, UK

162
Alexandre F. A global framework for a systemic view of brain modeling. Brain Inform 2021; 8:3. [PMID: 33591440] [PMCID: PMC7886931] [DOI: 10.1186/s40708-021-00126-4]
Abstract
The brain is a complex system, due to the heterogeneity of its structure, the diversity of the functions in which it participates and to its reciprocal relationships with the body and the environment. A systemic description of the brain is presented here, as a contribution to developing a brain theory and as a general framework where specific models in computational neuroscience can be integrated and associated with global information flows and cognitive functions. In an enactive view, this framework integrates the fundamental organization of the brain in sensorimotor loops with the internal and the external worlds, answering four fundamental questions (what, why, where and how). Our survival-oriented definition of behavior gives a prominent role to pavlovian and instrumental conditioning, augmented during phylogeny by the specific contribution of other kinds of learning, related to semantic memory in the posterior cortex, episodic memory in the hippocampus and working memory in the frontal cortex. This framework highlights that responses can be prepared in different ways, from pavlovian reflexes and habitual behavior to deliberations for goal-directed planning and reasoning, and explains that these different kinds of responses coexist, collaborate and compete for the control of behavior. It also lays emphasis on the fact that cognition can be described as a dynamical system of interacting memories, some acting to provide information to others, to replace them when they are not efficient enough, or to help for their improvement. Describing the brain as an architecture of learning systems has also strong implications in Machine Learning. Our biologically informed view of pavlovian and instrumental conditioning can be very precious to revisit classical Reinforcement Learning and provide a basis to ensure really autonomous learning.

Affiliation(s)
- Frederic Alexandre, INRIA Bordeaux Sud-Ouest, Talence, France; Institute of Neurodegenerative Diseases, University of Bordeaux, CNRS UMR 5293, 146 rue Leo Saignat, 33076 Bordeaux, France; LaBRI, University of Bordeaux, Bordeaux INP, CNRS UMR 5800, Talence, France

163
Fang H, Zeng Y, Zhao F. Brain Inspired Sequences Production by Spiking Neural Networks With Reward-Modulated STDP. Front Comput Neurosci 2021; 15:612041. [PMID: 33664661] [PMCID: PMC7921721] [DOI: 10.3389/fncom.2021.612041]
Abstract
Understanding and producing embedded sequences according to supra-regular grammars in language has always been considered a high-level cognitive function of human beings, named "syntax barrier" between humans and animals. However, some neurologists recently showed that macaques could be trained to produce embedded sequences involving supra-regular grammars through a well-designed experiment paradigm. Via comparing macaques and preschool children's experimental results, they claimed that human uniqueness might only lie in the speed and learning strategy resulting from the chunking mechanism. Inspired by their research, we proposed a Brain-inspired Sequence Production Spiking Neural Network (SP-SNN) to model the same production process, followed by memory and learning mechanisms of the multi-brain region cooperation. After experimental verification, we demonstrated that SP-SNN could also handle embedded sequence production tasks, striding over the "syntax barrier." SP-SNN used Population-Coding and STDP mechanism to realize working memory, Reward-Modulated STDP mechanism for acquiring supra-regular grammars. Therefore, SP-SNN needs to simultaneously coordinate short-term plasticity (STP) and long-term plasticity (LTP) mechanisms. Besides, we found that the chunking mechanism indeed makes a difference in improving our model's robustness. As far as we know, our work is the first one toward the "syntax barrier" in the SNN field, providing the computational foundation for further study of related underlying animals' neural mechanisms in the future.
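
The sketch below illustrates the general logic of reward-modulated STDP referred to above: spike-timing-dependent pairings accumulate in an eligibility trace, and a reward signal gates whether that trace is committed to the synaptic weight. Parameter values and time constants are arbitrary illustrations, not those of the SP-SNN model.

```python
import numpy as np

def stdp_kernel(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Pair-based STDP window: potentiation if pre precedes post (dt > 0), else depression."""
    return a_plus * np.exp(-dt / tau) if dt > 0 else -a_minus * np.exp(dt / tau)

def r_stdp_update(w, pre_post_dts, reward, eligibility, lr=0.01, tau_e=200.0, dt_step=1.0):
    """Reward-modulated STDP: spike pairings feed an eligibility trace,
    and a (dopamine-like) reward signal gates the actual weight change."""
    eligibility *= np.exp(-dt_step / tau_e)       # decay the trace over time
    for dt in pre_post_dts:                       # accumulate candidate changes
        eligibility += stdp_kernel(dt)
    w += lr * reward * eligibility                # commit only when reward arrives
    return w, eligibility

w, e = 0.5, 0.0
w, e = r_stdp_update(w, pre_post_dts=[5.0, -12.0], reward=1.0, eligibility=e)
print(round(w, 4), round(e, 4))
```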

Affiliation(s)
- Hongjian Fang, Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing, China
- Yi Zeng, Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing, China; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Feifei Zhao, Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China

164
The Best Laid Plans: Computational Principles of Anterior Cingulate Cortex. Trends Cogn Sci 2021; 25:316-329. [PMID: 33593641] [DOI: 10.1016/j.tics.2021.01.008]
Abstract
Despite continual debate for the past 30 years about the function of anterior cingulate cortex (ACC), its key contribution to neurocognition remains unknown. However, recent computational modeling work has provided insight into this question. Here we review computational models that illustrate three core principles of ACC function, related to hierarchy, world models, and cost. We also discuss four constraints on the neural implementation of these principles, related to modularity, binding, encoding, and learning and regulation. These observations suggest a role for ACC in hierarchical model-based hierarchical reinforcement learning (HMB-HRL), which instantiates a mechanism motivating the execution of high-level plans.

165
Abstract
A large body of work has linked dopaminergic signaling to learning and reward processing. It stresses the role of dopamine in reward prediction error signaling, a key neural signal that allows us to learn from past experiences, and that facilitates optimal choice behavior. Latterly, it has become clear that dopamine does not merely code prediction error size but also signals the difference between the expected value of rewards, and the value of rewards actually received, which is obtained through the integration of reward attributes such as the type, amount, probability and delay. More recent work has posited a role of dopamine in learning beyond rewards. These theories suggest that dopamine codes absolute or unsigned prediction errors, playing a key role in how the brain models associative regularities within its environment, while incorporating critical information about the reliability of those regularities. Work is emerging supporting this perspective and, it has inspired theoretical models of how certain forms of mental pathology may emerge in relation to dopamine function. Such pathology is frequently related to disturbed inferences leading to altered internal models of the environment. Thus, it is critical to understand the role of dopamine in error-related learning and inference.
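
The distinction drawn here between signed and unsigned prediction errors can be stated in two lines; the toy values below are purely illustrative.

```python
def prediction_errors(expected_value, outcome_value):
    """Signed reward prediction error and its unsigned (absolute) counterpart."""
    signed = outcome_value - expected_value   # better or worse than expected
    unsigned = abs(signed)                    # surprise, regardless of direction
    return signed, unsigned

print(prediction_errors(expected_value=0.8, outcome_value=0.2))  # (-0.6, 0.6)
```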

Affiliation(s)
- Kelly M. J. Diederen, Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Paul C. Fletcher, Department of Psychiatry, University of Cambridge, Cambridge, UK; Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, UK; Wellcome Trust MRC Institute of Metabolic Science, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK

166
Tomov MS, Schulz E, Gershman SJ. Multi-task reinforcement learning in humans. Nat Hum Behav 2021; 5:764-773. [PMID: 33510391] [DOI: 10.1038/s41562-020-01035-y]
Abstract
The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet not much is known about human multitask reinforcement learning. We study participants' behaviour in a two-step decision-making task with multiple features and changing reward functions. We compare their behaviour with two algorithms for multitask reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as to standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered confirmatory experiment, our results provide evidence that participants who are able to learn the task use a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.
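
One common way to formalise "mapping previous policies and encountered features to new reward functions" is with successor features: each stored policy keeps its expected discounted feature counts, and a new task's reward weights score the old policies directly. The sketch below shows that generic idea with made-up numbers; it is not the specific pair of algorithms compared in the paper.

```python
import numpy as np

# Expected discounted feature counts ("successor features") for two previously
# learned policies; names and numbers are hypothetical.
psi = {
    "policy_A": np.array([4.0, 1.0, 0.5]),
    "policy_B": np.array([0.5, 3.0, 2.0]),
}
w_new = np.array([0.0, 1.0, 1.0])   # new task: only features 2 and 3 are rewarded

# Value of each old policy under the new reward function, without relearning.
values = {name: float(features @ w_new) for name, features in psi.items()}
best = max(values, key=values.get)
print(values, "-> reuse", best)
```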

Affiliation(s)
- Momchil S Tomov, Program in Neuroscience, Harvard Medical School, Boston, MA, USA; Center for Brain Science, Harvard University, Cambridge, MA, USA
- Eric Schulz, Max Planck Institute for Biological Cybernetics, Tübingen, Germany; Department of Psychology, Harvard University, Cambridge, MA, USA
- Samuel J Gershman, Center for Brain Science, Harvard University, Cambridge, MA, USA; Department of Psychology, Harvard University, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA

167
Bari BA, Cohen JY. Dynamic decision making and value computations in medial frontal cortex. Int Rev Neurobiol 2021; 158:83-113. [PMID: 33785157] [DOI: 10.1016/bs.irn.2020.12.001]
Abstract
Dynamic decision making requires an intact medial frontal cortex. Recent work has combined theory and single-neuron measurements in frontal cortex to advance models of decision making. We review behavioral tasks that have been used to study dynamic decision making and algorithmic models of these tasks using reinforcement learning theory. We discuss studies linking neurophysiology and quantitative decision variables. We conclude with hypotheses about the role of other cortical and subcortical structures in dynamic decision making, including ascending neuromodulatory systems.
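
A typical algorithmic model for the dynamic two-choice tasks reviewed here combines incremental value updates with softmax choice. The simulation below is a generic sketch of that kind of model; the learning rate, inverse temperature and reward probabilities are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(trials=200, alpha=0.2, beta=3.0, p_reward=(0.7, 0.3)):
    """Two-armed bandit with prediction-error value learning and softmax choice."""
    q = np.zeros(2)
    choices = []
    for _ in range(trials):
        probs = np.exp(beta * q) / np.exp(beta * q).sum()   # softmax decision rule
        c = int(rng.choice(2, p=probs))
        r = float(rng.random() < p_reward[c])               # probabilistic reward
        q[c] += alpha * (r - q[c])                          # prediction-error update
        choices.append(c)
    return q, np.mean(choices)

q_final, frac_option_1 = simulate()
print(q_final.round(2), round(1 - frac_option_1, 2))  # learned values, preference for richer option
```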

Affiliation(s)
- Bilal A Bari, The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, United States
- Jeremiah Y Cohen, The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, United States

168
Pouncy T, Tsividis P, Gershman SJ. What Is the Model in Model-Based Planning? Cogn Sci 2021; 45:e12928. [PMID: 33398907] [DOI: 10.1111/cogs.12928]
Abstract
Flexibility is one of the hallmarks of human problem-solving. In everyday life, people adapt to changes in common tasks with little to no additional training. Much of the existing work on flexibility in human problem-solving has focused on how people adapt to tasks in new domains by drawing on solutions from previously learned domains. In real-world tasks, however, humans must generalize across a wide range of within-domain variation. In this work we argue that representational abstraction plays an important role in such within-domain generalization. We then explore the nature of this representational abstraction in realistically complex tasks like video games by demonstrating how the same model-based planning framework produces distinct generalization behaviors under different classes of task representation. Finally, we compare the behavior of agents with these task representations to humans in a series of novel grid-based video game tasks. Our results provide evidence for the claim that within-domain flexibility in humans derives from task representations composed of propositional rules written in terms of objects and relational categories.

Affiliation(s)
- Thomas Pouncy, Department of Psychology and Center for Brain Science, Harvard University
- Pedro Tsividis, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Samuel J Gershman, Department of Psychology and Center for Brain Science, Harvard University; Center for Brains, Minds and Machines, Massachusetts Institute of Technology

169
Tessereau C, O’Dea R, Coombes S, Bast T. Reinforcement learning approaches to hippocampus-dependent flexible spatial navigation. Brain Neurosci Adv 2021; 5:2398212820975634. [PMID: 33954259] [PMCID: PMC8042550] [DOI: 10.1177/2398212820975634]
Abstract
Humans and non-human animals show great flexibility in spatial navigation, including the ability to return to specific locations based on as few as one single experience. To study spatial navigation in the laboratory, watermaze tasks, in which rats have to find a hidden platform in a pool of cloudy water surrounded by spatial cues, have long been used. Analogous tasks have been developed for human participants using virtual environments. Spatial learning in the watermaze is facilitated by the hippocampus. In particular, rapid, one-trial, allocentric place learning, as measured in the delayed-matching-to-place variant of the watermaze task, which requires rodents to learn repeatedly new locations in a familiar environment, is hippocampal dependent. In this article, we review some computational principles, embedded within a reinforcement learning framework, that utilise hippocampal spatial representations for navigation in watermaze tasks. We consider which key elements underlie their efficacy, and discuss their limitations in accounting for hippocampus-dependent navigation, both in terms of behavioural performance (i.e. how well do they reproduce behavioural measures of rapid place learning) and neurobiological realism (i.e. how well do they map to neurobiological substrates involved in rapid place learning). We discuss how an actor-critic architecture, enabling simultaneous assessment of the value of the current location and of the optimal direction to follow, can reproduce one-trial place learning performance as shown on watermaze and virtual delayed-matching-to-place tasks by rats and humans, respectively, if complemented with map-like place representations. The contribution of actor-critic mechanisms to delayed-matching-to-place performance is consistent with neurobiological findings implicating the striatum and hippocampo-striatal interaction in delayed-matching-to-place performance, given that the striatum has been associated with actor-critic mechanisms. Moreover, we illustrate that hierarchical computations embedded within an actor-critic architecture may help to account for aspects of flexible spatial navigation. The hierarchical reinforcement learning approach separates trajectory control via a temporal-difference error from goal selection via a goal prediction error and may account for flexible, trial-specific, navigation to familiar goal locations, as required in some arm-maze place memory tasks, although it does not capture one-trial learning of new goal locations, as observed in open field, including watermaze and virtual, delayed-matching-to-place tasks. Future models of one-shot learning of new goal locations, as observed on delayed-matching-to-place tasks, should incorporate hippocampal plasticity mechanisms that integrate new goal information with allocentric place representation, as such mechanisms are supported by substantial empirical evidence.
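
The actor-critic scheme discussed above can be sketched compactly: Gaussian "place-cell" features feed a critic that estimates location value via a TD error and an actor that adjusts direction preferences with the same error. Everything below (arena size, number of cells, learning rates) is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cells, n_actions = 100, 8
centres = rng.uniform(0, 1, (n_cells, 2))   # place-field centres in a unit arena
w_critic = np.zeros(n_cells)                 # value weights (critic)
w_actor = np.zeros((n_actions, n_cells))     # direction-preference weights (actor)

def features(pos, sigma=0.1):
    """Gaussian place-cell activity for a position in the arena."""
    d2 = np.sum((centres - pos) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

def step(pos, reward, next_pos, gamma=0.95, lr=0.1):
    phi, phi_next = features(pos), features(next_pos)
    prefs = w_actor @ phi
    probs = np.exp(prefs - prefs.max()); probs /= probs.sum()
    action = int(rng.choice(n_actions, p=probs))                      # actor: softmax over directions
    delta = reward + gamma * (w_critic @ phi_next) - (w_critic @ phi)  # critic: TD error
    w_critic[:] += lr * delta * phi                                    # update value estimate
    w_actor[action] += lr * delta * phi                                # reinforce the chosen direction
    return action, delta

a, d = step(pos=np.array([0.2, 0.3]), reward=1.0, next_pos=np.array([0.25, 0.30]))
print(a, round(float(d), 3))
```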

Affiliation(s)
- Charline Tessereau, School of Mathematical Sciences, University of Nottingham, Nottingham, UK; School of Psychology, University of Nottingham, Nottingham, UK; Neuroscience@Nottingham
- Reuben O’Dea, School of Mathematical Sciences, University of Nottingham, Nottingham, UK; Neuroscience@Nottingham
- Stephen Coombes, School of Mathematical Sciences, University of Nottingham, Nottingham, UK; Neuroscience@Nottingham
- Tobias Bast, School of Psychology, University of Nottingham, Nottingham, UK; Neuroscience@Nottingham

170
Schilling M, Paskarbeit J, Ritter H, Schneider A, Cruse H. From Adaptive Locomotion to Predictive Action Selection – Cognitive Control for a Six-Legged Walker. IEEE Trans Robot 2021. [DOI: 10.1109/tro.2021.3106832]

171
Márton CD, Schultz SR, Averbeck BB. Learning to select actions shapes recurrent dynamics in the corticostriatal system. Neural Netw 2020; 132:375-393. [PMID: 32992244] [PMCID: PMC7685243] [DOI: 10.1016/j.neunet.2020.09.008]
Abstract
Learning to select appropriate actions based on their values is fundamental to adaptive behavior. This form of learning is supported by fronto-striatal systems. The dorsal-lateral prefrontal cortex (dlPFC) and the dorsal striatum (dSTR), which are strongly interconnected, are key nodes in this circuitry. Substantial experimental evidence, including neurophysiological recordings, have shown that neurons in these structures represent key aspects of learning. The computational mechanisms that shape the neurophysiological responses, however, are not clear. To examine this, we developed a recurrent neural network (RNN) model of the dlPFC-dSTR circuit and trained it on an oculomotor sequence learning task. We compared the activity generated by the model to activity recorded from monkey dlPFC and dSTR in the same task. This network consisted of a striatal component which encoded action values, and a prefrontal component which selected appropriate actions. After training, this system was able to autonomously represent and update action values and select actions, thus being able to closely approximate the representational structure in corticostriatal recordings. We found that learning to select the correct actions drove action-sequence representations further apart in activity space, both in the model and in the neural data. The model revealed that learning proceeds by increasing the distance between sequence-specific representations. This makes it more likely that the model will select the appropriate action sequence as learning develops. Our model thus supports the hypothesis that learning in networks drives the neural representations of actions further apart, increasing the probability that the network generates correct actions as learning proceeds. Altogether, this study advances our understanding of how neural circuit dynamics are involved in neural computation, revealing how dynamics in the corticostriatal system support task learning.

Affiliation(s)
- Christian D Márton, Centre for Neurotechnology & Department of Bioengineering, Imperial College London, London SW7 2AZ, UK; Laboratory of Neuropsychology, Section on Learning and Decision Making, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
- Simon R Schultz, Centre for Neurotechnology & Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Bruno B Averbeck, Laboratory of Neuropsychology, Section on Learning and Decision Making, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA

172
Piette C, Touboul J, Venance L. Engrams of Fast Learning. Front Cell Neurosci 2020; 14:575915. [PMID: 33250712] [PMCID: PMC7676431] [DOI: 10.3389/fncel.2020.575915]
Abstract
Fast learning designates the behavioral and neuronal mechanisms underlying the acquisition of a long-term memory trace after a unique and brief experience. As such it is opposed to incremental, slower reinforcement or procedural learning requiring repetitive training. This learning process, found in most animal species, exists in a large spectrum of natural behaviors, such as one-shot associative, spatial, or perceptual learning, and is a core principle of human episodic memory. We review here the neuronal and synaptic long-term changes associated with fast learning in mammals and discuss some hypotheses related to their underlying mechanisms. We first describe the variety of behavioral paradigms used to test fast learning memories: those preferentially involve a single and brief (from few hundred milliseconds to few minutes) exposures to salient stimuli, sufficient to trigger a long-lasting memory trace and new adaptive responses. We then focus on neuronal activity patterns observed during fast learning and the emergence of long-term selective responses, before documenting the physiological correlates of fast learning. In the search for the engrams of fast learning, a growing body of evidence highlights long-term changes in gene expression, structural, intrinsic, and synaptic plasticities. Finally, we discuss the potential role of the sparse and bursting nature of neuronal activity observed during the fast learning, especially in the induction plasticity mechanisms leading to the rapid establishment of long-term synaptic modifications. We conclude with more theoretical perspectives on network dynamics that could enable fast learning, with an overview of some theoretical approaches in cognitive neuroscience and artificial intelligence.

Affiliation(s)
- Charlotte Piette, Center for Interdisciplinary Research in Biology, College de France, INSERM U1050, CNRS UMR7241, Université PSL, Paris, France; Department of Mathematics and Volen National Center for Complex Systems, Brandeis University, Waltham, MA, United States
- Jonathan Touboul, Department of Mathematics and Volen National Center for Complex Systems, Brandeis University, Waltham, MA, United States
- Laurent Venance, Center for Interdisciplinary Research in Biology, College de France, INSERM U1050, CNRS UMR7241, Université PSL, Paris, France

173
Shen X, Zhang X, Huang Y, Chen S, Wang Y. Task Learning Over Multi-Day Recording via Internally Rewarded Reinforcement Learning Based Brain Machine Interfaces. IEEE Trans Neural Syst Rehabil Eng 2020; 28:3089-3099. [PMID: 33232240] [DOI: 10.1109/tnsre.2020.3039970]
Abstract
Autonomous brain machine interfaces (BMIs) aim to enable paralyzed people to self-evaluate their movement intention to control external devices. Previous reinforcement learning (RL)-based decoders interpret the mapping between neural activity and movements using the external reward for well-trained subjects, and have not investigated the task learning procedure. The brain has developed a learning mechanism to identify the correct actions that lead to rewards in the new task. This internal guidance can be utilized to replace the external reference to advance BMIs as an autonomous system. In this study, we propose to build an internally rewarded reinforcement learning-based BMI framework using the multi-site recording to demonstrate the autonomous learning ability of the BMI decoder on the new task. We test the model on the neural data collected over multiple days while the rats were learning a new lever discrimination task. The primary motor cortex (M1) and medial prefrontal cortex (mPFC) spikes are interpreted by the proposed RL framework into the discrete lever press actions. The neural activity of the mPFC post the action duration is interpreted as the internal reward information, where a support vector machine is implemented to classify the reward vs. non-reward trials with a high accuracy of 87.5% across subjects. This internal reward is used to replace the external water reward to update the decoder, which is able to adapt to the nonstationary neural activity during subject learning. The multi-cortical recording allows us to take in more cortical recordings as input and uses internal critics to guide the decoder learning. Comparing with the classic decoder using M1 activity as the only input and external guidance, the proposed system with multi-cortical recordings shows a better decoding accuracy. More importantly, our internally rewarded decoder demonstrates the autonomous learning ability on the new task as the decoder successfully addresses the time-variant neural patterns while subjects are learning, and works asymptotically as the subjects' behavioral learning progresses. It reveals the potential of endowing BMIs with autonomous task learning ability in the RL framework.
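
The "internal critic" step described above, classifying rewarded versus unrewarded trials from post-action neural activity and using the label in place of the external reward, can be sketched as follows on synthetic firing rates. The data, feature sizes and the linear-kernel choice are illustrative assumptions, not the study's pipeline.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)

# Synthetic "post-action" firing rates: reward modulates a subset of units.
n_trials, n_units = 200, 30
rates = rng.poisson(5.0, (n_trials, n_units)).astype(float)
labels = rng.integers(0, 2, n_trials)        # 1 = rewarded trial
rates[labels == 1, :10] += 3.0               # reward-related rate increase in 10 units

# Train the internal critic on early trials, then use its predictions as the
# stand-in reward signal for later trials.
clf = SVC(kernel="linear").fit(rates[:150], labels[:150])
internal_reward = clf.predict(rates[150:])
accuracy = (internal_reward == labels[150:]).mean()
print(round(float(accuracy), 2))
```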

174
Tsuda B, Tye KM, Siegelmann HT, Sejnowski TJ. A modeling framework for adaptive lifelong learning with transfer and savings through gating in the prefrontal cortex. Proc Natl Acad Sci U S A 2020; 117:29872-29882. [PMID: 33154155] [PMCID: PMC7703668] [DOI: 10.1073/pnas.2009591117]
Abstract
The prefrontal cortex encodes and stores numerous, often disparate, schemas and flexibly switches between them. Recent research on artificial neural networks trained by reinforcement learning has made it possible to model fundamental processes underlying schema encoding and storage. Yet how the brain is able to create new schemas while preserving and utilizing old schemas remains unclear. Here we propose a simple neural network framework that incorporates hierarchical gating to model the prefrontal cortex's ability to flexibly encode and use multiple disparate schemas. We show how gating naturally leads to transfer learning and robust memory savings. We then show how neuropsychological impairments observed in patients with prefrontal damage are mimicked by lesions of our network. Our architecture, which we call DynaMoE, provides a fundamental framework for how the prefrontal cortex may handle the abundance of schemas necessary to navigate the real world.
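
Hierarchical gating over specialised sub-networks is the core architectural idea here; the minimal mixture-of-experts sketch below shows a gating network softly routing an input across "expert" modules. The sizes, names and soft (rather than hard) gating are illustrative assumptions and do not reproduce the DynaMoE architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy gating over expert sub-networks (illustrative only).
n_in, n_experts, n_out = 10, 3, 4
experts = [rng.normal(0, 0.1, (n_out, n_in)) for _ in range(n_experts)]
gate_w = rng.normal(0, 0.1, (n_experts, n_in))

def forward(x):
    logits = gate_w @ x
    gates = np.exp(logits - logits.max()); gates /= gates.sum()   # softmax gating
    outputs = np.stack([W @ x for W in experts])                  # each expert's answer
    return gates @ outputs, gates                                 # gate-weighted combination

y, g = forward(rng.random(n_in))
print(y.round(3), g.round(3))
```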

Affiliation(s)
- Ben Tsuda, Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037; Neurosciences Graduate Program, University of California San Diego, La Jolla, CA 92093; Medical Scientist Training Program, University of California San Diego, La Jolla, CA 92093
- Kay M Tye, Systems Neuroscience Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037
- Hava T Siegelmann, Biologically Inspired Neural & Dynamical Systems Laboratory, School of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003
- Terrence J Sejnowski, Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037; Institute for Neural Computation, University of California San Diego, La Jolla, CA 92093; Division of Biological Sciences, University of California San Diego, La Jolla, CA 92093

175
Eckstein MK, Collins AGE. Computational evidence for hierarchically structured reinforcement learning in humans. Proc Natl Acad Sci U S A 2020; 117:29381-29389. [PMID: 33229518] [PMCID: PMC7703642] [DOI: 10.1073/pnas.1912330117]
Abstract
Humans have the fascinating ability to achieve goals in a complex and constantly changing world, still surpassing modern machine-learning algorithms in terms of flexibility and learning speed. It is generally accepted that a crucial factor for this ability is the use of abstract, hierarchical representations, which employ structure in the environment to guide learning and decision making. Nevertheless, how we create and use these hierarchical representations is poorly understood. This study presents evidence that human behavior can be characterized as hierarchical reinforcement learning (RL). We designed an experiment to test specific predictions of hierarchical RL using a series of subtasks in the realm of context-based learning and observed several behavioral markers of hierarchical RL, such as asymmetric switch costs between changes in higher-level versus lower-level features, faster learning in higher-valued compared to lower-valued contexts, and preference for higher-valued compared to lower-valued contexts. We replicated these results across three independent samples. We simulated three models-a classic RL, a hierarchical RL, and a hierarchical Bayesian model-and compared their behavior to human results. While the flat RL model captured some aspects of participants' sensitivity to outcome values, and the hierarchical Bayesian model captured some markers of transfer, only hierarchical RL accounted for all patterns observed in human behavior. This work shows that hierarchical RL, a biologically inspired and computationally simple algorithm, can capture human behavior in complex, hierarchical environments and opens the avenue for future research in this field.

Affiliation(s)
- Maria K Eckstein, Department of Psychology, University of California, Berkeley, CA 94704
- Anne G E Collins, Department of Psychology, University of California, Berkeley, CA 94704

176
Zhang Z, Cheng H, Yang T. A recurrent neural network framework for flexible and adaptive decision making based on sequence learning. PLoS Comput Biol 2020; 16:e1008342. [PMID: 33141824] [PMCID: PMC7673505] [DOI: 10.1371/journal.pcbi.1008342]
Abstract
The brain makes flexible and adaptive responses in a complicated and ever-changing environment for an organism's survival. To achieve this, the brain needs to understand the contingencies between its sensory inputs, actions, and rewards. This is analogous to the statistical inference that has been extensively studied in the natural language processing field, where recent developments of recurrent neural networks have found many successes. We wonder whether these neural networks, the gated recurrent unit (GRU) networks in particular, reflect how the brain solves the contingency problem. Therefore, we build a GRU network framework inspired by the statistical learning approach of NLP and test it with four exemplar behavior tasks previously used in empirical studies. The network models are trained to predict future events based on past events, both comprising sensory, action, and reward events. We show the networks can successfully reproduce animal and human behavior. The networks generalize the training, perform Bayesian inference in novel conditions, and adapt their choices when event contingencies vary. Importantly, units in the network encode task variables and exhibit activity patterns that match previous neurophysiology findings. Our results suggest that the neural network approach based on statistical sequence learning may reflect the brain's computational principle underlying flexible and adaptive behaviors and serve as a useful approach to understand the brain.
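
For reference, a single gated recurrent unit (GRU) step, the building block of the networks described above, looks like this in plain NumPy. The dimensions and random weights are placeholders; the study trains full GRU networks on sequences of sensory, action and reward events.

```python
import numpy as np

rng = np.random.default_rng(3)

n_in, n_hid = 8, 16
Wz, Uz = rng.normal(0, 0.1, (n_hid, n_in)), rng.normal(0, 0.1, (n_hid, n_hid))
Wr, Ur = rng.normal(0, 0.1, (n_hid, n_in)), rng.normal(0, 0.1, (n_hid, n_hid))
Wh, Uh = rng.normal(0, 0.1, (n_hid, n_in)), rng.normal(0, 0.1, (n_hid, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h):
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate hidden state
    return (1 - z) * h + z * h_tilde           # gated mixture of old and new

h = np.zeros(n_hid)
for event in rng.random((5, n_in)):            # a toy event sequence
    h = gru_step(event, h)
print(h[:4].round(3))
```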

Affiliation(s)
- Zhewei Zhang, Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China
- Huzi Cheng, Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China
- Tianming Yang, Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, China

177
Jin C, Chen W, Cao Y, Xu Z, Tan Z, Zhang X, Deng L, Zheng C, Zhou J, Shi H, Feng J. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nat Commun 2020; 11:5088. [PMID: 33037212] [DOI: 10.1101/823377]
Abstract
Early detection of COVID-19 based on chest CT enables timely treatment of patients and helps control the spread of the disease. We proposed an artificial intelligence (AI) system for rapid COVID-19 detection and performed extensive statistical analysis of CTs of COVID-19 based on the AI system. We developed and evaluated our system on a large dataset with more than 10 thousand CT volumes from COVID-19, influenza-A/B, non-viral community acquired pneumonia (CAP) and non-pneumonia subjects. In such a difficult multi-class diagnosis task, our deep convolutional neural network-based system is able to achieve an area under the receiver operating characteristic curve (AUC) of 97.81% for multi-way classification on test cohort of 3,199 scans, AUC of 92.99% and 93.25% on two publicly available datasets, CC-CCII and MosMedData respectively. In a reader study involving five radiologists, the AI system outperforms all of radiologists in more challenging tasks at a speed of two orders of magnitude above them. Diagnosis performance of chest x-ray (CXR) is compared to that of CT. Detailed interpretation of deep network is also performed to relate system outputs with CT presentations. The code is available at https://github.com/ChenWWWeixiang/diagnosis_covid19 .

Affiliation(s)
- Cheng Jin, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Weixiang Chen, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Yukun Cao, Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China
- Zhanwei Xu, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Zimeng Tan, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Xin Zhang, Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China
- Lei Deng, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Chuansheng Zheng, Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China
- Jie Zhou, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Heshui Shi, Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China
- Jianjiang Feng, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China

178
Pelekanos V, Premereur E, Mitchell DJ, Chakraborty S, Mason S, Lee ACH, Mitchell AS. Corticocortical and Thalamocortical Changes in Functional Connectivity and White Matter Structural Integrity after Reward-Guided Learning of Visuospatial Discriminations in Rhesus Monkeys. J Neurosci 2020; 40:7887-7901. [PMID: 32900835] [PMCID: PMC7548693] [DOI: 10.1523/jneurosci.0364-20.2020]
Abstract
The frontal cortex and temporal lobes together regulate complex learning and memory capabilities. Here, we collected resting-state functional and diffusion-weighted MRI data before and after male rhesus macaque monkeys received extensive training to learn novel visuospatial discriminations (reward-guided learning). We found functional connectivity changes in orbitofrontal, ventromedial prefrontal, inferotemporal, entorhinal, retrosplenial, and anterior cingulate cortices, the subicular complex, and the dorsal, medial thalamus. These corticocortical and thalamocortical changes in functional connectivity were accompanied by related white matter structural alterations in the uncinate fasciculus, fornix, and ventral prefrontal tract: tracts that connect (sub)cortical networks and are implicated in learning and memory processes in monkeys and humans. After the well-trained monkeys received fornix transection, they were impaired in learning new visuospatial discriminations. In addition, the functional connectivity profile that was observed after the training was altered. These changes were accompanied by white matter changes in the ventral prefrontal tract, although the integrity of the uncinate fasciculus remained unchanged. Our experiments highlight the importance of different communication relayed among corticocortical and thalamocortical circuitry for the ability to learn new visuospatial associations (learning-to-learn) and to make reward-guided decisions.SIGNIFICANCE STATEMENT Frontal neural networks and the temporal lobes contribute to reward-guided learning in mammals. Here, we provide novel insight by showing that specific corticocortical and thalamocortical functional connectivity is altered after rhesus monkeys received extensive training to learn novel visuospatial discriminations. Contiguous white matter fiber pathways linking these gray matter structures, namely, the uncinate fasciculus, fornix, and ventral prefrontal tract, showed structural changes after completing training in the visuospatial task. Additionally, different patterns of functional and structural connectivity are reported after removal of subcortical connections within the extended hippocampal system, via fornix transection. These results highlight the importance of both corticocortical and thalamocortical interactions in reward-guided learning in the normal brain and identify brain structures important for memory capabilities after injury.

Affiliation(s)
- Vassilis Pelekanos, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, United Kingdom
- Elsie Premereur, Laboratory for Neuro- and Psychophysiology, KU Leuven, 3000 Leuven, Belgium
- Daniel J Mitchell, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, United Kingdom
- Subhojit Chakraborty, Department of Neuroinflammation, Queen Square Multiple Sclerosis Centre, Institute of Neurology, University College London, London WC1N 3BG, United Kingdom
- Stuart Mason, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, United Kingdom
- Andy C H Lee, Department of Psychology (Scarborough), University of Toronto, Toronto, Ontario M1C 1A4, Canada; Rotman Research Institute, Baycrest Centre, Toronto, Ontario M6A 2E1, Canada
- Anna S Mitchell, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, United Kingdom

179
van Lieshout LLF, de Lange FP, Cools R. Why so curious? Quantifying mechanisms of information seeking. Curr Opin Behav Sci 2020. [DOI: 10.1016/j.cobeha.2020.08.005]

180
Park SA, Miller DS, Nili H, Ranganath C, Boorman ED. Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron 2020; 107:1226-1238.e8. [PMID: 32702288] [PMCID: PMC7529977] [DOI: 10.1016/j.neuron.2020.06.030]
Abstract
Cognitive maps enable efficient inferences from limited experience that can guide novel decisions. We tested whether the hippocampus (HC), entorhinal cortex (EC), and ventromedial prefrontal cortex (vmPFC)/medial orbitofrontal cortex (mOFC) organize abstract and discrete relational information into a cognitive map to guide novel inferences. Subjects learned the status of people in two unseen 2D social hierarchies, with each dimension learned on a separate day. Although one dimension was behaviorally relevant, multivariate activity patterns in HC, EC, and vmPFC/mOFC were linearly related to the Euclidean distance between people in the mentally reconstructed 2D space. Hubs created unique comparisons between the hierarchies, enabling inferences between novel pairs. We found that both behavior and neural activity in EC and vmPFC/mOFC reflected the Euclidean distance to the retrieved hub, which was reinstated in HC. These findings reveal how abstract and discrete relational structures are represented, are combined, and enable novel inferences in the human brain.
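As an editorial illustration of the kind of analysis this abstract describes, the sketch below relates a model representational dissimilarity matrix built from Euclidean distances in a reconstructed 2-D space to a "neural" dissimilarity matrix computed from synthetic activity patterns. The hierarchy size, voxel embedding, and noise level are assumptions for illustration, not the study's design or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4 social hierarchy: each person has a rank on two dimensions.
ranks = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
n_people = len(ranks)

# Model RDM: pairwise Euclidean distances in the reconstructed 2-D space.
model_rdm = np.linalg.norm(ranks[:, None, :] - ranks[None, :, :], axis=-1)

# Synthetic "multivoxel patterns": a noisy linear embedding of the 2-D ranks,
# standing in for activity patterns in HC / EC / vmPFC-mOFC.
embedding = rng.normal(size=(2, 50))                 # 2 latent dims -> 50 voxels
patterns = ranks @ embedding + rng.normal(scale=2.0, size=(n_people, 50))

# Neural RDM: correlation distance between the activity patterns of each pair.
neural_rdm = 1.0 - np.corrcoef(patterns)

# Relate the two RDMs over their upper triangles.
iu = np.triu_indices(n_people, k=1)
r = np.corrcoef(model_rdm[iu], neural_rdm[iu])[0, 1]
print(f"model-neural RDM correlation: r = {r:.2f}")
```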
Collapse
Affiliation(s)
- Seongmin A Park
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA; Center for Neuroscience, University of California, Davis, Davis, CA, USA.
| | - Douglas S Miller
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA; Center for Neuroscience, University of California, Davis, Davis, CA, USA
| | - Hamed Nili
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
| | - Charan Ranganath
- Center for Neuroscience, University of California, Davis, Davis, CA, USA; Department of Psychology, University of California, Davis, Davis, CA, USA
| | - Erie D Boorman
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA; Department of Psychology, University of California, Davis, Davis, CA, USA.
| |
Collapse
|
181
|
Mark S, Moran R, Parr T, Kennerley SW, Behrens TEJ. Transferring structural knowledge across cognitive maps in humans and models. Nat Commun 2020; 11:4783. [PMID: 32963219 PMCID: PMC7508979 DOI: 10.1038/s41467-020-18254-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Accepted: 08/14/2020] [Indexed: 01/15/2023] Open
Abstract
Relations between task elements often follow hidden underlying structural forms such as periodicities or hierarchies, whose inference fosters performance. However, transferring structural knowledge to novel environments requires flexible representations that are generalizable over particularities of the current environment, such as its stimuli and size. We suggest that humans represent structural forms as abstract basis sets and that, in novel tasks, the structural form is inferred and the relevant basis set is transferred. Using a computational model, we show that such a representation allows inference of the underlying structural form, important task states, effective behavioural policies and the existence of unobserved state-trajectories. In two experiments, participants learned three abstract graphs over two successive days. We tested how structural knowledge acquired on Day 1 affected Day 2 performance. In line with our model, participants who had a correct structural prior were able to infer the existence of unobserved state-trajectories and appropriate behavioural policies.
Collapse
Affiliation(s)
- Shirley Mark
- Wellcome Trust Centre for Neuroimaging, UCL. Queen Square 12, London, WC1N 3BG, UK.
| | - Rani Moran
- Max Planck UCL Center for Computational Psychiatry and Aging Research, Russell Square 10-12, London, WC1B 5EH, UK
| | - Thomas Parr
- Wellcome Trust Centre for Neuroimaging, UCL. Queen Square 12, London, WC1N 3BG, UK
| | - Steve W Kennerley
- Sobell Department of Motor Neuroscience, University College London, London, UK
| | - Timothy E J Behrens
- Wellcome Centre for Integrative Neuroimaging, Centre for Functional Magnetic Resonance Imaging of the Brain, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, 12 Queen Square, London, WC1N 3BG, UK
| |
Collapse
|
182
|
Feasibility Analysis and Application of Reinforcement Learning Algorithm Based on Dynamic Parameter Adjustment. ALGORITHMS 2020. [DOI: 10.3390/a13090239] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Reinforcement learning, as a branch of machine learning, is increasingly being applied in the control field. In practice, however, the hyperparameters of deep reinforcement learning networks are still set by the empirical trial-and-error procedures inherited from traditional machine learning (supervised and unsupervised learning). This approach ignores information generated as the agent explores the environment, which is contained in the updates of the reinforcement learning value function, and it can therefore harm the convergence and cumulative return of reinforcement learning. The reinforcement learning algorithm based on dynamic parameter adjustment is a new method for setting the learning-rate parameter of deep reinforcement learning. Building on the traditional way of setting parameters for reinforcement learning, the method analyzes the advantages of different learning rates at different stages of learning and dynamically adjusts the learning rate as a function of the temporal-difference (TD) error, so that each stage benefits from an appropriate learning rate and the algorithm becomes more practical to apply. By combining the Robbins-Monro approximation conditions with the deep reinforcement learning algorithm, it is shown that dynamically regulating the learning rate can, in theory, satisfy the convergence requirements of the intelligent control algorithm. In experiments on the continuous-control "Car-on-the-Hill" benchmark, a standard reinforcement learning test environment, the new method achieves better results than traditional reinforcement learning in practical application. Based on the model characteristics of deep reinforcement learning, a more suitable method for setting the network learning rate is therefore proposed, and its feasibility is demonstrated both in theory and in application, making this way of setting the learning-rate parameter worthy of further development and research.
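A minimal sketch of the general idea, not the paper's exact rule: tabular Q-learning on a toy chain task where the learning rate combines a 1/n decay (so the Robbins-Monro conditions still hold) with a bounded scaling by the magnitude of the TD error. The environment, exploration schedule, and constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions = 20, 2
Q = np.zeros((n_states, n_actions))
visit_count = np.zeros((n_states, n_actions))
gamma = 0.95

def step(state, action):
    # Toy chain environment (an assumption; the paper's benchmark is "Car-on-the-Hill").
    nxt = min(max(state + (1 if action == 1 else -1), 0), n_states - 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt in (0, n_states - 1)

for episode in range(500):
    state, done = n_states // 2, False
    while not done:
        if rng.random() < 0.1:
            action = int(rng.integers(n_actions))      # epsilon-greedy exploration
        else:
            action = int(np.argmax(Q[state]))
        nxt, reward, done = step(state, action)
        target = reward + (0.0 if done else gamma * np.max(Q[nxt]))
        td_error = target - Q[state, action]
        visit_count[state, action] += 1
        # The base rate decays as 1/n, so the Robbins-Monro conditions
        # (sum of rates diverges, sum of squared rates converges) still hold
        # after scaling by a bounded function of the TD-error magnitude.
        base_rate = 1.0 / visit_count[state, action]
        alpha = base_rate * (0.5 + min(abs(td_error), 1.0))
        Q[state, action] += alpha * td_error
        state = nxt

print("greedy action per state:", np.argmax(Q, axis=1))
```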
Collapse
|
183
|
Prior cortical activity differences during an action observation plus motor imagery task related to motor adaptation performance of a coordinated multi-limb complex task. Cogn Neurodyn 2020; 14:769-779. [PMID: 33101530 DOI: 10.1007/s11571-020-09633-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2020] [Revised: 08/24/2020] [Accepted: 09/01/2020] [Indexed: 12/16/2022] Open
Abstract
Motor adaptation is the ability to develop new motor skills that make it possible to perform a consolidated motor task under different psychophysical conditions. There is an established relationship between prior brain activity at rest and motor adaptation. However, brain activity at rest is highly variable both between and within subjects. Here we hypothesize that cortical activity during the original task to be later adapted is a more reliable and stronger determinant of motor adaptation. Consequently, we present a study to find cortical areas whose activity, both at rest and during first-person virtual reality simulation of bicycle riding, characterizes the subjects who did and did not adapt to ride a reverse-steering bicycle, a complex motor adaptation task involving all limbs and balance. The results showed that cortical activity differences during the simulated task were higher, more significant, spatially larger, and spectrally wider than at rest for good performers. In this sense, the left anterior insula, the left dorsolateral and ventrolateral inferior prefrontal areas, and the left inferior premotor cortex (an action-understanding hub of the mirror neuron circuit) are the areas whose activity during simulated bicycle riding is most descriptive of the ability to adapt the motor task. Trial registration: the trial was registered with the NIH Clinical Trials Registry (clinicaltrials.gov) under registration number NCT02999516 (21/12/2016).
Collapse
|
184
|
Collins AGE, Cockburn J. Beyond dichotomies in reinforcement learning. Nat Rev Neurosci 2020; 21:576-586. [PMID: 32873936 DOI: 10.1038/s41583-020-0355-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/20/2020] [Indexed: 11/09/2022]
Abstract
Reinforcement learning (RL) is a framework of particular importance to psychology, neuroscience and machine learning. Interactions between these fields, as promoted through the common hub of RL, have facilitated paradigm shifts that relate multiple levels of analysis in a singular framework (for example, relating dopamine function to a computationally defined RL signal). Recently, more sophisticated RL algorithms have been proposed to better account for human learning, and in particular its oft-documented reliance on two separable systems: a model-based (MB) system and a model-free (MF) system. However, along with many benefits, this dichotomous lens can distort questions, and may contribute to an unnecessarily narrow perspective on learning and decision-making. Here, we outline some of the consequences that come from overconfidently mapping algorithms, such as MB versus MF RL, onto putative cognitive processes. We argue that the field is well positioned to move beyond simplistic dichotomies, and we propose a means of refocusing research questions towards the rich and complex components that comprise learning and decision-making.
Collapse
Affiliation(s)
- Anne G E Collins
- Department of Psychology and the Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
| | - Jeffrey Cockburn
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
| |
Collapse
|
185
|
Cortese A, Lau H, Kawato M. Unconscious reinforcement learning of hidden brain states supported by confidence. Nat Commun 2020; 11:4429. [PMID: 32868772 PMCID: PMC7459278 DOI: 10.1038/s41467-020-17828-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Accepted: 07/13/2020] [Indexed: 12/11/2022] Open
Abstract
Can humans be trained to make strategic use of latent representations in their own brains? We investigate how human subjects can derive reward-maximizing choices from intrinsic high-dimensional information represented stochastically in neural activity. Reward contingencies are defined in real time by fMRI multivoxel patterns; optimal action policies thereby depend on multidimensional brain activity taking place below the threshold of consciousness, by design. We find that subjects can solve the task within two hundred trials and errors, as their reinforcement learning processes interact with metacognitive functions (quantified as the meaningfulness of their decision confidence). Computational modelling and multivariate analyses identify a frontostriatal neural mechanism by which the brain may untangle the 'curse of dimensionality': synchronization of confidence representations in prefrontal cortex with reward prediction errors in basal ganglia supports exploration of latent task representations. These results may provide an alternative starting point for future investigations into unconscious learning and functions of metacognition.
Collapse
Affiliation(s)
- Aurelio Cortese
- Computational Neuroscience Laboratories, ATR Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan.
| | - Hakwan Lau
- Department of Psychology, UCLA, 1285 Franz Hall, Los Angeles, CA, 90095, USA
- Brain Research Institute, UCLA, 695 Charles E Young Dr S, Los Angeles, CA, 90095, USA
- Department of Psychology, University of Hong Kong, 627, The Jockey Club Tower, Pok Fu Lam Rd, Pok Fu Lam, Hong Kong
- State Key Laboratory for Brain and Cognitive Sciences, University of Hong Kong, 5 Sassoon Rd, Pok Fu Lam, Hong Kong
| | - Mitsuo Kawato
- Computational Neuroscience Laboratories, ATR Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan.
- RIKEN Center for Advanced Intelligence Project, ATR Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-Gun, Kyoto, 619-0288, Japan.
| |
Collapse
|
186
|
Trial-by-trial dynamics of reward prediction error-associated signals during extinction learning and renewal. Prog Neurobiol 2020; 197:101901. [PMID: 32846162 DOI: 10.1016/j.pneurobio.2020.101901] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2020] [Revised: 07/06/2020] [Accepted: 08/18/2020] [Indexed: 11/24/2022]
Abstract
Reward prediction errors (RPEs) have been suggested to drive associative learning processes, but their precise temporal dynamics at the single-neuron level remain elusive. Here, we studied the neural correlates of RPEs, focusing on their trial-by-trial dynamics during an operant extinction learning paradigm. Within a single behavioral session, pigeons went through acquisition, extinction and renewal - the context-dependent response recovery after extinction. We recorded single units from the avian prefrontal cortex analogue, the nidopallium caudolaterale (NCL) and found that the omission of reward during extinction led to a peak of population activity that moved backwards in time as trials progressed. The chronological order of these signal changes during the progress of learning was indicative of temporal shifts of RPE signals that started during reward omission and then moved backwards to the presentation of the conditioned stimulus. Switches from operant choices to avoidance behavior (and vice versa) coincided with changes in population activity during the animals' decision-making. On the single unit level, we found more diverse patterns where some neurons' activity correlated with RPE signals whereas others correlated with the absolute value during the outcome period. Finally, we demonstrated that mere sensory contextual changes during the renewal test were sufficient to elicit signals likely associated with RPEs. Thus, RPEs are truly expectancy-driven since they can be elicited by changes in reward expectation, without an actual change in the quality or quantity of reward.
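The backward migration of prediction-error signals described in this abstract is the classic signature of temporal-difference learning. The sketch below simulates TD(0) with a tapped-delay-line stimulus representation and prints where the prediction error peaks at different stages of training; it is a textbook illustration of the temporal shift, not a model of the recorded NCL data.

```python
import numpy as np

# TD(0) with a tapped-delay-line stimulus representation: one value weight per
# time step between CS onset and reward.  Early in training the prediction
# error peaks at the time of reward; with learning it migrates backwards
# towards the (unpredicted) CS onset.
n_timesteps = 10            # CS at step 0, reward delivered at the final step
w = np.zeros(n_timesteps)   # value weight for each post-CS time step
alpha, gamma = 0.3, 1.0
snapshots = {}

for trial in range(1, 201):
    delta = np.zeros(n_timesteps + 1)
    # Transition from the pre-CS baseline (value fixed at 0) into the CS.
    delta[0] = gamma * w[0]
    for t in range(n_timesteps):
        reward = 1.0 if t == n_timesteps - 1 else 0.0
        v_next = w[t + 1] if t + 1 < n_timesteps else 0.0
        delta[t + 1] = reward + gamma * v_next - w[t]
        w[t] += alpha * delta[t + 1]
    if trial in (1, 10, 50, 200):
        snapshots[trial] = delta.copy()

for trial, delta in snapshots.items():
    peak = int(np.argmax(delta))
    label = ("CS onset" if peak == 0
             else "reward time" if peak == n_timesteps
             else f"time step {peak}")
    print(f"trial {trial:3d}: prediction error peaks at {label}")
```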
Collapse
|
187
|
Klos C, Kalle Kossio YF, Goedeke S, Gilra A, Memmesheimer RM. Dynamical Learning of Dynamics. PHYSICAL REVIEW LETTERS 2020; 125:088103. [PMID: 32909804 DOI: 10.1103/physrevlett.125.088103] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/07/2019] [Revised: 06/24/2020] [Accepted: 07/21/2020] [Indexed: 06/11/2023]
Abstract
The ability of humans and animals to quickly adapt to novel tasks is difficult to reconcile with the standard paradigm of learning by slow synaptic weight modification. Here, we show that fixed-weight neural networks can learn to generate required dynamics by imitation. After appropriate weight pretraining, the networks quickly and dynamically adapt to learn new tasks and thereafter continue to achieve them without further teacher feedback. We explain this ability and illustrate it with a variety of target dynamics, ranging from oscillatory trajectories to driven and chaotic dynamical systems.
Collapse
Affiliation(s)
- Christian Klos
- Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, 53115 Bonn, Germany
| | | | - Sven Goedeke
- Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, 53115 Bonn, Germany
| | - Aditya Gilra
- Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, 53115 Bonn, Germany
- Department of Computer Science, and Neuroscience Institute, University of Sheffield, Sheffield S1 4DP, United Kingdom
| | - Raoul-Martin Memmesheimer
- Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, 53115 Bonn, Germany
| |
Collapse
|
188
|
Diaconescu AO, Stecy M, Kasper L, Burke CJ, Nagy Z, Mathys C, Tobler PN. Neural arbitration between social and individual learning systems. eLife 2020; 9:54051. [PMID: 32779568 PMCID: PMC7476763 DOI: 10.7554/elife.54051] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2019] [Accepted: 08/10/2020] [Indexed: 12/20/2022] Open
Abstract
Decision making requires integrating knowledge gathered from personal experiences with advice from others. The neural underpinnings of the process of arbitrating between information sources have not been fully elucidated. In this study, we formalized arbitration as the relative precision of predictions, afforded by each learning system, using hierarchical Bayesian modeling. In a probabilistic learning task, participants predicted the outcome of a lottery using recommendations from a more informed advisor and/or self-sampled outcomes. Decision confidence, as measured by the number of points participants wagered on their predictions, varied with our definition of arbitration as a ratio of precisions. Functional neuroimaging demonstrated that arbitration signals were independent of decision confidence and involved modality-specific brain regions. Arbitrating in favor of self-gathered information activated the dorsolateral prefrontal cortex and the midbrain, whereas arbitrating in favor of social information engaged the ventromedial prefrontal cortex and the amygdala. These findings indicate that relative precision captures arbitration between social and individual learning systems at both behavioral and neural levels.
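The paper's behavioural model is a hierarchical Gaussian filter; the sketch below is a deliberately simplified stand-in that captures only the core definition of arbitration as a ratio of precisions, with each source's precision approximated by the inverse of a running squared-error estimate. The task parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Individual system: a Rescorla-Wagner estimate learned from self-sampled
# outcomes.  Social system: the advisor's recommendation on each trial.
# Each source's precision is the inverse of a running squared-error estimate,
# and arbitration is the relative precision of the two sources.
p_win = 0.75                 # true probability that the lottery pays out
advisor_accuracy = 0.6       # how often the advisor's recommendation is correct

v_self = 0.5                 # individually learned estimate
err_self, err_social = 0.25, 0.25
alpha, tau = 0.1, 0.05

for trial in range(300):
    outcome = float(rng.random() < p_win)
    advice = outcome if rng.random() < advisor_accuracy else 1.0 - outcome

    precision_self, precision_social = 1.0 / err_self, 1.0 / err_social
    w_social = precision_social / (precision_self + precision_social)
    prediction = w_social * advice + (1.0 - w_social) * v_self

    # Track each source's reliability, then update the individual estimate.
    err_self += tau * ((outcome - v_self) ** 2 - err_self)
    err_social += tau * ((outcome - advice) ** 2 - err_social)
    v_self += alpha * (outcome - v_self)

print(f"arbitration weight on social advice after learning: {w_social:.2f}")
print(f"individually learned win probability: {v_self:.2f}")
print(f"combined prediction on the final trial: {prediction:.2f}")
```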
Collapse
Affiliation(s)
- Andreea Oliviana Diaconescu
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich & ETH Zurich, Zurich, Switzerland.,Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland.,University of Basel, Department of Psychiatry (UPK), Basel, Switzerland.,Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), University of Toronto, Toronto, Canada
| | - Madeline Stecy
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich & ETH Zurich, Zurich, Switzerland.,Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland.,Rutgers Robert Wood Johnson Medical School, New Brunswick, United States
| | - Lars Kasper
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich & ETH Zurich, Zurich, Switzerland.,Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland.,Institute for Biomedical Engineering, MRI Technology Group, ETH Zürich & University of Zurich, Zurich, Switzerland
| | - Christopher J Burke
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
| | - Zoltan Nagy
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
| | - Christoph Mathys
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich & ETH Zurich, Zurich, Switzerland.,Interacting Minds Centre, Aarhus University, Aarhus, Denmark.,Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
| | - Philippe N Tobler
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
| |
Collapse
|
189
|
Deep Reinforcement Learning and Its Neuroscientific Implications. Neuron 2020; 107:603-616. [DOI: 10.1016/j.neuron.2020.06.014] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2020] [Revised: 06/08/2020] [Accepted: 06/12/2020] [Indexed: 11/23/2022]
|
190
|
Dissociable Neural Systems Support the Learning and Transfer of Hierarchical Control Structure. J Neurosci 2020; 40:6624-6637. [PMID: 32690614 DOI: 10.1523/jneurosci.0847-20.2020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Revised: 05/15/2020] [Accepted: 07/08/2020] [Indexed: 11/21/2022] Open
Abstract
Humans can draw insight from previous experiences to quickly adapt to novel environments that share a common underlying structure. Here we combine functional imaging and computational modeling to identify the neural systems that support the discovery and transfer of hierarchical task structure. Human subjects (male and female) completed multiple blocks of a reinforcement learning task that contained a global hierarchical structure governing stimulus-response action mapping. First, behavioral and computational evidence showed that humans successfully discover and transfer the hierarchical rule structure embedded within the task. Next, analysis of fMRI BOLD data revealed activity across a frontoparietal network that was specifically associated with the discovery of this embedded structure. Finally, activity throughout a cingulo-opercular network supported the transfer and implementation of this discovered structure. Together, these results reveal a division of labor in which dissociable neural systems support the learning and transfer of abstract control structures. SIGNIFICANCE STATEMENT: A fundamental and defining feature of human behavior is the ability to generalize knowledge from the past to support future action. Although the neural circuits underlying more direct forms of learning have been well established over the last century, we still lack a solid framework from which to investigate more abstract, higher-order human learning and knowledge generalization. We designed a novel behavioral paradigm to specifically isolate a learning process in which previous knowledge, rather than directly indicating the correct action, instead guides the search for the correct action. Moreover, we identify that this learning process is achieved via the coordinated and temporally specific activity of two prominent cognitive control brain networks.
Collapse
|
191
|
Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks. Neural Netw 2020; 129:149-162. [PMID: 32534378 DOI: 10.1016/j.neunet.2020.06.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2019] [Revised: 05/25/2020] [Accepted: 06/02/2020] [Indexed: 11/20/2022]
Abstract
Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., solving memory-dependent tasks and meta-learning. However, little effort has been spent on improving RNN architectures and on understanding the underlying neural mechanisms for performance gain. In this paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical results show that the network can autonomously learn to abstract sub-goals and can self-develop an action hierarchy using internal dynamics in a challenging continuous control task. Furthermore, we show that the self-developed compositionality of the network enables faster re-learning when adapting to a new task that is a re-composition of previously learned sub-goals than when starting from scratch. We also found that improved performance can be achieved when neural activities are subject to stochastic rather than deterministic dynamics.
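A minimal sketch of the two ingredients named in the abstract, multiple timescales and stochastic dynamics: a leaky RNN step in which a fast and a slow population have different time constants and each unit receives Gaussian noise. This illustrates only those ingredients and is not the authors' architecture or training procedure; all sizes and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# A "fast" population (small time constant) and a "slow" population (large
# time constant) interact through a shared recurrent weight matrix, and each
# unit's state receives additive Gaussian noise on every step.
n_fast, n_slow, n_in = 32, 16, 4
tau_fast, tau_slow, sigma = 2.0, 20.0, 0.05

n_total = n_fast + n_slow
W = rng.normal(scale=1.0 / np.sqrt(n_total), size=(n_total, n_total))
W_in = rng.normal(scale=0.5, size=(n_total, n_in))
tau = np.concatenate([np.full(n_fast, tau_fast), np.full(n_slow, tau_slow)])

def rnn_step(h, x):
    """One Euler step of leaky, noisy RNN dynamics: dh = (-h + W r + W_in x) / tau."""
    r = np.tanh(h)
    noise = sigma * rng.normal(size=h.shape)
    dh = (-h + W @ r + W_in @ x) / tau
    return h + dh + noise

h = np.zeros(n_total)
for t in range(100):
    x = np.zeros(n_in)            # no external input in this toy run
    h = rnn_step(h, x)

print("fast-unit activity std:", np.std(h[:n_fast]).round(3),
      "slow-unit activity std:", np.std(h[n_fast:]).round(3))
```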
Collapse
|
192
|
Smith R, Schwartenbeck P, Parr T, Friston KJ. An Active Inference Approach to Modeling Structure Learning: Concept Learning as an Example Case. Front Comput Neurosci 2020; 14:41. [PMID: 32508611 PMCID: PMC7250191 DOI: 10.3389/fncom.2020.00041] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2019] [Accepted: 04/17/2020] [Indexed: 11/13/2022] Open
Abstract
Within computational neuroscience, the algorithmic and neural basis of structure learning remains poorly understood. Concept learning is one primary example, which requires both a type of internal model expansion process (adding novel hidden states that explain new observations), and a model reduction process (merging different states into one underlying cause and thus reducing model complexity via meta-learning). Although various algorithmic models of concept learning have been proposed within machine learning and cognitive science, many are limited to various degrees by an inability to generalize, the need for very large amounts of training data, and/or insufficiently established biological plausibility. Using concept learning as an example case, we introduce a novel approach for modeling structure learning-and specifically state-space expansion and reduction-within the active inference framework and its accompanying neural process theory. Our aim is to demonstrate its potential to facilitate a novel line of active inference research in this area. The approach we lay out is based on the idea that a generative model can be equipped with extra (hidden state or cause) "slots" that can be engaged when an agent learns about novel concepts. This can be combined with a Bayesian model reduction process, in which any concept learning-associated with these slots-can be reset in favor of a simpler model with higher model evidence. We use simulations to illustrate this model's ability to add new concepts to its state space (with relatively few observations) and increase the granularity of the concepts it currently possesses. We also simulate the predicted neural basis of these processes. We further show that it can accomplish a simple form of "one-shot" generalization to new stimuli. Although deliberately simple, these simulation results highlight ways in which active inference could offer useful resources in developing neurocomputational models of structure learning. They provide a template for how future active inference research could apply this approach to real-world structure learning problems and assess the added utility it may offer.
Collapse
Affiliation(s)
- Ryan Smith
- Laureate Institute for Brain Research, Tulsa, OK, United States
| | - Philipp Schwartenbeck
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
| | - Thomas Parr
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
| | - Karl J. Friston
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
| |
Collapse
|
193
|
Bartolo R, Saunders RC, Mitz AR, Averbeck BB. Dimensionality, information and learning in prefrontal cortex. PLoS Comput Biol 2020; 16:e1007514. [PMID: 32330126 PMCID: PMC7202668 DOI: 10.1371/journal.pcbi.1007514] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2019] [Revised: 05/06/2020] [Accepted: 03/11/2020] [Indexed: 01/12/2023] Open
Abstract
Learning leads to changes in population patterns of neural activity. In this study we wanted to examine how these changes in patterns of activity affect the dimensionality of neural responses and information about choices. We addressed these questions by carrying out high channel count recordings in dorsal-lateral prefrontal cortex (dlPFC; 768 electrodes) while monkeys performed a two-armed bandit reinforcement learning task. The high channel count recordings allowed us to study population coding while monkeys learned choices between actions or objects. We found that the dimensionality of neural population activity was higher across blocks in which animals learned the values of novel pairs of objects, than across blocks in which they learned the values of actions. The increase in dimensionality with learning in object blocks was related to less shared information across blocks, and therefore patterns of neural activity that were less similar, when compared to learning in action blocks. Furthermore, these differences emerged with learning, and were not a simple function of the choice of a visual image or action. Therefore, learning the values of novel objects increases the dimensionality of neural representations in dlPFC. In this study we found that learning to choose rewarding objects increased the diversity of patterns of activity, measured as the dimensionality of the response, observed in dorsal-lateral prefrontal cortex. The dimensionality increase for learning to choose rewarding objects was larger than the dimensionality increase for learning to choose rewarding actions. The dimensionality increase was not a simple function of the diverse set of images used, as the patterns of activity only appeared after learning.
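Dimensionality of population activity is commonly quantified with the participation ratio of the covariance eigenspectrum; whether this matches the paper's exact estimator is an assumption. The sketch below contrasts synthetic low- and higher-dimensional population responses to show what such a measure picks up.

```python
import numpy as np

def participation_ratio(activity):
    """Effective dimensionality of a (trials x neurons) activity matrix,
    computed from the covariance eigenvalues: PR = (sum l_i)^2 / sum l_i^2."""
    centered = activity - activity.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(4)
n_trials, n_neurons = 200, 100

# Low-dimensional responses: activity confined to a 3-D subspace plus noise.
latent_low = rng.normal(size=(n_trials, 3)) @ rng.normal(size=(3, n_neurons))
low_dim = latent_low + 0.1 * rng.normal(size=(n_trials, n_neurons))

# Higher-dimensional responses: a 30-D subspace with the same noise level.
latent_high = rng.normal(size=(n_trials, 30)) @ rng.normal(size=(30, n_neurons))
high_dim = latent_high + 0.1 * rng.normal(size=(n_trials, n_neurons))

print("participation ratio (low-dimensional):", round(participation_ratio(low_dim), 1))
print("participation ratio (higher-dimensional):", round(participation_ratio(high_dim), 1))
```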
Collapse
Affiliation(s)
- Ramon Bartolo
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Richard C. Saunders
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Andrew R. Mitz
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Bruno B. Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail:
| |
Collapse
|
194
|
Bartolo R, Averbeck BB. Prefrontal Cortex Predicts State Switches during Reversal Learning. Neuron 2020; 106:1044-1054.e4. [PMID: 32315603 DOI: 10.1016/j.neuron.2020.03.024] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2019] [Revised: 01/28/2020] [Accepted: 03/24/2020] [Indexed: 11/25/2022]
Abstract
Reinforcement learning allows organisms to predict future outcomes and to update their beliefs about value in the world. The dorsal-lateral prefrontal cortex (dlPFC) integrates information carried by reward circuits, which can be used to infer the current state of the world under uncertainty. Here, we explored the dlPFC computations related to updating current beliefs during stochastic reversal learning. We recorded the activity of populations of up to 1,000 neurons simultaneously in two male macaques while they executed a two-armed bandit reversal learning task. Behavioral analyses using a Bayesian framework showed that animals inferred reversals and switched their choice preference rapidly, rather than slowly updating choice values, consistent with state inference. Furthermore, dlPFC neural populations accurately encoded choice preference switches. These results suggest that prefrontal neurons dynamically encode decisions associated with Bayesian subjective values, highlighting the role of the PFC in representing a belief about the current state of the world.
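A toy contrast between state inference and incremental value learning in a reversal task: a Bayesian observer with a hazard rate over reversals switches its belief abruptly, while a Rescorla-Wagner learner drifts. This is a simplified stand-in for that distinction, not the authors' behavioural model; the reward probabilities and hazard rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two-armed bandit with reversals: one arm pays with p = 0.8, the other 0.2,
# and the identity of the good arm flips occasionally.
p_high, p_low, hazard = 0.8, 0.2, 0.05
n_trials = 200

belief = 0.5                 # P(arm 0 is currently the good arm)
q = np.array([0.5, 0.5])     # Rescorla-Wagner values for comparison
alpha = 0.1
good_arm = 0

for t in range(n_trials):
    if rng.random() < hazard:
        good_arm = 1 - good_arm
    choice = 0 if belief >= 0.5 else 1
    p_reward = p_high if choice == good_arm else p_low
    r = float(rng.random() < p_reward)

    # Bayesian update: outcome likelihood under "arm 0 good" vs "arm 1 good".
    p_r_if_0_good = p_high if choice == 0 else p_low
    p_r_if_1_good = p_low if choice == 0 else p_high
    lik0 = p_r_if_0_good if r else 1 - p_r_if_0_good
    lik1 = p_r_if_1_good if r else 1 - p_r_if_1_good
    post0 = lik0 * belief / (lik0 * belief + lik1 * (1 - belief))
    # Account for a possible reversal before the next trial.
    belief = post0 * (1 - hazard) + (1 - post0) * hazard

    q[choice] += alpha * (r - q[choice])   # slow incremental value update

print(f"final belief that arm 0 is good: {belief:.2f}  (true good arm: {good_arm})")
print(f"final Rescorla-Wagner values: {np.round(q, 2)}")
```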
Collapse
Affiliation(s)
- Ramon Bartolo
- Laboratory of Neuropsychology, National Institute of Mental Health/National Institutes of Health, Bethesda, MD 20892-4415, USA.
| | - Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health/National Institutes of Health, Bethesda, MD 20892-4415, USA
| |
Collapse
|
195
|
Huang Y, Yaple ZA, Yu R. Goal-oriented and habitual decisions: Neural signatures of model-based and model-free learning. Neuroimage 2020; 215:116834. [PMID: 32283275 DOI: 10.1016/j.neuroimage.2020.116834] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2019] [Revised: 03/03/2020] [Accepted: 04/08/2020] [Indexed: 11/26/2022] Open
Abstract
Human decision-making is mainly driven by two fundamental learning processes: a slow, deliberative, goal-directed model-based process that maps out the potential outcomes of all options and a rapid habitual model-free process that enables reflexive repetition of previously successful choices. Although many model-informed neuroimaging studies have examined the neural correlates of model-based and model-free learning, the concordant activity between these two processes remains unclear. We used quantitative meta-analyses of functional magnetic resonance imaging experiments to identify the concordant activity pertaining to model-based and model-free learning over a range of reward-related paradigms. We found that: 1) both processes yielded concordant ventral striatum activity, 2) model-based learning activated the medial prefrontal cortex and orbital frontal cortex, and 3) model-free learning specifically activated the left globus pallidus and right caudate head. Our findings suggest that model-free and model-based decision making engage overlapping yet distinct neural regions. These stereotaxic maps improve our understanding of how deliberative goal-directed and reflexive habitual learning are implemented in the brain.
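A minimal hybrid agent illustrating the model-based/model-free distinction the meta-analysis builds on: model-free values are learned by direct reinforcement, model-based values are computed from a learned transition model and learned state rewards, and choices are driven by their weighted mixture. The one-step task and all parameters are arbitrary illustrations, not any study's paradigm.

```python
import numpy as np

rng = np.random.default_rng(6)

n_actions, n_states = 2, 2
true_T = np.array([[0.7, 0.3],    # action 0 -> state 0 with p = .7
                   [0.3, 0.7]])   # action 1 -> state 1 with p = .7
state_reward = np.array([1.0, 0.0])

q_mf = np.zeros(n_actions)                 # model-free action values
T_hat = np.full((n_actions, n_states), 0.5)  # learned transition model
r_hat = np.zeros(n_states)                 # learned state rewards
alpha, w_mb, beta = 0.2, 0.6, 3.0

for t in range(500):
    q_mb = T_hat @ r_hat                       # model-based values
    q = w_mb * q_mb + (1 - w_mb) * q_mf        # hybrid valuation
    p_choice = np.exp(beta * q) / np.exp(beta * q).sum()
    a = rng.choice(n_actions, p=p_choice)

    s = rng.choice(n_states, p=true_T[a])
    r = state_reward[s] if rng.random() < 0.8 else 0.0

    q_mf[a] += alpha * (r - q_mf[a])           # model-free update
    target = np.zeros(n_states)
    target[s] = 1.0
    T_hat[a] += alpha * (target - T_hat[a])    # transition model update
    r_hat[s] += alpha * (r - r_hat[s])         # state-reward update

print("model-free values:", np.round(q_mf, 2))
print("model-based values:", np.round(T_hat @ r_hat, 2))
```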
Collapse
Affiliation(s)
- Yi Huang
- NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
| | - Zachary A Yaple
- Department of Psychology, National University of Singapore, Singapore
| | - Rongjun Yu
- NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore; Department of Psychology, National University of Singapore, Singapore.
| |
Collapse
|
196
|
Hong C, Wei X, Wang J, Deng B, Yu H, Che Y. Training Spiking Neural Networks for Cognitive Tasks: A Versatile Framework Compatible With Various Temporal Codes. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:1285-1296. [PMID: 31247574 DOI: 10.1109/tnnls.2019.2919662] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Recent studies have demonstrated the effectiveness of supervised learning in spiking neural networks (SNNs). A trainable SNN provides a valuable tool not only for engineering applications but also for theoretical neuroscience studies. Here, we propose a modified SpikeProp learning algorithm, which ensures better learning stability for SNNs and provides more diverse network structures and coding schemes. Specifically, we designed a spike gradient threshold rule to solve the well-known gradient exploding problem in SNN training. In addition, regulation rules on firing rates and connection weights are proposed to control the network activity during training. Based on these rules, biologically realistic features such as lateral connections, complex synaptic dynamics, and sparse activities are included in the network to facilitate neural computation. We demonstrate the versatility of this framework by implementing three well-known temporal codes for different types of cognitive tasks, namely, handwritten digit recognition, spatial coordinate transformation, and motor sequence generation. Several important features observed in experimental studies, such as selective activity, excitatory-inhibitory balance, and weak pairwise correlation, emerged in the trained model. This agreement between experimental and computational results further confirmed the importance of these features in neural function. This work provides a new framework, in which various neural behaviors can be modeled and the underlying computational mechanisms can be studied.
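A single-neuron SpikeProp-style sketch of why bounding gradients helps: the spike-time gradient contains a division by the membrane-potential slope at the threshold crossing, which explodes when the crossing is shallow, and clipping the gradient keeps the weight update stable. The kernel, constants, and simple clipping rule are assumptions for illustration and differ from the paper's full spike-gradient threshold rule and network setting.

```python
import numpy as np

tau, theta = 5.0, 1.0
t_grid = np.arange(0.0, 40.0, 0.01)

def eps(t):
    """Alpha-shaped postsynaptic potential kernel."""
    return np.where(t > 0, (t / tau) * np.exp(1 - t / tau), 0.0)

def spike_time(weights, input_times):
    """First threshold crossing of the membrane potential, plus the potential."""
    u = sum(w * eps(t_grid - ti) for w, ti in zip(weights, input_times))
    above = np.nonzero(u >= theta)[0]
    return (None, u) if len(above) == 0 else (t_grid[above[0]], u)

weights = np.array([0.9, 0.9, 0.9])
input_times = np.array([0.0, 2.0, 4.0])
t_target, eta, grad_clip = 6.0, 0.05, 5.0

for step in range(200):
    t_out, u = spike_time(weights, input_times)
    if t_out is None:
        weights += 0.05                         # no spike: nudge weights up
        continue
    idx = np.searchsorted(t_grid, t_out)
    du_dt = (u[idx] - u[idx - 1]) / 0.01        # slope at the threshold crossing
    # SpikeProp-style gradient: dE/dw_i = -(t_out - t_target) * eps_i(t_out) / u'(t_out).
    grad = -(t_out - t_target) * eps(t_out - input_times) / max(du_dt, 1e-6)
    grad = np.clip(grad, -grad_clip, grad_clip)  # gradient threshold (stand-in rule)
    weights -= eta * grad

t_out, _ = spike_time(weights, input_times)
print("output spike time after training:",
      "no spike" if t_out is None else f"{t_out:.2f} ms",
      f"(target {t_target} ms)")
```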
Collapse
|
197
|
Ergo K, De Loof E, Verguts T. Reward Prediction Error and Declarative Memory. Trends Cogn Sci 2020; 24:388-397. [PMID: 32298624 DOI: 10.1016/j.tics.2020.02.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Revised: 02/03/2020] [Accepted: 02/22/2020] [Indexed: 01/04/2023]
Abstract
Learning based on reward prediction error (RPE) was originally proposed in the context of nondeclarative memory. We postulate that RPE may support declarative memory as well. Indeed, recent years have witnessed a number of independent empirical studies reporting effects of RPE on declarative memory. We provide a brief overview of these studies, identify emerging patterns, and discuss open issues such as the role of signed versus unsigned RPEs in declarative learning.
Collapse
Affiliation(s)
- Kate Ergo
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium
| | - Esther De Loof
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium
| | - Tom Verguts
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium.
| |
Collapse
|
198
|
Masse NY, Rosen MC, Freedman DJ. Reevaluating the Role of Persistent Neural Activity in Short-Term Memory. Trends Cogn Sci 2020; 24:242-258. [PMID: 32007384 PMCID: PMC7288241 DOI: 10.1016/j.tics.2019.12.014] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 12/18/2022]
Abstract
A traditional view of short-term working memory (STM) is that task-relevant information is maintained 'online' in persistent spiking activity. However, recent experimental and modeling studies have begun to question this long-held belief. In this review, we discuss new evidence demonstrating that information can be 'silently' maintained via short-term synaptic plasticity (STSP) without the need for persistent activity. We discuss how the neural mechanisms underlying STM are inextricably linked with the cognitive demands of the task, such that the passive maintenance and the active manipulation of information are subserved differently in the brain. Together, these recent findings point towards a more nuanced view of STM in which multiple substrates work in concert to support our ability to temporarily maintain and manipulate information.
Collapse
Affiliation(s)
- Nicolas Y Masse
- Department of Neurobiology, The University of Chicago, Chicago, IL, USA.
| | - Matthew C Rosen
- Department of Neurobiology, The University of Chicago, Chicago, IL, USA
| | - David J Freedman
- Department of Neurobiology, The University of Chicago, Chicago, IL, USA; Grossman Institute for Neuroscience, Quantitative Biology and Human Behavior, The University of Chicago, Chicago, IL, USA.
| |
Collapse
|
199
|
Bulley A, Schacter DL. Deliberating trade-offs with the future. Nat Hum Behav 2020; 4:238-247. [PMID: 32184495 PMCID: PMC7147875 DOI: 10.1038/s41562-020-0834-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Accepted: 02/05/2020] [Indexed: 12/12/2022]
Abstract
Many fundamental choices in life are intertemporal: they involve trade-offs between sooner and later outcomes. In recent years there has been a surge of interest into how people make intertemporal decisions, given that such decisions are ubiquitous in everyday life and central in domains from substance use to climate change action. While it is clear that people make decisions according to rules, intuitions and habits, they also commonly deliberate over their options, thinking through potential outcomes and reflecting on their own preferences. In this Perspective, we bring to bear recent research into the higher-order capacities that underpin deliberation-particularly those that enable people to think about the future (prospection) and their own thinking (metacognition)-to shed light on intertemporal decision-making. We show how a greater appreciation for these mechanisms of deliberation promises to advance our understanding of intertemporal decision-making and unify a wide range of otherwise disparate choice phenomena.
Collapse
Affiliation(s)
- Adam Bulley
- Department of Psychology, Harvard University, Cambridge, MA, USA.
- The University of Sydney, School of Psychology and Brain and Mind Centre, Sydney, NSW, Australia.
| | | |
Collapse
|
200
|
A distributional code for value in dopamine-based reinforcement learning. Nature 2020; 577:671-675. [PMID: 31942076 DOI: 10.1038/s41586-019-1924-6] [Citation(s) in RCA: 174] [Impact Index Per Article: 43.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2019] [Accepted: 11/19/2019] [Indexed: 12/12/2022]
Abstract
Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
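A minimal sketch of the distributional TD idea this abstract describes: a population of value predictors, each updated with a different ratio of learning rates for positive versus negative prediction errors, converges to an expectile-like code of the reward distribution rather than to a single mean. The one-state setting, reward distribution, and parameters are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

n_channels = 9
taus = np.linspace(0.1, 0.9, n_channels)    # asymmetry of each "dopamine channel"
values = np.zeros(n_channels)
base_lr = 0.02

def sample_reward():
    # Bimodal reward distribution: a small reward usually, a large reward sometimes.
    return 1.0 if rng.random() < 0.7 else 5.0

for trial in range(20000):
    r = sample_reward()
    delta = r - values                            # per-channel prediction errors
    # Positive errors are scaled by tau, negative errors by (1 - tau), so each
    # channel converges to a different expectile of the reward distribution.
    lr = np.where(delta > 0, taus, 1.0 - taus) * base_lr
    values += lr * delta

print("expected value (mean):", round(0.7 * 1.0 + 0.3 * 5.0, 2))
print("learned per-channel values (expectile-like code):", np.round(values, 2))
```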
Collapse
|