1. Buckley M, McGregor A, Ihssen N, Austen J, Thurlbeck S, Smith SP, Heinecke A, Lew AR. The well-worn route revisited: Striatal and hippocampal system contributions to familiar route navigation. Hippocampus 2024; 34:310-326. [PMID: 38721743] [DOI: 10.1002/hipo.23607]
Abstract
Classic research has shown a division in the neuroanatomical structures that support flexible (e.g., short-cutting) and habitual (e.g., familiar route following) navigational behavior, with hippocampal-caudate systems associated with the former and putamen systems with the latter. There is, however, disagreement about whether the neural structures involved in navigation process particular forms of spatial information, such as associations between constellations of cues forming a cognitive map, versus single landmark-action associations, or alternatively, perform particular reinforcement learning algorithms that allow the use of different spatial strategies, so-called model-based (flexible) or model-free (habitual) forms of learning. We sought to test these theories by asking participants (N = 24) to navigate within a virtual environment through a previously learned, 9-junction route with distinctive landmarks at each junction while undergoing functional magnetic resonance imaging (fMRI). In a series of probe trials, we distinguished knowledge of individual landmark-action associations along the route versus knowledge of the correct sequence of landmark-action associations, either by having absent landmarks, or "out-of-sequence" landmarks. Under a map-based perspective, sequence knowledge would not require hippocampal systems, because there are no constellations of cues available for cognitive map formation. Within a learning-based model, however, responding based on knowledge of sequence would require hippocampal systems because prior context has to be utilized. We found that hippocampal-caudate systems were more active in probes requiring sequence knowledge, supporting the learning-based model. However, we also found greater putamen activation in probes where navigation based purely on sequence memory could be planned, supporting models of putamen function that emphasize its role in action sequencing.
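The landmark-action versus sequence-knowledge contrast at the heart of this study can be made concrete in code. The sketch below is purely illustrative, not the authors' task or analysis: the route, landmark names, learning rule, and probe types are all made up for the example. A model-free "habit" learner stores bare landmark-action associations, while a sequence-based learner can still respond when a landmark is absent or out of sequence.

```python
import random
random.seed(0)

ROUTE = ["oak", "fountain", "statue", "bridge"]           # landmark at junctions 0..3
CORRECT = {"oak": "L", "fountain": "R", "statue": "L", "bridge": "R"}
ACTIONS = ["L", "R"]

# --- Model-free learner: stateless landmark -> action values (habit) ---
q = {lm: {a: 0.0 for a in ACTIONS} for lm in ROUTE}
alpha = 0.5
for _ in range(200):                                      # trial-and-error practice runs
    for lm in ROUTE:
        a = random.choice(ACTIONS)
        r = 1.0 if a == CORRECT[lm] else 0.0
        q[lm][a] += alpha * (r - q[lm][a])                # simple delta rule

def habit(landmark):
    """Respond to the landmark in view, ignoring where we are in the route."""
    return max(q[landmark], key=q[landmark].get)

# --- Sequence-based learner: stores the route order itself ---
# Because it knows the junction order, it can act from position alone
# (absent-landmark probe) or detect an out-of-sequence landmark.
def planned(position, landmark=None):
    lm = landmark if landmark is not None else ROUTE[position]
    return CORRECT[lm]

print(habit("fountain"))            # habit works while the landmark is present
print(planned(2))                   # sequence knowledge covers an absent landmark
```

The point of the toy contrast: only the sequence-based learner has anything to say on an absent-landmark probe, which is why such probes can dissociate the two kinds of knowledge behaviorally.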
Affiliation(s)
- Niklas Ihssen
- Department of Psychology, Durham University, Durham, UK
- Joseph Austen
- Department of Psychology, Durham University, Durham, UK
- Shamus P Smith
- School of Information and Physical Sciences, University of Newcastle Australia, Callaghan, New South Wales, Australia
- Adina R Lew
- Department of Psychology, Lancaster University, Lancaster, UK

2. Parrini M, Tricot G, Caroni P, Spolidoro M. Circuit mechanisms of navigation strategy learning in mice. Curr Biol 2024; 34:79-91.e4. [PMID: 38101403] [DOI: 10.1016/j.cub.2023.11.047]
Abstract
Navigation tasks involve the gradual selection and deployment of increasingly effective searching procedures to reach targets. The brain mechanisms underlying such complex behavior are poorly understood, but their elucidation might provide insights into the systems linking exploration and decision making in complex learning. Here, we developed a trial-by-trial goal-related search strategy analysis as mice learned to navigate identical water mazes encompassing distinct goal-related rules and monitored the strategy deployment process throughout learning. We found that navigation learning involved the following three distinct phases: an early phase during which maze-specific search strategies are deployed in a minority of trials, a second phase of preferential increasing deployment of one search strategy, and a final phase of increasing commitment to this strategy only. The three maze learning phases were affected differently by inhibition of retrosplenial cortex (RSC), dorsomedial striatum (DMS), or dorsolateral striatum (DLS). Through brain region-specific inactivation experiments and gain-of-function experiments involving activation of learning-related cFos+ ensembles, we unraveled how goal-related strategy selection relates to deployment throughout these sequential processes. We found that RSC is critically important for search strategy selection, DMS mediates strategy deployment, and DLS ensures searching consistency throughout maze learning. Notably, activation of specific learning-related ensembles was sufficient to direct strategy selection (RSC) or strategy deployment (DMS) in a different maze. Our results establish a goal-related search strategy deployment approach to dissect unsupervised navigation learning processes and suggest that effective searching in navigation involves evidence-based goal-related strategy direction by RSC, reinforcement-modulated strategy deployment through DMS, and online guidance through DLS.
Affiliation(s)
- Martina Parrini
- Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
- Guillaume Tricot
- Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
- Pico Caroni
- Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
- Maria Spolidoro
- Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland

3. Diekmann N, Vijayabaskaran S, Zeng X, Kappel D, Menezes MC, Cheng S. CoBeL-RL: A neuroscience-oriented simulation framework for complex behavior and learning. Front Neuroinform 2023; 17:1134405. [PMID: 36970657] [PMCID: PMC10033763] [DOI: 10.3389/fninf.2023.1134405]
Abstract
Reinforcement learning (RL) has become a popular paradigm for modeling animal behavior, analyzing neuronal representations, and studying their emergence during learning. This development has been fueled by advances in understanding the role of RL in both the brain and artificial intelligence. However, while in machine learning a set of tools and standardized benchmarks facilitate the development of new methods and their comparison to existing ones, in neuroscience the software infrastructure is much more fragmented. Even when they share theoretical principles, computational studies rarely share software frameworks, which impedes the integration and comparison of different results. Machine learning tools are also difficult to port to computational neuroscience, since the experimental requirements are usually not well aligned. To address these challenges, we introduce CoBeL-RL, a closed-loop simulator of complex behavior and learning based on RL and deep neural networks. It provides a neuroscience-oriented framework for efficiently setting up and running simulations. CoBeL-RL offers a set of virtual environments, e.g., T-maze and Morris water maze, which can be simulated at different levels of abstraction, e.g., a simple gridworld or a 3D environment with complex visual stimuli, and set up using intuitive GUI tools. A range of RL algorithms, e.g., Dyna-Q and deep Q-network algorithms, is provided and can be easily extended. CoBeL-RL provides tools for monitoring and analyzing behavior and unit activity, and allows fine-grained control of the simulation via interfaces to relevant points in its closed loop. In summary, CoBeL-RL fills an important gap in the software toolbox of computational neuroscience.
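The abstract names Dyna-Q on a gridworld as one of the packaged agent/environment combinations. CoBeL-RL's actual API is not reproduced here; the following is a generic, self-contained Dyna-Q sketch on a one-dimensional corridor, with every name and parameter chosen for illustration only:

```python
import random
random.seed(1)

# 1-D gridworld corridor: states 0..4, goal at 4; actions +1 (right), -1 (left)
N, GOAL = 5, 4
ACTIONS = [1, -1]
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
model = {}                                   # (s, a) -> (reward, next state), learned online
alpha, gamma, eps, n_planning = 0.5, 0.9, 0.1, 10

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return (1.0 if s2 == GOAL else 0.0), s2

for _ in range(50):                          # episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        r, s2 = step(s, a)
        # direct RL update from real experience
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        model[(s, a)] = (r, s2)              # record experience in the model
        # planning: replay simulated experience from the model (the Dyna part)
        for _ in range(n_planning):
            ps, pa = random.choice(list(model))
            pr, ps2 = model[(ps, pa)]
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in ACTIONS) - Q[(ps, pa)])
        s = s2

greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)]
print(greedy)                                # expect [1, 1, 1, 1]: head right from every state
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: each real step is amplified by `n_planning` simulated updates drawn from the learned model.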
Affiliation(s)
- Nicolas Diekmann
- Faculty for Computer Science, Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany
- International Graduate School of Neuroscience, Ruhr University Bochum, Bochum, Germany
- Sandhiya Vijayabaskaran
- Faculty for Computer Science, Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany
- Xiangshuai Zeng
- Faculty for Computer Science, Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany
- International Graduate School of Neuroscience, Ruhr University Bochum, Bochum, Germany
- David Kappel
- Faculty for Computer Science, Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany
- Matheus Chaves Menezes
- Laboratory of Artificial Cognition Methods for Optimisation and Robotics, Federal University of Maranhão, São Luís, Brazil
- Sen Cheng
- Faculty for Computer Science, Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany
- Correspondence: Sen Cheng

4. Kassim FM, Lahooti SK, Keay EA, Iyyalol R, Rodger J, Albrecht MA, Martin-Iverson MT. Dexamphetamine widens temporal and spatial binding windows in healthy participants. J Psychiatry Neurosci 2023; 48:E90-E98. [PMID: 36918195] [PMCID: PMC10019325] [DOI: 10.1503/jpn.220149]
Abstract
BACKGROUND The pathophysiology of psychosis is complex, but a better understanding of stimulus binding windows (BWs) could improve our knowledge base. Previous studies have shown that dopamine release is associated with psychosis and widened BWs, and BW mechanisms can be probed using drugs of specific interest to psychosis. We were therefore interested in how manipulation of the dopamine or catecholamine systems affects psychosis and BWs, and aimed to investigate the effect of dexamphetamine, a dopamine-releasing stimulant, on BWs in a unimodal illusion: the tactile funneling illusion (TFI). METHODS We conducted a randomized, double-blind, counterbalanced, placebo-controlled crossover study of funneling and errors of localization. We administered dexamphetamine (0.45 mg/kg) to 46 participants and manipulated 5 spatial (5-1 cm) and 3 temporal (0, 500 and 750 ms) conditions in the TFI. RESULTS Dexamphetamine increased the funneling illusion (p = 0.009) and increased the error of localization in a delay-dependent manner (p = 0.03). Dexamphetamine also significantly increased the error of localization at 500 ms temporal separation and 4 cm spatial separation (interaction p = 0.009; 500 ms, 4 cm vs. baseline p = 0.01). LIMITATIONS Although amphetamine-induced models of psychosis are a useful approach to understanding the physiology of psychosis related to dopamine hyperactivity, dexamphetamine is equally effective at releasing noradrenaline and dopamine, so we were unable to tease apart the effects of the 2 systems on BWs. CONCLUSION Dexamphetamine increases illusory perception on the unimodal TFI in healthy participants, suggesting that dopamine or other catecholamines play a role in widening tactile spatial and temporal BWs.
Affiliation(s)
- Faiz M Kassim
- Samra Krakonja Lahooti
- Elizabeth Ann Keay
- Rajan Iyyalol
- Jennifer Rodger
- Matthew A Albrecht
- Mathew T Martin-Iverson
- From the Department of Psychiatry, St. Paul's Hospital Millennium Medical College, Addis Ababa, Ethiopia (Kassim); the Psychopharmacology Unit, School of Biomedical Sciences, University of Western Australia, Perth, WA, Australia (Kassim, Lahooti, Keay, Martin-Iverson); Psychiatry, Graylands Hospital, Mt Claremont, Perth, WA, Australia (Iyyalol); Experimental and Regenerative Neurosciences, School of Biological Sciences, University of Western Australia, Crawley, WA, Australia (Rodger); the Brain Plasticity Group, Perron Institute for Neurological and Translational Science, Nedlands, WA, Australia (Rodger); the Western Australian Centre for Road Safety Research, School of Psychological Science, University of Western Australia, Perth, WA, Australia (Albrecht)

5. Reducing Computational Cost During Robot Navigation and Human–Robot Interaction with a Human-Inspired Reinforcement Learning Architecture. Int J Soc Robot 2022. [DOI: 10.1007/s12369-022-00942-6]

6. Vijayabaskaran S, Cheng S. Navigation task and action space drive the emergence of egocentric and allocentric spatial representations. PLoS Comput Biol 2022; 18:e1010320. [PMID: 36315587] [PMCID: PMC9648855] [DOI: 10.1371/journal.pcbi.1010320]
Abstract
In general, strategies for spatial navigation can employ one of two spatial reference frames: egocentric or allocentric. Notwithstanding intuitive explanations, it remains unclear, however, under what circumstances one strategy is chosen over the other, and how neural representations relate to the chosen strategy. Here, we first use a deep reinforcement learning model to investigate whether a particular type of navigation strategy arises spontaneously during spatial learning without imposing a bias onto the model. We then examine the spatial representations that emerge in the network to support navigation. To this end, we study two tasks that are ethologically valid for mammals: guidance, where the agent has to navigate to a goal location fixed in allocentric space, and aiming, where the agent navigates to a visible cue. We find that when both navigation strategies are available to the agent, the solutions it develops for guidance and aiming are heavily biased towards the allocentric or the egocentric strategy, respectively, as one might expect. Nevertheless, the agent can learn both tasks using either type of strategy. Furthermore, we find that place-cell-like allocentric representations emerge preferentially in guidance when using an allocentric strategy, whereas egocentric vector representations emerge when using an egocentric strategy in aiming. We thus find that alongside the type of navigational strategy, the nature of the task plays a pivotal role in the type of spatial representations that emerge.

Most species rely on navigation in space to find water, food, and mates, as well as to return home. When navigating, humans and animals can use one of two reference frames: one based on stable landmarks in the external environment, such as moving due north and then east, or one centered on oneself, such as moving forward and turning left. However, it remains unclear how these reference frames are chosen and interact in navigation tasks, and how they are supported by representations in the brain. We therefore modeled two navigation tasks that would each benefit from using one of these reference frames, and trained an artificial agent to learn to solve them through trial and error. Our results show that when given the choice, the agent leveraged the appropriate reference frame to solve the task, but surprisingly could also use the other reference frame when constrained to do so. We also show that the representations that emerge to enable the agent to solve the tasks exist on a spectrum, and are more complex than commonly thought. These representations reflect both the task and the reference frame being used, and provide useful insights for the design of experimental tasks to study the use of navigational strategies.
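The egocentric/allocentric action-space distinction this study manipulates can be made concrete with a toy sketch. The grid coordinates, heading labels, and action names below are illustrative assumptions, not the paper's implementation:

```python
# Allocentric actions: displacements fixed in world coordinates,
# independent of which way the agent is facing ("move due east").
ALLO = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}

def allo_step(pos, action):
    dx, dy = ALLO[action]
    return (pos[0] + dx, pos[1] + dy)

# Egocentric actions: defined relative to the agent's current heading
# ("move forward", "turn left"), so the same action means different
# world displacements depending on the heading.
HEADINGS = ["N", "E", "S", "W"]               # clockwise order

def ego_step(pos, heading, action):
    i = HEADINGS.index(heading)
    if action == "turn_left":
        return pos, HEADINGS[(i - 1) % 4]
    if action == "turn_right":
        return pos, HEADINGS[(i + 1) % 4]
    return allo_step(pos, heading), heading   # "forward": move along the heading

print(allo_step((0, 0), "E"))                 # (1, 0), whatever the agent faces
print(ego_step((0, 0), "N", "turn_right"))    # ((0, 0), 'E')
print(ego_step((0, 0), "E", "forward"))       # ((1, 0), 'E')
```

An agent given only `ALLO` actions never needs to track its heading, while one given only egocentric actions must, which is one intuition for why the available action space shapes the representations that emerge.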
Affiliation(s)
- Sen Cheng
- Faculty of Computer Science, Ruhr University Bochum, Bochum, Germany

7. Qiao H, Chen J, Huang X. A Survey of Brain-Inspired Intelligent Robots: Integration of Vision, Decision, Motion Control, and Musculoskeletal Systems. IEEE Trans Cybern 2022; 52:11267-11280. [PMID: 33909584] [DOI: 10.1109/tcyb.2021.3071312]
Abstract
Current robotic studies focus on the performance of specific tasks. However, such tasks cannot be generalized, and some special tasks, such as compliant and precise manipulation, fast and flexible response, and deep collaboration between humans and robots, cannot be realized. Brain-inspired intelligent robots imitate humans and animals, from inner mechanisms to external structures, through an integration of visual cognition, decision making, motion control, and musculoskeletal systems. Such robots are more likely to realize the functions that current robots cannot and to become companions to humans. With a focus on the development of brain-inspired intelligent robots, this article reviews cutting-edge research in the areas of brain-inspired visual cognition, decision making, musculoskeletal robots, motion control, and their integration. It aims to provide greater insight into brain-inspired intelligent robots and to attract more attention to this field from the global research community.

8. A Brain-Inspired Model of Hippocampal Spatial Cognition Based on a Memory-Replay Mechanism. Brain Sci 2022; 12:1176. [PMID: 36138911] [PMCID: PMC9496859] [DOI: 10.3390/brainsci12091176]
Abstract
Since the hippocampus plays an important role in memory and spatial cognition, spatial computation models inspired by the hippocampus have attracted much attention. Such models rely mainly on reward signals to learn the environment and plan paths. Because reward signals attenuate sharply in complex or large-scale environments, the spatial cognition and path-planning performance of these models degrades accordingly. To solve this problem, we present a Memory-Replay Mechanism inspired by the reactivation function of place cells in the hippocampus. We classify path memories according to their reward information and find the overlapping place cells in different categories of path memory to segment and reconstruct the memories into a "virtual path", replaying the memory in association with the reward information. We conducted a series of navigation experiments in a simple environment, the Morris water maze (MWM), and in a complex environment, and compared our model with a reinforcement learning (RL) model and other brain-inspired models. The experimental results show that, under the same conditions, our model explores the environment at a higher rate with more stable signal transmission, and the average reward obtained under stable conditions was 14.12% higher than RL with random-experience replay. Our model also performs well in complex maze environments where signals are easily attenuated, and its behavior at bifurcations is consistent with neurophysiological studies.
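The memory-segmentation idea, splicing remembered paths at overlapping place-cell activity to form a never-travelled "virtual path", can be sketched minimally. The place names and splice rule below are illustrative assumptions, not the paper's model:

```python
# Two remembered paths that share a place ("C"); splice them at the overlap
# to construct a route the agent has never actually travelled end to end.
path_to_reward = ["A", "B", "C", "D", "goal"]     # a rewarded route passing through C
new_path = ["X", "Y", "C"]                        # an unrewarded route ending at C

def splice_virtual_path(unrewarded, rewarded):
    """Join two paths at their first shared place to form a 'virtual path'."""
    for i, place in enumerate(unrewarded):
        if place in rewarded:                     # overlapping place cell found
            j = rewarded.index(place)
            return unrewarded[:i] + rewarded[j:]  # prefix of one + rewarded tail of the other
    return None                                   # no overlap: nothing to splice

print(splice_virtual_path(new_path, path_to_reward))   # ['X', 'Y', 'C', 'D', 'goal']
```

Replaying such spliced paths lets reward information propagate to states that were never directly followed by reward, which is one way to counter reward attenuation in large environments.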

9. Suzuki M, Nishimura Y. The ventral striatum contributes to the activity of the motor cortex and motor outputs in monkeys. Front Syst Neurosci 2022; 16:979272. [PMID: 36211590] [PMCID: PMC9540202] [DOI: 10.3389/fnsys.2022.979272]
Abstract
The ventral striatum (VSt) is thought to be involved in the vigor of motivated behavior and is suggested to be a limbic-motor interface between limbic areas involved in motivational processes and neural circuits regulating behavioral outputs. However, there is little direct evidence demonstrating the involvement of the VSt in motor control for motivated behaviors. To clarify the functional role of the VSt in motor control, we investigated the effect of reversible pharmacological inactivation of the VSt on the oscillatory activity of the sensorimotor cortices and motor outputs in two macaque monkeys. VSt inactivation reduced movement-related activities of the primary motor cortex and premotor area at 15–120 Hz and increased those at 5–7 Hz. These changes were accompanied by reduced torque outputs but had no effect on the correct performance rate. The present study provides direct evidence that the VSt regulates activities of the motor cortices and motor output.
Affiliation(s)
- Michiaki Suzuki
- Division of Behavioral Development, Department of Developmental Physiology, National Institute for Physiological Sciences, Okazaki, Japan
- Department of Physiological Sciences, School of Life Science, SOKENDAI, Hayama, Japan
- Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Japan Society for the Promotion of Science, Tokyo, Japan
- Neural Prosthetics Project, Department of Brain and Neurosciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Yukio Nishimura
- Division of Behavioral Development, Department of Developmental Physiology, National Institute for Physiological Sciences, Okazaki, Japan
- Department of Physiological Sciences, School of Life Science, SOKENDAI, Hayama, Japan
- Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Neural Prosthetics Project, Department of Brain and Neurosciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Correspondence: Yukio Nishimura

10. Massi E, Barthélemy J, Mailly J, Dromnelle R, Canitrot J, Poniatowski E, Girard B, Khamassi M. Model-Based and Model-Free Replay Mechanisms for Reinforcement Learning in Neurorobotics. Front Neurorobot 2022; 16:864380. [PMID: 35812782] [PMCID: PMC9263850] [DOI: 10.3389/fnbot.2022.864380]
Abstract
Experience replay is widely used in AI to bootstrap reinforcement learning (RL) by enabling an agent to remember and reuse past experiences. Classical techniques include shuffled, reversed-order, and prioritized memory buffers, which have different properties and advantages depending on the nature of the data and problem. Interestingly, recent computational neuroscience work has shown that these techniques are relevant for modeling hippocampal reactivations recorded during rodent navigation. Nevertheless, the brain mechanisms for orchestrating hippocampal replay are still unclear. In this paper, we present recent neurorobotics research aiming to endow a navigating robot with a neuro-inspired RL architecture, including different learning strategies, such as model-based (MB) and model-free (MF) learning, and different replay techniques. We illustrate through a series of numerical simulations how the specificities of robotic experimentation (e.g., autonomous state decomposition by the robot, noisy perception, state transition uncertainty, non-stationarity) can shed new light on which replay techniques turn out to be more efficient in different situations. Finally, we close the loop by raising new hypotheses for neuroscience from such robotic models of hippocampal replay.
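The three classical replay orderings mentioned above can be sketched in a few lines. The toy trajectory is made up, and reward magnitude is used here as a simple stand-in for the TD-error priority that real prioritized replay would use:

```python
import random
random.seed(2)

# A toy trajectory of (state, action, reward, next_state) transitions,
# with reward only on the final step.
trajectory = [(s, "fwd", 1.0 if s == 3 else 0.0, s + 1) for s in range(4)]

def shuffled_replay(buf):
    out = list(buf)
    random.shuffle(out)                       # decorrelates consecutive samples
    return out

def reversed_replay(buf):
    return list(reversed(buf))                # propagates reward backwards quickly

def prioritized_replay(buf):
    # Replay high-|reward| transitions first (stand-in for TD-error priority).
    return sorted(buf, key=lambda t: abs(t[2]), reverse=True)

print(reversed_replay(trajectory)[0])         # the rewarded final transition comes first
print(prioritized_replay(trajectory)[0][2])   # 1.0
```

Reversed and prioritized orderings both surface the rewarded transition early, which is why they speed up value propagation relative to shuffled replay on sparse-reward trajectories.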

11. Gmaz JM, van der Meer MAA. Context coding in the mouse nucleus accumbens modulates motivationally relevant information. PLoS Biol 2022; 20:e3001338. [PMID: 35486662] [PMCID: PMC9094556] [DOI: 10.1371/journal.pbio.3001338]
Abstract
Neural activity in the nucleus accumbens (NAc) is thought to track fundamentally value-centric quantities linked to reward and effort. However, the NAc also contributes to flexible behavior in ways that are difficult to explain based on value signals alone, raising the question of whether and how nonvalue signals are encoded in the NAc. We recorded NAc neural ensembles while head-fixed mice performed an odor-based biconditional discrimination task in which an initial discrete cue modulated the behavioral significance of a subsequently presented reward-predictive cue. We extracted single-unit and population-level correlates related to the cues and found value-independent coding for the initial, context-setting cue. This context signal occupied a population-level coding space orthogonal to outcome-related representations and was predictive of subsequent behaviorally relevant responses to the reward-predictive cues. Together, these findings support a gating model for how the NAc contributes to behavioral flexibility and provide a novel population-level perspective from which to view NAc computations.
Affiliation(s)
- Jimmie M. Gmaz
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, United States of America
- Matthijs A. A. van der Meer
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, United States of America

12. Feng Z, Nagase AM, Morita K. A Reinforcement Learning Approach to Understanding Procrastination: Does Inaccurate Value Approximation Cause Irrational Postponing of a Task? Front Neurosci 2021; 15:660595. [PMID: 34602962] [PMCID: PMC8481628] [DOI: 10.3389/fnins.2021.660595]
Abstract
Procrastination is the voluntary but irrational postponing of a task despite being aware that the delay can lead to worse consequences. It has been studied extensively in psychology, from contributing factors to theoretical models. From a value-based decision-making and reinforcement learning (RL) perspective, procrastination has been suggested to be caused by non-optimal choices resulting from cognitive limitations. Exactly what sort of cognitive limitations are involved, however, remains elusive. In the current study, we examined whether a particular type of cognitive limitation, namely inaccurate valuation resulting from inadequate state representation, can cause procrastination. Recent work has suggested that humans may adopt a particular type of state representation called the successor representation (SR) and that humans can learn to represent states by relatively low-dimensional features. Combining these suggestions, we assumed a dimension-reduced version of the SR. We modeled a series of behaviors of a "student" doing assignments during the school term, when putting off doing the assignments (i.e., procrastination) is not allowed, and during the vacation, when whether to procrastinate can be freely chosen. We assumed that the "student" had acquired a rigid reduced SR of each state, corresponding to each step in completing an assignment, under the policy without procrastination. The "student" learned the approximated value of each state, computed as a linear function of the features of the states in the rigid reduced SR, through temporal-difference (TD) learning. During the vacation, the "student" decided at each time step whether to procrastinate based on these approximated values. Simulation results showed that the reduced SR-based RL model generated procrastination behavior, which worsened across episodes.
According to the values approximated by the "student," to procrastinate was the better choice, whereas not to procrastinate was mostly better according to the true values. Thus, the current model generated procrastination behavior caused by inaccurate value approximation, which resulted from the adoption of the reduced SR as state representation. These findings indicate that the reduced SR, or more generally, the dimension reduction in state representation, can be a potential form of cognitive limitation that leads to procrastination.
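The learning scheme the abstract describes, TD learning of a linear value function over fixed, dimension-reduced SR-style features, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the number of states, the feature matrix, and all parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-state task: each step of an assignment is a state,
# with reward delivered only on completion.
n_states, n_features = 5, 2
true_reward = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
gamma = 0.9

# A rigid, dimension-reduced state representation: each state is
# described by a fixed low-dimensional feature vector.
phi = rng.random((n_states, n_features))

# TD(0) learning of a linear value approximation V(s) = phi[s] @ w.
w = np.zeros(n_features)
alpha = 0.1
for sweep in range(500):
    for s in range(n_states):
        s_next = min(s + 1, n_states - 1)
        v = phi[s] @ w
        v_next = 0.0 if s == n_states - 1 else phi[s_next] @ w  # terminal state
        delta = true_reward[s] + gamma * v_next - v             # TD error
        w += alpha * delta * phi[s]

values = phi @ w
# With only 2 features for 5 states, the approximated values are
# systematically distorted relative to the true values -- the kind of
# inaccuracy the model links to procrastination.
print(values)
```

Because five states are compressed into two features, the fitted values cannot match the true values exactly, which is the approximation error the model exploits.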
Affiliation(s)
- Zheyu Feng
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Asako Mitsuto Nagase
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Division of Neurology, Department of Brain and Neurosciences, Faculty of Medicine, Tottori University, Yonago, Japan
- Research Fellowship for Young Scientists, Japan Society for the Promotion of Science, Tokyo, Japan
- Department of Neurology, Faculty of Medicine, Shimane University, Izumo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
13
Wittkuhn L, Chien S, Hall-McMaster S, Schuck NW. Replay in minds and machines. Neurosci Biobehav Rev 2021; 129:367-388. [PMID: 34371078 DOI: 10.1016/j.neubiorev.2021.08.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Revised: 07/19/2021] [Accepted: 08/01/2021] [Indexed: 11/19/2022]
Abstract
Experience-related brain activity patterns reactivate during sleep, wakeful rest, and brief pauses from active behavior. In parallel, machine learning research has found that experience replay can lead to substantial performance improvements in artificial agents. Together, these lines of research suggest replay has a variety of computational benefits for decision-making and learning. Here, we provide an overview of putative computational functions of replay as suggested by machine learning and neuroscientific research. We show that replay can lead to faster learning, less forgetting, reorganization or augmentation of experiences, and support planning and generalization. In addition, we highlight the benefits of reactivating abstracted internal representations rather than veridical memories, and discuss how replay could provide a mechanism to build internal representations that improve learning and decision-making.
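In machine-learning terms, the experience replay discussed in this review amounts to storing transitions and re-sampling them for additional learning updates. A generic sketch (not code from the review; the class, capacities, and task are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store experienced transitions and re-sample them for learning."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences drop out

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random re-sampling breaks temporal correlations between
        # consecutive experiences, one of replay's computational benefits.
        return random.sample(self.buffer, batch_size)

random.seed(0)
buffer = ReplayBuffer(capacity=100)
for t in range(50):
    buffer.add(state=t, action=t % 4, reward=float(t == 49), next_state=t + 1)

batch = buffer.sample(8)
print(len(batch))  # transitions drawn for one off-line learning update
```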
Affiliation(s)
- Lennart Wittkuhn
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Lentzeallee 94, D-14195 Berlin, Germany
- Samson Chien
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Lentzeallee 94, D-14195 Berlin, Germany
- Sam Hall-McMaster
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Lentzeallee 94, D-14195 Berlin, Germany
- Nicolas W Schuck
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Lentzeallee 94, D-14195 Berlin, Germany
14
Humphries MD, Gurney K. Making decisions in the dark basement of the brain: A look back at the GPR model of action selection and the basal ganglia. BIOLOGICAL CYBERNETICS 2021; 115:323-329. [PMID: 34272969 DOI: 10.1007/s00422-021-00887-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/29/2021] [Accepted: 07/06/2021] [Indexed: 06/13/2023]
Abstract
How does your brain decide what you will do next? Over the past few decades compelling evidence has emerged that the basal ganglia, a collection of nuclei in the fore- and mid-brain of all vertebrates, are vital to action selection. Gurney, Prescott, and Redgrave published an influential computational account of this idea in Biological Cybernetics in 2001. Here we take a look back at this pair of papers, outlining the "GPR" model contained therein, the context of that model's development, and the influence it has had over the past twenty years. Tracing its lineage into models and theories still emerging now, we are encouraged that the GPR model is that rare thing, a computational model of a brain circuit whose advances were directly built on by others.
15
Abstract
Blindsight is the residual visuo-motor ability, without subjective awareness, observed after lesions of the primary visual cortex (V1). Various visual functions are retained; however, instrumental visual associative learning remained to be investigated. Here we examined the secondary reinforcing properties of visual cues presented to the hemianopic field of macaque monkeys with unilateral V1 lesions. Our aim was to test the potential role of visual pathways bypassing V1 in reinforcing visual instrumental learning. When the monkeys learned the location of a hidden area in an oculomotor search task, conditioned visual cues presented to the lesion-affected hemifield operated as an effective secondary reinforcer. We noted that not only the hidden area location, but also the vector of the saccade entering the target area, was reinforced. Importantly, when the visual reinforcement signal was presented in the lesion-affected field, the monkeys continued searching, as opposed to stopping when the cue was presented in the intact field. This suggests the monkeys were less confident that the target location had been discovered when the reinforcement cue was presented in the affected field. These results indicate that visual signals mediated by the residual visual pathways after V1 lesions can access fundamental reinforcement mechanisms, but with impaired visual awareness.
16
Parkin regulates drug-taking behavior in rat model of methamphetamine use disorder. Transl Psychiatry 2021; 11:293. [PMID: 34001858 PMCID: PMC8129108 DOI: 10.1038/s41398-021-01387-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/16/2020] [Revised: 03/25/2021] [Accepted: 04/14/2021] [Indexed: 01/02/2023] Open
Abstract
There is no FDA-approved medication for methamphetamine (METH) use disorder. New therapeutic approaches are needed, especially for people who use METH heavily and are at high risk for overdose. This study used genetically engineered rats to evaluate PARKIN as a potential target for METH use disorder. PARKIN knockout, PARKIN-overexpressing, and wild-type young adult male Long Evans rats were trained to self-administer high doses of METH using an extended-access METH self-administration paradigm. Reinforcing/rewarding properties of METH were assessed by quantifying drug-taking behavior and time spent in a METH-paired environment. PARKIN knockout rats self-administered more METH and spent more time in the METH-paired environment than wild-type rats. Wild-type rats overexpressing PARKIN self-administered less METH and spent less time in the METH-paired environment. PARKIN knockout rats overexpressing PARKIN self-administered less METH during the first half of drug self-administration days than PARKIN-deficient rats. The results indicate that rats with PARKIN excess or PARKIN deficit are useful models for studying neural substrates underlying "resilience" or vulnerability to METH use disorder and identify PARKIN as a novel potential drug target to treat heavy use of METH.
17
Huang X, Wu W, Qiao H. Computational Modeling of Emotion-Motivated Decisions for Continuous Control of Mobile Robots. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2019.2963545] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
18
Neural Mechanisms of Human Decision-Making. COGNITIVE AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2021; 21:35-57. [PMID: 33409958 DOI: 10.3758/s13415-020-00842-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 09/28/2020] [Indexed: 11/08/2022]
Abstract
We present a theory and neural network model of the neural mechanisms underlying human decision-making. We propose a detailed model of the interaction between brain regions, under a proposer-predictor-actor-critic framework. This theory is based on detailed animal data and theories of action-selection. Those theories are adapted to serial operation to bridge levels of analysis and explain human decision-making. Task-relevant areas of cortex propose a candidate plan using fast, model-free, parallel neural computations. Other areas of cortex and medial temporal lobe can then predict likely outcomes of that plan in this situation. This optional prediction- (or model-) based computation can produce better accuracy and generalization, at the expense of speed. Next, linked regions of basal ganglia act to accept or reject the proposed plan based on its reward history in similar contexts. If that plan is rejected, the process repeats to consider a new option. The reward-prediction system acts as a critic to determine the value of the outcome relative to expectations and produce dopamine as a training signal for cortex and basal ganglia. By operating sequentially and hierarchically, the same mechanisms previously proposed for animal action-selection could explain the most complex human plans and decisions. We discuss explanations of model-based decisions, habitization, and risky behavior based on the computational model.
19
Tessereau C, O’Dea R, Coombes S, Bast T. Reinforcement learning approaches to hippocampus-dependent flexible spatial navigation. Brain Neurosci Adv 2021; 5:2398212820975634. [PMID: 33954259 PMCID: PMC8042550 DOI: 10.1177/2398212820975634] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Accepted: 10/21/2020] [Indexed: 11/17/2022] Open
Abstract
Humans and non-human animals show great flexibility in spatial navigation, including the ability to return to specific locations based on as few as one single experience. To study spatial navigation in the laboratory, watermaze tasks, in which rats have to find a hidden platform in a pool of cloudy water surrounded by spatial cues, have long been used. Analogous tasks have been developed for human participants using virtual environments. Spatial learning in the watermaze is facilitated by the hippocampus. In particular, rapid, one-trial, allocentric place learning, as measured in the delayed-matching-to-place variant of the watermaze task, which requires rodents to learn repeatedly new locations in a familiar environment, is hippocampal dependent. In this article, we review some computational principles, embedded within a reinforcement learning framework, that utilise hippocampal spatial representations for navigation in watermaze tasks. We consider which key elements underlie their efficacy, and discuss their limitations in accounting for hippocampus-dependent navigation, both in terms of behavioural performance (i.e. how well do they reproduce behavioural measures of rapid place learning) and neurobiological realism (i.e. how well do they map to neurobiological substrates involved in rapid place learning). We discuss how an actor-critic architecture, enabling simultaneous assessment of the value of the current location and of the optimal direction to follow, can reproduce one-trial place learning performance as shown on watermaze and virtual delayed-matching-to-place tasks by rats and humans, respectively, if complemented with map-like place representations. The contribution of actor-critic mechanisms to delayed-matching-to-place performance is consistent with neurobiological findings implicating the striatum and hippocampo-striatal interaction in delayed-matching-to-place performance, given that the striatum has been associated with actor-critic mechanisms. 
Moreover, we illustrate that hierarchical computations embedded within an actor-critic architecture may help to account for aspects of flexible spatial navigation. The hierarchical reinforcement learning approach separates trajectory control via a temporal-difference error from goal selection via a goal prediction error and may account for flexible, trial-specific, navigation to familiar goal locations, as required in some arm-maze place memory tasks, although it does not capture one-trial learning of new goal locations, as observed in open field, including watermaze and virtual, delayed-matching-to-place tasks. Future models of one-shot learning of new goal locations, as observed on delayed-matching-to-place tasks, should incorporate hippocampal plasticity mechanisms that integrate new goal information with allocentric place representation, as such mechanisms are supported by substantial empirical evidence.
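The actor-critic architecture described above, operating over map-like place representations, can be sketched as follows. This is a schematic illustration under assumed parameters (a 1-D track, Gaussian place cells, softmax action selection), not the authors' implementation: the critic learns the value of the current location, the actor learns the preferred direction, and both are trained by the same TD error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D track: positions 0..5, hidden goal at position 5.
# Gaussian place cells tile the track (a map-like place representation).
n_pos, n_cells = 6, 6
centers = np.linspace(0, n_pos - 1, n_cells)

def place_features(pos):
    return np.exp(-0.5 * (pos - centers) ** 2)

w_critic = np.zeros(n_cells)       # critic: value of current location
w_actor = np.zeros((2, n_cells))   # actor: preference to move left/right
alpha, gamma = 0.1, 0.95

for episode in range(300):
    pos = 0
    for step in range(50):
        phi = place_features(pos)
        prefs = w_actor @ phi
        p = np.exp(prefs - prefs.max())
        p /= p.sum()                              # softmax over directions
        a = rng.choice(2, p=p)                    # 0 = left, 1 = right
        new_pos = max(0, min(n_pos - 1, pos + (1 if a == 1 else -1)))
        r = 1.0 if new_pos == n_pos - 1 else 0.0
        v_next = 0.0 if r else w_critic @ place_features(new_pos)
        delta = r + gamma * v_next - w_critic @ phi   # shared TD error
        w_critic += alpha * delta * phi               # critic update
        w_actor[a] += alpha * delta * phi * (1 - p[a])       # actor update
        w_actor[1 - a] -= alpha * delta * phi * p[1 - a]     # (policy gradient)
        pos = new_pos
        if r:
            break

prefs_at_start = w_actor @ place_features(0)
print(prefs_at_start)  # direction preferences at the start position
```

After training, the actor's preferences at the start location reflect the learned direction toward the goal, while the critic's weights encode a value gradient over the place-cell map.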
Affiliation(s)
- Charline Tessereau
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK
- School of Psychology, University of Nottingham, Nottingham, UK
- Neuroscience@Nottingham
- Reuben O’Dea
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK
- Neuroscience@Nottingham
- Stephen Coombes
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK
- Neuroscience@Nottingham
- Tobias Bast
- School of Psychology, University of Nottingham, Nottingham, UK
- Neuroscience@Nottingham
20
Bermudez-Contreras E, Clark BJ, Wilber A. The Neuroscience of Spatial Navigation and the Relationship to Artificial Intelligence. Front Comput Neurosci 2020; 14:63. [PMID: 32848684 PMCID: PMC7399088 DOI: 10.3389/fncom.2020.00063] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2019] [Accepted: 05/28/2020] [Indexed: 11/13/2022] Open
Abstract
Recent advances in artificial intelligence (AI) and neuroscience are impressive. In AI, this includes the development of computer programs that can beat a grandmaster at GO or outperform human radiologists at cancer detection. Many of these technological developments are directly related to progress in artificial neural networks, initially inspired by our knowledge about how the brain carries out computation. In parallel, neuroscience has also experienced significant advances in understanding the brain. For example, in the field of spatial navigation, work on the mechanisms and brain regions involved in neural computations of cognitive maps (an internal representation of space) was recently recognized with the Nobel Prize in medicine. Much of the recent progress in neuroscience has been due in part to the development of technology used to record from very large populations of neurons in multiple regions of the brain, with exquisite temporal and spatial resolution, in behaving animals. With the advent of the vast quantities of data that these techniques allow us to collect, there has been an increased interest in the intersection between AI and neuroscience; many of these intersections involve using AI as a novel tool to explore and analyze these large data sets. However, given the common initial motivation (to understand the brain), these disciplines could be more strongly linked. Currently, much of this potential synergy is not being realized. We propose that spatial navigation is an excellent area in which these two disciplines can converge to help advance what we know about the brain. In this review, we first summarize progress in the neuroscience of spatial navigation and reinforcement learning. We then turn our attention to how spatial navigation has been modeled using descriptive, mechanistic, and normative approaches, and to the use of AI in such models. Next, we discuss how AI can advance neuroscience, how neuroscience can advance AI, and the limitations of these approaches. We conclude by highlighting promising lines of research in which spatial navigation can be the point of intersection between neuroscience and AI, and how this can contribute to the advancement of the understanding of intelligent behavior.
Affiliation(s)
- Benjamin J. Clark
- Department of Psychology, University of New Mexico, Albuquerque, NM, United States
- Aaron Wilber
- Department of Psychology, Program in Neuroscience, Florida State University, Tallahassee, FL, United States
21
Khamassi M, Girard B. Modeling awake hippocampal reactivations with model-based bidirectional search. BIOLOGICAL CYBERNETICS 2020; 114:231-248. [PMID: 32065253 DOI: 10.1007/s00422-020-00817-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/30/2019] [Accepted: 01/21/2020] [Indexed: 06/10/2023]
Abstract
Hippocampal offline reactivations during reward-based learning, usually categorized as replay events, have been found to be important for performance improvement over time and for memory consolidation. Recent computational work has linked these phenomena to the need to transform reward information into state-action values for decision making and to propagate it to all relevant states of the environment. Nevertheless, it is still unclear whether an integrated reinforcement learning mechanism could account for the variety of awake hippocampal reactivations, including variety in order (forward and reverse reactivated trajectories) and variety in the location where they occur (reward site or decision-point). Here, we present a model-based bidirectional search model which accounts for a variety of hippocampal reactivations. The model combines forward trajectory sampling from current position and backward sampling through prioritized sweeping from states associated with large reward prediction errors until the two trajectories connect. This is repeated until stabilization of state-action values (convergence), which could explain why hippocampal reactivations drastically diminish when the animal's performance stabilizes. Simulations in a multiple T-maze task show that forward reactivations are prominently found at decision-points while backward reactivations are exclusively generated at reward sites. Finally, the model can generate imaginary trajectories that are not allowed to the agent during task performance. We raise some experimental predictions and implications for future studies of the role of the hippocampo-prefronto-striatal network in learning.
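The backward-sampling component of the model, prioritized sweeping from states with large reward prediction errors, can be illustrated on a toy linear track. This is a hedged sketch rather than the paper's model: the environment, the known-model assumption, and the priority threshold are all assumed for illustration.

```python
import heapq
import numpy as np

# Hypothetical linear track: states 0..4, moving forward; reward at state 4.
n_states = 5
gamma = 0.9
V = np.zeros(n_states)
reward = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
predecessors = {s: [s - 1] for s in range(1, n_states)}  # learned world model

# Seed the queue with the state carrying a large reward prediction error,
# as when the animal first finds reward at the goal site.
pq = [(-1.0, n_states - 1)]  # (negative priority, state) for a max-heap
while pq:
    _, s = heapq.heappop(pq)
    v_next = 0.0 if s == n_states - 1 else gamma * V[s + 1]
    delta = reward[s] + v_next - V[s]
    V[s] = reward[s] + v_next  # full backup using the model
    # Sweep backward: predecessors of an updated state inherit a priority
    # proportional to the change, yielding reverse-ordered reactivations.
    if abs(delta) > 1e-6:
        for p in predecessors.get(s, []):
            heapq.heappush(pq, (-abs(delta), p))

print(V)  # values propagated backward from the reward site
```

The update order (goal state first, then its predecessors in turn) mirrors the reverse reactivations the model generates at reward sites, and the queue empties once values stabilize, matching the observation that reactivations diminish as performance stabilizes.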
Affiliation(s)
- Mehdi Khamassi
- Institute of Intelligent Systems and Robotics (ISIR), Sorbonne Université and CNRS (Centre National de la Recherche Scientifique), 75005, Paris, France
- Benoît Girard
- Institute of Intelligent Systems and Robotics (ISIR), Sorbonne Université and CNRS (Centre National de la Recherche Scientifique), 75005, Paris, France
22
CB1 Activity Drives the Selection of Navigational Strategies: A Behavioral and c-Fos Immunoreactivity Study. Int J Mol Sci 2020; 21:ijms21031072. [PMID: 32041135 PMCID: PMC7036945 DOI: 10.3390/ijms21031072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2019] [Revised: 01/29/2020] [Accepted: 01/31/2020] [Indexed: 11/26/2022] Open
Abstract
To promote efficient explorative behaviors, subjects adaptively select spatial navigational strategies based on landmarks or a cognitive map. The hippocampus works alone or in conjunction with the dorsal striatum, both representing the neuronal underpinnings of the navigational strategies organized on the basis of different systems of spatial coordinate integration. The high expression of cannabinoid type 1 (CB1) receptors in structures related to spatial learning—such as the hippocampus, dorsal striatum and amygdala—renders the endocannabinoid system a critical target to study the balance between landmark- and cognitive map-based navigational strategies. In the present study, mice treated with the CB1-inverse agonist/antagonist AM251 or vehicle were trained on a Circular Hole Board, a task that could be solved through either navigational strategy. At the end of the behavioral testing, c-Fos immunoreactivity was evaluated in specific nuclei of the hippocampus, dorsal striatum and amygdala. AM251 treatment impaired spatial learning and modified the pattern of the performed navigational strategies as well as the c-Fos immunoreactivity in the hippocampus, dorsal striatum and amygdala. The present findings shed light on the involvement of CB1 receptors as part of the selection system of the navigational strategies implemented to efficiently solve the spatial problem.
23
The Nucleus Accumbens Core Is Necessary for Responding to Incentive But Not Instructive Stimuli. J Neurosci 2019; 40:1332-1343. [PMID: 31862857 DOI: 10.1523/jneurosci.0194-19.2019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2019] [Revised: 12/11/2019] [Accepted: 12/12/2019] [Indexed: 11/21/2022] Open
Abstract
An abundant literature has highlighted the importance of the nucleus accumbens core (NAcC) in behavioral tasks dependent on external stimuli. Yet, some studies have also reported an absence of NAcC involvement in stimulus processing. We aimed to compare, in male rats, the underlying neuronal determinants of incentive and instructive stimuli in the same task. We developed a variant of a GO/NOGO task that reveals important differences between these two types of stimuli. The incentive stimulus invites the rat to engage in the task sequence. Once the rat has decided to initiate a trial, it remains engaged in the task until the end of the trial. This task revealed the differential contribution of the NAcC to responding to different types of stimuli: responding to the incentive stimulus depended on NAcC AMPA/NMDA and dopamine D1 receptors, but the retrieval of the response associated with the instructive stimuli (lever pressing on GO, withholding on NOGO) did not. Our electrophysiological study showed that more NAcC neurons responded more strongly to the incentive than to the instructive stimuli. Furthermore, when animals did not respond to the incentive stimulus, the induced excitation was suppressed for most projection neurons, whereas interneurons were strongly activated at a latency preceding that found in projection neurons. This work provides insight into the underlying neuronal processes explaining the preferential implication of the NAcC in deciding whether and when to engage in reward-seeking, rather than in deciding which action to perform.
SIGNIFICANCE STATEMENT: The nucleus accumbens core (NAcC) is essential to process information carried by reward-predicting stimuli. Yet, stimuli have distinct properties: incentive stimuli orient attention toward reward-seeking, whereas instructive stimuli inform about the action to perform. Our study shows that, in male rats, NAcC perturbation with glutamate or dopamine antagonists impeded responses to the incentive but not to the instructive stimulus. NAcC neuronal recordings revealed a stronger representation of incentive than instructive stimuli. Furthermore, we found that interneurons are recruited when rats fail to respond to incentive stimuli. This work provides insight into the underlying neuronal processes explaining the preferential implication of the NAcC in deciding whether and when to engage in reward-seeking, rather than in deciding which action to perform.
24
Kwak S, Jung MW. Distinct roles of striatal direct and indirect pathways in value-based decision making. eLife 2019; 8:46050. [PMID: 31310237 PMCID: PMC6658164 DOI: 10.7554/elife.46050] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2019] [Accepted: 07/09/2019] [Indexed: 12/12/2022] Open
Abstract
The striatum is critically involved in value-based decision making. However, it is unclear how striatal direct and indirect pathways work together to make optimal choices in a dynamic and uncertain environment. Here, we examined the effects of selectively inactivating D1 receptor (D1R)- or D2 receptor (D2R)-expressing dorsal striatal neurons (corresponding to direct- and indirect-pathway neurons, respectively) on mouse choice behavior in a reversal task with progressively increasing reversal frequency and a dynamic two-armed bandit task. Inactivation of either D1R- or D2R-expressing striatal neurons impaired performance in both tasks, but the pattern of altered choice behavior differed between the two animal groups. A reinforcement learning model-based analysis indicated that inactivation of D1R- and D2R-expressing striatal neurons selectively impairs value-dependent action selection and value learning, respectively. Our results suggest differential contributions of striatal direct and indirect pathways to two distinct steps in value-based decision making.
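Reinforcement learning models of the kind used in this analysis make the two steps separable: a learning rate governs value learning, while a softmax inverse temperature governs value-dependent action selection. A generic sketch of such a model (the task, parameters, and agent are hypothetical, not the paper's fitted model):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_bandit(alpha, beta, n_trials=200, p_reward=(0.8, 0.2)):
    """Two-armed bandit agent: alpha sets the value-learning step,
    beta (inverse temperature) sets value-dependent action selection."""
    q = np.zeros(2)
    choices = []
    for _ in range(n_trials):
        p = np.exp(beta * q) / np.exp(beta * q).sum()  # softmax selection
        c = int(rng.random() < p[1])                   # pick arm 1 w.p. p[1]
        r = float(rng.random() < p_reward[c])          # stochastic reward
        q[c] += alpha * (r - q[c])                     # value-learning step
        choices.append(c)
    return np.mean(np.array(choices) == 0)  # fraction of better-arm choices

# Degrading selection (low beta) versus degrading learning (low alpha)
# produces different patterns of choice behavior, analogous to the
# dissociation the model-based analysis draws between the two pathways.
print(simulate_bandit(alpha=0.3, beta=5.0))
print(simulate_bandit(alpha=0.3, beta=0.5))
```

Fitting alpha and beta separately to each condition is what lets such an analysis attribute an impairment to value learning versus value-dependent action selection.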
Affiliation(s)
- Shinae Kwak
- Center for Synaptic Brain Dysfunctions, Institute for Basic Science, Daejeon, Republic of Korea; Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Min Whan Jung
- Center for Synaptic Brain Dysfunctions, Institute for Basic Science, Daejeon, Republic of Korea; Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
25
Cazé R, Khamassi M, Aubin L, Girard B. Hippocampal replays under the scrutiny of reinforcement learning models. J Neurophysiol 2018; 120:2877-2896. [DOI: 10.1152/jn.00145.2018] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
Multiple in vivo studies have shown that place cells from the hippocampus replay previously experienced trajectories. These replays are commonly considered to mainly reflect memory consolidation processes. Some data, however, have highlighted a functional link between replays and reinforcement learning (RL). This theory, extensively used in machine learning, has introduced efficient algorithms and can explain various behavioral and physiological measures from different brain regions. RL algorithms could constitute a mechanistic description of replays and explain how replays can reduce the number of iterations required to explore the environment during learning. We review the main findings concerning the different hippocampal replay types and the possible associated RL models (either model-based, model-free, or hybrid model types). We conclude by tying these frameworks together. We illustrate the link between data and RL through a series of model simulations. This review, at the frontier between informatics and biology, paves the way for future work on replays.
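The claim that replays can reduce the number of iterations required during learning is made concrete by Dyna-style algorithms, in which stored transitions are replayed as extra off-line updates between real steps. A minimal sketch under assumed parameters (not a reimplementation of the reviewed models):

```python
import numpy as np

rng = np.random.default_rng(2)

def td_update(Q, transition, alpha=0.5, gamma=0.9):
    s, a, r, s2 = transition
    target = r + (0.0 if r else gamma * Q[s2].max())  # reward state is terminal
    Q[s, a] += alpha * (target - Q[s, a])

def run(n_replay, n_episodes=5):
    """Tabular Q-learning on a 6-state chain under a random exploration
    policy, with n_replay extra updates per step drawn from remembered
    transitions (Dyna-style replay)."""
    n_states = 6
    Q = np.zeros((n_states, 2))  # actions: 0 = left, 1 = right; reward at end
    memory = []
    for _ in range(n_episodes):
        s = 0
        for _ in range(40):
            a = int(rng.integers(2))
            s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
            r = 1.0 if s2 == n_states - 1 else 0.0
            memory.append((s, a, r, s2))
            td_update(Q, memory[-1])           # on-line update
            for _ in range(n_replay):          # replayed off-line updates
                td_update(Q, memory[rng.integers(len(memory))])
            s = s2
            if r:
                break
    return Q

Q_no_replay = run(n_replay=0)
Q_with_replay = run(n_replay=10)
# Replayed updates propagate reward information to early states in far
# fewer environment interactions than on-line updates alone.
print(Q_no_replay[0, 1], Q_with_replay[0, 1])
```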
Affiliation(s)
- Romain Cazé
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Mehdi Khamassi
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Lise Aubin
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Benoît Girard
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
26
Gmaz JM, Carmichael JE, van der Meer MA. Persistent coding of outcome-predictive cue features in the rat nucleus accumbens. eLife 2018; 7:37275. [PMID: 30234485 PMCID: PMC6195350 DOI: 10.7554/elife.37275] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Accepted: 09/15/2018] [Indexed: 01/09/2023] Open
Abstract
The nucleus accumbens (NAc) is important for learning from feedback, and for biasing and invigorating behaviour in response to cues that predict motivationally relevant outcomes. NAc encodes outcome-related cue features such as the magnitude and identity of reward. However, little is known about how features of cues themselves are encoded. We designed a decision making task where rats learned multiple sets of outcome-predictive cues, and recorded single-unit activity in the NAc during performance. We found that coding of cue identity and location occurred alongside coding of expected outcome. Furthermore, this coding persisted both during a delay period, after the rat made a decision and was waiting for an outcome, and after the outcome was revealed. Encoding of cue features in the NAc may enable contextual modulation of on-going behaviour, and provide an eligibility trace of outcome-predictive stimuli for updating stimulus-outcome associations to inform future behaviour.
Affiliation(s)
- Jimmie M Gmaz
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, United States
- James E Carmichael
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, United States
27
Boraud T, Leblois A, Rougier NP. A natural history of skills. Prog Neurobiol 2018; 171:114-124. [PMID: 30171867 DOI: 10.1016/j.pneurobio.2018.08.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2018] [Revised: 07/19/2018] [Accepted: 08/21/2018] [Indexed: 10/28/2022]
Abstract
The dorsal pallium (a.k.a. cortex in mammals) forms a loop circuit with the basal ganglia and the thalamus that is known to control and adapt behavior, but the respective functional roles of these structures are still debated. Influenced by the triune brain theory proposed in the early sixties, many current theories propose a hierarchical organization, on top of which stands the cortex, to which the subcortical structures are subordinated. In particular, habit formation has been proposed to reflect a switch from conscious, on-line control of behavior by the cortex to fully automated subcortical control. In this review, we propose to re-evaluate the function of the network in light of current experimental evidence concerning the anatomy and physiology of basal ganglia-cortical circuits in vertebrates. We briefly review the current theories and show that they can be encompassed in a broader framework of skill learning and performance. Then, after recalling the state of the art concerning the anatomical architecture of the network and the underlying dynamic processes, we summarize the evolution of the anatomical and physiological substrate of skill learning and performance among vertebrates. We then review experimental evidence supporting the hypothesis that the development of automatized skills relies on the basal ganglia teaching cortical circuits, and is in fact a late feature linked with the development of a specialized cortex or pallium that evolved in parallel in different taxa. We finally propose a minimal computational framework in which this hypothesis can be explicitly implemented and tested.
Affiliation(s)
- Thomas Boraud
- CNRS, UMR 5293, IMN, 33000 Bordeaux, France; University of Bordeaux, UMR 5293, IMN, 33000 Bordeaux, France; CNRS, French-Israeli Neuroscience Lab, 33000 Bordeaux, France; CHU de Bordeaux, IMN Clinique, 33000 Bordeaux, France
- Arthur Leblois
- CNRS, UMR 5293, IMN, 33000 Bordeaux, France; University of Bordeaux, UMR 5293, IMN, 33000 Bordeaux, France; CNRS, French-Israeli Neuroscience Lab, 33000 Bordeaux, France
- Nicolas P Rougier
- University of Bordeaux, UMR 5293, IMN, 33000 Bordeaux, France; INRIA Bordeaux Sud-Ouest, 33405 Talence, France; LaBRI, University of Bordeaux, IPB, CNRS, UMR 5800, 33405 Talence, France

28
Herweg NA, Kahana MJ. Spatial Representations in the Human Brain. Front Hum Neurosci 2018; 12:297. [PMID: 30104966] [PMCID: PMC6078001] [DOI: 10.3389/fnhum.2018.00297]
Abstract
While extensive research on the neurophysiology of spatial memory has been carried out in rodents, memory research in humans had traditionally focused on more abstract, language-based tasks. Recent studies have begun to address this gap using virtual navigation tasks in combination with electrophysiological recordings in humans. These studies suggest that the human medial temporal lobe (MTL) is equipped with a population of place and grid cells similar to that previously observed in the rodent brain. Furthermore, theta oscillations have been linked to spatial navigation and, more specifically, to the encoding and retrieval of spatial information. While some studies suggest a single navigational theta rhythm which is of lower frequency in humans than rodents, other studies advocate for the existence of two functionally distinct delta-theta frequency bands involved in both spatial and episodic memory. Despite the general consensus between rodent and human electrophysiology, behavioral work in humans does not unequivocally support the use of a metric Euclidean map for navigation. Formal models of navigational behavior, which specifically consider the spatial scale of the environment and complementary learning mechanisms, may help to better understand different navigational strategies and their neurophysiological mechanisms. Finally, the functional overlap of spatial and declarative memory in the MTL calls for a unified theory of MTL function. Such a theory will critically rely upon linking task-related phenomena at multiple temporal and spatial scales. Understanding how single cell responses relate to ongoing theta oscillations during both the encoding and retrieval of spatial and non-spatial associations appears to be key toward developing a more mechanistic understanding of memory processes in the MTL.
Affiliation(s)
- Nora A. Herweg
- Computational Memory Lab, Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States
- Michael J. Kahana
- Computational Memory Lab, Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States

29
Goodroe SC, Starnes J, Brown TI. The Complex Nature of Hippocampal-Striatal Interactions in Spatial Navigation. Front Hum Neurosci 2018; 12:250. [PMID: 29977198] [PMCID: PMC6021746] [DOI: 10.3389/fnhum.2018.00250]
Abstract
Decades of research have established the importance of the hippocampus for episodic and spatial memory. In spatial navigation tasks, the role of the hippocampus has been classically juxtaposed with the role of the dorsal striatum, the latter of which has been characterized as a system important for implementing stimulus-response and action-outcome associations. In many neuroimaging paradigms, this has been explored through contrasting wayfinding and route-following behavior. The distinction between the contributions of the hippocampus and striatum to spatial navigation has been supported by extensive literature. Convergent research has also underscored the fact that these different memory systems can interact in dynamic ways and contribute to a broad range of navigational scenarios. For example, although familiar routes may often be navigable based on stimulus-response associations, hippocampal episodic memory mechanisms can also contribute to egocentric route-oriented memory, enabling recall of context-dependent sequences of landmarks or the actions to be made at decision points. Additionally, the literature has stressed the importance of subdividing the striatum into functional gradients, with more ventral and medial components being important for the behavioral expression of hippocampal-dependent spatial memories. More research is needed to reveal how networks involving these regions process and respond to dynamic changes in memory and control demands over the course of navigational events. In this Perspective article, we suggest that a critical direction for navigation research is to further characterize how hippocampal and striatal subdivisions interact in different navigational contexts.
Affiliation(s)
- Sarah C Goodroe
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, United States
- Jon Starnes
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, United States
- Thackery I Brown
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, United States

30
Dollé L, Chavarriaga R, Guillot A, Khamassi M. Interactions of spatial strategies producing generalization gradient and blocking: A computational approach. PLoS Comput Biol 2018; 14:e1006092. [PMID: 29630600] [PMCID: PMC5908205] [DOI: 10.1371/journal.pcbi.1006092]
Abstract
We present a computational model of spatial navigation comprising the different learning mechanisms found in mammals, i.e., associative, cognitive-mapping, and parallel systems. This model reproduces a large number of experimental results in different variants of the Morris water maze task, including standard associative phenomena (spatial generalization gradient and blocking) as well as navigation based on cognitive mapping. Furthermore, we show that competitive and cooperative patterns between different navigation strategies in the model can explain previous, apparently contradictory results supporting either associative or cognitive mechanisms for spatial learning. The key computational mechanism that reconciles experimental results showing different influences of distal and proximal cues on behavior, different learning times, and different abilities of individuals to alternate between spatial and response strategies is the dynamic coordination of navigation strategies, whose performance is evaluated online with a common currency through a modular approach. We provide a set of concrete experimental predictions to further test the computational model. Overall, this computational work sheds new light on inter-individual differences in navigation learning, and provides a formal and mechanistic approach for testing various theories of spatial cognition in mammals.
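The coordination idea, several strategy modules evaluated online in a shared "common currency" with a gating mechanism selecting among them, can be sketched in a few lines. The greedy gating rule, the two-strategy task, and all parameters below are illustrative assumptions, not the model's actual equations:

```python
class StrategySelector:
    """Sketch of modular strategy coordination: each navigation strategy
    keeps a running performance estimate in a shared 'common currency',
    and the gating mechanism selects the strategy that currently scores
    highest."""

    def __init__(self, strategies, lr=0.1):
        self.perf = {name: 0.0 for name in strategies}
        self.lr = lr

    def choose(self):
        # Greedy gating on the current performance estimates.
        return max(self.perf, key=self.perf.get)

    def update(self, name, reward):
        # Online evaluation: exponential moving average of recent reward.
        self.perf[name] += self.lr * (reward - self.perf[name])

selector = StrategySelector(["place", "response"])
# Suppose the response (habitual) strategy is rewarded more reliably.
for _ in range(50):
    selector.update("response", 1.0)
    selector.update("place", 0.2)
assert selector.choose() == "response"
```

Because both modules are scored in the same currency, control can shift between them as their relative performance changes, which is the property the model uses to reconcile competing accounts.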
Affiliation(s)
- Laurent Dollé
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, F-75005 Paris, France
- Ricardo Chavarriaga
- Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Institute of Bioengineering and School of Engineering, EPFL, Geneva, Switzerland
- Agnès Guillot
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, F-75005 Paris, France
- Mehdi Khamassi
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, F-75005 Paris, France

31
A hippocampo-cerebellar centred network for the learning and execution of sequence-based navigation. Sci Rep 2017; 7:17812. [PMID: 29259243] [PMCID: PMC5736633] [DOI: 10.1038/s41598-017-18004-7]
Abstract
How do we translate self-motion into goal-directed actions? Here we investigate the cognitive architecture underlying self-motion processing during exploration and goal-directed behaviour. The task, performed in an environment with limited and ambiguous external landmarks, constrained mice to use self-motion based information for sequence-based navigation. The post-behavioural analysis combined brain network characterization based on c-Fos imaging and graph theory analysis as well as computational modelling of the learning process. The study revealed a widespread network centred around the cerebral cortex and basal ganglia during the exploration phase, while a network dominated by hippocampal and cerebellar activity appeared to sustain sequence-based navigation. The learning process could be modelled by an algorithm combining memory of past actions and model-free reinforcement learning, which parameters pointed toward a central role of hippocampal and cerebellar structures for learning to translate self-motion into a sequence of goal-directed actions.
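The class of algorithm described, memory of past actions combined with model-free reinforcement learning, can be sketched roughly as follows. Using the action history itself as the state is one simple way to realize "memory of past actions"; the three-step task and all parameters are illustrative assumptions, not the paper's fitted model:

```python
import random

# Sketch: learn a fixed sequence of goal-directed actions with model-free
# Q-learning, where the agent's state is its memory of past actions.
random.seed(1)
target = ("fwd", "left", "fwd")        # rewarded action sequence (assumed)
actions = ["fwd", "left", "right"]
alpha, epsilon = 0.3, 0.2
Q = {}

def q(state, a):
    return Q.get((state, a), 0.0)

for _ in range(3000):
    history = ()                       # memory of past actions = the state
    for _step in range(len(target)):
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda x: q(history, x))
        new_hist = history + (a,)
        # Reward only if the full target sequence was produced.
        r = 1.0 if new_hist == target else 0.0
        done = len(new_hist) == len(target)
        boot = 0.0 if done else max(q(new_hist, x) for x in actions)
        Q[(history, a)] = q(history, a) + alpha * (r + boot - q(history, a))
        history = new_hist

# The greedy policy reproduces the target sequence.
h = ()
for want in target:
    a = max(actions, key=lambda x: q(h, x))
    assert a == want
    h += (a,)
```

The sketch shows why such learning is memory-hungry: each distinct action history is a separate state, which is consistent with the paper's point that sequence-based navigation recruits structures beyond a pure stimulus-response system.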
32
Russek EM, Momennejad I, Botvinick MM, Gershman SJ, Daw ND. Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS Comput Biol 2017; 13:e1005768. [PMID: 28945743] [PMCID: PMC5628940] [DOI: 10.1371/journal.pcbi.1005768]
Abstract
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
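The successor representation at the foundation of this framework can be illustrated with a small sketch: a matrix M of discounted expected future state occupancies is learned by TD, and values follow as V = Mw, so changing the reward vector w revalues states without further experience. The chain task and parameters below are illustrative, not drawn from the paper's simulations:

```python
import numpy as np

# Successor representation (SR) sketch: M[s, s'] estimates discounted
# expected future occupancy of s' starting from s, learned with a TD rule.
n_states, gamma, alpha = 4, 0.9, 0.1
M = np.eye(n_states)          # SR matrix, initialized to the identity
w = np.zeros(n_states)
w[3] = 1.0                    # reward at terminal state 3

# Experience a deterministic chain 0 -> 1 -> 2 -> 3, repeatedly.
for _ in range(500):
    for s, s_next in [(0, 1), (1, 2), (2, 3)]:
        one_hot = np.eye(n_states)[s]
        td_error = one_hot + gamma * M[s_next] - M[s]
        M[s] += alpha * td_error

V = M @ w
# States closer to the reward have higher value.
assert V[2] > V[1] > V[0]
# Revaluation: changing the reward vector updates values with no new
# experience, a hallmark of (partially) model-based behavior.
w_new = np.array([0.0, 0.0, 0.0, 2.0])
assert (M @ w_new)[0] > V[0]
```

The key design point is that the expensive predictive structure (M) is learned with the same cheap TD machinery as model-free values, while decision-time evaluation remains a single matrix-vector product.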
Affiliation(s)
- Evan M. Russek
- Center for Neural Science, New York University, New York, NY, United States of America
- Ida Momennejad
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ, United States of America
- Matthew M. Botvinick
- DeepMind, London, United Kingdom and Gatsby Computational Neuroscience Unit, University College London, United Kingdom
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, United States of America
- Nathaniel D. Daw
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ, United States of America

33
Murata S, Yamashita Y, Arie H, Ogata T, Sugano S, Tani J. Learning to Perceive the World as Probabilistic or Deterministic via Interaction With Others: A Neuro-Robotics Experiment. IEEE Trans Neural Netw Learn Syst 2017; 28:830-848. [PMID: 26595928] [DOI: 10.1109/tnnls.2015.2492140]
Abstract
We suggest that different behavior generation schemes, such as sensory reflex behavior and intentional proactive behavior, can be developed by a newly proposed dynamic neural network model, named the stochastic multiple timescale recurrent neural network (S-MTRNN). The model learns to predict subsequent sensory inputs, generating both their means and their uncertainty levels in terms of variance (or inverse precision) by utilizing its multiple-timescale property. This model was employed in robotics learning experiments in which one robot controlled by the S-MTRNN was required to interact with another robot under the condition of uncertainty about the other's behavior. The experimental results show that self-organized sensory reflex behavior, based on probabilistic prediction, emerges when learning proceeds without a precise specification of initial conditions. In contrast, intentional proactive behavior with deterministic predictions emerges when precise initial conditions are available. The results also showed that, in situations where unanticipated behavior of the other robot was perceived, the behavioral context was revised adequately by adaptation of the internal neural dynamics to respond to sensory inputs during sensory reflex behavior generation. On the other hand, during intentional proactive behavior generation, an error regression scheme, by which the internal neural activity was modified in the direction of minimizing prediction errors, was needed for adequately revising the behavioral context. These results indicate that two different ways of treating uncertainty about perceptual events in learning, namely probabilistic modeling and deterministic modeling, contribute to the development of different dynamic neuronal structures governing the two types of behavior generation schemes.
34
Savalia T, Shukla A, Bapi RS. A Unified Theoretical Framework for Cognitive Sequencing. Front Psychol 2016; 7:1821. [PMID: 27917146] [PMCID: PMC5114455] [DOI: 10.3389/fpsyg.2016.01821]
Abstract
The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher-order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking, and hierarchical organization are important aspects of sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit vs. explicit and goal-directed vs. habits. We propose a theoretical framework unifying these two streams. Our proposal relies on the brain's ability to implicitly extract statistical regularities from the stream of stimuli and, with attentional engagement, to organize sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require the engagement of attentional processes. With repetition, these goal-directed plans become habits, with concomitant disengagement of attention. Thus, attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops (basal ganglia-frontal cortex and hippocampus-frontal cortex) mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks.
Affiliation(s)
- Tejas Savalia
- Cognitive Science Lab, International Institute of Information Technology, Hyderabad, India
- Anuj Shukla
- Cognitive Science Lab, International Institute of Information Technology, Hyderabad, India
- Raju S Bapi
- Cognitive Science Lab, International Institute of Information Technology, Hyderabad, India; School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India

35
Kato A, Morita K. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation. PLoS Comput Biol 2016; 12:e1005145. [PMID: 27736881] [PMCID: PMC5063413] [DOI: 10.1371/journal.pcbi.1005145]
Abstract
It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) as defined in reinforcement learning, and that DA therefore responds to unpredicted but not predicted reward. However, recent studies have found DA responses sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing learned values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that the underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value contrasts between 'Go' and 'No-Go' are generated because, while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value decay, or forgetting, provides a parsimonious mechanistic account for DA's roles in value learning and motivation. Our results also suggest that when biological systems for value learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium, where learning and forgetting are balanced.

Dopamine (DA) has been suggested to have two reward-related roles: (1) representing reward-prediction-error (RPE), and (2) providing motivational drive. Role (1) is based on the physiological results that DA responds to unpredicted but not predicted reward, whereas role (2) is supported by the pharmacological results that blockade of DA signaling causes motivational impairments such as slowdown of self-paced behavior. So far, these two roles have been considered to be played by two different temporal patterns of DA signals: role (1) by phasic signals and role (2) by tonic/sustained signals. However, recent studies have found sustained DA signals with features indicative of both roles (1) and (2), complicating this picture. Meanwhile, whereas synaptic/circuit mechanisms for role (1), i.e., how RPE is calculated upstream of DA neurons and how RPE-dependent updates of learned values occur through DA-dependent synaptic plasticity, have now become clarified, mechanisms for role (2) remain unclear. In this work, we modeled self-paced behavior by a series of 'Go' or 'No-Go' selections in the framework of reinforcement learning assuming DA's role (1), and demonstrated that incorporation of decay/forgetting of learned values, presumably implemented as decay of the synaptic strengths storing learned values, provides a potential unified mechanistic account for DA's two roles, together with its various temporal patterns.
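The core mechanism, value learning with decay producing a sustained RPE at a fully predicted reward, can be demonstrated in a minimal chain task. The task, the per-step decay rule, and all constants are illustrative assumptions, not the paper's model:

```python
# Sketch of value learning with decay ("forgetting"): learned values decay
# toward zero each step, so the reward-prediction error (RPE) at a fully
# predicted reward never vanishes, mimicking sustained dopamine signals.
alpha, gamma, decay = 0.5, 0.9, 0.02
n = 5                      # chain of states 0..4; reward delivered at state 4
V = [0.0] * n

def run_episode(V):
    """One pass through the chain; returns the RPE at the reward step."""
    rpe = 0.0
    for s in range(n):
        r = 1.0 if s == n - 1 else 0.0
        v_next = V[s + 1] if s < n - 1 else 0.0
        rpe = r + gamma * v_next - V[s]
        V[s] += alpha * rpe
        # Forgetting: every learned value decays a little at each step.
        for i in range(n):
            V[i] *= 1.0 - decay
    return rpe

for _ in range(200):
    final_rpe = run_episode(V)

# Without decay, the RPE at a predicted reward converges to ~0; with decay
# it settles at a sustained positive value.
assert final_rpe > 0.05
```

At equilibrium, learning and forgetting balance, which is exactly the "dynamic equilibrium" the abstract describes.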
Affiliation(s)
- Ayaka Kato
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan

36
Neuronal activity in dorsomedial and dorsolateral striatum under the requirement for temporal credit assignment. Sci Rep 2016; 6:27056. [PMID: 27245401] [PMCID: PMC4887996] [DOI: 10.1038/srep27056]
Abstract
To investigate neural processes underlying temporal credit assignment in the striatum, we recorded neuronal activity in the dorsomedial and dorsolateral striatum (DMS and DLS, respectively) of rats performing a dynamic foraging task in which a choice has to be remembered until its outcome is revealed for correct credit assignment. Choice signals appeared sequentially, initially in the DMS and then in the DLS, and they were combined with action value and reward signals in the DLS when choice outcome was revealed. Unlike in conventional dynamic foraging tasks, neural signals for chosen value were elevated in neither brain structure. These results suggest that dynamics of striatal neural signals related to evaluating choice outcome might differ drastically depending on the requirement for temporal credit assignment. In a behavioral context requiring temporal credit assignment, the DLS, but not the DMS, might be in charge of updating the value of chosen action by integrating choice, action value, and reward signals together.
37
Menegas W, Bergan JF, Ogawa SK, Isogai Y, Umadevi Venkataraju K, Osten P, Uchida N, Watabe-Uchida M. Dopamine neurons projecting to the posterior striatum form an anatomically distinct subclass. eLife 2015; 4:e10032. [PMID: 26322384] [PMCID: PMC4598831] [DOI: 10.7554/elife.10032]
Abstract
Combining rabies-virus tracing, optical clearing (CLARITY), and whole-brain light-sheet imaging, we mapped the monosynaptic inputs to midbrain dopamine neurons projecting to different targets (different parts of the striatum, cortex, amygdala, etc.) in mice. We found that most populations of dopamine neurons receive a similar set of inputs rather than forming strong reciprocal connections with their target areas. A common feature among most populations of dopamine neurons was the existence of dense 'clusters' of inputs within the ventral striatum. However, we found that dopamine neurons projecting to the posterior striatum were outliers, receiving relatively few inputs from the ventral striatum and instead receiving more inputs from the globus pallidus, subthalamic nucleus, and zona incerta. These results lay a foundation for understanding the input/output structure of the midbrain dopamine circuit and demonstrate that dopamine neurons projecting to the posterior striatum constitute a unique class of dopamine neurons regulated by different inputs.

Most neurons send their messages to recipient neurons by releasing a substance called a 'neurotransmitter' that binds to receptors on the target cell. The sites of this type of signal transmission are called synapses. Some small populations of neurons modulate the activity of hundreds or thousands of these synapses all across the brain by releasing 'neuromodulators' that affect how they work. These neuromodulators are essential because they broadcast information that is likely to be useful to many brain regions, like a 'news channel' for the brain. One important neuromodulator in the mammalian brain is dopamine, which contributes to motivation, learning, and the control of movement. Clusters of cells deep in the brain release dopamine, and people with Parkinson's disease gradually lose these cells. This makes it increasingly difficult for their brains to produce the correct amount of dopamine, and results in symptoms such as tremors and stiff muscles. Individual dopamine neurons typically send information to a single part of the brain. This suggests that dopamine neurons with different targets might have different roles. To explore this possibility, Menegas et al. classified dopamine neurons in the mouse brain into eight types based on the areas to which they project, and then mapped which neurons send input signals to each type. These inputs are likely to shape the activity of each type (that is, their 'message' to the rest of the brain). The mapping revealed that most dopamine neurons do not receive substantial input from the area to which they project (i.e., they do not form 'closed loops'). Instead, most of their input comes from a common set of brain regions, including a particularly large number of inputs from the ventral striatum. However, Menegas et al. found one exception. Dopamine neurons that target part of the brain called the posterior striatum receive relatively little input from the ventral striatum. Their input comes instead from a set of other brain structures, and in particular from a region called the subthalamic nucleus. Electrical stimulation of the subthalamic nucleus can help to relieve the symptoms of Parkinson's disease. Therefore, the results presented by Menegas et al. suggest that this population of dopamine neurons might be particularly relevant to Parkinson's disease and that focusing future studies on them could ultimately be beneficial for patients.
Affiliation(s)
- William Menegas
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
- Joseph F Bergan
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
- Sachie K Ogawa
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
- Yoh Isogai
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
- Pavel Osten
- Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Naoshige Uchida
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
- Mitsuko Watabe-Uchida
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States

38
Viejo G, Khamassi M, Brovelli A, Girard B. Modeling choice and reaction time during arbitrary visuomotor learning through the coordination of adaptive working memory and reinforcement learning. Front Behav Neurosci 2015; 9:225. [PMID: 26379518] [PMCID: PMC4549628] [DOI: 10.3389/fnbeh.2015.00225]
Abstract
Current learning theory provides a comprehensive description of how humans and other animals learn, and places behavioral flexibility and automaticity at the heart of adaptive behaviors. However, the computations supporting the interactions between goal-directed and habitual decision-making systems are still poorly understood. Previous functional magnetic resonance imaging (fMRI) results suggest that the brain hosts complementary computations that may differentially support goal-directed and habitual processes in the form of a dynamical interplay rather than a serial recruitment of strategies. To better elucidate the computations underlying flexible behavior, we develop a dual-system computational model that can predict both performance (i.e., participants' choices) and modulations in reaction times during learning of a stimulus-response association task. The habitual system is modeled with a simple Q-learning algorithm (QL). For the goal-directed system, we propose a new Bayesian Working Memory (BWM) model that searches for information in the history of previous trials in order to minimize Shannon entropy. We also propose a model for QL and BWM coordination in which the expensive memory manipulation is under the control of, among other factors, the level of convergence of the habitual learning. We test the ability of QL or BWM alone to explain human behavior, and compare them with the performance of model combinations, to highlight the need for such combinations to explain behavior. Two of the tested combination models are derived from the literature, and the third is our new proposal. In conclusion, all subjects were better explained by model combinations, and the majority of them are best explained by our new coordination proposal.
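The habitual module described, tabular Q-learning over stimulus-response associations, can be sketched in a few lines. The three-stimulus task, the hidden mapping, and all parameters are illustrative assumptions, not the paper's fitted QL module:

```python
import random

# Sketch of the habitual system as tabular Q-learning over
# stimulus-response associations.
random.seed(0)
alpha, epsilon = 0.2, 0.2
stimuli, responses = [0, 1, 2], [0, 1, 2]
correct = {0: 2, 1: 0, 2: 1}            # hidden stimulus -> response mapping
Q = {(s, a): 0.0 for s in stimuli for a in responses}

def act(s):
    # Epsilon-greedy: mostly exploit the learned association, sometimes explore.
    if random.random() < epsilon:
        return random.choice(responses)
    return max(responses, key=lambda a: Q[(s, a)])

for _ in range(1000):
    s = random.choice(stimuli)
    a = act(s)
    r = 1.0 if a == correct[s] else 0.0
    # One-step update; there is no successor state in this association task.
    Q[(s, a)] += alpha * (r - Q[(s, a)])

# After training, the greedy response reproduces the hidden mapping.
for s in stimuli:
    assert max(responses, key=lambda a: Q[(s, a)]) == correct[s]
```

Such a module learns slowly but acts cheaply, which is why, in the paper's coordination scheme, the expensive working-memory search can be handed over to it as its values converge.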
Affiliation(s)
- Guillaume Viejo
- Sorbonne Université, Université Pierre et Marie Curie, Univ Paris 06, UMR 7222, Institut des Systèmes Intelligents et de Robotique, Paris, France; Centre National de la Recherche Scientifique, UMR 7222, ISIR, Paris, France
- Mehdi Khamassi
- Sorbonne Université, Université Pierre et Marie Curie, Univ Paris 06, UMR 7222, Institut des Systèmes Intelligents et de Robotique, Paris, France; Centre National de la Recherche Scientifique, UMR 7222, ISIR, Paris, France
- Andrea Brovelli
- Institut de Neurosciences de la Timone, UMR 7289, Centre National de la Recherche Scientifique - Aix Marseille Université, Marseille, France
- Benoît Girard
- Sorbonne Université, Université Pierre et Marie Curie, Univ Paris 06, UMR 7222, Institut des Systèmes Intelligents et de Robotique, Paris, France; Centre National de la Recherche Scientifique, UMR 7222, ISIR, Paris, France

39
Chou TS, Bucci LD, Krichmar JL. Learning touch preferences with a tactile robot using dopamine modulated STDP in a model of insular cortex. Front Neurorobot 2015; 9:6. [PMID: 26257639] [PMCID: PMC4510776] [DOI: 10.3389/fnbot.2015.00006]
Abstract
Neurorobots enable researchers to study how behaviors are produced by neural mechanisms in an uncertain, noisy, real-world environment. To investigate how the somatosensory system processes noisy, real-world touch inputs, we introduce a neurorobot called CARL-SJR, which has a full-body tactile sensory area. The design of CARL-SJR is such that it encourages people to communicate with it through gentle touch. CARL-SJR provides feedback to users by displaying bright colors on its surface. In the present study, we show that CARL-SJR is capable of learning associations between conditioned stimuli (CS; a color pattern on its surface) and unconditioned stimuli (US; a preferred touch pattern) by applying a spiking neural network (SNN) with neurobiologically inspired plasticity. Specifically, we modeled the primary somatosensory cortex, prefrontal cortex, striatum, and the insular cortex, which is important for hedonic touch, to process noisy data generated directly from CARL-SJR's tactile sensory area. To facilitate learning, we applied dopamine-modulated Spike Timing Dependent Plasticity (STDP) to our simulated prefrontal cortex, striatum, and insular cortex. To cope with noisy, varying inputs, the SNN was tuned to produce traveling waves of activity that carried spatiotemporal information. Despite the noisy tactile sensors, spike trains, and variations in subject hand swipes, the learning was quite robust. Further, insular cortex activity in the incremental pathway of the dopaminergic reward system allowed us to control CARL-SJR's preference for touch direction without heavily pre-processed inputs. The emergent behaviors found in this model match animals' behaviors, wherein they prefer touch in particular areas and directions. Thus, the results in this paper could serve as an explanation of the underlying neural mechanisms for developing tactile preferences and hedonic touch.
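Dopamine-modulated STDP, the learning rule named here, is commonly formalized as a three-factor rule: a pre/post spike pairing sets an eligibility trace via an exponential STDP window, and the weight only changes when a dopamine signal arrives while the trace is active. A minimal sketch, with illustrative time constants and amplitudes (not the paper's SNN parameters):

```python
import math

def stdp_trace(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Eligibility from one spike pairing; dt = t_post - t_pre (ms)."""
    if dt >= 0:
        return a_plus * math.exp(-dt / tau)   # pre before post: LTP-like
    return -a_minus * math.exp(dt / tau)      # post before pre: LTD-like

def weight_update(w, dt, dopamine, lr=0.05, w_min=0.0, w_max=1.0):
    """Three-factor rule: dw = lr * dopamine * eligibility, clipped."""
    w += lr * dopamine * stdp_trace(dt)
    return min(max(w, w_min), w_max)

w = 0.5
for _ in range(50):            # rewarded causal pairings strengthen the synapse
    w = weight_update(w, dt=10.0, dopamine=1.0)
print(round(w, 3))             # saturates at the upper bound
```

Without the dopamine factor (dopamine=0.0) the trace decays unused and the weight is unchanged, which is the property that lets reward gate which correlations are learned.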
Affiliation(s)
- Ting-Shuo Chou
- Department of Computer Sciences, University of California, Irvine, Irvine, CA, USA
- Liam D Bucci
- Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, USA
- Jeffrey L Krichmar
- Department of Computer Sciences, University of California, Irvine, Irvine, CA, USA; Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, USA
40
Gurney KN, Humphries MD, Redgrave P. A new framework for cortico-striatal plasticity: behavioural theory meets in vitro data at the reinforcement-action interface. PLoS Biol 2015; 13:e1002034. [PMID: 25562526] [PMCID: PMC4285402] [DOI: 10.1371/journal.pbio.1002034]
Abstract
A computational model yields new insights into the bewildering complexity of cortico-striatal plasticity and its rationale for supporting operant learning. Operant learning requires that reinforcement signals interact with action representations at a suitable neural interface. Much evidence suggests that this occurs when phasic dopamine, acting as a reinforcement prediction error, gates plasticity at cortico-striatal synapses, and thereby changes the future likelihood of selecting the action(s) coded by striatal neurons. But this hypothesis faces serious challenges. First, cortico-striatal plasticity is inexplicably complex, depending on spike timing, dopamine level, and dopamine receptor type. Second, there is a credit assignment problem—action selection signals occur long before the consequent dopamine reinforcement signal. Third, the two types of striatal output neuron have apparently opposite effects on action selection. Whether these factors rule out the interface hypothesis and how they interact to produce reinforcement learning is unknown. We present a computational framework that addresses these challenges. We first predict the expected activity changes over an operant task for both types of action-coding striatal neuron, and show they co-operate to promote action selection in learning and compete to promote action suppression in extinction. Separately, we derive a complete model of dopamine and spike-timing dependent cortico-striatal plasticity from in vitro data. We then show this model produces the predicted activity changes necessary for learning and extinction in an operant task, a remarkable convergence of a bottom-up data-driven plasticity model with the top-down behavioural requirements of learning theory. Moreover, we show the complex dependencies of cortico-striatal plasticity are not only sufficient but necessary for learning and extinction. 
Validating the model, we show it can account for behavioural data describing extinction, renewal, and reacquisition, and replicate in vitro experimental data on cortico-striatal plasticity. By bridging the levels between the single synapse and behaviour, our model shows how striatum acts as the action-reinforcement interface. A key component of survival is the ability to learn which actions, in what contexts, yield useful and rewarding outcomes. Actions are encoded in the brain in the cortex but, as many actions are possible at any one time, there needs to be a mechanism to select which one is to be performed. This problem of action selection is mediated by a set of nuclei known as the basal ganglia, which receive convergent “action requests” from all over the cortex and select the one that is currently most important. Working out which is most important is determined by the strength of the input from each action request: the stronger the connection, the more important that action. Understanding learning thus requires understanding how that strength is changed by the outcome of each action. We built a computational model that demonstrates how the brain's internal signal for outcome (carried by the neurotransmitter dopamine) changes the strength of these cortical connections to learn the selection of rewarded actions, and the suppression of unrewarded ones. Our model shows how several known signals in the brain work together to shape the influence of cortical inputs to the basal ganglia at the interface between our actions and their outcomes.
Affiliation(s)
- Kevin N. Gurney
- Department of Psychology, Adaptive Behaviour Research Group, University of Sheffield, United Kingdom
- INSIGNEO Institute for In Silico Medicine, University of Sheffield, United Kingdom
- Peter Redgrave
- Department of Psychology, Adaptive Behaviour Research Group, University of Sheffield, United Kingdom
41
Affiliation(s)
- Stan B. Floresco
- Department of Psychology and Brain Research Center, University of British Columbia, Vancouver, British Columbia, V6T 1Z4 Canada;
42
Woolley DG, Mantini D, Coxon JP, D'Hooge R, Swinnen SP, Wenderoth N. Virtual water maze learning in human increases functional connectivity between posterior hippocampus and dorsal caudate. Hum Brain Mapp 2014; 36:1265-77. [PMID: 25418860] [DOI: 10.1002/hbm.22700]
Abstract
Recent work has demonstrated that functional connectivity between remote brain regions can be modulated by task learning or the performance of an already well-learned task. Here, we investigated the extent to which initial learning and stable performance of a spatial navigation task modulates functional connectivity between subregions of hippocampus and striatum. Subjects actively navigated through a virtual water maze environment and used visual cues to learn the position of a fixed spatial location. Resting-state functional magnetic resonance imaging scans were collected before and after virtual water maze navigation in two scan sessions conducted 1 week apart, with a behavior-only training session in between. There was a large significant reduction in the time taken to intercept the target location during scan session 1 and a small significant reduction during the behavior-only training session. No further reduction was observed during scan session 2. This indicates that scan session 1 represented initial learning and scan session 2 represented stable performance. We observed an increase in functional connectivity between left posterior hippocampus and left dorsal caudate that was specific to scan session 1. Importantly, the magnitude of the increase in functional connectivity was correlated with offline gains in task performance. Our findings suggest cooperative interaction occurs between posterior hippocampus and dorsal caudate during awake rest following the initial phase of spatial navigation learning. Furthermore, we speculate that the increase in functional connectivity observed during awake rest after initial learning might reflect consolidation-related processing.
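Functional connectivity between two regions is typically quantified as the Pearson correlation of their resting-state time series. A synthetic sketch: a shared component whose weight rises from "pre" to "post" stands in for the learning-related coupling increase. The signal construction is illustrative, not the study's seed-based analysis pipeline.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(0)
shared = [random.gauss(0, 1) for _ in range(200)]     # common drive

def roi(weight):
    """Synthetic ROI time series: shared component plus private noise."""
    return [weight * s + random.gauss(0, 1) for s in shared]

r_pre = pearson(roi(0.2), roi(0.2))    # weak hippocampus-caudate coupling
r_post = pearson(roi(0.8), roi(0.8))   # stronger coupling after learning
print(round(r_pre, 2), round(r_post, 2))
```

The larger shared-component weight in the "post" pair yields a reliably higher correlation, mirroring the direction of the reported connectivity increase.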
Affiliation(s)
- Daniel G Woolley
- Department of Kinesiology, Movement Control and Neuroplasticity Research Group, KU Leuven, Leuven, Belgium
43
Matsuda E, Hubert J, Ikegami T. A robotic approach to understanding the role and the mechanism of vicarious trial-and-error in a T-maze task. PLoS One 2014; 9:e102708. [PMID: 25050548] [PMCID: PMC4106851] [DOI: 10.1371/journal.pone.0102708]
Abstract
Vicarious trial-and-error (VTE) is a behavior observed in rat experiments that seems to suggest self-conflict. This behavior is seen mainly when the rats are uncertain about making a decision. The presence of VTE is regarded as an indicator of a deliberative decision-making process, that is, searching, predicting, and evaluating outcomes. This process is slower than automated decision-making processes, such as reflex or habituation, but it allows for flexible and ongoing control of behavior. In this study, we propose for the first time a robotic model of VTE to see if VTE can emerge just from a body-environment interaction, and to show the underlying mechanism responsible for the observation of VTE and the advantages it provides. We tried several robots with different parameters and found that they showed three different types of VTE: high numbers of VTE at the beginning of learning that decrease afterward (the pattern seen in experiments with rats), low numbers during the whole learning period, and high numbers throughout. We were therefore able to reproduce the phenomenon of VTE in a model robot using only a simple dynamical neural network with Hebbian learning, which suggests that VTE is an emergent property of a plastic and embodied neural network. From a comparison of the three types of VTE, we demonstrated that 1) VTE is associated with chaotic activity of neurons in our model and 2) VTE-showing robots were robust to environmental perturbations. We suggest that the instability of neuronal activity found in VTE allows ongoing learning to rebuild its strategy continuously, which creates robust behavior. Based on these results, we suggest that VTE is caused by a similar mechanism in biology and leads to robust decision making in an analogous way.
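The plasticity rule named here, Hebbian learning in a small dynamical network, can be sketched in a few lines: weights grow when pre- and post-synaptic activities are correlated, with a decay term to keep them bounded. Network size, rates, and the tanh activation are illustrative choices, not the paper's robot controller.

```python
import math

def step(acts, w, inputs, eta=0.05, decay=0.01):
    """One network update followed by a decaying Hebbian weight update."""
    n = len(acts)
    new_acts = [math.tanh(inputs[i] + sum(w[i][j] * acts[j] for j in range(n)))
                for i in range(n)]
    for i in range(n):
        for j in range(n):
            w[i][j] += eta * new_acts[i] * acts[j] - decay * w[i][j]
    return new_acts, w

n = 3
acts = [0.0] * n
w = [[0.0] * n for _ in range(n)]
for _ in range(200):                       # units 0 and 1 receive correlated drive
    acts, w = step(acts, w, inputs=[1.0, 1.0, 0.0])

print(w[0][1] > w[0][2])                   # the co-driven pair strengthens more
```

Co-active units wire together while the silent unit's connections stay flat, which is the minimal ingredient behind the self-organizing behavior the authors build on.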
Affiliation(s)
- Eiko Matsuda
- Department of Arts and Sciences, The University of Tokyo, Tokyo, Japan
- School of Informatics and Engineering, University of Sussex, Brighton, United Kingdom
- Japan Society for the Promotion of Science, Tokyo, Japan
- Julien Hubert
- Department of Arts and Sciences, The University of Tokyo, Tokyo, Japan
- Takashi Ikegami
- Department of Arts and Sciences, The University of Tokyo, Tokyo, Japan
44
Experimental predictions drawn from a computational model of sign-trackers and goal-trackers. J Physiol Paris 2014; 109:78-86. [DOI: 10.1016/j.jphysparis.2014.06.001]
Abstract
Gaining a better understanding of the biological mechanisms underlying individual variation in response to rewards and reward cues could help to identify and treat individuals more prone to disorders of impulse control, such as addiction. Variation in response to reward cues is captured in rats undergoing autoshaping experiments, in which the appearance of a lever precedes food delivery. Although no response is required for food to be delivered, some rats (goal-trackers) learn to approach and avidly engage the magazine until food delivery, whereas other rats (sign-trackers) come to approach and avidly engage the lever. The impulsive and often maladaptive characteristics of the latter response are reminiscent of addictive behaviour in humans. In a previous article, we developed a computational model accounting for a set of experimental data regarding sign-trackers and goal-trackers. Here we show new simulations of the model to draw experimental predictions that could help further validate or refute it. In particular, we apply the model to new experimental protocols, such as injecting flupentixol locally into the core of the nucleus accumbens rather than systemically, and lesioning the core of the nucleus accumbens before or after conditioning. In addition, we discuss the possibility of removing the food magazine during the inter-trial interval. The predictions from this revised model will help us better understand the role of different brain regions in the behaviours expressed by sign-trackers and goal-trackers.
45
Easy rider: monkeys learn to drive a wheelchair to navigate through a complex maze. PLoS One 2014; 9:e96275. [PMID: 24831130] [PMCID: PMC4022652] [DOI: 10.1371/journal.pone.0096275]
Abstract
The neurological bases of spatial navigation are mainly investigated in rodents and seldom in primates. The few studies of spatial navigation in human and non-human primates have been performed in virtual, not real, environments, mostly because of the methodological difficulties inherent in conducting research on freely-moving monkeys in real-world environments. There is some uncertainty, however, regarding the extrapolation of rodent spatial navigation strategies to primates. Here we present an entirely new platform for investigating real spatial navigation in rhesus monkeys, and we show that monkeys can learn a pathway by using different strategies. In these experiments, three monkeys learned to drive the wheelchair and to follow a specified route through a real maze. After learning the route, probe tests revealed that animals successively use three distinct navigation strategies based on (i) the place of the reward, (ii) the direction taken to obtain the reward, or (iii) a cue indicating the reward location. The strategy used depended on the options proposed and the duration of learning. This study reveals that monkeys, like rodents and humans, switch between different spatial navigation strategies with extended practice, implying well-conserved brain learning systems across species. This task with freely driving monkeys provides a good platform for electrophysiological and pharmacological investigation of spatial navigation in the real world.
46
Retailleau A, Boraud T. The Michelin red guide of the brain: role of dopamine in goal-oriented navigation. Front Syst Neurosci 2014; 8:32. [PMID: 24672436] [PMCID: PMC3957057] [DOI: 10.3389/fnsys.2014.00032]
Abstract
Spatial learning has been recognized over the years to be under the control of the hippocampus and related temporal lobe structures. Hippocampal damage often causes severe impairments in the ability to learn and remember a location in space defined by distal visual cues. Such cognitive disabilities are found in Parkinsonian patients. We recently investigated the role of dopamine in navigation in the 6-hydroxydopamine (6-OHDA) rat, a model of Parkinson's disease (PD) commonly used to investigate the pathophysiology of dopamine depletion (Retailleau et al., 2013). We demonstrated that dopamine (DA) is essential to spatial learning, as its depletion results in spatial impairments. Our results showed that the behavioral effect of DA depletion is correlated with modification of the neural encoding of spatial features and decision-making processes in the hippocampus. However, the origin of these alterations in the neural processing of spatial information needs to be clarified. It could result from a local effect: dopamine depletion directly disturbs the processing of relevant spatial information at the hippocampal level. Alternatively, it could result from a more distributed network effect: dopamine depletion elsewhere in the brain (entorhinal cortex, striatum, etc.) modifies the way the hippocampus processes spatial information. Recent experimental evidence in rodents has indeed demonstrated that other brain areas are involved in the acquisition of spatial information. Amongst these, the cortex-basal ganglia (BG) loop is known to be involved in reinforcement learning and has been identified as an important contributor to spatial learning. In particular, it has been shown that altered activity of the BG striatal complex can impair the ability to perform spatial learning tasks. The present review provides a glimpse of the findings obtained over the past decade that support a dialog between these two structures during spatial learning under DA control.
Affiliation(s)
- Aude Retailleau
- Sagol Department of Neurobiology, University of Haifa, Haifa, Israel
- Thomas Boraud
- Institut des Maladies Neurodégénératives, UMR 5293, University of Bordeaux, Bordeaux, France; Institut des Maladies Neurodégénératives, UMR 5293, CNRS, Bordeaux, France
47
Lesaint F, Sigaud O, Flagel SB, Robinson TE, Khamassi M. Modelling individual differences in the form of Pavlovian conditioned approach responses: a dual learning systems approach with factored representations. PLoS Comput Biol 2014; 10:e1003466. [PMID: 24550719] [PMCID: PMC3923662] [DOI: 10.1371/journal.pcbi.1003466]
Abstract
Reinforcement Learning has greatly influenced models of conditioning, providing powerful explanations of acquired behaviour and underlying physiological observations. However, in recent autoshaping experiments in rats, variation in the form of Pavlovian conditioned responses (CRs) and in associated dopamine activity has called into question the classical hypothesis that phasic dopamine activity corresponds to a reward prediction error-like signal arising from a classical Model-Free system, necessary for Pavlovian conditioning. Over the course of Pavlovian conditioning using food as the unconditioned stimulus (US), some rats (sign-trackers) come to approach and engage the conditioned stimulus (CS) itself - a lever - more and more avidly, whereas other rats (goal-trackers) learn to approach the location of food delivery upon CS presentation. Importantly, although both sign-trackers and goal-trackers learn the CS-US association equally well, only in sign-trackers does phasic dopamine activity show classical reward prediction error-like bursts. Furthermore, neither the acquisition nor the expression of a goal-tracking CR is dopamine-dependent. Here we present a computational model that can account for such individual variations. We show that a combination of a Model-Based system and a revised Model-Free system can account for the development of distinct CRs in rats. Moreover, we show that revising a classical Model-Free system to process stimuli individually, using factored representations, can explain why classical dopaminergic patterns may be observed for some rats and not for others depending on the CR they develop. In addition, the model can account for other behavioural and pharmacological results obtained using the same, or similar, autoshaping procedures. Finally, the model makes it possible to draw a set of experimental predictions that may be verified in a modified experimental protocol.
We suggest that further investigation of factored representations in computational neuroscience studies may be useful.
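The "factored representations" idea, attaching learned values to individual stimuli rather than to whole states, can be illustrated with a toy update in which a shared prediction error is split across the stimuli present on a trial. The stimulus names, the uniform credit split, and the learning rate are illustrative assumptions, not the paper's model.

```python
def factored_update(values, present, reward, alpha=0.2):
    """Distribute one reward-prediction-error over the stimuli present."""
    pred = sum(values[s] for s in present)        # value of the compound
    delta = reward - pred                         # shared prediction error
    for s in present:
        values[s] += alpha * delta / len(present)
    return values, delta

v = {"lever": 0.0, "magazine": 0.0}
deltas = []
for _ in range(200):              # CS (lever) and food magazine co-occur
    v, d = factored_update(v, ["lever", "magazine"], reward=1.0)
    deltas.append(d)

print(round(v["lever"] + v["magazine"], 2))   # compound value approaches 1.0
print(round(deltas[-1], 3))                   # prediction error decays away
```

Because each stimulus carries its own value, biasing which stimulus accrues credit (lever vs. magazine) gives the model a handle on why dopaminergic prediction-error patterns can differ between sign-trackers and goal-trackers.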
Affiliation(s)
- Florian Lesaint
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, UPMC Univ Paris 06, Paris, France
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, CNRS, Paris, France
- Olivier Sigaud
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, UPMC Univ Paris 06, Paris, France
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, CNRS, Paris, France
- Shelly B. Flagel
- Department of Psychiatry, University of Michigan, Ann Arbor, Michigan, United States of America
- Molecular and Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, United States of America
- Terry E. Robinson
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, United States of America
- Mehdi Khamassi
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, UPMC Univ Paris 06, Paris, France
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, CNRS, Paris, France
48
Renaudo E, Girard B, Chatila R, Khamassi M. Design of a Control Architecture for Habit Learning in Robots. Biomimetic and Biohybrid Systems 2014. [DOI: 10.1007/978-3-319-09435-9_22]
49
Penny WD, Zeidman P, Burgess N. Forward and backward inference in spatial cognition. PLoS Comput Biol 2013; 9:e1003383. [PMID: 24348230] [PMCID: PMC3861045] [DOI: 10.1371/journal.pcbi.1003383]
Abstract
This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
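The forward inference for self-localization described above is a Bayesian filter: a motion-model predict step (path integration) followed by a multiplication by the sensory likelihood. A discrete one-dimensional sketch; the corridor size, slip probability, and sensor model are illustrative, not the paper's continuous-state model.

```python
N = 5                                   # corridor positions 0..4
belief = [1.0 / N] * N                  # uniform prior over position

def predict(belief, p_move=0.8):
    """Motion step: intended +1 move, with slip probability (stay put)."""
    new = [0.0] * N
    for i, p in enumerate(belief):
        new[min(i + 1, N - 1)] += p * p_move
        new[i] += p * (1 - p_move)
    return new

def update(belief, likelihood):
    """Sensory step: multiply by the observation likelihood and normalize."""
    post = [b * l for b, l in zip(belief, likelihood)]
    z = sum(post)
    return [p / z for p in post]

# A landmark reliably visible only at position 2 makes that cell informative.
sensor = [[0.9 if i == j else 0.1 for i in range(N)] for j in range(N)]

belief = predict(belief)                # path integration: one step forward
belief = update(belief, sensor[2])      # sensory input: see the landmark
print(max(range(N), key=lambda i: belief[i]))   # most probable position
```

The same two primitives, run forward over candidate action sequences, give the goal-likelihood computations the abstract describes; backward inference runs the smoothing pass in the opposite direction.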
Affiliation(s)
- Will D. Penny
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Peter Zeidman
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Neil Burgess
- Institute for Cognitive Neuroscience, University College London, London, United Kingdom
50
Mannella F, Gurney K, Baldassarre G. The nucleus accumbens as a nexus between values and goals in goal-directed behavior: a review and a new hypothesis. Front Behav Neurosci 2013; 7:135. [PMID: 24167476] [PMCID: PMC3805952] [DOI: 10.3389/fnbeh.2013.00135]
Abstract
Goal-directed behavior is a fundamental means by which animals can flexibly solve the challenges posed by variable external and internal conditions. Recently, the processes and brain mechanisms underlying such behavior have been extensively studied from behavioral, neuroscientific and computational perspectives. This research has highlighted the processes underlying goal-directed behavior and associated brain systems including prefrontal cortex, basal ganglia and, in particular therein, the nucleus accumbens (NAcc). This paper focuses on one particular process at the core of goal-directed behavior: how motivational value is assigned to goals on the basis of internal states and environmental stimuli, and how this supports goal selection processes. Various biological and computational accounts have been given of this problem and of related neural and behavioral phenomena, but we still lack an integrated hypothesis on the generation and use of value for goal selection. This paper proposes a hypothesis that aims to solve this problem and is based on these key elements: (a) amygdala and hippocampus establish the motivational value of stimuli and goals; (b) prefrontal cortex encodes various types of action outcomes; (c) NAcc integrates different sources of value, representing them in terms of a common currency with the aid of dopamine, and thereby plays a major role in selecting action outcomes within prefrontal cortex. The "goals" pursued by the organism are the outcomes selected by these processes. The hypothesis is developed in the context of a critical review of the relevant biological and computational literature, which offers it support. The paper shows how the hypothesis has the potential to integrate existing interpretations of motivational value and goal selection.
Affiliation(s)
- Francesco Mannella
- Laboratory of Computational Embodied Neuroscience, Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy