151
Bermudez-Contreras E. Deep reinforcement learning to study spatial navigation, learning and memory in artificial and biological agents. Biol Cybern 2021; 115:131-134. [PMID: 33564968] [DOI: 10.1007/s00422-021-00862-0]
Abstract
Despite the recent advancements and popularity of deep learning that has resulted from the advent of numerous industrial applications, artificial neural networks (ANNs) still lack crucial features from their biological counterparts that could improve their performance and their potential to advance our understanding of how the brain works. One avenue that has been proposed to change this is to strengthen the interaction between artificial intelligence (AI) research and neuroscience. Since their historical beginnings, ANNs and AI, in general, have developed in close alignment with both neuroscience and psychology. In addition to deep learning, reinforcement learning (RL) is another approach that is strongly linked to AI and neuroscience to understand how learning is implemented in the brain. In a recently published article, Botvinick et al. (Neuron, 107:603-616, 2020) explain why deep reinforcement learning (DRL) is important for neuroscience as a framework to study learning, representations and decision making. Here, I summarise Botvinick et al.'s main arguments and frame them in the context of the study of learning, memory and spatial navigation. I believe that applying this approach to study spatial navigation can provide useful insights for the understanding of how the brain builds, processes and stores representations of the outside world to extract knowledge.

152
Raman DV, O'Leary T. Frozen algorithms: how the brain's wiring facilitates learning. Curr Opin Neurobiol 2021; 67:207-214. [PMID: 33508698] [PMCID: PMC8202511] [DOI: 10.1016/j.conb.2020.12.017]
Abstract
Synapses and neural connectivity are plastic and shaped by experience. But to what extent does connectivity itself influence the ability of a neural circuit to learn? Insights from optimization theory and AI shed light on how learning can be implemented in neural circuits. Though abstract in their nature, learning algorithms provide a principled set of hypotheses on the necessary ingredients for learning in neural circuits. These include the kinds of signals and circuit motifs that enable learning from experience, as well as an appreciation of the constraints that make learning challenging in a biological setting. Remarkably, some simple connectivity patterns can boost the efficiency of relatively crude learning rules, showing how the brain can use anatomy to compensate for the biological constraints of known synaptic plasticity mechanisms. Modern connectomics provides rich data for exploring this principle, and may reveal how brain connectivity is constrained by the requirement to learn efficiently.

Affiliation(s)
- Dhruva V Raman, Department of Engineering, University of Cambridge, United Kingdom
- Timothy O'Leary, Department of Engineering, University of Cambridge, United Kingdom

153
Starkweather CK, Uchida N. Dopamine signals as temporal difference errors: recent advances. Curr Opin Neurobiol 2021; 67:95-105. [PMID: 33186815] [PMCID: PMC8107188] [DOI: 10.1016/j.conb.2020.08.014]
Abstract
In the brain, dopamine is thought to drive reward-based learning by signaling temporal difference reward prediction errors (TD errors), a 'teaching signal' used to train computers. Recent studies using optogenetic manipulations have provided multiple pieces of evidence supporting that phasic dopamine signals function as TD errors. Furthermore, novel experimental results have indicated that when the current state of the environment is uncertain, dopamine neurons compute TD errors using 'belief states' or a probability distribution over potential states. It remains unclear how belief states are computed but emerging evidence suggests involvement of the prefrontal cortex and the hippocampus. These results refine our understanding of the role of dopamine in learning and the algorithms by which dopamine functions in the brain.
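
For readers unfamiliar with the formalism, the minimal Python sketch below illustrates the temporal difference (TD) error described above, first over a single state value and then over a "belief state" (a probability distribution across candidate hidden states). The numbers, state values and discount factor are invented for illustration; this is not code from the cited study.

```python
import numpy as np

def td_error(reward, value_next, value_current, gamma=0.95):
    """Temporal-difference reward prediction error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current

def td_error_belief(reward, belief_next, belief_current, values, gamma=0.95):
    """TD error over a belief state: state values are weighted by the probability
    assigned to each candidate hidden state (toy example)."""
    v_next = float(belief_next @ values)
    v_current = float(belief_current @ values)
    return reward + gamma * v_next - v_current

# Two hidden states ("reward is coming" vs. "no reward"), with values V = [1.0, 0.0]
values = np.array([1.0, 0.0])
delta = td_error_belief(reward=0.0,
                        belief_next=np.array([0.9, 0.1]),
                        belief_current=np.array([0.5, 0.5]),
                        values=values)
print(round(delta, 3))  # positive: the belief shifted toward the rewarding state
```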

Affiliation(s)
- Clara Kwon Starkweather, Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Naoshige Uchida, Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA

154

155

156

157
Banerjee A, Rikhye RV, Marblestone A. Reinforcement-guided learning in frontal neocortex: emerging computational concepts. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.02.019]

158
Patai EZ, Spiers HJ. The Versatile Wayfinder: Prefrontal Contributions to Spatial Navigation. Trends Cogn Sci 2021; 25:520-533. [PMID: 33752958] [DOI: 10.1016/j.tics.2021.02.010]
Abstract
The prefrontal cortex (PFC) supports decision-making, goal tracking, and planning. Spatial navigation is a behavior that taxes these cognitive processes, yet the role of the PFC in models of navigation has been largely overlooked. In humans, activity in dorsolateral PFC (dlPFC) and ventrolateral PFC (vlPFC) during detours, reveal a role in inhibition and replanning. Dorsal anterior cingulate cortex (dACC) is implicated in planning and spontaneous internally-generated changes of route. Orbitofrontal cortex (OFC) integrates representations of the environment with the value of actions, providing a 'map' of possible decisions. In rodents, medial frontal areas interact with hippocampus during spatial decisions and switching between navigation strategies. In reviewing these advances, we provide a framework for how different prefrontal regions may contribute to different stages of navigation.

Affiliation(s)
- Eva Zita Patai, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, UK; Institute of Behavioural Neuroscience, Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, UK
- Hugo J Spiers, Institute of Behavioural Neuroscience, Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, UK

159
Silva C, Porter BS, Hillman KL. Stimulation in the Rat Anterior Insula and Anterior Cingulate During an Effortful Weightlifting Task. Front Neurosci 2021; 15:643384. [PMID: 33716659] [PMCID: PMC7952617] [DOI: 10.3389/fnins.2021.643384]
Abstract
When performing tasks, animals must continually assess how much effort is being expended, and gage this against ever-changing physiological states. As effort costs mount, persisting in the task may be unwise. The anterior cingulate cortex (ACC) and the anterior insular cortex are implicated in this process of cost-benefit decision-making, yet their precise contributions toward driving effortful persistence are not well understood. Here we investigated whether electrical stimulation of the ACC or insular cortex would alter effortful persistence in a novel weightlifting task (WLT). In the WLT an animal is challenged to pull a rope 30 cm to trigger food reward dispensing. To make the action increasingly effortful, 45 g of weight is progressively added to the rope after every 10 successful pulls. The animal can quit the task at any point - with the rope weight at the time of quitting taken as the "break weight." Ten male Sprague-Dawley rats were implanted with stimulating electrodes in either the ACC [cingulate cortex area 1 (Cg1) in rodent] or anterior insula and then assessed in the WLT during stimulation. Low-frequency (10 Hz), high-frequency (130 Hz), and sham stimulations were performed. We predicted that low-frequency stimulation (LFS) of Cg1 in particular would increase persistence in the WLT. Contrary to our predictions, LFS of Cg1 resulted in shorter session duration, lower break weights, and fewer attempts on the break weight. High-frequency stimulation of Cg1 led to an increase in time spent off-task. LFS of the anterior insula was associated with a marginal increase in attempts on the break weight. Taken together our data suggest that stimulation of the rodent Cg1 during an effortful task alters certain aspects of effortful behavior, while insula stimulation has little effect.

Affiliation(s)
- Kristin L. Hillman, Department of Psychology, Brain Health Research Centre, University of Otago, Dunedin, New Zealand

160
Cross L, Cockburn J, Yue Y, O'Doherty JP. Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron 2021; 109:724-738.e7. [PMID: 33326755] [PMCID: PMC7897245] [DOI: 10.1016/j.neuron.2020.11.021]
Abstract
Humans possess an exceptional aptitude to efficiently make decisions from high-dimensional sensory observations. However, it is unknown how the brain compactly represents the current state of the environment to guide this process. The deep Q-network (DQN) achieves this by capturing highly nonlinear mappings from multivariate inputs to the values of potential actions. We deployed DQN as a model of brain activity and behavior in participants playing three Atari video games during fMRI. Hidden layers of DQN exhibited a striking resemblance to voxel activity in a distributed sensorimotor network, extending throughout the dorsal visual pathway into posterior parietal cortex. Neural state-space representations emerged from nonlinear transformations of the pixel space bridging perception to action and reward. These transformations reshape axes to reflect relevant high-level features and strip away information about task-irrelevant sensory features. Our findings shed light on the neural encoding of task representations for decision-making in real-world situations.
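
As a schematic of the kind of mapping a deep Q-network (DQN) performs, the toy Python sketch below passes a flattened "frame" through a hidden layer to obtain one value per action and selects an action epsilon-greedily. The layer sizes, random weights and flattened input are illustrative stand-ins; the network used in the study is a trained convolutional DQN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the DQN mapping: flattened "pixels" -> hidden layer -> action values.
n_pixels, n_hidden, n_actions = 84 * 84, 128, 4
W1 = rng.normal(0, 0.01, (n_hidden, n_pixels))
W2 = rng.normal(0, 0.01, (n_actions, n_hidden))

def q_values(frame):
    """Nonlinear mapping from a multivariate input to the values of candidate actions."""
    hidden = np.maximum(0.0, W1 @ frame)   # ReLU hidden layer ("state representation")
    return W2 @ hidden                      # one value estimate per action

def epsilon_greedy(q, epsilon=0.1):
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))    # occasional random exploration
    return int(np.argmax(q))                # otherwise take the highest-valued action

frame = rng.random(n_pixels)                # a fake flattened game frame
q = q_values(frame)
print(q.round(4), epsilon_greedy(q))
```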

Affiliation(s)
- Logan Cross, Computation and Neural Systems, California Institute of Technology, Pasadena, CA 91125, USA
- Jeff Cockburn, Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- Yisong Yue, Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- John P O'Doherty, Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA

161
Baram AB, Muller TH, Nili H, Garvert MM, Behrens TEJ. Entorhinal and ventromedial prefrontal cortices abstract and generalize the structure of reinforcement learning problems. Neuron 2021; 109:713-723.e7. [PMID: 33357385] [PMCID: PMC7889496] [DOI: 10.1016/j.neuron.2020.11.024]
Abstract
Knowledge of the structure of a problem, such as relationships between stimuli, enables rapid learning and flexible inference. Humans and other animals can abstract this structural knowledge and generalize it to solve new problems. For example, in spatial reasoning, shortest-path inferences are immediate in new environments. Spatial structural transfer is mediated by cells in entorhinal and (in humans) medial prefrontal cortices, which maintain their co-activation structure across different environments and behavioral states. Here, using fMRI, we show that entorhinal and ventromedial prefrontal cortex (vmPFC) representations perform a much broader role in generalizing the structure of problems. We introduce a task-remapping paradigm, where subjects solve multiple reinforcement learning (RL) problems differing in structural or sensory properties. We show that, as with space, entorhinal representations are preserved across different RL problems only if task structure is preserved. In vmPFC and ventral striatum, representations of prediction error also depend on task structure.

Affiliation(s)
- Alon Boaz Baram, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Timothy Howard Muller, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Hamed Nili, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Mona Maria Garvert, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK; Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany
- Timothy Edward John Behrens, Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK; Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3AR, UK

162
Alexandre F. A global framework for a systemic view of brain modeling. Brain Inform 2021; 8:3. [PMID: 33591440] [PMCID: PMC7886931] [DOI: 10.1186/s40708-021-00126-4]
Abstract
The brain is a complex system, due to the heterogeneity of its structure, the diversity of the functions in which it participates and to its reciprocal relationships with the body and the environment. A systemic description of the brain is presented here, as a contribution to developing a brain theory and as a general framework where specific models in computational neuroscience can be integrated and associated with global information flows and cognitive functions. In an enactive view, this framework integrates the fundamental organization of the brain in sensorimotor loops with the internal and the external worlds, answering four fundamental questions (what, why, where and how). Our survival-oriented definition of behavior gives a prominent role to pavlovian and instrumental conditioning, augmented during phylogeny by the specific contribution of other kinds of learning, related to semantic memory in the posterior cortex, episodic memory in the hippocampus and working memory in the frontal cortex. This framework highlights that responses can be prepared in different ways, from pavlovian reflexes and habitual behavior to deliberations for goal-directed planning and reasoning, and explains that these different kinds of responses coexist, collaborate and compete for the control of behavior. It also lays emphasis on the fact that cognition can be described as a dynamical system of interacting memories, some acting to provide information to others, to replace them when they are not efficient enough, or to help for their improvement. Describing the brain as an architecture of learning systems has also strong implications in Machine Learning. Our biologically informed view of pavlovian and instrumental conditioning can be very precious to revisit classical Reinforcement Learning and provide a basis to ensure really autonomous learning.

Affiliation(s)
- Frederic Alexandre, INRIA Bordeaux Sud-Ouest, Talence, France; Institute of Neurodegenerative Diseases, University of Bordeaux, CNRS UMR 5293, 146 rue Leo Saignat, 33076 Bordeaux, France; LaBRI, University of Bordeaux, Bordeaux INP, CNRS UMR 5800, Talence, France

163
Fang H, Zeng Y, Zhao F. Brain Inspired Sequences Production by Spiking Neural Networks With Reward-Modulated STDP. Front Comput Neurosci 2021; 15:612041. [PMID: 33664661] [PMCID: PMC7921721] [DOI: 10.3389/fncom.2021.612041]
Abstract
Understanding and producing embedded sequences according to supra-regular grammars in language has always been considered a high-level cognitive function of human beings, named "syntax barrier" between humans and animals. However, some neurologists recently showed that macaques could be trained to produce embedded sequences involving supra-regular grammars through a well-designed experiment paradigm. Via comparing macaques and preschool children's experimental results, they claimed that human uniqueness might only lie in the speed and learning strategy resulting from the chunking mechanism. Inspired by their research, we proposed a Brain-inspired Sequence Production Spiking Neural Network (SP-SNN) to model the same production process, followed by memory and learning mechanisms of the multi-brain region cooperation. After experimental verification, we demonstrated that SP-SNN could also handle embedded sequence production tasks, striding over the "syntax barrier." SP-SNN used Population-Coding and STDP mechanism to realize working memory, Reward-Modulated STDP mechanism for acquiring supra-regular grammars. Therefore, SP-SNN needs to simultaneously coordinate short-term plasticity (STP) and long-term plasticity (LTP) mechanisms. Besides, we found that the chunking mechanism indeed makes a difference in improving our model's robustness. As far as we know, our work is the first one toward the "syntax barrier" in the SNN field, providing the computational foundation for further study of related underlying animals' neural mechanisms in the future.
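
The sketch below illustrates the general logic of reward-modulated STDP referred to above: spike-timing-dependent pairings accumulate in an eligibility trace, and a reward signal gates whether that trace is committed to the synaptic weight. Parameter values and time constants are arbitrary illustrations, not those of the SP-SNN model.

```python
import numpy as np

def stdp_kernel(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Pair-based STDP window: potentiation if pre precedes post (dt > 0), else depression."""
    return a_plus * np.exp(-dt / tau) if dt > 0 else -a_minus * np.exp(dt / tau)

def r_stdp_update(w, pre_post_dts, reward, eligibility, lr=0.01, tau_e=200.0, dt_step=1.0):
    """Reward-modulated STDP: spike pairings feed an eligibility trace,
    and a (dopamine-like) reward signal gates the actual weight change."""
    eligibility *= np.exp(-dt_step / tau_e)       # decay the trace over time
    for dt in pre_post_dts:                       # accumulate candidate changes
        eligibility += stdp_kernel(dt)
    w += lr * reward * eligibility                # commit only when reward arrives
    return w, eligibility

w, e = 0.5, 0.0
w, e = r_stdp_update(w, pre_post_dts=[5.0, -12.0], reward=1.0, eligibility=e)
print(round(w, 4), round(e, 4))
```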

Affiliation(s)
- Hongjian Fang, Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing, China
- Yi Zeng, Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing, China; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Feifei Zhao, Research Center for Brain-Inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China

164
The Best Laid Plans: Computational Principles of Anterior Cingulate Cortex. Trends Cogn Sci 2021; 25:316-329. [PMID: 33593641] [DOI: 10.1016/j.tics.2021.01.008]
Abstract
Despite continual debate for the past 30 years about the function of anterior cingulate cortex (ACC), its key contribution to neurocognition remains unknown. However, recent computational modeling work has provided insight into this question. Here we review computational models that illustrate three core principles of ACC function, related to hierarchy, world models, and cost. We also discuss four constraints on the neural implementation of these principles, related to modularity, binding, encoding, and learning and regulation. These observations suggest a role for ACC in hierarchical model-based hierarchical reinforcement learning (HMB-HRL), which instantiates a mechanism motivating the execution of high-level plans.

165
Abstract
A large body of work has linked dopaminergic signaling to learning and reward processing. It stresses the role of dopamine in reward prediction error signaling, a key neural signal that allows us to learn from past experiences, and that facilitates optimal choice behavior. Latterly, it has become clear that dopamine does not merely code prediction error size but also signals the difference between the expected value of rewards, and the value of rewards actually received, which is obtained through the integration of reward attributes such as the type, amount, probability and delay. More recent work has posited a role of dopamine in learning beyond rewards. These theories suggest that dopamine codes absolute or unsigned prediction errors, playing a key role in how the brain models associative regularities within its environment, while incorporating critical information about the reliability of those regularities. Work is emerging supporting this perspective and, it has inspired theoretical models of how certain forms of mental pathology may emerge in relation to dopamine function. Such pathology is frequently related to disturbed inferences leading to altered internal models of the environment. Thus, it is critical to understand the role of dopamine in error-related learning and inference.
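
The distinction drawn here between signed and unsigned prediction errors can be stated in two lines; the toy values below are purely illustrative.

```python
def prediction_errors(expected_value, outcome_value):
    """Signed reward prediction error and its unsigned (absolute) counterpart."""
    signed = outcome_value - expected_value   # better or worse than expected
    unsigned = abs(signed)                    # surprise, regardless of direction
    return signed, unsigned

print(prediction_errors(expected_value=0.8, outcome_value=0.2))  # (-0.6, 0.6)
```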

Affiliation(s)
- Kelly M. J. Diederen, Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Paul C. Fletcher, Department of Psychiatry, University of Cambridge, Cambridge, UK; Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, UK; Wellcome Trust MRC Institute of Metabolic Science, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK

166
Tomov MS, Schulz E, Gershman SJ. Multi-task reinforcement learning in humans. Nat Hum Behav 2021; 5:764-773. [PMID: 33510391] [DOI: 10.1038/s41562-020-01035-y]
Abstract
The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet not much is known about human multitask reinforcement learning. We study participants' behaviour in a two-step decision-making task with multiple features and changing reward functions. We compare their behaviour with two algorithms for multitask reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as to standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered confirmatory experiment, our results provide evidence that participants who are able to learn the task use a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.
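
One common way to formalise "mapping previous policies and encountered features to new reward functions" is with successor features: each stored policy keeps its expected discounted feature counts, and a new task's reward weights score the old policies directly. The sketch below shows that generic idea with made-up numbers; it is not the specific pair of algorithms compared in the paper.

```python
import numpy as np

# Expected discounted feature counts ("successor features") for two previously
# learned policies; names and numbers are hypothetical.
psi = {
    "policy_A": np.array([4.0, 1.0, 0.5]),
    "policy_B": np.array([0.5, 3.0, 2.0]),
}
w_new = np.array([0.0, 1.0, 1.0])   # new task: only features 2 and 3 are rewarded

# Value of each old policy under the new reward function, without relearning.
values = {name: float(features @ w_new) for name, features in psi.items()}
best = max(values, key=values.get)
print(values, "-> reuse", best)
```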

Affiliation(s)
- Momchil S Tomov, Program in Neuroscience, Harvard Medical School, Boston, MA, USA; Center for Brain Science, Harvard University, Cambridge, MA, USA
- Eric Schulz, Max Planck Institute for Biological Cybernetics, Tübingen, Germany; Department of Psychology, Harvard University, Cambridge, MA, USA
- Samuel J Gershman, Center for Brain Science, Harvard University, Cambridge, MA, USA; Department of Psychology, Harvard University, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA

167
Bari BA, Cohen JY. Dynamic decision making and value computations in medial frontal cortex. Int Rev Neurobiol 2021; 158:83-113. [PMID: 33785157] [DOI: 10.1016/bs.irn.2020.12.001]
Abstract
Dynamic decision making requires an intact medial frontal cortex. Recent work has combined theory and single-neuron measurements in frontal cortex to advance models of decision making. We review behavioral tasks that have been used to study dynamic decision making and algorithmic models of these tasks using reinforcement learning theory. We discuss studies linking neurophysiology and quantitative decision variables. We conclude with hypotheses about the role of other cortical and subcortical structures in dynamic decision making, including ascending neuromodulatory systems.
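
A typical algorithmic model for the dynamic two-choice tasks reviewed here combines incremental value updates with softmax choice. The simulation below is a generic sketch of that kind of model; the learning rate, inverse temperature and reward probabilities are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(trials=200, alpha=0.2, beta=3.0, p_reward=(0.7, 0.3)):
    """Two-armed bandit with prediction-error value learning and softmax choice."""
    q = np.zeros(2)
    choices = []
    for _ in range(trials):
        probs = np.exp(beta * q) / np.exp(beta * q).sum()   # softmax decision rule
        c = int(rng.choice(2, p=probs))
        r = float(rng.random() < p_reward[c])               # probabilistic reward
        q[c] += alpha * (r - q[c])                          # prediction-error update
        choices.append(c)
    return q, np.mean(choices)

q_final, frac_option_1 = simulate()
print(q_final.round(2), round(1 - frac_option_1, 2))  # learned values, preference for richer option
```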

Affiliation(s)
- Bilal A Bari, The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, United States
- Jeremiah Y Cohen, The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, United States

168
Pouncy T, Tsividis P, Gershman SJ. What Is the Model in Model-Based Planning? Cogn Sci 2021; 45:e12928. [PMID: 33398907] [DOI: 10.1111/cogs.12928]
Abstract
Flexibility is one of the hallmarks of human problem-solving. In everyday life, people adapt to changes in common tasks with little to no additional training. Much of the existing work on flexibility in human problem-solving has focused on how people adapt to tasks in new domains by drawing on solutions from previously learned domains. In real-world tasks, however, humans must generalize across a wide range of within-domain variation. In this work we argue that representational abstraction plays an important role in such within-domain generalization. We then explore the nature of this representational abstraction in realistically complex tasks like video games by demonstrating how the same model-based planning framework produces distinct generalization behaviors under different classes of task representation. Finally, we compare the behavior of agents with these task representations to humans in a series of novel grid-based video game tasks. Our results provide evidence for the claim that within-domain flexibility in humans derives from task representations composed of propositional rules written in terms of objects and relational categories.

Affiliation(s)
- Thomas Pouncy, Department of Psychology and Center for Brain Science, Harvard University
- Pedro Tsividis, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Samuel J Gershman, Department of Psychology and Center for Brain Science, Harvard University; Center for Brains, Minds and Machines, Massachusetts Institute of Technology

169
Tessereau C, O’Dea R, Coombes S, Bast T. Reinforcement learning approaches to hippocampus-dependent flexible spatial navigation. Brain Neurosci Adv 2021; 5:2398212820975634. [PMID: 33954259] [PMCID: PMC8042550] [DOI: 10.1177/2398212820975634]
Abstract
Humans and non-human animals show great flexibility in spatial navigation, including the ability to return to specific locations based on as few as one single experience. To study spatial navigation in the laboratory, watermaze tasks, in which rats have to find a hidden platform in a pool of cloudy water surrounded by spatial cues, have long been used. Analogous tasks have been developed for human participants using virtual environments. Spatial learning in the watermaze is facilitated by the hippocampus. In particular, rapid, one-trial, allocentric place learning, as measured in the delayed-matching-to-place variant of the watermaze task, which requires rodents to learn repeatedly new locations in a familiar environment, is hippocampal dependent. In this article, we review some computational principles, embedded within a reinforcement learning framework, that utilise hippocampal spatial representations for navigation in watermaze tasks. We consider which key elements underlie their efficacy, and discuss their limitations in accounting for hippocampus-dependent navigation, both in terms of behavioural performance (i.e. how well do they reproduce behavioural measures of rapid place learning) and neurobiological realism (i.e. how well do they map to neurobiological substrates involved in rapid place learning). We discuss how an actor-critic architecture, enabling simultaneous assessment of the value of the current location and of the optimal direction to follow, can reproduce one-trial place learning performance as shown on watermaze and virtual delayed-matching-to-place tasks by rats and humans, respectively, if complemented with map-like place representations. The contribution of actor-critic mechanisms to delayed-matching-to-place performance is consistent with neurobiological findings implicating the striatum and hippocampo-striatal interaction in delayed-matching-to-place performance, given that the striatum has been associated with actor-critic mechanisms. Moreover, we illustrate that hierarchical computations embedded within an actor-critic architecture may help to account for aspects of flexible spatial navigation. The hierarchical reinforcement learning approach separates trajectory control via a temporal-difference error from goal selection via a goal prediction error and may account for flexible, trial-specific, navigation to familiar goal locations, as required in some arm-maze place memory tasks, although it does not capture one-trial learning of new goal locations, as observed in open field, including watermaze and virtual, delayed-matching-to-place tasks. Future models of one-shot learning of new goal locations, as observed on delayed-matching-to-place tasks, should incorporate hippocampal plasticity mechanisms that integrate new goal information with allocentric place representation, as such mechanisms are supported by substantial empirical evidence.
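
The actor-critic scheme discussed above can be sketched compactly: Gaussian "place-cell" features feed a critic that estimates location value via a TD error and an actor that adjusts direction preferences with the same error. Everything below (arena size, number of cells, learning rates) is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cells, n_actions = 100, 8
centres = rng.uniform(0, 1, (n_cells, 2))   # place-field centres in a unit arena
w_critic = np.zeros(n_cells)                 # value weights (critic)
w_actor = np.zeros((n_actions, n_cells))     # direction-preference weights (actor)

def features(pos, sigma=0.1):
    """Gaussian place-cell activity for a position in the arena."""
    d2 = np.sum((centres - pos) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

def step(pos, reward, next_pos, gamma=0.95, lr=0.1):
    phi, phi_next = features(pos), features(next_pos)
    prefs = w_actor @ phi
    probs = np.exp(prefs - prefs.max()); probs /= probs.sum()
    action = int(rng.choice(n_actions, p=probs))                      # actor: softmax over directions
    delta = reward + gamma * (w_critic @ phi_next) - (w_critic @ phi)  # critic: TD error
    w_critic[:] += lr * delta * phi                                    # update value estimate
    w_actor[action] += lr * delta * phi                                # reinforce the chosen direction
    return action, delta

a, d = step(pos=np.array([0.2, 0.3]), reward=1.0, next_pos=np.array([0.25, 0.30]))
print(a, round(float(d), 3))
```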

Affiliation(s)
- Charline Tessereau, School of Mathematical Sciences, University of Nottingham, Nottingham, UK; School of Psychology, University of Nottingham, Nottingham, UK; Neuroscience@Nottingham
- Reuben O’Dea, School of Mathematical Sciences, University of Nottingham, Nottingham, UK; Neuroscience@Nottingham
- Stephen Coombes, School of Mathematical Sciences, University of Nottingham, Nottingham, UK; Neuroscience@Nottingham
- Tobias Bast, School of Psychology, University of Nottingham, Nottingham, UK; Neuroscience@Nottingham

170
Schilling M, Paskarbeit J, Ritter H, Schneider A, Cruse H. From Adaptive Locomotion to Predictive Action Selection – Cognitive Control for a Six-Legged Walker. IEEE Trans Robot 2021. [DOI: 10.1109/tro.2021.3106832]

171
Márton CD, Schultz SR, Averbeck BB. Learning to select actions shapes recurrent dynamics in the corticostriatal system. Neural Netw 2020; 132:375-393. [PMID: 32992244] [PMCID: PMC7685243] [DOI: 10.1016/j.neunet.2020.09.008]
Abstract
Learning to select appropriate actions based on their values is fundamental to adaptive behavior. This form of learning is supported by fronto-striatal systems. The dorsal-lateral prefrontal cortex (dlPFC) and the dorsal striatum (dSTR), which are strongly interconnected, are key nodes in this circuitry. Substantial experimental evidence, including neurophysiological recordings, have shown that neurons in these structures represent key aspects of learning. The computational mechanisms that shape the neurophysiological responses, however, are not clear. To examine this, we developed a recurrent neural network (RNN) model of the dlPFC-dSTR circuit and trained it on an oculomotor sequence learning task. We compared the activity generated by the model to activity recorded from monkey dlPFC and dSTR in the same task. This network consisted of a striatal component which encoded action values, and a prefrontal component which selected appropriate actions. After training, this system was able to autonomously represent and update action values and select actions, thus being able to closely approximate the representational structure in corticostriatal recordings. We found that learning to select the correct actions drove action-sequence representations further apart in activity space, both in the model and in the neural data. The model revealed that learning proceeds by increasing the distance between sequence-specific representations. This makes it more likely that the model will select the appropriate action sequence as learning develops. Our model thus supports the hypothesis that learning in networks drives the neural representations of actions further apart, increasing the probability that the network generates correct actions as learning proceeds. Altogether, this study advances our understanding of how neural circuit dynamics are involved in neural computation, revealing how dynamics in the corticostriatal system support task learning.

Affiliation(s)
- Christian D Márton, Centre for Neurotechnology & Department of Bioengineering, Imperial College London, London SW7 2AZ, UK; Laboratory of Neuropsychology, Section on Learning and Decision Making, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
- Simon R Schultz, Centre for Neurotechnology & Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Bruno B Averbeck, Laboratory of Neuropsychology, Section on Learning and Decision Making, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA

172
Piette C, Touboul J, Venance L. Engrams of Fast Learning. Front Cell Neurosci 2020; 14:575915. [PMID: 33250712] [PMCID: PMC7676431] [DOI: 10.3389/fncel.2020.575915]
Abstract
Fast learning designates the behavioral and neuronal mechanisms underlying the acquisition of a long-term memory trace after a unique and brief experience. As such it is opposed to incremental, slower reinforcement or procedural learning requiring repetitive training. This learning process, found in most animal species, exists in a large spectrum of natural behaviors, such as one-shot associative, spatial, or perceptual learning, and is a core principle of human episodic memory. We review here the neuronal and synaptic long-term changes associated with fast learning in mammals and discuss some hypotheses related to their underlying mechanisms. We first describe the variety of behavioral paradigms used to test fast learning memories: those preferentially involve a single and brief (from few hundred milliseconds to few minutes) exposures to salient stimuli, sufficient to trigger a long-lasting memory trace and new adaptive responses. We then focus on neuronal activity patterns observed during fast learning and the emergence of long-term selective responses, before documenting the physiological correlates of fast learning. In the search for the engrams of fast learning, a growing body of evidence highlights long-term changes in gene expression, structural, intrinsic, and synaptic plasticities. Finally, we discuss the potential role of the sparse and bursting nature of neuronal activity observed during the fast learning, especially in the induction plasticity mechanisms leading to the rapid establishment of long-term synaptic modifications. We conclude with more theoretical perspectives on network dynamics that could enable fast learning, with an overview of some theoretical approaches in cognitive neuroscience and artificial intelligence.

Affiliation(s)
- Charlotte Piette, Center for Interdisciplinary Research in Biology, College de France, INSERM U1050, CNRS UMR7241, Université PSL, Paris, France; Department of Mathematics and Volen National Center for Complex Systems, Brandeis University, Waltham, MA, United States
- Jonathan Touboul, Department of Mathematics and Volen National Center for Complex Systems, Brandeis University, Waltham, MA, United States
- Laurent Venance, Center for Interdisciplinary Research in Biology, College de France, INSERM U1050, CNRS UMR7241, Université PSL, Paris, France

173
Shen X, Zhang X, Huang Y, Chen S, Wang Y. Task Learning Over Multi-Day Recording via Internally Rewarded Reinforcement Learning Based Brain Machine Interfaces. IEEE Trans Neural Syst Rehabil Eng 2020; 28:3089-3099. [PMID: 33232240] [DOI: 10.1109/tnsre.2020.3039970]
Abstract
Autonomous brain machine interfaces (BMIs) aim to enable paralyzed people to self-evaluate their movement intention to control external devices. Previous reinforcement learning (RL)-based decoders interpret the mapping between neural activity and movements using the external reward for well-trained subjects, and have not investigated the task learning procedure. The brain has developed a learning mechanism to identify the correct actions that lead to rewards in the new task. This internal guidance can be utilized to replace the external reference to advance BMIs as an autonomous system. In this study, we propose to build an internally rewarded reinforcement learning-based BMI framework using the multi-site recording to demonstrate the autonomous learning ability of the BMI decoder on the new task. We test the model on the neural data collected over multiple days while the rats were learning a new lever discrimination task. The primary motor cortex (M1) and medial prefrontal cortex (mPFC) spikes are interpreted by the proposed RL framework into the discrete lever press actions. The neural activity of the mPFC post the action duration is interpreted as the internal reward information, where a support vector machine is implemented to classify the reward vs. non-reward trials with a high accuracy of 87.5% across subjects. This internal reward is used to replace the external water reward to update the decoder, which is able to adapt to the nonstationary neural activity during subject learning. The multi-cortical recording allows us to take in more cortical recordings as input and uses internal critics to guide the decoder learning. Comparing with the classic decoder using M1 activity as the only input and external guidance, the proposed system with multi-cortical recordings shows a better decoding accuracy. More importantly, our internally rewarded decoder demonstrates the autonomous learning ability on the new task as the decoder successfully addresses the time-variant neural patterns while subjects are learning, and works asymptotically as the subjects' behavioral learning progresses. It reveals the potential of endowing BMIs with autonomous task learning ability in the RL framework.
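
The "internal critic" step described above, classifying rewarded versus unrewarded trials from post-action neural activity and using the label in place of the external reward, can be sketched as follows on synthetic firing rates. The data, feature sizes and the linear-kernel choice are illustrative assumptions, not the study's pipeline.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)

# Synthetic "post-action" firing rates: reward modulates a subset of units.
n_trials, n_units = 200, 30
rates = rng.poisson(5.0, (n_trials, n_units)).astype(float)
labels = rng.integers(0, 2, n_trials)        # 1 = rewarded trial
rates[labels == 1, :10] += 3.0               # reward-related rate increase in 10 units

# Train the internal critic on early trials, then use its predictions as the
# stand-in reward signal for later trials.
clf = SVC(kernel="linear").fit(rates[:150], labels[:150])
internal_reward = clf.predict(rates[150:])
accuracy = (internal_reward == labels[150:]).mean()
print(round(float(accuracy), 2))
```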

174
Tsuda B, Tye KM, Siegelmann HT, Sejnowski TJ. A modeling framework for adaptive lifelong learning with transfer and savings through gating in the prefrontal cortex. Proc Natl Acad Sci U S A 2020; 117:29872-29882. [PMID: 33154155] [PMCID: PMC7703668] [DOI: 10.1073/pnas.2009591117]
Abstract
The prefrontal cortex encodes and stores numerous, often disparate, schemas and flexibly switches between them. Recent research on artificial neural networks trained by reinforcement learning has made it possible to model fundamental processes underlying schema encoding and storage. Yet how the brain is able to create new schemas while preserving and utilizing old schemas remains unclear. Here we propose a simple neural network framework that incorporates hierarchical gating to model the prefrontal cortex's ability to flexibly encode and use multiple disparate schemas. We show how gating naturally leads to transfer learning and robust memory savings. We then show how neuropsychological impairments observed in patients with prefrontal damage are mimicked by lesions of our network. Our architecture, which we call DynaMoE, provides a fundamental framework for how the prefrontal cortex may handle the abundance of schemas necessary to navigate the real world.
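
Hierarchical gating over specialised sub-networks is the core architectural idea here; the minimal mixture-of-experts sketch below shows a gating network softly routing an input across "expert" modules. The sizes, names and soft (rather than hard) gating are illustrative assumptions and do not reproduce the DynaMoE architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy gating over expert sub-networks (illustrative only).
n_in, n_experts, n_out = 10, 3, 4
experts = [rng.normal(0, 0.1, (n_out, n_in)) for _ in range(n_experts)]
gate_w = rng.normal(0, 0.1, (n_experts, n_in))

def forward(x):
    logits = gate_w @ x
    gates = np.exp(logits - logits.max()); gates /= gates.sum()   # softmax gating
    outputs = np.stack([W @ x for W in experts])                  # each expert's answer
    return gates @ outputs, gates                                 # gate-weighted combination

y, g = forward(rng.random(n_in))
print(y.round(3), g.round(3))
```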

Affiliation(s)
- Ben Tsuda, Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037; Neurosciences Graduate Program, University of California San Diego, La Jolla, CA 92093; Medical Scientist Training Program, University of California San Diego, La Jolla, CA 92093
- Kay M Tye, Systems Neuroscience Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037
- Hava T Siegelmann, Biologically Inspired Neural & Dynamical Systems Laboratory, School of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003
- Terrence J Sejnowski, Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037; Institute for Neural Computation, University of California San Diego, La Jolla, CA 92093; Division of Biological Sciences, University of California San Diego, La Jolla, CA 92093

175
Eckstein MK, Collins AGE. Computational evidence for hierarchically structured reinforcement learning in humans. Proc Natl Acad Sci U S A 2020; 117:29381-29389. [PMID: 33229518] [PMCID: PMC7703642] [DOI: 10.1073/pnas.1912330117]
Abstract
Humans have the fascinating ability to achieve goals in a complex and constantly changing world, still surpassing modern machine-learning algorithms in terms of flexibility and learning speed. It is generally accepted that a crucial factor for this ability is the use of abstract, hierarchical representations, which employ structure in the environment to guide learning and decision making. Nevertheless, how we create and use these hierarchical representations is poorly understood. This study presents evidence that human behavior can be characterized as hierarchical reinforcement learning (RL). We designed an experiment to test specific predictions of hierarchical RL using a series of subtasks in the realm of context-based learning and observed several behavioral markers of hierarchical RL, such as asymmetric switch costs between changes in higher-level versus lower-level features, faster learning in higher-valued compared to lower-valued contexts, and preference for higher-valued compared to lower-valued contexts. We replicated these results across three independent samples. We simulated three models-a classic RL, a hierarchical RL, and a hierarchical Bayesian model-and compared their behavior to human results. While the flat RL model captured some aspects of participants' sensitivity to outcome values, and the hierarchical Bayesian model captured some markers of transfer, only hierarchical RL accounted for all patterns observed in human behavior. This work shows that hierarchical RL, a biologically inspired and computationally simple algorithm, can capture human behavior in complex, hierarchical environments and opens the avenue for future research in this field.

Affiliation(s)
- Maria K Eckstein, Department of Psychology, University of California, Berkeley, CA 94704
- Anne G E Collins, Department of Psychology, University of California, Berkeley, CA 94704

176
Zhang Z, Cheng H, Yang T. A recurrent neural network framework for flexible and adaptive decision making based on sequence learning. PLoS Comput Biol 2020; 16:e1008342. [PMID: 33141824] [PMCID: PMC7673505] [DOI: 10.1371/journal.pcbi.1008342]
Abstract
The brain makes flexible and adaptive responses in a complicated and ever-changing environment for an organism's survival. To achieve this, the brain needs to understand the contingencies between its sensory inputs, actions, and rewards. This is analogous to the statistical inference that has been extensively studied in the natural language processing field, where recent developments of recurrent neural networks have found many successes. We wonder whether these neural networks, the gated recurrent unit (GRU) networks in particular, reflect how the brain solves the contingency problem. Therefore, we build a GRU network framework inspired by the statistical learning approach of NLP and test it with four exemplar behavior tasks previously used in empirical studies. The network models are trained to predict future events based on past events, both comprising sensory, action, and reward events. We show the networks can successfully reproduce animal and human behavior. The networks generalize the training, perform Bayesian inference in novel conditions, and adapt their choices when event contingencies vary. Importantly, units in the network encode task variables and exhibit activity patterns that match previous neurophysiology findings. Our results suggest that the neural network approach based on statistical sequence learning may reflect the brain's computational principle underlying flexible and adaptive behaviors and serve as a useful approach to understand the brain.
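
For reference, a single gated recurrent unit (GRU) step, the building block of the networks described above, looks like this in plain NumPy. The dimensions and random weights are placeholders; the study trains full GRU networks on sequences of sensory, action and reward events.

```python
import numpy as np

rng = np.random.default_rng(3)

n_in, n_hid = 8, 16
Wz, Uz = rng.normal(0, 0.1, (n_hid, n_in)), rng.normal(0, 0.1, (n_hid, n_hid))
Wr, Ur = rng.normal(0, 0.1, (n_hid, n_in)), rng.normal(0, 0.1, (n_hid, n_hid))
Wh, Uh = rng.normal(0, 0.1, (n_hid, n_in)), rng.normal(0, 0.1, (n_hid, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h):
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate hidden state
    return (1 - z) * h + z * h_tilde           # gated mixture of old and new

h = np.zeros(n_hid)
for event in rng.random((5, n_in)):            # a toy event sequence
    h = gru_step(event, h)
print(h[:4].round(3))
```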

Affiliation(s)
- Zhewei Zhang, Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China
- Huzi Cheng, Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China
- Tianming Yang, Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, China

177
Jin C, Chen W, Cao Y, Xu Z, Tan Z, Zhang X, Deng L, Zheng C, Zhou J, Shi H, Feng J. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nat Commun 2020; 11:5088. [PMID: 33037212] [DOI: 10.1101/823377]
Abstract
Early detection of COVID-19 based on chest CT enables timely treatment of patients and helps control the spread of the disease. We proposed an artificial intelligence (AI) system for rapid COVID-19 detection and performed extensive statistical analysis of CTs of COVID-19 based on the AI system. We developed and evaluated our system on a large dataset with more than 10 thousand CT volumes from COVID-19, influenza-A/B, non-viral community acquired pneumonia (CAP) and non-pneumonia subjects. In such a difficult multi-class diagnosis task, our deep convolutional neural network-based system is able to achieve an area under the receiver operating characteristic curve (AUC) of 97.81% for multi-way classification on test cohort of 3,199 scans, AUC of 92.99% and 93.25% on two publicly available datasets, CC-CCII and MosMedData respectively. In a reader study involving five radiologists, the AI system outperforms all of radiologists in more challenging tasks at a speed of two orders of magnitude above them. Diagnosis performance of chest x-ray (CXR) is compared to that of CT. Detailed interpretation of deep network is also performed to relate system outputs with CT presentations. The code is available at https://github.com/ChenWWWeixiang/diagnosis_covid19 .

Affiliation(s)
- Cheng Jin, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Weixiang Chen, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Yukun Cao, Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China
- Zhanwei Xu, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Zimeng Tan, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Xin Zhang, Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China
- Lei Deng, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Chuansheng Zheng, Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China
- Jie Zhou, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Heshui Shi, Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Molecular Imaging, Wuhan, China
- Jianjiang Feng, Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China

178
Pelekanos V, Premereur E, Mitchell DJ, Chakraborty S, Mason S, Lee ACH, Mitchell AS. Corticocortical and Thalamocortical Changes in Functional Connectivity and White Matter Structural Integrity after Reward-Guided Learning of Visuospatial Discriminations in Rhesus Monkeys. J Neurosci 2020; 40:7887-7901. [PMID: 32900835] [PMCID: PMC7548693] [DOI: 10.1523/jneurosci.0364-20.2020]
Abstract
The frontal cortex and temporal lobes together regulate complex learning and memory capabilities. Here, we collected resting-state functional and diffusion-weighted MRI data before and after male rhesus macaque monkeys received extensive training to learn novel visuospatial discriminations (reward-guided learning). We found functional connectivity changes in orbitofrontal, ventromedial prefrontal, inferotemporal, entorhinal, retrosplenial, and anterior cingulate cortices, the subicular complex, and the dorsal, medial thalamus. These corticocortical and thalamocortical changes in functional connectivity were accompanied by related white matter structural alterations in the uncinate fasciculus, fornix, and ventral prefrontal tract: tracts that connect (sub)cortical networks and are implicated in learning and memory processes in monkeys and humans. After the well-trained monkeys received fornix transection, they were impaired in learning new visuospatial discriminations. In addition, the functional connectivity profile that was observed after the training was altered. These changes were accompanied by white matter changes in the ventral prefrontal tract, although the integrity of the uncinate fasciculus remained unchanged. Our experiments highlight the importance of different communication relayed among corticocortical and thalamocortical circuitry for the ability to learn new visuospatial associations (learning-to-learn) and to make reward-guided decisions.SIGNIFICANCE STATEMENT Frontal neural networks and the temporal lobes contribute to reward-guided learning in mammals. Here, we provide novel insight by showing that specific corticocortical and thalamocortical functional connectivity is altered after rhesus monkeys received extensive training to learn novel visuospatial discriminations. Contiguous white matter fiber pathways linking these gray matter structures, namely, the uncinate fasciculus, fornix, and ventral prefrontal tract, showed structural changes after completing training in the visuospatial task. Additionally, different patterns of functional and structural connectivity are reported after removal of subcortical connections within the extended hippocampal system, via fornix transection. These results highlight the importance of both corticocortical and thalamocortical interactions in reward-guided learning in the normal brain and identify brain structures important for memory capabilities after injury.

Affiliation(s)
- Vassilis Pelekanos, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, United Kingdom
- Elsie Premereur, Laboratory for Neuro- and Psychophysiology, KU Leuven, 3000 Leuven, Belgium
- Daniel J Mitchell, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, United Kingdom
- Subhojit Chakraborty, Department of Neuroinflammation, Queen Square Multiple Sclerosis Centre, Institute of Neurology, University College London, London WC1N 3BG, United Kingdom
- Stuart Mason, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, United Kingdom
- Andy C H Lee, Department of Psychology (Scarborough), University of Toronto, Toronto, Ontario M1C 1A4, Canada; Rotman Research Institute, Baycrest Centre, Toronto, Ontario M6A 2E1, Canada
- Anna S Mitchell, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, United Kingdom

179
van Lieshout LLF, de Lange FP, Cools R. Why so curious? Quantifying mechanisms of information seeking. Curr Opin Behav Sci 2020. [DOI: 10.1016/j.cobeha.2020.08.005]

180
Park SA, Miller DS, Nili H, Ranganath C, Boorman ED. Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron 2020; 107:1226-1238.e8. [PMID: 32702288] [PMCID: PMC7529977] [DOI: 10.1016/j.neuron.2020.06.030]
Abstract
Cognitive maps enable efficient inferences from limited experience that can guide novel decisions. We tested whether the hippocampus (HC), entorhinal cortex (EC), and ventromedial prefrontal cortex (vmPFC)/medial orbitofrontal cortex (mOFC) organize abstract and discrete relational information into a cognitive map to guide novel inferences. Subjects learned the status of people in two unseen 2D social hierarchies, with each dimension learned on a separate day. Although one dimension was behaviorally relevant, multivariate activity patterns in HC, EC, and vmPFC/mOFC were linearly related to the Euclidean distance between people in the mentally reconstructed 2D space. Hubs created unique comparisons between the hierarchies, enabling inferences between novel pairs. We found that both behavior and neural activity in EC and vmPFC/mOFC reflected the Euclidean distance to the retrieved hub, which was reinstated in HC. These findings reveal how abstract and discrete relational structures are represented, are combined, and enable novel inferences in the human brain.
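As an editorial illustration of the kind of analysis this abstract describes, the sketch below relates a model representational dissimilarity matrix built from Euclidean distances in a reconstructed 2-D space to a "neural" dissimilarity matrix computed from synthetic activity patterns. The hierarchy size, voxel embedding, and noise level are assumptions for illustration, not the study's design or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4 social hierarchy: each person has a rank on two dimensions.
ranks = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
n_people = len(ranks)

# Model RDM: pairwise Euclidean distances in the reconstructed 2-D space.
model_rdm = np.linalg.norm(ranks[:, None, :] - ranks[None, :, :], axis=-1)

# Synthetic "multivoxel patterns": a noisy linear embedding of the 2-D ranks,
# standing in for activity patterns in HC / EC / vmPFC-mOFC.
embedding = rng.normal(size=(2, 50))                 # 2 latent dims -> 50 voxels
patterns = ranks @ embedding + rng.normal(scale=2.0, size=(n_people, 50))

# Neural RDM: correlation distance between the activity patterns of each pair.
neural_rdm = 1.0 - np.corrcoef(patterns)

# Relate the two RDMs over their upper triangles.
iu = np.triu_indices(n_people, k=1)
r = np.corrcoef(model_rdm[iu], neural_rdm[iu])[0, 1]
print(f"model-neural RDM correlation: r = {r:.2f}")
```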
Collapse
Affiliation(s)
- Seongmin A Park
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA; Center for Neuroscience, University of California, Davis, Davis, CA, USA.
| | - Douglas S Miller
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA; Center for Neuroscience, University of California, Davis, Davis, CA, USA
| | - Hamed Nili
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
| | - Charan Ranganath
- Center for Neuroscience, University of California, Davis, Davis, CA, USA; Department of Psychology, University of California, Davis, Davis, CA, USA
| | - Erie D Boorman
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA; Department of Psychology, University of California, Davis, Davis, CA, USA.
| |
Collapse
|
181
|
Mark S, Moran R, Parr T, Kennerley SW, Behrens TEJ. Transferring structural knowledge across cognitive maps in humans and models. Nat Commun 2020; 11:4783. [PMID: 32963219 PMCID: PMC7508979 DOI: 10.1038/s41467-020-18254-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Accepted: 08/14/2020] [Indexed: 01/15/2023] Open
Abstract
Relations between task elements often follow hidden underlying structural forms such as periodicities or hierarchies, whose inference fosters performance. However, transferring structural knowledge to novel environments requires flexible representations that are generalizable over particularities of the current environment, such as its stimuli and size. We suggest that humans represent structural forms as abstract basis sets and that, in novel tasks, the structural form is inferred and the relevant basis set is transferred. Using a computational model, we show that such a representation allows inference of the underlying structural form, important task states, effective behavioural policies and the existence of unobserved state-trajectories. In two experiments, participants learned three abstract graphs over two successive days. We tested how structural knowledge acquired on Day 1 affected Day 2 performance. In line with our model, participants who had a correct structural prior were able to infer the existence of unobserved state-trajectories and appropriate behavioural policies.
Collapse
Affiliation(s)
- Shirley Mark
- Wellcome Trust Centre for Neuroimaging, UCL. Queen Square 12, London, WC1N 3BG, UK.
| | - Rani Moran
- Max Planck UCL Center for Computational Psychiatry and Aging Research, Russell Square 10-12, London, WC1B 5EH, UK
| | - Thomas Parr
- Wellcome Trust Centre for Neuroimaging, UCL. Queen Square 12, London, WC1N 3BG, UK
| | - Steve W Kennerley
- Sobell Department of Motor Neuroscience, University College London, London, UK
| | - Timothy E J Behrens
- Wellcome Centre for Integrative Neuroimaging, Centre for Functional Magnetic Resonance Imaging of the Brain, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, 12 Queen Square, London, WC1N 3BG, UK
| |
Collapse
|
182
|
Feasibility Analysis and Application of Reinforcement Learning Algorithm Based on Dynamic Parameter Adjustment. ALGORITHMS 2020. [DOI: 10.3390/a13090239] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Reinforcement learning, as a branch of machine learning, is increasingly being applied in the control field. In practice, however, the hyperparameters of deep reinforcement learning networks are still set by the empirical trial-and-error procedures inherited from traditional machine learning (supervised and unsupervised learning). This approach ignores information generated as the agent explores the environment, which is contained in the updates of the reinforcement learning value function, and it can therefore harm the convergence and cumulative return of reinforcement learning. The reinforcement learning algorithm based on dynamic parameter adjustment is a new method for setting the learning-rate parameter of deep reinforcement learning. Building on the traditional way of setting parameters for reinforcement learning, the method analyzes the advantages of different learning rates at different stages of learning and dynamically adjusts the learning rate as a function of the temporal-difference (TD) error, so that each stage benefits from an appropriate learning rate and the algorithm becomes more practical to apply. By combining the Robbins-Monro approximation conditions with the deep reinforcement learning algorithm, it is shown that dynamically regulating the learning rate can, in theory, satisfy the convergence requirements of the intelligent control algorithm. In experiments on the continuous-control "Car-on-the-Hill" benchmark, a standard reinforcement learning test environment, the new method achieves better results than traditional reinforcement learning in practical application. Based on the model characteristics of deep reinforcement learning, a more suitable method for setting the network learning rate is therefore proposed, and its feasibility is demonstrated both in theory and in application, making this way of setting the learning-rate parameter worthy of further development and research.
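A minimal sketch of the general idea, not the paper's exact rule: tabular Q-learning on a toy chain task where the learning rate combines a 1/n decay (so the Robbins-Monro conditions still hold) with a bounded scaling by the magnitude of the TD error. The environment, exploration schedule, and constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions = 20, 2
Q = np.zeros((n_states, n_actions))
visit_count = np.zeros((n_states, n_actions))
gamma = 0.95

def step(state, action):
    # Toy chain environment (an assumption; the paper's benchmark is "Car-on-the-Hill").
    nxt = min(max(state + (1 if action == 1 else -1), 0), n_states - 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt in (0, n_states - 1)

for episode in range(500):
    state, done = n_states // 2, False
    while not done:
        if rng.random() < 0.1:
            action = int(rng.integers(n_actions))      # epsilon-greedy exploration
        else:
            action = int(np.argmax(Q[state]))
        nxt, reward, done = step(state, action)
        target = reward + (0.0 if done else gamma * np.max(Q[nxt]))
        td_error = target - Q[state, action]
        visit_count[state, action] += 1
        # The base rate decays as 1/n, so the Robbins-Monro conditions
        # (sum of rates diverges, sum of squared rates converges) still hold
        # after scaling by a bounded function of the TD-error magnitude.
        base_rate = 1.0 / visit_count[state, action]
        alpha = base_rate * (0.5 + min(abs(td_error), 1.0))
        Q[state, action] += alpha * td_error
        state = nxt

print("greedy action per state:", np.argmax(Q, axis=1))
```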
Collapse
|
183
|
Prior cortical activity differences during an action observation plus motor imagery task related to motor adaptation performance of a coordinated multi-limb complex task. Cogn Neurodyn 2020; 14:769-779. [PMID: 33101530 DOI: 10.1007/s11571-020-09633-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2020] [Revised: 08/24/2020] [Accepted: 09/01/2020] [Indexed: 12/16/2022] Open
Abstract
Motor adaptation is the ability to develop new motor skills that make it possible to perform a consolidated motor task under different psychophysical conditions. There is an established relationship between prior brain activity at rest and motor adaptation. However, brain activity at rest is highly variable both between and within subjects. Here we hypothesize that cortical activity during the original task to be later adapted is a more reliable and stronger determinant of motor adaptation. Consequently, we present a study to find cortical areas whose activity, both at rest and during first-person virtual reality simulation of bicycle riding, characterizes the subjects who did and did not adapt to ride a reverse-steering bicycle, a complex motor adaptation task involving all limbs and balance. The results showed that cortical activity differences during the simulated task were higher, more significant, spatially larger, and spectrally wider than at rest for good performers. In this sense, the left anterior insula, the left dorsolateral and ventrolateral inferior prefrontal areas, and the left inferior premotor cortex (an action-understanding hub of the mirror neuron circuit) are the areas whose activity during simulated bicycle riding is most descriptive of the ability to adapt the motor task. Trial registration: the trial was registered with the NIH Clinical Trials Registry (clinicaltrials.gov) under registration number NCT02999516 (21/12/2016).
Collapse
|
184
|
Collins AGE, Cockburn J. Beyond dichotomies in reinforcement learning. Nat Rev Neurosci 2020; 21:576-586. [PMID: 32873936 DOI: 10.1038/s41583-020-0355-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/20/2020] [Indexed: 11/09/2022]
Abstract
Reinforcement learning (RL) is a framework of particular importance to psychology, neuroscience and machine learning. Interactions between these fields, as promoted through the common hub of RL, have facilitated paradigm shifts that relate multiple levels of analysis in a singular framework (for example, relating dopamine function to a computationally defined RL signal). Recently, more sophisticated RL algorithms have been proposed to better account for human learning, and in particular its oft-documented reliance on two separable systems: a model-based (MB) system and a model-free (MF) system. However, along with many benefits, this dichotomous lens can distort questions, and may contribute to an unnecessarily narrow perspective on learning and decision-making. Here, we outline some of the consequences that come from overconfidently mapping algorithms, such as MB versus MF RL, onto putative cognitive processes. We argue that the field is well positioned to move beyond simplistic dichotomies, and we propose a means of refocusing research questions towards the rich and complex components that comprise learning and decision-making.
Collapse
Affiliation(s)
- Anne G E Collins
- Department of Psychology and the Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
| | - Jeffrey Cockburn
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
| |
Collapse
|
185
|
Cortese A, Lau H, Kawato M. Unconscious reinforcement learning of hidden brain states supported by confidence. Nat Commun 2020; 11:4429. [PMID: 32868772 PMCID: PMC7459278 DOI: 10.1038/s41467-020-17828-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Accepted: 07/13/2020] [Indexed: 12/11/2022] Open
Abstract
Can humans be trained to make strategic use of latent representations in their own brains? We investigate how human subjects can derive reward-maximizing choices from intrinsic high-dimensional information represented stochastically in neural activity. Reward contingencies are defined in real time by fMRI multivoxel patterns; optimal action policies thereby depend on multidimensional brain activity taking place below the threshold of consciousness, by design. We find that subjects can solve the task within two hundred trials and errors, as their reinforcement learning processes interact with metacognitive functions (quantified as the meaningfulness of their decision confidence). Computational modelling and multivariate analyses identify a frontostriatal neural mechanism by which the brain may untangle the 'curse of dimensionality': synchronization of confidence representations in prefrontal cortex with reward prediction errors in basal ganglia supports exploration of latent task representations. These results may provide an alternative starting point for future investigations into unconscious learning and functions of metacognition.
Collapse
Affiliation(s)
- Aurelio Cortese
- Computational Neuroscience Laboratories, ATR Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan.
| | - Hakwan Lau
- Department of Psychology, UCLA, 1285 Franz Hall, Los Angeles, CA, 90095, USA
- Brain Research Institute, UCLA, 695 Charles E Young Dr S, Los Angeles, CA, 90095, USA
- Department of Psychology, University of Hong Kong, 627, The Jockey Club Tower, Pok Fu Lam Rd, Pok Fu Lam, Hong Kong
- State Key Laboratory for Brain and Cognitive Sciences, University of Hong Kong, 5 Sassoon Rd, Pok Fu Lam, Hong Kong
| | - Mitsuo Kawato
- Computational Neuroscience Laboratories, ATR Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan.
- RIKEN Center for Advanced Intelligence Project, ATR Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-Gun, Kyoto, 619-0288, Japan.
| |
Collapse
|
186
|
Trial-by-trial dynamics of reward prediction error-associated signals during extinction learning and renewal. Prog Neurobiol 2020; 197:101901. [PMID: 32846162 DOI: 10.1016/j.pneurobio.2020.101901] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2020] [Revised: 07/06/2020] [Accepted: 08/18/2020] [Indexed: 11/24/2022]
Abstract
Reward prediction errors (RPEs) have been suggested to drive associative learning processes, but their precise temporal dynamics at the single-neuron level remain elusive. Here, we studied the neural correlates of RPEs, focusing on their trial-by-trial dynamics during an operant extinction learning paradigm. Within a single behavioral session, pigeons went through acquisition, extinction and renewal - the context-dependent response recovery after extinction. We recorded single units from the avian prefrontal cortex analogue, the nidopallium caudolaterale (NCL) and found that the omission of reward during extinction led to a peak of population activity that moved backwards in time as trials progressed. The chronological order of these signal changes during the progress of learning was indicative of temporal shifts of RPE signals that started during reward omission and then moved backwards to the presentation of the conditioned stimulus. Switches from operant choices to avoidance behavior (and vice versa) coincided with changes in population activity during the animals' decision-making. On the single unit level, we found more diverse patterns where some neurons' activity correlated with RPE signals whereas others correlated with the absolute value during the outcome period. Finally, we demonstrated that mere sensory contextual changes during the renewal test were sufficient to elicit signals likely associated with RPEs. Thus, RPEs are truly expectancy-driven since they can be elicited by changes in reward expectation, without an actual change in the quality or quantity of reward.
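The backward migration of prediction-error signals described in this abstract is the classic signature of temporal-difference learning. The sketch below simulates TD(0) with a tapped-delay-line stimulus representation and prints where the prediction error peaks at different stages of training; it is a textbook illustration of the temporal shift, not a model of the recorded NCL data.

```python
import numpy as np

# TD(0) with a tapped-delay-line stimulus representation: one value weight per
# time step between CS onset and reward.  Early in training the prediction
# error peaks at the time of reward; with learning it migrates backwards
# towards the (unpredicted) CS onset.
n_timesteps = 10            # CS at step 0, reward delivered at the final step
w = np.zeros(n_timesteps)   # value weight for each post-CS time step
alpha, gamma = 0.3, 1.0
snapshots = {}

for trial in range(1, 201):
    delta = np.zeros(n_timesteps + 1)
    # Transition from the pre-CS baseline (value fixed at 0) into the CS.
    delta[0] = gamma * w[0]
    for t in range(n_timesteps):
        reward = 1.0 if t == n_timesteps - 1 else 0.0
        v_next = w[t + 1] if t + 1 < n_timesteps else 0.0
        delta[t + 1] = reward + gamma * v_next - w[t]
        w[t] += alpha * delta[t + 1]
    if trial in (1, 10, 50, 200):
        snapshots[trial] = delta.copy()

for trial, delta in snapshots.items():
    peak = int(np.argmax(delta))
    label = ("CS onset" if peak == 0
             else "reward time" if peak == n_timesteps
             else f"time step {peak}")
    print(f"trial {trial:3d}: prediction error peaks at {label}")
```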
Collapse
|
187
|
Klos C, Kalle Kossio YF, Goedeke S, Gilra A, Memmesheimer RM. Dynamical Learning of Dynamics. PHYSICAL REVIEW LETTERS 2020; 125:088103. [PMID: 32909804 DOI: 10.1103/physrevlett.125.088103] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/07/2019] [Revised: 06/24/2020] [Accepted: 07/21/2020] [Indexed: 06/11/2023]
Abstract
The ability of humans and animals to quickly adapt to novel tasks is difficult to reconcile with the standard paradigm of learning by slow synaptic weight modification. Here, we show that fixed-weight neural networks can learn to generate required dynamics by imitation. After appropriate weight pretraining, the networks quickly and dynamically adapt to learn new tasks and thereafter continue to achieve them without further teacher feedback. We explain this ability and illustrate it with a variety of target dynamics, ranging from oscillatory trajectories to driven and chaotic dynamical systems.
Collapse
Affiliation(s)
- Christian Klos
- Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, 53115 Bonn, Germany
| | | | - Sven Goedeke
- Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, 53115 Bonn, Germany
| | - Aditya Gilra
- Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, 53115 Bonn, Germany
- Department of Computer Science, and Neuroscience Institute, University of Sheffield, Sheffield S1 4DP, United Kingdom
| | - Raoul-Martin Memmesheimer
- Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, 53115 Bonn, Germany
| |
Collapse
|
188
|
Diaconescu AO, Stecy M, Kasper L, Burke CJ, Nagy Z, Mathys C, Tobler PN. Neural arbitration between social and individual learning systems. eLife 2020; 9:54051. [PMID: 32779568 PMCID: PMC7476763 DOI: 10.7554/elife.54051] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2019] [Accepted: 08/10/2020] [Indexed: 12/20/2022] Open
Abstract
Decision making requires integrating knowledge gathered from personal experiences with advice from others. The neural underpinnings of the process of arbitrating between information sources have not been fully elucidated. In this study, we formalized arbitration as the relative precision of predictions, afforded by each learning system, using hierarchical Bayesian modeling. In a probabilistic learning task, participants predicted the outcome of a lottery using recommendations from a more informed advisor and/or self-sampled outcomes. Decision confidence, as measured by the number of points participants wagered on their predictions, varied with our definition of arbitration as a ratio of precisions. Functional neuroimaging demonstrated that arbitration signals were independent of decision confidence and involved modality-specific brain regions. Arbitrating in favor of self-gathered information activated the dorsolateral prefrontal cortex and the midbrain, whereas arbitrating in favor of social information engaged the ventromedial prefrontal cortex and the amygdala. These findings indicate that relative precision captures arbitration between social and individual learning systems at both behavioral and neural levels.
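The paper's behavioural model is a hierarchical Gaussian filter; the sketch below is a deliberately simplified stand-in that captures only the core definition of arbitration as a ratio of precisions, with each source's precision approximated by the inverse of a running squared-error estimate. The task parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Individual system: a Rescorla-Wagner estimate learned from self-sampled
# outcomes.  Social system: the advisor's recommendation on each trial.
# Each source's precision is the inverse of a running squared-error estimate,
# and arbitration is the relative precision of the two sources.
p_win = 0.75                 # true probability that the lottery pays out
advisor_accuracy = 0.6       # how often the advisor's recommendation is correct

v_self = 0.5                 # individually learned estimate
err_self, err_social = 0.25, 0.25
alpha, tau = 0.1, 0.05

for trial in range(300):
    outcome = float(rng.random() < p_win)
    advice = outcome if rng.random() < advisor_accuracy else 1.0 - outcome

    precision_self, precision_social = 1.0 / err_self, 1.0 / err_social
    w_social = precision_social / (precision_self + precision_social)
    prediction = w_social * advice + (1.0 - w_social) * v_self

    # Track each source's reliability, then update the individual estimate.
    err_self += tau * ((outcome - v_self) ** 2 - err_self)
    err_social += tau * ((outcome - advice) ** 2 - err_social)
    v_self += alpha * (outcome - v_self)

print(f"arbitration weight on social advice after learning: {w_social:.2f}")
print(f"individually learned win probability: {v_self:.2f}")
print(f"combined prediction on the final trial: {prediction:.2f}")
```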
Collapse
Affiliation(s)
- Andreea Oliviana Diaconescu
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich & ETH Zurich, Zurich, Switzerland.,Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland.,University of Basel, Department of Psychiatry (UPK), Basel, Switzerland.,Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), University of Toronto, Toronto, Canada
| | - Madeline Stecy
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich & ETH Zurich, Zurich, Switzerland.,Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland.,Rutgers Robert Wood Johnson Medical School, New Brunswick, United States
| | - Lars Kasper
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich & ETH Zurich, Zurich, Switzerland.,Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland.,Institute for Biomedical Engineering, MRI Technology Group, ETH Zürich & University of Zurich, Zurich, Switzerland
| | - Christopher J Burke
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
| | - Zoltan Nagy
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
| | - Christoph Mathys
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich & ETH Zurich, Zurich, Switzerland.,Interacting Minds Centre, Aarhus University, Aarhus, Denmark.,Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
| | - Philippe N Tobler
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
| |
Collapse
|
189
|
Deep Reinforcement Learning and Its Neuroscientific Implications. Neuron 2020; 107:603-616. [DOI: 10.1016/j.neuron.2020.06.014] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2020] [Revised: 06/08/2020] [Accepted: 06/12/2020] [Indexed: 11/23/2022]
|
190
|
Dissociable Neural Systems Support the Learning and Transfer of Hierarchical Control Structure. J Neurosci 2020; 40:6624-6637. [PMID: 32690614 DOI: 10.1523/jneurosci.0847-20.2020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Revised: 05/15/2020] [Accepted: 07/08/2020] [Indexed: 11/21/2022] Open
Abstract
Humans can draw insight from previous experiences to quickly adapt to novel environments that share a common underlying structure. Here we combine functional imaging and computational modeling to identify the neural systems that support the discovery and transfer of hierarchical task structure. Human subjects (male and female) completed multiple blocks of a reinforcement learning task that contained a global hierarchical structure governing stimulus-response action mapping. First, behavioral and computational evidence showed that humans successfully discover and transfer the hierarchical rule structure embedded within the task. Next, analysis of fMRI BOLD data revealed activity across a frontoparietal network that was specifically associated with the discovery of this embedded structure. Finally, activity throughout a cingulo-opercular network supported the transfer and implementation of this discovered structure. Together, these results reveal a division of labor in which dissociable neural systems support the learning and transfer of abstract control structures. SIGNIFICANCE STATEMENT: A fundamental and defining feature of human behavior is the ability to generalize knowledge from the past to support future action. Although the neural circuits underlying more direct forms of learning have been well established over the last century, we still lack a solid framework from which to investigate more abstract, higher-order human learning and knowledge generalization. We designed a novel behavioral paradigm to specifically isolate a learning process in which previous knowledge, rather than directly indicating the correct action, instead guides the search for the correct action. Moreover, we identify that this learning process is achieved via the coordinated and temporally specific activity of two prominent cognitive control brain networks.
Collapse
|
191
|
Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks. Neural Netw 2020; 129:149-162. [PMID: 32534378 DOI: 10.1016/j.neunet.2020.06.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2019] [Revised: 05/25/2020] [Accepted: 06/02/2020] [Indexed: 11/20/2022]
Abstract
Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., solving memory-dependent tasks and meta-learning. However, little effort has been spent on improving RNN architectures and on understanding the underlying neural mechanisms for performance gain. In this paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical results show that the network can autonomously learn to abstract sub-goals and can self-develop an action hierarchy using internal dynamics in a challenging continuous control task. Furthermore, we show that the self-developed compositionality of the network enables faster re-learning when adapting to a new task that is a re-composition of previously learned sub-goals than when starting from scratch. We also found that improved performance can be achieved when neural activities are subject to stochastic rather than deterministic dynamics.
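A minimal sketch of the two ingredients named in the abstract, multiple timescales and stochastic dynamics: a leaky RNN step in which a fast and a slow population have different time constants and each unit receives Gaussian noise. This illustrates only those ingredients and is not the authors' architecture or training procedure; all sizes and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# A "fast" population (small time constant) and a "slow" population (large
# time constant) interact through a shared recurrent weight matrix, and each
# unit's state receives additive Gaussian noise on every step.
n_fast, n_slow, n_in = 32, 16, 4
tau_fast, tau_slow, sigma = 2.0, 20.0, 0.05

n_total = n_fast + n_slow
W = rng.normal(scale=1.0 / np.sqrt(n_total), size=(n_total, n_total))
W_in = rng.normal(scale=0.5, size=(n_total, n_in))
tau = np.concatenate([np.full(n_fast, tau_fast), np.full(n_slow, tau_slow)])

def rnn_step(h, x):
    """One Euler step of leaky, noisy RNN dynamics: dh = (-h + W r + W_in x) / tau."""
    r = np.tanh(h)
    noise = sigma * rng.normal(size=h.shape)
    dh = (-h + W @ r + W_in @ x) / tau
    return h + dh + noise

h = np.zeros(n_total)
for t in range(100):
    x = np.zeros(n_in)            # no external input in this toy run
    h = rnn_step(h, x)

print("fast-unit activity std:", np.std(h[:n_fast]).round(3),
      "slow-unit activity std:", np.std(h[n_fast:]).round(3))
```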
Collapse
|
192
|
Smith R, Schwartenbeck P, Parr T, Friston KJ. An Active Inference Approach to Modeling Structure Learning: Concept Learning as an Example Case. Front Comput Neurosci 2020; 14:41. [PMID: 32508611 PMCID: PMC7250191 DOI: 10.3389/fncom.2020.00041] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2019] [Accepted: 04/17/2020] [Indexed: 11/13/2022] Open
Abstract
Within computational neuroscience, the algorithmic and neural basis of structure learning remains poorly understood. Concept learning is one primary example, which requires both a type of internal model expansion process (adding novel hidden states that explain new observations), and a model reduction process (merging different states into one underlying cause and thus reducing model complexity via meta-learning). Although various algorithmic models of concept learning have been proposed within machine learning and cognitive science, many are limited to various degrees by an inability to generalize, the need for very large amounts of training data, and/or insufficiently established biological plausibility. Using concept learning as an example case, we introduce a novel approach for modeling structure learning-and specifically state-space expansion and reduction-within the active inference framework and its accompanying neural process theory. Our aim is to demonstrate its potential to facilitate a novel line of active inference research in this area. The approach we lay out is based on the idea that a generative model can be equipped with extra (hidden state or cause) "slots" that can be engaged when an agent learns about novel concepts. This can be combined with a Bayesian model reduction process, in which any concept learning-associated with these slots-can be reset in favor of a simpler model with higher model evidence. We use simulations to illustrate this model's ability to add new concepts to its state space (with relatively few observations) and increase the granularity of the concepts it currently possesses. We also simulate the predicted neural basis of these processes. We further show that it can accomplish a simple form of "one-shot" generalization to new stimuli. Although deliberately simple, these simulation results highlight ways in which active inference could offer useful resources in developing neurocomputational models of structure learning. They provide a template for how future active inference research could apply this approach to real-world structure learning problems and assess the added utility it may offer.
Collapse
Affiliation(s)
- Ryan Smith
- Laureate Institute for Brain Research, Tulsa, OK, United States
| | - Philipp Schwartenbeck
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
| | - Thomas Parr
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
| | - Karl J. Friston
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
| |
Collapse
|
193
|
Bartolo R, Saunders RC, Mitz AR, Averbeck BB. Dimensionality, information and learning in prefrontal cortex. PLoS Comput Biol 2020; 16:e1007514. [PMID: 32330126 PMCID: PMC7202668 DOI: 10.1371/journal.pcbi.1007514] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2019] [Revised: 05/06/2020] [Accepted: 03/11/2020] [Indexed: 01/12/2023] Open
Abstract
Learning leads to changes in population patterns of neural activity. In this study we wanted to examine how these changes in patterns of activity affect the dimensionality of neural responses and information about choices. We addressed these questions by carrying out high channel count recordings in dorsal-lateral prefrontal cortex (dlPFC; 768 electrodes) while monkeys performed a two-armed bandit reinforcement learning task. The high channel count recordings allowed us to study population coding while monkeys learned choices between actions or objects. We found that the dimensionality of neural population activity was higher across blocks in which animals learned the values of novel pairs of objects, than across blocks in which they learned the values of actions. The increase in dimensionality with learning in object blocks was related to less shared information across blocks, and therefore patterns of neural activity that were less similar, when compared to learning in action blocks. Furthermore, these differences emerged with learning, and were not a simple function of the choice of a visual image or action. Therefore, learning the values of novel objects increases the dimensionality of neural representations in dlPFC. In this study we found that learning to choose rewarding objects increased the diversity of patterns of activity, measured as the dimensionality of the response, observed in dorsal-lateral prefrontal cortex. The dimensionality increase for learning to choose rewarding objects was larger than the dimensionality increase for learning to choose rewarding actions. The dimensionality increase was not a simple function of the diverse set of images used, as the patterns of activity only appeared after learning.
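Dimensionality of population activity is commonly quantified with the participation ratio of the covariance eigenspectrum; whether this matches the paper's exact estimator is an assumption. The sketch below contrasts synthetic low- and higher-dimensional population responses to show what such a measure picks up.

```python
import numpy as np

def participation_ratio(activity):
    """Effective dimensionality of a (trials x neurons) activity matrix,
    computed from the covariance eigenvalues: PR = (sum l_i)^2 / sum l_i^2."""
    centered = activity - activity.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(4)
n_trials, n_neurons = 200, 100

# Low-dimensional responses: activity confined to a 3-D subspace plus noise.
latent_low = rng.normal(size=(n_trials, 3)) @ rng.normal(size=(3, n_neurons))
low_dim = latent_low + 0.1 * rng.normal(size=(n_trials, n_neurons))

# Higher-dimensional responses: a 30-D subspace with the same noise level.
latent_high = rng.normal(size=(n_trials, 30)) @ rng.normal(size=(30, n_neurons))
high_dim = latent_high + 0.1 * rng.normal(size=(n_trials, n_neurons))

print("participation ratio (low-dimensional):", round(participation_ratio(low_dim), 1))
print("participation ratio (higher-dimensional):", round(participation_ratio(high_dim), 1))
```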
Collapse
Affiliation(s)
- Ramon Bartolo
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Richard C. Saunders
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Andrew R. Mitz
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Bruno B. Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail:
| |
Collapse
|
194
|
Bartolo R, Averbeck BB. Prefrontal Cortex Predicts State Switches during Reversal Learning. Neuron 2020; 106:1044-1054.e4. [PMID: 32315603 DOI: 10.1016/j.neuron.2020.03.024] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2019] [Revised: 01/28/2020] [Accepted: 03/24/2020] [Indexed: 11/25/2022]
Abstract
Reinforcement learning allows organisms to predict future outcomes and to update their beliefs about value in the world. The dorsal-lateral prefrontal cortex (dlPFC) integrates information carried by reward circuits, which can be used to infer the current state of the world under uncertainty. Here, we explored the dlPFC computations related to updating current beliefs during stochastic reversal learning. We recorded the activity of populations of up to 1,000 neurons simultaneously in two male macaques while they executed a two-armed bandit reversal learning task. Behavioral analyses using a Bayesian framework showed that animals inferred reversals and switched their choice preference rapidly, rather than slowly updating choice values, consistent with state inference. Furthermore, dlPFC neural populations accurately encoded choice preference switches. These results suggest that prefrontal neurons dynamically encode decisions associated with Bayesian subjective values, highlighting the role of the PFC in representing a belief about the current state of the world.
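A toy contrast between state inference and incremental value learning in a reversal task: a Bayesian observer with a hazard rate over reversals switches its belief abruptly, while a Rescorla-Wagner learner drifts. This is a simplified stand-in for that distinction, not the authors' behavioural model; the reward probabilities and hazard rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two-armed bandit with reversals: one arm pays with p = 0.8, the other 0.2,
# and the identity of the good arm flips occasionally.
p_high, p_low, hazard = 0.8, 0.2, 0.05
n_trials = 200

belief = 0.5                 # P(arm 0 is currently the good arm)
q = np.array([0.5, 0.5])     # Rescorla-Wagner values for comparison
alpha = 0.1
good_arm = 0

for t in range(n_trials):
    if rng.random() < hazard:
        good_arm = 1 - good_arm
    choice = 0 if belief >= 0.5 else 1
    p_reward = p_high if choice == good_arm else p_low
    r = float(rng.random() < p_reward)

    # Bayesian update: outcome likelihood under "arm 0 good" vs "arm 1 good".
    p_r_if_0_good = p_high if choice == 0 else p_low
    p_r_if_1_good = p_low if choice == 0 else p_high
    lik0 = p_r_if_0_good if r else 1 - p_r_if_0_good
    lik1 = p_r_if_1_good if r else 1 - p_r_if_1_good
    post0 = lik0 * belief / (lik0 * belief + lik1 * (1 - belief))
    # Account for a possible reversal before the next trial.
    belief = post0 * (1 - hazard) + (1 - post0) * hazard

    q[choice] += alpha * (r - q[choice])   # slow incremental value update

print(f"final belief that arm 0 is good: {belief:.2f}  (true good arm: {good_arm})")
print(f"final Rescorla-Wagner values: {np.round(q, 2)}")
```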
Collapse
Affiliation(s)
- Ramon Bartolo
- Laboratory of Neuropsychology, National Institute of Mental Health/National Institutes of Health, Bethesda, MD 20892-4415, USA.
| | - Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health/National Institutes of Health, Bethesda, MD 20892-4415, USA
| |
Collapse
|
195
|
Huang Y, Yaple ZA, Yu R. Goal-oriented and habitual decisions: Neural signatures of model-based and model-free learning. Neuroimage 2020; 215:116834. [PMID: 32283275 DOI: 10.1016/j.neuroimage.2020.116834] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2019] [Revised: 03/03/2020] [Accepted: 04/08/2020] [Indexed: 11/26/2022] Open
Abstract
Human decision-making is mainly driven by two fundamental learning processes: a slow, deliberative, goal-directed model-based process that maps out the potential outcomes of all options and a rapid habitual model-free process that enables reflexive repetition of previously successful choices. Although many model-informed neuroimaging studies have examined the neural correlates of model-based and model-free learning, the concordant activity between these two processes remains unclear. We used quantitative meta-analyses of functional magnetic resonance imaging experiments to identify the concordant activity pertaining to model-based and model-free learning over a range of reward-related paradigms. We found that: 1) both processes yielded concordant ventral striatum activity, 2) model-based learning activated the medial prefrontal cortex and orbital frontal cortex, and 3) model-free learning specifically activated the left globus pallidus and right caudate head. Our findings suggest that model-free and model-based decision making engage overlapping yet distinct neural regions. These stereotaxic maps improve our understanding of how deliberative goal-directed and reflexive habitual learning are implemented in the brain.
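A minimal hybrid agent illustrating the model-based/model-free distinction the meta-analysis builds on: model-free values are learned by direct reinforcement, model-based values are computed from a learned transition model and learned state rewards, and choices are driven by their weighted mixture. The one-step task and all parameters are arbitrary illustrations, not any study's paradigm.

```python
import numpy as np

rng = np.random.default_rng(6)

n_actions, n_states = 2, 2
true_T = np.array([[0.7, 0.3],    # action 0 -> state 0 with p = .7
                   [0.3, 0.7]])   # action 1 -> state 1 with p = .7
state_reward = np.array([1.0, 0.0])

q_mf = np.zeros(n_actions)                 # model-free action values
T_hat = np.full((n_actions, n_states), 0.5)  # learned transition model
r_hat = np.zeros(n_states)                 # learned state rewards
alpha, w_mb, beta = 0.2, 0.6, 3.0

for t in range(500):
    q_mb = T_hat @ r_hat                       # model-based values
    q = w_mb * q_mb + (1 - w_mb) * q_mf        # hybrid valuation
    p_choice = np.exp(beta * q) / np.exp(beta * q).sum()
    a = rng.choice(n_actions, p=p_choice)

    s = rng.choice(n_states, p=true_T[a])
    r = state_reward[s] if rng.random() < 0.8 else 0.0

    q_mf[a] += alpha * (r - q_mf[a])           # model-free update
    target = np.zeros(n_states)
    target[s] = 1.0
    T_hat[a] += alpha * (target - T_hat[a])    # transition model update
    r_hat[s] += alpha * (r - r_hat[s])         # state-reward update

print("model-free values:", np.round(q_mf, 2))
print("model-based values:", np.round(T_hat @ r_hat, 2))
```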
Collapse
Affiliation(s)
- Yi Huang
- NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
| | - Zachary A Yaple
- Department of Psychology, National University of Singapore, Singapore
| | - Rongjun Yu
- NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore; Department of Psychology, National University of Singapore, Singapore.
| |
Collapse
|
196
|
Hong C, Wei X, Wang J, Deng B, Yu H, Che Y. Training Spiking Neural Networks for Cognitive Tasks: A Versatile Framework Compatible With Various Temporal Codes. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:1285-1296. [PMID: 31247574 DOI: 10.1109/tnnls.2019.2919662] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Recent studies have demonstrated the effectiveness of supervised learning in spiking neural networks (SNNs). A trainable SNN provides a valuable tool not only for engineering applications but also for theoretical neuroscience studies. Here, we propose a modified SpikeProp learning algorithm, which ensures better learning stability for SNNs and provides more diverse network structures and coding schemes. Specifically, we designed a spike gradient threshold rule to solve the well-known gradient exploding problem in SNN training. In addition, regulation rules on firing rates and connection weights are proposed to control the network activity during training. Based on these rules, biologically realistic features such as lateral connections, complex synaptic dynamics, and sparse activities are included in the network to facilitate neural computation. We demonstrate the versatility of this framework by implementing three well-known temporal codes for different types of cognitive tasks, namely, handwritten digit recognition, spatial coordinate transformation, and motor sequence generation. Several important features observed in experimental studies, such as selective activity, excitatory-inhibitory balance, and weak pairwise correlation, emerged in the trained model. This agreement between experimental and computational results further confirmed the importance of these features in neural function. This work provides a new framework, in which various neural behaviors can be modeled and the underlying computational mechanisms can be studied.
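A single-neuron SpikeProp-style sketch of why bounding gradients helps: the spike-time gradient contains a division by the membrane-potential slope at the threshold crossing, which explodes when the crossing is shallow, and clipping the gradient keeps the weight update stable. The kernel, constants, and simple clipping rule are assumptions for illustration and differ from the paper's full spike-gradient threshold rule and network setting.

```python
import numpy as np

tau, theta = 5.0, 1.0
t_grid = np.arange(0.0, 40.0, 0.01)

def eps(t):
    """Alpha-shaped postsynaptic potential kernel."""
    return np.where(t > 0, (t / tau) * np.exp(1 - t / tau), 0.0)

def spike_time(weights, input_times):
    """First threshold crossing of the membrane potential, plus the potential."""
    u = sum(w * eps(t_grid - ti) for w, ti in zip(weights, input_times))
    above = np.nonzero(u >= theta)[0]
    return (None, u) if len(above) == 0 else (t_grid[above[0]], u)

weights = np.array([0.9, 0.9, 0.9])
input_times = np.array([0.0, 2.0, 4.0])
t_target, eta, grad_clip = 6.0, 0.05, 5.0

for step in range(200):
    t_out, u = spike_time(weights, input_times)
    if t_out is None:
        weights += 0.05                         # no spike: nudge weights up
        continue
    idx = np.searchsorted(t_grid, t_out)
    du_dt = (u[idx] - u[idx - 1]) / 0.01        # slope at the threshold crossing
    # SpikeProp-style gradient: dE/dw_i = -(t_out - t_target) * eps_i(t_out) / u'(t_out).
    grad = -(t_out - t_target) * eps(t_out - input_times) / max(du_dt, 1e-6)
    grad = np.clip(grad, -grad_clip, grad_clip)  # gradient threshold (stand-in rule)
    weights -= eta * grad

t_out, _ = spike_time(weights, input_times)
print("output spike time after training:",
      "no spike" if t_out is None else f"{t_out:.2f} ms",
      f"(target {t_target} ms)")
```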
Collapse
|
197
|
Ergo K, De Loof E, Verguts T. Reward Prediction Error and Declarative Memory. Trends Cogn Sci 2020; 24:388-397. [PMID: 32298624 DOI: 10.1016/j.tics.2020.02.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Revised: 02/03/2020] [Accepted: 02/22/2020] [Indexed: 01/04/2023]
Abstract
Learning based on reward prediction error (RPE) was originally proposed in the context of nondeclarative memory. We postulate that RPE may support declarative memory as well. Indeed, recent years have witnessed a number of independent empirical studies reporting effects of RPE on declarative memory. We provide a brief overview of these studies, identify emerging patterns, and discuss open issues such as the role of signed versus unsigned RPEs in declarative learning.
Collapse
Affiliation(s)
- Kate Ergo
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium
| | - Esther De Loof
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium
| | - Tom Verguts
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium.
| |
Collapse
|
198
|
Masse NY, Rosen MC, Freedman DJ. Reevaluating the Role of Persistent Neural Activity in Short-Term Memory. Trends Cogn Sci 2020; 24:242-258. [PMID: 32007384 PMCID: PMC7288241 DOI: 10.1016/j.tics.2019.12.014] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 12/18/2022]
Abstract
A traditional view of short-term working memory (STM) is that task-relevant information is maintained 'online' in persistent spiking activity. However, recent experimental and modeling studies have begun to question this long-held belief. In this review, we discuss new evidence demonstrating that information can be 'silently' maintained via short-term synaptic plasticity (STSP) without the need for persistent activity. We discuss how the neural mechanisms underlying STM are inextricably linked with the cognitive demands of the task, such that the passive maintenance and the active manipulation of information are subserved differently in the brain. Together, these recent findings point towards a more nuanced view of STM in which multiple substrates work in concert to support our ability to temporarily maintain and manipulate information.
Collapse
Affiliation(s)
- Nicolas Y Masse
- Department of Neurobiology, The University of Chicago, Chicago, IL, USA.
| | - Matthew C Rosen
- Department of Neurobiology, The University of Chicago, Chicago, IL, USA
| | - David J Freedman
- Department of Neurobiology, The University of Chicago, Chicago, IL, USA; Grossman Institute for Neuroscience, Quantitative Biology and Human Behavior, The University of Chicago, Chicago, IL, USA.
| |
Collapse
|
199
|
Bulley A, Schacter DL. Deliberating trade-offs with the future. Nat Hum Behav 2020; 4:238-247. [PMID: 32184495 PMCID: PMC7147875 DOI: 10.1038/s41562-020-0834-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Accepted: 02/05/2020] [Indexed: 12/12/2022]
Abstract
Many fundamental choices in life are intertemporal: they involve trade-offs between sooner and later outcomes. In recent years there has been a surge of interest into how people make intertemporal decisions, given that such decisions are ubiquitous in everyday life and central in domains from substance use to climate change action. While it is clear that people make decisions according to rules, intuitions and habits, they also commonly deliberate over their options, thinking through potential outcomes and reflecting on their own preferences. In this Perspective, we bring to bear recent research into the higher-order capacities that underpin deliberation-particularly those that enable people to think about the future (prospection) and their own thinking (metacognition)-to shed light on intertemporal decision-making. We show how a greater appreciation for these mechanisms of deliberation promises to advance our understanding of intertemporal decision-making and unify a wide range of otherwise disparate choice phenomena.
Collapse
Affiliation(s)
- Adam Bulley
- Department of Psychology, Harvard University, Cambridge, MA, USA.
- The University of Sydney, School of Psychology and Brain and Mind Centre, Sydney, NSW, Australia.
| | | |
Collapse
|
200
|
A distributional code for value in dopamine-based reinforcement learning. Nature 2020; 577:671-675. [PMID: 31942076 DOI: 10.1038/s41586-019-1924-6] [Citation(s) in RCA: 174] [Impact Index Per Article: 43.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2019] [Accepted: 11/19/2019] [Indexed: 12/12/2022]
Abstract
Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
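A minimal sketch of the distributional TD idea this abstract describes: a population of value predictors, each updated with a different ratio of learning rates for positive versus negative prediction errors, converges to an expectile-like code of the reward distribution rather than to a single mean. The one-state setting, reward distribution, and parameters are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

n_channels = 9
taus = np.linspace(0.1, 0.9, n_channels)    # asymmetry of each "dopamine channel"
values = np.zeros(n_channels)
base_lr = 0.02

def sample_reward():
    # Bimodal reward distribution: a small reward usually, a large reward sometimes.
    return 1.0 if rng.random() < 0.7 else 5.0

for trial in range(20000):
    r = sample_reward()
    delta = r - values                            # per-channel prediction errors
    # Positive errors are scaled by tau, negative errors by (1 - tau), so each
    # channel converges to a different expectile of the reward distribution.
    lr = np.where(delta > 0, taus, 1.0 - taus) * base_lr
    values += lr * delta

print("expected value (mean):", round(0.7 * 1.0 + 0.3 * 5.0, 2))
print("learned per-channel values (expectile-like code):", np.round(values, 2))
```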
Collapse
|