1
Scholz V, Waltmann M, Herzog N, Reiter A, Horstmann A, Deserno L. Cortical Grey Matter Mediates Increases in Model-Based Control and Learning from Positive Feedback from Adolescence to Adulthood. J Neurosci 2023; 43:2178-2189. [PMID: 36823039 PMCID: PMC10039741 DOI: 10.1523/jneurosci.1418-22.2023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/21/2022] [Revised: 12/20/2022] [Accepted: 01/13/2023] [Indexed: 02/25/2023] Open
Abstract
Cognition and brain structure undergo significant maturation from adolescence into adulthood. Model-based (MB) control is known to increase across development, which is mediated by cognitive abilities. Here, we asked two questions unaddressed in previous developmental studies. First, what are the brain structural correlates of age-related increases in MB control? Second, how are age-related increases in MB control from adolescence to adulthood influenced by motivational context? A human developmental sample (n = 103; age, 12-50, male/female, 55:48) completed structural MRI and an established task to capture MB control. The task was modified with respect to outcome valence by including (1) reward and punishment blocks to manipulate the motivational context and (2) an additional choice test to assess learning from positive versus negative feedback. After replicating that an age-dependent increase in MB control is mediated by cognitive abilities, we demonstrate first-time evidence that gray matter density (GMD) in the parietal cortex mediates the increase of MB control with age. Although motivational context did not relate to age-related changes in MB control, learning from positive feedback improved with age. Meanwhile, negative feedback learning showed no age effects. We present a first report that an age-related increase in positive feedback learning was mediated by reduced GMD in the parietal, medial, and dorsolateral prefrontal cortex. Our findings indicate that brain maturation, putatively reflected in lower GMD, in distinct and partially overlapping brain regions could lead to a more efficient brain organization and might thus be a key developmental step toward age-related increases in planning and value-based choice.
SIGNIFICANCE STATEMENT Changes in model-based decision-making are paralleled by extensive maturation in cognition and brain structure across development. Still, to date the neuroanatomical underpinnings of these changes remain unclear. Here, we demonstrate for the first time that parietal GMD mediates age-dependent increases in model-based control. Age-related increases in positive feedback learning were mediated by reduced GMD in the parietal, medial, and dorsolateral prefrontal cortex. A manipulation of motivational context did not have an impact on age-related changes in model-based control. These findings highlight that brain maturation in distinct and overlapping cortical regions constitutes a key developmental step toward improved value-based choices.
Affiliation(s)
- Vanessa Scholz
  - Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University of Würzburg, 97080 Würzburg, Germany
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 GD Nijmegen, The Netherlands
- Maria Waltmann
  - Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University of Würzburg, 97080 Würzburg, Germany
  - Max Planck Institute for Cognition and Neuroscience, D-04103 Leipzig, Germany
- Nadine Herzog
  - Max Planck Institute for Cognition and Neuroscience, D-04103 Leipzig, Germany
  - Integrated Research and Treatment Center AdiposityDiseases, Leipzig University Medical Center, 04103 Leipzig, Germany
- Andrea Reiter
  - Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University of Würzburg, 97080 Würzburg, Germany
  - Collaborative Research Center-940 Volition and Cognitive Control, Faculty of Psychology, Technical University Dresden, 01069 Dresden, Germany
- Annette Horstmann
  - Max Planck Institute for Cognition and Neuroscience, D-04103 Leipzig, Germany
  - Integrated Research and Treatment Center AdiposityDiseases, Leipzig University Medical Center, 04103 Leipzig, Germany
  - Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Lorenz Deserno
  - Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University of Würzburg, 97080 Würzburg, Germany
  - Max Planck Institute for Cognition and Neuroscience, D-04103 Leipzig, Germany
  - Integrated Research and Treatment Center AdiposityDiseases, Leipzig University Medical Center, 04103 Leipzig, Germany
  - Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus, Technical University Dresden, 01069 Dresden, Germany
2
Model-based learning retrospectively updates model-free values. Sci Rep 2022; 12:2358. [PMID: 35149713 PMCID: PMC8837618 DOI: 10.1038/s41598-022-05567-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Accepted: 12/16/2021] [Indexed: 12/02/2022] Open
Abstract
Reinforcement learning (RL) is widely regarded as divisible into two distinct computational strategies. Model-free learning is a simple RL process in which a value is associated with actions, whereas model-based learning relies on the formation of internal models of the environment to maximise reward. Recently, theoretical and animal work has suggested that such models might be used to train model-free behaviour, reducing the burden of costly forward planning. Here we devised a way to probe this possibility in human behaviour. We adapted a two-stage decision task and found evidence that model-based processes at the time of learning can alter model-free valuation in healthy individuals. We asked people to rate subjective value of an irrelevant feature that was seen at the time a model-based decision would have been made. These irrelevant feature value ratings were updated by rewards, but in a way that accounted for whether the selected action retrospectively ought to have been taken. This model-based influence on model-free value ratings was best accounted for by a reward prediction error that was calculated relative to the decision path that would most likely have led to the reward. This effect occurred independently of attention and was not present when participants were not explicitly told about the structure of the environment. These findings suggest that current conceptions of model-based and model-free learning require updating in favour of a more integrated approach. Our task provides an empirical handle for further study of the dialogue between these two learning systems in the future.
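The distinction drawn in this abstract between model-free and model-based learning can be illustrated with a minimal sketch. It is not the model fitted in the paper: the function names, the learning rate `alpha`, and the toy two-stage transition probabilities are all assumptions chosen for illustration. The model-free learner caches a value per action and nudges it by a reward prediction error, whereas the model-based learner computes action values from an internal transition model.

```python
def model_free_update(q, action, reward, alpha=0.1):
    """Cached-value update: move q[action] toward the reward by a prediction error.

    alpha is a hypothetical learning rate, not a value taken from the paper.
    """
    delta = reward - q[action]          # reward prediction error
    updated = list(q)                   # copy so the caller's list is untouched
    updated[action] += alpha * delta
    return updated, delta


def model_based_values(transition, second_stage_values):
    """Expected value of each first-stage action under a known transition model.

    transition[a][s] is the (assumed) probability that action a leads to state s.
    """
    return [
        sum(p * v for p, v in zip(row, second_stage_values))
        for row in transition
    ]


# Toy two-stage task (illustrative numbers only).
transition = [[0.7, 0.3], [0.3, 0.7]]
second_stage_values = [1.0, 0.0]

q_mb = model_based_values(transition, second_stage_values)
q_mf, delta = model_free_update([0.0, 0.0], action=0, reward=1.0)
print(q_mb, q_mf, delta)
```

The model-based values change as soon as the transition model or second-stage values change, while the model-free values only change through experienced prediction errors; the abstract's claim concerns how the former can shape the latter.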
3
Optimism and pessimism in optimised replay. PLoS Comput Biol 2022; 18:e1009634. [PMID: 35020718 PMCID: PMC8809607 DOI: 10.1371/journal.pcbi.1009634] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/06/2021] [Revised: 02/02/2022] [Accepted: 11/12/2021] [Indexed: 11/24/2022] Open
Abstract
The replay of task-relevant trajectories is known to contribute to memory consolidation and improved task performance. A wide variety of experimental data show that the content of replayed sequences is highly specific and can be modulated by reward as well as other prominent task variables. However, the rules governing the choice of sequences to be replayed still remain poorly understood. One recent theoretical suggestion is that the prioritization of replay experiences in decision-making problems is based on their effect on the choice of action. We show that this implies that subjects should replay sub-optimal actions that they dysfunctionally choose rather than optimal ones, when, by being forgetful, they experience large amounts of uncertainty in their internal models of the world. We use this to account for recent experimental data demonstrating exactly pessimal replay, fitting model parameters to the individual subjects’ choices.
When animals are asleep or restfully awake, populations of neurons in their brains recapitulate activity associated with extended behaviourally-relevant experiences. This process is called replay, and it has been established for a long time in rodents, and very recently in humans, to be important for good performance in decision-making tasks. The specific experiences which are replayed during those epochs follow highly ordered patterns, but the mechanisms which establish their priority are still not fully understood. One promising theoretical suggestion is that each replay experience is chosen in such a way that the learning that ensues is most helpful for the subsequent performance of the animal. A very recent study reported a surprising result that humans who achieved high performance in a planning task tended to replay actions they found to be sub-optimal, and that this was associated with a useful deprecation of those actions in subsequent performance. In this study, we examine the nature of this pessimized form of replay and show that it is exactly appropriate for forgetful agents. We analyse the role of forgetting for replay choices of our model, and verify our predictions using human subject data.
4
Collins AGE, Shenhav A. Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology 2022; 47:104-118. [PMID: 34453117 PMCID: PMC8617262 DOI: 10.1038/s41386-021-01126-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 04/03/2021] [Revised: 07/14/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023]
Abstract
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understanding of prefrontal cortex function. We discuss how such models have advanced from their origins in basic algorithms of updating and action selection to increasingly account for complexities in the cognitive processes required for learning and decision-making, and the representations over which they operate. We further discuss how a deeper understanding of the real-world complexities in these computations has shed light on the fundamental constraints on optimal behavior, and on the complex interactions between corticostriatal pathways to determine such behavior. The continuing and rapid development of these models holds great promise for understanding the mechanisms by which animals adapt to their environments, and what leads to maladaptive forms of learning and decision-making within clinical populations.
Affiliation(s)
- Anne G E Collins
  - Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Amitai Shenhav
  - Department of Cognitive, Linguistic, & Psychological Sciences and Carney Institute for Brain Science, Brown University, Providence, RI, USA
5
Deserno L, Moran R, Michely J, Lee Y, Dayan P, Dolan RJ. Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. eLife 2021; 10:e67778. [PMID: 34882092 PMCID: PMC8758138 DOI: 10.7554/elife.67778] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/22/2021] [Accepted: 12/08/2021] [Indexed: 11/13/2022] Open
Abstract
Dopamine is implicated in representing model-free (MF) reward prediction errors as well as influencing model-based (MB) credit assignment and choice. Putative cooperative interactions between MB and MF systems include a guidance of MF credit assignment by MB inference. Here, we used a double-blind, placebo-controlled, within-subjects design to test a hypothesis that enhancing dopamine levels boosts the guidance of MF credit assignment by MB inference. In line with this, we found that levodopa enhanced guidance of MF credit assignment by MB inference, without impacting MF and MB influences directly. This drug effect correlated negatively with a dopamine-dependent change in purely MB credit assignment, possibly reflecting a trade-off between these two MB components of behavioural control. Our findings of a dopamine boost in MB inference guidance of MF learning highlight a novel DA influence on MB-MF cooperative interactions.
Affiliation(s)
- Lorenz Deserno
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  - The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
  - Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University of Würzburg, Würzburg, Germany
  - Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
- Rani Moran
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  - The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Jochen Michely
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  - The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
  - Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin, Berlin, Germany
- Ying Lee
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  - The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
  - Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
- Peter Dayan
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - University of Tübingen, Tübingen, Germany
- Raymond J Dolan
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  - The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
6
Shahar N, Hauser TU, Moran R, Moutoussis M, Bullmore ET, Dolan RJ. Assigning the right credit to the wrong action: compulsivity in the general population is associated with augmented outcome-irrelevant value-based learning. Transl Psychiatry 2021; 11:564. [PMID: 34741013 PMCID: PMC8571313 DOI: 10.1038/s41398-021-01642-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2021] [Revised: 09/01/2021] [Accepted: 09/21/2021] [Indexed: 11/08/2022] Open
Abstract
Compulsive behavior is enacted under a belief that a specific act controls the likelihood of an undesired future event. Compulsive behaviors are widespread in the general population despite having no causal relationship with events they aspire to influence. In the current study, we tested whether there is an increased tendency to assign value to aspects of a task that do not predict an outcome (i.e., outcome-irrelevant learning) among individuals with compulsive tendencies. We studied 514 healthy individuals who completed self-report compulsivity, anxiety, depression, and schizotypal measurements, and a well-established reinforcement-learning task (i.e., the two-step task). As expected, we found a positive relationship between compulsivity and outcome-irrelevant learning. Specifically, individuals who reported having stronger compulsive tendencies (e.g., washing, checking, grooming) also tended to assign value to response keys and stimuli locations that did not predict an outcome. Controlling for overall goal-directed abilities and the co-occurrence of anxious, depressive, or schizotypal tendencies did not impact these associations. These findings indicate that outcome-irrelevant learning processes may contribute to the expression of compulsivity in a general population setting. We highlight the need for future research on the formation of non-veridical action-outcome associations as a factor related to the occurrence and maintenance of compulsive behavior.
Affiliation(s)
- Nitzan Shahar
  - Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
  - Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
  - Psychology Department, Tel Aviv University, Tel Aviv, Israel
- Tobias U Hauser
  - Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
- Rani Moran
  - Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
- Michael Moutoussis
  - Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
- Raymond J Dolan
  - Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
7
Na S, Chung D, Hula A, Perl O, Jung J, Heflin M, Blackmore S, Fiore VG, Dayan P, Gu X. Humans use forward thinking to exploit social controllability. eLife 2021; 10:64983. [PMID: 34711304 PMCID: PMC8555988 DOI: 10.7554/elife.64983] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/18/2020] [Accepted: 09/30/2021] [Indexed: 12/27/2022] Open
Abstract
The controllability of our social environment has a profound impact on our behavior and mental health. Nevertheless, neurocomputational mechanisms underlying social controllability remain elusive. Here, 48 participants performed a task where their current choices either did (Controllable), or did not (Uncontrollable), influence partners’ future proposals. Computational modeling revealed that people engaged a mental model of forward thinking (FT; i.e., calculating the downstream effects of current actions) to estimate social controllability in both Controllable and Uncontrollable conditions. A large-scale online replication study (n=1342) supported this finding. Using functional magnetic resonance imaging (n=48), we further demonstrated that the ventromedial prefrontal cortex (vmPFC) computed the projected total values of current actions during forward planning, supporting the neural realization of the forward-thinking model. These findings demonstrate that humans use vmPFC-dependent FT to estimate and exploit social controllability, expanding the role of this neurocomputational mechanism beyond spatial and cognitive contexts.
Affiliation(s)
- Soojung Na
  - The Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, United States
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, United States
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
- Dongil Chung
  - Department of Biomedical Engineering, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
- Andreas Hula
  - Austrian Institute of Technology, Seibersdorf, Austria
- Ofer Perl
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
- Jennifer Jung
  - School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, United States
- Matthew Heflin
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
- Sylvia Blackmore
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
  - Queen Square Institute of Neurology, University College London, London, United Kingdom
- Vincenzo G Fiore
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
- Peter Dayan
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - University of Tübingen, Tübingen, Germany
- Xiaosi Gu
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, United States
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
8
Yu LQ, Wilson RC, Nassar MR. Adaptive learning is structure learning in time. Neurosci Biobehav Rev 2021; 128:270-281. [PMID: 34144114 PMCID: PMC8422504 DOI: 10.1016/j.neubiorev.2021.06.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/01/2020] [Revised: 04/19/2021] [Accepted: 06/11/2021] [Indexed: 10/21/2022]
Abstract
People use information flexibly. They often combine multiple sources of relevant information over time in order to inform decisions with little or no interference from intervening irrelevant sources. They adjust the degree to which they use new information over time rationally in accordance with environmental statistics and their own uncertainty. They can even use information gained in one situation to solve a problem in a very different one. Learning flexibly rests on the ability to infer the context at a given time, and therefore knowing which pieces of information to combine and which to separate. We review the psychological and neural mechanisms behind adaptive learning and structure learning to outline how people pool together relevant information, demarcate contexts, prevent interference between information collected in different contexts, and transfer information from one context to another. By examining all of these processes through the lens of optimal inference we bridge concepts from multiple fields to provide a unified multi-system view of how the brain exploits structure in time to optimize learning.
Affiliation(s)
- Linda Q Yu
  - Carney Institute for Brain Sciences, Brown University, 164 Angell Street, Providence, RI, 02912, USA
- Robert C Wilson
  - Department of Psychology, University of Arizona, Tucson, AZ, 85721, USA
- Matthew R Nassar
  - Carney Institute for Brain Sciences, Brown University, 164 Angell Street, Providence, RI, 02912, USA
9
Moran R, Dayan P, Dolan RJ. Efficiency and prioritization of inference-based credit assignment. Curr Biol 2021; 31:2747-2756.e6. [PMID: 33887181 PMCID: PMC8279739 DOI: 10.1016/j.cub.2021.03.091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Revised: 02/11/2021] [Accepted: 03/29/2021] [Indexed: 11/16/2022]
Abstract
Organisms adapt to their environments by learning to approach states that predict rewards and avoid states associated with punishments. Knowledge about the affective value of states often relies on credit assignment (CA), whereby state values are updated on the basis of reward feedback. Remarkably, humans assign credit to states that are not observed but are instead inferred based on a cognitive map that represents structural knowledge of an environment. A pertinent example is authors attempting to infer the identity of anonymous reviewers to assign them credit or blame and, on this basis, inform future referee recommendations. Although inference is cognitively costly, it is unknown how it influences CA or how it is apportioned between hidden and observable states (for example, both anonymous and revealed reviewers). We addressed these questions in a task that provided choices between lotteries where each led to a unique pair of occasionally rewarding outcome states. On some trials, both states were observable (rendering inference nugatory), whereas on others, the identity of one of the states was concealed. Importantly, by exploiting knowledge of choice-state associations, subjects could infer the identity of this hidden state. We show that having to perform inference reduces state-value updates. Strikingly, and in violation of normative theories, this reduction in CA was selective for the observed outcome alone. These findings have implications for the operation of putative cognitive maps.
Affiliation(s)
- Rani Moran
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London WC1B 5EH, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
- Peter Dayan
  - Max Planck Institute for Biological Cybernetics, Max Planck-Ring 8, 72076 Tübingen, Germany
  - University of Tübingen, 72074 Tübingen, Germany
- Raymond J Dolan
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London WC1B 5EH, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
10
Xia L, Collins AGE. Temporal and state abstractions for efficient learning, transfer, and composition in humans. Psychol Rev 2021; 128:643-666. [PMID: 34014709 PMCID: PMC8485577 DOI: 10.1037/rev0000295] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/08/2022]
Abstract
Humans use prior knowledge to efficiently solve novel tasks, but how they structure past knowledge during learning to enable such fast generalization is not well understood. We recently proposed that hierarchical state abstraction enabled generalization of simple one-step rules, by inferring context clusters for each rule. However, humans' daily tasks are often temporally extended, and necessitate more complex multi-step, hierarchically structured strategies. The options framework in hierarchical reinforcement learning provides a theoretical framework for representing such transferable strategies. Options are abstract multi-step policies, assembled from simpler one-step actions or other options, that can represent meaningful reusable strategies as temporal abstractions. We developed a novel sequential decision-making protocol to test if humans learn and transfer multi-step options. In a series of four experiments, we found transfer effects at multiple hierarchical levels of abstraction that could not be explained by flat reinforcement learning models or hierarchical models lacking temporal abstractions. We extended the options framework to develop a quantitative model that blends temporal and state abstractions. Our model captures the transfer effects observed in human participants. Our results provide evidence that humans create and compose hierarchical options, and use them to explore in novel contexts, consequently transferring past knowledge and speeding up learning. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
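The compositional idea in this abstract, that options are multi-step policies assembled from one-step actions or from other options, can be sketched as a small recursive data structure. This is only an illustration under strong simplifying assumptions: each option is treated as a fixed open-loop sequence, whereas the options framework proper also defines initiation sets, state-dependent policies, and termination conditions. The class, the `flatten` helper, and the key names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Option:
    """A named multi-step strategy built from primitive actions or other options.

    Simplifying assumption: an option is a fixed sequence of steps.
    """
    name: str
    steps: Tuple[Union[str, "Option"], ...]


def flatten(step: Union[str, Option]) -> List[str]:
    """Expand an option (recursively) into its sequence of primitive actions."""
    if isinstance(step, str):           # a primitive one-step action
        return [step]
    actions: List[str] = []
    for sub in step.steps:              # an option: expand each sub-step in order
        actions.extend(flatten(sub))
    return actions


# A low-level option reused twice inside a higher-level option (hypothetical keys).
press_pair = Option("press_pair", ("key_A", "key_B"))
full_sequence = Option("full_sequence", (press_pair, "key_C", press_pair))

print(flatten(full_sequence))
```

Reusing `press_pair` inside `full_sequence` mirrors the kind of transfer the abstract describes: once a sub-sequence is stored as an option, a new task that contains it needs only the higher-level arrangement to be learned.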
Affiliation(s)
- Liyu Xia
  - Department of Mathematics, University of California, Berkeley
- Anne G E Collins
  - Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley
11
Wood AN. New roles for dopamine in motor skill acquisition: lessons from primates, rodents, and songbirds. J Neurophysiol 2021; 125:2361-2374. [PMID: 33978497 DOI: 10.1152/jn.00648.2020] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/22/2022] Open
Abstract
Motor learning is a core aspect of human life and appears to be ubiquitous throughout the animal kingdom. Dopamine, a neuromodulator with a multifaceted role in synaptic plasticity, may be a key signaling molecule for motor skill learning. Though typically studied in the context of reward-based associative learning, dopamine appears to be necessary for some types of motor learning. Mesencephalic dopamine structures are highly conserved among vertebrates, as are some of their primary targets within the basal ganglia, a subcortical circuit important for motor learning and motor control. With a focus on the benefits of cross-species comparisons, this review examines how "model-free" and "model-based" computational frameworks for understanding dopamine's role in associative learning may be applied to motor learning. The hypotheses that dopamine could drive motor learning either by functioning as a reward prediction error, through passive facilitating of normal basal ganglia activity, or through other mechanisms are examined in light of new studies using humans, rodents, and songbirds. Additionally, new paradigms that could enhance our understanding of dopamine's role in motor learning by bridging the gap between the theoretical literature on motor learning in humans and other species are discussed.
Affiliation(s)
- A N Wood
  - Department of Biology and Graduate Program in Neuroscience, Emory University, Atlanta, Georgia
12
Abstract
Credit assignment (CA) to relevant actions poses a challenge because one is often flooded with reward feedback that is not easily causally attributed. We addressed this issue in a reinforcement learning framework wherein choice is mutually controlled by value-caching model-free (MF) and prospective, planning model-based (MB) systems. We find knowledge, stored in a cognitive map, filters exuberant reward feedback to guide CA in both systems but based on different attribute dimensions. In MF, CA is boosted for outcomes that are relevant (causally related) to one’s choice, whereas in MB, CA is enhanced for outcomes that attract greater attention during the deliberation process that preceded a choice. We consider normative and mechanistic accounts, including how these processes are instrumental to adaptation.
An influential reinforcement learning framework proposes that behavior is jointly governed by model-free (MF) and model-based (MB) controllers. The former learns the values of actions directly from past encounters, and the latter exploits a cognitive map of the task to calculate these prospectively. Considerable attention has been paid to how these systems interact during choice, but how and whether knowledge of a cognitive map contributes to the way MF and MB controllers assign credit (i.e., to how they revaluate actions and states following the receipt of an outcome) remains underexplored. Here, we examine such sophisticated credit assignment using a dual-outcome bandit task. We provide evidence that knowledge of a cognitive map influences credit assignment in both MF and MB systems, mediating subtly different aspects of apparent relevance. Specifically, we show MF credit assignment is enhanced for those rewards that are related to a choice, and this contrasted with choice-unrelated rewards that reinforced subsequent choices negatively. This modulation is only possible based on knowledge of task structure. On the other hand, MB credit assignment was boosted for outcomes that impacted on differences in values between offered bandits. We consider mechanistic accounts and the normative status of these findings. We suggest the findings extend the scope and sophistication of cognitive map-based credit assignment during reinforcement learning, with implications for understanding behavioral control.
|
13
|
Moran R, Keramati M, Dolan RJ. Model based planners reflect on their model-free propensities. PLoS Comput Biol 2021; 17:e1008552. [PMID: 33411724 PMCID: PMC7817042 DOI: 10.1371/journal.pcbi.1008552] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2020] [Revised: 01/20/2021] [Accepted: 11/23/2020] [Indexed: 12/19/2022] Open
Abstract
Dual-reinforcement learning theory proposes behaviour is under the tutelage of a retrospective, value-caching, model-free (MF) system and a prospective-planning, model-based (MB) system. This architecture raises a question as to the degree to which, when devising a plan, a MB controller takes account of influences from its MF counterpart. We present evidence that such a sophisticated self-reflective MB planner incorporates an anticipation of the influences its own MF-proclivities exert on the execution of its planned future actions. Using a novel bandit task, wherein subjects were periodically allowed to design their environment, we show that reward-assignments were constructed in a manner consistent with a MB system taking account of its MF propensities. Thus, in the task participants assigned higher rewards to bandits that were momentarily associated with stronger MF tendencies. Our findings have implications for a range of decision making domains that include drug abuse, pre-commitment, and the tension between short and long-term decision horizons in economics.
Affiliation(s)
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Mehdi Keramati
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Department of Psychology, City, University of London, London, United Kingdom
- Raymond J. Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
|
14
|
Steinke A, Lange F, Kopp B. Parallel model-based and model-free reinforcement learning for card sorting performance. Sci Rep 2020; 10:15464. [PMID: 32963297 PMCID: PMC7508815 DOI: 10.1038/s41598-020-72407-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/27/2020] [Accepted: 08/31/2020] [Indexed: 12/13/2022] Open
Abstract
The Wisconsin Card Sorting Test (WCST) is considered a gold standard for the assessment of cognitive flexibility. On the WCST, repeating a sorting category following negative feedback is typically treated as indicating reduced cognitive flexibility. Therefore, such responses are referred to as 'perseveration' errors. Recent research suggests that the propensity for perseveration errors is modulated by response demands: They occur less frequently when their commitment repeats the previously executed response. Here, we propose parallel reinforcement-learning models of card sorting performance, which assume that card sorting performance can be conceptualized as resulting from model-free reinforcement learning at the level of responses that occurs in parallel with model-based reinforcement learning at the categorical level. We compared parallel reinforcement-learning models with purely model-based reinforcement learning, and with the state-of-the-art attentional-updating model. We analyzed data from 375 participants who completed a computerized WCST. Parallel reinforcement-learning models showed the best predictive accuracies for the majority of participants. Only parallel reinforcement-learning models accounted for the modulation of perseveration propensity by response demands. In conclusion, parallel reinforcement-learning models provide a new theoretical perspective on card sorting and offer a suitable framework for discerning individual differences in latent processes that subserve behavioral flexibility.
Affiliation(s)
- Alexander Steinke
- Department of Neurology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625, Hannover, Germany
- Florian Lange
- Behavioral Engineering Research Group, KU Leuven, Naamsestraat 69, 3000, Leuven, Belgium
- Bruno Kopp
- Department of Neurology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625, Hannover, Germany
|
15
|
Collins AGE, Cockburn J. Beyond dichotomies in reinforcement learning. Nat Rev Neurosci 2020; 21:576-586. [PMID: 32873936 DOI: 10.1038/s41583-020-0355-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Accepted: 07/20/2020] [Indexed: 11/09/2022]
Abstract
Reinforcement learning (RL) is a framework of particular importance to psychology, neuroscience and machine learning. Interactions between these fields, as promoted through the common hub of RL, have facilitated paradigm shifts that relate multiple levels of analysis in a singular framework (for example, relating dopamine function to a computationally defined RL signal). Recently, more sophisticated RL algorithms have been proposed to better account for human learning, and in particular its oft-documented reliance on two separable systems: a model-based (MB) system and a model-free (MF) system. However, along with many benefits, this dichotomous lens can distort questions, and may contribute to an unnecessarily narrow perspective on learning and decision-making. Here, we outline some of the consequences that come from overconfidently mapping algorithms, such as MB versus MF RL, onto putative cognitive processes. We argue that the field is well positioned to move beyond simplistic dichotomies, and we propose a means of refocusing research questions towards the rich and complex components that comprise learning and decision-making.
Affiliation(s)
- Anne G E Collins
- Department of Psychology and the Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Jeffrey Cockburn
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
|
16
|
FitzGerald THB, Penny WD, Bonnici HM, Adams RA. Retrospective Inference as a Form of Bounded Rationality, and Its Beneficial Influence on Learning. Front Artif Intell 2020; 3:2. [PMID: 33733122 PMCID: PMC7861256 DOI: 10.3389/frai.2020.00002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/12/2019] [Accepted: 01/14/2020] [Indexed: 12/22/2022] Open
Abstract
Probabilistic models of cognition typically assume that agents make inferences about current states by combining new sensory information with fixed beliefs about the past, an approach known as Bayesian filtering. This is computationally parsimonious, but, in general, leads to suboptimal beliefs about past states, since it ignores the fact that new observations typically contain information about the past as well as the present. This is disadvantageous both because knowledge of past states may be intrinsically valuable, and because it impairs learning about fixed or slowly changing parameters of the environment. For these reasons, in offline data analysis it is usual to infer on every set of states using the entire time series of observations, an approach known as (fixed-interval) Bayesian smoothing. Unfortunately, however, this is impractical for real agents, since it requires the maintenance and updating of beliefs about an ever-growing set of states. We propose an intermediate approach, finite retrospective inference (FRI), in which agents update beliefs about a limited number of past states (formally, this represents online fixed-lag smoothing with a sliding window). This can be seen as a form of bounded rationality in which agents seek to optimize the accuracy of their beliefs subject to computational and other resource costs. We show through simulation that this approach has the capacity to significantly increase the accuracy of both inference and learning, using a simple variational scheme applied to both randomly generated Hidden Markov models (HMMs), and a specific application of the HMM, in the form of the widely used probabilistic reversal task. Our proposal thus constitutes a theoretical contribution to normative accounts of bounded rationality, which makes testable empirical predictions that can be explored in future work.
Affiliation(s)
- Thomas H B FitzGerald
- School of Psychology, University of East Anglia, Norwich, United Kingdom; The Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom; Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Will D Penny
- School of Psychology, University of East Anglia, Norwich, United Kingdom; The Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Heidi M Bonnici
- School of Psychology, University of East Anglia, Norwich, United Kingdom
- Rick A Adams
- The Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom; Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom; Department of Computer Science, University College London, London, United Kingdom
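
The contrast drawn in the FitzGerald et al. abstract between Bayesian filtering and online fixed-lag smoothing can be illustrated with a small sketch. This is not the authors' variational scheme: it uses exact forward-backward recursions over a sliding window on a toy two-state HMM, and the function names (`forward_filter`, `fixed_lag_smooth`), the parameter values, and the observation sequence are all hypothetical choices made for illustration.

```python
def normalize(v):
    """Scale a list of non-negative weights so that it sums to 1."""
    total = sum(v)
    return [x / total for x in v]


def forward_filter(obs, A, B, pi):
    """Bayesian filtering: P(z_t | o_0..o_t) for every t.

    A[i][j] = P(z_t = j | z_{t-1} = i), B[j][o] = P(o | z = j), pi = prior.
    """
    K = len(pi)
    alphas, prev = [], None
    for o in obs:
        if prev is None:
            pred = pi
        else:
            pred = [sum(prev[i] * A[i][j] for i in range(K)) for j in range(K)]
        prev = normalize([pred[j] * B[j][o] for j in range(K)])
        alphas.append(prev)
    return alphas


def fixed_lag_smooth(obs, A, B, pi, lag):
    """Fixed-lag smoothing: for each t, P(z_s | o_0..o_t) with s = max(0, t - lag).

    Only the last `lag` observations are revisited, so the cost per step is bounded.
    """
    K = len(pi)
    alphas = forward_filter(obs, A, B, pi)
    out = []
    for t in range(len(obs)):
        s0 = max(0, t - lag)
        beta = [1.0] * K
        # Backward recursion folding in observations o_{s0+1} .. o_t.
        for s in range(t, s0, -1):
            beta = normalize([
                sum(A[i][j] * B[j][obs[s]] * beta[j] for j in range(K))
                for i in range(K)
            ])
        out.append(normalize([alphas[s0][i] * beta[i] for i in range(K)]))
    return out


# Toy "sticky" HMM (assumed values): states persist, observations are noisy.
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.8, 0.2], [0.2, 0.8]]
pi = [0.5, 0.5]
obs = [0, 0, 1, 0, 0]  # a single surprising observation at t = 2

filtered = forward_filter(obs, A, B, pi)
smoothed = fixed_lag_smooth(obs, A, B, pi, lag=2)
```

In this toy run, the filtered belief at t = 2 is pulled toward state 1 by the surprising observation, whereas the lag-2 belief about that same time step, formed at t = 4, shifts back toward state 0 once the two later observations are taken into account. Setting `lag=0` recovers plain filtering.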
|
17
|
Kopp B, Steinke A, Bertram M, Skripuletz T, Lange F. Multiple Levels of Control Processes for Wisconsin Card Sorts: An Observational Study. Brain Sci 2019; 9:141. [PMID: 31213007 PMCID: PMC6627185 DOI: 10.3390/brainsci9060141] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/21/2019] [Revised: 06/12/2019] [Accepted: 06/15/2019] [Indexed: 11/16/2022] Open
Abstract
We explored short-term behavioral plasticity on the Modified Wisconsin Card Sorting Test (M-WCST) by deriving novel error metrics through stratification of traditional set loss and perseverative errors. Separating the rule set and the response set allowed for the measurement of performance across four trial types, crossing rule set (i.e., maintain vs. switch) and response demand (i.e., repeat vs. alternate). Critically, these four trial types can be grouped based on trial-wise feedback on t − 1 trials. Rewarded (correct) maintain t − 1 trials should lead to error enhancement when the response demands shift from repeat to alternate. In contrast, punished (incorrect) t − 1 trials should lead to error suppression when the response demands shift from repeat to alternate. The results supported the error suppression prediction: An error suppression effect (ESE) was observed across numerous patient samples. Exploratory analyses show that the ESE did not share substantial portions of variance with traditional neuropsychological measures of executive functioning. They further suggest that striatal or limbic circuit neuropathology may be associated with enhanced ESE. These data suggest that punishment of the recently executed response induces behavioral avoidance, which is detectable as the ESE on the WCST. The assessment of the ESE might provide an index of response-related avoidance learning on the WCST.
Affiliation(s)
- Bruno Kopp
- Department of Neurology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany.
- Alexander Steinke
- Department of Neurology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany.
- Malte Bertram
- Department of Neurology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany.
- Thomas Skripuletz
- Department of Neurology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany.
- Florian Lange
- Department of Neurology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany.
- Behavioral Engineering Research Group, KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium.
|
18
|
Radulescu A, Niv Y. State representation in mental illness. Curr Opin Neurobiol 2019; 55:160-166. [PMID: 31051434 DOI: 10.1016/j.conb.2019.03.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/31/2018] [Revised: 03/10/2019] [Accepted: 03/25/2019] [Indexed: 10/26/2022]
Abstract
Reinforcement learning theory provides a powerful set of computational ideas for modeling human learning and decision making. Reinforcement learning algorithms rely on state representations that enable efficient behavior by focusing only on aspects relevant to the task at hand. Forming such representations often requires selective attention to the sensory environment, and recalling memories of relevant past experiences. A striking range of psychiatric disorders, including bipolar disorder and schizophrenia, involve changes in these cognitive processes. We review and discuss evidence that these changes can be cast as altered state representation, with the goal of providing a useful transdiagnostic dimension along which mental disorders can be understood and compared.
Affiliation(s)
- Yael Niv
- Psychology Department, Princeton University, United States; Princeton Neuroscience Institute, Princeton University, United States
|