1
Katabi G, Shahar N. Exploring the steps of learning: computational modeling of initiatory-actions among individuals with attention-deficit/hyperactivity disorder. Transl Psychiatry 2024; 14:10. [PMID: 38191535 PMCID: PMC10774270 DOI: 10.1038/s41398-023-02717-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 12/13/2023] [Accepted: 12/14/2023] [Indexed: 01/10/2024] Open
Abstract
Attention-deficit/hyperactivity disorder (ADHD) is characterized by difficulty in acting in a goal-directed manner. While most environments require a sequence of actions for goal attainment, ADHD was never studied in the context of value-based sequence learning. Here, we made use of current advancements in hierarchical reinforcement-learning algorithms to track the internal value and choice policy of individuals with ADHD performing a three-stage sequence learning task. Specifically, 54 participants (28 ADHD, 26 controls) completed a value-based reinforcement-learning task that allowed us to estimate internal action values for each trial and stage using computational modeling. We found attenuated sensitivity to action values in ADHD compared to controls, both in choice and reaction-time variability estimates. Remarkably, this was found only for first-stage actions (i.e., initiatory actions), while for actions performed just before outcome delivery the two groups were strikingly indistinguishable. These results suggest a difficulty in following value estimation for initiatory actions in ADHD.
Affiliation(s)
- Gili Katabi
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel.
- Nitzan Shahar
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
2
Wise T, Charpentier CJ, Dayan P, Mobbs D. Interactive cognitive maps support flexible behavior under threat. Cell Rep 2023; 42:113008. [PMID: 37610871 PMCID: PMC10658881 DOI: 10.1016/j.celrep.2023.113008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 07/11/2023] [Accepted: 08/03/2023] [Indexed: 08/25/2023] Open
Abstract
In social environments, survival can depend upon inferring and adapting to other agents' goal-directed behavior. However, it remains unclear how humans achieve this, despite the fact that many decisions must account for complex, dynamic agents acting according to their own goals. Here, we use a predator-prey task (total n = 510) to demonstrate that humans exploit an interactive cognitive map of the social environment to infer other agents' preferences and simulate their future behavior, providing for flexible, generalizable responses. A model-based inverse reinforcement learning model explained participants' inferences about threatening agents' preferences, with participants using this inferred knowledge to enact generalizable, model-based behavioral responses. Using tree-search planning models, we then found that behavior was best explained by a planning algorithm that incorporated simulations of the threat's goal-directed behavior. Our results indicate that humans use a cognitive map to determine other agents' preferences, facilitating generalized predictions of their behavior and effective responses.
Affiliation(s)
- Toby Wise
- Department of Neuroimaging, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK; Department of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA.
- Caroline J Charpentier
- Department of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA; Department of Psychology, University of Maryland, College Park, MD, USA; Brain and Behavior Institute, University of Maryland, College Park, MD, USA
- Peter Dayan
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany; University of Tübingen, Tübingen, Germany
- Dean Mobbs
- Department of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA; Computation and Neural Systems Program, California Institute of Technology, Pasadena, CA, USA
3
Emanuel A, Eldar E. Emotions as computations. Neurosci Biobehav Rev 2023; 144:104977. [PMID: 36435390 PMCID: PMC9805532 DOI: 10.1016/j.neubiorev.2022.104977] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/14/2022] [Revised: 10/26/2022] [Accepted: 11/22/2022] [Indexed: 11/26/2022]
Abstract
Emotions ubiquitously impact action, learning, and perception, yet their essence and role remain widely debated. Computational accounts of emotion aspire to answer these questions with greater conceptual precision informed by normative principles and neurobiological data. We examine recent progress in this regard and find that emotions may implement three classes of computations, which serve to evaluate states, actions, and uncertain prospects. For each of these, we use the formalism of reinforcement learning to offer a new formulation that better accounts for existing evidence. We then consider how these distinct computations may map onto distinct emotions and moods. Integrating extensive research on the causes and consequences of different emotions suggests a parsimonious one-to-one mapping, according to which emotions are integral to how we evaluate outcomes (pleasure & pain), learn to predict them (happiness & sadness), use them to inform our (frustration & content) and others' (anger & gratitude) actions, and plan in order to realize (desire & hope) or avoid (fear & anxiety) uncertain outcomes.
Affiliation(s)
- Aviv Emanuel
- Department of Psychology, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel.
- Eran Eldar
- Department of Psychology, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel.
4
Rybicki AJ, Sowden SL, Schuster B, Cook JL. Dopaminergic challenge dissociates learning from primary versus secondary sources of information. eLife 2022; 11:74893. [PMID: 35289748 PMCID: PMC9023054 DOI: 10.7554/elife.74893] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/20/2021] [Accepted: 03/14/2022] [Indexed: 11/13/2022] Open
Abstract
Some theories of human cultural evolution posit that humans have social-specific learning mechanisms that are adaptive specialisations moulded by natural selection to cope with the pressures of group living. However, the existence of neurochemical pathways that are specialised for learning from social information and individual experience is widely debated. Cognitive neuroscientific studies present mixed evidence for social-specific learning mechanisms: some studies find dissociable neural correlates for social and individual learning, whereas others find the same brain areas and dopamine-mediated computations involved in both. Here, we demonstrate that, like individual learning, social learning is modulated by the dopamine D2 receptor antagonist haloperidol when social information is the primary learning source, but not when it comprises a secondary, additional element. Two groups (total N = 43) completed a decision-making task which required primary learning, from own experience, and secondary learning from an additional source. For one group, the primary source was social, and secondary was individual; for the other group this was reversed. Haloperidol affected primary learning irrespective of social/individual nature, with no effect on learning from the secondary source. Thus, we illustrate that dopaminergic mechanisms underpinning learning can be dissociated along a primary-secondary but not a social-individual axis. These results resolve conflict in the literature and support an expanding field showing that, rather than being specialised for particular inputs, neurochemical pathways in the human brain can process both social and non-social cues and arbitrate between the two depending upon which cue is primarily relevant for the task at hand.
5
Model-based learning retrospectively updates model-free values. Sci Rep 2022; 12:2358. [PMID: 35149713 PMCID: PMC8837618 DOI: 10.1038/s41598-022-05567-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Accepted: 12/16/2021] [Indexed: 12/02/2022] Open
Abstract
Reinforcement learning (RL) is widely regarded as divisible into two distinct computational strategies. Model-free learning is a simple RL process in which a value is associated with actions, whereas model-based learning relies on the formation of internal models of the environment to maximise reward. Recently, theoretical and animal work has suggested that such models might be used to train model-free behaviour, reducing the burden of costly forward planning. Here we devised a way to probe this possibility in human behaviour. We adapted a two-stage decision task and found evidence that model-based processes at the time of learning can alter model-free valuation in healthy individuals. We asked people to rate subjective value of an irrelevant feature that was seen at the time a model-based decision would have been made. These irrelevant feature value ratings were updated by rewards, but in a way that accounted for whether the selected action retrospectively ought to have been taken. This model-based influence on model-free value ratings was best accounted for by a reward prediction error that was calculated relative to the decision path that would most likely have led to the reward. This effect occurred independently of attention and was not present when participants were not explicitly told about the structure of the environment. These findings suggest that current conceptions of model-based and model-free learning require updating in favour of a more integrated approach. Our task provides an empirical handle for further study of the dialogue between these two learning systems in the future.
6
Deserno L, Moran R, Michely J, Lee Y, Dayan P, Dolan RJ. Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. eLife 2021; 10:e67778. [PMID: 34882092 PMCID: PMC8758138 DOI: 10.7554/elife.67778] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/22/2021] [Accepted: 12/08/2021] [Indexed: 11/13/2022] Open
Abstract
Dopamine is implicated in representing model-free (MF) reward prediction errors as well as influencing model-based (MB) credit assignment and choice. Putative cooperative interactions between MB and MF systems include a guidance of MF credit assignment by MB inference. Here, we used a double-blind, placebo-controlled, within-subjects design to test a hypothesis that enhancing dopamine levels boosts the guidance of MF credit assignment by MB inference. In line with this, we found that levodopa enhanced guidance of MF credit assignment by MB inference, without impacting MF and MB influences directly. This drug effect correlated negatively with a dopamine-dependent change in purely MB credit assignment, possibly reflecting a trade-off between these two MB components of behavioural control. Our findings of a dopamine boost in MB inference guidance of MF learning highlight a novel DA influence on MB-MF cooperative interactions.
Affiliation(s)
- Lorenz Deserno
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University of Würzburg, Würzburg, Germany
- Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Jochen Michely
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin, Berlin, Germany
- Ying Lee
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
- Peter Dayan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- University of Tübingen, Tübingen, Germany
- Raymond J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
7
McDougle SD, Ballard IC, Baribault B, Bishop SJ, Collins AGE. Executive Function Assigns Value to Novel Goal-Congruent Outcomes. Cereb Cortex 2021; 32:231-247. [PMID: 34231854 PMCID: PMC8634563 DOI: 10.1093/cercor/bhab205] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 05/10/2021] [Accepted: 06/04/2021] [Indexed: 11/14/2022] Open
Abstract
People often learn from the outcomes of their actions, even when these outcomes do not involve material rewards or punishments. How does our brain provide this flexibility? We combined behavior, computational modeling, and functional neuroimaging to probe whether learning from abstract novel outcomes harnesses the same circuitry that supports learning from familiar secondary reinforcers. Behavior and neuroimaging revealed that novel images can act as a substitute for rewards during instrumental learning, producing reliable reward-like signals in dopaminergic circuits. Moreover, we found evidence that prefrontal correlates of executive control may play a role in shaping flexible responses in reward circuits. These results suggest that learning from novel outcomes is supported by an interplay between high-level representations in prefrontal cortex and low-level responses in subcortical reward circuits. This interaction may allow for human reinforcement learning over arbitrarily abstract reward functions.
Affiliation(s)
- Ian C Ballard
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
- Beth Baribault
- Department of Psychology, University of California, Berkeley, CA 94704, USA
- Sonia J Bishop
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
- Department of Psychology, University of California, Berkeley, CA 94704, USA
- Anne G E Collins
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
- Department of Psychology, University of California, Berkeley, CA 94704, USA
8
Shahar N, Hauser TU, Moran R, Moutoussis M, Bullmore ET, Dolan RJ. Assigning the right credit to the wrong action: compulsivity in the general population is associated with augmented outcome-irrelevant value-based learning. Transl Psychiatry 2021; 11:564. [PMID: 34741013 PMCID: PMC8571313 DOI: 10.1038/s41398-021-01642-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2021] [Revised: 09/01/2021] [Accepted: 09/21/2021] [Indexed: 11/08/2022] Open
Abstract
Compulsive behavior is enacted under a belief that a specific act controls the likelihood of an undesired future event. Compulsive behaviors are widespread in the general population despite having no causal relationship with events they aspire to influence. In the current study, we tested whether there is an increased tendency to assign value to aspects of a task that do not predict an outcome (i.e., outcome-irrelevant learning) among individuals with compulsive tendencies. We studied 514 healthy individuals who completed self-report compulsivity, anxiety, depression, and schizotypal measurements, and a well-established reinforcement-learning task (i.e., the two-step task). As expected, we found a positive relationship between compulsivity and outcome-irrelevant learning. Specifically, individuals who reported having stronger compulsive tendencies (e.g., washing, checking, grooming) also tended to assign value to response keys and stimuli locations that did not predict an outcome. Controlling for overall goal-directed abilities and the co-occurrence of anxious, depressive, or schizotypal tendencies did not impact these associations. These findings indicate that outcome-irrelevant learning processes may contribute to the expression of compulsivity in a general population setting. We highlight the need for future research on the formation of non-veridical action-outcome associations as a factor related to the occurrence and maintenance of compulsive behavior.
Affiliation(s)
- Nitzan Shahar
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK.
- Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK.
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel.
- Psychology Department, Tel Aviv University, Tel Aviv, Israel.
- Tobias U Hauser
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK
- Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
- Rani Moran
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK
- Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
- Michael Moutoussis
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK
- Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
- Raymond J Dolan
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, WC1B 5EH, UK
- Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
9
Moran R, Dayan P, Dolan RJ. Efficiency and prioritization of inference-based credit assignment. Curr Biol 2021; 31:2747-2756.e6. [PMID: 33887181 PMCID: PMC8279739 DOI: 10.1016/j.cub.2021.03.091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Revised: 02/11/2021] [Accepted: 03/29/2021] [Indexed: 11/16/2022]
Abstract
Organisms adapt to their environments by learning to approach states that predict rewards and avoid states associated with punishments. Knowledge about the affective value of states often relies on credit assignment (CA), whereby state values are updated on the basis of reward feedback. Remarkably, humans assign credit to states that are not observed but are instead inferred based on a cognitive map that represents structural knowledge of an environment. A pertinent example is authors attempting to infer the identity of anonymous reviewers to assign them credit or blame and, on this basis, inform future referee recommendations. Although inference is cognitively costly, it is unknown how it influences CA or how it is apportioned between hidden and observable states (for example, both anonymous and revealed reviewers). We addressed these questions in a task that provided choices between lotteries where each led to a unique pair of occasionally rewarding outcome states. On some trials, both states were observable (rendering inference nugatory), whereas on others, the identity of one of the states was concealed. Importantly, by exploiting knowledge of choice-state associations, subjects could infer the identity of this hidden state. We show that having to perform inference reduces state-value updates. Strikingly, and in violation of normative theories, this reduction in CA was selective for the observed outcome alone. These findings have implications for the operation of putative cognitive maps.
Affiliation(s)
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London WC1B 5EH, UK; Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK.
- Peter Dayan
- Max Planck Institute for Biological Cybernetics, Max Planck-Ring 8, 72076 Tübingen, Germany; University of Tübingen, 72074 Tübingen, Germany
- Raymond J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London WC1B 5EH, UK; Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK