51
Tomov MS, Tsividis PA, Pouncy T, Tenenbaum JB, Gershman SJ. The neural architecture of theory-based reinforcement learning. Neuron 2023; 111:1331-1344.e8. [PMID: 36898374 PMCID: PMC10200004 DOI: 10.1016/j.neuron.2023.01.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Revised: 11/06/2022] [Accepted: 01/27/2023] [Indexed: 03/11/2023]
Abstract
Humans learn internal models of the world that support planning and generalization in complex environments. Yet it remains unclear how such internal models are represented and learned in the brain. We approach this question using theory-based reinforcement learning, a strong form of model-based reinforcement learning in which the model is a kind of intuitive theory. We analyzed fMRI data from human participants learning to play Atari-style games. We found evidence of theory representations in prefrontal cortex and of theory updating in prefrontal cortex, occipital cortex, and fusiform gyrus. Theory updates coincided with transient strengthening of theory representations. Effective connectivity during theory updating suggests that information flows from prefrontal theory-coding regions to posterior theory-updating regions. Together, our results are consistent with a neural architecture in which top-down theory representations originating in prefrontal regions shape sensory predictions in visual areas, where factored theory prediction errors are computed and trigger bottom-up updates of the theory.
Affiliation(s)
- Momchil S Tomov
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Motional AD, Inc., Boston, MA 02210, USA.
- Pedro A Tsividis
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Thomas Pouncy
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Joshua B Tenenbaum
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
52
Feher da Silva C, Lombardi G, Edelson M, Hare TA. Rethinking model-based and model-free influences on mental effort and striatal prediction errors. Nat Hum Behav 2023:10.1038/s41562-023-01573-1. [PMID: 37012365 DOI: 10.1038/s41562-023-01573-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 02/27/2023] [Indexed: 04/05/2023]
Abstract
A standard assumption in neuroscience is that low-effort model-free learning is automatic and continuously used, whereas more complex model-based strategies are only used when the rewards they generate are worth the additional effort. We present evidence refuting this assumption. First, we demonstrate flaws in previous reports of combined model-free and model-based reward prediction errors in the ventral striatum that probably led to spurious results. More appropriate analyses yield no evidence of model-free prediction errors in this region. Second, we find that task instructions generating more correct model-based behaviour reduce rather than increase mental effort. This is inconsistent with cost-benefit arbitration between model-based and model-free strategies. Together, our data indicate that model-free learning may not be automatic. Instead, humans can reduce mental effort by using a model-based strategy alone rather than arbitrating between multiple strategies. Our results call for re-evaluation of the assumptions in influential theories of learning and decision-making.
Affiliation(s)
- Gaia Lombardi
- Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland
- Micah Edelson
- Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland
- Todd A Hare
- Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland.
53
Letkiewicz AM, Kottler HC, Shankman SA, Cochran AL. Quantifying aberrant approach-avoidance conflict in psychopathology: A review of computational approaches. Neurosci Biobehav Rev 2023; 147:105103. [PMID: 36804398 PMCID: PMC10023482 DOI: 10.1016/j.neubiorev.2023.105103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 02/02/2023] [Accepted: 02/16/2023] [Indexed: 02/19/2023]
Abstract
Making effective decisions during approach-avoidance conflict is critical in daily life. Aberrant decision-making during approach-avoidance conflict is evident in a range of psychological disorders, including anxiety, depression, trauma-related disorders, substance use disorders, and alcohol use disorders. To help clarify etiological pathways and reveal novel intervention targets, clinical research into decision-making is increasingly adopting a computational psychopathology approach. This approach uses mathematical models that can identify specific decision-making related processes that are altered in mental health disorders. In our review, we highlight foundational approach-avoidance conflict research, followed by more in-depth discussion of computational approaches that have been used to model behavior in these tasks. Specifically, we describe the computational models that have been applied to approach-avoidance conflict (e.g., drift-diffusion, active inference, and reinforcement learning models), and provide resources to guide clinical researchers who may be interested in applying computational modeling. Finally, we identify notable gaps in the current literature and potential future directions for computational approaches aimed at identifying mechanisms of approach-avoidance conflict in psychopathology.
Affiliation(s)
- Allison M Letkiewicz
- Department of Psychiatry and Behavioral Sciences, Northwestern University, Chicago, IL, USA.
- Haley C Kottler
- Department of Mathematics, University of Wisconsin, Madison, WI, USA
- Stewart A Shankman
- Department of Psychiatry and Behavioral Sciences, Northwestern University, Chicago, IL, USA; Department of Psychology, Northwestern University, Evanston, IL, USA
- Amy L Cochran
- Department of Mathematics, University of Wisconsin, Madison, WI, USA; Department of Population Health Sciences, University of Wisconsin, Madison, WI, USA
54
Sharp PB, Dolan RJ, Eldar E. Disrupted state transition learning as a computational marker of compulsivity. Psychol Med 2023; 53:2095-2105. [PMID: 37310326 PMCID: PMC10106291 DOI: 10.1017/s0033291721003846] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/17/2020] [Revised: 08/28/2021] [Accepted: 09/02/2021] [Indexed: 11/07/2022]
Abstract
BACKGROUND Disorders involving compulsivity, fear, and anxiety are linked to beliefs that the world is less predictable. We lack a mechanistic explanation for how such beliefs arise. Here, we test a hypothesis that in people with compulsivity, fear, and anxiety, learning a probabilistic mapping between actions and environmental states is compromised. METHODS In Study 1 (n = 174), we designed a novel online task that isolated state transition learning from other facets of learning and planning. To determine whether this impairment is due to learning that is too fast or too slow, we estimated state transition learning rates by fitting computational models to two independent datasets, which tested learning in environments in which state transitions were either stable (Study 2: n = 1413) or changing (Study 3: n = 192). RESULTS Study 1 established that individuals with higher levels of compulsivity are more likely to demonstrate an impairment in state transition learning. Preliminary evidence here linked this impairment to a common factor comprising compulsivity and fear. Studies 2 and 3 showed that compulsivity is associated with learning that is too fast when it should be slow (i.e. when state transitions are stable) and too slow when it should be fast (i.e. when state transitions change). CONCLUSIONS Together, these findings indicate that compulsivity is associated with a dysregulation of state transition learning, wherein the rate of learning is not well adapted to the task environment. Thus, dysregulated state transition learning might provide a key target for therapeutic intervention in compulsivity.
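The idea of a state transition learning rate that is "too fast" or "too slow" can be illustrated with a simple delta-rule learner. This is only a sketch under stated assumptions: the abstract does not specify the fitted models, and the function names, the 0.5 starting belief, and the outcome sequence are hypothetical.

```python
def update_transition_belief(belief, observed, learning_rate):
    """Delta-rule update of the estimated probability that an action
    leads to a particular state (observed is 1 if it did, else 0)."""
    return belief + learning_rate * (observed - belief)


def run_learner(outcomes, learning_rate, initial_belief=0.5):
    """Return the belief after each observed transition."""
    belief = initial_belief
    trajectory = []
    for outcome in outcomes:
        belief = update_transition_belief(belief, outcome, learning_rate)
        trajectory.append(belief)
    return trajectory


# Hypothetical stable environment: the common transition occurs on most trials.
stable_outcomes = [1, 1, 0, 1, 1, 1, 0, 1]
fast = run_learner(stable_outcomes, learning_rate=0.8)  # swings after each rare transition
slow = run_learner(stable_outcomes, learning_rate=0.1)  # changes gradually
```

In a stable environment like this one, the high learning rate lets each rare transition pull the belief sharply away from the long-run frequency, while the low rate moves it only a little per trial. After a genuine change in the transitions, the same low rate would adapt slowly.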
Affiliation(s)
- Paul B. Sharp
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, London, UK
- The Hebrew University of Jerusalem, Jerusalem, Israel
- Raymond J. Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, London, UK
- Eran Eldar
- The Hebrew University of Jerusalem, Jerusalem, Israel
55
Peciña M, Chen J, Karp JF, Dombrovski AY. Dynamic Feedback Between Antidepressant Placebo Expectancies and Mood. JAMA Psychiatry 2023; 80:389-398. [PMID: 36857039 PMCID: PMC9979016 DOI: 10.1001/jamapsychiatry.2023.0010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 01/02/2023] [Indexed: 03/02/2023]
Abstract
Importance Despite high antidepressant placebo response rates, the mechanisms underlying the persistence of antidepressant placebo effects are still poorly understood. Objective To investigate the neurobehavioral mechanisms underlying the evolution of antidepressant placebo effects using a reinforcement learning (RL) framework. Design, Setting, and Participants In this acute within-patient cross-sectional study of antidepressant placebos, patients aged 18 to 55 years not receiving medication for major depressive disorder (MDD) were recruited at the University of Pittsburgh between February 21, 2017, and March 1, 2021. Interventions The antidepressant placebo functional magnetic resonance imaging task manipulates placebo-associated expectancies using visually cued fast-acting antidepressant infusions and controls their reinforcement with sham visual neurofeedback while assessing expected and experienced mood improvement. Main Outcomes and Measures The trial-by-trial evolution of expectancies and mood was examined using multilevel modeling and RL, relating model-predicted signals to spatiotemporal dynamics of blood oxygenation level-dependent (BOLD) response. Results A bayesian RL model comparison in 60 individuals (mean [SE] age, 24.5 [0.8] years; 51 females [85%]) with MDD revealed that antidepressant placebo trial-wise expectancies were updated by composite learning signals multiplexing sensory evidence (neurofeedback) and trial-wise mood (bayesian omnibus risk <0.001; exceedance probability = 97%). Placebo expectancy, neurofeedback manipulations, and composite learning signals modulated the visual cortex and dorsal attention network (threshold-free cluster enhancement [TFCE] = 1 - P >.95). As participants anticipated antidepressant infusions, learned placebo expectancies modulated the salience network (SN, TFCE = 1 - P >.95), positively scaling with depression severity. Conclusions and Relevance Results of this cross-sectional study suggest that on a timescale of minutes, antidepressant placebo effects were maintained by positive feedback loops between expectancies and mood improvement. During learning, representations of placebos and their perceived effects were enhanced in primary and secondary sensory cortices. Latent learned placebo expectancies were encoded in the SN.
Affiliation(s)
- Marta Peciña
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania
- Jiazhou Chen
- Section on Development and Affective Neuroscience, National Institute of Health, Bethesda, Maryland
- Division of Psychiatry, University College London, London, United Kingdom
56
Scholz V, Waltmann M, Herzog N, Reiter A, Horstmann A, Deserno L. Cortical Grey Matter Mediates Increases in Model-Based Control and Learning from Positive Feedback from Adolescence to Adulthood. J Neurosci 2023; 43:2178-2189. [PMID: 36823039 PMCID: PMC10039741 DOI: 10.1523/jneurosci.1418-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 12/20/2022] [Accepted: 01/13/2023] [Indexed: 02/25/2023]
Abstract
Cognition and brain structure undergo significant maturation from adolescence into adulthood. Model-based (MB) control is known to increase across development, which is mediated by cognitive abilities. Here, we asked two questions unaddressed in previous developmental studies. First, what are the brain structural correlates of age-related increases in MB control? Second, how are age-related increases in MB control from adolescence to adulthood influenced by motivational context? A human developmental sample (n = 103; age, 12-50, male/female, 55:48) completed structural MRI and an established task to capture MB control. The task was modified with respect to outcome valence by including (1) reward and punishment blocks to manipulate the motivational context and (2) an additional choice test to assess learning from positive versus negative feedback. After replicating that an age-dependent increase in MB control is mediated by cognitive abilities, we demonstrate first-time evidence that gray matter density (GMD) in the parietal cortex mediates the increase of MB control with age. Although motivational context did not relate to age-related changes in MB control, learning from positive feedback improved with age. Meanwhile, negative feedback learning showed no age effects. We present a first report that an age-related increase in positive feedback learning was mediated by reduced GMD in the parietal, medial, and dorsolateral prefrontal cortex. Our findings indicate that brain maturation, putatively reflected in lower GMD, in distinct and partially overlapping brain regions could lead to a more efficient brain organization and might thus be a key developmental step toward age-related increases in planning and value-based choice. SIGNIFICANCE STATEMENT Changes in model-based decision-making are paralleled by extensive maturation in cognition and brain structure across development. Still, to date the neuroanatomical underpinnings of these changes remain unclear. Here, we demonstrate for the first time that parietal GMD mediates age-dependent increases in model-based control. Age-related increases in positive feedback learning were mediated by reduced GMD in the parietal, medial, and dorsolateral prefrontal cortex. A manipulation of motivational context did not have an impact on age-related changes in model-based control. These findings highlight that brain maturation in distinct and overlapping cortical regions constitutes a key developmental step toward improved value-based choices.
Affiliation(s)
- Vanessa Scholz
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University of Würzburg, 97080 Würzburg, Germany
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 GD Nijmegen, The Netherlands
- Maria Waltmann
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University of Würzburg, 97080 Würzburg, Germany
- Max Planck Institute for Cognition and Neuroscience, D-04103 Leipzig, Germany
- Nadine Herzog
- Max Planck Institute for Cognition and Neuroscience, D-04103 Leipzig, Germany
- Integrated Research and Treatment Center AdiposityDiseases, Leipzig University Medical Center, 04103 Leipzig, Germany
- Andrea Reiter
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University of Würzburg, 97080 Würzburg, Germany
- Collaborative Research Center-940 Volition and Cognitive Control, Faculty of Psychology, Technical University Dresden, 01069 Dresden, Germany
- Annette Horstmann
- Max Planck Institute for Cognition and Neuroscience, D-04103 Leipzig, Germany
- Integrated Research and Treatment Center AdiposityDiseases, Leipzig University Medical Center, 04103 Leipzig, Germany
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Lorenz Deserno
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University of Würzburg, 97080 Würzburg, Germany
- Max Planck Institute for Cognition and Neuroscience, D-04103 Leipzig, Germany
- Integrated Research and Treatment Center AdiposityDiseases, Leipzig University Medical Center, 04103 Leipzig, Germany
- Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus, Technical University Dresden, 01069 Dresden, Germany
57
Bianchi S, Muñoz-Martin I, Covi E, Bricalli A, Piccolboni G, Regev A, Molas G, Nodin JF, Andrieu F, Ielmini D. A self-adaptive hardware with resistive switching synapses for experience-based neurocomputing. Nat Commun 2023; 14:1565. [PMID: 36944647 PMCID: PMC10030830 DOI: 10.1038/s41467-023-37097-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Accepted: 03/02/2023] [Indexed: 03/23/2023]
Abstract
Neurobiological systems continually interact with the surrounding environment to refine their behaviour toward the best possible reward. Achieving such learning by experience is one of the main challenges of artificial intelligence, but currently it is hindered by the lack of hardware capable of plastic adaptation. Here, we propose a bio-inspired recurrent neural network, mastered by a digital system on chip with resistive-switching synaptic arrays of memory devices, which exploits homeostatic Hebbian learning for improved efficiency. All the results are discussed experimentally and theoretically, proposing a conceptual framework for benchmarking the main outcomes in terms of accuracy and resilience. To test the proposed architecture for reinforcement learning tasks, we study the autonomous exploration of continually evolving environments and verify the results for the Mars rover navigation. We also show that, compared to conventional deep learning techniques, our in-memory hardware has the potential to achieve a significant boost in speed and power-saving.
Affiliation(s)
- S Bianchi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano and IUNET, Milano, 20133, Italy
- Infineon Technologies, Villach, Austria
- I Muñoz-Martin
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano and IUNET, Milano, 20133, Italy
- Infineon Technologies, Villach, Austria
- E Covi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano and IUNET, Milano, 20133, Italy
- NaMLab gGmbH, Dresden, Germany
- A Regev
- Weebit Nano, Hod Hasharon, Israel
- G Molas
- Weebit Nano, Hod Hasharon, Israel
- J F Nodin
- Univ. Grenoble Alpes, CEA, Leti, F-38000, Grenoble, France
- F Andrieu
- Univ. Grenoble Alpes, CEA, Leti, F-38000, Grenoble, France
- D Ielmini
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano and IUNET, Milano, 20133, Italy.
58
Nissan N, Hertz U, Shahar N, Gabay Y. Distinct reinforcement learning profiles distinguish between language and attentional neurodevelopmental disorders. Behav Brain Funct 2023; 19:6. [PMID: 36941632 PMCID: PMC10029183 DOI: 10.1186/s12993-023-00207-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2022] [Accepted: 01/26/2023] [Indexed: 03/23/2023]
Abstract
BACKGROUND Theoretical models posit abnormalities in cortico-striatal pathways in two of the most common neurodevelopmental disorders (Developmental dyslexia, DD, and Attention deficit hyperactive disorder, ADHD), but it is still unclear what distinct cortico-striatal dysfunction might distinguish language disorders from others that exhibit very different symptomatology. Although impairments in tasks that depend on the cortico-striatal network, including reinforcement learning (RL), have been implicated in both disorders, there has been little attempt to dissociate between different types of RL or to compare learning processes in these two types of disorders. The present study builds upon prior research indicating the existence of two learning manifestations of RL and evaluates whether these processes can be differentiated in language and attention deficit disorders. We used a two-step RL task shown to dissociate model-based from model-free learning in human learners. RESULTS Our results show that, relative to neurotypicals, DD individuals showed an impairment in model-free but not in model-based learning, whereas in ADHD the ability to use both model-free and model-based learning strategies was significantly compromised. CONCLUSIONS Thus, learning impairments in DD may be linked to a selective deficit in the ability to form action-outcome associations based on previous history, whereas in ADHD some learning deficits may be related to an incapacity to pursue rewards based on the tasks' structure. Our results indicate how different patterns of learning deficits may underlie different disorders, and how computation-minded experimental approaches can differentiate between them.
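The contrast between model-based and model-free valuation in a two-step task is often formalized as a weighted mixture of the two value estimates. The sketch below assumes that common formulation for illustration only. The abstract does not state which model the authors fitted, and the function names, the weight `w`, and all numeric values are hypothetical.

```python
def model_based_values(transition_probs, second_stage_q):
    """First-stage values computed from the task structure: for each action,
    the expected best second-stage value under its transition probabilities."""
    best_per_state = [max(q_values) for q_values in second_stage_q]
    return [
        sum(p * best for p, best in zip(row, best_per_state))
        for row in transition_probs
    ]


def hybrid_values(mb_values, mf_values, w):
    """Mix model-based and model-free values; w=1 is purely model-based."""
    return [w * mb + (1 - w) * mf for mb, mf in zip(mb_values, mf_values)]


# Hypothetical example: two first-stage actions, two second-stage states.
transition_probs = [[0.7, 0.3], [0.3, 0.7]]   # rows: first-stage actions
second_stage_q = [[1.0, 0.0], [0.0, 0.0]]     # rows: second-stage states
mf_values = [0.2, 0.6]                        # cached values from reward history

mb_values = model_based_values(transition_probs, second_stage_q)
mixed = hybrid_values(mb_values, mf_values, w=0.5)
```

Under this formulation, a selective model-free impairment would affect how `mf_values` are learned from reward history, whereas a model-based impairment would affect the use of `transition_probs`.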
Affiliation(s)
- Noyli Nissan
- Department of Special Education, University of Haifa, Haifa, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, Israel
- Uri Hertz
- Department of Cognitive Sciences, University of Haifa, Haifa, Israel
- Nitzan Shahar
- The School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Yafit Gabay
- Department of Special Education, University of Haifa, Haifa, Israel.
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, Israel.
59
Smid CR, Kool W, Hauser TU, Steinbeis N. Computational and behavioral markers of model-based decision making in childhood. Dev Sci 2023; 26:e13295. [PMID: 35689563 DOI: 10.1111/desc.13295] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/10/2021] [Revised: 05/12/2022] [Accepted: 05/22/2022] [Indexed: 11/28/2022]
Abstract
Human decision-making is underpinned by distinct systems that differ in flexibility and associated cognitive cost. A widely accepted dichotomy distinguishes between a cheap but rigid model-free system and a flexible but costly model-based system. Typically, humans use a hybrid of both types of decision-making depending on environmental demands. However, children's use of a model-based system during decision-making has not yet been shown. While prior developmental work has identified simple building blocks of model-based reasoning in young children (1-4 years old), there has been little evidence of this complex cognitive system influencing behavior before adolescence. Here, by using a modified task to make engagement in cognitively costly strategies more rewarding, we show that children aged 5-11 years (N = 85), including the youngest children, displayed multiple indicators of model-based decision making, and that the degree of its use increased throughout childhood. Unlike adults (N = 24), however, children did not display adaptive arbitration between model-free and model-based decision-making. Our results demonstrate that throughout childhood, children can engage in highly sophisticated and costly decision-making strategies. However, the flexible arbitration between decision-making strategies might be a critically late-developing component in human development.
Affiliation(s)
- Claire R Smid
- Department of Psychology and Language Sciences, University College London, London, the United Kingdom
- Wouter Kool
- Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, Missouri, USA
- Tobias U Hauser
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, the United Kingdom; Wellcome Centre for Human Neuroimaging, University College London, London, the United Kingdom
- Nikolaus Steinbeis
- Department of Psychology and Language Sciences, University College London, London, the United Kingdom
60
Young ME, Howatt BC. Resource limitations: A taxonomy. Behav Processes 2023; 206:104823. [PMID: 36682436 DOI: 10.1016/j.beproc.2023.104823] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/03/2022] [Revised: 01/02/2023] [Accepted: 01/17/2023] [Indexed: 01/21/2023]
Abstract
Decision making within the context of resource limitations requires balancing the short-term benefits of obtaining a resource and the long-term consequences of depleting those resources. The present manuscript focuses on four types of tasks that share this tradeoff to develop a taxonomy that will encourage a deeper understanding of the psychological processes at play. The four types considered are foraging, common pool traps, deterioration traps, and a novel designation referred to as resource cliffs. All four will be shown to include two opposite processes - depletion of the resource and its replenishment over time. By considering the unique and shared features of these tasks, a taxonomy of features emerges that can be combined to not only create novel tasks but also to shift the research focus to task features rather than specific tasks. The paper closes with a consideration of current theoretical frameworks previously applied to one or more of these resource-limitation tasks as well as the promise of reinforcement learning as a unifying theory.
61
Bakst L, McGuire JT. Experience-driven recalibration of learning from surprising events. Cognition 2023; 232:105343. [PMID: 36481590 PMCID: PMC9851993 DOI: 10.1016/j.cognition.2022.105343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 10/13/2022] [Accepted: 11/21/2022] [Indexed: 12/12/2022]
Abstract
Different environments favor different patterns of adaptive learning. A surprising event that in one context would accelerate belief updating might, in another context, be downweighted as a meaningless outlier. Here, we investigated whether people would spontaneously regulate the influence of surprise on learning in response to event-by-event experiential feedback. Across two experiments, we examined whether participants performing a perceptual judgment task under spatial uncertainty (n = 29, n = 63) adapted their patterns of predictive gaze according to the informativeness or uninformativeness of surprising events in their current environment. Uninstructed predictive eye movements exhibited a form of metalearning in which surprise came to modulate event-by-event learning rates in opposite directions across contexts. Participants later appropriately readjusted their patterns of adaptive learning when the statistics of the environment underwent an unsignaled reversal. Although significant adjustments occurred in both directions, performance was consistently superior in environments in which surprising events reflected meaningful change, potentially reflecting a bias towards interpreting surprise as informative and/or difficulty ignoring salient outliers. Our results provide evidence for spontaneous, context-appropriate recalibration of the role of surprise in adaptive learning.
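One way to picture surprise modulating learning rates in opposite directions is an update rule in which the effective learning rate depends on the size of the prediction error, with a context-specific sign. This is a hypothetical sketch, not the model used in the study: the function name, the `surprise_gain` and `scale` parameters, and the clipping choices are all assumptions made for illustration.

```python
def surprise_modulated_update(estimate, observation, base_rate, surprise_gain, scale):
    """Update an estimate with a learning rate that depends on surprise.

    surprise_gain > 0: surprising events speed up learning (they signal change).
    surprise_gain < 0: surprising events are downweighted (they look like outliers).
    """
    error = observation - estimate
    # Normalized surprise in [0, 1]; errors at or beyond `scale` count as fully surprising.
    surprise = min(abs(error) / scale, 1.0)
    # Keep the effective learning rate within [0, 1].
    rate = min(max(base_rate + surprise_gain * surprise, 0.0), 1.0)
    return estimate + rate * error


# Same surprising observation, opposite contexts (all values hypothetical).
after_change_context = surprise_modulated_update(0.0, 10.0, 0.5, 0.5, 10.0)
after_outlier_context = surprise_modulated_update(0.0, 10.0, 0.5, -0.5, 10.0)
```

With these values the learner in the change context jumps fully to the new observation, while the learner in the outlier context ignores it.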
Affiliation(s)
- Leah Bakst
- Department of Psychological & Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA 02215, USA; Center for Systems Neuroscience, Boston University, 610 Commonwealth Avenue, Boston, MA 02215, USA.
- Joseph T McGuire
- Department of Psychological & Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA 02215, USA; Center for Systems Neuroscience, Boston University, 610 Commonwealth Avenue, Boston, MA 02215, USA.
62
Ruel A, Bolenz F, Li SC, Fischer A, Eppinger B. Neural evidence for age-related deficits in the representation of state spaces. Cereb Cortex 2023; 33:1768-1781. [PMID: 35510942 DOI: 10.1093/cercor/bhac171] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/01/2022] [Revised: 04/08/2022] [Accepted: 04/09/2022] [Indexed: 11/12/2022]
Abstract
Under high cognitive demands, older adults tend to resort to simpler, habitual, or model-free decision strategies. This age-related shift in decision behavior has been attributed to deficits in the representation of the cognitive maps, or state spaces, necessary for more complex model-based decision-making. Yet, the neural mechanisms behind this shift remain unclear. In this study, we used a modified 2-stage Markov task in combination with computational modeling and single-trial EEG analyses to establish neural markers of age-related changes in goal-directed decision-making under different demands on the representation of state spaces. Our results reveal that the shift to simpler decision strategies in older adults is due to (i) impairments in the representation of the transition structure of the task and (ii) a diminished signaling of the reward value associated with decision options. In line with the diminished state space hypothesis of human aging, our findings suggest that deficits in goal-directed, model-based behavior in older adults result from impairments in the representation of state spaces of cognitive tasks.
Affiliation(s)
- Alexa Ruel
- Department of Psychology, Concordia University, 7141 Sherbrooke Street W. Montreal, Quebec H4B 1R6, Canada
| | - Florian Bolenz
- Chair of Lifespan Developmental Neuroscience, Faculty of Psychology, Technische Universität Dresden, Bürogebäude, Zi. 244 Strehlener Straße 22/24, Dresden 01069, Germany; Center for Adaptive Rationality, Max Planck Institute for Human Development, Lentzeallee 94, Berlin 14195, Germany; Cluster of Excellence "Science of Intelligence", Technische Universität Berlin, Straße des 17. Juni 135, Berlin 10623, Germany
| | - Shu-Chen Li
- Chair of Lifespan Developmental Neuroscience, Faculty of Psychology, Technische Universität Dresden, Bürogebäude, Zi. 244 Strehlener Straße 22/24, Dresden 01069, Germany; Cluster of Excellence "Centre for Tactile Internet with Human-in-the-Loop", Technische Universität Dresden, Bürogebäude, Zi. 244 Strehlener Straße 22/24, Dresden 01069, Germany
| | - Adrian Fischer
- Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee 45, Berlin 14195, Germany
| | - Ben Eppinger
- Department of Psychology, Concordia University, 7141 Sherbrooke Street W. Montreal, Quebec H4B 1R6, Canada; Chair of Lifespan Developmental Neuroscience, Faculty of Psychology, Technische Universität Dresden, Bürogebäude, Zi. 244 Strehlener Straße 22/24, Dresden 01069, Germany; PERFORM Center, Concordia University, 7200 Sherbrooke St. W. Montreal, Quebec H4B 1R6, Canada
| |
|
63
|
Grahek I, Frömer R, Prater Fahey M, Shenhav A. Learning when effort matters: neural dynamics underlying updating and adaptation to changes in performance efficacy. Cereb Cortex 2023; 33:2395-2411. [PMID: 35695774 PMCID: PMC9977373 DOI: 10.1093/cercor/bhac215] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/03/2022] [Revised: 05/06/2022] [Accepted: 05/08/2022] [Indexed: 11/13/2022]
Abstract
To determine how much cognitive control to invest in a task, people need to consider whether exerting control matters for obtaining rewards. In particular, they need to account for the efficacy of their performance, that is, the degree to which rewards are determined by performance or by independent factors. Yet it remains unclear how people learn about their performance efficacy in an environment. Here we combined computational modeling with measures of task performance and EEG to provide a mechanistic account of how people (i) learn and update efficacy expectations in a changing environment and (ii) proactively adjust control allocation based on current efficacy expectations. Across 2 studies, subjects performed an incentivized cognitive control task while their performance efficacy (the likelihood that rewards are performance-contingent or random) varied over time. We show that people update their efficacy beliefs based on prediction errors (leveraging similar neural and computational substrates as those that underpin reward learning) and adjust how much control they allocate according to these beliefs. Using computational modeling, we show that these control adjustments reflect changes in information processing, rather than the speed-accuracy tradeoff. These findings demonstrate the neurocomputational mechanism through which people learn how worthwhile their cognitive control is.
Affiliation(s)
- Ivan Grahek
- Department of Cognitive, Linguistic, & Psychological Sciences, Carney Institute for Brain Science, Brown University, Box 1821, Providence, RI 02912, United States
| | - Romy Frömer
- Department of Cognitive, Linguistic, & Psychological Sciences, Carney Institute for Brain Science, Brown University, Box 1821, Providence, RI 02912, United States
| | - Mahalia Prater Fahey
- Department of Cognitive, Linguistic, & Psychological Sciences, Carney Institute for Brain Science, Brown University, Box 1821, Providence, RI 02912, United States
| | - Amitai Shenhav
- Department of Cognitive, Linguistic, & Psychological Sciences, Carney Institute for Brain Science, Brown University, Box 1821, Providence, RI 02912, United States
| |
|
64
|
Ekman M, Kusch S, de Lange FP. Successor-like representation guides the prediction of future events in human visual cortex and hippocampus. eLife 2023; 12:78904. [PMID: 36729024 PMCID: PMC9894584 DOI: 10.7554/elife.78904] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/23/2022] [Accepted: 01/13/2023] [Indexed: 02/03/2023]
Abstract
Human agents build models of their environment, which enable them to anticipate and plan upcoming events. However, little is known about the properties of such predictive models. Recently, it has been proposed that hippocampal representations take the form of a predictive map-like structure, the so-called successor representation (SR). Here, we used human functional magnetic resonance imaging to probe whether activity in the early visual cortex (V1) and hippocampus adhere to the postulated properties of the SR after visual sequence learning. Participants were exposed to an arbitrary spatiotemporal sequence consisting of four items (A-B-C-D). We found that after repeated exposure to the sequence, merely presenting single sequence items (e.g., - B - -) resulted in V1 activation at the successor locations of the full sequence (e.g., C-D), but not at the predecessor locations (e.g., A). This highlights that visual representations are skewed toward future states, in line with the SR. Similar results were also found in the hippocampus. Moreover, the hippocampus developed a coactivation profile that showed sensitivity to the temporal distance in sequence space, with fading representations for sequence events in the more distant past and future. V1, in contrast, showed a coactivation profile that was only sensitive to spatial distance in stimulus space. Taken together, these results provide empirical evidence for the proposition that both visual and hippocampal cortex represent a predictive map of the visual world akin to the SR.
Affiliation(s)
- Matthias Ekman
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
| | - Sarah Kusch
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
| | - Floris P de Lange
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
| |
|
65
|
Pool ER, Pauli WM, Cross L, O'Doherty JP. Neural substrates of parallel devaluation-sensitive and devaluation-insensitive Pavlovian learning in humans. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.26.525637. [PMID: 36747799 PMCID: PMC9901183 DOI: 10.1101/2023.01.26.525637] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/30/2023]
Abstract
Pavlovian learning depends on multiple and parallel associations leading to distinct classes of conditioned responses that vary in their flexibility following changes in the value of an associated outcome. Here, we aimed to differentiate brain areas involved in learning and encoding associations that are sensitive to changes in the value of an outcome from those that are not sensitive to such changes. To address this question, we combined a Pavlovian learning task with outcome devaluation, eye-tracking and functional magnetic resonance imaging. We used computational modeling to identify brain regions involved in learning stimulus-reward associations and stimulus-stimulus associations, by testing for brain areas correlating with reward-prediction errors and state-prediction errors, respectively. We found that, contrary to theoretical predictions about reward prediction errors being exclusively model-free, voxels correlating with reward prediction errors in the ventral striatum and subgenual anterior cingulate cortex were sensitive to devaluation. On the other hand, brain areas correlating with state prediction errors were found to be devaluation insensitive. In a supplementary analysis, we distinguished brain regions encoding predictions about outcome taste identity from those involved in encoding predictions about its expected spatial location. A subset of regions involved in taste identity predictions were devaluation sensitive while those involved in encoding predictions about spatial location were devaluation insensitive. These findings provide insights into the role of multiple associative mechanisms in the brain in mediating Pavlovian conditioned behavior, illustrating how distinct neural pathways can in parallel produce both devaluation sensitive and devaluation insensitive behaviors.
|
66
|
Oguchi M, Li Y, Matsumoto Y, Kiyonari T, Yamamoto K, Sugiura S, Sakagami M. Proselfs depend more on model-based than model-free learning in a non-social probabilistic state-transition task. Sci Rep 2023; 13:1419. [PMID: 36697448 PMCID: PMC9876908 DOI: 10.1038/s41598-023-27609-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 01/04/2023] [Indexed: 01/26/2023]
Abstract
Humans form complex societies in which we routinely engage in social decision-making regarding the allocation of resources among ourselves and others. One dimension that characterizes social decision-making in particular is whether to prioritize self-interest or respect for others: proself or prosocial. What causes this individual difference in social value orientation? Recent developments in the social dual-process theory argue that social decision-making is characterized by its underlying domain-general learning systems: the model-free and model-based systems. In line with this "learning" approach, we propose and experimentally test the hypothesis that differences in social preferences stem from which learning system is dominant in an individual. Here, we used a non-social state transition task that allowed us to assess the balance between model-free/model-based learning and investigate its relation to the social value orientations. The results showed that proselfs depended more on model-based learning, whereas prosocials depended more on model-free learning. Reward amount and reaction time analyses showed that proselfs learned the task structure earlier in the session than prosocials, reflecting their difference in model-based/model-free learning dependence. These findings support the learning hypothesis on what makes differences in social preferences and have implications for understanding the mechanisms of prosocial behavior.
Affiliation(s)
- Mineki Oguchi
- Brain Science Institute, Tamagawa University, 6-1-1, Tamagawagakuen, Machida, Tokyo, Japan
| | - Yang Li
- Brain Science Institute, Tamagawa University, 6-1-1, Tamagawagakuen, Machida, Tokyo, Japan; Graduate School of Informatics, Nagoya University, Nagoya, Japan
| | - Yoshie Matsumoto
- Brain Science Institute, Tamagawa University, 6-1-1, Tamagawagakuen, Machida, Tokyo, Japan; Department of Psychology, Faculty of Human Sciences, Seinan Gakuin University, Fukuoka, Japan
| | - Toko Kiyonari
- School of Social Informatics, Aoyama Gakuin University, Kanagawa, Japan
| | | | | | - Masamichi Sakagami
- Brain Science Institute, Tamagawa University, 6-1-1, Tamagawagakuen, Machida, Tokyo, Japan.
| |
|
67
|
Sherif MA, Fotros A, Greenberg BD, McLaughlin NCR. Understanding cingulotomy's therapeutic effect in OCD through computer models. Front Integr Neurosci 2023; 16:889831. [PMID: 36704759 PMCID: PMC9871832 DOI: 10.3389/fnint.2022.889831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 12/05/2022] [Indexed: 01/12/2023]
Abstract
Cingulotomy is therapeutic in OCD, but what are the possible mechanisms? Computer models that formalize cortical OCD abnormalities and anterior cingulate cortex (ACC) function can help answer this. At the neural dynamics level, cortical dynamics in OCD have been modeled using attractor networks, where activity patterns resistant to change denote the inability to switch to new patterns, which can reflect inflexible thinking patterns or behaviors. From that perspective, cingulotomy might reduce the influence of difficult-to-escape ACC attractor dynamics on other cortical areas. At the functional level, computer formulations based on model-free reinforcement learning (RL) have been used to describe the multitude of phenomena ACC is involved in, such as tracking the timing of expected outcomes and estimating the cost of exerting cognitive control and effort. Different elements of model-free RL models of ACC could be affected by the inflexible cortical dynamics, making it challenging to update their values. An agent can also use a world model, a representation of how the states of the world change, to plan its actions, through model-based RL. OCD has been hypothesized to be driven by reduced certainty of how the brain's world model describes changes. Cingulotomy might improve such uncertainties about the world and one's actions, making it possible to trust the outcomes of these actions more and thus reduce the urge to collect more sensory information in the form of compulsions. Connecting the neural dynamics models with the functional formulations can provide new ways of understanding the role of ACC in OCD, with potential therapeutic insights.
Affiliation(s)
- Mohamed A. Sherif
- Department of Psychiatry, Brown University, Providence, RI, United States; Carney Institute for Brain Science, Brown University, Providence, RI, United States; Department of Psychiatry Lifespan Health System, Providence, RI, United States
| | - Aryandokht Fotros
- Department of Psychiatry, Brown University, Providence, RI, United States; Department of Psychiatry Lifespan Health System, Providence, RI, United States
| | - Benjamin D. Greenberg
- Department of Psychiatry, Brown University, Providence, RI, United States; Carney Institute for Brain Science, Brown University, Providence, RI, United States; Butler Hospital, Providence, RI, United States; United States Department of Veterans Affairs, Providence VA Medical Center, Providence, RI, United States
| | - Nicole C. R. McLaughlin
- Department of Psychiatry, Brown University, Providence, RI, United States; Carney Institute for Brain Science, Brown University, Providence, RI, United States; Butler Hospital, Providence, RI, United States
| |
|
68
|
Emanuel A, Eldar E. Emotions as computations. Neurosci Biobehav Rev 2023; 144:104977. [PMID: 36435390 PMCID: PMC9805532 DOI: 10.1016/j.neubiorev.2022.104977] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/14/2022] [Revised: 10/26/2022] [Accepted: 11/22/2022] [Indexed: 11/26/2022]
Abstract
Emotions ubiquitously impact action, learning, and perception, yet their essence and role remain widely debated. Computational accounts of emotion aspire to answer these questions with greater conceptual precision informed by normative principles and neurobiological data. We examine recent progress in this regard and find that emotions may implement three classes of computations, which serve to evaluate states, actions, and uncertain prospects. For each of these, we use the formalism of reinforcement learning to offer a new formulation that better accounts for existing evidence. We then consider how these distinct computations may map onto distinct emotions and moods. Integrating extensive research on the causes and consequences of different emotions suggests a parsimonious one-to-one mapping, according to which emotions are integral to how we evaluate outcomes (pleasure & pain), learn to predict them (happiness & sadness), use them to inform our (frustration & content) and others' (anger & gratitude) actions, and plan in order to realize (desire & hope) or avoid (fear & anxiety) uncertain outcomes.
Affiliation(s)
- Aviv Emanuel
- Department of Psychology, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel.
| | - Eran Eldar
- Department of Psychology, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel.
| |
|
69
|
Lee JH, Leibo JZ, An SJ, Lee SW. Importance of prefrontal meta control in human-like reinforcement learning. Front Comput Neurosci 2022; 16:1060101. [PMID: 36618272 PMCID: PMC9811824 DOI: 10.3389/fncom.2022.1060101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022]
Abstract
Recent investigation on reinforcement learning (RL) has demonstrated considerable flexibility in dealing with various problems. However, such models often experience difficulty learning seemingly easy tasks for humans. To reconcile the discrepancy, our paper is focused on the computational benefits of the brain's RL. We examine the brain's ability to combine complementary learning strategies to resolve the trade-off between prediction performance, computational costs, and time constraints. The complex need for task performance created by a volatile and/or multi-agent environment motivates the brain to continually explore an ideal combination of multiple strategies, called meta-control. Understanding these functions would allow us to build human-aligned RL models.
Affiliation(s)
- Jee Hang Lee
- Department of Human-Centered Artificial Intelligence, Sangmyung University, Seoul, South Korea
| | | | - Su Jin An
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
| | - Sang Wan Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Program of Brain and Cognitive Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- KAIST Center for Neuroscience-Inspired Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- KAIST Institute for Health Science and Technology, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- KAIST Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
| |
|
70
|
Moughrabi N, Botsford C, Gruichich TS, Azar A, Heilicher M, Hiser J, Crombie KM, Dunsmoor JE, Stowe Z, Cisler JM. Large-scale neural network computations and multivariate representations during approach-avoidance conflict decision-making. Neuroimage 2022; 264:119709. [PMID: 36283543 PMCID: PMC9835092 DOI: 10.1016/j.neuroimage.2022.119709] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/11/2022] [Revised: 10/20/2022] [Accepted: 10/21/2022] [Indexed: 11/09/2022]
Abstract
Many real-world situations require navigating decisions for both reward and threat. While there has been significant progress in understanding mechanisms of decision-making and mediating neurocircuitry separately for reward and threat, there is limited understanding of situations where reward and threat contingencies compete to create approach-avoidance conflict (AAC). Here, we leverage computational learning models, independent component analysis (ICA), and multivariate pattern analysis (MVPA) approaches to understand decision-making during a novel task that embeds concurrent reward and threat learning and manipulates congruency between reward and threat probabilities. Computational modeling supported a modified reinforcement learning model where participants integrated reward and threat value into a combined total value according to an individually varying policy parameter, which was highly predictive of decisions to approach reward vs avoid threat during trials where the highest reward option was also the highest threat option (i.e., approach-avoidance conflict). ICA analyses demonstrated unique roles for salience, frontoparietal, medial prefrontal, and inferior frontal networks in differential encoding of reward vs threat prediction error and value signals. The left frontoparietal network uniquely encoded degree of conflict between reward and threat value at the time of choice. MVPA demonstrated that delivery of reward and threat could accurately be decoded within salience and inferior frontal networks, respectively, and that decisions to approach reward vs avoid threat were predicted by the relative degree to which these reward vs threat representations were active at the time of choice. This latter result suggests that navigating AAC decisions involves generating mental representations for possible decision outcomes, and relative activation of these representations may bias subsequent decision-making towards approaching reward or avoiding threat accordingly.
Affiliation(s)
- Nicole Moughrabi
- Department of Psychiatry and Behavioral Sciences, University of Texas at Austin
| | - Chloe Botsford
- Department of Psychiatry, University of Wisconsin-Madison
| | | | - Ameera Azar
- Department of Psychiatry and Behavioral Sciences, University of Texas at Austin
| | | | - Jaryd Hiser
- Department of Psychiatry, University of Wisconsin-Madison
| | - Kevin M Crombie
- Department of Psychiatry and Behavioral Sciences, University of Texas at Austin
| | - Joseph E Dunsmoor
- Department of Psychiatry and Behavioral Sciences, University of Texas at Austin; Institute for Early Life Adversity Research, University of Texas at Austin
| | - Zach Stowe
- Department of Psychiatry, University of Wisconsin-Madison
| | - Josh M Cisler
- Department of Psychiatry and Behavioral Sciences, University of Texas at Austin; Institute for Early Life Adversity Research, University of Texas at Austin.
| |
|
71
|
Pearce AL, Fuchs BA, Keller KL. The role of reinforcement learning and value-based decision-making frameworks in understanding food choice and eating behaviors. Front Nutr 2022; 9:1021868. [PMID: 36483928 PMCID: PMC9722736 DOI: 10.3389/fnut.2022.1021868] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/17/2022] [Accepted: 11/04/2022] [Indexed: 11/23/2022]
Abstract
The obesogenic food environment includes easy access to highly-palatable, energy-dense, "ultra-processed" foods that are heavily marketed to consumers; therefore, it is critical to understand the neurocognitive processes that underlie overeating in response to environmental food-cues (e.g., food images, food branding/advertisements). Eating habits are learned through reinforcement, which is the process through which environmental food cues become valued and influence behavior. This process is supported by multiple behavioral control systems (e.g., Pavlovian, Habitual, Goal-Directed). Therefore, using neurocognitive frameworks for reinforcement learning and value-based decision-making can improve our understanding of food-choice and eating behaviors. Specifically, the role of reinforcement learning in eating behaviors was considered using the frameworks of (1) Sign- versus Goal-Tracking Phenotypes; (2) Model-Free versus Model-Based; and (3) the Utility or Value-Based Model. The sign- and goal-tracking phenotypes may contribute a mechanistic insight on the role of food-cue incentive salience in two prevailing models of overconsumption: the Extended Behavioral Susceptibility Theory and the Reactivity to Embedded Food Cues in Advertising Model. Similarly, the model-free versus model-based framework may contribute insight to the Extended Behavioral Susceptibility Theory and the Healthy Food Promotion Model. Finally, the value-based model provides a framework for understanding how all three learning systems are integrated to influence food choice. Together, these frameworks can provide mechanistic insight to existing models of food choice and overconsumption and may contribute to the development of future prevention and treatment efforts.
Affiliation(s)
- Alaina L. Pearce
- Social Science Research Institute, Pennsylvania State University, University Park, PA, United States
- Department of Nutritional Sciences, Pennsylvania State University, University Park, PA, United States
| | - Bari A. Fuchs
- Department of Nutritional Sciences, Pennsylvania State University, University Park, PA, United States
| | - Kathleen L. Keller
- Social Science Research Institute, Pennsylvania State University, University Park, PA, United States
- Department of Nutritional Sciences, Pennsylvania State University, University Park, PA, United States
- Department of Food Science, Pennsylvania State University, University Park, PA, United States
| |
|
72
|
Colas JT, Dundon NM, Gerraty RT, Saragosa‐Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022; 43:4750-4790. [PMID: 35860954 PMCID: PMC9491297 DOI: 10.1002/hbm.25988] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2022] [Revised: 05/20/2022] [Accepted: 06/10/2022] [Indexed: 11/12/2022]
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
| | - Neil M. Dundon
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
| | - Raphael T. Gerraty
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Center for Science and Society, Columbia University, New York, New York, USA
| | - Natalie M. Saragosa‐Harris
- Department of Psychology, New York University, New York, New York, USA
- Department of Psychology, University of California, Los Angeles, California, USA
| | - Karol P. Szymula
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Koranis Tanwisuth
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Department of Psychology, University of California, Berkeley, California, USA
| | - J. Michael Tyszka
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
| | - Camilla van Geen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Harang Ju
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Arthur W. Toga
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
| | - Joshua I. Gold
- Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Dani S. Bassett
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
| | - Catherine A. Hartley
- Department of Psychology, New York University, New York, New York, USA
- Center for Neural Science, New York University, New York, New York, USA
| | - Daphna Shohamy
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Kavli Institute for Brain Science, Columbia University, New York, New York, USA
| | - Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
| | - John P. O'Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
| |
|
73
|
The neuroanatomy of social trust predicts depression vulnerability. Sci Rep 2022; 12:16724. [PMID: 36202831 PMCID: PMC9537537 DOI: 10.1038/s41598-022-20443-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Accepted: 09/13/2022] [Indexed: 12/01/2022]
Abstract
Trust attitude is a social personality trait linked with the estimation of others’ trustworthiness. Trusting others, however, can have substantial negative effects on mental health, such as the development of depression. Despite significant progress in understanding the neurobiology of trust, whether the neuroanatomy of trust is linked with depression vulnerability remains unknown. To investigate a link between the neuroanatomy of trust and depression vulnerability, we assessed trust and depressive symptoms and employed neuroimaging to acquire brain structure data of healthy participants. A high depressive symptom score was used as an indicator of depression vulnerability. The neuroanatomical results observed with the healthy sample were validated in a sample of clinically diagnosed depressive patients. We found significantly higher depressive symptoms among low trusters than among high trusters. Neuroanatomically, low trusters and depressive patients showed similar volume reduction in brain regions implicated in social cognition, including the dorsolateral prefrontal cortex (DLPFC), dorsomedial PFC, posterior cingulate, precuneus, and angular gyrus. Furthermore, the reduced volume of the DLPFC and precuneus mediated the relationship between trust and depressive symptoms. These findings contribute to understanding social- and neural-markers of depression vulnerability and may inform the development of social interventions to prevent pathological depression.
|
74
|
Karvelis P, Charlton CE, Allohverdi SG, Bedford P, Hauke DJ, Diaconescu AO. Computational approaches to treatment response prediction in major depression using brain activity and behavioral data: A systematic review. Netw Neurosci 2022; 6:1066-1103. [PMID: 38800454 PMCID: PMC11117101 DOI: 10.1162/netn_a_00233] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/14/2021] [Accepted: 01/14/2022] [Indexed: 05/29/2024] Open
Abstract
Major depressive disorder is a heterogeneous diagnostic category with multiple available treatments. With the goal of optimizing treatment selection, researchers are developing computational models that attempt to predict treatment response based on various pretreatment measures. In this paper, we review studies that use brain activity data to predict treatment response. Our aim is to highlight and clarify important methodological differences between various studies that relate to the incorporation of domain knowledge, specifically within two approaches delineated as data-driven and theory-driven. We argue that theory-driven generative modeling, which explicitly models information processing in the brain and thus can capture disease mechanisms, is a promising emerging approach that is only beginning to be utilized in treatment response prediction. The predictors extracted via such models could improve interpretability, which is critical for clinical decision-making. We also identify several methodological limitations across the reviewed studies and provide suggestions for addressing them. Namely, we consider problems with dichotomizing treatment outcomes, the importance of investigating more than one treatment in a given study for differential treatment response predictions, the need for a patient-centered approach for defining treatment outcomes, and finally, the use of internal and external validation methods for improving model generalizability.
Affiliation(s)
- Povilas Karvelis
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, ON, Canada
- Colleen E. Charlton
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, ON, Canada
- Shona G. Allohverdi
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, ON, Canada
- Peter Bedford
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, ON, Canada
- Daniel J. Hauke
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, ON, Canada
  - Department of Psychiatry (UPK), University of Basel, Basel, Switzerland
  - Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland
- Andreea O. Diaconescu
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, ON, Canada
  - University of Toronto, Department of Psychiatry, Toronto, ON, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Psychology, University of Toronto, Toronto, ON, Canada
|
75
|
Wang M, Wang Y, Sai AMVV, Liu Z, Gao Y, Tong X, Cai Z. Task assignment for hybrid scenarios in spatial crowdsourcing: A Q-Learning-based approach. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|
76
|
Dutta CN, Christov-Moore L, Ombao H, Douglas PK. Neuroprotection in late life attention-deficit/hyperactivity disorder: A review of pharmacotherapy and phenotype across the lifespan. Front Hum Neurosci 2022; 16:938501. [PMID: 36226261 PMCID: PMC9548548 DOI: 10.3389/fnhum.2022.938501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2022] [Accepted: 08/16/2022] [Indexed: 11/13/2022] Open
Abstract
For decades, psychostimulants have been the gold standard pharmaceutical treatment for attention-deficit/hyperactivity disorder (ADHD). In the United States, an astounding 9% of all boys and 4% of girls will be prescribed stimulant drugs at some point during their childhood. Recent meta-analyses have revealed that individuals with ADHD have reduced brain volume loss later in life (>60 y.o.) compared to the normal aging brain, which suggests that either ADHD or its treatment may be neuroprotective. Crucially, these neuroprotective effects were significant in brain regions (e.g., hippocampus, amygdala) where severe volume loss is linked to cognitive impairment and Alzheimer's disease. Historically, the ADHD diagnosis and its pharmacotherapy came about nearly simultaneously, making it difficult to evaluate their effects in isolation. Certain evidence suggests that psychostimulants may normalize structural brain changes typically observed in the ADHD brain. If ADHD itself is neuroprotective, perhaps exercising the brain, then psychostimulants may not be recommended across the lifespan. Alternatively, if stimulant drugs are neuroprotective, then this class of medications may warrant further investigation for their therapeutic effects. Here, we take a bottom-up holistic approach to review the psychopharmacology of ADHD in the context of recent models of attention. We suggest that future studies are greatly needed to better appreciate the interactions amongst an ADHD diagnosis, stimulant treatment across the lifespan, and structure-function alterations in the aging brain.
Affiliation(s)
- Cintya Nirvana Dutta
  - Biostatistics Group, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  - School of Modeling, Simulation, and Training, and Computer Science, University of Central Florida, Orlando, FL, United States
- Leonardo Christov-Moore
  - Brain and Creativity Institute, University of Southern California, Los Angeles, CA, United States
- Hernando Ombao
  - Biostatistics Group, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Pamela K. Douglas
  - School of Modeling, Simulation, and Training, and Computer Science, University of Central Florida, Orlando, FL, United States
  - Department of Psychiatry and Biobehavioral Medicine, University of California, Los Angeles, Los Angeles, CA, United States
|
77
|
Roberts SGB, Dunbar RIM, Roberts AI. Communicative roots of complex sociality and cognition: neuropsychological mechanisms underpinning the processing of social information. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210295. [PMID: 35934969 PMCID: PMC9358321 DOI: 10.1098/rstb.2021.0295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Accepted: 04/08/2022] [Indexed: 11/12/2022] Open
Abstract
Primate social bonds are described as being especially complex in their nature, and primates have unusually large brains for their body size compared to other mammals. Communication in primates has attracted considerable attention because of the important role it plays in social bonding. It has been proposed that differentiated social relationships are cognitively complex because primates need to continuously update their knowledge about different types of social bonds. Therefore, primates infer whether an opportunity for social interaction is rewarding (valuable to individual goals) based on their knowledge of the social relationships of the interactants. However, exposure to distraction and stress has detrimental effects on the dopaminergic system, suggesting that understanding social relationships as rewarding is affected in these conditions. This paper proposes that complex communication evolved to augment the capacity to form social relationships during stress through flexibly modifying intentionality in communication (audience checking, response waiting and elaboration). Intentional communication may upregulate dopamine dynamics to allow recognition that an interaction is rewarding during stress. By examining these associations between complexity of communication and stress, we provide new insights into the cognitive skills involved in forming social bonds in primates and the evolution of communication systems in both primates and humans. This article is part of the theme issue 'Cognition, communication and social bonds in primates'.
Affiliation(s)
- Sam G. B. Roberts
  - School of Psychology, Liverpool John Moores University, Liverpool L3 3AF, UK
- Robin I. M. Dunbar
  - Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, UK
- Anna I. Roberts
  - Institute of Human Biology and Evolution, Adam Mickiewicz University, Poznan, Poland
|
78
|
A computational model of inner speech supporting flexible goal-directed behaviour in Autism. Sci Rep 2022; 12:14198. [PMID: 35987942 PMCID: PMC9392752 DOI: 10.1038/s41598-022-18445-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Accepted: 08/11/2022] [Indexed: 11/21/2022] Open
Abstract
Experimental and computational studies propose that inner speech boosts categorisation skills and executive functions, making human behaviour more focused and flexible. In addition, many clinical studies highlight a relationship between poor inner-speech and an executive impairment in autism spectrum condition (ASC), but contrasting findings are reported. Here we directly investigate the latter issue through a previously implemented and validated computational model of the Wisconsin Cards Sorting Tests. In particular, the model was applied to explore potential individual differences in cognitive flexibility and inner speech contribution in autistic and neurotypical participants. Our model predicts that the use of inner-speech could increase along the life-span of neurotypical participants but would be reduced in autistic ones. Although we found more attentional failures (i.e., wrong behavioural rule switches) in autistic children/teenagers and more perseverative behaviours in autistic young/older adults, only autistic children and older adults exhibited a lower performance (i.e., fewer consecutive correct rule switches) than matched control groups. Overall, our results corroborate the idea that the reduced use of inner speech could represent a disadvantage for autistic children and autistic older adults. Moreover, the results suggest that cognitive-behavioural therapies should focus on developing inner speech skills in autistic children as this could provide cognitive support throughout their whole life span.
|
79
|
A comparison of reinforcement learning models of human spatial navigation. Sci Rep 2022; 12:13923. [PMID: 35978035 PMCID: PMC9385652 DOI: 10.1038/s41598-022-18245-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 08/08/2022] [Indexed: 11/09/2022] Open
Abstract
Reinforcement learning (RL) models have been influential in characterizing human learning and decision making, but few studies apply them to characterizing human spatial navigation and even fewer systematically compare RL models under different navigation requirements. Because RL can characterize one's learning strategies quantitatively and in a continuous manner, and one's consistency of using such strategies, it can provide a novel and important perspective for understanding the marked individual differences in human navigation and disentangle navigation strategies from navigation performance. One-hundred and fourteen participants completed wayfinding tasks in a virtual environment where different phases manipulated navigation requirements. We compared performance of five RL models (3 model-free, 1 model-based and 1 "hybrid") at fitting navigation behaviors in different phases. Supporting implications from prior literature, the hybrid model provided the best fit regardless of navigation requirements, suggesting the majority of participants rely on a blend of model-free (route-following) and model-based (cognitive mapping) learning in such navigation scenarios. Furthermore, consistent with a key prediction, there was a correlation in the hybrid model between the weight on model-based learning (i.e., navigation strategy) and the navigator's exploration vs. exploitation tendency (i.e., consistency of using such navigation strategy), which was modulated by navigation task requirements. Together, we not only show how computational findings from RL align with the spatial navigation literature, but also reveal how the relationship between navigation strategy and a person's consistency using such strategies changes as navigation requirements change.
|
80
|
Guida P, Michiels M, Redgrave P, Luque D, Obeso I. An fMRI meta-analysis of the role of the striatum in everyday-life vs laboratory-developed habits. Neurosci Biobehav Rev 2022; 141:104826. [PMID: 35963543 DOI: 10.1016/j.neubiorev.2022.104826] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/24/2022] [Revised: 07/17/2022] [Accepted: 08/09/2022] [Indexed: 11/30/2022]
Abstract
The dorsolateral striatum plays a critical role in the acquisition and expression of stimulus-response habits that are learned in experimental laboratories. Here, we use meta-analytic procedures to contrast the neural circuits activated by laboratory-acquired habits with those activated by stimulus-response behaviours acquired in everyday-life. We confirmed that newly learned habits rely more on the anterior putamen with activation extending into caudate and nucleus accumbens. Motor and associative components of everyday-life habits were identified. We found that motor-dominant stimulus-response associations developed outside the laboratory primarily engaged posterior dorsal putamen, supplementary motor area (SMA) and cerebellum. Importantly, associative components were also represented in the posterior putamen. Thus, common neural representations for both naturalistic and laboratory-based habits were found in the left posterior and right anterior putamen. These findings suggest a partial common striatal substrate for habitual actions that are performed predominantly by stimulus-response associations represented in the posterior striatum. The overlapping neural substrates for laboratory and everyday-life habits supports the use of both methods for the analysis of habitual behaviour.
Affiliation(s)
- Pasqualina Guida
  - HM CINAC, Centro Integral de Neurociencias AC. Hospital Universitario HM Puerta del Sur, HM Hospitales, Madrid, Spain; CIBERNED, Instituto de Salud Carlos III, Madrid, Spain; Ph.D. Program in Neuroscience, Universidad Autónoma de Madrid Cajal Institute, Madrid 28029, Spain
- Mario Michiels
  - HM CINAC, Centro Integral de Neurociencias AC. Hospital Universitario HM Puerta del Sur, HM Hospitales, Madrid, Spain; CIBERNED, Instituto de Salud Carlos III, Madrid, Spain; Ph.D. Program in Neuroscience, Universidad Autónoma de Madrid Cajal Institute, Madrid 28029, Spain
- Peter Redgrave
  - Department of Psychology, University of Sheffield, Sheffield S10 2TN, UK
- David Luque
  - Departamento de Psicología Básica, Universidad Autónoma de Madrid, Madrid, Spain; Departamento de Psicología Básica, Universidad de Málaga, Madrid, Spain
- Ignacio Obeso
  - HM CINAC, Centro Integral de Neurociencias AC. Hospital Universitario HM Puerta del Sur, HM Hospitales, Madrid, Spain; CIBERNED, Instituto de Salud Carlos III, Madrid, Spain; Psychobiology department, Complutense University of Madrid, Madrid, Spain.
|
81
|
Carruthers P, Williams DM. Model-free metacognition. Cognition 2022; 225:105117. [DOI: 10.1016/j.cognition.2022.105117] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/30/2021] [Revised: 03/25/2022] [Accepted: 03/31/2022] [Indexed: 01/08/2023]
|
82
|
Wu Y, Morita M, Izawa J. Reward prediction errors, not sensory prediction errors, play a major role in model selection in human reinforcement learning. Neural Netw 2022; 154:109-121. [PMID: 35872516 DOI: 10.1016/j.neunet.2022.07.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2021] [Revised: 03/25/2022] [Accepted: 07/06/2022] [Indexed: 10/17/2022]
Abstract
Model-based reinforcement learning enables an agent to learn in variable environments and tasks by optimizing its actions based on the predicted states and outcomes. This mechanism has also been considered in the brain. However, exactly how the brain selects an appropriate model for confronting environments has remained unclear. Here, we investigated the model selection algorithm in the human brain during a reinforcement learning task. One primary theory of model selection in the brain is based on sensory prediction errors. Here, we compared this theory with an alternative possibility of internal model selection with reward prediction errors. To compare these two theories, we devised a switching experiment from a first-order Markov decision process to a second-order Markov decision process that provides either reward- or sensory prediction error regarding environmental change. We tested two representative computational models driven by different prediction errors. One is the sensory prediction-error-driven Bayesian algorithm, which has been discussed as a representative internal model selection algorithm in the animal reinforcement learning task. The other is the reward-prediction-error-driven policy gradient algorithm. We compared the simulation results of these two computational models with human reinforcement learning behaviors. The model fitting result supports that the policy gradient algorithm is preferable to the Bayesian algorithm. This suggests that the human brain employs the reward prediction error to select an appropriate internal model in the reinforcement learning task.
Affiliation(s)
- Yihao Wu
  - School of Integrative and Global Majors, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki, 305-8573, Japan.
- Masahiko Morita
  - Faculty of Engineering, Information and Systems, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki, 305-8573, Japan.
- Jun Izawa
  - Faculty of Engineering, Information and Systems, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki, 305-8573, Japan.
|
83
|
Glitz L, Juechems K, Summerfield C, Garrett N. Model Sharing in the Human Medial Temporal Lobe. J Neurosci 2022; 42:5410-5426. [PMID: 35606146 PMCID: PMC7613027 DOI: 10.1523/jneurosci.1978-21.2022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/01/2021] [Revised: 04/20/2022] [Accepted: 04/23/2022] [Indexed: 11/21/2022] Open
Abstract
Effective planning involves knowing where different actions take us. However, natural environments are rich and complex, leading to an exponential increase in memory demand as a plan grows in depth. One potential solution is to filter out features of the environment irrelevant to the task at hand. This enables a shared model of transition dynamics to be used for planning over a range of different input features. Here, we asked human participants (13 male, 16 female) to perform a sequential decision-making task, designed so that knowledge should be integrated independently of the input features (visual cues) present in one case but not in another. Participants efficiently switched between using a low-dimensional (cue independent) and a high-dimensional (cue specific) representation of state transitions. fMRI data identified the medial temporal lobe as a locus for learning state transitions. Within this region, multivariate patterns of BOLD responses were less correlated between trials with differing input features but similar state associations in the high dimensional than in the low dimensional case, suggesting that these patterns switched between separable (specific to input features) and shared (invariant to input features) transition models. Finally, we show that transition models are updated more strongly following the receipt of positive compared with negative outcomes, a finding that challenges conventional theories of planning. Together, these findings propose a computational and neural account of how information relevant for planning can be shared and segmented in response to the vast array of contextual features we encounter in our world.
SIGNIFICANCE STATEMENT: Effective planning involves maintaining an accurate model of which actions take us to which locations. But in a world awash with information, mapping actions to states with the right level of complexity is critical. Using a new decision-making "heist task" in conjunction with computational modeling and fMRI, we show that patterns of BOLD responses in the medial temporal lobe, a brain region key for prospective planning, become less sensitive to the presence of visual features when these are irrelevant to the task at hand. By flexibly adapting the complexity of task-state representations in this way, state-action mappings learned under one set of features can be used to plan in the presence of others.
Affiliation(s)
- Leonie Glitz
  - Department of Experimental Psychology, University of Oxford, Oxford OX2 6HG, United Kingdom
- Keno Juechems
  - Department of Experimental Psychology, University of Oxford, Oxford OX2 6HG, United Kingdom
- Neil Garrett
  - Department of Experimental Psychology, University of Oxford, Oxford OX2 6HG, United Kingdom
  - School of Psychology, University of East Anglia, Norwich NR4 7TJ, United Kingdom
|
84
|
Morris LS, Grehl MM, Rutter SB, Mehta M, Westwater ML. On what motivates us: a detailed review of intrinsic v. extrinsic motivation. Psychol Med 2022; 52:1801-1816. [PMID: 35796023 PMCID: PMC9340849 DOI: 10.1017/s0033291722001611] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/15/2021] [Revised: 05/02/2022] [Accepted: 05/12/2022] [Indexed: 12/02/2022]
Abstract
Motivational processes underlie behaviors that enrich the human experience, and impairments in motivation are commonly observed in psychiatric illness. While motivated behavior is often examined with respect to extrinsic reinforcers, not all actions are driven by reactions to external stimuli; some are driven by 'intrinsic' motivation. Intrinsically motivated behaviors are computationally similar to extrinsically motivated behaviors, in that they strive to maximize reward value and minimize punishment. However, our understanding of the neurocognitive mechanisms that underlie intrinsically motivated behavior remains limited. Dysfunction in intrinsic motivation represents an important trans-diagnostic facet of psychiatric symptomology, but due to a lack of clear consensus, the contribution of intrinsic motivation to psychopathology remains poorly understood. This review aims to provide an overview of the conceptualization, measurement, and neurobiology of intrinsic motivation, providing a framework for understanding its potential contributions to psychopathology and its treatment. Distinctions between intrinsic and extrinsic motivation are discussed, including divergence in the types of associated rewards or outcomes that drive behavioral action and choice. A useful framework for understanding intrinsic motivation, and thus separating it from extrinsic motivation, is developed and suggestions for optimization of paradigms to measure intrinsic motivation are proposed.
Affiliation(s)
- Laurel S. Morris
  - Department of Psychiatry, Depression and Anxiety Center for Discovery and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Mora M. Grehl
  - Department of Psychology, Temple University, Philadelphia, PA 19122 USA
- Sarah B. Rutter
  - Department of Psychiatry, Depression and Anxiety Center for Discovery and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Marishka Mehta
  - Department of Psychiatry, Depression and Anxiety Center for Discovery and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Margaret L. Westwater
  - Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT 06510 USA
|
85
|
Jiang Y, Wu H, Mi Q, Zhu L. Neurocomputations of strategic behavior: From iterated to novel interactions. WIRES COGNITIVE SCIENCE 2022; 13:e1598. [PMID: 35441465 PMCID: PMC9542218 DOI: 10.1002/wcs.1598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2021] [Revised: 03/27/2022] [Accepted: 03/29/2022] [Indexed: 11/15/2022]
Abstract
Strategic interactions, where an individual's payoff depends on the decisions of multiple intelligent agents, are ubiquitous among social animals. They span a variety of important social behaviors such as competition, cooperation, coordination, and communication, and often involve complex, intertwining cognitive operations ranging from basic reward processing to higher‐order mentalization. Here, we review the progress and challenges in probing the neural and cognitive mechanisms of strategic behavior of interacting individuals, drawing an analogy to recent developments in studies of reward‐seeking behavior, in particular, how research focuses in the field of strategic behavior have been expanded from adaptive behavior based on trial‐and‐error to flexible decisions based on limited prior experience. We highlight two important research questions in the field of strategic behavior: (i) How does the brain exploit past experience for learning to behave strategically? and (ii) How does the brain decide what to do in novel strategic situations in the absence of direct experience? For the former, we discuss the utility of learning models that have effectively connected various types of neural data with strategic learning behavior and helped elucidate the interplay among multiple learning processes. For the latter, we review the recent evidence and propose a neural generative mechanism by which the brain makes novel strategic choices through simulating others' goal‐directed actions according to rational or bounded‐rational principles obtained through indirect social knowledge.
This article is categorized under: Economics > Interactive Decision‐Making; Psychology > Reasoning and Decision Making; Neuroscience > Cognition
Affiliation(s)
- Yaomin Jiang
  - School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, IDG/McGovern Institute for Brain Research, Peking‐Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Hai‐Tao Wu
  - School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, IDG/McGovern Institute for Brain Research, Peking‐Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Qingtian Mi
  - School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, IDG/McGovern Institute for Brain Research, Peking‐Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Lusha Zhu
  - School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, IDG/McGovern Institute for Brain Research, Peking‐Tsinghua Center for Life Sciences, Peking University, Beijing, China
|
86
|
Fermin ASR, Friston K, Yamawaki S. An insula hierarchical network architecture for active interoceptive inference. ROYAL SOCIETY OPEN SCIENCE 2022; 9:220226. [PMID: 35774133 PMCID: PMC9240682 DOI: 10.1098/rsos.220226] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/22/2022] [Accepted: 06/09/2022] [Indexed: 05/05/2023]
Abstract
In the brain, the insular cortex receives a vast amount of interoceptive information, ascending through deep brain structures, from multiple visceral organs. The unique hierarchical and modular architecture of the insula suggests specialization for processing interoceptive afferents. Yet, the biological significance of the insula's neuroanatomical architecture, in relation to deep brain structures, remains obscure. In this opinion piece, we propose the Insula Hierarchical Modular Adaptive Interoception Control (IMAC) model to suggest that insula modules (granular, dysgranular and agranular), forming parallel networks with the prefrontal cortex and striatum, are specialized to form higher order interoceptive representations. These interoceptive representations are recruited in a context-dependent manner to support habitual, model-based and exploratory control of visceral organs and physiological processes. We discuss how insula interoceptive representations may give rise to conscious feelings that best explain lower order deep brain interoceptive representations, and how the insula may serve to defend the body and mind against pathological depression.
Affiliation(s)
- Alan S. R. Fermin
  - Center for Brain, Mind and Kansei Sciences Research, Hiroshima University, Hiroshima, Japan
- Karl Friston
  - The Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, England
- Shigeto Yamawaki
  - Center for Brain, Mind and Kansei Sciences Research, Hiroshima University, Hiroshima, Japan
|
87
|
Hogeveen J, Mullins TS, Romero JD, Eversole E, Rogge-Obando K, Mayer AR, Costa VD. The neurocomputational bases of explore-exploit decision-making. Neuron 2022; 110:1869-1879.e5. [PMID: 35390278 PMCID: PMC9167768 DOI: 10.1016/j.neuron.2022.03.014] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 07/12/2021] [Revised: 12/11/2021] [Accepted: 03/10/2022] [Indexed: 02/04/2023]
Abstract
Flexible decision-making requires animals to forego immediate rewards (exploitation) and try novel choice options (exploration) to discover if they are preferable to familiar alternatives. Using the same task and a partially observable Markov decision process (POMDP) model to quantify the value of choices, we first determined that the computational basis for managing explore-exploit tradeoffs is conserved across monkeys and humans. We then used fMRI to identify where in the human brain the immediate value of exploitative choices and relative uncertainty about the value of exploratory choices were encoded. Consistent with prior neurophysiological evidence in monkeys, we observed divergent encoding of reward value and uncertainty in prefrontal and parietal regions, including frontopolar cortex, and parallel encoding of these computations in motivational regions including the amygdala, ventral striatum, and orbitofrontal cortex. These results clarify the interplay between prefrontal and motivational circuits that supports adaptive explore-exploit decisions in humans and nonhuman primates.
Affiliation(s)
- Jeremy Hogeveen
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA.
| | - Teagan S Mullins
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
| | - John D Romero
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
| | - Elizabeth Eversole
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
| | - Kimberly Rogge-Obando
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
| | - Andrew R Mayer
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Department of Psychiatry & Behavioral Sciences, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA; Department of Neurology, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA; The Mind Research Network/Lovelace Biomedical Research Institute, Pete & Nancy Domenici Hall, Albuquerque, NM 87106, USA
| | - Vincent D Costa
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA; Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA.
|
88
|
Castro-Rodrigues P, Akam T, Snorasson I, Camacho M, Paixão V, Maia A, Barahona-Corrêa JB, Dayan P, Simpson HB, Costa RM, Oliveira-Maia AJ. Explicit knowledge of task structure is a primary determinant of human model-based action. Nat Hum Behav 2022; 6:1126-1141. [PMID: 35589826 DOI: 10.1038/s41562-022-01346-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/11/2020] [Revised: 03/19/2022] [Accepted: 03/31/2022] [Indexed: 11/09/2022]
Abstract
Explicit information obtained through instruction profoundly shapes human choice behaviour. However, this has been studied in computationally simple tasks, and it is unknown how model-based and model-free systems, respectively generating goal-directed and habitual actions, are affected by the absence or presence of instructions. We assessed behaviour in a variant of a computationally more complex decision-making task, before and after providing information about task structure, both in healthy volunteers and in individuals suffering from obsessive-compulsive or other disorders. Initial behaviour was model-free, with rewards directly reinforcing preceding actions. Model-based control, employing predictions of states resulting from each action, emerged with experience in a minority of participants, and less in those with obsessive-compulsive disorder. Providing task structure information strongly increased model-based control, similarly across all groups. Thus, in humans, explicit task structural knowledge is a primary determinant of model-based reinforcement learning and is most readily acquired from instruction rather than experience.
Affiliation(s)
- Pedro Castro-Rodrigues
- Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal; Centro Hospitalar Psiquiátrico de Lisboa, Lisbon, Portugal
| | - Thomas Akam
- Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; Department of Experimental Psychology, University of Oxford, Oxford, UK
| | - Ivar Snorasson
- Center for Obsessive-Compulsive & Related Disorders, New York State Psychiatric Institute, New York, NY, USA
| | - Marta Camacho
- Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; John Van Geest Center for Brain Repair, University of Cambridge, Cambridge, UK
| | - Vitor Paixão
- Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal
| | - Ana Maia
- Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal; Department of Psychiatry and Mental Health, Centro Hospitalar de Lisboa Ocidental, Lisbon, Portugal
| | - J Bernardo Barahona-Corrêa
- Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal
| | - Peter Dayan
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany; The University of Tübingen, Tübingen, Germany
| | - H Blair Simpson
- Center for Obsessive-Compulsive & Related Disorders, New York State Psychiatric Institute, New York, NY, USA; Department of Psychiatry, Columbia University, New York, NY, USA
| | - Rui M Costa
- Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Albino J Oliveira-Maia
- Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal.
|
89
|
Ho MK, Abel D, Correa CG, Littman ML, Cohen JD, Griffiths TL. People construct simplified mental representations to plan. Nature 2022; 606:129-136. [PMID: 35589843 DOI: 10.1038/s41586-022-04743-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/25/2021] [Accepted: 04/07/2022] [Indexed: 11/09/2022]
Abstract
One of the most striking features of human cognition is the ability to plan. Two aspects of human planning stand out: its efficiency and flexibility. Efficiency is especially impressive because plans must often be made in complex environments, and yet people successfully plan solutions to many everyday problems despite having limited cognitive resources [1-3]. Standard accounts in psychology, economics and artificial intelligence have suggested that human planning succeeds because people have a complete representation of a task and then use heuristics to plan future actions in that representation [4-11]. However, this approach generally assumes that task representations are fixed. Here we propose that task representations can be controlled and that such control provides opportunities to quickly simplify problems and more easily reason about them. We propose a computational account of this simplification process and, in a series of preregistered behavioural experiments, show that it is subject to online cognitive control [12-14] and that people optimally balance the complexity of a task representation and its utility for planning and acting. These results demonstrate how strategically perceiving and conceiving problems facilitates the effective use of limited cognitive resources.
Affiliation(s)
- Mark K Ho
- Department of Psychology, Princeton University, Princeton, NJ, USA; Department of Computer Science, Princeton University, Princeton, NJ, USA.
| | - David Abel
- Department of Computer Science, Brown University, Providence, RI, USA; DeepMind, London, UK
| | - Carlos G Correa
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
| | - Michael L Littman
- Department of Computer Science, Brown University, Providence, RI, USA
| | - Jonathan D Cohen
- Department of Psychology, Princeton University, Princeton, NJ, USA; Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
| | - Thomas L Griffiths
- Department of Psychology, Princeton University, Princeton, NJ, USA; Department of Computer Science, Princeton University, Princeton, NJ, USA
|
90
|
Bolenz F, Profitt MF, Stechbarth F, Eppinger B, Strobel A. Need for cognition does not account for individual differences in metacontrol of decision making. Sci Rep 2022; 12:8240. [PMID: 35581395 PMCID: PMC9114337 DOI: 10.1038/s41598-022-12341-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Accepted: 05/09/2022] [Indexed: 12/05/2022] Open
Abstract
Humans show metacontrol of decision making, that is, they adapt their reliance on decision-making strategies toward situational differences such as differences in reward magnitude. Specifically, when higher rewards are at stake, individuals increase reliance on a more accurate but cognitively effortful strategy. We investigated whether the personality trait Need for Cognition (NFC) explains individual differences in metacontrol. Based on findings of cognitive effort expenditure in executive functions, we expected more metacontrol in individuals low in NFC. In two independent studies, metacontrol was assessed by means of a decision-making task that dissociates different reinforcement-learning strategies and in which reward magnitude was manipulated across trials. In contrast to our expectations, NFC did not account for individual differences in metacontrol of decision making. In fact, a Bayesian analysis provided moderate to strong evidence against a relationship between NFC and metacontrol. Beyond this, there was no consistent evidence for a relationship between NFC and overall model-based decision making. These findings show that the effect of rewards on the engagement of effortful decision-making strategies is largely independent of the intrinsic motivation for engaging in cognitively effortful tasks and suggest a differential role of NFC for the regulation of cognitive effort in decision making and executive functions.
Affiliation(s)
- Florian Bolenz
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Institute for Human Development, Lentzeallee 94, 14195, Berlin, Germany; Cluster of Excellence "Science of Intelligence", Technische Universität Berlin, Berlin, Germany.
| | - Maxine F Profitt
- Department of Psychology, Concordia University, Montreal, Canada
| | - Fabian Stechbarth
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
| | - Ben Eppinger
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Department of Psychology, Concordia University, Montreal, Canada; PERFORM Centre, Concordia University, Montreal, Canada
| | - Alexander Strobel
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
|
91
|
Patt VM, Palombo DJ, Esterman M, Verfaellie M. Hippocampal Contribution to Probabilistic Feedback Learning: Modeling Observation- and Reinforcement-based Processes. J Cogn Neurosci 2022; 34:1429-1446. [PMID: 35604353 DOI: 10.1162/jocn_a_01873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
Simple probabilistic reinforcement learning is recognized as a striatum-based learning system, but in recent years, has also been associated with hippocampal involvement. This study examined whether such involvement may be attributed to observation-based learning (OL) processes, running in parallel to striatum-based reinforcement learning. A computational model of OL, mirroring classic models of reinforcement-based learning (RL), was constructed and applied to the neuroimaging data set of Palombo, Hayes, Reid, and Verfaellie (2019; Hippocampal contributions to value-based learning: Converging evidence from fMRI and amnesia. Cognitive, Affective & Behavioral Neuroscience, 19(3), 523-536). Results suggested that OL processes may indeed take place concomitantly to reinforcement learning and involve activation of the hippocampus and central orbitofrontal cortex. However, rather than independent mechanisms running in parallel, the brain correlates of the OL and RL prediction errors indicated collaboration between systems, with direct implication of the hippocampus in computations of the discrepancy between the expected and actual reinforcing values of actions. These findings are consistent with previous accounts of a role for the hippocampus in encoding the strength of observed stimulus-outcome associations, with updating of such associations through striatal reinforcement-based computations. In addition, enhanced negative RL prediction error signaling was found in the anterior insula with greater use of OL over RL processes. This result may suggest an additional mode of collaboration between the OL and RL systems, implicating the error monitoring network.
Affiliation(s)
- Virginie M Patt
- VA Boston Healthcare System, MA; Boston University School of Medicine, MA
| | | | - Michael Esterman
- VA Boston Healthcare System, MA; Boston University School of Medicine, MA
| | - Mieke Verfaellie
- VA Boston Healthcare System, MA; Boston University School of Medicine, MA
|
92
|
Rational arbitration between statistics and rules in human sequence processing. Nat Hum Behav 2022; 6:1087-1103. [PMID: 35501360 DOI: 10.1038/s41562-021-01259-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/06/2020] [Accepted: 11/17/2021] [Indexed: 01/29/2023]
Abstract
Detecting and learning temporal regularities is essential to accurately predict the future. A long-standing debate in cognitive science concerns the existence in humans of a dissociation between two systems, one for handling statistical regularities governing the probabilities of individual items and their transitions, and another for handling deterministic rules. Here, to address this issue, we used finger tracking to continuously monitor the online build-up of evidence, confidence, false alarms and changes-of-mind during sequence processing. All these aspects of behaviour conformed tightly to a hierarchical Bayesian inference model with distinct hypothesis spaces for statistics and rules, yet linked by a single probabilistic currency. Alternative models based either on a single statistical mechanism or on two non-commensurable systems were rejected. Our results indicate that a hierarchical Bayesian inference mechanism, capable of operating over distinct hypothesis spaces for statistics and rules, underlies the human capability for sequence processing.
|
93
|
Lei Y, Solway A. Conflict and competition between model-based and model-free control. PLoS Comput Biol 2022; 18:e1010047. [PMID: 35511764 PMCID: PMC9070915 DOI: 10.1371/journal.pcbi.1010047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Accepted: 03/22/2022] [Indexed: 11/25/2022] Open
Abstract
A large literature has accumulated suggesting that human and animal decision making is driven by at least two systems, and that important functions of these systems can be captured by reinforcement learning algorithms. The "model-free" system caches and uses stimulus-value or stimulus-response associations, and the "model-based" system implements more flexible planning using a model of the world. However, it is not clear how the two systems interact during deliberation and how a single decision emerges from this process, especially when they disagree. Most previous work has assumed that while the systems operate in parallel, they do so independently, and they combine linearly to influence decisions. Using an integrated reinforcement learning/drift-diffusion model, we tested the hypothesis that the two systems interact in a non-linear fashion similar to other situations with cognitive conflict. We differentiated two forms of conflict: action conflict, a binary state representing whether the systems disagreed on the best action, and value conflict, a continuous measure of the extent to which the two systems disagreed on the difference in value between the available options. We found that decisions with greater value conflict were characterized by reduced model-based control and increased caution both with and without action conflict. Action conflict itself (the binary state) acted in the opposite direction, although its effects were less prominent. We also found that between-system conflict was highly correlated with within-system conflict, and although it is less clear a priori why the latter might influence the strength of each system above its standard linear contribution, we could not rule it out. Our work highlights the importance of non-linear conflict effects, and provides new constraints for more detailed process models of decision making. It also presents new avenues to explore with relation to disorders of compulsivity, where an imbalance between systems has been implicated.
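The two conflict measures defined in this abstract can be illustrated with a minimal Python sketch. It assumes a two-option choice and takes the definitions literally from the abstract; the function names, the tie-handling rule, and the mixing weight `w` are hypothetical and are not taken from the paper, whose integrated reinforcement learning/drift-diffusion model is considerably richer.

```python
def conflict_measures(q_mb, q_mf):
    """Return (action_conflict, value_conflict) for a two-option choice.

    q_mb, q_mf: pairs of option values from a hypothetical model-based
    and a hypothetical model-free system.
    """
    diff_mb = q_mb[0] - q_mb[1]
    diff_mf = q_mf[0] - q_mf[1]
    # Binary state: do the two systems prefer different options?
    # (Assumption: a tie counts as not preferring the first option.)
    action_conflict = (diff_mb > 0) != (diff_mf > 0)
    # Continuous measure: how much the systems disagree about the
    # difference in value between the two options.
    value_conflict = abs(diff_mb - diff_mf)
    return action_conflict, value_conflict


def linear_mixture(q_mb, q_mf, w):
    """Standard linear combination that the abstract contrasts with
    non-linear conflict effects; w is the model-based weight."""
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]


if __name__ == "__main__":
    print(conflict_measures((1.0, 0.0), (0.25, 0.75)))
    print(linear_mixture((1.0, 0.0), (0.0, 1.0), 0.5))
```

Note that value conflict can be large even when both systems favour the same option, which is why the abstract reports its effects both with and without action conflict.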
Affiliation(s)
- Yuqing Lei
- Department of Psychology, University of Maryland-College Park, College Park, Maryland, United States of America
| | - Alec Solway
- Department of Psychology, University of Maryland-College Park, College Park, Maryland, United States of America
- Program in Neuroscience and Cognitive Science, University of Maryland-College Park, College Park, Maryland, United States of America
|
94
|
Morris RW, Dezfouli A, Griffiths KR, Le Pelley ME, Balleine BW. The Neural Bases of Action-Outcome Learning in Humans. J Neurosci 2022; 42:3636-3647. [PMID: 35296548 PMCID: PMC9053851 DOI: 10.1523/jneurosci.1079-21.2022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/24/2021] [Revised: 02/17/2022] [Accepted: 02/22/2022] [Indexed: 11/21/2022] Open
Abstract
From an associative perspective the acquisition of new goal-directed actions requires the encoding of specific action-outcome (AO) associations and, therefore, sensitivity to the validity of an action as a predictor of a specific outcome relative to other events. Although competitive architectures have been proposed within associative learning theory to achieve this kind of identity-based selection, whether and how these architectures are implemented by the brain is still a matter of conjecture. To investigate this issue, we trained human participants to encode various AO associations while undergoing functional neuroimaging (fMRI). We then degraded one AO contingency by increasing the probability of the outcome in the absence of its associated action while keeping other AO contingencies intact. We found that this treatment selectively reduced performance of the degraded action. Furthermore, when a signal predicted the unpaired outcome, performance of the action was restored, suggesting that the degradation effect reflects competition between the action and the context for prediction of the specific outcome. We used a Kalman filter to model the contribution of different causal variables to AO learning and found that activity in the medial prefrontal cortex (mPFC) and the dorsal anterior cingulate cortex (dACC) tracked changes in the association of the action and context, respectively, with regard to the specific outcome. Furthermore, we found the mPFC participated in a network with the striatum and posterior parietal cortex to segregate the influence of the various competing predictors to establish specific AO associations.
Significance Statement: Humans and other animals learn the consequences of their actions, allowing them to control their environment in a goal-directed manner. Nevertheless, it is unknown how we parse environmental causes from the effects of our own actions to establish these specific action-outcome (AO) relationships. Here, we show that the brain learns the causal structure of the environment by segregating the unique influence of actions from other causes in the medial prefrontal and anterior cingulate cortices and, through a network of structures, including the caudate nucleus and posterior parietal cortex, establishes the distinct causal relationships from which specific AO associations are formed.
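The competition between action and context described in this abstract can be sketched with a generic Kalman filter over associative weights. This is an illustrative sketch under stated assumptions, not the authors' model: the two-cue coding (`[action performed, context present]`), the noise parameters, the deterministic trial schedules, and the names `kalman_update` and `learn` are all hypothetical.

```python
def kalman_update(weights, cov, cues, outcome, obs_noise=1.0, drift=0.01):
    """One Kalman-filter step for a linear outcome prediction.

    weights: mean associative strength of each cue
    cov:     covariance (uncertainty) over the weights
    cues:    cue vector for this trial
    """
    n = len(weights)
    # Assumed random-walk drift: uncertainty grows slightly every trial.
    cov = [[cov[i][j] + (drift if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    prediction = sum(w * x for w, x in zip(weights, cues))
    error = outcome - prediction
    # cov @ cues, reused for both the gain and the covariance update.
    cov_x = [sum(cov[i][j] * cues[j] for j in range(n)) for i in range(n)]
    denom = sum(cues[i] * cov_x[i] for i in range(n)) + obs_noise
    gain = [c / denom for c in cov_x]
    weights = [w + g * error for w, g in zip(weights, gain)]
    cov = [[cov[i][j] - gain[i] * cov_x[j] for j in range(n)]
           for i in range(n)]
    return weights, cov


def learn(trials):
    """Run the filter over (cues, outcome) trials from a flat prior."""
    weights = [0.0, 0.0]
    cov = [[1.0, 0.0], [0.0, 1.0]]
    for cues, outcome in trials:
        weights, cov = kalman_update(weights, cov, cues, outcome)
    return weights


# cues = [action performed, context present]
# Intact contingency: the outcome occurs only when the action is performed.
intact = [([1.0, 1.0], 1.0), ([0.0, 1.0], 0.0)] * 50
# Degraded contingency: the outcome also occurs without the action.
degraded = [([1.0, 1.0], 1.0), ([0.0, 1.0], 1.0)] * 50

if __name__ == "__main__":
    print("intact   [action, context]:", learn(intact))
    print("degraded [action, context]:", learn(degraded))
```

In this toy schedule the context absorbs the predictive weight once outcomes arrive without the action, which is the qualitative pattern of contingency degradation the abstract describes.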
Affiliation(s)
- Richard W Morris
- Centre for Translational Data Science, University of Sydney, Sydney, NSW 2006, Australia
| | - Amir Dezfouli
- Data61, Commonwealth Scientific and Industrial Research Organisation, Sydney, NSW 2015, Australia
| | - Kristi R Griffiths
- Brain Dynamics Centre, Westmead Institute for Medical Research, University of Sydney, Sydney, NSW 2145, Australia
| | - Mike E Le Pelley
- School of Psychology, University of New South Wales Sydney, Sydney, NSW 2052, Australia
| | - Bernard W Balleine
- School of Psychology, University of New South Wales Sydney, Sydney, NSW 2052, Australia
|
95
|
Siestrup S, Jainta B, El-Sourani N, Trempler I, Wurm MF, Wolf OT, Cheng S, Schubotz RI. What Happened When? Cerebral Processing of Modified Structure and Content in Episodic Cueing. J Cogn Neurosci 2022; 34:1287-1305. [PMID: 35552744 DOI: 10.1162/jocn_a_01862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
Episodic memories are not static but can change on the basis of new experiences, potentially allowing us to make valid predictions in the face of an ever-changing environment. Recent research has identified prediction errors during memory retrieval as a possible trigger for such changes. In this study, we used modified episodic cues to investigate whether different types of mnemonic prediction errors modulate brain activity and subsequent memory performance. Participants encoded episodes that consisted of short toy stories. During a subsequent fMRI session, participants were presented videos showing the original episodes, or slightly modified versions thereof. In modified videos, either the order of two subsequent action steps was changed or an object was exchanged for another. Content modifications recruited parietal, temporo-occipital, and parahippocampal areas reflecting the processing of the new object information. In contrast, structure modifications elicited activation in right dorsal premotor, posterior temporal, and parietal areas, reflecting the processing of new sequence information. In a post-fMRI memory test, the participants' tendency to accept modified episodes as originally encoded increased significantly when they had been presented modified versions already during the fMRI session. After experiencing modifications, especially those of the episodes' structure, the recognition of originally encoded episodes was impaired as well. Our study sheds light onto the neural processing of different types of episodic prediction errors and their influence on subsequent memory recall.
|
96
|
Karvelis P, Diaconescu AO. A Computational Model of Hopelessness and Active-Escape Bias in Suicidality. Computational Psychiatry (Cambridge, Mass.) 2022; 6:34-59. [PMID: 38774778 PMCID: PMC11104346 DOI: 10.5334/cpsy.80] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/07/2021] [Accepted: 02/15/2022] [Indexed: 12/27/2022]
Abstract
Currently, psychiatric practice lacks reliable predictive tools and a sufficiently detailed mechanistic understanding of suicidal thoughts and behaviors (STB) to provide timely and personalized interventions. Developing computational models of STB that integrate across behavioral, cognitive and neural levels of analysis could help better understand STB vulnerabilities and guide personalized interventions. To that end, we present a computational model based on the active inference framework. With this model, we show that several STB risk markers - hopelessness, Pavlovian bias and active-escape bias - are interrelated via the drive to maximize one's model evidence. We propose four ways in which these effects can arise: (1) increased learning from aversive outcomes, (2) reduced belief decay in response to unexpected outcomes, (3) increased stress sensitivity and (4) reduced sense of stressor controllability. These proposals stem from considering the neurocircuits implicated in STB: how the locus coeruleus - norepinephrine (LC-NE) system together with the amygdala (Amy), the dorsal prefrontal cortex (dPFC) and the anterior cingulate cortex (ACC) mediate learning in response to acute stress and volatility as well as how the dorsal raphe nucleus - serotonin (DRN-5-HT) system together with the ventromedial prefrontal cortex (vmPFC) mediate stress reactivity based on perceived stressor controllability. We validate the model by simulating performance in an Avoid/Escape Go/No-Go task replicating recent behavioral findings. This serves as a proof of concept and provides a computational hypothesis space that can be tested empirically and be used to distinguish planful versus impulsive STB subtypes. We discuss the relevance of the proposed model for treatment response prediction, including pharmacotherapy and psychotherapy, as well as sex differences as it relates to stress reactivity and suicide risk.
Affiliation(s)
- Povilas Karvelis
- Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, Ontario, Canada
| | - Andreea O. Diaconescu
- Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, Ontario, Canada
- University of Toronto, Department of Psychiatry, Toronto, Ontario, Canada
- Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
- Department of Psychology, University of Toronto, Toronto, ON, Canada
|
97
|
Stress-sensitive inference of task controllability. Nat Hum Behav 2022; 6:812-822. [PMID: 35273354 DOI: 10.1038/s41562-022-01306-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/25/2020] [Accepted: 01/17/2022] [Indexed: 11/08/2022]
Abstract
Estimating the controllability of the environment enables agents to better predict upcoming events and decide when to engage controlled action selection. How does the human brain estimate controllability? Trial-by-trial analysis of choices, decision times and neural activity in an explore-and-predict task demonstrate that humans solve this problem by comparing the predictions of an 'actor' model with those of a reduced 'spectator' model of their environment. Neural blood oxygen level-dependent responses within striatal and medial prefrontal areas tracked the instantaneous difference in the prediction errors generated by these two statistical learning models. Blood oxygen level-dependent activity in the posterior cingulate, temporoparietal and prefrontal cortices covaried with changes in estimated controllability. Exposure to inescapable stressors biased controllability estimates downward and increased reliance on the spectator model in an anxiety-dependent fashion. Taken together, these findings provide a mechanistic account of controllability inference and its distortion by stress exposure.
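The actor/spectator comparison in this abstract can be sketched as two count-based predictors whose prediction errors are contrasted trial by trial. This is a minimal illustration under stated assumptions, not the study's model: the Laplace-smoothed counts, the two-state toy environments, the fixed action sequence, and the names `CountModel`, `controllability_signal` and `simulate` are all hypothetical.

```python
from collections import defaultdict


class CountModel:
    """Predicts the next state from a context key using smoothed counts."""

    def __init__(self, n_states):
        # Each unseen key starts with one pseudo-count per next state.
        self.counts = defaultdict(lambda: [1.0] * n_states)

    def prob(self, key, next_state):
        c = self.counts[key]
        return c[next_state] / sum(c)

    def update(self, key, next_state):
        self.counts[key][next_state] += 1.0


def controllability_signal(transitions, n_states=2):
    """Mean (spectator error - actor error) over (s, a, s') transitions.

    The actor conditions on state and action; the reduced spectator
    conditions on state only. A positive value means that knowing the
    action improved prediction.
    """
    actor = CountModel(n_states)
    spectator = CountModel(n_states)
    total = 0.0
    for state, action, next_state in transitions:
        pe_actor = 1.0 - actor.prob((state, action), next_state)
        pe_spectator = 1.0 - spectator.prob(state, next_state)
        total += pe_spectator - pe_actor
        actor.update((state, action), next_state)
        spectator.update(state, next_state)
    return total / len(transitions)


def simulate(controllable, n_trials=100):
    """Toy environment: the action sets the next state when controllable;
    otherwise the state simply alternates regardless of the action."""
    actions = [0, 0, 1, 1] * (n_trials // 4)
    state, transitions = 0, []
    for action in actions:
        next_state = action if controllable else 1 - state
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


if __name__ == "__main__":
    print("controllable:  ", controllability_signal(simulate(True)))
    print("uncontrollable:", controllability_signal(simulate(False)))
```

In the uncontrollable toy environment the signal can dip slightly below zero, because the actor spreads the same observations over more state-action keys and therefore learns more slowly than the spectator.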
|
98
|
Animal models of action control and cognitive dysfunction in Parkinson's disease. Progress in Brain Research 2022; 269:227-255. [PMID: 35248196 DOI: 10.1016/bs.pbr.2022.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Parkinson's disease (PD) has historically been considered a motor disorder induced by a loss of dopaminergic neurons in the substantia nigra pars compacta. More recently, it has been recognized to have significant non-motor symptoms, most prominently cognitive symptoms associated with a dysexecutive syndrome. It is common in the literature to see motor and cognitive symptoms treated separately and, indeed, there has been a general call for specialized treatment of the latter, particularly in the more severe cases of PD with mild cognitive impairment and dementia. Animal studies have similarly been developed to model the motor or non-motor symptoms. Nevertheless, considerable research has established that segregating consideration of cognition from the precursors to motor movement, particularly movement associated with goal-directed action, is difficult if not impossible. Indeed, on some contemporary views cognition is embodied in action control, something that is particularly prevalent in theory and evidence relating to the integration of goal-directed and habitual control processes. The current paper addresses these issues within the literature detailing animal models of cognitive dysfunction in PD and their neural and neurochemical bases. Generally, studies using animal models of PD provide some of the clearest evidence for the integration of these action control processes at multiple levels of analysis and imply that consideration of this integrative process may have significant benefits for developing new approaches to the treatment of PD.
|
99
|
UAV swarm path planning with reinforcement learning for field prospecting. Appl Intell 2022. [DOI: 10.1007/s10489-022-03254-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
There has been steady growth in the adoption of Unmanned Aerial Vehicle (UAV) swarms by operators due to their time and cost benefits. However, this kind of system faces an important problem, which is the calculation of many optimal paths for each UAV. Solving this problem would allow control of many UAVs without human intervention while saving battery between recharges and performing several tasks simultaneously. The main aim is to develop a Reinforcement Learning based system capable of calculating the optimal flight path for a UAV swarm. This method stands out for its ability to learn through trial and error, allowing the model to adjust itself. The aim of these paths is to achieve full coverage of an overflight area for tasks such as field prospection, regardless of map size and the number of UAVs in the swarm. It is not necessary to establish targets or to have any previous knowledge other than the given map. Experiments have been conducted to determine whether it is optimal to establish a single control for all UAVs in the swarm or a control for each UAV. The results show that it is better to use one control for all UAVs because of the shorter flight time. In addition, the flight time is greatly affected by the size of the map. The results give starting points for future research, such as finding the optimal map size for each situation.
|
100
|
Triche A, Maida AS, Kumar A. Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration-exploitation balance with bio-inspired neural networks. Neural Netw 2022; 151:16-33. [DOI: 10.1016/j.neunet.2022.03.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Revised: 03/08/2022] [Accepted: 03/14/2022] [Indexed: 10/18/2022]
|