1
Castro-Rodrigues P, Akam T, Snorasson I, Camacho M, Paixão V, Maia A, Barahona-Corrêa JB, Dayan P, Simpson HB, Costa RM, Oliveira-Maia AJ. Explicit knowledge of task structure is a primary determinant of human model-based action. Nat Hum Behav 2022; 6:1126-1141. [PMID: 35589826 DOI: 10.1038/s41562-022-01346-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/11/2020] [Revised: 03/19/2022] [Accepted: 03/31/2022] [Indexed: 11/09/2022]
Abstract
Explicit information obtained through instruction profoundly shapes human choice behaviour. However, this has been studied in computationally simple tasks, and it is unknown how model-based and model-free systems, respectively generating goal-directed and habitual actions, are affected by the absence or presence of instructions. We assessed behaviour in a variant of a computationally more complex decision-making task, before and after providing information about task structure, both in healthy volunteers and in individuals suffering from obsessive-compulsive or other disorders. Initial behaviour was model-free, with rewards directly reinforcing preceding actions. Model-based control, employing predictions of states resulting from each action, emerged with experience in a minority of participants, and less in those with obsessive-compulsive disorder. Providing task structure information strongly increased model-based control, similarly across all groups. Thus, in humans, explicit task structural knowledge is a primary determinant of model-based reinforcement learning and is most readily acquired from instruction rather than experience.
Affiliation(s)
- Pedro Castro-Rodrigues
  - Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal; Centro Hospitalar Psiquiátrico de Lisboa, Lisbon, Portugal
- Thomas Akam
  - Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; Department of Experimental Psychology, University of Oxford, Oxford, UK
- Ivar Snorasson
  - Center for Obsessive-Compulsive & Related Disorders, New York State Psychiatric Institute, New York, NY, USA
- Marta Camacho
  - Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; John Van Geest Center for Brain Repair, University of Cambridge, Cambridge, UK
- Vitor Paixão
  - Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal
- Ana Maia
  - Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal; Department of Psychiatry and Mental Health, Centro Hospitalar de Lisboa Ocidental, Lisbon, Portugal
- J Bernardo Barahona-Corrêa
  - Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal
- Peter Dayan
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany; The University of Tübingen, Tübingen, Germany
- H Blair Simpson
  - Center for Obsessive-Compulsive & Related Disorders, New York State Psychiatric Institute, New York, NY, USA; Department of Psychiatry, Columbia University, New York, NY, USA
- Rui M Costa
  - Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Albino J Oliveira-Maia
  - Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal; Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal; NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal
2
Allen TA, Schreiber AM, Hall NT, Hallquist MN. From Description to Explanation: Integrating Across Multiple Levels of Analysis to Inform Neuroscientific Accounts of Dimensional Personality Pathology. J Pers Disord 2020; 34:650-676. [PMID: 33074057 PMCID: PMC7583665 DOI: 10.1521/pedi.2020.34.5.650] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]
Abstract
Dimensional approaches to psychiatric nosology are rapidly transforming the way researchers and clinicians conceptualize personality pathology, leading to a growing interest in describing how individuals differ from one another. Yet, in order to successfully prevent and treat personality pathology, it is also necessary to explain the sources of these individual differences. The emerging field of personality neuroscience is well-positioned to guide the transition from description to explanation within personality pathology research. However, establishing comprehensive, mechanistic accounts of personality pathology will require personality neuroscientists to move beyond atheoretical studies that link trait differences to neural correlates without considering the algorithmic processes that are carried out by those correlates. We highlight some of the dangers we see in overpopulating personality neuroscience with brain-trait associational studies and offer a series of recommendations for personality neuroscientists seeking to build explanatory theories of personality pathology.
Affiliation(s)
- Nathan T. Hall
  - Department of Psychology, The Pennsylvania State University
3
Task complexity interacts with state-space uncertainty in the arbitration between model-based and model-free learning. Nat Commun 2019; 10:5738. [PMID: 31844060 PMCID: PMC6915739 DOI: 10.1038/s41467-019-13632-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/24/2018] [Accepted: 11/11/2019] [Indexed: 12/11/2022]
Abstract
It has previously been shown that the relative reliability of model-based and model-free reinforcement-learning (RL) systems plays a role in the allocation of behavioral control between them. However, the role of task complexity in the arbitration between these two strategies remains largely unknown. Here, using a combination of novel task design, computational modelling, and model-based fMRI analysis, we examined the role of task complexity alongside state-space uncertainty in the arbitration process. Participants tended to increase model-based RL control in response to increasing task complexity. However, they resorted to model-free RL when both uncertainty and task complexity were high, suggesting that these two variables interact during the arbitration process. Computational fMRI revealed that task complexity interacts with neural representations of the reliability of the two systems in the inferior prefrontal cortex.
The brain dynamically arbitrates between model-based and model-free reinforcement learning (RL). Here, the authors show that participants tended to increase model-based control in response to increasing task complexity, but resorted to model-free control when both uncertainty and task complexity were high.
4
Hannikainen IR, Machery E, Rose D, Stich S, Olivola CY, Sousa P, Cova F, Buchtel EE, Alai M, Angelucci A, Berniûnas R, Chatterjee A, Cheon H, Cho IR, Cohnitz D, Dranseika V, Eraña Lagos Á, Ghadakpour L, Grinberg M, Hashimoto T, Horowitz A, Hristova E, Jraissati Y, Kadreva V, Karasawa K, Kim H, Kim Y, Lee M, Mauro C, Mizumoto M, Moruzzi S, Ornelas J, Osimani B, Romero C, Rosas López A, Sangoi M, Sereni A, Songhorian S, Struchiner N, Tripodi V, Usui N, Vázquez Del Mercado A, Vosgerichian HA, Zhang X, Zhu J. For Whom Does Determinism Undermine Moral Responsibility? Surveying the Conditions for Free Will Across Cultures. Front Psychol 2019; 10:2428. [PMID: 31749739 PMCID: PMC6848273 DOI: 10.3389/fpsyg.2019.02428] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/30/2019] [Accepted: 10/14/2019] [Indexed: 11/22/2022]
Abstract
Philosophers have long debated whether, if determinism is true, we should hold people morally responsible for their actions since in a deterministic universe, people are arguably not the ultimate source of their actions nor could they have done otherwise if initial conditions and the laws of nature are held fixed. To reveal how non-philosophers ordinarily reason about the conditions for free will, we conducted a cross-cultural and cross-linguistic survey (N = 5,268) spanning twenty countries and sixteen languages. Overall, participants tended to ascribe moral responsibility whether the perpetrator lacked sourcehood or alternate possibilities. However, for American, European, and Middle Eastern participants, being the ultimate source of one’s actions promoted perceptions of free will and control as well as ascriptions of blame and punishment. By contrast, being the source of one’s actions was not particularly salient to Asian participants. Finally, across cultures, participants exhibiting greater cognitive reflection were more likely to view free will as incompatible with causal determinism. We discuss these findings in light of documented cultural differences in the tendency toward dispositional versus situational attributions.
Affiliation(s)
- Ivar R Hannikainen
  - Department of Law, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
- Edouard Machery
  - Department of History and Philosophy of Science, University of Pittsburgh, Pittsburgh, PA, United States
- David Rose
  - Department of Philosophy, Florida State University, Tallahassee, FL, United States
- Stephen Stich
  - Department of Philosophy, Rutgers University, New Brunswick, NJ, United States
- Christopher Y Olivola
  - Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, United States
- Paulo Sousa
  - Institute of Cognition and Culture, Queen's University, Belfast, United Kingdom
- Florian Cova
  - Department of Philosophy, University of Geneva, Geneva, Switzerland
- Emma E Buchtel
  - Department of Psychology, The Education University of Hong Kong, Tai Po, Hong Kong
- Mario Alai
  - Department of Pure and Applied Sciences, University of Urbino Carlo Bo, Urbino, Italy
- Adriano Angelucci
  - Department of Pure and Applied Sciences, University of Urbino Carlo Bo, Urbino, Italy
- Amita Chatterjee
  - School of Cognitive Science, Jadavpur University, Kolkata, India
- Hyundeuk Cheon
  - Department of Philosophy, Seoul National University, Seoul, South Korea
- In-Rae Cho
  - Department of Philosophy, Seoul National University, Seoul, South Korea
- Daniel Cohnitz
  - Department of Philosophy and Religious Studies, Utrecht University, Utrecht, Netherlands
- Maurice Grinberg
  - Department of Cognitive Science and Psychology, New Bulgarian University, Sofia, Bulgaria
- Amir Horowitz
  - Department of History, Philosophy and Judaic Studies, Open University of Israel, Ra'anana, Israel
- Evgeniya Hristova
  - Department of Cognitive Science and Psychology, New Bulgarian University, Sofia, Bulgaria
- Yasmina Jraissati
  - Department of Philosophy, American University of Beirut, Beirut, Lebanon
- Veselina Kadreva
  - Department of Cognitive Science and Psychology, New Bulgarian University, Sofia, Bulgaria
- Kaori Karasawa
  - Department of Social Psychology, University of Tokyo, Tokyo, Japan
- Hackjin Kim
  - Department of Psychology, Korea University, Seoul, South Korea
- Yeonjeong Kim
  - Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, United States
- Minwoo Lee
  - Department of Psychology, Korea University, Seoul, South Korea
- Masaharu Mizumoto
  - School of Knowledge Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
- Sebastiano Moruzzi
  - Department of Philosophy and Communication Studies, University of Bologna, Bologna, Italy
- Jorge Ornelas
  - Faculty of Social Sciences and Humanities, Universidad Autónoma de San Luis Potosí, San Luis Potosí, Mexico
- Barbara Osimani
  - Munich Center for Mathematical Philosophy, Ludwig Maximilians Universität, Munich, Germany
- Carlos Romero
  - Instituto de Investigaciones Filosóficas-UNAM, Mexico City, Mexico
- Massimo Sangoi
  - Department of Pure and Applied Sciences, University of Urbino Carlo Bo, Urbino, Italy
- Andrea Sereni
  - Faculty of Philosophy, Scuola Universitaria Superiore IUSS, Pavia, Italy
- Sarah Songhorian
  - Faculty of Philosophy, Vita-Salute San Raffaele University, Milan, Italy
- Noel Struchiner
  - Department of Law, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
- Vera Tripodi
  - Department of Philosophy and Educational Sciences, University of Turin, Turin, Italy
- Naoki Usui
  - Department of Humanities, Mie University, Tsu, Japan
- Hrag A Vosgerichian
  - Department of History, Philosophy and Judaic Studies, Open University of Israel, Ra'anana, Israel
- Xueyi Zhang
  - School of Humanities, Southeast University, Nanjing, China
- Jing Zhu
  - School of Information Management, Sun Yat-sen University, Guangzhou, China
5
Hasz BM, Redish AD. Deliberation and Procedural Automation on a Two-Step Task for Rats. Front Integr Neurosci 2018; 12:30. [PMID: 30123115 PMCID: PMC6085996 DOI: 10.3389/fnint.2018.00030] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 12/08/2017] [Accepted: 07/02/2018] [Indexed: 11/25/2022]
Abstract
Current theories suggest that decision-making arises from multiple, competing action-selection systems. Rodent studies dissociate deliberation and procedural behavior, and find a transition from deliberative to procedural behavior with experience. However, it remains unknown how this transition from deliberative to procedural control evolves within single trials, or within blocks of repeated choices. We adapted for rats a two-step task which has been used to dissociate model-based from model-free decisions in humans. We found that a mixture of model-based and model-free algorithms was more likely to explain rat choice strategies on the task than either model-based or model-free algorithms alone. This task contained two choices per trial, which provides a more complex and non-discrete per-trial choice structure. This task structure enabled us to evaluate how deliberative and procedural behavior evolved within-trial and within blocks of repeated choice sequences. We found that vicarious trial and error (VTE), a behavioral correlate of deliberation in rodents, was correlated between the two choice points on a given lap. We also found that behavioral stereotypy, a correlate of procedural automation, increased with the number of repeated choices. While VTE at the first choice point decreased [corrected] with the number of repeated choices, VTE at the second choice point did not, and only increased after unexpected transitions within the task. This suggests that deliberation at the beginning of trials may correspond to changes in choice patterns, while mid-trial deliberation may correspond to an interruption of a procedural process.
Affiliation(s)
- Brendan M. Hasz
  - Graduate Program in Neuroscience, University of Minnesota Twin Cities, Minneapolis, MN, United States
- A. David Redish
  - Department of Neuroscience, University of Minnesota Twin Cities, Minneapolis, MN, United States
6
A simple computational algorithm of model-based choice preference. Cogn Affect Behav Neurosci 2018; 17:764-783. [PMID: 28573384 DOI: 10.3758/s13415-017-0511-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]
Abstract
A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences, namely the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process. Choice data from 23 participants showed a better fit with the proposed models. More specifically, the proposed eligibility adjustment model, which assumes that the environmental model can weight the degree of the eligibility trace, can explain choices better under both model-free and model-based controls and has a simpler computational algorithm than the original model. In addition, the forgetting learning model and its variation, which assume changes in the values of unchosen actions, substantially improved the fits to the data. Overall, we show that a hybrid computational model best fits the data. The parameters used in this model succeed in capturing individual tendencies with respect to both model use in learning and exploration behavior. This computational model provides novel insights into learning with interacting model-free and model-based components.
7
Fakhari P, Khodadadi A, Busemeyer JR. The detour problem in a stochastic environment: Tolman revisited. Cogn Psychol 2018; 101:29-49. [PMID: 29294373 DOI: 10.1016/j.cogpsych.2017.12.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/06/2017] [Revised: 12/22/2017] [Accepted: 12/23/2017] [Indexed: 10/18/2022]
Abstract
We designed a grid world task to study human planning and re-planning behavior in an unknown stochastic environment. In our grid world, participants were asked to travel from a random starting point to a random goal position while maximizing their reward. Because they were not familiar with the environment, they needed to learn its characteristics from experience to plan optimally. Later in the task, we randomly blocked the optimal path to investigate whether and how people adjust their original plans to find a detour. To this end, we developed and compared 12 different models. These models differed in how they learned and represented the environment and how they planned to reach the goal. The majority of our participants were able to plan optimally. We also showed that people were capable of revising their plans when an unexpected event occurred. The results of the model comparison showed that the model-based reinforcement learning approach provided the best account of the data and outperformed heuristics in explaining the behavioral data in the re-planning trials.
Affiliation(s)
- Pegah Fakhari
  - Indiana University, Department of Psychological and Brain Sciences, Bloomington, IN, United States
- Arash Khodadadi
  - Indiana University, Department of Psychological and Brain Sciences, Bloomington, IN, United States
- Jerome R Busemeyer
  - Indiana University, Department of Psychological and Brain Sciences, Bloomington, IN, United States
8
Abstract
Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to “model-free” and “model-based” strategies in reinforcement learning. Model-free strategies are computationally cheap, but sometimes inaccurate, because action values can be accessed by inspecting a look-up table constructed through trial-and-error. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding. It is assumed that this trade-off between accuracy and computational demand plays an important role in the arbitration between the two strategies, but we show that the hallmark task for dissociating model-free and model-based strategies, as well as several related variants, do not embody such a trade-off. We describe five factors that reduce the effectiveness of the model-based strategy on these tasks by reducing its accuracy in estimating reward outcomes and decreasing the importance of its choices. Based on these observations, we describe a version of the task that formally and empirically obtains an accuracy-demand trade-off between model-free and model-based strategies. Moreover, we show that human participants spontaneously increase their reliance on model-based control on this task, compared to the original paradigm. Our novel task and our computational analyses may prove important in subsequent empirical investigations of how humans balance accuracy and demand.
When you make a choice about what groceries to get for dinner, you can rely on two different strategies. You can make your choice by relying on habit, simply buying the items you need to make a meal that is second nature to you.
However, you can also plan your actions in a more deliberative way, realizing that the friend who will join you is a vegetarian, and therefore you should not make the burgers that have become a staple in your cooking. These two strategies differ in how computationally demanding and accurate they are. While the habitual strategy is less computationally demanding (costs less effort and time), the deliberative strategy is more accurate. Scientists have been able to study the distinction between these strategies using a task that allows them to measure how much people rely on habit and planning strategies. Interestingly, we have discovered that in this task, the deliberative strategy does not increase performance accuracy, and hence does not induce a trade-off between accuracy and demand. We describe why this happens, and improve the task so that it embodies an accuracy-demand trade-off, providing evidence for theories of cost-based arbitration between cognitive strategies.
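As a rough illustration of the distinction drawn in this abstract, and not the task or model used in the paper, the following Python sketch contrasts a model-free learner, which nudges a cached look-up-table value toward each observed reward, with a model-based learner, which computes first-stage action values by planning through a transition model. The transition probabilities, state and action names, learning rate, and function names are all hypothetical and chosen only for the example.

```python
# Hypothetical two-stage environment: every number here is an illustrative assumption.
TRANSITIONS = {  # P(second-stage state | first-stage action)
    "left": {"A": 0.7, "B": 0.3},
    "right": {"A": 0.3, "B": 0.7},
}


def model_free_update(q_table, action, reward, alpha=0.1):
    """Cached-value update: move the chosen action's value toward the reward."""
    q_table[action] += alpha * (reward - q_table[action])
    return q_table


def model_based_values(state_values, transitions=TRANSITIONS):
    """Planning: expected second-stage value of each first-stage action."""
    return {
        action: sum(p * state_values[state] for state, p in probs.items())
        for action, probs in transitions.items()
    }


# Model-free: one rewarded "left" choice shifts only the cached value for "left".
q = {"left": 0.0, "right": 0.0}
q = model_free_update(q, "left", reward=1.0)

# Model-based: knowing that state B is currently worth more, planning favours
# the action that most often leads to B, without having to try it first.
mb = model_based_values({"A": 0.2, "B": 0.8})
print(q, mb)
```

The model-free update costs a single arithmetic step per trial, whereas the model-based computation sums over every reachable state, which is the computational-demand side of the trade-off the abstract discusses.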
9
Gaze data reveal distinct choice processes underlying model-based and model-free reinforcement learning. Nat Commun 2016; 7:12438. [PMID: 27511383 PMCID: PMC4987535 DOI: 10.1038/ncomms12438] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 09/23/2015] [Accepted: 07/03/2016] [Indexed: 11/08/2022]
Abstract
Organisms appear to learn and make decisions using different strategies known as model-free and model-based learning; the former is mere reinforcement of previously rewarded actions and the latter is a forward-looking strategy that involves evaluation of action-state transition probabilities. Prior work has used neural data to argue that both model-based and model-free learners implement a value comparison process at trial onset, but model-based learners assign more weight to forward-looking computations. Here using eye-tracking, we report evidence for a different interpretation of prior results: model-based subjects make their choices prior to trial onset. In contrast, model-free subjects tend to ignore model-based aspects of the task and instead seem to treat the decision problem as a simple comparison process between two differentially valued items, consistent with previous work on sequential-sampling models of decision making. These findings illustrate a problem with assuming that experimental subjects make their decisions at the same prescribed time.
10
Raio CM, Goldfarb EV, Lempert KM, Sokol-Hessner P. Classifying emotion regulation strategies. Nat Rev Neurosci 2016; 17:532. [PMID: 27277870 DOI: 10.1038/nrn.2016.78] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Affiliation(s)
- Candace M Raio
  - Center for Neural Science, New York University, 4 Washington Place, Room 909, New York, New York 10003, USA
- Elizabeth V Goldfarb
  - Department of Psychology, New York University, 6 Washington Place, Room 890, New York, New York 10003, USA
- Karolina M Lempert
  - Department of Psychology, New York University, 6 Washington Place, Room 890, New York, New York 10003, USA
- Peter Sokol-Hessner
  - Center for Neural Science, New York University, 4 Washington Place, Room 909, New York, New York 10003, USA; Department of Psychology, New York University, 6 Washington Place, Room 890, New York, New York 10003, USA
11
Sebold M, Schad DJ, Nebe S, Garbusow M, Jünger E, Kroemer NB, Kathmann N, Zimmermann US, Smolka MN, Rapp MA, Heinz A, Huys QJM. Don't Think, Just Feel the Music: Individuals with Strong Pavlovian-to-Instrumental Transfer Effects Rely Less on Model-based Reinforcement Learning. J Cogn Neurosci 2016; 28:985-95. [PMID: 26942321 DOI: 10.1162/jocn_a_00945] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 12/11/2022]
Abstract
Behavioral choice can be characterized along two axes. One axis distinguishes reflexive, model-free systems that slowly accumulate values through experience and a model-based system that uses knowledge to reason prospectively. The second axis distinguishes Pavlovian valuation of stimuli from instrumental valuation of actions or stimulus-action pairs. This results in four values and many possible interactions between them, with important consequences for accounts of individual variation. We here explored whether individual variation along one axis was related to individual variation along the other. Specifically, we asked whether individuals' balance between model-based and model-free learning was related to their tendency to show Pavlovian interferences with instrumental decisions. In two independent samples with a total of 243 participants, Pavlovian-instrumental transfer effects were negatively correlated with the strength of model-based reasoning in a two-step task. This suggests a potential common underlying substrate predisposing individuals to both have strong Pavlovian interference and be less model-based and provides a framework within which to interpret the observation of both effects in addiction.
Affiliation(s)
- Miriam Sebold
  - Charité-Universitätsmedizin Berlin; Humboldt-Universität zu Berlin
- Daniel J Schad
  - Charité-Universitätsmedizin Berlin; University of Potsdam
- Maria Garbusow
  - Charité-Universitätsmedizin Berlin; Humboldt-Universität zu Berlin
- Nils B Kroemer
  - Technische Universität Dresden; Yale University School of Medicine; The John B. Pierce Laboratory, New Haven, CT
12
Manza P, Hu S, Ide JS, Farr OM, Zhang S, Leung HC, Li CSR. The effects of methylphenidate on cerebral responses to conflict anticipation and unsigned prediction error in a stop-signal task. J Psychopharmacol 2016; 30:283-93. [PMID: 26755547 PMCID: PMC4837899 DOI: 10.1177/0269881115625102] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/24/2022]
Abstract
To adapt flexibly to a rapidly changing environment, humans must anticipate conflict and respond to surprising, unexpected events. To this end, the brain estimates upcoming conflict on the basis of prior experience and computes unsigned prediction error (UPE). Although much work implicates catecholamines in cognitive control, little is known about how pharmacological manipulation of catecholamines affects the neural processes underlying conflict anticipation and UPE computation. We addressed this issue by imaging 24 healthy young adults who received a 45 mg oral dose of methylphenidate (MPH) and 62 matched controls who did not receive MPH prior to performing the stop-signal task. We used a Bayesian Dynamic Belief Model to make trial-by-trial estimates of conflict and UPE during task performance. Replicating previous research, the control group showed anticipation-related activation in the presupplementary motor area and deactivation in the ventromedial prefrontal cortex and parahippocampal gyrus, as well as UPE-related activations in the dorsal anterior cingulate, insula, and inferior parietal lobule. In group comparison, MPH increased anticipation activity in the bilateral caudate head and decreased UPE activity in each of the aforementioned regions. These findings highlight distinct effects of catecholamines on the neural mechanisms underlying conflict anticipation and UPE, signals critical to learning and adaptive behavior.
Affiliation(s)
- Peter Manza
  - Integrative Neuroscience Program, Department of Psychology, Stony Brook University, Stony Brook, NY, USA; Department of Psychiatry, Yale University, New Haven, CT, USA
- Sien Hu
  - Department of Psychiatry, Yale University, New Haven, CT, USA
- Jaime S Ide
  - Department of Psychiatry, Yale University, New Haven, CT, USA; Department of Biomedical Engineering, Stony Brook University, Stony Brook, NY, USA
- Olivia M Farr
  - Department of Psychiatry, Yale University, New Haven, CT, USA; Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, MA, USA
- Sheng Zhang
  - Department of Psychiatry, Yale University, New Haven, CT, USA
- Hoi-Chung Leung
  - Integrative Neuroscience Program, Department of Psychology, Stony Brook University, Stony Brook, NY, USA
- Chiang-shan R Li
  - Department of Psychiatry, Yale University, New Haven, CT, USA; Department of Neuroscience, Yale University, New Haven, CT, USA; Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
13
Gillan CM, Kosinski M, Whelan R, Phelps EA, Daw ND. Characterizing a psychiatric symptom dimension related to deficits in goal-directed control. eLife 2016; 5. [PMID: 26928075 PMCID: PMC4786435 DOI: 10.7554/elife.11305] [Citation(s) in RCA: 280] [Impact Index Per Article: 35.0] [Received: 09/02/2015] [Accepted: 01/14/2016] [Indexed: 12/22/2022]
Abstract
Prominent theories suggest that compulsive behaviors, characteristic of obsessive-compulsive disorder and addiction, are driven by shared deficits in goal-directed control, which confers vulnerability for developing rigid habits. However, recent studies have shown that deficient goal-directed control accompanies several disorders, including those without an obvious compulsive element. Reasoning that this lack of clinical specificity might reflect broader issues with psychiatric diagnostic categories, we investigated whether a dimensional approach would better delineate the clinical manifestations of goal-directed deficits. Using large-scale online assessment of psychiatric symptoms and neurocognitive performance in two independent general-population samples, we found that deficits in goal-directed control were most strongly associated with a symptom dimension comprising compulsive behavior and intrusive thought. This association was highly specific when compared to other non-compulsive aspects of psychopathology. These data showcase a powerful new methodology and highlight the potential of a dimensional, biologically-grounded approach to psychiatry research.
When an individual resists the temptation to stay out late in order to get a good night’s sleep, he or she is exercising what is known as “goal-directed control”. This kind of control allows individuals to regulate their behaviour in a deliberate manner. It is thought that a reduction in goal-directed control may be linked to compulsiveness or compulsivity, a psychological trait that involves excessive repetition of thoughts or actions. Furthermore, evidence shows that goal-directed control is reduced in people with compulsive disorders, such as obsessive-compulsive disorder (or OCD) and drug addiction.
However, failures of goal-directed control have also been reported in other mental health conditions that are not linked to compulsivity, such as social anxiety disorder. The fact that reduced goal-directed control is found across various mental health conditions highlights a core issue in modern psychiatric research and treatment. Mental health conditions are typically defined and diagnosed by their clinical symptoms, not by their underlying psychological traits or biological abnormalities. This makes it difficult to determine the cause of a specific disorder, as its symptoms are often rooted in the same psychological and biological traits seen in other mental health conditions. To start to tackle this issue, Gillan et al. used a strategy that allowed them to look at compulsivity as a “trans-diagnostic dimension”; that is, as something that exists on a spectrum and is not specific to one disorder but involved in numerous different mental health conditions. Nearly 2,000 people completed an online task that assessed goal-directed control, and filled in questionnaires that measured symptoms of various mental health conditions. Gillan et al. showed that, as expected, people with reduced goal-directed control were generally more compulsive, and that this relationship could be seen in the context of both OCD and other compulsive disorders such as addiction. Further, by leveraging the efficiency of online data collection to collect such a large sample, Gillan et al. were also able to examine how much different symptoms co-occurred in people. This enabled them to use a statistical technique to pick out three trans-diagnostic dimensions – compulsive behaviour and intrusive thought, anxious-depression and social withdrawal – and found that only the compulsive factor was associated with reduced goal-directed control. 
In fact, reduced goal-directed control was found to be more closely related to compulsivity than the symptoms of traditional mental health disorders including OCD. These findings show that research into the causes of mental health conditions and perhaps ultimately diagnosis and treatment – all of which have traditionally approached specific disorders in isolation – would benefit greatly from a trans-diagnostic approach. DOI:http://dx.doi.org/10.7554/eLife.11305.002
Affiliation(s)
- Claire M Gillan
- Department of Psychology, New York University, New York, United States; Department of Psychology, University of Cambridge, Cambridge, United Kingdom; Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, United Kingdom
- Michal Kosinski
- Stanford Graduate School of Business, Stanford University, Stanford, United States
- Robert Whelan
- Department of Psychology, University College Dublin, Dublin, Ireland
- Elizabeth A Phelps
- Department of Psychology, New York University, New York, United States; Center for Neural Science, New York University, New York, United States; Nathan Kline Institute, New York, United States
- Nathaniel D Daw
- Department of Psychology, Princeton University, Princeton, United States; Neuroscience Institute, Princeton University, Princeton, United States
|
14
|
Akam T, Costa R, Dayan P. Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task. PLoS Comput Biol 2015; 11:e1004648. [PMID: 26657806 PMCID: PMC4686094 DOI: 10.1371/journal.pcbi.1004648] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 06/18/2015] [Accepted: 11/09/2015] [Indexed: 11/28/2022] Open
Abstract
The recently developed ‘two-step’ behavioural task promises to differentiate model-based from model-free reinforcement learning, while generating neurophysiologically-friendly decision datasets with parametric variation of decision variables. These desirable features have prompted its widespread adoption. Here, we analyse the interactions between a range of different strategies and the structure of transitions and outcomes in order to examine constraints on what can be learned from behavioural performance. The task involves a trade-off between the need for stochasticity, to allow strategies to be discriminated, and a need for determinism, so that it is worth subjects’ investment of effort to exploit the contingencies optimally. We show through simulation that under certain conditions model-free strategies can masquerade as being model-based. We first show that seemingly innocuous modifications to the task structure can induce correlations between action values at the start of the trial and the subsequent trial events in such a way that analysis based on comparing successive trials can lead to erroneous conclusions. We confirm the power of a suggested correction to the analysis that can alleviate this problem. We then consider model-free reinforcement learning strategies that exploit correlations between where rewards are obtained and which actions have high expected value. These generate behaviour that appears model-based under these, and also more sophisticated, analyses. Exploiting the full potential of the two-step task as a tool for behavioural neuroscience requires an understanding of these issues.

Planning is the use of a predictive model of the consequences of actions to guide decision making. Planning plays a critical role in human behaviour, but isolating its contribution is challenging because it is complemented by control systems which learn values of actions directly from the history of reinforcement, resulting in automatized mappings from states to actions often termed habits. Our study examined a recently developed behavioural task which uses choices in a multi-step decision tree to differentiate planning from value-based control. We compared various strategies using simulations, showing a range that produce behaviour that resembles planning but in fact arises as a fixed mapping from particular sorts of states to action. These results show that when a planning problem is faced repeatedly, sophisticated automatization strategies may be developed which identify that there are in fact a limited number of relevant states of the world each with an appropriate fixed or habitual response. Understanding such strategies is important for the design and interpretation of tasks which aim to isolate the contribution of planning to behaviour. Such strategies are also of independent scientific interest as they may contribute to automatization of behaviour in complex environments.
Affiliation(s)
- Thomas Akam
- Champalimaud Neuroscience Program, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Rui Costa
- Champalimaud Neuroscience Program, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Peter Dayan
- Gatsby Computational Neuroscience Unit, UCL, London, United Kingdom
|
15
|
Model-Based Reasoning in Humans Becomes Automatic with Training. PLoS Comput Biol 2015; 11:e1004463. [PMID: 26379239 PMCID: PMC4588166 DOI: 10.1371/journal.pcbi.1004463] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 03/09/2015] [Accepted: 06/13/2015] [Indexed: 11/19/2022] Open
Abstract
Model-based and model-free reinforcement learning (RL) have been suggested as algorithmic realizations of goal-directed and habitual action strategies. Model-based RL is more flexible than model-free but requires sophisticated calculations using a learnt model of the world. This has led model-based RL to be identified with slow, deliberative processing, and model-free RL with fast, automatic processing. In support of this distinction, it has recently been shown that model-based reasoning is impaired by placing subjects under cognitive load--a hallmark of non-automaticity. Here, using the same task, we show that cognitive load does not impair model-based reasoning if subjects receive prior training on the task. This finding is replicated across two studies and a variety of analysis methods. Thus, task familiarity permits use of model-based reasoning in parallel with other cognitive demands. The ability to deploy model-based reasoning in an automatic, parallelizable fashion has widespread theoretical implications, particularly for the learning and execution of complex behaviors. It also suggests a range of important failure modes in psychiatric disorders.
|
16
|
Otto AR, Skatova A, Madlon-Kay S, Daw ND. Cognitive control predicts use of model-based reinforcement learning. J Cogn Neurosci 2015; 27:319-33. [PMID: 25170791 DOI: 10.1162/jocn_a_00709] [Citation(s) in RCA: 108] [Impact Index Per Article: 12.0] [Indexed: 12/22/2022]
Abstract
Accounts of decision-making and its neural substrates have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental work suggests that this classic distinction between behaviorally and neurally dissociable systems for habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning (RL), called model-free and model-based RL, but the cognitive or computational processes by which one system may dominate over the other in the control of behavior are a matter of ongoing investigation. To elucidate this question, we leverage the theoretical framework of cognitive control, demonstrating that individual differences in utilization of goal-related contextual information--in the service of overcoming habitual, stimulus-driven responses--in established cognitive control paradigms predict model-based behavior in a separate, sequential choice task. The behavioral correspondence between cognitive control and model-based RL compellingly suggests that a common set of processes may underpin the two behaviors. In particular, computational mechanisms originally proposed to underlie controlled behavior may be applicable to understanding the interactions between model-based and model-free choice behavior.
|
17
|
Pickering AD, Pesola F. Modeling dopaminergic and other processes involved in learning from reward prediction error: contributions from an individual differences perspective. Front Hum Neurosci 2014; 8:740. [PMID: 25324752 PMCID: PMC4179695 DOI: 10.3389/fnhum.2014.00740] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 05/04/2014] [Accepted: 09/03/2014] [Indexed: 11/13/2022] Open
Abstract
Phasic firing changes of midbrain dopamine neurons have been widely characterized as reflecting a reward prediction error (RPE). Major personality traits (e.g., extraversion) have been linked to inter-individual variations in dopaminergic neurotransmission. Consistent with these two claims, recent research (Smillie et al., 2011; Cooper et al., 2014) found that extraverts exhibited larger RPEs than introverts, as reflected in feedback related negativity (FRN) effects in EEG recordings. Using an established, biologically-localized RPE computational model, we successfully simulated dopaminergic cell firing changes which are thought to modulate the FRN. We introduced simulated individual differences into the model: parameters were systematically varied, with stable values for each simulated individual. We explored whether a model parameter might be responsible for the observed covariance between extraversion and the FRN changes in real data, and argued that a parameter is a plausible source of such covariance if parameter variance, across simulated individuals, correlated almost perfectly with the size of the simulated dopaminergic FRN modulation, and created as much variance as possible in this simulated output. Several model parameters met these criteria, while others did not. In particular, variations in the strength of connections carrying excitatory reward drive inputs to midbrain dopaminergic cells were considered plausible candidates, along with variations in a parameter which scales the effects of dopamine cell firing bursts on synaptic modification in ventral striatum. We suggest possible neurotransmitter mechanisms underpinning these model parameters. Finally, the limitations and possible extensions of our general approach are discussed.
Affiliation(s)
- Alan D Pickering
- Department of Psychology, Goldsmiths, University of London, London, UK
- Francesca Pesola
- Section for Recovery, Health Service and Population Research Department, Institute of Psychiatry, King's College, University of London, London, UK
|
18
|
Multiple memory systems as substrates for multiple decision systems. Neurobiol Learn Mem 2014; 117:4-13. [PMID: 24846190 DOI: 10.1016/j.nlm.2014.04.014] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 11/19/2013] [Revised: 04/22/2014] [Accepted: 04/29/2014] [Indexed: 11/22/2022]
Abstract
It has recently become widely appreciated that value-based decision making is supported by multiple computational strategies. In particular, animal and human behavior in learning tasks appears to include habitual responses described by prominent model-free reinforcement learning (RL) theories, but also more deliberative or goal-directed actions that can be characterized by a different class of theories, model-based RL. The latter theories evaluate actions by using a representation of the contingencies of the task (as with a learned map of a spatial maze), called an "internal model." Given the evidence of behavioral and neural dissociations between these approaches, they are often characterized as dissociable learning systems, though they likely interact and share common mechanisms. In many respects, this division parallels a longstanding dissociation in cognitive neuroscience between multiple memory systems, describing, at the broadest level, separate systems for declarative and procedural learning. Procedural learning has notable parallels with model-free RL: both involve learning of habits and both are known to depend on parts of the striatum. Declarative memory, by contrast, supports memory for single events or episodes and depends on the hippocampus. The hippocampus is thought to support declarative memory by encoding temporal and spatial relations among stimuli and thus is often referred to as a relational memory system. Such relational encoding is likely to play an important role in learning an internal model, the representation that is central to model-based RL. Thus, insofar as the memory systems represent more general-purpose cognitive mechanisms that might subserve performance on many sorts of tasks including decision making, these parallels raise the question whether the multiple decision systems are served by multiple memory systems, such that one dissociation is grounded in the other. 
Here we investigated the relationship between model-based RL and relational memory by comparing individual differences across behavioral tasks designed to measure either capacity. Human subjects performed two tasks, a learning and generalization task (acquired equivalence) which involves relational encoding and depends on the hippocampus; and a sequential RL task that could be solved by either a model-based or model-free strategy. We assessed the correlation between subjects' use of flexible, relational memory, as measured by generalization in the acquired equivalence task, and their differential reliance on either RL strategy in the decision task. We observed a significant positive relationship between generalization and model-based, but not model-free, choice strategies. These results are consistent with the hypothesis that model-based RL, like acquired equivalence, relies on a more general-purpose relational memory system.
|