1
Falck J, Zhang L, Raffington L, Mohn JJ, Triesch J, Heim C, Shing YL. Hippocampus and striatum show distinct contributions to longitudinal changes in value-based learning in middle childhood. eLife 2024; 12:RP89483. [PMID: 38953517 PMCID: PMC11219037 DOI: 10.7554/elife.89483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2024] Open
Abstract
The hippocampal-dependent memory system and striatal-dependent memory system modulate reinforcement learning depending on feedback timing in adults, but their contributions during development remain unclear. In a 2-year longitudinal study, 6-to-7-year-old children performed a reinforcement learning task in which they received feedback immediately or with a short delay following their response. Children's learning was found to be sensitive to feedback timing modulations in their reaction time and inverse temperature parameter, which quantifies value-guided decision-making. They showed longitudinal improvements towards more optimal value-based learning, and their hippocampal volume showed protracted maturation. Better delayed model-derived learning covaried with larger hippocampal volume longitudinally, in line with the adult literature. In contrast, a larger striatal volume in children was associated with both better immediate and delayed model-derived learning longitudinally. These findings show, for the first time, an early hippocampal contribution to the dynamic development of reinforcement learning in middle childhood, with neurally less differentiated and more cooperative memory systems than in adults.
Affiliation(s)
- Johannes Falck
  - Department of Psychology, Goethe University Frankfurt, Frankfurt, Germany
- Lei Zhang
  - Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, United Kingdom
  - Institute for Mental Health, School of Psychology, University of Birmingham, Birmingham, United Kingdom
  - Centre for Developmental Science, School of Psychology, University of Birmingham, Birmingham, United Kingdom
  - Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
- Laurel Raffington
  - Max Planck Research Group Biosocial, Max Planck Institute for Human Development, Berlin, Germany
- Johannes Julius Mohn
  - Charité – Universitätsmedizin Berlin, Institute of Medical Psychology, Berlin, Germany
  - Max Planck School of Cognition, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Jochen Triesch
  - Frankfurt Institute for Advanced Studies (FIAS), Frankfurt am Main, Germany
- Christine Heim
  - Charité – Universitätsmedizin Berlin, Institute of Medical Psychology, Berlin, Germany
  - Center for Safe & Healthy Children, The Pennsylvania State University, University Park, United States
- Yee Lee Shing
  - Department of Psychology, Goethe University Frankfurt, Frankfurt, Germany
2
Ohta H, Nozawa T, Nakano T, Morimoto Y, Ishizuka T. Nonlinear age-related differences in probabilistic learning in mice: A 5-armed bandit task study. Neurobiol Aging 2024; 142:8-16. [PMID: 39029360 DOI: 10.1016/j.neurobiolaging.2024.06.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 06/17/2024] [Accepted: 06/19/2024] [Indexed: 07/21/2024]
Abstract
This study explores the impact of aging on reinforcement learning in mice, focusing on changes in learning rates and behavioral strategies. A 5-armed bandit task (5-ABT) and a computational Q-learning model were used to evaluate the positive and negative learning rates and the inverse temperature across three age groups (3, 12, and 18 months). Results showed a significant decline in the negative learning rate of 18-month-old mice, which was not observed for the positive learning rate. This suggests that older mice maintain the ability to learn from successful experiences while decreasing the ability to learn from negative outcomes. We also observed a significant age-dependent variation in inverse temperature, reflecting a shift in action selection policy. Middle-aged mice (12 months) exhibited higher inverse temperature, indicating a higher reliance on previous rewarding experiences and reduced exploratory behaviors, when compared to both younger and older mice. This study provides new insights into aging research by demonstrating that there are age-related differences in specific components of reinforcement learning, which exhibit a non-linear pattern.
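The kind of model named in this abstract (Q-learning with separate positive and negative learning rates and a softmax inverse temperature) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the reward probabilities, and all parameter values are hypothetical choices made for the example.

```python
import math
import random


def softmax_probs(q_values, beta):
    """Softmax choice probabilities; beta is the inverse temperature."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def update_q(q, reward, alpha_pos, alpha_neg):
    """Update one action value with a sign-dependent learning rate."""
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q + alpha * delta


def simulate(reward_probs, alpha_pos, alpha_neg, beta, n_trials, seed=0):
    """Simulate an agent on a bandit with Bernoulli rewards (assumed setup)."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)
    choices = []
    for _ in range(n_trials):
        probs = softmax_probs(q, beta)
        arm = rng.choices(range(len(q)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        q[arm] = update_q(q[arm], reward, alpha_pos, alpha_neg)
        choices.append(arm)
    return q, choices


# Hypothetical 5-armed bandit; the probabilities are illustrative only.
q, choices = simulate([0.1, 0.2, 0.3, 0.5, 0.8],
                      alpha_pos=0.3, alpha_neg=0.1, beta=3.0, n_trials=200)
```

In a sketch like this, a lower `alpha_neg` makes values fall more slowly after unrewarded choices, and a higher `beta` concentrates choices on the arm with the highest current value, which corresponds to less exploratory behavior.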
Affiliation(s)
- Hiroyuki Ohta
  - Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
- Takashi Nozawa
  - Mejiro University, 4-31-1 Naka-Ochiai, Shinjuku, Tokyo 161-8539, Japan
- Takashi Nakano
  - Department of Computational Biology, School of Medicine, Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan; International Center for Brain Science (ICBS), Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan
- Yuji Morimoto
  - Department of Physiology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
- Toshiaki Ishizuka
  - Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
3
Augustat N, Endres D, Mueller EM. Uncertainty of treatment efficacy moderates placebo effects on reinforcement learning. Sci Rep 2024; 14:14421. [PMID: 38909105 PMCID: PMC11193823 DOI: 10.1038/s41598-024-64240-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 06/06/2024] [Indexed: 06/24/2024] Open
Abstract
The placebo-reward hypothesis postulates that positive effects of treatment expectations on health (i.e., placebo effects) and reward processing share common neural underpinnings. Moreover, experiments in humans and animals indicate that reward uncertainty increases striatal dopamine, which is presumably involved in placebo responses and reward learning. Therefore, treatment uncertainty analogously to reward uncertainty may affect updating from rewards after placebo treatment. Here, we address whether different degrees of uncertainty regarding the efficacy of a sham treatment affect reward sensitivity. In an online between-subjects experiment with N = 141 participants, we systematically varied the provided efficacy instructions before participants first received a sham treatment that consisted of listening to binaural beats and then performed a probabilistic reinforcement learning task. We fitted a Q-learning model including two different learning rates for positive (gain) and negative (loss) reward prediction errors and an inverse gain parameter to behavioral decision data in the reinforcement learning task. Our results yielded an inverted-U-relationship between provided treatment efficacy probability and learning rates for gain, such that higher levels of treatment uncertainty, rather than of expected net efficacy, affect presumably dopamine-related reward learning. These findings support the placebo-reward hypothesis and suggest harnessing uncertainty in placebo treatment for recovering reward learning capabilities.
Affiliation(s)
- Nick Augustat
  - Department of Psychology, University of Marburg, Marburg, Germany
- Dominik Endres
  - Department of Psychology, University of Marburg, Marburg, Germany
- Erik M Mueller
  - Department of Psychology, University of Marburg, Marburg, Germany
4
Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/26/2024] [Indexed: 04/01/2024] Open
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions. In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
Affiliation(s)
- Jaron T. Colas
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
5
Wilbrecht L, Davidow JY. Goal-directed learning in adolescence: neurocognitive development and contextual influences. Nat Rev Neurosci 2024; 25:176-194. [PMID: 38263216 DOI: 10.1038/s41583-023-00783-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/12/2023] [Indexed: 01/25/2024]
Abstract
Adolescence is a time during which we transition to independence, explore new activities and begin pursuit of major life goals. Goal-directed learning, in which we learn to perform actions that enable us to obtain desired outcomes, is central to many of these processes. Currently, our understanding of goal-directed learning in adolescence is itself in a state of transition, with the scientific community grappling with inconsistent results. When we examine metrics of goal-directed learning through the second decade of life, we find that many studies agree there are steady gains in performance in the teenage years, but others report that adolescent goal-directed learning is already adult-like, and some find adolescents can outperform adults. To explain the current variability in results, sophisticated experimental designs are being applied to test learning in different contexts. There is also increasing recognition that individuals of different ages and in different states will draw on different neurocognitive systems to support goal-directed learning. Through adoption of more nuanced approaches, we can be better prepared to recognize and harness adolescent strengths and to decipher the purpose (or goals) of adolescence itself.
Affiliation(s)
- Linda Wilbrecht
  - Department of Psychology, University of California, Berkeley, CA, USA
  - Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Juliet Y Davidow
  - Department of Psychology, Northeastern University, Boston, MA, USA
6
Cheng Z, Moser AD, Jones M, Kaiser RH. Reinforcement learning and working memory in mood disorders: A computational analysis in a developmental transdiagnostic sample. J Affect Disord 2024; 344:423-431. [PMID: 37839471 DOI: 10.1016/j.jad.2023.10.084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 10/08/2023] [Accepted: 10/10/2023] [Indexed: 10/17/2023]
Abstract
BACKGROUND: Mood disorders commonly have their onset during adolescence and young adulthood and are conceptually and empirically related to reinforcement learning abnormalities. However, the nature of abnormalities associated with acute symptom severity versus lifetime diagnosis remains unclear, and prior research has often failed to disentangle working memory from reward processes. METHODS: The present sample (N = 220) included adolescents and young adults with a lifetime history of unipolar disorders (n = 127), bipolar disorders (n = 28), or no history of psychopathology (n = 62), and varying severity of mood symptoms. Analyses fitted a reinforcement learning and working memory model to an instrumental learning task that varied working memory load, and tested associations between model parameters and diagnoses or current symptoms. RESULTS: Current severity of manic or anhedonic symptoms negatively correlated with task performance. Participants reporting higher severity of current anhedonia, or with lifetime unipolar or bipolar disorders, showed lower reward learning rates. Participants reporting higher severity of current manic symptoms showed faster working memory decay and reduced use of working memory. LIMITATIONS: Computational parameters should be interpreted in the context of the task environment (a deterministic reward learning paradigm) and the developmental population. Future work should test replication in other paradigms and populations. CONCLUSIONS: Results indicate abnormalities in reinforcement learning processes that either scale with current symptom severity or correspond with lifetime mood diagnoses. Findings may have implications for understanding reward processing anomalies related to state-like (current symptom) or trait-like (lifetime diagnosis) aspects of mood disorders.
Affiliation(s)
- Ziwei Cheng
  - Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, United States; Institute for Cognitive Science, University of Colorado Boulder, Boulder, CO, United States
- Amelia D Moser
  - Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, United States; Institute for Cognitive Science, University of Colorado Boulder, Boulder, CO, United States
- Matt Jones
  - Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, United States
- Roselinde H Kaiser
  - Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, United States; Institute for Cognitive Science, University of Colorado Boulder, Boulder, CO, United States
7
Chase HW. A novel technique for delineating the effect of variation in the learning rate on the neural correlates of reward prediction errors in model-based fMRI. Front Psychol 2023; 14:1211528. [PMID: 38187436 PMCID: PMC10768009 DOI: 10.3389/fpsyg.2023.1211528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 11/28/2023] [Indexed: 01/09/2024] Open
Abstract
Introduction: Computational models play an increasingly important role in describing variation in neural activation in human neuroimaging experiments, including evaluating individual differences in the context of psychiatric neuroimaging. In particular, reinforcement learning (RL) techniques have been widely adopted to examine neural responses to reward prediction errors and stimulus or action values, and how these might vary as a function of clinical status. However, there is a lack of consensus around the importance of the precision of free parameter estimation for these methods, particularly with regard to the learning rate. In the present study, I introduce a novel technique which may be used within a general linear model (GLM) to model the effect of mis-estimation of the learning rate on reward prediction error (RPE)-related neural responses. Methods: Simulations employed a simple RL algorithm, which was used to generate hypothetical neural activations that would be expected to be observed in functional magnetic resonance imaging (fMRI) studies of RL. Similar RL models were incorporated within a GLM-based analysis method including derivatives, with individual differences in the resulting GLM-derived beta parameters being evaluated with respect to the free parameters of the RL model or being submitted to other validation analyses. Results: Initial simulations demonstrated that the conventional approach to fitting RL models to RPE responses is more likely to reflect individual differences in a reinforcement efficacy construct (lambda) rather than learning rate (alpha). The proposed method, adding a derivative regressor to the GLM, provides a second regressor which reflects the learning rate. Validation analyses were performed including examining another comparable method which yielded highly similar results, and a demonstration of sensitivity of the method in presence of fMRI-like noise. Conclusion: Overall, the findings underscore the importance of the lambda parameter for interpreting individual differences in RPE-coupled neural activity, and validate a novel neural metric of the modulation of such activity by individual differences in the learning rate. The method is expected to find application in understanding aberrant reinforcement learning across different psychiatric patient groups including major depression and substance use disorder.
Affiliation(s)
- Henry W. Chase
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
8
Sato Y, Sakai Y, Hirata S. State-transition-free reinforcement learning in chimpanzees (Pan troglodytes). Learn Behav 2023; 51:413-427. [PMID: 37369920 DOI: 10.3758/s13420-023-00591-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/07/2023] [Indexed: 06/29/2023]
Abstract
The outcome of an action often occurs after a delay. One solution for learning appropriate actions from delayed outcomes is to rely on a chain of state transitions. Another solution, which does not rest on state transitions, is to use an eligibility trace (ET) that directly bridges a current outcome and multiple past actions via transient memories. Previous studies revealed that humans (Homo sapiens) learned appropriate actions in a behavioral task in which solutions based on the ET were effective but transition-based solutions were ineffective. This suggests that ET may be used in human learning systems. However, no studies have examined nonhuman animals with an equivalent behavioral task. We designed a task for nonhuman animals following a previous human study. In each trial, participants chose one of two stimuli that were randomly selected from three stimulus types: a stimulus associated with a food reward delivered immediately, a stimulus associated with a reward delivered after a few trials, and a stimulus associated with no reward. The presented stimuli did not vary according to the participants' choices. To maximize the total reward, participants had to learn the value of the stimulus associated with a delayed reward. Five chimpanzees (Pan troglodytes) performed the task using a touchscreen. Two chimpanzees were able to learn successfully, indicating that learning mechanisms that do not depend on state transitions were involved in the learning processes. The current study extends previous ET research by proposing a behavioral task and providing empirical data from chimpanzees.
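The eligibility-trace idea described in this abstract, in which a current outcome is credited to several recently chosen stimuli without relying on state transitions, can be illustrated with a small sketch. This is an assumed toy formulation for illustration only, not the model used in the study: the function name `et_update`, the stimulus labels, and the values of `alpha` and `decay` are all hypothetical.

```python
def et_update(values, traces, chosen, reward, alpha=0.1, decay=0.7):
    """One trial of a value update with eligibility traces (toy version).

    Every trace decays each trial, the chosen stimulus's trace is reset
    to 1, and the prediction error is shared among all stimuli in
    proportion to their traces.
    """
    new_traces = {s: decay * e for s, e in traces.items()}
    new_traces[chosen] = 1.0
    delta = reward - values[chosen]  # error relative to the current choice
    new_values = {s: v + alpha * delta * new_traces[s]
                  for s, v in values.items()}
    return new_values, new_traces


# Hypothetical stimuli mirroring the three stimulus types in the abstract.
stimuli = ["immediate", "delayed", "none"]
values = {s: 0.0 for s in stimuli}
traces = {s: 0.0 for s in stimuli}

# Trial 1: the delayed-reward stimulus is chosen; nothing is delivered yet.
values, traces = et_update(values, traces, chosen="delayed", reward=0.0)
# Trial 2: another stimulus is chosen while the delayed reward arrives.
values, traces = et_update(values, traces, chosen="none", reward=1.0)
```

After the second trial the earlier choice of the delayed stimulus still carries a nonzero trace, so it receives part of the credit for the reward even though a different stimulus was chosen on that trial.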
Grants
- 16H06283 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- 18H05524 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- 19J22889 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- 26245069 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- U04 Program for Leading Graduate Schools
Affiliation(s)
- Yutaro Sato
  - Wildlife Research Center, Kyoto University, Kyoto, Japan
  - University Administration Office, Headquarters for Management Strategy, Niigata University, Niigata, Japan
- Yutaka Sakai
  - Brain Science Institute, Tamagawa University, Tokyo, Japan
- Satoshi Hirata
  - Wildlife Research Center, Kyoto University, Kyoto, Japan
9
Giron AP, Ciranka S, Schulz E, van den Bos W, Ruggeri A, Meder B, Wu CM. Developmental changes in exploration resemble stochastic optimization. Nat Hum Behav 2023; 7:1955-1967. [PMID: 37591981 PMCID: PMC10663152 DOI: 10.1038/s41562-023-01662-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 06/21/2023] [Indexed: 08/19/2023]
Abstract
Human development is often described as a 'cooling off' process, analogous to stochastic optimization algorithms that implement a gradual reduction in randomness over time. Yet there is ambiguity in how to interpret this analogy, due to a lack of concrete empirical comparisons. Using data from n = 281 participants ages 5 to 55, we show that cooling off does not only apply to the single dimension of randomness. Rather, human development resembles an optimization process of multiple learning parameters, for example, reward generalization, uncertainty-directed exploration and random temperature. Rapid changes in parameters occur during childhood, but these changes plateau and converge to efficient values in adulthood. We show that while the developmental trajectory of human parameters is strikingly similar to several stochastic optimization algorithms, there are important differences in convergence. None of the optimization algorithms tested were able to discover reliably better regions of the strategy space than adult participants on this task.
Affiliation(s)
- Anna P Giron
  - Human and Machine Cognition Lab, University of Tübingen, Tübingen, Germany
  - Attention and Affect Lab, University of Tübingen, Tübingen, Germany
- Simon Ciranka
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Berlin, Germany
- Eric Schulz
  - MPRG Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Wouter van den Bos
  - Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
  - Amsterdam Brain and Cognition, University of Amsterdam, Amsterdam, the Netherlands
- Azzurra Ruggeri
  - MPRG iSearch, Max Planck Institute for Human Development, Berlin, Germany
  - School of Social Sciences and Technology, Technical University Munich, Munich, Germany
  - Central European University, Vienna, Austria
- Björn Meder
  - MPRG iSearch, Max Planck Institute for Human Development, Berlin, Germany
  - Institute for Mind, Brain and Behavior, Health and Medical University, Potsdam, Germany
- Charley M Wu
  - Human and Machine Cognition Lab, University of Tübingen, Tübingen, Germany
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
10
De Panfilis C, Lis S. Difficulties in updating social information in personality disorders: A commentary on the article by Rosenblau et al. Neurosci Biobehav Rev 2023; 153:105387. [PMID: 37683989 DOI: 10.1016/j.neubiorev.2023.105387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 08/22/2023] [Accepted: 09/05/2023] [Indexed: 09/10/2023]
Affiliation(s)
- Chiara De Panfilis
  - Unit of Neuroscience, Department of Medicine and Surgery, University of Parma, Italy
- Stefanie Lis
  - Department of Clinical Psychology, Department of Psychiatric and Psychosomatic Medicine, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
11
Pauli R, Brazil IA, Kohls G, Klein-Flügge MC, Rogers JC, Dikeos D, Dochnal R, Fairchild G, Fernández-Rivas A, Herpertz-Dahlmann B, Hervas A, Konrad K, Popma A, Stadler C, Freitag CM, De Brito SA, Lockwood PL. Action initiation and punishment learning differ from childhood to adolescence while reward learning remains stable. Nat Commun 2023; 14:5689. [PMID: 37709750 PMCID: PMC10502052 DOI: 10.1038/s41467-023-41124-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Accepted: 08/24/2023] [Indexed: 09/16/2023] Open
Abstract
Theoretical and empirical accounts suggest that adolescence is associated with heightened reward learning and impulsivity. Experimental tasks and computational models that can dissociate reward learning from the tendency to initiate actions impulsively (action initiation bias) are thus critical to characterise the mechanisms that drive developmental differences. However, existing work has rarely quantified both learning ability and action initiation, or it has relied on small samples. Here, using computational modelling of a learning task collected from a large sample (N = 742, 9-18 years, 11 countries), we test differences in reward and punishment learning and action initiation from childhood to adolescence. Computational modelling reveals that whilst punishment learning rates increase with age, reward learning remains stable. In parallel, action initiation biases decrease with age. Results are similar when considering pubertal stage instead of chronological age. We conclude that heightened reward responsivity in adolescence can reflect differences in action initiation rather than enhanced reward learning.
Affiliation(s)
- Ruth Pauli
  - Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK
- Inti A Brazil
  - Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- Gregor Kohls
  - Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, RWTH Aachen University, Aachen, Germany
  - Department of Child and Adolescent Psychiatry, Faculty of Medicine, TU, Dresden, Germany
- Miriam C Klein-Flügge
  - Department of Experimental Psychology, University of Oxford, Oxford, UK
  - Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
- Jack C Rogers
  - Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK
  - Institute for Mental Health, School of Psychology, University of Birmingham, Birmingham, UK
- Dimitris Dikeos
  - Department of Psychiatry, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Roberta Dochnal
  - Faculty of Medicine, Child and Adolescent Psychiatry, Department of the Child Health Center, Szeged University, Szeged, Hungary
- Beate Herpertz-Dahlmann
  - Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, RWTH Aachen University, Aachen, Germany
- Amaia Hervas
  - University Hospital Mutua Terrassa, Barcelona, Spain
- Kerstin Konrad
  - Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, RWTH Aachen University, Aachen, Germany
  - JARA-Brain Institute II, Molecular Neuroscience and Neuroimaging, RWTH Aachen and Research Centre Jülich, Jülich, Germany
- Arne Popma
  - Department of Child and Adolescent Psychiatry, VU University Medical Center, Amsterdam, Netherlands
- Christina Stadler
  - Department of Child and Adolescent Psychiatry, Psychiatric University Hospital, University of Basel, Basel, Switzerland
- Christine M Freitag
  - Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
- Stephane A De Brito
  - Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK
  - Institute for Mental Health, School of Psychology, University of Birmingham, Birmingham, UK
- Patricia L Lockwood
  - Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK
  - Department of Experimental Psychology, University of Oxford, Oxford, UK
  - Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
  - Institute for Mental Health, School of Psychology, University of Birmingham, Birmingham, UK
12
Schaaf JV, Weidinger L, Molleman L, van den Bos W. Test-retest reliability of reinforcement learning parameters. Behav Res Methods 2023:10.3758/s13428-023-02203-4. [PMID: 37684495 DOI: 10.3758/s13428-023-02203-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 07/18/2023] [Indexed: 09/10/2023]
Abstract
It has recently been suggested that parameter estimates of computational models can be used to understand individual differences at the process level. One area of research in which this approach, called computational phenotyping, has taken hold is computational psychiatry. One requirement for successful computational phenotyping is that behavior and parameters are stable over time. Surprisingly, the test-retest reliability of behavior and model parameters remains unknown for most experimental tasks and models. The present study seeks to close this gap by investigating the test-retest reliability of canonical reinforcement learning models in the context of two often-used learning paradigms: a two-armed bandit and a reversal learning task. We tested independent cohorts for the two tasks (N = 69 and N = 47) via an online testing platform with a between-test interval of five weeks. Whereas reliability was high for personality and cognitive measures (with ICCs ranging from .67 to .93), it was generally poor for the parameter estimates of the reinforcement learning models (with ICCs ranging from .02 to .52 for the bandit task and from .01 to .71 for the reversal learning task). Given that simulations indicated that our procedures could detect high test-retest reliability, this suggests that a significant proportion of the variability must be ascribed to the participants themselves. In support of that hypothesis, we show that mood (stress and happiness) can partly explain within-participant variability. Taken together, these results are critical for current practices in computational phenotyping and suggest that individual variability should be taken into account in the future development of the field.
Affiliation(s)
- Jessica V Schaaf
  - Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
  - Cognitive Neuroscience Department, Radboud University Medical Centre, Nijmegen, the Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands
- Laura Weidinger
  - DeepMind, London, United Kingdom
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
- Lucas Molleman
  - Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
- Wouter van den Bos
  - Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
13
Yip SW, Barch DM, Chase HW, Flagel S, Huys QJ, Konova AB, Montague R, Paulus M. From Computation to Clinic. BIOLOGICAL PSYCHIATRY GLOBAL OPEN SCIENCE 2023; 3:319-328. [PMID: 37519475 PMCID: PMC10382698 DOI: 10.1016/j.bpsgos.2022.03.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/01/2021] [Revised: 02/25/2022] [Accepted: 03/22/2022] [Indexed: 12/12/2022] Open
Abstract
Theory-driven and data-driven computational approaches to psychiatry have enormous potential for elucidating mechanisms of disease and providing translational linkages between basic science findings and the clinic. These approaches have already demonstrated utility in providing clinically relevant understanding, primarily via back translation from clinic to computation, revealing how specific disorders or symptoms map onto specific computational processes. Nonetheless, forward translation, from computation to clinic, remains rare. In addition, consensus regarding specific barriers to forward translation, and on the best strategies to overcome these barriers, is limited. This perspective review brings together expert basic and computationally trained researchers and clinicians to 1) identify challenges specific to preclinical model systems and clinical translation of computational models of cognition and affect, and 2) discuss practical approaches to overcoming these challenges. In doing so, we highlight recent evidence for the ability of computational approaches to predict treatment responses in psychiatric disorders and discuss considerations for maximizing the clinical relevance of such models (e.g., via longitudinal testing) and the likelihood of stakeholder adoption (e.g., via cost-effectiveness analyses).
Affiliation(s)
- Sarah W. Yip
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut
- Deanna M. Barch
  - Departments of Psychological & Brain Sciences, Psychiatry, and Radiology, Washington University, St. Louis, Missouri
- Henry W. Chase
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania
- Shelly Flagel
  - Department of Psychiatry and Michigan Neuroscience Institute, University of Michigan, Ann Arbor, Michigan
- Quentin J.M. Huys
  - Division of Psychiatry and Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Institute of Neurology, University College London, London, United Kingdom
  - Camden and Islington NHS Foundation Trust, London, United Kingdom
- Anna B. Konova
  - Department of Psychiatry and Brain Health Institute, Rutgers University, Piscataway, New Jersey
- Read Montague
  - Fralin Biomedical Research Institute and Department of Physics, Virginia Tech, Blacksburg, Virginia
- Martin Paulus
  - Laureate Institute for Brain Research, Tulsa, Oklahoma
14
Topel S, Ma I, Sleutels J, van Steenbergen H, de Bruijn ERA, van Duijvenvoorde ACK. Expecting the unexpected: a review of learning under uncertainty across development. COGNITIVE, AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2023:10.3758/s13415-023-01098-0. [PMID: 37237092 PMCID: PMC10390612 DOI: 10.3758/s13415-023-01098-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 03/28/2023] [Indexed: 05/28/2023]
Abstract
Many of our decisions take place under uncertainty. To successfully navigate the environment, individuals need to estimate the degree of uncertainty and adapt their behaviors accordingly by learning from experiences. However, uncertainty is a broad construct and distinct types of uncertainty may differentially influence our learning. We provide a semi-systematic review to illustrate cognitive and neurobiological processes involved in learning under two types of uncertainty: learning in environments with stochastic outcomes, and with volatile outcomes. We specifically reviewed studies (N = 26 studies) that included an adolescent population, because adolescence is a period in life characterized by heightened exploration and learning, as well as heightened uncertainty due to experiencing many new, often social, environments. Until now, reviews have not comprehensively compared learning under distinct types of uncertainties in this age range. Our main findings show that although the overall developmental patterns were mixed, most studies indicate that learning from stochastic outcomes, as indicated by increased accuracy in performance, improved with age. We also found that adolescents tended to have an advantage compared with adults and children when learning from volatile outcomes. We discuss potential mechanisms explaining these age-related differences and conclude by outlining future research directions.
Affiliation(s)
- Selin Topel
  - Leiden University, Institute of Psychology, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands
  - Leiden Institute for Brain and Cognition, Leiden, The Netherlands
- Ili Ma
  - Leiden University, Institute of Psychology, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands
  - Leiden Institute for Brain and Cognition, Leiden, The Netherlands
- Jan Sleutels
  - Leiden University, Institute of Psychology, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands
  - Leiden University, Institute for Philosophy, Leiden, The Netherlands
- Henk van Steenbergen
  - Leiden University, Institute of Psychology, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands
  - Leiden Institute for Brain and Cognition, Leiden, The Netherlands
- Ellen R A de Bruijn
  - Leiden University, Institute of Psychology, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands
  - Leiden Institute for Brain and Cognition, Leiden, The Netherlands
- Anna C K van Duijvenvoorde
  - Leiden University, Institute of Psychology, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands
  - Leiden Institute for Brain and Cognition, Leiden, The Netherlands
15
Towner E, Chierchia G, Blakemore SJ. Sensitivity and specificity in affective and social learning in adolescence. Trends Cogn Sci 2023:S1364-6613(23)00092-X. [PMID: 37198089 DOI: 10.1016/j.tics.2023.04.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/24/2022] [Revised: 03/23/2023] [Accepted: 04/05/2023] [Indexed: 05/19/2023]
Abstract
Adolescence is a period of heightened affective and social sensitivity. In this review we address how this increased sensitivity influences associative learning. Based on recent evidence from human and rodent studies, as well as advances in computational biology, we suggest that, compared to other age groups, adolescents show features of heightened Pavlovian learning but tend to perform worse than adults at instrumental learning. Because Pavlovian learning does not involve decision-making, whereas instrumental learning does, we propose that these developmental differences might be due to heightened sensitivity to rewards and threats in adolescence, coupled with a lower specificity of responding. We discuss the implications of these findings for adolescent mental health and education.
Affiliation(s)
- Emily Towner
  - Department of Psychology, University of Cambridge, Downing Street, Cambridge, UK
- Gabriele Chierchia
  - Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy; Department of Psychology, University of Cambridge, Downing Street, Cambridge, UK
16
He Q, Beveridge EH, Vargas V, Salen A, Brown TI. Effects of Acute Stress on Rigid Learning, Flexible Learning, and Value-Based Decision-Making in Spatial Navigation. Psychol Sci 2023; 34:552-567. [PMID: 36944163 DOI: 10.1177/09567976231155870] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/23/2023] Open
Abstract
The current study investigated how stress affects value-based decision-making during spatial navigation and different types of learning underlying decisions. Eighty-two adult participants (42 females) first learned to find object locations in a virtual environment from a fixed starting location (rigid learning) and then to find the same objects from unpredictable starting locations (flexible learning). Participants then decided whether to reach goal objects from the fixed or unpredictable starting location. We found that stress impairs rigid learning in females, and it does not impair, and even improves, flexible learning when performance with rigid learning is controlled for. Critically, examining how earlier learning influences subsequent decision-making using computational models, we found that stress reduces memory integration, making participants more likely to focus on recent memory and less likely to integrate information from other sources. Collectively, our results show how stress impacts different memory systems and the communication between memory and decision-making.
Affiliation(s)
- Qiliang He
  - School of Psychology, Georgia Institute of Technology
- Vanesa Vargas
  - School of Psychology, Georgia Institute of Technology
- Ashley Salen
  - School of Psychology, Georgia Institute of Technology
17
Karvelis P, Paulus MP, Diaconescu AO. Individual differences in computational psychiatry: a review of current challenges. Neurosci Biobehav Rev 2023; 148:105137. [PMID: 36940888 DOI: 10.1016/j.neubiorev.2023.105137] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/19/2023] [Revised: 03/04/2023] [Accepted: 03/14/2023] [Indexed: 03/23/2023]
Abstract
Bringing precision to the understanding and treatment of mental disorders requires instruments for studying clinically relevant individual differences. One promising approach is the development of computational assays: integrating computational models with cognitive tasks to infer latent patient-specific disease processes in brain computations. While recent years have seen many methodological advancements in computational modelling and many cross-sectional patient studies, much less attention has been paid to basic psychometric properties (reliability and construct validity) of the computational measures provided by the assays. In this review, we assess the extent of this issue by examining emerging empirical evidence. We find that many computational measures suffer from poor psychometric properties, which poses a risk of invalidating previous findings and undermining ongoing research efforts using computational assays to study individual (and even group) differences. We provide recommendations for how to address these problems and, crucially, embed them within a broader perspective on key developments that are needed for translating computational assays to clinical practice.
Affiliation(s)
- Povilas Karvelis
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, ON, Canada
- Martin P Paulus
  - Laureate Institute for Brain Research, Tulsa, OK, USA; Oxley College of Health Sciences, The University of Tulsa, Tulsa, OK, USA
- Andreea O Diaconescu
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH), Toronto, ON, Canada; Department of Psychiatry, University of Toronto, Toronto, ON, Canada; Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada; Department of Psychology, University of Toronto, Toronto, ON, Canada
18
Rutherford AV, McDougle SD, Joormann J. "Don't [ruminate], be happy": A cognitive perspective linking depression and anhedonia. Clin Psychol Rev 2023; 101:102255. [PMID: 36871425 DOI: 10.1016/j.cpr.2023.102255] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/16/2022] [Revised: 12/19/2022] [Accepted: 02/16/2023] [Indexed: 02/22/2023]
Abstract
Anhedonia, a lack of pleasure in things an individual once enjoyed, and rumination, the process of perseverative and repetitive attention to specific thoughts, are hallmark features of depression. Though these both contribute to the same debilitating disorder, they have often been studied independently and through different theoretical lenses (e.g., biological vs. cognitive). Cognitive theories and research on rumination have largely focused on understanding negative affect in depression with much less focus on the etiology and maintenance of anhedonia. In this paper, we argue that by examining the relation between cognitive constructs and deficits in positive affect, we may better understand anhedonia in depression, thereby improving prevention and intervention efforts. We review the extant literature on cognitive deficits in depression and discuss how these dysfunctions may not only lead to sustained negative affect but, importantly, interfere with an ability to attend to social and environmental cues that could restore positive affect. Specifically, we discuss how rumination is associated with deficits in working memory and propose that these deficits in working memory may contribute to anhedonia in depression. We further argue that analytical approaches such as computational modeling are needed to study these questions and, finally, discuss implications for treatment.
Affiliation(s)
- Jutta Joormann
  - Department of Psychology, Yale University, New Haven, CT, USA
19
Heald JB, Lengyel M, Wolpert DM. Contextual inference in learning and memory. Trends Cogn Sci 2023; 27:43-64. [PMID: 36435674 PMCID: PMC9789331 DOI: 10.1016/j.tics.2022.10.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/24/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022]
Abstract
Context is widely regarded as a major determinant of learning and memory across numerous domains, including classical and instrumental conditioning, episodic memory, economic decision-making, and motor learning. However, studies across these domains remain disconnected due to the lack of a unifying framework formalizing the concept of context and its role in learning. Here, we develop a unified vernacular allowing direct comparisons between different domains of contextual learning. This leads to a Bayesian model positing that context is unobserved and needs to be inferred. Contextual inference then controls the creation, expression, and updating of memories. This theoretical approach reveals two distinct components that underlie adaptation, proper and apparent learning, respectively referring to the creation and updating of memories versus time-varying adjustments in their expression. We review a number of extensions of the basic Bayesian model that allow it to account for increasingly complex forms of contextual learning.
Affiliation(s)
- James B Heald
  - Department of Neuroscience, Columbia University, New York, NY 10027, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Máté Lengyel
  - Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge, UK; Center for Cognitive Computation, Department of Cognitive Science, Central European University, Budapest, Hungary
- Daniel M Wolpert
  - Department of Neuroscience, Columbia University, New York, NY 10027, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA; Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge, UK
20
Fan C, Yao L, Zhang J, Zhen Z, Wu X. Advanced Reinforcement Learning and Its Connections with Brain Neuroscience. RESEARCH (WASHINGTON, D.C.) 2023; 6:0064. [PMID: 36939448 PMCID: PMC10017102 DOI: 10.34133/research.0064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2022] [Accepted: 01/10/2023] [Indexed: 01/22/2023]
Abstract
In recent years, brain science and neuroscience have greatly propelled the innovation of computer science. In particular, knowledge from the neurobiology and neuropsychology of the brain revolutionized the development of reinforcement learning (RL) by providing novel interpretable mechanisms of how the brain achieves intelligent and efficient decision making. Triggered by this, there has been a boom in research about advanced RL algorithms that are built upon the inspirations of brain neuroscience. In this work, to further strengthen the bidirectional link between the 2 communities and especially promote the research on modern RL technology, we provide a comprehensive survey of recent advances in the area of brain-inspired/related RL algorithms. We start with basic theories of RL, and present a concise introduction to brain neuroscience related to RL. Then, we classify these advanced RL methodologies into 3 categories according to different connections of the brain, i.e., micro-neural activity, macro-brain structure, and cognitive function. Each category is further surveyed by presenting several modern RL algorithms along with their mathematical models, correlations with the brain, and open issues. Finally, we introduce several important applications of RL algorithms, followed by a discussion of challenges and opportunities for future research.
Affiliation(s)
- Chaoqiong Fan
  - School of Artificial Intelligence, Beijing Normal University, Beijing, China
- Li Yao
  - School of Artificial Intelligence, Beijing Normal University, Beijing, China
- Jiacai Zhang
  - School of Artificial Intelligence, Beijing Normal University, Beijing, China
- Zonglei Zhen
  - Faculty of Psychology, Beijing Normal University, Beijing, China
- Xia Wu
  - School of Artificial Intelligence, Beijing Normal University, Beijing, China
21
Abstract
In reinforcement learning (RL) experiments, participants learn to make rewarding choices in response to different stimuli; RL models use outcomes to estimate stimulus-response values that change incrementally. RL models consider any response type indiscriminately, ranging from more concretely defined motor choices (pressing a key with the index finger), to more general choices that can be executed in a number of ways (selecting dinner at the restaurant). However, does the learning process vary as a function of the choice type? In Experiment 1, we show that it does: Participants were slower and less accurate in learning correct choices of a general format compared with learning more concrete motor actions. Using computational modeling, we show that two mechanisms contribute to this. First, there was evidence of irrelevant credit assignment: The values of motor actions interfered with the values of other choice dimensions, resulting in more incorrect choices when the correct response was not defined by a single motor action; second, information integration for relevant general choices was slower. In Experiment 2, we replicated and further extended the findings from Experiment 1 by showing that slowed learning was attributable to weaker working memory use, rather than slowed RL. In both experiments, we ruled out the explanation that the difference in performance between two condition types was driven by difficulty/different levels of complexity. We conclude that defining a more abstract choice space used by multiple learning systems for credit assignment recruits executive resources, limiting how much such processes then contribute to fast learning.
Affiliation(s)
- Amy Zou
  - University of California, Berkeley
- Anne G E Collins
  - University of California, Berkeley
  - Helen Wills Neuroscience Institute, Berkeley, CA
22
Vinckier F, Jaffre C, Gauthier C, Smajda S, Abdel-Ahad P, Le Bouc R, Daunizeau J, Fefeu M, Borderies N, Plaze M, Gaillard R, Pessiglione M. Elevated Effort Cost Identified by Computational Modeling as a Distinctive Feature Explaining Multiple Behaviors in Patients With Depression. BIOLOGICAL PSYCHIATRY. COGNITIVE NEUROSCIENCE AND NEUROIMAGING 2022; 7:1158-1169. [PMID: 35952972 DOI: 10.1016/j.bpsc.2022.07.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/19/2021] [Revised: 07/14/2022] [Accepted: 07/25/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND: Motivational deficit is a core clinical manifestation of depression and a strong predictor of treatment failure. However, the underlying mechanisms, which cannot be accessed through conventional questionnaire-based scoring, remain largely unknown. According to decision theory, apathy could result either from biased subjective estimates (of action costs or outcomes) or from dysfunctional processes (in making decisions or allocating resources). METHODS: Here, we combined a series of behavioral tasks with computational modeling to elucidate the motivational deficits of 35 patients with unipolar or bipolar depression under various treatments compared with 35 matched healthy control subjects. RESULTS: The most striking feature, which was observed independent of medication across preference tasks (likeability ratings and binary decisions), performance tasks (physical and mental effort exertion), and instrumental learning tasks (updating choices to maximize outcomes), was an elevated sensitivity to effort cost. By contrast, sensitivity to action outcomes (reward and punishment) and task-specific processes were relatively spared. CONCLUSIONS: These results highlight effort cost as a critical dimension that might explain multiple behavioral changes in patients with depression. More generally, they validate a test battery for computational phenotyping of motivational states, which could orientate toward specific medication or rehabilitation therapy, and thereby help pave the way for more personalized medicine in psychiatry.
Affiliation(s)
- Fabien Vinckier
  - Motivation, Brain & Behavior lab, Institut du Cerveau, Hôpital Pitié-Salpêtrière, Paris, France; Université Paris Cité, Paris, France; Department of Psychiatry, Service Hospitalo-Universitaire, GHU Paris Psychiatrie & Neurosciences, Paris, France
- Claire Jaffre
  - Motivation, Brain & Behavior lab, Institut du Cerveau, Hôpital Pitié-Salpêtrière, Paris, France; Université Paris Cité, Paris, France; Department of Psychiatry, Service Hospitalo-Universitaire, GHU Paris Psychiatrie & Neurosciences, Paris, France
- Claire Gauthier
  - Université Paris Cité, Paris, France; Department of Psychiatry, Service Hospitalo-Universitaire, GHU Paris Psychiatrie & Neurosciences, Paris, France
- Sarah Smajda
  - Université Paris Cité, Paris, France; Department of Psychiatry, Service Hospitalo-Universitaire, GHU Paris Psychiatrie & Neurosciences, Paris, France
- Pierre Abdel-Ahad
  - Université Paris Cité, Paris, France; Department of Psychiatry, Service Hospitalo-Universitaire, GHU Paris Psychiatrie & Neurosciences, Paris, France
- Raphaël Le Bouc
  - Motivation, Brain & Behavior lab, Institut du Cerveau, Hôpital Pitié-Salpêtrière, Paris, France; Urgences cérébro-vasculaires, Pitié-Salpêtrière Hospital, Sorbonne University, Assistance Publique Hôpitaux de Paris, Paris, France; Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland
- Jean Daunizeau
  - Motivation, Brain & Behavior lab, Institut du Cerveau, Hôpital Pitié-Salpêtrière, Paris, France; Sorbonne Universités, Inserm, CNRS, Paris, France
- Mylène Fefeu
  - Université Paris Cité, Paris, France; Department of Psychiatry, Service Hospitalo-Universitaire, GHU Paris Psychiatrie & Neurosciences, Paris, France
- Nicolas Borderies
  - Motivation, Brain & Behavior lab, Institut du Cerveau, Hôpital Pitié-Salpêtrière, Paris, France
- Marion Plaze
  - Université Paris Cité, Paris, France; Department of Psychiatry, Service Hospitalo-Universitaire, GHU Paris Psychiatrie & Neurosciences, Paris, France
- Raphaël Gaillard
  - Université Paris Cité, Paris, France; Department of Psychiatry, Service Hospitalo-Universitaire, GHU Paris Psychiatrie & Neurosciences, Paris, France; Institut Pasteur, experimental neuropathology unit, Paris, France
- Mathias Pessiglione
  - Motivation, Brain & Behavior lab, Institut du Cerveau, Hôpital Pitié-Salpêtrière, Paris, France; Sorbonne Universités, Inserm, CNRS, Paris, France
23
Lin WC, Liu C, Kosillo P, Tai LH, Galarce E, Bateup HS, Lammel S, Wilbrecht L. Transient food insecurity during the juvenile-adolescent period affects adult weight, cognitive flexibility, and dopamine neurobiology. Curr Biol 2022; 32:3690-3703.e5. [PMID: 35863352 PMCID: PMC10519557 DOI: 10.1016/j.cub.2022.06.089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 04/01/2022] [Accepted: 06/29/2022] [Indexed: 10/17/2022]
Abstract
A major challenge for neuroscience, public health, and evolutionary biology is to understand the effects of scarcity and uncertainty on the developing brain. Currently, a significant fraction of children and adolescents worldwide experience insecure access to food. The goal of our work was to test in mice whether the transient experience of insecure versus secure access to food during the juvenile-adolescent period produced lasting differences in learning, decision-making, and the dopamine system in adulthood. We manipulated feeding schedules in mice from postnatal day (P)21 to P40 as food insecure or ad libitum and found that when tested in adulthood (after P60), males with different developmental feeding history showed significant differences in multiple metrics of cognitive flexibility in learning and decision-making. Adult females with different developmental feeding history showed no differences in cognitive flexibility but did show significant differences in adult weight. We next applied reinforcement learning models to these behavioral data. The best fit models suggested that in males, developmental feeding history altered how mice updated their behavior after negative outcomes. This effect was sensitive to task context and reward contingencies. Consistent with these results, in males, we found that the two feeding history groups showed significant differences in the AMPAR/NMDAR ratio of excitatory synapses on nucleus-accumbens-projecting midbrain dopamine neurons and evoked dopamine release in dorsal striatal targets. Together, these data show in a rodent model that transient differences in feeding history in the juvenile-adolescent period can have significant impacts on adult weight, learning, decision-making, and dopamine neurobiology.
Affiliation(s)
- Wan Chen Lin
  - Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720, USA
- Christine Liu
  - Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720, USA
- Polina Kosillo
  - Department of Molecular and Cellular Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Lung-Hao Tai
  - Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720, USA
- Ezequiel Galarce
  - Robert Wood Johnson Foundation Health and Society Scholar, University of California Berkeley, Berkeley, CA 94720, USA
- Helen S Bateup
  - Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720, USA; Department of Molecular and Cellular Biology, University of California Berkeley, Berkeley, CA 94720, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Stephan Lammel
  - Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720, USA; Department of Molecular and Cellular Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Linda Wilbrecht
  - Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720, USA; Department of Psychology, University of California Berkeley, Berkeley, CA 94720, USA
24
Nussenbaum K, Velez JA, Washington BT, Hamling HE, Hartley CA. Flexibility in valenced reinforcement learning computations across development. Child Dev 2022; 93:1601-1615. [PMID: 35596654 PMCID: PMC9831067 DOI: 10.1111/cdev.13791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
Abstract
Optimal integration of positive and negative outcomes during learning varies depending on an environment's reward statistics. The present study investigated the extent to which children, adolescents, and adults (N = 142; 8- to 25-year-olds; 55% female, 42% White, 31% Asian, 17% mixed race, and 8% Black; data collected in 2021) adapt their weighting of better-than-expected and worse-than-expected outcomes when learning from reinforcement. Participants made choices across two contexts: one in which weighting positive outcomes more heavily than negative outcomes led to better performance, and one in which the reverse was true. Reinforcement learning modeling revealed that across age, participants shifted their valence biases in accordance with environmental structure. Exploratory analyses revealed strengthening of context-dependent flexibility with increasing age.
Affiliation(s)
- Catherine A. Hartley
  - Corresponding Author: Catherine A. Hartley, Department of Psychology, New York University, 6 Washington Place, Room 871A, New York, NY, 10003
25
A comparison of reinforcement learning models of human spatial navigation. Sci Rep 2022; 12:13923. [PMID: 35978035 PMCID: PMC9385652 DOI: 10.1038/s41598-022-18245-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 08/08/2022] [Indexed: 11/09/2022] Open
Abstract
Reinforcement learning (RL) models have been influential in characterizing human learning and decision making, but few studies apply them to characterizing human spatial navigation and even fewer systematically compare RL models under different navigation requirements. Because RL can characterize one's learning strategies quantitatively and in a continuous manner, and one's consistency of using such strategies, it can provide a novel and important perspective for understanding the marked individual differences in human navigation and disentangle navigation strategies from navigation performance. One hundred and fourteen participants completed wayfinding tasks in a virtual environment where different phases manipulated navigation requirements. We compared performance of five RL models (3 model-free, 1 model-based and 1 "hybrid") at fitting navigation behaviors in different phases. Supporting implications from prior literature, the hybrid model provided the best fit regardless of navigation requirements, suggesting the majority of participants rely on a blend of model-free (route-following) and model-based (cognitive mapping) learning in such navigation scenarios. Furthermore, consistent with a key prediction, there was a correlation in the hybrid model between the weight on model-based learning (i.e., navigation strategy) and the navigator's exploration vs. exploitation tendency (i.e., consistency of using such navigation strategy), which was modulated by navigation task requirements. Together, we not only show how computational findings from RL align with the spatial navigation literature, but also reveal how the relationship between navigation strategy and a person's consistency using such strategies changes as navigation requirements change.
|
26
|
Fengler A, Bera K, Pedersen ML, Frank MJ. Beyond Drift Diffusion Models: Fitting a Broad Class of Decision and Reinforcement Learning Models with HDDM. J Cogn Neurosci 2022; 34:1780-1805. [PMID: 35939629 DOI: 10.1162/jocn_a_01902] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/04/2022]
Abstract
Computational modeling has become a central aspect of research in the cognitive neurosciences. As the field matures, it is increasingly important to move beyond standard models to quantitatively assess models with richer dynamics that may better reflect underlying cognitive and neural processes. For example, sequential sampling models (SSMs) are a general class of models of decision-making intended to capture processes jointly giving rise to RT distributions and choice data in n-alternative choice paradigms. A number of model variations are of theoretical interest, but empirical data analysis has historically been tied to a small subset for which likelihood functions are analytically tractable. Advances in methods designed for likelihood-free inference have recently made it computationally feasible to consider a much larger spectrum of SSMs. In addition, recent work has motivated the combination of SSMs with reinforcement learning models, which had historically been considered in separate literatures. Here, we provide a significant addition to the widely used HDDM Python toolbox and include a tutorial for how users can easily fit and assess a (user-extensible) wide variety of SSMs and how they can be combined with reinforcement learning models. The extension comes batteries included, including model visualization tools, posterior predictive checks, and ability to link trial-wise neural signals with model parameters via hierarchical Bayesian regression.
|
27
|
Lan DCL, Browning M. What Can Reinforcement Learning Models of Dopamine and Serotonin Tell Us about the Action of Antidepressants? Computational Psychiatry (Cambridge, Mass.) 2022; 6:166-188. [PMID: 38774776 PMCID: PMC11104395 DOI: 10.5334/cpsy.83] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/30/2021] [Accepted: 06/29/2022] [Indexed: 11/20/2022]
Abstract
Although evidence suggests that antidepressants are effective at treating depression, the mechanisms behind antidepressant action remain unclear, especially at the cognitive/computational level. In recent years, reinforcement learning (RL) models have increasingly been used to characterise the roles of neurotransmitters and to probe the computations that might be altered in psychiatric disorders like depression. Hence, RL models might present an opportunity for us to better understand the computational mechanisms underlying antidepressant effects. Moreover, RL models may also help us shed light on how these computations may be implemented in the brain (e.g., in midbrain, striatal, and prefrontal regions) and how these neural mechanisms may be altered in depression and remediated by antidepressant treatments. In this paper, we evaluate the ability of RL models to help us understand the processes underlying antidepressant action. To do this, we review the preclinical literature on the roles of dopamine and serotonin in RL, draw links between these findings and clinical work investigating computations altered in depression, and appraise the evidence linking modification of RL processes to antidepressant function. Overall, while there is no shortage of promising ideas about the computational mechanisms underlying antidepressant effects, there is insufficient evidence directly implicating these mechanisms in the response of depressed patients to antidepressant treatment. Consequently, future studies should investigate these mechanisms in samples of depressed patients and assess whether modifications in RL processes mediate the clinical effect of antidepressant treatments.
Affiliation(s)
- Denis C. L. Lan
- Department of Experimental Psychology, University of Oxford, Oxford, GB
|
28
|
Palminteri S, Lebreton M. The computational roots of positivity and confirmation biases in reinforcement learning. Trends Cogn Sci 2022; 26:607-621. [PMID: 35662490 DOI: 10.1016/j.tics.2022.04.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 11/29/2021] [Revised: 04/13/2022] [Accepted: 04/18/2022] [Indexed: 12/16/2022]
Abstract
Humans do not integrate new information objectively: outcomes carrying a positive affective value and evidence confirming one's own prior belief are overweighed. Until recently, theoretical and empirical accounts of the positivity and confirmation biases assumed them to be specific to 'high-level' belief updates. We present evidence against this account. Learning rates in reinforcement learning (RL) tasks, estimated across different contexts and species, generally present the same characteristic asymmetry, suggesting that belief and value updating processes share key computational principles and distortions. This bias generates over-optimistic expectations about the probability of making the right choices and, consequently, generates over-optimistic reward expectations. We discuss the normative and neurobiological roots of these RL biases and their position within the greater picture of behavioral decision-making theories.
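The learning-rate asymmetry described in this abstract can be sketched with a simple value update that uses one learning rate for positive prediction errors and another for negative ones. This is a minimal illustration under assumed names (`alpha_pos`, `alpha_neg`, `simulate`) and assumed parameter values; it is not the authors' model or fitting procedure.

```python
import random

def update(q, reward, alpha_pos, alpha_neg):
    """One value update with separate learning rates by prediction-error sign."""
    delta = reward - q                      # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta

def simulate(p_reward, alpha_pos, alpha_neg, n_trials=5000, seed=0):
    """Average learned value of an option rewarded (1 or 0) with probability p_reward."""
    rng = random.Random(seed)
    q, total = 0.5, 0.0
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < p_reward else 0.0
        q = update(q, reward, alpha_pos, alpha_neg)
        total += q
    return total / n_trials

# Hypothetical parameter values: a positivity-biased learner and an unbiased one.
biased = simulate(p_reward=0.5, alpha_pos=0.3, alpha_neg=0.1)
unbiased = simulate(p_reward=0.5, alpha_pos=0.2, alpha_neg=0.2)
```

With binary rewards, the expected value in this sketch settles near p·α⁺ / (p·α⁺ + (1 − p)·α⁻), so a learner with `alpha_pos > alpha_neg` ends up valuing the option above its true reward probability. This corresponds to the over-optimistic reward expectations mentioned in the abstract.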
Affiliation(s)
- Stefano Palminteri
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et Recherche Médicale, Paris, France; Département d'Études Cognitives, Ecole Normale Supérieure, Paris, France; Université de Recherche Paris Sciences et Lettres, Paris, France.
- Maël Lebreton
- Paris School of Economics, Paris, France; LabNIC, Department of Fundamental Neurosciences, University of Geneva, Geneva, Switzerland; Swiss Center for Affective Science, Geneva, Switzerland.
|
29
|
Eckstein MK, Master SL, Dahl RE, Wilbrecht L, Collins AG. Reinforcement learning and bayesian inference provide complementary models for the unique advantage of adolescents in stochastic reversal. Dev Cogn Neurosci 2022; 55:101106. [PMID: 35537273 PMCID: PMC9108470 DOI: 10.1016/j.dcn.2022.101106] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/03/2021] [Revised: 03/01/2022] [Accepted: 03/25/2022] [Indexed: 12/02/2022] Open
Abstract
During adolescence, youth venture out, explore the wider world, and are challenged to learn how to navigate novel and uncertain environments. We investigated how performance changes across adolescent development in a stochastic, volatile reversal-learning task that uniquely taxes the balance of persistence and flexibility. In a sample of 291 participants aged 8–30, we found that in the mid-teen years, adolescents outperformed both younger and older participants. We developed two independent cognitive models, based on Reinforcement learning (RL) and Bayesian inference (BI). The RL parameter for learning from negative outcomes and the BI parameters specifying participants’ mental models were closest to optimal in mid-teen adolescents, suggesting a central role in adolescent cognitive processing. By contrast, persistence and noise parameters improved monotonically with age. We distilled the insights of RL and BI using principal component analysis and found that three shared components interacted to form the adolescent performance peak: adult-like behavioral quality, child-like time scales, and developmentally-unique processing of positive feedback. This research highlights adolescence as a neurodevelopmental window that can create performance advantages in volatile and uncertain environments. It also shows how detailed insights can be gleaned by using cognitive models in new ways.
|
30
|
Pike AC, Robinson OJ. Reinforcement Learning in Patients With Mood and Anxiety Disorders vs Control Individuals: A Systematic Review and Meta-analysis. JAMA Psychiatry 2022; 79:313-322. [PMID: 35234834 PMCID: PMC8892374 DOI: 10.1001/jamapsychiatry.2022.0051] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Indexed: 12/31/2022]
Abstract
IMPORTANCE: Computational psychiatry studies have investigated how reinforcement learning may be different in individuals with mood and anxiety disorders compared with control individuals, but results are inconsistent.
OBJECTIVE: To assess whether there are consistent differences in reinforcement-learning parameters between patients with depression or anxiety and control individuals.
DATA SOURCES: Web of Knowledge, PubMed, Embase, and Google Scholar searches were performed between November 15, 2019, and December 6, 2019, and repeated on December 3, 2020, and February 23, 2021, with keywords (reinforcement learning) AND (computational OR model) AND (depression OR anxiety OR mood).
STUDY SELECTION: Studies were included if they fit reinforcement-learning models to human choice data from a cognitive task with rewards or punishments, had a case-control design including participants with mood and/or anxiety disorders and healthy control individuals, and included sufficient information about all parameters in the models.
DATA EXTRACTION AND SYNTHESIS: Articles were assessed for inclusion according to MOOSE guidelines. Participant-level parameters were extracted from included articles, and a conventional meta-analysis was performed using a random-effects model. Subsequently, these parameters were used to simulate choice performance for each participant on benchmarking tasks in a simulation meta-analysis. Models were fitted, parameters were extracted using bayesian model averaging, and differences between patients and control individuals were examined. Overall effect sizes across analytic strategies were inspected.
MAIN OUTCOMES AND MEASURES: The primary outcomes were estimated reinforcement-learning parameters (learning rate, inverse temperature, reward learning rate, and punishment learning rate).
RESULTS: A total of 27 articles were included (3085 participants, 1242 of whom had depression and/or anxiety). In the conventional meta-analysis, patients showed lower inverse temperature than control individuals (standardized mean difference [SMD], -0.215; 95% CI, -0.354 to -0.077), although no parameters were common across all studies, limiting the ability to infer differences. In the simulation meta-analysis, patients showed greater punishment learning rates (SMD, 0.107; 95% CI, 0.107 to 0.108) and slightly lower reward learning rates (SMD, -0.021; 95% CI, -0.022 to -0.020) relative to control individuals. The simulation meta-analysis showed no meaningful difference in inverse temperature between patients and control individuals (SMD, 0.003; 95% CI, 0.002 to 0.004).
CONCLUSIONS AND RELEVANCE: The simulation meta-analytic approach introduced in this article for inferring meta-group differences from heterogeneous computational psychiatry studies indicated elevated punishment learning rates in patients compared with control individuals. This difference may promote and uphold negative affective bias symptoms and hence constitute a potential mechanistic treatment target for mood and anxiety disorders.
Affiliation(s)
- Alexandra C. Pike
- Anxiety Lab, Neuroscience and Mental Health Group, Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- Oliver J. Robinson
- Anxiety Lab, Neuroscience and Mental Health Group, Institute of Cognitive Neuroscience, University College London, London, United Kingdom; Research Department of Clinical, Educational and Health Psychology, University College London, London, United Kingdom
|
31
|
Effective of Smart Mathematical Model by Machine Learning Classifier on Big Data in Healthcare Fast Response. Computational and Mathematical Methods in Medicine 2022; 2022:6927170. [PMID: 35251298 PMCID: PMC8890881 DOI: 10.1155/2022/6927170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2021] [Revised: 02/02/2022] [Accepted: 02/07/2022] [Indexed: 11/17/2022]
Abstract
In the past few years, big data related to healthcare has become more important, due to the abundance of data, the increasing cost of healthcare, and the privacy of healthcare. This calls for ways to create, analyze, and process large and complex data that cannot be processed by traditional methods. The proposed method is based on classifying data into several classes using the data weight derived from the features extracted from the big data. Three important criteria were used to evaluate the study as well as to benchmark the current study with previous studies using a standard dataset.
|
32
|
Kieslich K, Valton V, Roiser JP. Pleasure, Reward Value, Prediction Error and Anhedonia. Curr Top Behav Neurosci 2022; 58:281-304. [PMID: 35156187 DOI: 10.1007/7854_2021_295] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/14/2023]
Abstract
In order to develop effective treatments for anhedonia we need to understand its underlying neurobiological mechanisms. Anhedonia is conceptually strongly linked to reward processing, which involves a variety of cognitive and neural operations. This chapter reviews the evidence for impairments in experiencing hedonic response (pleasure), reward valuation and reward learning based on outcomes (commonly conceptualised in terms of "reward prediction error"). Synthesising behavioural and neuroimaging findings, we examine case-control studies of patients with depression and schizophrenia, including those focusing specifically on anhedonia. Overall, there is reliable evidence that depression and schizophrenia are associated with disrupted reward processing. In contrast to the historical definition of anhedonia, there is surprisingly limited evidence for impairment in the ability to experience pleasure in depression and schizophrenia. There is some evidence that learning about reward and reward prediction error signals are impaired in depression and schizophrenia, but the literature is inconsistent. The strongest evidence is for impairments in the representation of reward value and how this is used to guide action. Future studies would benefit from focusing on impairments in reward processing specifically in anhedonic samples, including transdiagnostically, and from using designs separating different components of reward processing, formulating them in computational terms, and moving beyond cross-sectional designs to provide an assessment of causality.
Affiliation(s)
- Karel Kieslich
- Institute of Cognitive Neuroscience, University College London, London, UK
- Vincent Valton
- Institute of Cognitive Neuroscience, University College London, London, UK
- Jonathan P Roiser
- Institute of Cognitive Neuroscience, University College London, London, UK.
|
33
|
Eckstein MK, Master SL, Xia L, Dahl RE, Wilbrecht L, Collins AGE. The interpretation of computational model parameters depends on the context. eLife 2022; 11:75474. [PMID: 36331872 PMCID: PMC9635876 DOI: 10.7554/elife.75474] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/11/2021] [Accepted: 09/09/2022] [Indexed: 11/06/2022] Open
Abstract
Reinforcement Learning (RL) models have revolutionized the cognitive and brain sciences, promising to explain behavior from simple conditioning to complex problem solving, to shed light on developmental and individual differences, and to anchor cognitive processes in specific brain mechanisms. However, the RL literature increasingly reveals contradictory results, which might cast doubt on these claims. We hypothesized that many contradictions arise from two commonly-held assumptions about computational model parameters that are actually often invalid: That parameters generalize between contexts (e.g. tasks, models) and that they capture interpretable (i.e. unique, distinctive) neurocognitive processes. To test this, we asked 291 participants aged 8–30 years to complete three learning tasks in one experimental session, and fitted RL models to each. We found that some parameters (exploration / decision noise) showed significant generalization: they followed similar developmental trajectories, and were reciprocally predictive between tasks. Still, generalization was significantly below the methodological ceiling. Furthermore, other parameters (learning rates, forgetting) did not show evidence of generalization, and sometimes even opposite developmental trajectories. Interpretability was low for all parameters. We conclude that the systematic study of context factors (e.g. reward stochasticity; task volatility) will be necessary to enhance the generalizability and interpretability of computational cognitive models.
Affiliation(s)
- Sarah L Master
- Department of Psychology, University of California, Berkeley, Berkeley, United States; Department of Psychology, New York University, New York, United States
- Liyu Xia
- Department of Psychology, University of California, Berkeley, Berkeley, United States; Department of Mathematics, University of California, Berkeley, Berkeley, United States
- Ronald E Dahl
- Institute of Human Development, University of California, Berkeley, Berkeley, United States
- Linda Wilbrecht
- Department of Psychology, University of California, Berkeley, Berkeley, United States; Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, United States
- Anne GE Collins
- Department of Psychology, University of California, Berkeley, Berkeley, United States; Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, United States
|
34
|
Collins AGE, Shenhav A. Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology 2022; 47:104-118. [PMID: 34453117 PMCID: PMC8617262 DOI: 10.1038/s41386-021-01126-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 04/03/2021] [Revised: 07/14/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023]
Abstract
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understanding of prefrontal cortex function. We discuss how such models have advanced from their origins in basic algorithms of updating and action selection to increasingly account for complexities in the cognitive processes required for learning and decision-making, and the representations over which they operate. We further discuss how a deeper understanding of the real-world complexities in these computations has shed light on the fundamental constraints on optimal behavior, and on the complex interactions between corticostriatal pathways to determine such behavior. The continuing and rapid development of these models holds great promise for understanding the mechanisms by which animals adapt to their environments, and what leads to maladaptive forms of learning and decision-making within clinical populations.
Affiliation(s)
- Anne G E Collins
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
- Amitai Shenhav
- Department of Cognitive, Linguistic, & Psychological Sciences and Carney Institute for Brain Science, Brown University, Providence, RI, USA.
|
35
|
Yoo AH, Collins AGE. How Working Memory and Reinforcement Learning Are Intertwined: A Cognitive, Neural, and Computational Perspective. J Cogn Neurosci 2021; 34:551-568. [PMID: 34942642 DOI: 10.1162/jocn_a_01808] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/04/2022]
Abstract
Reinforcement learning and working memory are two core processes of human cognition and are often considered cognitively, neuroscientifically, and algorithmically distinct. Here, we show that the brain networks that support them actually overlap significantly and that they are less distinct cognitive processes than often assumed. We review literature demonstrating the benefits of considering each process to explain properties of the other and highlight recent work investigating their more complex interactions. We discuss how future research in both computational and cognitive sciences can benefit from one another, suggesting that a key missing piece for artificial agents to learn to behave with more human-like efficiency is taking working memory's role in learning seriously. This review highlights the risks of neglecting the interplay between different processes when studying human behavior (in particular when considering individual differences). We emphasize the importance of investigating these dynamics to build a comprehensive understanding of human cognition.
|
36
|
FeldmanHall O, Nassar MR. The computational challenge of social learning. Trends Cogn Sci 2021; 25:1045-1057. [PMID: 34583876 PMCID: PMC8585698 DOI: 10.1016/j.tics.2021.09.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/24/2021] [Revised: 08/31/2021] [Accepted: 09/01/2021] [Indexed: 10/20/2022]
Abstract
The complex reward structure of the social world and the uncertainty endemic to social contexts pose a challenge for modeling. For example, during social interactions, the actions of one person influence the internal states of another. These social dependencies make it difficult to formalize social learning problems in a mathematically tractable way. While it is tempting to dispense with these complexities, they are a defining feature of social life. Because the structure of social interactions challenges the simplifying assumptions often made in models, they make an ideal testbed for computational models of cognition. By adopting a framework that embeds existing social knowledge into the model, we can go beyond explaining behaviors in laboratory tasks to explaining those observed in the wild.
Affiliation(s)
- Oriel FeldmanHall
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Sciences, Brown University, Providence, RI 02912, USA.
- Matthew R Nassar
- Carney Institute for Brain Sciences, Brown University, Providence, RI 02912, USA; Department of Neuroscience, Brown University, Providence, RI 02912, USA
|
37
|
Bradfield L, Balleine B. Editorial overview: Value-based decision making: control, value, and context in action. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.09.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|