1
Yoshida N, Daikoku T, Nagai Y, Kuniyoshi Y. Emergence of integrated behaviors through direct optimization for homeostasis. Neural Netw 2024; 177:106379. [PMID: 38762941 DOI: 10.1016/j.neunet.2024.106379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 04/11/2024] [Accepted: 05/06/2024] [Indexed: 05/21/2024]
Abstract
Homeostasis is a self-regulatory process, wherein an organism maintains a specific internal physiological state. Homeostatic reinforcement learning (RL) is a framework recently proposed in computational neuroscience to explain animal behavior. Homeostatic RL organizes the behaviors of autonomous embodied agents according to the demands of the internal dynamics of their bodies, coupled with the external environment. Thus, it provides a basis for real-world autonomous agents, such as robots, to continually acquire and learn integrated behaviors for survival. However, prior studies have generally explored problems pertaining to limited size, as the agent must handle observations of such coupled dynamics. To overcome this restriction, we developed an advanced method to realize scaled-up homeostatic RL using deep RL. Furthermore, several rewards for homeostasis have been proposed in the literature. We identified that the reward definition that uses the difference in drive function yields the best results. We created two benchmark environments for homeostasis and performed a behavioral analysis. The analysis showed that the trained agents in each environment changed their behavior based on their internal physiological states. Finally, we extended our method to address vision using deep convolutional neural networks. The analysis of a trained agent revealed that it has visual saliency rooted in the survival environment and internal representations resulting from multimodal input.
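The drive-difference reward singled out above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the Minkowski-style drive function common in the homeostatic RL literature, D(H) = (sum_i |h*_i - h_i|^n)^(1/m), with hypothetical exponents n = 4 and m = 3 and a hypothetical two-variable internal state.

```python
from typing import Sequence


def drive(state: Sequence[float], setpoint: Sequence[float],
          n: float = 4.0, m: float = 3.0) -> float:
    """Drive D(H) = (sum_i |h*_i - h_i|^n)^(1/m).

    The exponents n and m are illustrative assumptions, not values from the paper.
    """
    total = sum(abs(target - value) ** n for target, value in zip(setpoint, state))
    return total ** (1.0 / m)


def drive_difference_reward(state: Sequence[float], next_state: Sequence[float],
                            setpoint: Sequence[float],
                            n: float = 4.0, m: float = 3.0) -> float:
    """Reward r_t = D(H_t) - D(H_{t+1}): positive when a step reduces the drive."""
    return drive(state, setpoint, n, m) - drive(next_state, setpoint, n, m)


# Hypothetical two-variable internal state, with both setpoints at zero.
setpoint = [0.0, 0.0]
print(drive_difference_reward([0.5, -0.2], [0.2, -0.2], setpoint) > 0)  # True: moved toward the setpoint
print(drive_difference_reward([0.2, 0.0], [0.5, 0.0], setpoint) < 0)    # True: moved away from it
```

Summed over an episode without discounting, these rewards telescope to D(H_0) - D(H_T), so the undiscounted return depends only on how much closer to the setpoint the agent ends up than where it started.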
Affiliation(s)
- Naoto Yoshida
- Graduate School of Information Science and Technology, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan; International Research Center for Neurointelligence (WPI-IRCN), Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan.
- Tatsuya Daikoku
- International Research Center for Neurointelligence (WPI-IRCN), Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Yukie Nagai
- International Research Center for Neurointelligence (WPI-IRCN), Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan; Institute for AI and Beyond, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yasuo Kuniyoshi
- Graduate School of Information Science and Technology, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

2
Anlló H, Bavard S, Benmarrakchi F, Bonagura D, Cerrotti F, Cicue M, Gueguen M, Guzmán EJ, Kadieva D, Kobayashi M, Lukumon G, Sartorio M, Yang J, Zinchenko O, Bahrami B, Silva Concha J, Hertz U, Konova AB, Li J, O'Madagain C, Navajas J, Reyes G, Sarabi-Jamab A, Shestakova A, Sukumaran B, Watanabe K, Palminteri S. Comparing experience- and description-based economic preferences across 11 countries. Nat Hum Behav 2024:10.1038/s41562-024-01894-9. [PMID: 38877287 DOI: 10.1038/s41562-024-01894-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 04/19/2024] [Indexed: 06/16/2024]
Abstract
Recent evidence indicates that reward value encoding in humans is highly context dependent, leading to suboptimal decisions in some cases, but whether this computational constraint on valuation is a shared feature of human cognition remains unknown. Here we studied the behaviour of n = 561 individuals from 11 countries of markedly different socioeconomic and cultural makeup. Our findings show that context sensitivity was present in all 11 countries. Suboptimal decisions generated by context manipulation were not explained by risk aversion, as estimated through a separate description-based choice task (that is, lotteries) consisting of matched decision offers. Conversely, risk aversion significantly differed across countries. Overall, our findings suggest that context-dependent reward value encoding is a feature of human cognition that remains consistently present across different countries, as opposed to description-based decision-making, which is more permeable to cultural factors.
Affiliation(s)
- Hernán Anlló
- Human Reinforcement Learning Team, Laboratory of Cognitive and Computational Neuroscience, Paris, France.
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- Intercultural Cognitive Network, Paris, France.
- Sophie Bavard
- Human Reinforcement Learning Team, Laboratory of Cognitive and Computational Neuroscience, Paris, France
- Intercultural Cognitive Network, Paris, France
- General Psychology Lab, Hamburg University, Hamburg, Germany
- FatimaEzzahra Benmarrakchi
- Intercultural Cognitive Network, Paris, France
- School of Collective Intelligence, Université Mohammed VI Polytechnique, Rabat, Morocco
- Darla Bonagura
- Intercultural Cognitive Network, Paris, France
- Department of Psychiatry, University Behavioral Health Care and Brain Health Institute, Rutgers University-New Brunswick, Piscataway, NJ, USA
- Fabien Cerrotti
- Human Reinforcement Learning Team, Laboratory of Cognitive and Computational Neuroscience, Paris, France
- Intercultural Cognitive Network, Paris, France
- Mirona Cicue
- Department of Cognitive Sciences, University of Haifa, Haifa, Israel
- Maelle Gueguen
- Intercultural Cognitive Network, Paris, France
- Department of Psychiatry, University Behavioral Health Care and Brain Health Institute, Rutgers University-New Brunswick, Piscataway, NJ, USA
- Eugenio José Guzmán
- Facultad de Psicología, Universidad del Desarrollo, Santiago de Chile, Chile
- Dzerassa Kadieva
- International Laboratory for Social Neurobiology, Institute for Cognitive Neuroscience, HSE University, Moscow, Russia
- Maiko Kobayashi
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Gafari Lukumon
- School of Collective Intelligence, Université Mohammed VI Polytechnique, Rabat, Morocco
- Marco Sartorio
- Laboratorio de Neurociencia, Universidad Torcuato Di Tella, Buenos Aires, Argentina
- Jiong Yang
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Oksana Zinchenko
- Intercultural Cognitive Network, Paris, France
- Centre for Cognition and Decision Making, Institute for Cognitive Neuroscience, HSE University, Moscow, Russia
- Bahador Bahrami
- Intercultural Cognitive Network, Paris, France
- Department of Psychology, Ludwig Maximilian University, Munich, Germany
- Jaime Silva Concha
- Intercultural Cognitive Network, Paris, France
- Facultad de Psicología, Universidad del Desarrollo, Santiago de Chile, Chile
- Uri Hertz
- Intercultural Cognitive Network, Paris, France
- Department of Cognitive Sciences, University of Haifa, Haifa, Israel
- Anna B Konova
- Intercultural Cognitive Network, Paris, France
- Department of Psychiatry, University Behavioral Health Care and Brain Health Institute, Rutgers University-New Brunswick, Piscataway, NJ, USA
- Jian Li
- Intercultural Cognitive Network, Paris, France
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
- Cathal O'Madagain
- Intercultural Cognitive Network, Paris, France
- School of Collective Intelligence, Université Mohammed VI Polytechnique, Rabat, Morocco
- Joaquin Navajas
- Intercultural Cognitive Network, Paris, France
- Laboratorio de Neurociencia, Universidad Torcuato Di Tella, Buenos Aires, Argentina
- Escuela de Negocios, Universidad Torcuato Di Tella, Buenos Aires, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Gabriel Reyes
- Intercultural Cognitive Network, Paris, France
- Facultad de Psicología, Universidad del Desarrollo, Santiago de Chile, Chile
- Atiye Sarabi-Jamab
- Intercultural Cognitive Network, Paris, France
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
- Anna Shestakova
- Intercultural Cognitive Network, Paris, France
- Centre for Cognition and Decision Making, Institute for Cognitive Neuroscience, HSE University, Moscow, Russia
- Bhasi Sukumaran
- Intercultural Cognitive Network, Paris, France
- Department of Clinical Psychology, SRM Medical College Hospital and Research Centre, Chennai, India
- Katsumi Watanabe
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Intercultural Cognitive Network, Paris, France
- Stefano Palminteri
- Human Reinforcement Learning Team, Laboratory of Cognitive and Computational Neuroscience, Paris, France.
- Intercultural Cognitive Network, Paris, France.
- Departement d'études cognitives, Ecole normale supérieure, PSL Research University, Paris, France.

3
Vollberg MC, Sander D. Hidden Reward: Affect and Its Prediction Errors as Windows Into Subjective Value. Curr Dir Psychol Sci 2024; 33:93-99. [PMID: 38562909 PMCID: PMC10981566 DOI: 10.1177/09637214231217678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Scientists increasingly apply concepts from reinforcement learning to affect, but which concepts should apply? And what can their application reveal that we cannot know from directly observable states? An important reinforcement learning concept is the difference between reward expectations and outcomes. Such reward prediction errors have become foundational to research on adaptive behavior in humans, animals, and machines. Owing to historical focus on animal models and observable reward (e.g., food or money), however, relatively little attention has been paid to the fact that humans can additionally report correspondingly expected and experienced affect (e.g., feelings). Reflecting a broader "rise of affectivism," attention has started to shift, revealing explanatory power of expected and experienced feelings, including prediction errors, above and beyond observable reward. We propose that applying concepts from reinforcement learning to affect holds promise for elucidating subjective value. Simultaneously, we urge scientists to test, rather than inherit, concepts that may not apply directly.
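The central concept here, a prediction error as the difference between an outcome and its expectation, is usually paired with a delta-rule update of the expectation. The sketch below applies the same arithmetic to an observable reward and to a self-reported feeling; the rating scale, the numbers, and the learning rate are hypothetical, and it illustrates the general idea rather than any specific model proposed in the article.

```python
def prediction_error(outcome: float, expectation: float) -> float:
    """Prediction error: experienced minus expected."""
    return outcome - expectation


def update_expectation(expectation: float, outcome: float, learning_rate: float = 0.1) -> float:
    """Delta-rule update that moves the expectation a fraction of the way toward the outcome."""
    return expectation + learning_rate * prediction_error(outcome, expectation)


# Hypothetical trial: the monetary outcome matches expectations, but the
# self-reported feeling (on an assumed 1-7 rating scale) exceeds the expected feeling.
reward_pe = prediction_error(outcome=10.0, expectation=10.0)
affect_pe = prediction_error(outcome=5.0, expectation=2.0)
print(reward_pe, affect_pe)  # 0.0 3.0
```

In this toy trial the reward prediction error is zero while the affective prediction error is not, which is the kind of dissociation that would let reported feelings carry information beyond observable reward.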
Affiliation(s)
- Marius C Vollberg
- Department of Psychology, University of Amsterdam
- Swiss Center for Affective Sciences, University of Geneva
- Department of Psychology, FPSE, University of Geneva
- David Sander
- Swiss Center for Affective Sciences, University of Geneva
- Department of Psychology, FPSE, University of Geneva

4
Heimer O, Hertz U. The spread of affective and semantic valence representations across states. Cognition 2024; 244:105714. [PMID: 38176154 DOI: 10.1016/j.cognition.2023.105714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 12/22/2023] [Accepted: 12/24/2023] [Indexed: 01/06/2024]
Abstract
In many decision problems, outcomes are not reached after a single action but rather after a series of events or states. To optimize decisions over multiple states, representations of how good or bad the outcomes are, that is, the outcomes' valence, should spread across states. One mechanism for valence spreading is a temporal, state-independent process in which a single valence representation is updated when an outcome is experienced and fades away afterwards. Each state's valence is based on its temporal proximity to the experienced outcome. An alternative, state-dependent mechanism relies on the structure of transitions between states, updating a separate valence representation for each state according to its spatial distance from the outcomes. We examined how these mechanistic accounts shape the spread of two formats of valence representation, feelings (affective valence) and knowledge (semantic valence), between states. In two pre-registered experiments (N = 585), we used a novel task in which participants move in a four-state maze, one of which contains an outcome. The participants provide self-reports of affective and semantic valence throughout the maze and after finishing it. Results show that the affective representation of negative valence is more localized in state-space than the semantic representation. We also found evidence for the relative reliance of the affective valence on a temporal, state-independent mechanism and of the semantic valence on a structured, state-dependent mechanism. Our findings provide mechanistic accounts for the differences between affective and semantic valence representations and indicate how such representations may play a role in associative learning and decision-making.
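The two mechanisms contrasted in this abstract can be made concrete with a toy model. The sketch below is an illustrative assumption, not the authors' model: the ring-shaped layout of the four-state maze, the decay and discount values, and the function names are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def temporal_valence(visits: List[str], outcome_state: str, outcome_value: float,
                     decay: float = 0.5) -> List[Tuple[str, float]]:
    """State-independent account: a single trace is set when the outcome is
    experienced and fades by `decay` on every later step, whichever state is visited."""
    trace = 0.0
    history = []
    for state in visits:
        if state == outcome_state:
            trace = outcome_value
        else:
            trace *= decay
        history.append((state, trace))
    return history


def structural_valence(adjacency: Dict[str, List[str]], outcome_state: str,
                       outcome_value: float, gamma: float = 0.5) -> Dict[str, float]:
    """State-dependent account: each state keeps its own valence, discounted by its
    shortest-path distance (breadth-first search) from the outcome state."""
    distance = {outcome_state: 0}
    queue = deque([outcome_state])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {state: outcome_value * gamma ** d for state, d in distance.items()}


# Hypothetical four-state ring maze A-B-C-D-A with a negative outcome in C.
maze = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
print(temporal_valence(["A", "B", "C", "D", "A", "B"], "C", -1.0)[-1])  # ('B', -0.125)
print(structural_valence(maze, "C", -1.0)["B"])                          # -0.5
```

In this toy run, B's valence under the temporal account depends on how many steps have passed since the outcome, whereas under the structural account it depends only on B's distance from C in the maze.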
Affiliation(s)
- Orit Heimer
- Department of Psychology, University of Haifa, Haifa, Israel.
- Uri Hertz
- Department of Cognitive Sciences, University of Haifa, Haifa, Israel

5
Wise T, Emery K, Radulescu A. Naturalistic reinforcement learning. Trends Cogn Sci 2024; 28:144-158. [PMID: 37777463 PMCID: PMC10878983 DOI: 10.1016/j.tics.2023.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/16/2023] [Revised: 08/23/2023] [Accepted: 08/24/2023] [Indexed: 10/02/2023]
Abstract
Humans possess a remarkable ability to make decisions within real-world environments that are expansive, complex, and multidimensional. Human cognitive computational neuroscience has sought to exploit reinforcement learning (RL) as a framework within which to explain human decision-making, often focusing on constrained, artificial experimental tasks. In this article, we review recent efforts that use naturalistic approaches to determine how humans make decisions in complex environments that better approximate the real world, providing a clearer picture of how humans navigate the challenges posed by real-world decisions. These studies purposely embed elements of naturalistic complexity within experimental paradigms, rather than focusing on simplification, generating insights into the processes that likely underpin humans' ability to navigate complex, multidimensional real-world environments so successfully.
Affiliation(s)
- Toby Wise
- Department of Neuroimaging, King's College London, London, UK.
- Kara Emery
- Center for Data Science, New York University, New York, NY, USA
- Angela Radulescu
- Center for Computational Psychiatry, Icahn School of Medicine at Mt. Sinai, New York, NY, USA

6
Valdebenito-Oyarzo G, Martínez-Molina MP, Soto-Icaza P, Zamorano F, Figueroa-Vargas A, Larraín-Valenzuela J, Stecher X, Salinas C, Bastin J, Valero-Cabré A, Polania R, Billeke P. The parietal cortex has a causal role in ambiguity computations in humans. PLoS Biol 2024; 22:e3002452. [PMID: 38198502 PMCID: PMC10824459 DOI: 10.1371/journal.pbio.3002452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 01/23/2024] [Accepted: 11/28/2023] [Indexed: 01/12/2024]
Abstract
Humans often face the challenge of making decisions between ambiguous options. The level of ambiguity in decision-making has been linked to activity in the parietal cortex, but its exact computational role remains elusive. To test the hypothesis that the parietal cortex plays a causal role in computing ambiguous probabilities, we conducted consecutive fMRI and TMS-EEG studies. We found that participants assigned unknown probabilities to objective probabilities, elevating the uncertainty of their decisions. Parietal cortex activity correlated with the objective degree of ambiguity and with a process that underestimates the uncertainty during decision-making. Conversely, the midcingulate cortex (MCC) encodes prediction errors and increases its connectivity with the parietal cortex during outcome processing. Disruption of the parietal activity increased the uncertainty evaluation of the options, decreasing cingulate cortex oscillations during outcome evaluation and lateral frontal oscillations related to value ambiguous probability. These results provide evidence for a causal role of the parietal cortex in computing uncertainty during ambiguous decisions made by humans.
Affiliation(s)
- Gabriela Valdebenito-Oyarzo
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social, (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- María Paz Martínez-Molina
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social, (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- Patricia Soto-Icaza
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social, (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- Francisco Zamorano
- Unidad de Neuroimágenes Cuantitativas avanzadas (UNICA), Departamento de Imágenes, Clínica Alemana de Santiago, Santiago, Chile
- Facultad de Ciencias para el Cuidado de la Salud, Campus Los Leones, Universidad San Sebastián, Santiago, Chile
- Alejandra Figueroa-Vargas
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social, (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- Josefina Larraín-Valenzuela
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social, (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- Ximena Stecher
- Unidad de Neuroimágenes Cuantitativas avanzadas (UNICA), Departamento de Imágenes, Clínica Alemana de Santiago, Santiago, Chile
- César Salinas
- Unidad de Neuroimágenes Cuantitativas avanzadas (UNICA), Departamento de Imágenes, Clínica Alemana de Santiago, Santiago, Chile
- Julien Bastin
- Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, Grenoble, France
- Antoni Valero-Cabré
- Causal Dynamics, Plasticity and Rehabilitation Group, FRONTLAB team, Institut du Cerveau et de la Moelle Epinière (ICM), CNRS UMR 7225, INSERM U 1127 and Sorbonne Université, Paris, France
- Cognitive Neuroscience and Information Technology Research Program, Open University of Catalonia (UOC), Barcelona, Spain
- Laboratory for Cerebral Dynamics Plasticity and Rehabilitation, Boston University, School of Medicine, Boston, Massachusetts, United States of America
- Rafael Polania
- Decision Neuroscience Lab, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
- Pablo Billeke
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social, (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile

7
Molinaro G, Collins AGE. A goal-centric outlook on learning. Trends Cogn Sci 2023; 27:1150-1164. [PMID: 37696690 DOI: 10.1016/j.tics.2023.08.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 08/11/2023] [Accepted: 08/14/2023] [Indexed: 09/13/2023]
Abstract
Goals play a central role in human cognition. However, computational theories of learning and decision-making often take goals as given. Here, we review key empirical findings showing that goals shape the representations of inputs, responses, and outcomes, such that setting a goal crucially influences the central aspects of any learning process: states, actions, and rewards. We thus argue that studying goal selection is essential to advance our understanding of learning. By following existing literature in framing goal selection within a hierarchy of decision-making problems, we synthesize important findings on the principles underlying goal value attribution and exploration strategies. Ultimately, we propose that a goal-centric perspective will help develop more complete accounts of learning in both biological and artificial agents.
Affiliation(s)
- Gaia Molinaro
- Department of Psychology, University of California, Berkeley, Berkeley, CA, USA.
- Anne G E Collins
- Department of Psychology, University of California, Berkeley, Berkeley, CA, USA; Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA

8
Dulberg Z, Dubey R, Berwian IM, Cohen JD. Having multiple selves helps learning agents explore and adapt in complex changing worlds. Proc Natl Acad Sci U S A 2023; 120:e2221180120. [PMID: 37399387 PMCID: PMC10334746 DOI: 10.1073/pnas.2221180120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 05/09/2023] [Indexed: 07/05/2023]
Abstract
Satisfying a variety of conflicting needs in a changing environment is a fundamental challenge for any adaptive agent. Here, we show that designing an agent in a modular fashion as a collection of subagents, each dedicated to a separate need, powerfully enhanced the agent's capacity to satisfy its overall needs. We used the formalism of deep reinforcement learning to investigate a biologically relevant multiobjective task: continually maintaining homeostasis of a set of physiologic variables. We then conducted simulations in a variety of environments and compared how modular agents performed relative to standard monolithic agents (i.e., agents that aimed to satisfy all needs in an integrated manner using a single aggregate measure of success). Simulations revealed that modular agents a) exhibited a form of exploration that was intrinsic and emergent rather than extrinsically imposed; b) were robust to changes in nonstationary environments; and c) scaled gracefully in their ability to maintain homeostasis as the number of conflicting objectives increased. Supporting analysis suggested that the robustness to changing environments and increasing numbers of needs were due to intrinsic exploration and efficiency of representation afforded by the modular architecture. These results suggest that the normative principles by which agents have adapted to complex changing environments may also explain why humans have long been described as consisting of "multiple selves."
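The contrast between modular and monolithic agents is largely a contrast in how reward is routed. The sketch below is a hypothetical illustration, not the study's deep RL architecture: the squared-deviation reward, the tabular action values, and the summed-value arbitration rule are assumptions chosen for brevity.

```python
from typing import Dict, List, Sequence


def per_need_rewards(state: Sequence[float], setpoints: Sequence[float]) -> List[float]:
    """One reward per physiological variable: the negative squared deviation from its setpoint."""
    return [-(value - target) ** 2 for value, target in zip(state, setpoints)]


def monolithic_reward(state: Sequence[float], setpoints: Sequence[float]) -> float:
    """A single aggregate measure of success, as a monolithic agent would receive."""
    return sum(per_need_rewards(state, setpoints))


def modular_action(module_q_values: List[Dict[str, float]]) -> str:
    """Assumed arbitration rule: pick the action with the largest value summed across modules."""
    actions = module_q_values[0].keys()
    return max(actions, key=lambda action: sum(q[action] for q in module_q_values))


# Hypothetical action values held by a food module and a water module.
food_module = {"eat": 1.0, "drink": 0.0, "wait": 0.1}
water_module = {"eat": 0.0, "drink": 0.8, "wait": 0.1}
print(per_need_rewards([1.0, 0.5], [0.0, 0.0]))    # [-1.0, -0.25]
print(modular_action([food_module, water_module]))  # eat
```

Each module would learn only from its own entry of `per_need_rewards`, whereas a monolithic learner sees only the collapsed `monolithic_reward`; the study's networks, training procedure, and arbitration scheme may differ from this sketch.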
Affiliation(s)
- Zack Dulberg
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- Rachit Dubey
- Department of Computer Science, Princeton University, Princeton, NJ 08544
- Isabel M. Berwian
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- Jonathan D. Cohen
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544

9
Molinaro G, Collins AGE. Intrinsic rewards explain context-sensitive valuation in reinforcement learning. PLoS Biol 2023; 21:e3002201. [PMID: 37459394 PMCID: PMC10374061 DOI: 10.1371/journal.pbio.3002201] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/11/2022] [Revised: 07/27/2023] [Accepted: 06/15/2023] [Indexed: 07/28/2023]
Abstract
When observing the outcome of a choice, people are sensitive to the choice's context, such that the experienced value of an option depends on the alternatives: getting $1 when the possibilities were 0 or 1 feels much better than when the possibilities were 1 or 10. Context-sensitive valuation has been documented within reinforcement learning (RL) tasks, in which values are learned from experience through trial and error. Range adaptation, wherein options are rescaled according to the range of values yielded by available options, has been proposed to account for this phenomenon. However, we propose that other mechanisms, reflecting a different theoretical viewpoint, may also explain this phenomenon. Specifically, we theorize that internally defined goals play a crucial role in shaping the subjective value attributed to any given option. Motivated by this theory, we develop a new "intrinsically enhanced" RL model, which combines extrinsically provided rewards with internally generated signals of goal achievement as a teaching signal. Across 7 different studies (including previously published data sets as well as a novel, preregistered experiment with replication and control studies), we show that the intrinsically enhanced model can explain context-sensitive valuation as well as, or better than, range adaptation. Our findings indicate a more prominent role of intrinsic, goal-dependent rewards than previously recognized within formal models of human RL. By integrating internally generated signals of reward, standard RL theories should better account for human behavior, including context-sensitive valuation and beyond.
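The $1 example can be worked through numerically under both accounts. This is a hedged sketch rather than the published models: the min-max form of range adaptation, the binary "obtained the best available option" goal signal, the mixing weight `omega`, and the reward scale of 10 are illustrative assumptions.

```python
from typing import Sequence


def range_adapted_value(outcome: float, available: Sequence[float]) -> float:
    """Rescale an outcome by the range of outcomes available in its context."""
    low, high = min(available), max(available)
    if high == low:
        return 0.0  # assumption: a context with no range carries no relative information
    return (outcome - low) / (high - low)


def intrinsically_enhanced_value(outcome: float, available: Sequence[float],
                                 omega: float = 0.5, reward_scale: float = 10.0) -> float:
    """Mix a binary goal-achievement signal with the extrinsic reward scaled to [0, 1]."""
    goal_achieved = 1.0 if outcome == max(available) else 0.0
    extrinsic = outcome / reward_scale
    return omega * goal_achieved + (1.0 - omega) * extrinsic


# Getting $1 when the alternatives were 0 or 1 versus 1 or 10.
for context in ([0.0, 1.0], [1.0, 10.0]):
    print(context, range_adapted_value(1.0, context),
          round(intrinsically_enhanced_value(1.0, context), 2))
# [0.0, 1.0] 1.0 0.55
# [1.0, 10.0] 0.0 0.05
```

Both functions rank the same $1 higher in the {0, 1} context; they differ in how value scales with outcome magnitude, since only the second keeps an absolute reward term alongside the context-dependent one.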
Affiliation(s)
- Gaia Molinaro
- Department of Psychology, University of California, Berkeley, Berkeley, California, United States of America
- Anne G E Collins
- Department of Psychology, University of California, Berkeley, Berkeley, California, United States of America
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California, United States of America

10
Prilutski Y, Livneh Y. Physiological Needs: Sensations and Predictions in the Insular Cortex. Physiology (Bethesda) 2023; 38:0. [PMID: 36040864 DOI: 10.1152/physiol.00019.2022] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/06/2023]
Abstract
Physiological needs create powerful motivations (e.g., thirst and hunger). Studies in humans and animal models have implicated the insular cortex in the neural regulation of physiological needs and need-driven behavior. We review prominent mechanistic models of how the insular cortex might achieve this regulation and present a conceptual and analytical framework for testing these models in healthy and pathological conditions.
Affiliation(s)
- Yael Prilutski
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot, Israel
- Yoav Livneh
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot, Israel

11
Mannella F, Tummolini L. Kick-starting concept formation with intrinsically motivated learning: the grounding by competence acquisition hypothesis. Philos Trans R Soc Lond B Biol Sci 2023; 378:20210370. [PMID: 36571135 PMCID: PMC9791488 DOI: 10.1098/rstb.2021.0370] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022]
Abstract
Although the spontaneous origins of concepts from interaction are often taken for granted, how the process can start without a fully developed sensorimotor representation system has not been sufficiently explored. Here, we offer a new hypothesis for a mechanism supporting concept formation while learning to perceive and act intentionally. We specify an architecture in which multi-modal sensory patterns are mapped in the same lower-dimensional representation space. The motor repertoire is also represented in the same space via topological mapping. We posit that the acquisition of these mappings can be mutually constrained by maximizing the convergence between sensory and motor representations during online interaction. This learning signal reflects an intrinsic motivation of competence acquisition. We propose that topological alignment via competence acquisition eventually results in a sensorimotor representation system. To assess the consistency of this hypothesis, we develop a computational model and test it in an object manipulation task. Results show that such an intrinsically motivated learning process can create a cross-modal categorization system with semantic content, which supports perception and intentional action selection, which has the resources to re-enact its own multi-modal experiences, and, on this basis, to kick-start the formation of concepts grounded in the external environment. This article is part of the theme issue 'Concepts in interaction: social engagement and inner experiences'.
Affiliation(s)
- Francesco Mannella
- Institute of Cognitive Sciences and Technologies, CNR, 00185, Rome, Italy
- Luca Tummolini
- Institute of Cognitive Sciences and Technologies, CNR, 00185, Rome, Italy; Institute for Future Studies, IFFS, Box 591, 101 31, Stockholm, Sweden

12
Emanuel A, Eldar E. Emotions as computations. Neurosci Biobehav Rev 2023; 144:104977. [PMID: 36435390 PMCID: PMC9805532 DOI: 10.1016/j.neubiorev.2022.104977] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/14/2022] [Revised: 10/26/2022] [Accepted: 11/22/2022] [Indexed: 11/26/2022]
Abstract
Emotions ubiquitously impact action, learning, and perception, yet their essence and role remain widely debated. Computational accounts of emotion aspire to answer these questions with greater conceptual precision informed by normative principles and neurobiological data. We examine recent progress in this regard and find that emotions may implement three classes of computations, which serve to evaluate states, actions, and uncertain prospects. For each of these, we use the formalism of reinforcement learning to offer a new formulation that better accounts for existing evidence. We then consider how these distinct computations may map onto distinct emotions and moods. Integrating extensive research on the causes and consequences of different emotions suggests a parsimonious one-to-one mapping, according to which emotions are integral to how we evaluate outcomes (pleasure & pain), learn to predict them (happiness & sadness), use them to inform our (frustration & content) and others' (anger & gratitude) actions, and plan in order to realize (desire & hope) or avoid (fear & anxiety) uncertain outcomes.
Affiliation(s)
- Aviv Emanuel
- Department of Psychology, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel.
- Eran Eldar
- Department of Psychology, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel.

13
De Martino B, Cortese A. Goals, usefulness and abstraction in value-based choice. Trends Cogn Sci 2023; 27:65-80. [PMID: 36446707 DOI: 10.1016/j.tics.2022.11.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/23/2022] [Revised: 10/26/2022] [Accepted: 11/01/2022] [Indexed: 11/27/2022]
Abstract
Colombian drug lord Pablo Escobar, while on the run, purportedly burned two million dollars in banknotes to keep his daughter warm. A stark reminder that, in life, circumstances and goals can quickly change, forcing us to reassess and modify our values on-the-fly. Studies in decision-making and neuroeconomics have often implicitly equated value to reward, emphasising the hedonic and automatic aspect of the value computation, while overlooking its functional (concept-like) nature. Here we outline the computational and biological principles that enable the brain to compute the usefulness of an option or action by creating abstractions that flexibly adapt to changing goals. We present different algorithmic architectures, comparing ideas from artificial intelligence (AI) and cognitive neuroscience with psychological theories and, when possible, drawing parallels.
Affiliation(s)
- Benedetto De Martino
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK; Computational Neuroscience Laboratories, ATR Institute International, 619-0288 Kyoto, Japan.
- Aurelio Cortese
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK; Computational Neuroscience Laboratories, ATR Institute International, 619-0288 Kyoto, Japan.
14
Weiß M, Iotzov V, Zhou Y, Hein G. The bright and dark sides of egoism. Front Psychiatry 2022; 13:1054065. [PMID: 36506436 PMCID: PMC9729783 DOI: 10.3389/fpsyt.2022.1054065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 11/01/2022] [Indexed: 11/25/2022]
Abstract
Despite its negative reputation, egoism - the excessive concern for one's own welfare - can incite prosocial behavior. So far, however, egoism-based prosociality has received little attention. Here, we first provide an overview of the conditions under which egoism turns into a prosocial motive, review the benefits and limitations of egoism-based prosociality, and compare them with empathy-driven prosocial behavior. Second, we summarize studies investigating the neural processing of egoism-based prosocial decisions, studies investigating the neural processing of empathy-based prosocial decisions, and the small number of studies that compared the neural processing of prosocial decisions elicited by the different motives. We conclude that there is evidence for differential neural networks involved in egoism- and empathy-based prosocial decisions. However, this evidence is not yet conclusive, because it is mainly based on the comparison of different experimental paradigms, which may exaggerate or overshadow the effect of the different motivational states. Finally, we propose paradigms and research questions for future research that could help to specify how egoism can be used to enhance other prosocial behavior and motivation, and how it could be tamed.
Affiliation(s)
- Martin Weiß
- Translational Social Neuroscience Unit, Department of Psychiatry, Center of Mental Health, Psychosomatic and Psychotherapy, University of Würzburg, Würzburg, Germany
15
Pearce AL, Fuchs BA, Keller KL. The role of reinforcement learning and value-based decision-making frameworks in understanding food choice and eating behaviors. Front Nutr 2022; 9:1021868. [PMID: 36483928 PMCID: PMC9722736 DOI: 10.3389/fnut.2022.1021868] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/17/2022] [Accepted: 11/04/2022] [Indexed: 11/23/2022]
Abstract
The obesogenic food environment includes easy access to highly palatable, energy-dense, "ultra-processed" foods that are heavily marketed to consumers; therefore, it is critical to understand the neurocognitive processes that underlie overeating in response to environmental food cues (e.g., food images, food branding/advertisements). Eating habits are learned through reinforcement, the process through which environmental food cues become valued and influence behavior. This process is supported by multiple behavioral control systems (e.g., Pavlovian, Habitual, Goal-Directed). Therefore, using neurocognitive frameworks for reinforcement learning and value-based decision-making can improve our understanding of food choice and eating behaviors. Specifically, the role of reinforcement learning in eating behaviors was considered using the frameworks of (1) Sign- versus Goal-Tracking Phenotypes; (2) Model-Free versus Model-Based; and (3) the Utility or Value-Based Model. The sign- and goal-tracking phenotypes may contribute mechanistic insight into the role of food-cue incentive salience in two prevailing models of overconsumption: the Extended Behavioral Susceptibility Theory and the Reactivity to Embedded Food Cues in Advertising Model. Similarly, the model-free versus model-based framework may contribute insight to the Extended Behavioral Susceptibility Theory and the Healthy Food Promotion Model. Finally, the value-based model provides a framework for understanding how all three learning systems are integrated to influence food choice. Together, these frameworks can provide mechanistic insight into existing models of food choice and overconsumption and may contribute to the development of future prevention and treatment efforts.
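The model-free versus model-based distinction is often illustrated with outcome devaluation. The sketch below is a hypothetical two-action example; the action names, outcome values, learning rate, and trial count are all assumptions and do not come from Pearce et al. A model-free learner caches action values with a delta rule, while a model-based learner recomputes them from an action-outcome map, so only the latter adjusts immediately when food is devalued (e.g., by satiety).

```python
# Hypothetical task: each action deterministically produces one outcome.
OUTCOME_OF = {"press_lever": "food", "pull_chain": "water"}


def model_free_values(outcome_values, n_trials=50, alpha=0.1):
    """Cache action values from experienced rewards with a delta rule."""
    q = {action: 0.0 for action in OUTCOME_OF}
    for _ in range(n_trials):
        for action, outcome in OUTCOME_OF.items():
            reward = outcome_values[outcome]
            q[action] += alpha * (reward - q[action])  # prediction-error update
    return q


def model_based_values(outcome_values):
    """Recompute action values on the fly from the action-outcome model."""
    return {action: outcome_values[outcome] for action, outcome in OUTCOME_OF.items()}


values = {"food": 1.0, "water": 0.5}
q_mf = model_free_values(values)   # learned while food is still valuable

values["food"] = 0.0               # devaluation, with no further training
q_mb = model_based_values(values)

print(max(q_mf, key=q_mf.get))     # model-free (habitual) preference persists
print(max(q_mb, key=q_mb.get))     # model-based (goal-directed) preference adapts
```

In this toy setting the cached model-free values still favour `press_lever` after devaluation, whereas the model-based values immediately favour `pull_chain`.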
Affiliation(s)
- Alaina L. Pearce
- Social Science Research Institute, Pennsylvania State University, University Park, PA, United States
- Department of Nutritional Sciences, Pennsylvania State University, University Park, PA, United States
- Bari A. Fuchs
- Department of Nutritional Sciences, Pennsylvania State University, University Park, PA, United States
- Kathleen L. Keller
- Social Science Research Institute, Pennsylvania State University, University Park, PA, United States
- Department of Nutritional Sciences, Pennsylvania State University, University Park, PA, United States
- Department of Food Science, Pennsylvania State University, University Park, PA, United States
16
Prystawski B, Mohnert F, Tošić M, Lieder F. Resource-rational Models of Human Goal Pursuit. Top Cogn Sci 2022; 14:528-549. [PMID: 34435728 DOI: 10.1111/tops.12562] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2020] [Revised: 06/19/2021] [Accepted: 06/23/2021] [Indexed: 11/30/2022]
Abstract
Goal-directed behavior is a deeply important part of human psychology. People constantly set goals for themselves and pursue them in many domains of life. In this paper, we develop computational models that characterize how humans pursue goals in a complex dynamic environment and test how well they describe human behavior in an experiment. Our models are motivated by the principle of resource rationality and draw upon psychological insights about people's limited attention and planning capacities. We find that human goal pursuit is qualitatively different and substantially less efficient than optimal goal pursuit in our simulated environment. Models of goal pursuit based on the principle of resource rationality capture human behavior better than both a model of optimal goal pursuit and heuristics that are not resource-rational. We conclude that the way humans pursue goals is shaped by the need to achieve goals effectively as well as cognitive costs and constraints on planning and attention. Our findings are an important step toward understanding humans' goal pursuit as cognitive limitations play a crucial role in shaping people's goal-directed behavior.
Affiliation(s)
- Ben Prystawski
- Max Planck Institute for Intelligent Systems, Tübingen
- Department of Computer Science, Cognitive Science Program, University of Toronto
- Mateo Tošić
- Max Planck Institute for Intelligent Systems, Tübingen
- Falk Lieder
- Max Planck Institute for Intelligent Systems, Tübingen
17
Lubianiker N, Paret C, Dayan P, Hendler T. Neurofeedback through the lens of reinforcement learning. Trends Neurosci 2022; 45:579-593. [PMID: 35550813 DOI: 10.1016/j.tins.2022.03.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/17/2021] [Revised: 02/11/2022] [Accepted: 03/24/2022] [Indexed: 11/29/2022]
Abstract
Despite decades of experimental and clinical practice, the neuropsychological mechanisms underlying neurofeedback (NF) training remain obscure. NF is a unique form of reinforcement learning (RL) task, during which participants are provided with rewarding feedback regarding desired changes in neural patterns. However, key RL considerations - including choices during practice, prediction errors, credit-assignment problems, or the exploration-exploitation tradeoff - have infrequently been considered in the context of NF. We offer an RL-based framework for NF, describing different internal states, actions, and rewards in common NF protocols, thus fashioning new proposals for characterizing, predicting, and hastening the course of learning. In this way we hope to advance current understanding of neural regulation via NF, and ultimately to promote its effectiveness, personalization, and clinical utility.
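To make the RL framing concrete, the following sketch treats a neurofeedback session as a multi-armed bandit: the actions are mental strategies, the reward is the feedback signal, learning is driven by reward prediction errors, and epsilon-greedy choice stands in for the exploration-exploitation tradeoff. The strategy names, their effects on the target signal, and all parameters are hypothetical; Lubianiker et al. propose a conceptual framework, not this specific model.

```python
import random

# Hypothetical mean change in the target neural signal produced by each mental strategy.
STRATEGY_EFFECT = {"recall_memory": 0.2, "imagery": 0.8, "counting": -0.1}


def simulate_nf_training(n_trials=1000, alpha=0.2, epsilon=0.2, seed=0):
    """Epsilon-greedy bandit learner whose reward is the neurofeedback signal."""
    rng = random.Random(seed)
    q = {s: 0.0 for s in STRATEGY_EFFECT}  # learned value of each strategy
    prediction_errors = []
    for _ in range(n_trials):
        # Exploration-exploitation tradeoff: occasionally try a random strategy.
        if rng.random() < epsilon:
            strategy = rng.choice(list(q))
        else:
            strategy = max(q, key=q.get)
        feedback = STRATEGY_EFFECT[strategy]  # reward = feedback on the neural change
        delta = feedback - q[strategy]        # reward prediction error
        q[strategy] += alpha * delta
        prediction_errors.append(delta)
    return q, prediction_errors


q, pes = simulate_nf_training()
print(max(q, key=q.get), round(q["imagery"], 2))
```

Feedback here is noise-free for simplicity; real neurofeedback signals are noisy and delayed, which is exactly where the credit-assignment issues raised in the abstract would enter.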
Affiliation(s)
- Nitzan Lubianiker
- School of Psychological Sciences, Gershon H. Gordon Faculty of Social Sciences, Tel Aviv University, Tel Aviv, Israel; Sagol Brain Institute, Wohl Institute for Advanced Imaging, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel.
- Christian Paret
- School of Psychological Sciences, Gershon H. Gordon Faculty of Social Sciences, Tel Aviv University, Tel Aviv, Israel; Sagol Brain Institute, Wohl Institute for Advanced Imaging, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Department of Psychosomatic Medicine and Psychotherapy, Central Institute of Mental Health Mannheim, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Peter Dayan
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany; University of Tübingen, Tübingen, Germany
- Talma Hendler
- School of Psychological Sciences, Gershon H. Gordon Faculty of Social Sciences, Tel Aviv University, Tel Aviv, Israel; Sagol Brain Institute, Wohl Institute for Advanced Imaging, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Sagol school of Neuroscience, Tel Aviv University, Tel Aviv, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
18
Cleeremans A, Tallon-Baudry C. Consciousness matters: phenomenal experience has functional value. Neurosci Conscious 2022; 2022:niac007. [PMID: 35479522 PMCID: PMC9036654 DOI: 10.1093/nc/niac007] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/01/2021] [Revised: 02/04/2022] [Accepted: 03/14/2022] [Indexed: 11/18/2022]
Abstract
‘Why would we do anything at all if the doing was not doing something to us?’ In other words: What is consciousness good for? Here, reversing classical views, according to many of which subjective experience is a mere epiphenomenon that affords no functional advantage, we propose that subject-level experience—‘What it feels like’—is endowed with intrinsic value, and it is precisely the value agents associate with their experiences that explains why they do certain things and avoid others. Because experiences have value and guide behaviour, consciousness has a function. Under this hypothesis of ‘phenomenal worthiness’, we argue that it is only in virtue of the fact that conscious agents ‘experience’ things and ‘care’ about those experiences that they are ‘motivated’ to act in certain ways and that they ‘prefer’ some states of affairs vs. others. Overviewing how the concept of value has been approached in decision-making, emotion research and consciousness research, we argue that phenomenal consciousness has intrinsic value and conclude that if this is indeed the case, then it must have a function. Phenomenal experience might act as a mental currency of sorts, which not only endows conscious mental states with intrinsic value but also makes it possible for conscious agents to compare vastly different experiences in a common subject-centred space—a feature that readily explains the fact that consciousness is ‘unified’. The phenomenal worthiness hypothesis, in turn, makes the ‘hard problem’ of consciousness more tractable, since it can then be reduced to a problem about function.
Affiliation(s)
- Axel Cleeremans
- Consciousness, Cognition & Computation Group, Center for Research in Cognition & Neuroscience, ULB Neuroscience Institute, Université libre de Bruxelles, Brussels, Belgium
- Catherine Tallon-Baudry
- Cognitive and Computational Neuroscience Laboratory, Inserm, École Normale Supérieure—PSL University, Paris, France
19
Watts AG, Kanoski SE, Sanchez-Watts G, Langhans W. The physiological control of eating: signals, neurons, and networks. Physiol Rev 2022; 102:689-813. [PMID: 34486393 PMCID: PMC8759974 DOI: 10.1152/physrev.00028.2020] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Received: 07/06/2020] [Accepted: 08/30/2021] [Indexed: 02/07/2023]
Abstract
During the past 30 yr, investigating the physiology of eating behaviors has generated a truly vast literature. This is fueled in part by a dramatic increase in obesity and its comorbidities that has coincided with an ever-increasing sophistication of genetically based manipulations. These techniques have produced results with a remarkable degree of cell specificity, particularly at the cell signaling level, and have played a lead role in advancing the field. However, putting these findings into a brain-wide context that connects physiological signals and neurons to behavior and somatic physiology requires a thorough consideration of neuronal connections: a field that has also seen an extraordinary technological revolution. Our goal is to present a comprehensive and balanced assessment of how physiological signals associated with energy homeostasis interact at many brain levels to control eating behaviors. A major theme is that these signals engage sets of interacting neural networks throughout the brain that are defined by specific neural connections. We begin by discussing some fundamental concepts, including ones that still engender vigorous debate, that provide the necessary frameworks for understanding how the brain controls meal initiation and termination. These include key word definitions, ATP availability as the pivotal regulated variable in energy homeostasis, neuropeptide signaling, homeostatic and hedonic eating, and meal structure. Within this context, we discuss network models of how key regions in the endbrain (or telencephalon), hypothalamus, hindbrain, medulla, vagus nerve, and spinal cord work together with the gastrointestinal tract to enable the complex motor events that permit animals to eat in diverse situations.
Affiliation(s)
- Alan G Watts
- The Department of Biological Sciences, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California
- Scott E Kanoski
- The Department of Biological Sciences, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California
- Graciela Sanchez-Watts
- The Department of Biological Sciences, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California
- Wolfgang Langhans
- Physiology and Behavior Laboratory, Eidgenössische Technische Hochschule-Zürich, Schwerzenbach, Switzerland
20
Fine JM, Hayden BY. The whole prefrontal cortex is premotor cortex. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200524. [PMID: 34957853 PMCID: PMC8710885 DOI: 10.1098/rstb.2020.0524] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 06/08/2021] [Accepted: 10/01/2021] [Indexed: 11/12/2022]
Abstract
We propose that the entirety of the prefrontal cortex (PFC) can be seen as fundamentally premotor in nature. By this, we mean that the PFC consists of an action abstraction hierarchy whose core function is the potentiation and depotentiation of possible action plans at different levels of granularity. We argue that the apex of the hierarchy should revolve around the process of goal-selection, which we posit is inherently a form of optimization over action abstraction. Anatomical and functional evidence supports the idea that this hierarchy originates on the orbital surface of the brain and extends dorsally to motor cortex. Accordingly, our viewpoint positions the orbitofrontal cortex in a key role in the optimization of goal-selection policies, and suggests that its other proposed roles are aspects of this more general function. Our proposed perspective will reframe outstanding questions, open up new areas of inquiry and align theories of prefrontal function with evolutionary principles. This article is part of the theme issue 'Systems neuroscience through the lens of evolutionary theory'.
Affiliation(s)
- Justin M. Fine
- Department of Neuroscience, Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Benjamin Y. Hayden
- Department of Neuroscience, Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
21
Collins AGE, Shenhav A. Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology 2022; 47:104-118. [PMID: 34453117 PMCID: PMC8617262 DOI: 10.1038/s41386-021-01126-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 04/03/2021] [Revised: 07/14/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023]
Abstract
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understanding of prefrontal cortex function. We discuss how such models have advanced from their origins in basic algorithms of updating and action selection to increasingly account for complexities in the cognitive processes required for learning and decision-making, and the representations over which they operate. We further discuss how a deeper understanding of the real-world complexities in these computations has shed light on the fundamental constraints on optimal behavior, and on the complex interactions between corticostriatal pathways to determine such behavior. The continuing and rapid development of these models holds great promise for understanding the mechanisms by which animals adapt to their environments, and what leads to maladaptive forms of learning and decision-making within clinical populations.
Affiliation(s)
- Anne G E Collins
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
- Amitai Shenhav
- Department of Cognitive, Linguistic, & Psychological Sciences and Carney Institute for Brain Science, Brown University, Providence, RI, USA.
22
A Computational View on the Nature of Reward and Value in Anhedonia. Curr Top Behav Neurosci 2021; 58:421-441. [PMID: 34935117 DOI: 10.1007/7854_2021_290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Anhedonia - a common feature of depression and other neuropsychiatric disorders - encompasses a reduction in the subjective experience and anticipation of rewarding events, and a reduction in the motivation to seek out such events. The presence of anhedonia often predicts or accompanies treatment resistance, and as such better interventions and treatments are important. Yet the mechanisms giving rise to anhedonia are not well understood. In this chapter, we briefly review existing computational conceptualisations of anhedonia. We argue that they are mostly descriptive and fail to provide an explanatory account of why anhedonia may occur. Working within the framework of reinforcement learning, we examine two potential computational mechanisms that could give rise to anhedonic phenomena. First, we show how anhedonia can arise in multi-dimensional drive-reduction settings through a trade-off between different rewards or needs. We then generalise this in terms of model-based value inference and identify a key role for associational belief structure. We close with a brief discussion of treatment implications of both of these conceptualisations. In summary, computational accounts of anhedonia have provided a useful descriptive framework. Recent advances in reinforcement learning suggest promising avenues by which the mechanisms underlying anhedonia may be teased apart, potentially motivating novel approaches to treatment.
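The drive-reduction mechanism can be sketched with the drive function form commonly used in homeostatic RL, D(H) = (Σ_i |h*_i - h_i|^n)^(1/m), where h*_i is the setpoint of need i, and with reward defined as the reduction in drive, r = D(H_t) - D(H_{t+1}). The two needs, setpoints, exponents (n = 4, m = 3), and outcomes below are illustrative assumptions, not the chapter's model. The example shows how an outcome that is normally rewarding (drinking) yields no positive reward when another need dominates and the agent is already hydrated.

```python
def drive(state, setpoint, n=4.0, m=3.0):
    """Distance of the internal state from its setpoint (exponents n > m > 1 assumed)."""
    return sum(abs(h_star - h) ** n for h, h_star in zip(state, setpoint)) ** (1.0 / m)


def drive_reduction_reward(state, next_state, setpoint):
    """Reward = how much an outcome reduces drive."""
    return drive(state, setpoint) - drive(next_state, setpoint)


SETPOINT = (1.0, 1.0)   # hypothetical (energy, hydration) setpoints
state = (0.4, 1.0)      # energy-deprived, fully hydrated

r_eat = drive_reduction_reward(state, (0.7, 1.0), SETPOINT)    # food restores energy
r_drink = drive_reduction_reward(state, (0.4, 1.3), SETPOINT)  # water overshoots hydration

print(round(r_eat, 3), round(r_drink, 3))
```

Eating yields a clearly positive reward, while the same drink that would relieve a thirsty agent here produces a small negative reward, because it pushes hydration past its setpoint without touching the dominant energy deficit.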
23
FeldmanHall O, Nassar MR. The computational challenge of social learning. Trends Cogn Sci 2021; 25:1045-1057. [PMID: 34583876 PMCID: PMC8585698 DOI: 10.1016/j.tics.2021.09.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/24/2021] [Revised: 08/31/2021] [Accepted: 09/01/2021] [Indexed: 10/20/2022]
Abstract
The complex reward structure of the social world and the uncertainty endemic to social contexts pose a challenge for modeling. For example, during social interactions, the actions of one person influence the internal states of another. These social dependencies make it difficult to formalize social learning problems in a mathematically tractable way. While it is tempting to dispense with these complexities, they are a defining feature of social life. Because the structure of social interactions challenges the simplifying assumptions often made in models, they make an ideal testbed for computational models of cognition. By adopting a framework that embeds existing social knowledge into the model, we can go beyond explaining behaviors in laboratory tasks to explaining those observed in the wild.
Affiliation(s)
- Oriel FeldmanHall
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Sciences, Brown University, Providence, RI 02912, USA.
- Matthew R Nassar
- Carney Institute for Brain Sciences, Brown University, Providence, RI 02912, USA; Department of Neuroscience, Brown University, Providence, RI 02912, USA
24
McDougle SD, Ballard IC, Baribault B, Bishop SJ, Collins AGE. Executive Function Assigns Value to Novel Goal-Congruent Outcomes. Cereb Cortex 2021; 32:231-247. [PMID: 34231854 PMCID: PMC8634563 DOI: 10.1093/cercor/bhab205] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 05/10/2021] [Accepted: 06/04/2021] [Indexed: 11/14/2022]
Abstract
People often learn from the outcomes of their actions, even when these outcomes do not involve material rewards or punishments. How does our brain provide this flexibility? We combined behavior, computational modeling, and functional neuroimaging to probe whether learning from abstract novel outcomes harnesses the same circuitry that supports learning from familiar secondary reinforcers. Behavior and neuroimaging revealed that novel images can act as a substitute for rewards during instrumental learning, producing reliable reward-like signals in dopaminergic circuits. Moreover, we found evidence that prefrontal correlates of executive control may play a role in shaping flexible responses in reward circuits. These results suggest that learning from novel outcomes is supported by an interplay between high-level representations in prefrontal cortex and low-level responses in subcortical reward circuits. This interaction may allow for human reinforcement learning over arbitrarily abstract reward functions.
Affiliation(s)
- Ian C Ballard
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
- Beth Baribault
- Department of Psychology, University of California, Berkeley, CA 94704, USA
- Sonia J Bishop
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
- Department of Psychology, University of California, Berkeley, CA 94704, USA
- Anne G E Collins
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
- Department of Psychology, University of California, Berkeley, CA 94704, USA
25
Livneh Y, Andermann ML. Cellular activity in insular cortex across seconds to hours: Sensations and predictions of bodily states. Neuron 2021; 109:3576-3593. [PMID: 34582784 PMCID: PMC8602715 DOI: 10.1016/j.neuron.2021.08.036] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 06/25/2021] [Revised: 08/17/2021] [Accepted: 08/26/2021] [Indexed: 02/09/2023]
Abstract
Our wellness relies on continuous interactions between our brain and body: different organs relay their current state to the brain and are regulated, in turn, by descending visceromotor commands from our brain and by actions such as eating, drinking, thermotaxis, and predator escape. Human neuroimaging and theoretical studies suggest a key role for predictive processing by insular cortex in guiding these efforts to maintain bodily homeostasis. Here, we review recent studies recording and manipulating cellular activity in rodent insular cortex at timescales from seconds to hours. We argue that consideration of these findings in the context of predictive processing of future bodily states may reconcile several apparent discrepancies and offer a unifying, heuristic model for guiding future work.
Affiliation(s)
- Yoav Livneh
- Department of Neurobiology, Weizmann Institute of Science, Rehovot 76100, Israel.
- Mark L Andermann
- Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA.
26
Emotion prediction errors guide socially adaptive behaviour. Nat Hum Behav 2021; 5:1391-1401. [PMID: 34667302 PMCID: PMC8544818 DOI: 10.1038/s41562-021-01213-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/06/2020] [Accepted: 08/24/2021] [Indexed: 11/08/2022]
Abstract
People make decisions based on deviations from expected outcomes, known as prediction errors. Past work has focused on reward prediction errors, largely ignoring violations of expected emotional experiences—emotion prediction errors. We leverage a method to measure real-time fluctuations in emotion as people decide to punish or forgive others. Across four studies (N=1,016), we reveal that emotion and reward prediction errors have distinguishable contributions to choice, such that emotion prediction errors exert the strongest impact during decision-making. We additionally find that a choice to punish or forgive can be decoded in less than a second from an evolving emotional response, suggesting emotions swiftly influence choice. Finally, individuals reporting significant levels of depression exhibit selective impairments in using emotion—but not reward—prediction errors. Evidence for emotion prediction errors potently guiding social behaviors challenges standard decision-making models that have focused solely on reward.
27
Ghambaryan A, Gutkin B, Klucharev V, Koechlin E. Additively Combining Utilities and Beliefs: Research Gaps and Algorithmic Developments. Front Neurosci 2021; 15:704728. [PMID: 34658760 PMCID: PMC8517513 DOI: 10.3389/fnins.2021.704728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Accepted: 09/13/2021] [Indexed: 11/20/2022]
Abstract
Value-based decision making in complex environments, such as those with uncertain and volatile mapping of reward probabilities onto options, may engender computational strategies that are not necessarily optimal in terms of normative frameworks but may ensure effective learning and behavioral flexibility in conditions of limited neural computational resources. In this article, we review a suboptimal strategy: additively combining the reward magnitude and reward probability attributes of options for value-based decision making. In addition, we present the computational intricacies of a recently developed model (named the MIX model) representing an algorithmic implementation of the additive strategy in sequential decision-making with two options. We also discuss its opportunities, as well as its conceptual, inferential, and generalization issues. Furthermore, we suggest future studies that could reveal the potential of the MIX model and support its further development as a general model of value-based choice.
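The difference between additive and multiplicative combination can be illustrated with two options. The sketch below compares expected value (probability × magnitude) with a simple weighted additive rule over probability and range-normalized magnitude. The weight w, the normalization constant, and the option values are assumptions for illustration; this is a generic additive rule, not the MIX model's actual equations.

```python
def expected_value(p, magnitude):
    """Multiplicative (normative) combination of probability and magnitude."""
    return p * magnitude


def additive_value(p, magnitude, w=0.7, max_magnitude=100.0):
    """Weighted sum of probability and normalized magnitude (hypothetical w)."""
    return w * p + (1.0 - w) * (magnitude / max_magnitude)


# Hypothetical options: (reward probability, reward magnitude)
options = {"A": (0.9, 20.0), "B": (0.3, 80.0)}

ev_choice = max(options, key=lambda o: expected_value(*options[o]))
add_choice = max(options, key=lambda o: additive_value(*options[o]))
print(ev_choice, add_choice)
```

Expected value favours B (0.3 × 80 = 24 versus 0.9 × 20 = 18), while the probability-weighted additive rule favours A, showing how an additive strategy can depart from the normative choice.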
Affiliation(s)
- Anush Ghambaryan
- Centre for Cognition and Decision Making, HSE University, Moscow, Russia
- Ecole Normale Supérieure, PSL Research University, Paris, France
- Boris Gutkin
- Centre for Cognition and Decision Making, HSE University, Moscow, Russia
- Ecole Normale Supérieure, PSL Research University, Paris, France
- Vasily Klucharev
- Centre for Cognition and Decision Making, HSE University, Moscow, Russia
- Etienne Koechlin
- Ecole Normale Supérieure, PSL Research University, Paris, France
28
Palminteri S, Lebreton M. Context-dependent outcome encoding in human reinforcement learning. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.06.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/15/2022]
|
29
Value-free reinforcement learning: policy optimization as a minimal model of operant behavior. Curr Opin Behav Sci 2021; 41:114-121. [DOI: 10.1016/j.cobeha.2021.04.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/18/2023]
30
Abstract
The decisions we make are shaped by a lifetime of learning. Past experience guides the way that we encode information in neural systems for perception and valuation, and determines the information we retrieve when making decisions. Distinct literatures have discussed how lifelong learning and local context shape decisions made about sensory signals, propositional information, or economic prospects. Here, we build bridges between these literatures, arguing for common principles of adaptive rationality in perception, cognition, and economic choice. We discuss how a single common framework, based on normative principles of efficient coding and Bayesian inference, can help us understand a myriad of human decision biases, including sensory illusions, adaptive aftereffects, choice history biases, central tendency effects, anchoring effects, contrast effects, framing effects, congruency effects, reference-dependent valuation, nonlinear utility functions, and discretization heuristics. We describe a simple computational framework for explaining these phenomena. Expected final online publication date for the Annual Review of Psychology, Volume 73 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Christopher Summerfield
  - Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, United Kingdom
- Paula Parpart
  - Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, United Kingdom
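One of the normative ingredients named above is Bayesian inference, and even its simplest form produces a central tendency effect. The sketch below assumes a Gaussian prior over stimulus magnitudes (`prior_mean`, `prior_sd`) and Gaussian sensory noise (`noise_sd`). These names and values are hypothetical, and the code illustrates the general principle rather than the authors' framework. The estimate is a precision-weighted average of the observation and the prior mean. It is therefore pulled toward the prior mean, and more strongly so when sensory noise is high.

```python
def posterior_mean(observation: float, noise_sd: float,
                   prior_mean: float, prior_sd: float) -> float:
    """Posterior mean for a Gaussian prior combined with a Gaussian likelihood."""
    obs_precision = 1.0 / noise_sd ** 2     # reliability of the current measurement
    prior_precision = 1.0 / prior_sd ** 2   # reliability of the learned prior
    return ((obs_precision * observation + prior_precision * prior_mean)
            / (obs_precision + prior_precision))

prior_mean, prior_sd = 50.0, 10.0
for stimulus in (30.0, 50.0, 70.0):
    precise = posterior_mean(stimulus, noise_sd=5.0, prior_mean=prior_mean, prior_sd=prior_sd)
    noisy = posterior_mean(stimulus, noise_sd=20.0, prior_mean=prior_mean, prior_sd=prior_sd)
    print(f"stimulus={stimulus:.0f}  low-noise estimate={precise:.1f}  high-noise estimate={noisy:.1f}")
```

With these values, a stimulus of 30 is estimated at 34 under low noise and at 46 under high noise. Extreme stimuli are compressed toward 50 in both cases.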
31
Woo TF, Law CK, Ting KH, Chan CCH, Kolling N, Watanabe K, Chau BKH. Distinct Causal Influences of Dorsolateral Prefrontal Cortex and Posterior Parietal Cortex in Multiple-Option Decision Making. Cereb Cortex 2021; 32:1390-1404. [PMID: 34470053 DOI: 10.1093/cercor/bhab278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2020] [Revised: 07/14/2021] [Accepted: 07/15/2021] [Indexed: 11/13/2022]
Abstract
Our knowledge about the neural mechanisms underlying decision making is largely based on experiments that involved few options. However, in daily life it is more common to choose between many options, a situation in which processing choice information selectively is particularly important. The current study examined whether the dorsolateral prefrontal cortex (dlPFC) and posterior parietal cortex (PPC) are of particular importance to multiple-option decision making. Sixty-eight participants received anodal high-definition transcranial direct current stimulation (HD-tDCS) to focally enhance the dlPFC or PPC in a double-blind, sham-controlled design. Participants then performed a multiple-option decision-making task. We found that longer fixations on poorer options were related to less optimal decisions. Interestingly, this negative impact was attenuated after anodal HD-tDCS over the dlPFC, especially in choices with many options. This suggests that the dlPFC has a causal role in filtering choice-irrelevant information. In contrast, these effects were absent after participants received anodal HD-tDCS over the PPC. Instead, the choices made by these participants were more biased towards the best options presented on the side contralateral to the stimulation. This suggests that the PPC has a causal role in value-based spatial selection. In conclusion, the dlPFC has a role in filtering undesirable options, whereas the PPC emphasizes desirable contralateral options.
Affiliation(s)
- Tsz-Fung Woo
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong
- Chun-Kit Law
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong
- Kin-Hung Ting
  - University Research Facility in Behavioral and Systems Neuroscience, The Hong Kong Polytechnic University, Hong Kong
- Chetwyn C H Chan
  - Department of Psychology, The Education University of Hong Kong, Hong Kong
- Nils Kolling
  - Department of Psychiatry, University of Oxford, Oxford OX3 7JX, UK
  - Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, OX3 9DU, UK
  - Oxford Centre for Human Brain Activity (OHBA), University of Oxford, Oxford, OX3 7JX, UK
- Kei Watanabe
  - Department of Frontier Biosciences, Osaka University, Osaka 565-0871, Japan
- Bolton K H Chau
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong
  - University Research Facility in Behavioral and Systems Neuroscience, The Hong Kong Polytechnic University, Hong Kong
32
Hunt LT, Daw ND, Kaanders P, MacIver MA, Mugan U, Procyk E, Redish AD, Russo E, Scholl J, Stachenfeld K, Wilson CRE, Kolling N. Formalizing planning and information search in naturalistic decision-making. Nat Neurosci 2021; 24:1051-1064. [PMID: 34155400 DOI: 10.1038/s41593-021-00866-w] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 07/23/2020] [Accepted: 03/23/2021] [Indexed: 02/05/2023]
Abstract
Decisions made by mammals and birds are often temporally extended. They require planning and sampling of decision-relevant information. Our understanding of such decision-making remains in its infancy compared with simpler, forced-choice paradigms. However, recent advances in algorithms supporting planning and information search provide a lens through which we can explain neural and behavioral data in these tasks. We review these advances to obtain a clearer understanding for why planning and curiosity originated in certain species but not others; how activity in the medial temporal lobe, prefrontal and cingulate cortices may support these behaviors; and how planning and information search may complement each other as means to improve future action selection.
Affiliation(s)
- L T Hunt
  - Department of Psychiatry, Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
- N D Daw
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ, USA
- P Kaanders
  - Department of Experimental Psychology, Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
- M A MacIver
  - Center for Robotics and Biosystems, Department of Neurobiology, Department of Biomedical Engineering, Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- U Mugan
  - Center for Robotics and Biosystems, Department of Neurobiology, Department of Biomedical Engineering, Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- E Procyk
  - Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France
- A D Redish
  - Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA
- E Russo
  - Department of Theoretical Neuroscience, Central Institute of Mental Health, Mannheim, Germany
  - Department of Psychiatry and Psychotherapy, University Medical Center, Johannes Gutenberg University, Mainz, Germany
- J Scholl
  - Department of Experimental Psychology, Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
- C R E Wilson
  - Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France
- N Kolling
  - Department of Psychiatry, Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
33
Azzalini D, Buot A, Palminteri S, Tallon-Baudry C. Responses to Heartbeats in Ventromedial Prefrontal Cortex Contribute to Subjective Preference-Based Decisions. J Neurosci 2021; 41:5102-5114. [PMID: 33926998 PMCID: PMC8197644 DOI: 10.1523/jneurosci.1932-20.2021] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/24/2020] [Revised: 01/14/2021] [Accepted: 01/25/2021] [Indexed: 11/21/2022]
Abstract
Forrest Gump or The Matrix? Preference-based decisions are subjective and entail self-reflection. However, these self-related features are unaccounted for by known neural mechanisms of valuation and choice. Self-related processes have been linked to a basic interoceptive biological mechanism, the neural monitoring of heartbeats, in particular in ventromedial prefrontal cortex (vmPFC), a region also involved in value encoding. We thus hypothesized a functional coupling between the neural monitoring of heartbeats and the precision of value encoding in vmPFC. Human participants of both sexes were presented with pairs of movie titles. They indicated either which movie they preferred or performed a control objective visual discrimination that did not require self-reflection. Using magnetoencephalography, we measured heartbeat-evoked responses (HERs) before option presentation and confirmed that HERs in vmPFC were larger when preparing for the subjective, self-related task. We retrieved the expected cortical value network during choice with time-resolved statistical modeling. Crucially, we show that larger HERs before option presentation are followed by stronger value encoding during choice in vmPFC. This effect is independent of overall vmPFC baseline activity. The neural interaction between HERs and value encoding predicted preference-based choice consistency over time, accounting for both interindividual differences and trial-to-trial fluctuations within individuals. Neither cardiac activity nor arousal fluctuations could account for any of the effects. HERs did not interact with the encoding of perceptual evidence in the discrimination task. 
Our results show that the self-reflection underlying preference-based decisions involves HERs, and that the integration of HERs into subjective value encoding in vmPFC contributes to preference stability.
SIGNIFICANCE STATEMENT
Deciding whether you prefer Forrest Gump or The Matrix is based on subjective values, which only you, the decision-maker, can estimate and compare, by asking yourself. Yet how self-reflection is biologically implemented, and how it contributes to subjective valuation, are not known. We show that in ventromedial prefrontal cortex, the neural response to heartbeats, an interoceptive self-related process, influences the cortical representation of subjective value. The neural interaction between the cortical monitoring of heartbeats and value encoding predicts choice consistency (i.e., whether you consistently prefer Forrest Gump over The Matrix over time). Our results pave the way for the quantification of self-related processes in decision-making and may shed new light on the relationship between maladaptive decisions and impaired interoception.
Affiliation(s)
- Damiano Azzalini
  - Laboratoire de Neurosciences Cognitives et Computationnelles, Ecole Normale Supérieure, PSL University, 75005 Paris, France
  - Institut National de la Santé et de la Recherche Médicale, 75005 Paris, France
- Anne Buot
  - Laboratoire de Neurosciences Cognitives et Computationnelles, Ecole Normale Supérieure, PSL University, 75005 Paris, France
  - Institut National de la Santé et de la Recherche Médicale, 75005 Paris, France
- Stefano Palminteri
  - Laboratoire de Neurosciences Cognitives et Computationnelles, Ecole Normale Supérieure, PSL University, 75005 Paris, France
  - Institut National de la Santé et de la Recherche Médicale, 75005 Paris, France
- Catherine Tallon-Baudry
  - Laboratoire de Neurosciences Cognitives et Computationnelles, Ecole Normale Supérieure, PSL University, 75005 Paris, France
  - Institut National de la Santé et de la Recherche Médicale, 75005 Paris, France
34
35
Xu HA, Modirshanechi A, Lehmann MP, Gerstner W, Herzog MH. Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making. PLoS Comput Biol 2021; 17:e1009070. [PMID: 34081705 PMCID: PMC8205159 DOI: 10.1371/journal.pcbi.1009070] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/27/2021] [Revised: 06/15/2021] [Accepted: 05/12/2021] [Indexed: 11/19/2022]
Abstract
Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.
Affiliation(s)
- He A. Xu
  - Laboratory of Psychophysics, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Alireza Modirshanechi
  - Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Marco P. Lehmann
  - Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Wulfram Gerstner
  - Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael H. Herzog
  - Laboratory of Psychophysics, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
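A toy tabular agent can illustrate the distinct roles attributed to novelty and surprise. The formulas below are assumptions chosen for illustration, not the model fitted by Xu et al. Novelty is taken as the negative log of a smoothed state-visit frequency and is added as an exploration bonus. Surprise is the Shannon surprise of the observed transition under a count-based world model, and it is mapped onto the learning rate for model-free Q-values. The class and parameter names (`SurpriseNoveltyAgent`, `base_lr`, `max_lr`, `novelty_weight`) are hypothetical, and states are assumed to be integers `0 .. n_states - 1`.

```python
import math
from collections import defaultdict

class SurpriseNoveltyAgent:
    """Toy tabular agent (hypothetical): novelty drives exploration,
    surprise raises the learning rate of model-free Q-values."""

    def __init__(self, n_states, base_lr=0.1, max_lr=0.9, gamma=0.9, novelty_weight=1.0):
        self.n_states = n_states
        self.base_lr, self.max_lr = base_lr, max_lr
        self.gamma = gamma
        self.novelty_weight = novelty_weight
        self.q = defaultdict(float)                                # Q[(state, action)]
        self.state_counts = defaultdict(int)                       # visits per state
        self.trans_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s_next: count}

    def novelty(self, s):
        """Negative log of the smoothed empirical frequency of state s."""
        total = sum(self.state_counts.values())
        return -math.log((self.state_counts[s] + 1) / (total + self.n_states))

    def transition_prob(self, s, a, s_next):
        """Count-based world model with one pseudocount per possible next state."""
        counts = self.trans_counts[(s, a)]
        return (counts[s_next] + 1) / (sum(counts.values()) + self.n_states)

    def update(self, s, a, r, s_next, actions):
        """One Q-learning step whose learning rate grows with transition surprise."""
        surprise = -math.log(self.transition_prob(s, a, s_next))
        lr = self.base_lr + (self.max_lr - self.base_lr) * surprise / (1.0 + surprise)
        target = r + self.gamma * max(self.q[(s_next, b)] for b in actions)
        self.q[(s, a)] += lr * (target - self.q[(s, a)])
        # Update the world model only after surprise has been measured.
        self.trans_counts[(s, a)][s_next] += 1
        self.state_counts[s_next] += 1
        return lr

    def choose(self, s, actions):
        """Greedy choice on Q plus the expected novelty of the next state."""
        def score(a):
            expected_novelty = sum(self.transition_prob(s, a, s2) * self.novelty(s2)
                                   for s2 in range(self.n_states))
            return self.q[(s, a)] + self.novelty_weight * expected_novelty
        return max(actions, key=score)

agent = SurpriseNoveltyAgent(n_states=3)
actions = ["a", "b"]
# A repeated, predictable transition: surprise and learning rate fall.
rates = [agent.update(0, "a", 0.0, 0, actions) for _ in range(10)]
# An abrupt change: the unexpected next state sharply raises the learning rate.
surprising_rate = agent.update(0, "a", 0.0, 2, actions)
print([round(x, 2) for x in rates], round(surprising_rate, 2))
# Before any reward, the untried action "b" promises more novel states.
print(agent.choose(0, actions))
```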
36
The Best Laid Plans: Computational Principles of Anterior Cingulate Cortex. Trends Cogn Sci 2021; 25:316-329. [PMID: 33593641 DOI: 10.1016/j.tics.2021.01.008] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 08/13/2020] [Revised: 01/17/2021] [Accepted: 01/19/2021] [Indexed: 12/26/2022]
Abstract
Despite continual debate for the past 30 years about the function of anterior cingulate cortex (ACC), its key contribution to neurocognition remains unknown. However, recent computational modeling work has provided insight into this question. Here we review computational models that illustrate three core principles of ACC function, related to hierarchy, world models, and cost. We also discuss four constraints on the neural implementation of these principles, related to modularity, binding, encoding, and learning and regulation. These observations suggest a role for ACC in hierarchical model-based hierarchical reinforcement learning (HMB-HRL), which instantiates a mechanism motivating the execution of high-level plans.
37
Tomov MS, Schulz E, Gershman SJ. Multi-task reinforcement learning in humans. Nat Hum Behav 2021; 5:764-773. [PMID: 33510391 DOI: 10.1038/s41562-020-01035-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/22/2019] [Accepted: 12/10/2020] [Indexed: 01/01/2023]
Abstract
The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet not much is known about human multitask reinforcement learning. We study participants' behaviour in a two-step decision-making task with multiple features and changing reward functions. We compare their behaviour with two algorithms for multitask reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as to standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered confirmatory experiment, our results provide evidence that participants who are able to learn the task use a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.
Affiliation(s)
- Momchil S Tomov
  - Program in Neuroscience, Harvard Medical School, Boston, MA, USA
  - Center for Brain Science, Harvard University, Cambridge, MA, USA
- Eric Schulz
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - Department of Psychology, Harvard University, Cambridge, MA, USA
- Samuel J Gershman
  - Center for Brain Science, Harvard University, Cambridge, MA, USA
  - Department of Psychology, Harvard University, Cambridge, MA, USA
  - Center for Brains, Minds and Machines, Cambridge, MA, USA
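The first algorithm described maps previously learned policies and the features they encounter onto a new reward function. This resembles successor features combined with generalized policy improvement (GPI). The toy sketch below works under that assumption. Each stored policy is summarized by hand-picked successor features `psi` for a single start state, and a new task is defined by reward weights `w` over the same hypothetical features. GPI then picks the action whose value under any stored policy is highest. The feature names and numbers are invented for illustration and are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple

# psi[policy][action]: expected discounted feature counts from taking `action`
# in the start state and following `policy` afterwards (hand-picked values).
Psi = Dict[str, Dict[str, List[float]]]

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))

def gpi_action(psi: Psi, w: Sequence[float]) -> Tuple[str, str, float]:
    """Return (action, policy, value) maximizing Q_i(s, a) = psi_i(s, a) . w
    over all stored policies i and actions a."""
    best = ("", "", float("-inf"))
    for policy, per_action in psi.items():
        for action, features in per_action.items():
            q = dot(features, w)
            if q > best[2]:
                best = (action, policy, q)
    return best

# Hypothetical features: (fruit, water, shelter).
psi: Psi = {
    "policy_fruit": {"left": [1.0, 0.0, 0.2], "right": [0.3, 0.1, 0.0]},
    "policy_water": {"left": [0.1, 0.9, 0.0], "right": [0.0, 0.2, 1.0]},
}

# New tasks are just new weight vectors; psi does not need to be relearned.
print(gpi_action(psi, [0.5, 1.0, -0.5]))
print(gpi_action(psi, [0.0, 0.0, 1.0]))
```

Taking the maximum over stored policies lets old policies be reused for a new task. With exact values, the resulting choice is at least as good as following any single stored policy.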
38
Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proc Natl Acad Sci U S A 2021; 117:29302-29310. [PMID: 33229515 DOI: 10.1073/pnas.1912341117] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 11/18/2022]
Abstract
Many animals, and an increasing number of artificial agents, display sophisticated capabilities to perceive and manipulate objects. But human beings remain distinctive in their capacity for flexible, creative tool use: using objects in new ways to act on the world, achieve a goal, or solve a problem. To study this type of general physical problem solving, we introduce the Virtual Tools game. In this game, people solve a large range of challenging physical puzzles in just a handful of attempts. We propose that the flexibility of human physical problem solving rests on an ability to imagine the effects of hypothesized actions, while the efficiency of human search arises from rich action priors that are updated via observations of the world. We instantiate these components in the "sample, simulate, update" (SSUP) model and show that it captures human performance across 30 levels of the Virtual Tools game. More broadly, this model provides a mechanism for explaining how people condense general physical knowledge into actionable, task-specific plans to achieve flexible and efficient physical problem solving.
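The "sample, simulate, update" loop can be sketched in a toy one-dimensional setting. The environment, the Gaussian action prior, the simulator noise, the success threshold, and the update rule below are all assumptions for illustration. In particular, the update simply moves the prior toward the action just tried and narrows it. It is not the update rule used in the SSUP model. The names `TARGET`, `world_outcome`, `simulate`, and `ssup` are hypothetical.

```python
import random

TARGET = 7.0  # hypothetical goal position

def world_outcome(x: float) -> float:
    """True environment: reward is higher the closer action x lands to the goal."""
    return -abs(x - TARGET)

def simulate(x: float, rng: random.Random, model_noise: float) -> float:
    """Imperfect internal model: the same dynamics, corrupted by Gaussian noise."""
    return world_outcome(x + rng.gauss(0.0, model_noise))

def ssup(n_attempts=10, n_samples=20, success=-0.5, model_noise=1.0, seed=0):
    """Return (attempt_number, action) on success, or (None, final_prior_mean)."""
    rng = random.Random(seed)
    mean, sd, step = 0.0, 5.0, 0.5   # Gaussian action prior and update step size
    for attempt in range(1, n_attempts + 1):
        # Sample: draw candidate actions from the current prior.
        candidates = [rng.gauss(mean, sd) for _ in range(n_samples)]
        # Simulate: score each candidate with the noisy internal model.
        best = max(candidates, key=lambda x: simulate(x, rng, model_noise))
        # Act once in the real world with the most promising candidate.
        if world_outcome(best) >= success:
            return attempt, best
        # Update: shift the prior toward the tried action and narrow it.
        mean += step * (best - mean)
        sd = max(0.5, 0.8 * sd)
    return None, mean

print(ssup())
```

Most of the search happens in simulation. Only one action per attempt is tried in the world, which mirrors the "handful of attempts" efficiency described above.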
39
Petzschner FH, Garfinkel SN, Paulus MP, Koch C, Khalsa SS. Computational Models of Interoception and Body Regulation. Trends Neurosci 2021; 44:63-76. [PMID: 33378658 PMCID: PMC8109616 DOI: 10.1016/j.tins.2020.09.012] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 03/29/2020] [Revised: 08/01/2020] [Accepted: 09/30/2020] [Indexed: 02/07/2023]
Abstract
To survive, organisms must effectively respond to the challenge of maintaining their physiological integrity in the face of an ever-changing environment. Preserving this homeostasis critically relies on adaptive behavior. In this review, we consider recent frameworks that extend classical homeostatic control via reflex arcs to include more flexible forms of adaptive behavior that take interoceptive context, experiences, and expectations into account. Specifically, we define a landscape for computational models of interoception, body regulation, and forecasting, address these models' unique challenges in relation to translational research efforts, and discuss what they can teach us about cognition as well as physical and mental health.
Affiliation(s)
- Frederike H Petzschner
  - Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich, ETH Zurich, Switzerland
- Sarah N Garfinkel
  - Department of Neuroscience, Brighton and Sussex Medical School, University of Sussex, Falmer, UK
  - Sussex Partnership NHS Foundation Trust, Brighton, UK
- Martin P Paulus
  - Laureate Institute for Brain Research, Tulsa, OK, USA
  - Oxley College of Health Sciences, University of Tulsa, Tulsa, OK, USA
- Sahib S Khalsa
  - Laureate Institute for Brain Research, Tulsa, OK, USA
  - Oxley College of Health Sciences, University of Tulsa, Tulsa, OK, USA
40
Fine JM, Zarr N, Brown JW. Computational Neural Mechanisms of Goal-Directed Planning and Problem Solving. Comput Brain Behav 2020. [DOI: 10.1007/s42113-020-00095-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/27/2023]
|
41
Skov M, Nadal M. The nature of beauty: behavior, cognition, and neurobiology. Ann N Y Acad Sci 2020; 1488:44-55. [DOI: 10.1111/nyas.14524] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/14/2020] [Revised: 10/05/2020] [Accepted: 10/13/2020] [Indexed: 12/11/2022]
Affiliation(s)
- Martin Skov
  - Danish Research Centre for Magnetic Resonance, Copenhagen University Hospital Hvidovre, Denmark
  - Decision Neuroscience Research Cluster, Copenhagen Business School, Frederiksberg, Denmark
- Marcos Nadal
  - Human Evolution and Cognition Group, Department of Psychology, University of the Balearic Islands, Palma, Spain
42
Abstract
This paper describes a framework for modelling dopamine function in the mammalian brain. It proposes that both learning and action planning involve processes that minimize prediction errors encoded by dopaminergic neurons. In this framework, dopaminergic neurons projecting to different parts of the striatum encode errors in predictions made by the corresponding systems within the basal ganglia. The dopaminergic neurons encode differences between rewards and expectations in the goal-directed system, and differences between the chosen and habitual actions in the habit system. These prediction errors trigger learning about rewards and habit formation, respectively. Additionally, dopaminergic neurons in the goal-directed system play a key role in action planning: they compute the difference between a desired reward and the reward expected from the current motor plan, and they facilitate action planning until this difference diminishes. The presented models account for dopaminergic responses during movements and for the effects of dopamine depletion on behaviour, and they make several experimental predictions.
In the brain, chemicals such as dopamine allow nerve cells to ‘talk’ to each other and to relay information from and to the environment. Dopamine, in particular, is released when pleasant surprises are experienced: this helps the organism to learn about the consequences of certain actions. If a new flavour of ice-cream tastes better than expected, for example, the release of dopamine tells the brain that this flavour is worth choosing again. However, dopamine has an additional role in controlling movement. When the cells that produce dopamine die, for instance in Parkinson’s disease, individuals may find it difficult to initiate deliberate movements. Here, Rafal Bogacz aimed to develop a comprehensive framework that could reconcile the two seemingly unrelated roles played by dopamine.
The new theory proposes that dopamine is released when an outcome differs from expectations, which helps the organism to adjust and minimise these differences. In the ice-cream example, the difference is between how good the treat is expected to taste, and how tasty it really is. By learning to select the same flavour repeatedly, the brain aligns expectation and the result of the choice. This ability would also apply when movements are planned. In this case, the brain compares the desired reward with the predicted results of the planned actions. For example, while planning to get a spoonful of ice-cream, the brain compares the pleasure expected from the movement that is currently planned, and the pleasure of eating a full spoon of the treat. If the two differ, for example because no movement has been planned yet, the brain releases dopamine to form a better version of the action plan. The theory was then tested using a computer simulation of nerve cells that release dopamine; this showed that the behaviour of the virtual cells closely matched that of their real-life counterparts. This work offers a comprehensive description of the fundamental role of dopamine in the brain. The model now needs to be verified through experiments on living nerve cells; ultimately, it could help doctors and researchers to develop better treatments for conditions such as Parkinson’s disease or ADHD, which are linked to a lack of dopamine.
Affiliation(s)
- Rafal Bogacz
  - MRC Brain Networks Dynamics Unit, University of Oxford, Oxford, United Kingdom
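The three roles attributed to dopaminergic prediction errors can be illustrated with scalar toy versions. These are simplifications assumed for illustration, not the equations of the framework described above:

- The goal-directed error is the obtained reward minus the expected reward.
- The habit error is the chosen-action indicator minus habit strength.
- Planning is modeled as growing a scalar plan strength until the desired reward and the expected reward match. The expected reward of a plan is assumed to be `plan * action_value`.

Function names, learning rates, and tolerances are hypothetical.

```python
def reward_prediction_error(reward: float, expected: float) -> float:
    """Goal-directed system: obtained reward minus expected reward."""
    return reward - expected

def habit_prediction_error(chosen: float, habit_strength: float) -> float:
    """Habit system: chosen-action indicator (1 or 0) minus habitual tendency."""
    return chosen - habit_strength

def plan_action(desired_reward, action_value, rate=0.5, tol=1e-3, max_steps=100):
    """Strengthen a scalar motor plan while the desired reward exceeds the reward
    expected from the current plan (assumed to be plan * action_value)."""
    plan, signals = 0.0, []
    for _ in range(max_steps):
        delta = desired_reward - plan * action_value  # dopamine-like planning signal
        signals.append(delta)
        if abs(delta) < tol:
            break
        plan += rate * delta * action_value
    return plan, signals

# Learning about rewards: the expected value converges to the obtained reward.
value = 0.0
for _ in range(20):
    value += 0.3 * reward_prediction_error(1.0, value)

# Habit formation: repeatedly choosing an action makes it habitual.
habit = 0.0
for _ in range(20):
    habit += 0.2 * habit_prediction_error(1.0, habit)

plan, signals = plan_action(desired_reward=2.0, action_value=1.0)
print(round(value, 3), round(habit, 3), round(plan, 3), len(signals))
```

In this toy planning loop the signal shrinks by a factor of `1 - rate * action_value**2` on each step. It therefore converges whenever that product lies between 0 and 2. With the values above, the signal halves each step until it falls below the tolerance.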
43
Four core properties of the human brain valuation system demonstrated in intracranial signals. Nat Neurosci 2020; 23:664-675. [PMID: 32284605 DOI: 10.1038/s41593-020-0615-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 12/11/2018] [Accepted: 02/21/2020] [Indexed: 01/08/2023]
Abstract
Estimating the value of alternative options is a key process in decision-making. Human functional magnetic resonance imaging and monkey electrophysiology studies have identified brain regions, such as the ventromedial prefrontal cortex (vmPFC) and lateral orbitofrontal cortex (lOFC), composing a value system. In the present study, in an effort to bridge across species and techniques, we investigated the neural representation of value ratings in 36 people with epilepsy, using intracranial electroencephalography. We found that subjective value was positively reflected in both vmPFC and lOFC high-frequency activity, plus several other brain regions, including the hippocampus. We then demonstrated that subjective value could be decoded (1) in pre-stimulus activity, (2) for various categories of items, (3) even during a distractive task and (4) as both linear and quadratic signals (encoding both value and confidence). Thus, our findings specify key functional properties of neural value signals (anticipation, generality, automaticity, quadraticity), which might provide insights into human irrational choice behaviors.
44
Livneh Y, Sugden AU, Madara JC, Essner RA, Flores VI, Sugden LA, Resch JM, Lowell BB, Andermann ML. Estimation of Current and Future Physiological States in Insular Cortex. Neuron 2020; 105:1094-1111.e10. [PMID: 31955944 PMCID: PMC7083695 DOI: 10.1016/j.neuron.2019.12.027] [Citation(s) in RCA: 106] [Impact Index Per Article: 26.5] [Received: 05/15/2019] [Revised: 11/18/2019] [Accepted: 12/20/2019] [Indexed: 01/31/2023]
Abstract
Interoception, the sense of internal bodily signals, is essential for physiological homeostasis, cognition, and emotions. While human insular cortex (InsCtx) is implicated in interoception, the cellular and circuit mechanisms remain unclear. We imaged mouse InsCtx neurons during two physiological deficiency states: hunger and thirst. InsCtx ongoing activity patterns reliably tracked the gradual return to homeostasis but not changes in behavior. Accordingly, while artificial induction of hunger or thirst in sated mice via activation of specific hypothalamic neurons (AgRP or SFOGLUT) restored cue-evoked food- or water-seeking, InsCtx ongoing activity continued to reflect physiological satiety. During natural hunger or thirst, food or water cues rapidly and transiently shifted InsCtx population activity to the future satiety-related pattern. During artificial hunger or thirst, food or water cues further shifted activity beyond the current satiety-related pattern. Together with circuit-mapping experiments, these findings suggest that InsCtx integrates visceral-sensory signals of current physiological state with hypothalamus-gated amygdala inputs that signal upcoming ingestion of food or water to compute a prediction of future physiological state.
Affiliation(s)
- Yoav Livneh
  - Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Arthur U Sugden
  - Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Joseph C Madara
  - Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Rachel A Essner
  - Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
  - Program in Neuroscience, Harvard Medical School, Boston, MA 02115, USA
- Vanessa I Flores
  - Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Lauren A Sugden
  - Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, PA 15232, USA
- Jon M Resch
  - Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Bradford B Lowell
  - Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
  - Program in Neuroscience, Harvard Medical School, Boston, MA 02115, USA
- Mark L Andermann
  - Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
  - Program in Neuroscience, Harvard Medical School, Boston, MA 02115, USA
45
Cools R. Chemistry of the Adaptive Mind: Lessons from Dopamine. Neuron 2019; 104:113-131. [DOI: 10.1016/j.neuron.2019.09.035] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 07/17/2019] [Revised: 09/19/2019] [Accepted: 09/20/2019] [Indexed: 12/21/2022]