1
Kim T, Lee SW, Lho SK, Moon SY, Kim M, Kwon JS. Neurocomputational model of compulsivity: deviating from an uncertain goal-directed system. Brain 2024; 147:2230-2244. [PMID: 38584499 PMCID: PMC11146420 DOI: 10.1093/brain/awae102] [Received: 09/21/2023] [Revised: 02/18/2024] [Accepted: 03/07/2024] [Indexed: 04/09/2024] Open
Abstract
Despite a theory that an imbalance in goal-directed versus habitual systems serve as building blocks of compulsions, research has yet to delineate how this occurs during arbitration between the two systems in obsessive-compulsive disorder. Inspired by a brain model in which the inferior frontal cortex selectively gates the putamen to guide goal-directed or habitual actions, this study aimed to examine whether disruptions in the arbitration process via the fronto-striatal circuit would underlie imbalanced decision-making and compulsions in patients. Thirty patients with obsessive-compulsive disorder [mean (standard deviation) age = 26.93 (6.23) years, 12 females (40%)] and 30 healthy controls [mean (standard deviation) age = 24.97 (4.72) years, 17 females (57%)] underwent functional MRI scans while performing the two-step Markov decision task, which was designed to dissociate goal-directed behaviour from habitual behaviour. We employed a neurocomputational model to account for an uncertainty-based arbitration process, in which a prefrontal arbitrator (i.e. inferior frontal gyrus) allocates behavioural control to a more reliable strategy by selectively gating the putamen. We analysed group differences in the neural estimates of uncertainty of each strategy. We also compared the psychophysiological interaction effects of system preference (goal-directed versus habitual) on fronto-striatal coupling between groups. We examined the correlation between compulsivity score and the neural activity and connectivity involved in the arbitration process. The computational model captured the subjects' preferences between the strategies. Compared with healthy controls, patients had a stronger preference for the habitual system (t = -2.88, P = 0.006), which was attributed to a more uncertain goal-directed system (t = 2.72, P = 0.009). 
Before the allocation of controls, patients exhibited hypoactivity in the inferior frontal gyrus compared with healthy controls when this region tracked the inverse of uncertainty (i.e. reliability) of goal-directed behaviour (P = 0.001, family-wise error rate corrected). When reorienting behaviours to reach specific goals, patients exhibited weaker right ipsilateral ventrolateral prefronto-putamen coupling than healthy controls (P = 0.001, family-wise error rate corrected). This hypoconnectivity was correlated with more severe compulsivity (r = -0.57, P = 0.002). Our findings suggest that the attenuated top-down control of the putamen by the prefrontal arbitrator underlies compulsivity in obsessive-compulsive disorder. Enhancing fronto-striatal connectivity may be a potential neurotherapeutic approach for compulsivity and adaptive decision-making.
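The uncertainty-based arbitration described in this abstract can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the function names, the logistic form, the `sharpness` parameter, and the numbers are hypothetical and are not taken from the paper's model.

```python
import math

def arbitration_weight(rel_goal_directed, rel_habitual, sharpness=5.0):
    """Probability of handing control to the goal-directed system.

    Hypothetical rule: a logistic function of the reliability difference,
    where reliability is treated as the inverse of uncertainty in [0, 1].
    """
    diff = rel_goal_directed - rel_habitual
    return 1.0 / (1.0 + math.exp(-sharpness * diff))

def combined_value(q_goal_directed, q_habitual, w):
    """Mix the two systems' action values using the arbitration weight w."""
    return w * q_goal_directed + (1.0 - w) * q_habitual

# Equal reliabilities: neither system is preferred.
w_equal = arbitration_weight(0.6, 0.6)
# A less reliable (more uncertain) goal-directed system loses control.
w_uncertain = arbitration_weight(0.3, 0.6)
```

In this sketch, lowering the goal-directed reliability pushes the weight below 0.5, which loosely mirrors the habitual preference the abstract reports for patients.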
Affiliation(s)
- Taekwan Kim
  - Department of Brain and Cognitive Sciences, Seoul National University College of Natural Sciences, Seoul 08826, Republic of Korea
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
  - Center for Neuroscience-inspired Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Sang Wan Lee
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
  - Center for Neuroscience-inspired Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
  - Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Silvia Kyungjin Lho
  - Department of Neuropsychiatry, Seoul National University Hospital, Seoul 03080, Republic of Korea
- Sun-Young Moon
  - Department of Neuropsychiatry, Seoul National University Hospital, Seoul 03080, Republic of Korea
- Minah Kim
  - Department of Neuropsychiatry, Seoul National University Hospital, Seoul 03080, Republic of Korea
  - Department of Psychiatry, Seoul National University College of Medicine, Seoul 03080, Republic of Korea
- Jun Soo Kwon
  - Department of Brain and Cognitive Sciences, Seoul National University College of Natural Sciences, Seoul 08826, Republic of Korea
  - Department of Neuropsychiatry, Seoul National University Hospital, Seoul 03080, Republic of Korea
  - Department of Psychiatry, Seoul National University College of Medicine, Seoul 03080, Republic of Korea
2
Charpentier CJ, Wu Q, Min S, Ding W, Cockburn J, O'Doherty JP. Heterogeneity in strategy use during arbitration between experiential and observational learning. Nat Commun 2024; 15:4436. [PMID: 38789415 PMCID: PMC11126711 DOI: 10.1038/s41467-024-48548-y] [Received: 04/14/2023] [Accepted: 05/06/2024] [Indexed: 05/26/2024] Open
Abstract
To navigate our complex social world, it is crucial to deploy multiple learning strategies, such as learning from directly experiencing action outcomes or from observing other people's behavior. Despite the prevalence of experiential and observational learning in humans and other social animals, it remains unclear how people favor one strategy over the other depending on the environment, and how individuals vary in their strategy use. Here, we describe an arbitration mechanism in which the prediction errors associated with each learning strategy influence their weight over behavior. We designed an online behavioral task to test our computational model, and found that while a substantial proportion of participants relied on the proposed arbitration mechanism, there was some meaningful heterogeneity in how people solved this task. Four other groups were identified: those who used a fixed mixture between the two strategies, those who relied on a single strategy and non-learners with irrelevant strategies. Furthermore, groups were found to differ on key behavioral signatures, and on transdiagnostic symptom dimensions, in particular autism traits and anxiety. Together, these results demonstrate how large heterogeneous datasets and computational methods can be leveraged to better characterize individual differences.
Affiliation(s)
- Caroline J Charpentier
  - Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
  - Department of Psychology & Brain and Behavior Institute, University of Maryland, College Park, MD, USA
- Qianying Wu
  - Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
- Seokyoung Min
  - Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
- Weilun Ding
  - Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
- Jeffrey Cockburn
  - Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
- John P O'Doherty
  - Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
3
Wurm F, Ernst B, Steinhauser M. Surprise-minimization as a solution to the structural credit assignment problem. PLoS Comput Biol 2024; 20:e1012175. [PMID: 38805546 PMCID: PMC11175464 DOI: 10.1371/journal.pcbi.1012175] [Received: 08/14/2023] [Revised: 06/13/2024] [Accepted: 05/18/2024] [Indexed: 05/30/2024] Open
Abstract
The structural credit assignment problem arises when the causal structure between actions and subsequent outcomes is hidden from direct observation. To solve this problem and enable goal-directed behavior, an agent has to infer structure and form a representation thereof. In the scope of this study, we investigate a possible solution in the human brain. We recorded behavioral and electrophysiological data from human participants in a novel variant of the bandit task, where multiple actions lead to multiple outcomes. Crucially, the mapping between actions and outcomes was hidden and not instructed to the participants. Human choice behavior revealed clear hallmarks of credit assignment and learning. Moreover, a computational model which formalizes action selection as the competition between multiple representations of the hidden structure was fit to account for participants' data. Starting in a state of uncertainty about the correct representation, the central mechanism of this model is the arbitration of action control towards the representation which minimizes surprise about outcomes. Crucially, single-trial latent-variable analysis reveals that the neural patterns clearly support central quantitative predictions of this surprise minimization model. The results suggest that neural activity is not only related to reinforcement learning under correct as well as incorrect task representations but also reflects central mechanisms of credit assignment and behavioral arbitration.
Affiliation(s)
- Franz Wurm
  - Catholic University of Eichstätt-Ingolstadt, Eichstätt, Germany
  - Leiden University, Leiden, the Netherlands
  - Leiden Institute for Brain and Cognition, Leiden University, Leiden, the Netherlands
- Benjamin Ernst
  - Catholic University of Eichstätt-Ingolstadt, Eichstätt, Germany
4
Philippe R, Janet R, Khalvati K, Rao RPN, Lee D, Dreher JC. Neurocomputational mechanisms involved in adaptation to fluctuating intentions of others. Nat Commun 2024; 15:3189. [PMID: 38609372 PMCID: PMC11014977 DOI: 10.1038/s41467-024-47491-2] [Received: 01/05/2022] [Accepted: 03/12/2024] [Indexed: 04/14/2024] Open
Abstract
Humans frequently interact with agents whose intentions can fluctuate between competition and cooperation over time. It is unclear how the brain adapts to fluctuating intentions of others when the nature of the interactions (to cooperate or compete) is not explicitly and truthfully signaled. Here, we use model-based fMRI and a task in which participants thought they were playing with another player. In fact, they played with an algorithm that alternated without signaling between cooperative and competitive strategies. We show that a neurocomputational mechanism with arbitration between competitive and cooperative experts outperforms other learning models in predicting choice behavior. At the brain level, the fMRI results show that the ventral striatum and ventromedial prefrontal cortex track the difference of reliability between these experts. When attributing competitive intentions, we find increased coupling between these regions and a network that distinguishes prediction errors related to competition and cooperation. These findings provide a neurocomputational account of how the brain arbitrates dynamically between cooperative and competitive intentions when making adaptive social decisions.
Affiliation(s)
- Rémi Philippe
  - CNRS-Institut des Sciences Cognitives Marc Jeannerod, UMR5229, Neuroeconomics, reward, and decision making laboratory, Lyon, France
  - Université Claude Bernard Lyon 1, Lyon, France
- Rémi Janet
  - CNRS-Institut des Sciences Cognitives Marc Jeannerod, UMR5229, Neuroeconomics, reward, and decision making laboratory, Lyon, France
  - Université Claude Bernard Lyon 1, Lyon, France
- Koosha Khalvati
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Rajesh P N Rao
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
  - Center for Neurotechnology, University of Washington, Seattle, WA, USA
- Daeyeol Lee
  - Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD, USA
  - Kavli Discovery Neuroscience Institute, Johns Hopkins University, Baltimore, MD, USA
  - Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
- Jean-Claude Dreher
  - CNRS-Institut des Sciences Cognitives Marc Jeannerod, UMR5229, Neuroeconomics, reward, and decision making laboratory, Lyon, France
  - Université Claude Bernard Lyon 1, Lyon, France
5
Venditto SJC, Miller KJ, Brody CD, Daw ND. Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning. bioRxiv 2024:2024.02.28.582617. [PMID: 38464244 PMCID: PMC10925334 DOI: 10.1101/2024.02.28.582617] [Indexed: 03/12/2024]
Abstract
Different brain systems have been hypothesized to subserve multiple "experts" that compete to generate behavior. In reinforcement learning, two general processes, one model-free (MF) and one model-based (MB), are often modeled as a mixture of agents (MoA) and hypothesized to capture differences between automaticity vs. deliberation. However, shifts in strategy cannot be captured by a static MoA. To investigate such dynamics, we present the mixture-of-agents hidden Markov model (MoA-HMM), which simultaneously learns inferred action values from a set of agents and the temporal dynamics of underlying "hidden" states that capture shifts in agent contributions over time. Applying this model to a multi-step, reward-guided task in rats reveals a progression of within-session strategies: a shift from initial MB exploration to MB exploitation, and finally to reduced engagement. The inferred states predict changes in both response time and OFC neural encoding during the task, suggesting that these states are capturing real shifts in dynamics.
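As a rough illustration of the mixture-of-agents idea in this abstract, the sketch below combines per-agent action values with a weight vector and passes the result through a softmax, then swaps the weight vector according to a hidden state. The state names, weights, and values are hypothetical, and the sketch deliberately omits the part the abstract emphasizes, namely inferring the hidden states and their transition dynamics from data.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moa_choice_probs(agent_values, agent_weights):
    """Choice probabilities from a weighted sum of each agent's action values.

    agent_values: one list of per-action values for each agent.
    agent_weights: one weight per agent (same order as agent_values).
    """
    n_actions = len(agent_values[0])
    net = [
        sum(w * q[a] for w, q in zip(agent_weights, agent_values))
        for a in range(n_actions)
    ]
    return softmax(net)

# Hypothetical action values from a model-based and a model-free agent.
mb_values = [1.0, 0.0]
mf_values = [0.0, 1.0]

# Hypothetical hidden states, each with its own (MB weight, MF weight).
state_weights = {"mb_dominant": (3.0, 0.5), "disengaged": (0.2, 0.2)}

probs = {
    state: moa_choice_probs([mb_values, mf_values], weights)
    for state, weights in state_weights.items()
}
```

A static mixture would use a single weight vector throughout; letting the weights depend on a hidden state is what allows the agents' contributions to shift over a session.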
6
Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Received: 09/13/2023] [Accepted: 02/26/2024] [Indexed: 04/01/2024] Open
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
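One common way to express action bias and hysteresis alongside learned values is to add them as extra terms inside a softmax choice rule. The sketch below does that under stated assumptions: the additive form, the exponentially decaying choice trace, and the parameter names `beta`, `kappa`, and `decay` are illustrative choices and are not claimed to match the models compared in the paper.

```python
import math

def choice_probs(q_values, bias, trace, beta=3.0, kappa=-1.0):
    """Softmax over learned values plus action bias and hysteresis terms.

    kappa > 0 favours repeating recent actions; kappa < 0 favours alternating.
    """
    logits = [beta * q + b + kappa * t for q, b, t in zip(q_values, bias, trace)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def update_trace(trace, chosen, decay=0.5):
    """Decay the choice trace and add 1 for the action just chosen."""
    return [decay * t + (1.0 if a == chosen else 0.0) for a, t in enumerate(trace)]

# After choosing action 0 once, the trace marks it as recently taken.
trace = update_trace([0.0, 0.0], chosen=0)

# Equal learned values and no bias, so only hysteresis separates the actions.
p_alternate = choice_probs([0.5, 0.5], [0.0, 0.0], trace, kappa=-1.0)
p_repeat = choice_probs([0.5, 0.5], [0.0, 0.0], trace, kappa=1.0)
```

Because the trace accumulates over several trials, a single `kappa` can capture influences that persist from multiple previous actions, in line with the repetition and alternation biases the abstract describes.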
Affiliation(s)
- Jaron T. Colas
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
7
Modirshanechi A, Kondrakiewicz K, Gerstner W, Haesler S. Curiosity-driven exploration: foundations in neuroscience and computational modeling. Trends Neurosci 2023; 46:1054-1066. [PMID: 37925342 DOI: 10.1016/j.tins.2023.10.002] [Received: 06/21/2023] [Revised: 09/28/2023] [Accepted: 10/04/2023] [Indexed: 11/06/2023]
Abstract
Curiosity refers to the intrinsic desire of humans and animals to explore the unknown, even when there is no apparent reason to do so. Thus far, no single, widely accepted definition or framework for curiosity has emerged, but there is growing consensus that curious behavior is not goal-directed but related to seeking or reacting to information. In this review, we take a phenomenological approach and group behavioral and neurophysiological studies which meet these criteria into three categories according to the type of information seeking observed. We then review recent computational models of curiosity from the field of machine learning and discuss how they enable integrating different types of information seeking into one theoretical framework. Combinations of behavioral and neurophysiological studies along with computational modeling will be instrumental in demystifying the notion of curiosity.
Affiliation(s)
- Kacper Kondrakiewicz
  - Neuroelectronics Research Flanders (NERF), Leuven, Belgium; VIB, Leuven, Belgium; Department of Neuroscience, KU Leuven, Leuven, Belgium
- Wulfram Gerstner
  - École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Sebastian Haesler
  - Neuroelectronics Research Flanders (NERF), Leuven, Belgium; VIB, Leuven, Belgium; Department of Neuroscience, KU Leuven, Leuven, Belgium; Leuven Brain Institute, Leuven, Belgium
8
Ruan Z, Seger CA, Yang Q, Kim D, Lee SW, Chen Q, Peng Z. Impairment of arbitration between model-based and model-free reinforcement learning in obsessive-compulsive disorder. Front Psychiatry 2023; 14:1162800. [PMID: 37304449 PMCID: PMC10250695 DOI: 10.3389/fpsyt.2023.1162800] [Received: 02/10/2023] [Accepted: 05/05/2023] [Indexed: 06/13/2023] Open
Abstract
Introduction: Obsessive-compulsive disorder (OCD) is characterized by an imbalance between goal-directed and habitual learning systems in behavioral control, but it is unclear whether these impairments are due to a single system abnormality of the goal-directed system or due to an impairment in a separate arbitration mechanism that selects which system controls behavior at each point in time.
Methods: A total of 30 OCD patients and 120 healthy controls performed a 2-choice, 3-stage Markov decision-making paradigm. Reinforcement learning models were used to estimate goal-directed learning (as model-based reinforcement learning) and habitual learning (as model-free reinforcement learning). In general, 29 high Obsessive-Compulsive Inventory-Revised (OCI-R) score controls, 31 low OCI-R score controls, and all 30 OCD patients were selected for the analysis.
Results: Obsessive-compulsive disorder (OCD) patients showed less appropriate strategy choices than controls regardless of whether the OCI-R scores in the control subjects were high (p = 0.012) or low (p < 0.001), specifically showing a greater model-free strategy use in task conditions where the model-based strategy was optimal. Furthermore, OCD patients (p = 0.001) and control subjects with high OCI-R scores (H-OCI-R; p = 0.009) both showed greater system switching rather than consistent strategy use in task conditions where model-free use was optimal.
Conclusion: These findings indicated an impaired arbitration mechanism for flexible adaptation to environmental demands in both OCD patients and healthy individuals reporting high OCI-R scores.
Affiliation(s)
- Zhongqiang Ruan
  - Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, Center for Studies of Psychological Application, South China Normal University, Guangzhou, China
- Carol A. Seger
  - Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, Center for Studies of Psychological Application, South China Normal University, Guangzhou, China
  - Department of Psychology, Colorado State University, Fort Collins, CO, United States
- Qiong Yang
  - Affective Disorder Center, Affiliated Brain Hospital of Guangzhou Medical University (Guangzhou Huiai Hospital), Guangzhou, China
- Dongjae Kim
  - Department of AI-based Convergence, College of Engineering, Dankook University, Yongin, Republic of Korea
- Sang Wan Lee
  - Department of Bio and Brain Engineering, Program of Brain and Cognitive Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Qi Chen
  - School of Psychology, Shenzhen University, Shenzhen, China
- Ziwen Peng
  - Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, Center for Studies of Psychological Application, South China Normal University, Guangzhou, China
  - Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, Guangzhou, China
  - Department of Child Psychiatry, Shenzhen Kangning Hospital, Shenzhen University School of Medicine, Shenzhen, China
9
Dundon NM, Colas JT, Garrett N, Babenko V, Rizor E, Yang D, MacNamara M, Petzold L, Grafton ST. Decision heuristics in contexts integrating action selection and execution. Sci Rep 2023; 13:6486. [PMID: 37081031 PMCID: PMC10119283 DOI: 10.1038/s41598-023-33008-2] [Received: 09/27/2022] [Accepted: 04/05/2023] [Indexed: 04/22/2023] Open
Abstract
Heuristics can inform human decision making in complex environments through a reduction of computational requirements (accuracy-resource trade-off) and a robustness to overparameterisation (less-is-more). However, tasks capturing the efficiency of heuristics typically ignore action proficiency in determining rewards. The requisite movement parameterisation in sensorimotor control questions whether heuristics preserve efficiency when actions are nontrivial. We developed a novel action selection-execution task requiring joint optimisation of action selection and spatio-temporal skillful execution. State-appropriate choices could be determined by a simple spatial heuristic, or by more complex planning. Computational models of action selection parsimoniously distinguished human participants who adopted the heuristic from those using a more complex planning strategy. Broader comparative analyses then revealed that participants using the heuristic showed combined decisional (selection) and skill (execution) advantages, consistent with a less-is-more framework. In addition, the skill advantage of the heuristic group was predominantly in the core spatial features that also shaped their decision policy, evidence that the dimensions of information guiding action selection might be yoked to salient features in skill learning.
Affiliation(s)
- Neil M Dundon
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
  - Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University of Freiburg, 79104, Freiburg, Germany
- Jaron T Colas
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
- Neil Garrett
  - School of Psychology, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Viktoriya Babenko
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
- Elizabeth Rizor
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
- Dengxian Yang
  - Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA
- Linda Petzold
  - Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA
- Scott T Grafton
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
10
Kanaev IA. Entropy and Cross-Level Orderliness in Light of the Interconnection between the Neural System and Consciousness. Entropy (Basel) 2023; 25:418. [PMID: 36981307 PMCID: PMC10047885 DOI: 10.3390/e25030418] [Received: 12/30/2022] [Revised: 02/01/2023] [Accepted: 02/20/2023] [Indexed: 06/18/2023]
Abstract
Despite recent advances, the origin and utility of consciousness remain under debate. Using an evolutionary perspective on the origin of consciousness, this review elaborates on the promising theoretical background suggested in the temporospatial theory of consciousness, which outlines world-brain alignment as a critical predisposition for controlling behavior and adaptation. Such a system can be evolutionarily effective only if it can provide instant cohesion between the subsystems, which is possible only if it performs an intrinsic activity modified in light of the incoming stimulation. One can assume that the world-brain interaction results in a particular interference pattern predetermined by connectome complexity. This is what organisms experience as their exclusive subjective state, allowing the anticipation of regularities in the environment. Thus, an anticipative system can emerge only in a regular environment, which guides natural selection by reinforcing corresponding reactions and decreasing the system entropy. Subsequent evolution requires complicated, layered structures and can be traced from simple organisms to human consciousness and society. This allows us to consider the mode of entropy as a subject of natural evolution rather than an individual entity.
Affiliation(s)
- Ilya A Kanaev
  - Department of Philosophy, Sun Yat-sen University, 135 Xingang Xi Rd, Guangzhou 510275, China
11
Dong T, Sinha S, Zhai B, Fudulu DP, Chan J, Narayan P, Judge A, Caputo M, Dimagli A, Benedetto U, Angelini GD. Cardiac surgery risk prediction using ensemble machine learning to incorporate legacy risk scores: A benchmarking study. Digit Health 2023; 9:20552076231187605. [PMID: 37492033 PMCID: PMC10363892 DOI: 10.1177/20552076231187605] [Received: 01/23/2023] [Accepted: 06/23/2023] [Indexed: 07/27/2023] Open
Abstract
Objective: The introduction of new clinical risk scores (e.g. European System for Cardiac Operative Risk Evaluation (EuroSCORE) II) superseding original scores (e.g. EuroSCORE I) with different variable sets typically results in disparate datasets due to high levels of missingness for new score variables prior to the time of adoption. Little is known about the use of ensemble learning to incorporate disparate data from legacy scores. We tested the hypothesis that homogeneous and heterogeneous machine learning (ML) ensembles will have better performance than ensembles of Dynamic Model Averaging (DMA) for combining knowledge from EuroSCORE I legacy data with EuroSCORE II data to predict cardiac surgery risk.
Methods: Using the National Adult Cardiac Surgery Audit dataset, we trained 12 different base learner models, based on two different variable sets from either EuroSCORE I (LogES) or EuroSCORE II (ES II), partitioned by the time of score adoption (1996-2016 or 2012-2016) and evaluated on a holdout set (2017-2019). These base learner models were ensembled using nine different combinations of six ML algorithms to produce homogeneous or heterogeneous ensembles. Performance was assessed using a consensus metric.
Results: The XGBoost homogeneous ensemble (HE) was the highest performing model (clinical effectiveness metric (CEM) 0.725) with area under the curve (AUC) (0.8327; 95% confidence interval (CI) 0.8323-0.8329) followed by Random Forest HE (CEM 0.723; AUC 0.8325; 95% CI 0.8320-0.8326). Across different heterogeneous ensembles, significantly better performance was obtained by combining siloed datasets across time (CEM 0.720) than building ensembles of either 1996-2011 (t-test adjusted, p = 1.67 × 10⁻⁶) or 2012-2019 (t-test adjusted, p = 1.35 × 10⁻¹⁹³) datasets alone.
Conclusions: Both homogeneous and heterogeneous ML ensembles performed significantly better than the DMA ensemble of Bayesian Update models. Time-dependent ensemble combination of variables, having differing qualities according to time of score adoption, enabled previously siloed data to be combined, leading to increased power, clinical interpretability of variables and usage of data.
Collapse
Affiliation(s)
- Tim Dong
  - Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK
- Shubhra Sinha
  - Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK
- Ben Zhai
  - School of Computing Science, Northumbria University, Newcastle upon Tyne, UK
- Daniel P Fudulu
  - Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK
- Jeremy Chan
  - Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK
- Pradeep Narayan
  - Department of Cardiac Surgery, Rabindranath Tagore International Institute of Cardiac Sciences, Kolkata, India
- Andy Judge
  - Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK
- Massimo Caputo
  - Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK
- Arnaldo Dimagli
  - Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK
- Umberto Benedetto
  - Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK
- Gianni D Angelini
  - Translational Health Sciences, Bristol Heart Institute, University of Bristol, Bristol, UK
12
Lee JH, Leibo JZ, An SJ, Lee SW. Importance of prefrontal meta control in human-like reinforcement learning. Front Comput Neurosci 2022; 16:1060101. [PMID: 36618272 PMCID: PMC9811824 DOI: 10.3389/fncom.2022.1060101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open
Abstract
Recent investigation on reinforcement learning (RL) has demonstrated considerable flexibility in dealing with various problems. However, such models often experience difficulty learning seemingly easy tasks for humans. To reconcile the discrepancy, our paper is focused on the computational benefits of the brain's RL. We examine the brain's ability to combine complementary learning strategies to resolve the trade-off between prediction performance, computational costs, and time constraints. The complex need for task performance created by a volatile and/or multi-agent environment motivates the brain to continually explore an ideal combination of multiple strategies, called meta-control. Understanding these functions would allow us to build human-aligned RL models.
Affiliation(s)
- Jee Hang Lee
  - Department of Human-Centered Artificial Intelligence, Sangmyung University, Seoul, South Korea
- Su Jin An
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Sang Wan Lee
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
  - Program of Brain and Cognitive Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
  - KAIST Center for Neuroscience-Inspired Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
  - KAIST Institute for Health Science and Technology, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
  - KAIST Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
13
Colas JT, Dundon NM, Gerraty RT, Saragosa‐Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022; 43:4750-4790. [PMID: 35860954 PMCID: PMC9491297 DOI: 10.1002/hbm.25988] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2022] [Revised: 05/20/2022] [Accepted: 06/10/2022] [Indexed: 11/12/2022] Open
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
Affiliation(s)
- Jaron T. Colas
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
- Neil M. Dundon
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
  - Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
- Raphael T. Gerraty
  - Department of Psychology, Columbia University, New York, New York, USA
  - Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
  - Center for Science and Society, Columbia University, New York, New York, USA
- Natalie M. Saragosa‐Harris
  - Department of Psychology, New York University, New York, New York, USA
  - Department of Psychology, University of California, Los Angeles, California, USA
- Karol P. Szymula
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Koranis Tanwisuth
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
  - Department of Psychology, University of California, Berkeley, California, USA
- J. Michael Tyszka
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Camilla van Geen
  - Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
  - Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Harang Ju
  - Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arthur W. Toga
  - Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
- Joshua I. Gold
  - Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Dani S. Bassett
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Santa Fe Institute, Santa Fe, New Mexico, USA
- Catherine A. Hartley
  - Department of Psychology, New York University, New York, New York, USA
  - Center for Neural Science, New York University, New York, New York, USA
- Daphna Shohamy
  - Department of Psychology, Columbia University, New York, New York, USA
  - Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
  - Kavli Institute for Brain Science, Columbia University, New York, New York, USA
- Scott T. Grafton
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- John P. O'Doherty
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
14
Seok D, Tadayonnejad R, Wong WW, O'Neill J, Cockburn J, Bari AA, O'Doherty JP, Feusner JD. Neurocircuit dynamics of arbitration between decision-making strategies across obsessive-compulsive and related disorders. Neuroimage Clin 2022; 35:103073. [PMID: 35689978 PMCID: PMC9192960 DOI: 10.1016/j.nicl.2022.103073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Revised: 05/11/2022] [Accepted: 05/31/2022] [Indexed: 11/20/2022]
Abstract
Obsessive-compulsive and related disorders (OCRD) include OCD and BDD. Neural differences in decision-making arbitration may underlie OCRD symptoms. Resting-state effective connectivity was used to assess arbitration circuitry. Greater left putamen inhibition via left ventrolateral prefrontal cortex in OCRD. Stronger left putamen inhibition was correlated with less severe symptoms.
Obsessions and compulsions are central components of obsessive–compulsive disorder (OCD) and obsessive–compulsive related disorders such as body dysmorphic disorder (BDD). Compulsive behaviours may result from an imbalance of habitual and goal-directed decision-making strategies. The relationship between these symptoms and the neural circuitry underlying habitual and goal-directed decision-making, and the arbitration between these strategies, remains unknown. This study examined resting state effective connectivity between nodes of these systems in two cohorts with obsessions and compulsions, each compared with their own corresponding healthy controls: OCD (nOCD = 43; nhealthy = 24) and BDD (nBDD = 21; nhealthy = 16). In individuals with OCD, the left ventrolateral prefrontal cortex, a node of the arbitration system, exhibited more inhibitory causal influence over the left posterolateral putamen, a node of the habitual system, compared with controls. Inhibitory causal influence in this connection showed a trend for a similar pattern in individuals with BDD compared with controls. Those with stronger negative connectivity had lower obsession and compulsion severity in both those with OCD and those with BDD. These relationships were not evident within the habitual or goal-directed circuits, nor were they associated with depressive or anxious symptomatology. These results suggest that abnormalities in the arbitration system may represent a shared neural phenotype across these two related disorders that is specific to obsessive–compulsive symptoms. In addition to nosological implications, these results identify potential targets for novel, circuit-specific treatments.
Affiliation(s)
- Darsol Seok
  - Division of Cognitive Neuroscience, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90024, USA
- Reza Tadayonnejad
  - Division of Neuromodulation, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90024, USA; Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, 1200 E. California Blvd., Code 228-77, Pasadena, CA 91125, USA
- Wan-Wa Wong
  - Division of Cognitive Neuroscience, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90024, USA
- Joseph O'Neill
  - Division of Child and Adolescent Psychiatry, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90024, USA
- Jeff Cockburn
  - Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, 1200 E. California Blvd., Code 228-77, Pasadena, CA 91125, USA
- Ausaf A Bari
  - Department of Neurosurgery, David Geffen School of Medicine, University of California, Los Angeles, 10833 Le Conte Ave, Los Angeles, CA 90095, USA
- John P O'Doherty
  - Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, 1200 E. California Blvd., Code 228-77, Pasadena, CA 91125, USA; Computation & Neural Systems Program, California Institute of Technology, Pasadena, CA, 1200 East California Boulevard, Pasadena, CA 91125, USA
- Jamie D Feusner
  - Division of Cognitive Neuroscience, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90024, USA; Centre for Addiction and Mental Health, 250 College Street, Toronto, ON M5T 1R8, Canada; Temerty Faculty of Medicine, Department of Psychiatry, University of Toronto, 250 College Street, 8th floor, Toronto, ON M5T 1R8, Canada; Department of Women's and Children's Health, The Karolinska Institute, Tomtebodavägen 18A, 171 77 Solna, Sweden.
15
Grossman CD, Cohen JY. Neuromodulation and Neurophysiology on the Timescale of Learning and Decision-Making. Annu Rev Neurosci 2022; 45:317-337. [PMID: 35363533 DOI: 10.1146/annurev-neuro-092021-125059] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
Nervous systems evolved to effectively navigate the dynamics of the environment to achieve their goals. One framework used to study this fundamental problem arose in the study of learning and decision-making. In this framework, the demands of effective behavior require slow dynamics (on the scale of seconds to minutes) of networks of neurons. Here, we review the phenomena and mechanisms involved. Using vignettes from a few species and areas of the nervous system, we view neuromodulators as key substrates for temporal scaling of neuronal dynamics. Expected final online publication date for the Annual Review of Neuroscience, Volume 45 is July 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Cooper D Grossman
  - The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, and Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Jeremiah Y Cohen
  - The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, and Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
16
Averbeck B, O'Doherty JP. Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology 2022; 47:147-162. [PMID: 34354249 PMCID: PMC8616931 DOI: 10.1038/s41386-021-01108-0] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 03/04/2021] [Revised: 07/06/2021] [Accepted: 07/09/2021] [Indexed: 01/03/2023]
Abstract
We review the current state of knowledge on the computational and neural mechanisms of reinforcement-learning with a particular focus on fronto-striatal circuits. We divide the literature in this area into five broad research themes: the target of the learning (whether it be learning about the value of stimuli or about the value of actions); the nature and complexity of the algorithm used to drive the learning and inference process; how learned values get converted into choices and associated actions; the nature of state representations, and of other cognitive machinery that support the implementation of various reinforcement-learning operations. An emerging fifth area focuses on how the brain allocates or arbitrates control over different reinforcement-learning sub-systems or "experts". We will outline what is known about the role of the prefrontal cortex and striatum in implementing each of these functions. We then conclude by arguing that it will be necessary to build bridges from algorithmic level descriptions of computational reinforcement-learning to implementational level models to better understand how reinforcement-learning emerges from multiple distributed neural networks in the brain.
Affiliation(s)
- John P O'Doherty
  - Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA.
17
Kim D, Jeong J, Lee SW. Prefrontal solution to the bias-variance tradeoff during reinforcement learning. Cell Rep 2021; 37:110185. [PMID: 34965420 DOI: 10.1016/j.celrep.2021.110185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Revised: 08/09/2021] [Accepted: 12/07/2021] [Indexed: 11/17/2022] Open
Abstract
Evidence that the brain combines different value learning strategies to minimize prediction error is accumulating. However, the tradeoff between bias and variance error, which imposes different constraints on each learning strategy's performance, poses a challenge for value learning. While this tradeoff specifies the requirements for optimal learning, little has been known about how the brain deals with this issue. Here, we hypothesize that the brain adaptively resolves the bias-variance tradeoff during reinforcement learning. Our theory suggests that the solution necessitates baseline correction for prediction error, which offsets the adverse effects of irreducible error on value learning. We show behavioral evidence of adaptive control using a Markov decision task with context changes. The prediction error baseline seemingly signals context changes to improve adaptability. Critically, we identify multiplexed representations of prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free reinforcement learning.
Affiliation(s)
- Dongjae Kim
  - Center for Neural Science, New York University, New York, NY, USA; Department of Psychology, New York University, New York, NY, USA
- Jaeseung Jeong
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 34141 Daejeon, Republic of Korea; Program of Brain and Cognitive Engineering, Korea Advanced Institute of Science and Technology (KAIST), 34141 Daejeon, Republic of Korea
- Sang Wan Lee
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 34141 Daejeon, Republic of Korea; Program of Brain and Cognitive Engineering, Korea Advanced Institute of Science and Technology (KAIST), 34141 Daejeon, Republic of Korea; KAIST Center for Neuroscience-inspired AI, Korea Advanced Institute of Science and Technology (KAIST), 34141 Daejeon, Republic of Korea; KI for Health Science and Technology, Korea Advanced Institute of Science and Technology (KAIST), 34141 Daejeon, Republic of Korea; KI for Artificial Intelligence, Korea Advanced Institute of Science and Technology (KAIST), 34141 Daejeon, Republic of Korea.
18
Ghambaryan A, Gutkin B, Klucharev V, Koechlin E. Additively Combining Utilities and Beliefs: Research Gaps and Algorithmic Developments. Front Neurosci 2021; 15:704728. [PMID: 34658760 PMCID: PMC8517513 DOI: 10.3389/fnins.2021.704728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Accepted: 09/13/2021] [Indexed: 11/20/2022] Open
Abstract
Value-based decision making in complex environments, such as those with uncertain and volatile mapping of reward probabilities onto options, may engender computational strategies that are not necessarily optimal in terms of normative frameworks but may ensure effective learning and behavioral flexibility in conditions of limited neural computational resources. In this article, we review a suboptimal strategy - additively combining reward magnitude and reward probability attributes of options for value-based decision making. In addition, we present computational intricacies of a recently developed model (named MIX model) representing an algorithmic implementation of the additive strategy in sequential decision-making with two options. We also discuss its opportunities; and conceptual, inferential, and generalization issues. Furthermore, we suggest future studies that will reveal the potential and serve the further development of the MIX model as a general model of value-based choice making.
Affiliation(s)
- Anush Ghambaryan
  - Centre for Cognition and Decision Making, HSE University, Moscow, Russia
  - Ecole Normale Supérieure, PSL Research University, Paris, France
- Boris Gutkin
  - Centre for Cognition and Decision Making, HSE University, Moscow, Russia
  - Ecole Normale Supérieure, PSL Research University, Paris, France
- Vasily Klucharev
  - Centre for Cognition and Decision Making, HSE University, Moscow, Russia
- Etienne Koechlin
  - Ecole Normale Supérieure, PSL Research University, Paris, France