1
Barbier-Chebbah A, Vestergaard CL, Masson JB. Approximate information for efficient exploration-exploitation strategies. Phys Rev E 2024; 109:L052105. [PMID: 38907409 DOI: 10.1103/physreve.109.l052105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 01/29/2024] [Indexed: 06/24/2024]
Abstract
This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multiarmed bandit problems. These involve an agent deciding whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. We here introduce a class of algorithms, approximate information maximization (AIM), which employs a carefully chosen analytical approximation to the gradient of the entropy to choose which arm to pull at each point in time. AIM matches the performance of Thompson sampling, which is known to be asymptotically optimal, as well as that of Infomax from which it derives. AIM thus retains the advantages of Infomax while also offering enhanced computational speed, tractability, and ease of implementation. In particular, we demonstrate how to apply it to a 50-armed bandit game. Its expression is tunable, which allows for specific optimization in various settings, making it possible to surpass the performance of Thompson sampling at short and intermediary times.
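The abstract uses Thompson sampling as its reference baseline. As a rough illustration of that baseline only (not of AIM, whose entropy-gradient approximation is not given here), the following is a minimal sketch of Beta-Bernoulli Thompson sampling. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the Bernoulli reward model are assumptions made for this example.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_means: hypothetical success probability of each arm (unknown to the agent).
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior
        # (assumed uniform Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Pull the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    pulls = [successes[i] + failures[i] for i in range(k)]
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.9], horizon=500)
    print(pulls, total)
```

Exploration arises here because arms with few pulls have wide posteriors and therefore occasionally produce the largest sample; as evidence accumulates, pulls concentrate on the better arm.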
Affiliation(s)
- Alex Barbier-Chebbah
- Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France
- Épimethée, Inria, 75012 Paris, France
- Christian L Vestergaard
- Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France
- Épimethée, Inria, 75012 Paris, France
- Jean-Baptiste Masson
- Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France
- Épimethée, Inria, 75012 Paris, France
2
Hodson R, Mehta M, Smith R. The empirical status of predictive coding and active inference. Neurosci Biobehav Rev 2024; 157:105473. [PMID: 38030100 DOI: 10.1016/j.neubiorev.2023.105473] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/26/2023] [Revised: 10/27/2023] [Accepted: 11/16/2023] [Indexed: 12/01/2023]
Abstract
Research on predictive processing models has focused largely on two specific algorithmic theories: Predictive Coding for perception and Active Inference for decision-making. While these interconnected theories possess broad explanatory potential, they have only recently begun to receive direct empirical evaluation. Here, we review recent studies of Predictive Coding and Active Inference with a focus on evaluating the degree to which they are empirically supported. For Predictive Coding, we find that existing empirical evidence offers modest support. However, some positive results can also be explained by alternative feedforward (e.g., feature detection-based) models. For Active Inference, most empirical studies have focused on fitting these models to behavior as a means of identifying and explaining individual or group differences. While Active Inference models tend to explain behavioral data reasonably well, there has not been a focus on testing empirical validity of active inference theory per se, which would require formal comparison to other models (e.g., non-Bayesian or model-free reinforcement learning models). This review suggests that, while promising, a number of specific research directions are still necessary to evaluate the empirical adequacy and explanatory power of these algorithms.
Affiliation(s)
- Ryan Smith
- Laureate Institute for Brain Research, USA.
3
Sandhu TR, Xiao B, Lawson RP. Transdiagnostic computations of uncertainty: towards a new lens on intolerance of uncertainty. Neurosci Biobehav Rev 2023; 148:105123. [PMID: 36914079 DOI: 10.1016/j.neubiorev.2023.105123] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/11/2022] [Revised: 02/21/2023] [Accepted: 03/08/2023] [Indexed: 03/13/2023]
Abstract
People radically differ in how they cope with uncertainty. Clinical researchers describe a dispositional characteristic known as "intolerance of uncertainty", a tendency to find uncertainty aversive, reported to be elevated across psychiatric and neurodevelopmental conditions. Concurrently, recent research in computational psychiatry has leveraged theoretical work to characterise individual differences in uncertainty processing. Under this framework, differences in how people estimate different forms of uncertainty can contribute to mental health difficulties. In this review, we briefly outline the concept of intolerance of uncertainty within its clinical context, and we argue that the mechanisms underlying this construct may be further elucidated through modelling how individuals make inferences about uncertainty. We will review the evidence linking psychopathology to different computationally specified forms of uncertainty and consider how these findings might suggest distinct mechanistic routes towards intolerance of uncertainty. We also discuss the implications of this computational approach for behavioural and pharmacological interventions, as well as the importance of different cognitive domains and subjective experiences in studying uncertainty processing.
Affiliation(s)
- Timothy R Sandhu
- Department of Psychology, Downing Place, University of Cambridge, CB2 3EB, UK; MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, CB2 7EF, UK.
- Bowen Xiao
- Department of Psychology, Downing Place, University of Cambridge, CB2 3EB, UK
- Rebecca P Lawson
- Department of Psychology, Downing Place, University of Cambridge, CB2 3EB, UK; MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, CB2 7EF, UK
4
Elwood A, Leonardi M, Mohamed A, Rozza A. Maximum Entropy Exploration in Contextual Bandits with Neural Networks and Energy Based Models. Entropy (Basel, Switzerland) 2023; 25:188. [PMID: 36832555 PMCID: PMC9955972 DOI: 10.3390/e25020188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 01/12/2023] [Accepted: 01/15/2023] [Indexed: 06/18/2023]
Abstract
Contextual bandits can solve a huge range of real-world problems. However, current popular algorithms to solve them either rely on linear models or unreliable uncertainty estimation in non-linear models, which are required to deal with the exploration-exploitation trade-off. Inspired by theories of human cognition, we introduce novel techniques that use maximum entropy exploration, relying on neural networks to find optimal policies in settings with both continuous and discrete action spaces. We present two classes of models, one with neural networks as reward estimators, and the other with energy based models, which model the probability of obtaining an optimal reward given an action. We evaluate the performance of these models in static and dynamic contextual bandit simulation environments. We show that both techniques outperform standard baseline algorithms, such as NN HMC, NN Discrete, Upper Confidence Bound, and Thompson Sampling, where energy based models have the best overall performance. This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces.
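The general idea behind maximum entropy exploration can be illustrated with a softmax (Boltzmann) policy, which is the distribution that maximizes expected estimated reward plus a temperature-weighted entropy bonus. The sketch below is a generic illustration for a discrete action set, not the neural-network or energy-based models of the paper; `softmax_policy`, `entropy`, and the `temperature` parameter are names assumed for this example.

```python
import math


def softmax_policy(reward_estimates, temperature=1.0):
    """Return action probabilities proportional to exp(estimate / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(reward_estimates)
    weights = [math.exp((r - top) / temperature) for r in reward_estimates]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    estimates = [1.0, 2.0, 3.0]  # hypothetical reward estimates for one context
    for temp in (0.1, 1.0, 10.0):
        probs = softmax_policy(estimates, temp)
        print(temp, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

A high temperature pushes the policy toward uniform (more exploration, higher entropy), while a low temperature concentrates probability on the action with the best estimate.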
5
Doya K, Friston K, Sugiyama M, Tenenbaum J. Neural Networks special issue on Artificial Intelligence and Brain Science. Neural Netw 2022; 155:328-329. [PMID: 36099665 DOI: 10.1016/j.neunet.2022.08.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Affiliation(s)
- Kenji Doya
- Okinawa Institute of Science and Technology Graduate University, Japan.
- Josh Tenenbaum
- Massachusetts Institute of Technology, United States of America
6
Gijsen S, Grundei M, Blankenburg F. Active inference and the two-step task. Sci Rep 2022; 12:17682. [PMID: 36271279 PMCID: PMC9586964 DOI: 10.1038/s41598-022-21766-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Accepted: 09/30/2022] [Indexed: 01/18/2023]
Abstract
Sequential decision problems distill important challenges frequently faced by humans. Through repeated interactions with an uncertain world, unknown statistics need to be learned while balancing exploration and exploitation. Reinforcement learning is a prominent method for modeling such behaviour, with a prevalent application being the two-step task. However, recent studies indicate that the standard reinforcement learning model sometimes describes features of human task behaviour inaccurately and incompletely. We investigated whether active inference, a framework proposing a trade-off to the exploration-exploitation dilemma, could better describe human behaviour. Therefore, we re-analysed four publicly available datasets of the two-step task, performed Bayesian model selection, and compared behavioural model predictions. Two datasets, which revealed more model-based inference and behaviour indicative of directed exploration, were better described by active inference, while the models scored similarly for the remaining datasets. Learning using probability distributions appears to contribute to the improved model fits. Further, approximately half of all participants showed sensitivity to information gain as formulated under active inference, although behavioural exploration effects were not fully captured. These results contribute to the empirical validation of active inference as a model of human behaviour and the study of alternative models for the influential two-step task.
Affiliation(s)
- Sam Gijsen
- Neurocomputation and Neuroimaging Unit, Freie Universität Berlin, 14195 Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, 10117 Berlin, Germany
- Miro Grundei
- Neurocomputation and Neuroimaging Unit, Freie Universität Berlin, 14195 Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, 10117 Berlin, Germany
- Felix Blankenburg
- Neurocomputation and Neuroimaging Unit, Freie Universität Berlin, 14195 Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, 10117 Berlin, Germany
7
Marković D, Reiter AMF, Kiebel SJ. Revealing human sensitivity to a latent temporal structure of changes. Front Behav Neurosci 2022; 16:962494. [PMID: 36325156 PMCID: PMC9621332 DOI: 10.3389/fnbeh.2022.962494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 09/26/2022] [Indexed: 11/29/2022]
Abstract
Precisely timed behavior and accurate time perception play a critical role in our everyday lives, as our wellbeing and even survival can depend on well-timed decisions. Although the temporal structure of the world around us is essential for human decision making, we know surprisingly little about how representation of temporal structure of our everyday environment impacts decision making. How does the representation of temporal structure affect our ability to generate well-timed decisions? Here we address this question by using a well-established dynamic probabilistic learning task. Using computational modeling, we found that human subjects' beliefs about temporal structure are reflected in their choices to either exploit their current knowledge or to explore novel options. The model-based analysis illustrates a large within-group and within-subject heterogeneity. To explain these results, we propose a normative model for how temporal structure is used in decision making, based on the semi-Markov formalism in the active inference framework. We discuss potential key applications of the presented approach to the fields of cognitive phenotyping and computational psychiatry.
Affiliation(s)
- Dimitrije Marković
- Department of Psychology, Technische Universität Dresden, Dresden, Germany
- *Correspondence: Dimitrije Marković
- Andrea M. F. Reiter
- Department of Psychology, Technische Universität Dresden, Dresden, Germany
- Department of Child and Adolescence Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University Hospital Würzburg, Würzburg, Germany
- German Center of Prevention Research on Mental Health, Julius-Maximilians Universität Würzburg, Würzburg, Germany
- Stefan J. Kiebel
- Department of Psychology, Technische Universität Dresden, Dresden, Germany
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, Dresden, Germany
8
How Active Inference Could Help Revolutionise Robotics. Entropy 2022; 24:e24030361. [PMID: 35327872 PMCID: PMC8946999 DOI: 10.3390/e24030361] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/21/2022] [Revised: 02/24/2022] [Accepted: 02/28/2022] [Indexed: 02/05/2023]
Abstract
Recent advances in neuroscience have characterised brain function using mathematical formalisms and first principles that may be usefully applied elsewhere. In this paper, we explain how active inference—a well-known description of sentient behaviour from neuroscience—can be exploited in robotics. In short, active inference leverages the processes thought to underwrite human behaviour to build effective autonomous systems. These systems show state-of-the-art performance in several robotics settings; we highlight these and explain how this framework may be used to advance robotics.
9
Champion T, Grześ M, Bowman H. Realizing Active Inference in Variational Message Passing: The Outcome-Blind Certainty Seeker. Neural Comput 2021; 33:2762-2826. [PMID: 34280302 DOI: 10.1162/neco_a_01422] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/01/2021] [Accepted: 04/20/2021] [Indexed: 11/04/2022]
Abstract
Active inference is a state-of-the-art framework in neuroscience that offers a unified theory of brain function. It is also proposed as a framework for planning in AI. Unfortunately, the complex mathematics required to create new models can impede application of active inference in neuroscience and AI research. This letter addresses this problem by providing a complete mathematical treatment of the active inference framework in discrete time and state spaces and the derivation of the update equations for any new model. We leverage the theoretical connection between active inference and variational message passing as described by John Winn and Christopher M. Bishop in 2005. Since variational message passing is a well-defined methodology for deriving Bayesian belief update equations, this letter opens the door to advanced generative models for active inference. We show that using a fully factorized variational distribution simplifies the expected free energy, which furnishes priors over policies so that agents seek unambiguous states. Finally, we consider future extensions that support deep tree searches for sequential policy optimization based on structure learning and belief propagation.
Affiliation(s)
- Marek Grześ
- University of Kent, School of Computing, Canterbury CT2 7NZ, U.K.
- Howard Bowman
- University of Birmingham, School of Psychology, Birmingham B15 2TT, U.K., and University of Kent, School of Computing, Canterbury CT2 7NZ, U.K.
10
van de Laar T, Wymeersch H, Şenöz İ, Özçelikkale A. Chance-Constrained Active Inference. Neural Comput 2021; 33:2710-2735. [PMID: 34280254 DOI: 10.1162/neco_a_01427] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/17/2021] [Accepted: 05/03/2021] [Indexed: 11/04/2022]
Abstract
Active inference (ActInf) is an emerging theory that explains perception and action in biological agents in terms of minimizing a free energy bound on Bayesian surprise. Goal-directed behavior is elicited by introducing prior beliefs on the underlying generative model. In contrast to prior beliefs, which constrain all realizations of a random variable, we propose an alternative approach through chance constraints, which allow for a (typically small) probability of constraint violation, and demonstrate how such constraints can be used as intrinsic drivers for goal-directed behavior in ActInf. We illustrate how chance-constrained ActInf weights all imposed (prior) constraints on the generative model, allowing, for example, for a trade-off between robust control and empirical chance constraint violation. Second, we interpret the proposed solution within a message passing framework. Interestingly, the message passing interpretation is not only relevant to the context of ActInf, but also provides a general-purpose approach that can account for chance constraints on graphical models. The chance constraint message updates can then be readily combined with other prederived message update rules without the need for custom derivations. The proposed chance-constrained message passing framework thus accelerates the search for workable models in general and can be used to complement message-passing formulations on generative neural models.
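To make the notion of a chance constraint concrete: it requires that a constraint hold with probability at least 1 - epsilon, rather than for every realization of the random variable. The sketch below checks such a constraint for a scalar Gaussian belief with an upper limit. It only illustrates the definition and is not the message passing scheme of the paper; the function names, the Gaussian assumption, and the one-sided limit are all assumptions of this example.

```python
from statistics import NormalDist


def violation_probability(mean, std, upper_limit):
    """Probability that a Gaussian variable N(mean, std^2) exceeds upper_limit."""
    return 1.0 - NormalDist(mean, std).cdf(upper_limit)


def satisfies_chance_constraint(mean, std, upper_limit, epsilon):
    """True if P(x > upper_limit) <= epsilon, i.e. violations are rare enough."""
    return violation_probability(mean, std, upper_limit) <= epsilon


if __name__ == "__main__":
    # Hypothetical belief about a state: standard normal, 1% allowed violation.
    print(satisfies_chance_constraint(0.0, 1.0, upper_limit=3.0, epsilon=0.01))
    print(satisfies_chance_constraint(0.0, 1.0, upper_limit=1.0, epsilon=0.01))
```

Raising `epsilon` loosens the constraint, which corresponds to the trade-off mentioned in the abstract between robust control and the tolerated frequency of constraint violations.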
Affiliation(s)
- Thijs van de Laar
- Eindhoven University of Technology, 5612 AP, Eindhoven, The Netherlands
- Henk Wymeersch
- Chalmers University of Technology, 41296, Gothenburg, Sweden
- İsmail Şenöz
- Eindhoven University of Technology, 5612 AP, Eindhoven, The Netherlands