1

Malekzadeh P, Plataniotis KN. Active Inference and Reinforcement Learning: A Unified Inference on Continuous State and Action Spaces Under Partial Observability. Neural Comput 2024; 36:2073-2135. [PMID: 39177966] [DOI: 10.1162/neco_a_01698] [Received: 07/13/2023] [Accepted: 05/28/2024] [Indexed: 08/24/2024]
Abstract
Reinforcement learning (RL) has garnered significant attention for developing decision-making agents that aim to maximize rewards, specified by an external supervisor, within fully observable environments. However, many real-world problems involve partial or noisy observations, where agents cannot access complete and accurate information about the environment. These problems are commonly formulated as partially observable Markov decision processes (POMDPs). Previous studies have tackled RL in POMDPs by either incorporating the memory of past actions and observations or by inferring the true state of the environment from observed data. Nevertheless, aggregating observations and actions over time becomes impractical in problems with large decision-making time horizons and high-dimensional spaces. Furthermore, inference-based RL approaches often require many environmental samples to perform well, as they focus solely on reward maximization and neglect uncertainty in the inferred state. Active inference (AIF) is a framework naturally formulated in POMDPs and directs agents to select actions by minimizing a function called expected free energy (EFE). This supplies reward-maximizing (or exploitative) behavior, as in RL, with information-seeking (or exploratory) behavior. Despite this exploratory behavior of AIF, its use is limited to problems with small time horizons and discrete spaces due to the computational challenges associated with EFE. In this article, we propose a unified principle that establishes a theoretical connection between AIF and RL, enabling seamless integration of these two approaches and overcoming their limitations in continuous space POMDP settings. We substantiate our findings with rigorous theoretical analysis, providing novel perspectives for using AIF in designing and implementing artificial agents. Experimental results demonstrate the superior learning capabilities of our method compared to other alternative RL approaches in solving partially observable tasks with continuous spaces. Notably, our approach harnesses information-seeking exploration, enabling it to effectively solve reward-free problems and rendering explicit task reward design by an external supervisor optional.
Affiliation(s)
- Parvin Malekzadeh
- Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, M5S 3G8, Canada
- Konstantinos N Plataniotis
- Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, M5S 3G8, Canada

2

Paul A, Isomura T, Razi A. On Predictive Planning and Counterfactual Learning in Active Inference. Entropy (Basel) 2024; 26:484. [PMID: 38920492] [PMCID: PMC11202763] [DOI: 10.3390/e26060484] [Received: 04/23/2024] [Revised: 05/27/2024] [Accepted: 05/28/2024] [Indexed: 06/27/2024]
Abstract
Given the rapid advancement of artificial intelligence, understanding the foundations of intelligent behaviour is increasingly important. Active inference, regarded as a general theory of behaviour, offers a principled approach to probing the basis of sophistication in planning and decision-making. This paper examines two decision-making schemes in active inference based on "planning" and "learning from experience". We also introduce a mixed model that navigates the data-complexity trade-off between these strategies, leveraging the strengths of both to facilitate balanced decision-making. We evaluate our proposed model in a challenging grid-world scenario that requires adaptability from the agent. Additionally, our model allows us to analyse the evolution of various parameters, offering valuable insights and contributing to an explainable framework for intelligent decision-making.
Affiliation(s)
- Aswin Paul
- Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Clayton 3800, Australia
- IITB-Monash Research Academy, Mumbai 400076, India
- Department of Electrical Engineering, IIT Bombay, Mumbai 400076, India
- Takuya Isomura
- Brain Intelligence Theory Unit, RIKEN Center for Brain Science, Wako, Saitama 351-0106, Japan
- Adeel Razi
- Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Clayton 3800, Australia
- Wellcome Trust Centre for Human Neuroimaging, University College London, London WC1N 3AR, UK
- CIFAR Azrieli Global Scholars Program, CIFAR, Toronto, ON M5G 1M1, Canada

3

Matsumura T, Esaki K, Yang S, Yoshimura C, Mizuno H. Active Inference With Empathy Mechanism for Socially Behaved Artificial Agents in Diverse Situations. Artif Life 2024; 30:277-297. [PMID: 38018026] [DOI: 10.1162/artl_a_00416] [Indexed: 11/30/2023]
Abstract
This article proposes a method for an artificial agent to behave in a social manner. Although proper social behavior is difficult to define because it differs from situation to situation, an agent following the proposed method adaptively behaves appropriately in each situation by empathizing with the surrounding others. The proposed method is achieved by incorporating empathy into active inference. We evaluated the proposed method on the control of autonomous mobile robots in diverse situations. The evaluation results show that an agent controlled by the proposed method behaved more socially and adaptively than an agent controlled by standard active inference across the diverse situations. In the case of two agents, the agent controlled with the proposed method behaved in a social way that reduced the other agent's travel distance by 13.7% and increased the margin between the agents by 25.8%, even though it increased its own travel distance by 8.2%. Also, the agent controlled with the proposed method behaved more socially when surrounded by altruistic others but less socially when surrounded by selfish others.
4

Zhang Z, Xu F. An Overview of the Free Energy Principle and Related Research. Neural Comput 2024; 36:963-1021. [PMID: 38457757] [DOI: 10.1162/neco_a_01642] [Received: 09/05/2023] [Accepted: 11/20/2023] [Indexed: 03/10/2024]
Abstract
The free energy principle (FEP) and its corollary, the active inference framework, serve as theoretical foundations in the domain of neuroscience, explaining the genesis of intelligent behavior. This principle states that the processes of perception, learning, and decision making within an agent are all driven by the objective of "minimizing free energy," evincing the following behaviors: learning and employing a generative model of the environment to interpret observations, thereby achieving perception, and selecting actions to maintain a stable preferred state and minimize uncertainty about the environment, thereby achieving decision making. This fundamental principle can be used to explain how the brain processes perceptual information, learns about the environment, and selects actions. Two pivotal tenets are that the agent employs a generative model for perception and planning and that interaction with the world (and other agents) enhances the performance of the generative model and augments perception. With the evolution of control theory and deep learning tools, agents based on the FEP have been instantiated in various ways across different domains, guiding the design of a multitude of generative models and decision-making algorithms. This letter first introduces the basic concepts of the FEP, followed by its historical development and connections with other theories of intelligence, and then delves into the specific application of the FEP to perception and decision making, encompassing both low-dimensional simple situations and high-dimensional complex situations. It compares the FEP with model-based reinforcement learning to show that the FEP provides a better objective function, illustrated through numerical studies of Dreamer3 in which expected information gain is added to the standard objective function. In a complementary fashion, existing reinforcement learning and deep learning algorithms can also help implement FEP-based agents. Finally, we discuss the various capabilities that agents need to possess in complex environments and argue that the FEP can aid agents in acquiring these capabilities.
Affiliation(s)
- Zhengquan Zhang
- Key Laboratory of Information Science of Electromagnetic Waves, Fudan University, Shanghai, P.R.C.
- Feng Xu
- Key Laboratory of Information Science of Electromagnetic Waves, Fudan University, Shanghai, P.R.C.

5

Matsumoto T, Ohata W, Tani J. Incremental Learning of Goal-Directed Actions in a Dynamic Environment by a Robot Using Active Inference. Entropy (Basel) 2023; 25:1506. [PMID: 37998198] [PMCID: PMC10670890] [DOI: 10.3390/e25111506] [Received: 09/04/2023] [Revised: 10/19/2023] [Accepted: 10/27/2023] [Indexed: 11/25/2023]
Abstract
This study investigated how a physical robot can adapt goal-directed actions in dynamically changing environments, in real time, using an active inference-based approach with incremental learning from human tutoring examples. While our active inference-based model achieves good generalization with appropriate parameters, when faced with sudden, large changes in the environment, a human may have to intervene to correct the robot's actions so that it reaches the goal, much as a caregiver might guide the hands of a child performing an unfamiliar task. So that the robot can learn from the human tutor, we propose a new scheme for incremental learning from these proprioceptive-exteroceptive experiences, combined with mental rehearsal of past experiences. Our experimental results demonstrate that, using only a few tutoring examples, the robot using our model was able to significantly improve its performance on new tasks without catastrophic forgetting of previously learned tasks.
Affiliation(s)
- Jun Tani
- Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa 904-0495, Japan

6

Laukkonen RE, Webb M, Salvi C, Tangen JM, Slagter HA, Schooler JW. Insight and the selection of ideas. Neurosci Biobehav Rev 2023; 153:105363. [PMID: 37598874] [DOI: 10.1016/j.neubiorev.2023.105363] [Received: 03/02/2023] [Revised: 06/19/2023] [Accepted: 08/15/2023] [Indexed: 08/22/2023]
Abstract
Perhaps it is no accident that insight moments accompany some of humanity's most important discoveries in science, medicine, and art. Here we propose that feelings of insight play a central role in (heuristically) selecting an idea from the stream of consciousness by capturing attention and eliciting a sense of intuitive confidence that permits fast action under uncertainty. The mechanisms underlying this Eureka heuristic are explained within an active inference framework. First, implicit restructuring via Bayesian reduction leads to a higher-order prediction error (i.e., the content of insight). Second, dopaminergic precision-weighting of the prediction error accounts for the intuitive confidence, pleasure, and attentional capture (i.e., the feeling of insight). This insight-as-precision account is consistent with the phenomenology, accuracy, and neural unfolding of insight, as well as its effects on belief and decision-making. We conclude by reflecting on dangers of the Eureka heuristic, including the formation and entrenchment of false beliefs and the vulnerability of insights under psychoactive substances and misinformation.
7

Bolis D, Dumas G, Schilbach L. Interpersonal attunement in social interactions: from collective psychophysiology to inter-personalized psychiatry and beyond. Philos Trans R Soc Lond B Biol Sci 2023; 378:20210365. [PMID: 36571122] [PMCID: PMC9791489] [DOI: 10.1098/rstb.2021.0365] [Indexed: 12/27/2022]
Abstract
In this article, we analyse social interactions, drawing on diverse points of view, ranging from dialectics, second-person neuroscience and enactivism to dynamical systems, active inference and machine learning. To this end, we define interpersonal attunement as a set of multi-scale processes of building up and materializing social expectations: put simply, anticipating and interacting with others and ourselves. While cultivating and negotiating common ground, via communication and culture-building activities, are indispensable for the survival of the individual, the relevant multi-scale mechanisms have been largely considered in isolation. Here, we argue, collective psychophysiology can lend itself to the fine-tuned analysis of social interactions, without neglecting the individual. On the other hand, an interpersonal mismatch of expectations can lead to a breakdown of communication and to social isolation, which is known to negatively affect mental health. In this regard, we review psychopathology in terms of interpersonal misattunement, conceptualizing psychiatric disorders as disorders of social interaction, to describe how individual mental health is inextricably linked to social interaction. By doing so, we foresee avenues for an inter-personalized psychiatry, which moves from a static spectrum of disorders to a dynamic relational space, focusing on how the multi-faceted processes of social interaction can help to promote mental health. This article is part of the theme issue 'Concepts in interaction: social engagement and inner experiences'.
Affiliation(s)
- Dimitris Bolis
- Independent Max Planck Research Group for Social Neuroscience, Max Planck Institute of Psychiatry, Kraepelinstrasse 2–10, Muenchen-Schwabing 80804, Germany
- Centre for Philosophy of Science, University of Lisbon, Campo Grande, 1749-016 Lisbon, Portugal
- Department of System Neuroscience, National Institute for Physiological Sciences (NIPS), Okazaki 444-0867, Japan
- Guillaume Dumas
- Precision Psychiatry and Social Physiology Laboratory, CHU Ste-Justine Research Center, Department of Psychiatry, University of Montreal, Quebec, Canada H3T 1J4
- Mila - Quebec AI Institute, University of Montreal, Quebec, Canada H2S 3H1
- Culture Mind and Brain Program, Department of Psychiatry, McGill University, Montreal, Quebec, Canada H3A 1A1
- Leonhard Schilbach
- Independent Max Planck Research Group for Social Neuroscience, Max Planck Institute of Psychiatry, Kraepelinstrasse 2–10, Muenchen-Schwabing 80804, Germany
- Department of Psychiatry and Psychotherapy, University Hospital, Ludwig Maximilians Universität, Munich 40629, Germany
- Department of General Psychiatry 2, LVR-Klinikum Düsseldorf, Düsseldorf 80336, Germany

8

Yang Z, Diaz GJ, Fajen BR, Bailey R, Ororbia AG. A neural active inference model of perceptual-motor learning. Front Comput Neurosci 2023; 17:1099593. [PMID: 36890967] [PMCID: PMC9986490] [DOI: 10.3389/fncom.2023.1099593] [Received: 11/16/2022] [Accepted: 01/30/2023] [Indexed: 02/22/2023]
Abstract
The active inference framework (AIF) is a promising new computational framework grounded in contemporary neuroscience that can produce human-like behavior through reward-based learning. In this study, we test the ability of the AIF to capture the role of anticipation in the visual guidance of action in humans through the systematic investigation of a well-explored visual-motor task: intercepting a target moving over a ground plane. Previous research demonstrated that humans performing this task resorted to anticipatory changes in speed intended to compensate for semi-predictable changes in target speed later in the approach. To capture this behavior, our proposed "neural" AIF agent uses artificial neural networks to select actions on the basis of a very short-term prediction of the information about the task environment that these actions would reveal, along with a long-term estimate of the resulting cumulative expected free energy. Systematic variation revealed that anticipatory behavior emerged only when required by limitations on the agent's movement capabilities, and only when the agent was able to estimate accumulated free energy over sufficiently long durations into the future. In addition, we present a novel formulation of the prior mapping function that maps a multi-dimensional world-state to a uni-dimensional distribution of free energy/reward. Together, these results demonstrate the use of AIF as a plausible model of anticipatory visually guided behavior in humans.
Affiliation(s)
- Zhizhuo Yang
- Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester, NY, United States
- Gabriel J Diaz
- Chester F. Carlson Center for Imaging Science, Rochester Institute of Technology, Rochester, NY, United States
- Brett R Fajen
- Department of Cognitive Science, Rensselaer Polytechnic Institute, Troy, NY, United States
- Reynold Bailey
- Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester, NY, United States
- Alexander G Ororbia
- Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester, NY, United States

9

Shin JY, Kim C, Hwang HJ. Prior preference learning from experts: Designing a reward with active inference. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.12.042] [Indexed: 10/19/2022]
10

Kuchling F, Fields C, Levin M. Metacognition as a Consequence of Competing Evolutionary Time Scales. Entropy (Basel) 2022; 24:601. [PMID: 35626486] [PMCID: PMC9141326] [DOI: 10.3390/e24050601] [Received: 02/10/2022] [Revised: 04/15/2022] [Accepted: 04/19/2022] [Indexed: 12/24/2022]
Abstract
Evolution is full of coevolving systems characterized by complex spatio-temporal interactions that lead to intertwined processes of adaptation. Yet, how adaptation across multiple levels of temporal scales and biological complexity is achieved remains unclear. Here, we formalize how evolutionary multi-scale processing underlying adaptation constitutes a form of metacognition flowing from definitions of metaprocessing in machine learning. We show (1) how the evolution of metacognitive systems can be expected when fitness landscapes vary on multiple time scales, and (2) how multiple time scales emerge during coevolutionary processes of sufficiently complex interactions. After defining a metaprocessor as a regulator with local memory, we prove that metacognition is more energetically efficient than purely object-level cognition when selection operates at multiple timescales in evolution. Furthermore, we show that existing modeling approaches to coadaptation and coevolution-here active inference networks, predator-prey interactions, coupled genetic algorithms, and generative adversarial networks-lead to multiple emergent timescales underlying forms of metacognition. Lastly, we show how coarse-grained structures emerge naturally in any resource-limited system, providing sufficient evidence for metacognitive systems to be a prevalent and vital component of (co-)evolution. Therefore, multi-scale processing is a necessary requirement for many evolutionary scenarios, leading to de facto metacognitive evolutionary outcomes.
Affiliation(s)
- Franz Kuchling
- Department of Biology, Allen Discovery Center at Tufts University, Medford, MA 02155, USA
- Chris Fields
- 23 Rue des Lavandières, 11160 Caunes Minervois, France
- Michael Levin
- Department of Biology, Allen Discovery Center at Tufts University, Medford, MA 02155, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02138, USA

11

van de Laar T, Koudahl M, van Erp B, de Vries B. Active Inference and Epistemic Value in Graphical Models. Front Robot AI 2022; 9:794464. [PMID: 35462780] [PMCID: PMC9019474] [DOI: 10.3389/frobt.2022.794464] [Received: 10/13/2021] [Accepted: 01/27/2022] [Indexed: 11/29/2022]
Abstract
The Free Energy Principle (FEP) postulates that biological agents perceive and interact with their environment in order to minimize a Variational Free Energy (VFE) with respect to a generative model of their environment. The inference of a policy (future control sequence) according to the FEP is known as Active Inference (AIF). The AIF literature describes multiple VFE objectives for policy planning that lead to epistemic (information-seeking) behavior. However, most objectives have limited modeling flexibility. This paper approaches epistemic behavior from a constrained Bethe Free Energy (CBFE) perspective. Crucially, variational optimization of the CBFE can be expressed in terms of message passing on free-form generative models. The key intuition behind the CBFE is that we impose a point-mass constraint on predicted outcomes, which explicitly encodes the assumption that the agent will make observations in the future. We interpret the CBFE objective in terms of its constituent behavioral drives. We then illustrate resulting behavior of the CBFE by planning and interacting with a simulated T-maze environment. Simulations for the T-maze task illustrate how the CBFE agent exhibits an epistemic drive, and actively plans ahead to account for the impact of predicted outcomes. Compared to an EFE agent, the CBFE agent incurs expected reward in significantly more environmental scenarios. We conclude that CBFE optimization by message passing suggests a general mechanism for epistemic-aware AIF in free-form generative models.
Affiliation(s)
- Thijs van de Laar
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Magnus Koudahl
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Nested Minds Network Ltd., Liverpool, United Kingdom
- Bart van Erp
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Bert de Vries
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- GN Hearing Benelux BV, Eindhoven, Netherlands

12

Wauthier ST, De Boom C, Çatal O, Verbelen T, Dhoedt B. Model Reduction Through Progressive Latent Space Pruning in Deep Active Inference. Front Neurorobot 2022; 16:795846. [PMID: 35360827] [PMCID: PMC8961807] [DOI: 10.3389/fnbot.2022.795846] [Received: 10/15/2021] [Accepted: 02/14/2022] [Indexed: 11/17/2022]
Abstract
Although still not fully understood, sleep is known to play an important role in learning and in pruning synaptic connections. From the active inference perspective, these can be cast as learning the parameters of a generative model and as Bayesian model reduction, respectively. In this article, we show how to reduce the dimensionality of the latent space of such a generative model, and hence model complexity, in deep active inference during training through a similar process. While deep active inference uses deep neural networks for state space construction, an issue remains in that the dimensionality of the latent space must be specified beforehand. We investigate two methods that are able to prune the latent space of deep active inference models. The first approach functions similarly to sleep and performs model reduction post hoc. The second approach is a novel method which is more similar to reflection, operates during training, and displays "aha" moments when the model is able to reduce latent space dimensionality. We show for two well-known simulated environments that model performance is retained in the first approach and only diminishes slightly in the second approach. We also show that reconstructions from a real-world example are indistinguishable before and after reduction. We conclude that the most important difference constitutes a trade-off between training time and model performance in terms of accuracy and the ability to generalize, via minimization of model complexity.
13

Mazzaglia P, Verbelen T, Çatal O, Dhoedt B. The Free Energy Principle for Perception and Action: A Deep Learning Perspective. Entropy (Basel) 2022; 24:301. [PMID: 35205595] [PMCID: PMC8871280] [DOI: 10.3390/e24020301] [Received: 12/21/2021] [Revised: 02/14/2022] [Accepted: 02/15/2022] [Indexed: 02/05/2023]
Abstract
The free energy principle, and its corollary active inference, constitute a bio-inspired theory that assumes biological agents act to remain in a restricted set of preferred states of the world, i.e., they minimize their free energy. Under this principle, biological agents learn a generative model of the world and plan future actions that will maintain the agent in a homeostatic state that satisfies its preferences. This framework lends itself to being realized in silico, as it comprehends important aspects that make it computationally affordable, such as variational inference and amortized planning. In this work, we investigate the use of deep learning to design and realize artificial agents based on active inference, offering a deep-learning-oriented presentation of the free energy principle, surveying works that are relevant in both the machine learning and active inference areas, and discussing the design choices involved in the implementation process. This manuscript probes newer perspectives for the active inference framework, grounding its theoretical aspects in more pragmatic affairs, offering a practical guide to active inference newcomers and a starting point for deep learning practitioners who would like to investigate implementations of the free energy principle.
Affiliation(s)
- Pietro Mazzaglia
- IDLab, Ghent University, 9052 Gent, Belgium

14

Da Costa L, Friston K, Heins C, Pavliotis GA. Bayesian mechanics for stationary processes. Proc Math Phys Eng Sci 2022; 477:20210518. [PMID: 35153603] [PMCID: PMC8652275] [DOI: 10.1098/rspa.2021.0518] [Received: 06/25/2021] [Accepted: 10/27/2021] [Indexed: 01/02/2023]
Abstract
This paper develops a Bayesian mechanics for adaptive systems. First, we model the interface between a system and its environment with a Markov blanket. This affords conditions under which states internal to the blanket encode information about external states. Second, we introduce dynamics and represent adaptive systems as Markov blankets at steady state. This allows us to identify a wide class of systems whose internal states appear to infer external states, consistent with variational inference in Bayesian statistics and theoretical neuroscience. Finally, we partition the blanket into sensory and active states. It follows that active states can be seen as performing active inference and well-known forms of stochastic control (such as PID control), which are prominent formulations of adaptive behaviour in theoretical biology and engineering.
Affiliation(s)
- Lancelot Da Costa
- Department of Mathematics, Imperial College London, London SW7 2AZ, UK
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3AR, UK
- Karl Friston
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3AR, UK
- Conor Heins
- Department of Collective Behaviour, Max Planck Institute of Animal Behavior, Konstanz D-78457, Germany
- Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Konstanz D-78457, Germany
- Department of Biology, University of Konstanz, Konstanz D-78457, Germany

15

Koudahl MT, Kouw WM, de Vries B. On Epistemics in Expected Free Energy for Linear Gaussian State Space Models. Entropy (Basel) 2021; 23:1565. [PMID: 34945871] [PMCID: PMC8700494] [DOI: 10.3390/e23121565] [Received: 09/28/2021] [Revised: 11/19/2021] [Accepted: 11/23/2021] [Indexed: 01/20/2023]
Abstract
Active Inference (AIF) is a framework that can be used both to describe information processing in naturally intelligent systems, such as the human brain, and to design synthetic intelligent systems (agents). In this paper we show that Expected Free Energy (EFE) minimisation, a core feature of the framework, does not lead to purposeful explorative behaviour in linear Gaussian dynamical systems. We provide a simple proof that, due to the specific construction used for the EFE, the terms responsible for the exploratory (epistemic) drive become constant in the case of linear Gaussian systems. This renders AIF equivalent to KL control. From a theoretical point of view this is an interesting result since it is generally assumed that EFE minimisation will always introduce an exploratory drive in AIF agents. While the full EFE objective does not lead to exploration in linear Gaussian dynamical systems, the principles of its construction can still be used to design objectives that include an epistemic drive. We provide an in-depth analysis of the mechanics behind the epistemic drive of AIF agents and show how to design objectives for linear Gaussian dynamical systems that do include an epistemic drive. Concretely, we show that focusing solely on epistemics and dispensing with goal-directed terms leads to a form of maximum entropy exploration that is heavily dependent on the type of control signals driving the system. Additive controls do not permit such exploration. From a practical point of view this is an important result since linear Gaussian dynamical systems with additive controls are an extensively used model class, encompassing for instance Linear Quadratic Gaussian controllers. On the other hand, linear Gaussian dynamical systems driven by multiplicative controls such as switching transition matrices do permit an exploratory drive.
Affiliation(s)
- Magnus T. Koudahl
- Department of Electrical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
- Wouter M. Kouw
- Department of Electrical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
- Bert de Vries
- Department of Electrical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
- GN Hearing, JF Kennedylaan 2, 5612 AB Eindhoven, The Netherlands
16. Parr T, Pezzulo G. Understanding, Explanation, and Active Inference. Front Syst Neurosci 2021; 15:772641. PMID: 34803619; PMCID: PMC8602880; DOI: 10.3389/fnsys.2021.772641.
Abstract
While machine learning techniques have been transformative in solving a range of problems, an important challenge is to understand why they arrive at the decisions they output. Some have argued that this necessitates augmenting machine intelligence with understanding such that, when queried, a machine is able to explain its behaviour (i.e., explainable AI). In this article, we address the issue of machine understanding from the perspective of active inference. This paradigm enables decision making based upon a model of how data are generated. The generative model contains those variables required to explain sensory data, and its inversion may be seen as an attempt to explain the causes of these data. Here we are interested in explanations of one's own actions. This implies a deep generative model that includes a model of the world, used to infer policies, and a higher-level model that attempts to predict which policies will be selected based upon a space of hypothetical (i.e., counterfactual) explanations, and which can subsequently be used to provide (retrospective) explanations about the policies pursued. We illustrate the construct validity of this notion of understanding in relation to human understanding by highlighting the similarities in computational architecture and the consequences of its dysfunction.
Affiliation(s)
- Thomas Parr
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, University College London, London, United Kingdom
- Giovanni Pezzulo
- Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
17. Rorot W. Bayesian theories of consciousness: a review in search for a minimal unifying model. Neurosci Conscious 2021; 2021:niab038. PMID: 34650816; PMCID: PMC8512254; DOI: 10.1093/nc/niab038.
Abstract
The goal of the paper is to review existing work on consciousness within the frameworks of Predictive Processing, Active Inference, and the Free Energy Principle. The emphasis is put on the role played by the precision and complexity of the internal generative model. In the light of those proposals, these two properties appear to be the minimal necessary components for the emergence of conscious experience: a Minimal Unifying Model of consciousness.
Affiliation(s)
- Wiktor Rorot
- Faculty of Philosophy and Faculty of Psychology, University of Warsaw, ul. Krakowskie Przedmieście 3, 00-927, Stawki 5/7, Warsaw 00-183, Poland
18. Marković D, Stojić H, Schwöbel S, Kiebel SJ. An empirical evaluation of active inference in multi-armed bandits. Neural Netw 2021; 144:229-246. PMID: 34507043; DOI: 10.1016/j.neunet.2021.08.018.
Abstract
A key feature of sequential decision making under uncertainty is the need to balance between exploiting (choosing the best action according to current knowledge) and exploring (obtaining information about the values of other actions). The multi-armed bandit problem, a classical task that captures this trade-off, served as a vehicle in machine learning for developing bandit algorithms that proved to be useful in numerous industrial applications. The active inference framework, an approach to sequential decision making recently developed in neuroscience for understanding human and animal behaviour, is distinguished by its sophisticated strategy for resolving the exploration-exploitation trade-off. This makes active inference an exciting alternative to already established bandit algorithms. Here we derive an efficient and scalable approximate active inference algorithm and compare it to two state-of-the-art bandit algorithms: Bayesian upper confidence bound and optimistic Thompson sampling. This comparison is done on two types of bandit problems: a stationary and a dynamic switching bandit. Our empirical evaluation shows that the active inference algorithm does not produce efficient long-term behaviour in stationary bandits. However, in the more challenging switching bandit problem, active inference performs substantially better than the two state-of-the-art bandit algorithms. The results open exciting avenues for further research in theoretical and applied machine learning, as well as lend additional credibility to active inference as a general framework for studying human and animal behaviour.
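The exploration bonus that distinguishes an active-inference-style choice rule from purely exploitative choice can be sketched for Beta-Bernoulli bandits, where the expected information gain from one pull has a closed form. A hedged illustration: the `score` function and its `preference` weight below are simplified stand-ins for exposition, not the algorithm derived in the paper.

```python
import numpy as np
from scipy.special import digamma

def bernoulli_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def info_gain(a, b):
    """Expected information gain about an arm's reward probability
    p ~ Beta(a, b) from a single pull: I(outcome; p) = H(E[p]) - E[H(p)],
    using E[p ln p] = (a / (a + b)) * (digamma(a + 1) - digamma(a + b + 1))."""
    m = a / (a + b)
    expected_H = -(m * (digamma(a + 1) - digamma(a + b + 1))
                   + (1 - m) * (digamma(b + 1) - digamma(a + b + 1)))
    return bernoulli_entropy(m) - expected_H

def score(a, b, preference=2.0):
    """Schematic negative expected free energy: extrinsic value
    (expected reward, weighted by a preference) plus epistemic value
    (expected information gain). The agent pulls the highest-scoring arm."""
    return preference * a / (a + b) + info_gain(a, b)

# Two arms with the same expected reward (0.5): the poorly known arm
# earns a larger epistemic bonus, so the agent explores it first.
print(score(1, 1) > score(50, 50))  # True
```

This is the mechanism behind the "sophisticated strategy" mentioned in the abstract: ties in expected reward are broken in favour of informative arms, without any hand-tuned exploration schedule.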
Affiliation(s)
- Dimitrije Marković
- Faculty of Psychology, Technische Universität Dresden, 01062 Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, 01062 Dresden, Germany
- Hrvoje Stojić
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, United Kingdom; Secondmind, 72 Hills Rd, Cambridge, CB2 1LA, United Kingdom
- Sarah Schwöbel
- Faculty of Psychology, Technische Universität Dresden, 01062 Dresden, Germany
- Stefan J Kiebel
- Faculty of Psychology, Technische Universität Dresden, 01062 Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, 01062 Dresden, Germany
19. Parr T, Da Costa L, Heins C, Ramstead MJD, Friston KJ. Memory and Markov Blankets. Entropy (Basel) 2021; 23:1105. PMID: 34573730; PMCID: PMC8469145; DOI: 10.3390/e23091105.
Abstract
In theoretical biology, we are often interested in random dynamical systems, like the brain, that appear to model their environments. This can be formalized by appealing to the existence of a (possibly non-equilibrium) steady state, whose density preserves a conditional independence between a biological entity and its surroundings. From this perspective, the conditioning set, or Markov blanket, induces a form of vicarious synchrony between creature and world, as if one were modelling the other. However, this results in an apparent paradox. If all conditional dependencies between a system and its surroundings depend upon the blanket, how do we account for the mnemonic capacity of living systems? It might appear that any shared dependence upon past blanket states violates the independence condition, as the variables on either side of the blanket now share information not available from the current blanket state. This paper aims to resolve this paradox, and to demonstrate that conditional independence does not preclude memory. Our argument rests upon drawing a distinction between the dependencies implied by a steady state density, and the density dynamics of the system conditioned upon its configuration at a previous time. The interesting question then becomes: what determines the length of time required for a stochastic system to 'forget' its initial conditions? We explore this question for an example system, whose steady state density possesses a Markov blanket, through simple numerical analyses. We conclude with a discussion of the relevance for memory in cognitive systems like us.
Affiliation(s)
- Thomas Parr
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Lancelot Da Costa
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Department of Mathematics, Imperial College London, London SW7 2AZ, UK
- Conor Heins
- Department of Collective Behaviour, Max Planck Institute of Animal Behavior, D-78457 Konstanz, Germany
- Centre for the Advanced Study of Collective Behaviour, University of Konstanz, D-78457 Konstanz, Germany
- Department of Biology, University of Konstanz, D-78457 Konstanz, Germany
- Nested Minds Network, London EC4A 3TW, UK
- Maxwell James D. Ramstead
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Nested Minds Network, London EC4A 3TW, UK
- Spatial Web Foundation, Los Angeles, CA 90016, USA
- Division of Social and Transcultural Psychiatry, Department of Psychiatry, McGill University, Montreal, QC H3A 1A1, Canada
- Karl J. Friston
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
20.
Abstract
Active inference offers a first principle account of sentient behavior, from which special and important cases (for example, reinforcement learning, active learning, Bayes optimal inference, and Bayes optimal design) can be derived. Active inference finesses the exploitation-exploration dilemma in relation to prior preferences by placing information gain on the same footing as reward or value. In brief, active inference replaces value functions with functionals of (Bayesian) beliefs, in the form of an expected (variational) free energy. In this letter, we consider a sophisticated kind of active inference using a recursive form of expected free energy. Sophistication describes the degree to which an agent has beliefs about beliefs. We consider agents with beliefs about the counterfactual consequences of action for states of affairs and beliefs about those latent states. In other words, we move from simply considering beliefs about "what would happen if I did that" to "what I would believe about what would happen if I did that." The recursive form of the free energy functional effectively implements a deep tree search over actions and outcomes in the future. Crucially, this search is over sequences of belief states as opposed to states per se. We illustrate the competence of this scheme using numerical simulations of deep decision problems.
Affiliation(s)
- Karl Friston
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London WC1N 3AR, U.K.
- Lancelot Da Costa
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London WC1N 3AR, U.K., and Department of Mathematics, Imperial College London, U.K.
- Danijar Hafner
- Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada, and Google Research, Brain Team, Toronto, ON M5H 1S3, Canada
- Casper Hesp
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London WC1N 3AR, U.K., and Amsterdam Brain and Cognition Center, University of Amsterdam, Amsterdam 1001 NK, The Netherlands
- Thomas Parr
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London WC1N 3AR, U.K.
21. Champion T, Grześ M, Bowman H. Realizing Active Inference in Variational Message Passing: The Outcome-Blind Certainty Seeker. Neural Comput 2021; 33:2762-2826. PMID: 34280302; DOI: 10.1162/neco_a_01422.
Abstract
Active inference is a state-of-the-art framework in neuroscience that offers a unified theory of brain function. It is also proposed as a framework for planning in AI. Unfortunately, the complex mathematics required to create new models can impede application of active inference in neuroscience and AI research. This letter addresses this problem by providing a complete mathematical treatment of the active inference framework in discrete time and state spaces and the derivation of the update equations for any new model. We leverage the theoretical connection between active inference and variational message passing as described by John Winn and Christopher M. Bishop in 2005. Since variational message passing is a well-defined methodology for deriving Bayesian belief update equations, this letter opens the door to advanced generative models for active inference. We show that using a fully factorized variational distribution simplifies the expected free energy, which furnishes priors over policies so that agents seek unambiguous states. Finally, we consider future extensions that support deep tree searches for sequential policy optimization based on structure learning and belief propagation.
Affiliation(s)
- Marek Grześ
- University of Kent, School of Computing, Canterbury CT2 7NZ, U.K.
- Howard Bowman
- University of Birmingham, School of Psychology, Birmingham B15 2TT, U.K., and University of Kent, School of Computing, Canterbury CT2 7NZ, U.K.
22. van de Laar T, Wymeersch H, Şenöz İ, Özçelikkale A. Chance-Constrained Active Inference. Neural Comput 2021; 33:2710-2735. PMID: 34280254; DOI: 10.1162/neco_a_01427.
Abstract
Active inference (ActInf) is an emerging theory that explains perception and action in biological agents in terms of minimizing a free energy bound on Bayesian surprise. Goal-directed behavior is elicited by introducing prior beliefs on the underlying generative model. In contrast to prior beliefs, which constrain all realizations of a random variable, we propose an alternative approach through chance constraints, which allow for a (typically small) probability of constraint violation, and demonstrate how such constraints can be used as intrinsic drivers for goal-directed behavior in ActInf. We illustrate how chance-constrained ActInf weights all imposed (prior) constraints on the generative model, allowing, for example, for a trade-off between robust control and empirical chance constraint violation. Second, we interpret the proposed solution within a message passing framework. Interestingly, the message passing interpretation is not only relevant to the context of ActInf, but also provides a general-purpose approach that can account for chance constraints on graphical models. The chance constraint message updates can then be readily combined with other prederived message update rules without the need for custom derivations. The proposed chance-constrained message passing framework thus accelerates the search for workable models in general and can be used to complement message-passing formulations on generative neural models.
Affiliation(s)
- Thijs van de Laar
- Eindhoven University of Technology, 5612 AP, Eindhoven, The Netherlands
- Henk Wymeersch
- Chalmers University of Technology, 41296, Gothenburg, Sweden
- İsmail Şenöz
- Eindhoven University of Technology, 5612 AP, Eindhoven, The Netherlands
23. Ueltzhöffer K, Da Costa L, Friston KJ. Variational free energy, individual fitness, and population dynamics under acute stress: Comment on "Dynamic and thermodynamic models of adaptation" by Alexander N. Gorban et al. Phys Life Rev 2021; 37:111-115. PMID: 33901916; DOI: 10.1016/j.plrev.2021.04.005.
Affiliation(s)
- Kai Ueltzhöffer
- Wellcome Centre for Human Neuroimaging, University College London, UCL Queen Square Institute of Neurology, 12 Queen Square, London WC1N 3AR, United Kingdom; Department of General Psychiatry, Centre of Psychosocial Medicine, Heidelberg University Hospital, Voßstraße 2, 69115 Heidelberg, Germany
- Lancelot Da Costa
- Wellcome Centre for Human Neuroimaging, University College London, UCL Queen Square Institute of Neurology, 12 Queen Square, London WC1N 3AR, United Kingdom; Department of Mathematics, Imperial College London, London SW7 2AZ, United Kingdom
- Karl J Friston
- Wellcome Centre for Human Neuroimaging, University College London, UCL Queen Square Institute of Neurology, 12 Queen Square, London WC1N 3AR, United Kingdom
24. Ciria A, Schillaci G, Pezzulo G, Hafner VV, Lara B. Predictive Processing in Cognitive Robotics: A Review. Neural Comput 2021; 33:1402-1432. PMID: 34496394; DOI: 10.1162/neco_a_01383.
Abstract
Predictive processing has become an influential framework in cognitive sciences. This framework turns the traditional view of perception upside down, claiming that the main flow of information processing is realized in a top-down, hierarchical manner. Furthermore, it aims at unifying perception, cognition, and action as a single inferential process. However, in the related literature, the predictive processing framework and its associated schemes, such as predictive coding, active inference, perceptual inference, and free-energy principle, tend to be used interchangeably. In the field of cognitive robotics, there is no clear-cut distinction on which schemes have been implemented and under which assumptions. In this letter, working definitions are set with the main aim of analyzing the state of the art in cognitive robotics research working under the predictive processing framework as well as some related nonrobotic models. The analysis suggests that, first, research in both cognitive robotics implementations and nonrobotic models needs to be extended to the study of how multiple exteroceptive modalities can be integrated into prediction error minimization schemes. Second, a relevant distinction found here is that cognitive robotics implementations tend to emphasize the learning of a generative model, while in nonrobotics models, it is almost absent. Third, despite the relevance for active inference, few cognitive robotics implementations examine the issues around control and whether it should result from the substitution of inverse models with proprioceptive predictions. Finally, limited attention has been placed on precision weighting and the tracking of prediction error dynamics. These mechanisms should help to explore more complex behaviors and tasks in cognitive robotics research under the predictive processing framework.
Affiliation(s)
- Alejandra Ciria
- Facultad de Psicología, Universidad Nacional Autónoma de México, Mexico City, CP 04510, Mexico
- Guido Schillaci
- BioRobotics Institute, Scuola Superiore Sant'Anna, 56025 Pontedera, Italy
- Giovanni Pezzulo
- Institute of Cognitive Sciences and Technologies, National Research Council, 00185 Rome, Italy
- Verena V Hafner
- Adaptive Systems Group, Department of Computer Science, Humboldt-Universität zu Berlin, D-12489, Germany
- Bruno Lara
- Laboratorio de Robótica Cognitiva, Centro de Investigación en Ciencias, Universidad Autónoma del Estado de Morelos, Cuernavaca CP 62209, Mexico
25. Da Costa L, Parr T, Sengupta B, Friston K. Neural Dynamics under Active Inference: Plausibility and Efficiency of Information Processing. Entropy (Basel) 2021; 23:454. PMID: 33921298; PMCID: PMC8069154; DOI: 10.3390/e23040454.
Abstract
Active inference is a normative framework for explaining behaviour under the free energy principle, a theory of self-organisation originating in neuroscience. It specifies neuronal dynamics for state-estimation in terms of a descent on (variational) free energy, a measure of the fit between an internal (generative) model and sensory observations. The free energy gradient is a prediction error, plausibly encoded in the average membrane potentials of neuronal populations. Conversely, the expected probability of a state can be expressed in terms of neuronal firing rates. We show that this is consistent with current models of neuronal dynamics and establish face validity by synthesising plausible electrophysiological responses. We then show that these neuronal dynamics approximate natural gradient descent, a well-known optimisation algorithm from information geometry that follows the steepest descent of the objective in information space. We compare the information length of belief updating in both schemes, a measure of the distance travelled in information space that has a direct interpretation in terms of metabolic cost. We show that neural dynamics under active inference are metabolically efficient and suggest that neural representations in biological agents may evolve by approximating steepest descent in information space towards the point of optimal inference.
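Schematically, the neuronal dynamics described here couple a descent on variational free energy to firing rates through a softmax. The display below is a paraphrase of the standard discrete-state scheme in this literature, not a verbatim equation from the paper:

```latex
% Average membrane potentials v descend the free energy gradient
% (a prediction error); firing rates s are a softmax of the potentials:
\[
  \dot{v} = -\nabla_{s} F , \qquad s = \sigma(v)
\]
% Natural gradient descent preconditions the same gradient with the
% inverse Fisher information metric g of the belief space:
\[
  \dot{v} = -g^{-1}\,\nabla_{s} F
\]
```

The paper's claim is that the first scheme approximates the second, which is what links biological plausibility to metabolic efficiency.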
Affiliation(s)
- Lancelot Da Costa
- Department of Mathematics, Imperial College London, London SW7 2AZ, UK
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London WC1N 3BG, UK
- Thomas Parr
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London WC1N 3BG, UK
- Biswa Sengupta
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London WC1N 3BG, UK
- Core Machine Learning Group, Zebra AI, London WC2H 8TJ, UK
- Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Karl Friston
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London WC1N 3BG, UK
26.
Abstract
Climate change, biodiversity loss, and other major social and environmental problems pose severe risks. Progress has been inadequate and scientists, global policy experts, and the general public increasingly conclude that transformational change is needed across all sectors of society in order to improve and maintain social and ecological wellbeing. At least two paths to transformation are conceivable: (1) reform of and innovation within existing societal systems (e.g., economic, legal, and governance systems); and (2) the de novo development of and migration to new and improved societal systems. This paper is the final in a three-part series of concept papers that together outline a novel science-driven research and development program aimed at the second path. It summarizes literature to build a narrative on the topic of de novo design of societal systems. The purpose is to raise issues, suggest design possibilities, and highlight directions and questions that could be explored in the context of this or any R&D program aimed at new system design. This paper does not present original research, but rather provides a synthesis of selected ideas from the literature. Following other papers in the series, a society is viewed as a superorganism and its societal systems as a cognitive architecture. Accordingly, a central goal of design is to improve the collective cognitive capacity of a society, rendering it more capable of achieving and sustainably maintaining vitality. Topics of attention, communication, self-identity, power, and influence are discussed in relation to societal cognition and system design. A prototypical societal system is described, and some design considerations are highlighted.
27.
Abstract
The expected free energy (EFE) is a central quantity in the theory of active inference. It is the quantity that all active inference agents are mandated to minimize through action, and its decomposition into extrinsic and intrinsic value terms is key to the balance of exploration and exploitation that active inference agents evince. Despite its importance, the mathematical origins of this quantity and its relation to the variational free energy (VFE) remain unclear. In this letter, we investigate the origins of the EFE in detail and show that it is not simply "the free energy in the future." We present a functional that we argue is the natural extension of the VFE but actively discourages exploratory behavior, thus demonstrating that exploration does not directly follow from free energy minimization into the future. We then develop a novel objective, the free energy of the expected future (FEEF), which possesses both the epistemic component of the EFE and an intuitive mathematical grounding as the divergence between predicted and desired futures.
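For reference, the decomposition at issue is usually written as follows; this is the standard form from the active inference literature (with $\tilde{P}$ denoting the preference-encoding generative model), not an equation lifted from the paper itself:

```latex
% Expected free energy of a policy \pi, under the predictive
% posterior Q(o, s \mid \pi):
\[
  G(\pi) = \mathbb{E}_{Q(o, s \mid \pi)}
           \left[ \ln Q(s \mid \pi) - \ln \tilde{P}(o, s) \right]
\]
% Rearranged into extrinsic value (expected log preference) and
% epistemic value (expected information gain over hidden states):
\[
  G(\pi) \approx
  -\underbrace{\mathbb{E}_{Q(o \mid \pi)}\!\left[\ln \tilde{P}(o)\right]}_{\text{extrinsic value}}
  -\underbrace{\mathbb{E}_{Q(o \mid \pi)}\!\left[
      D_{\mathrm{KL}}\!\left[\,Q(s \mid o, \pi)\,\middle\|\,Q(s \mid \pi)\right]
    \right]}_{\text{epistemic value}}
\]
```

The paper's point is that this epistemic term is not inherited from a "free energy of the future"; its proposed FEEF objective keeps the same epistemic component while grounding the extrinsic part as a divergence between predicted and desired futures.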
Affiliation(s)
- Beren Millidge
- School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, U.K.
- Alexander Tschantz
- Sackler Center for Consciousness Science, School of Engineering and Informatics, University of Sussex, Falmer, Brighton, BN1 9RH, U.K.
- Christopher L Buckley
- Evolutionary and Adaptive Systems Research Group, School of Engineering and Informatics, University of Sussex, Falmer, Brighton, BN1 9RH, U.K.
28.
Abstract
Active inference is a first principle account of how autonomous agents operate in dynamic, nonstationary environments. This problem is also considered in reinforcement learning, but limited work exists on comparing the two approaches on the same discrete-state environments. In this letter, we provide (1) an accessible overview of the discrete-state formulation of active inference, highlighting natural behaviors in active inference that are generally engineered in reinforcement learning, and (2) an explicit discrete-state comparison between active inference and reinforcement learning on an OpenAI Gym baseline. We begin by providing a condensed overview of the active inference literature, in particular viewing the various natural behaviors of active inference agents through the lens of reinforcement learning. We show that by operating in a pure belief-based setting, active inference agents can carry out epistemic exploration, and account for uncertainty about their environment, in a Bayes-optimal fashion. Furthermore, we show that the reliance on an explicit reward signal in reinforcement learning is removed in active inference, where reward can simply be treated as another observation we have a preference over; even in the total absence of rewards, agent behaviors are learned through preference learning. We make these properties explicit by showing two scenarios in which active inference agents can infer behaviors in reward-free environments compared to both Q-learning and Bayesian model-based reinforcement learning agents, and by placing zero prior preferences over rewards and learning the prior preferences over the observations corresponding to reward. We conclude by noting that this formalism can be applied to more complex settings (e.g., robotic arm movement, Atari games) if appropriate generative models can be formulated.
In short, we aim to demystify the behavior of active inference agents by presenting an accessible discrete state-space and time formulation and demonstrate these behaviors in an OpenAI Gym environment, alongside reinforcement learning agents.
Affiliation(s)
- Noor Sajid
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, U.K.
- Philip J Ball
- Machine Learning Research Group, Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, U.K.
- Thomas Parr
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, U.K.
- Karl J Friston
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, U.K.
29. Çatal O, Wauthier S, De Boom C, Verbelen T, Dhoedt B. Learning Generative State Space Models for Active Inference. Front Comput Neurosci 2020; 14:574372. PMID: 33304260; PMCID: PMC7701292; DOI: 10.3389/fncom.2020.574372.
Abstract
In this paper we investigate the active inference framework as a means to enable autonomous behavior in artificial agents. Active inference is a theoretical framework underpinning the way organisms act and observe in the real world. In active inference, agents act in order to minimize their so-called free energy, or prediction error. Besides being biologically plausible, active inference has been shown to solve hard exploration problems in various simulated environments. However, these simulations typically require handcrafting a generative model for the agent. Therefore we propose to use recent advances in deep artificial neural networks to learn generative state space models from scratch, using only observation-action sequences. This way we are able to scale active inference to new and challenging problem domains, whilst still building on the theoretical backing of the free energy principle. We validate our approach on the mountain car problem to illustrate that our learnt models can indeed trade off instrumental value and ambiguity. Furthermore, we show that generative models can also be learnt using high-dimensional pixel observations, both in the OpenAI Gym car racing environment and a real-world robotic navigation task. Finally we show that active inference based policies are an order of magnitude more sample efficient than Deep Q Networks on RL tasks.
Affiliation(s)
- Ozan Çatal
- IDLab, Department of Information Technology, Ghent University - imec, Ghent, Belgium
30. van de Laar TW, de Vries B. Simulating Active Inference Processes by Message Passing. Front Robot AI 2019; 6:20. PMID: 33501036; PMCID: PMC7805795; DOI: 10.3389/frobt.2019.00020.
Abstract
The free energy principle (FEP) offers a variational calculus-based description for how biological agents persevere through interactions with their environment. Active inference (AI) is a corollary of the FEP, which states that biological agents act to fulfill prior beliefs about preferred future observations (target priors). Purposeful behavior then results from variational free energy minimization with respect to a generative model of the environment with included target priors. However, manual derivations for free energy minimizing algorithms on custom dynamic models can become tedious and error-prone. While probabilistic programming (PP) techniques enable automatic derivation of inference algorithms on free-form models, full automation of AI requires specialized tools for inference on dynamic models, together with the description of an experimental protocol that governs the interaction between the agent and its simulated environment. The contributions of the present paper are twofold. Firstly, we illustrate how AI can be automated with the use of ForneyLab, a recent PP toolbox that specializes in variational inference on flexibly definable dynamic models. More specifically, we describe AI agents in a dynamic environment as probabilistic state space models (SSMs) and perform inference for perception and control in these agents by message passing on a factor graph representation of the SSM. Secondly, we propose a formal experimental protocol for simulated AI. We exemplify how this protocol leads to goal-directed behavior for flexibly definable AI agents in two classical RL examples, namely the Bayesian thermostat and the mountain car parking problems.
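The quantity that these message-passing schemes minimize is the variational free energy of a generative model. As a sketch of what the passing of messages ultimately computes, here is that free energy evaluated directly for a single discrete latent variable (not ForneyLab's factor-graph machinery, and the distributions below are invented):

```python
import numpy as np

def free_energy(q, prior, likelihood_o):
    """Variational free energy F[q] = E_q[log q(s) - log p(o, s)]
    for a discrete latent state s and one fixed observation o.

    q            : variational belief over states, shape (S,)
    prior        : p(s), shape (S,)
    likelihood_o : p(o|s) evaluated at the observed o, shape (S,)
    """
    log_joint = np.log(prior * likelihood_o + 1e-16)
    return float(np.sum(q * (np.log(q + 1e-16) - log_joint)))

# Hypothetical two-state model: uniform prior, observation twice as
# likely under state 0. The exact posterior is [0.8, 0.2].
prior = np.array([0.5, 0.5])
lik = np.array([0.8, 0.2])
posterior = prior * lik / np.sum(prior * lik)
```

At the exact posterior, F attains its minimum, the surprise -log p(o) (here -log 0.5); any other belief q, such as the uniform one, yields a strictly larger F. Minimizing F over q is what "perception" amounts to in this framework.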
Affiliation(s)
- Thijs W. van de Laar
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Bert de Vries
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- GN Hearing Benelux BV, Eindhoven, Netherlands
|