1
|
Kargar E, Kyrki V. MACRPO: Multi-agent cooperative recurrent policy optimization. Front Robot AI 2024; 11:1394209. [PMID: 39760046 PMCID: PMC11695781 DOI: 10.3389/frobt.2024.1394209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 11/25/2024] [Indexed: 01/07/2025] Open
Abstract
This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO: First, we use a recurrent layer in the critic's network architecture and propose a new framework to use the proposed meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions by controlling the level of cooperation between agents using a parameter. The use of this control parameter is suitable for environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces, Deepdrive-Zero, Multi-Walker, and Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, and also single-agent methods with shared parameters between agents such as IMPALA and APEX. The results show superior performance against other algorithms. The code is available online at https://github.com/kargarisaac/macrpo.
Affiliation(s)
- Eshagh Kargar
  - Intelligent Robotics Group, Electrical Engineering and Automation Department, Aalto University, Helsinki, Finland
|
2
|
Alon N, Schulz L, Bell V, Moutoussis M, Dayan P, Barnby JM. (Mal)adaptive Mentalizing in the Cognitive Hierarchy, and Its Link to Paranoia. COMPUTATIONAL PSYCHIATRY (CAMBRIDGE, MASS.) 2024; 8:159-177. [PMID: 39280241 PMCID: PMC11396085 DOI: 10.5334/cpsy.117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 08/29/2024] [Indexed: 09/18/2024]
Abstract
Humans need to be on their toes when interacting with competitive others to avoid being taken advantage of. Too much caution out of context can, however, be detrimental and produce false beliefs of intended harm. Here, we offer a formal account of this phenomenon through the lens of Theory of Mind. We simulate agents of different depths of mentalizing within a simple game theoretic paradigm and show how, if aligned well, deep recursive mentalization gives rise to both successful deception as well as reasonable skepticism. However, we also show that if a self is mentalizing too deeply - hyper-mentalizing - false beliefs arise that a partner is trying to trick them maliciously, resulting in a material loss to the self. Importantly, we show that this is only true when hypermentalizing agents believe observed actions are generated intentionally. This theory offers a potential cognitive mechanism for suspiciousness, paranoia, and conspiratorial ideation. Rather than a deficit in Theory of Mind, paranoia may arise from the application of overly strategic thinking to ingenuous behaviour.
Author Summary
Interacting competitively requires vigilance to avoid deception. However, excessive caution can have adverse effects, stemming from false beliefs of intentional harm. So far there is no formal cognitive account of what may cause this suspiciousness. Here we present an examination of this phenomenon through the lens of Theory of Mind - the cognitive ability to consider the beliefs, intentions, and desires of others. By simulating interacting computer agents we illustrate how well-aligned agents can give rise to successful deception and justified skepticism. Crucially, we also reveal that overly cautious agents develop false beliefs that an ingenuous partner is attempting malicious trickery, leading to tangible losses. As well as formally defining a plausible mechanism for suspiciousness, paranoia, and conspiratorial thinking, our theory indicates that rather than a deficit in Theory of Mind, paranoia may involve an over-application of strategy to genuine behaviour.
Affiliation(s)
- Nitay Alon
  - Department of Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Lion Schulz
  - Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Vaughan Bell
  - Clinical, Educational, and Health Psychology, University College London, United Kingdom
- Michael Moutoussis
  - Department of Imaging Neuroscience, University College London, London, United Kingdom
- Peter Dayan
  - Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - Department of Computer Science, University of Tübingen, Tübingen, Germany
- Joseph M Barnby
  - Department of Psychology, Royal Holloway University of London, London, United Kingdom
  - School of Psychiatry and Clinical Neuroscience, The University of Western Australia, Australia
|
3
|
Reiter AMF, Hula A, Vanes L, Hauser TU, Kokorikou D, Goodyer IM, Fonagy P, Moutoussis M, Dolan RJ. Self-reported childhood family adversity is linked to an attenuated gain of trust during adolescence. Nat Commun 2023; 14:6920. [PMID: 37903767 PMCID: PMC10616102 DOI: 10.1038/s41467-023-41531-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 09/07/2023] [Indexed: 11/01/2023] Open
Abstract
A longstanding proposal in developmental research is that childhood family experiences provide a template that shapes a capacity for trust-based social relationships. We leveraged longitudinal data from a cohort of healthy adolescents (n = 570, aged 14-25), which included decision-making and psychometric data, to characterise normative developmental trajectories of trust behaviour and inter-individual differences therein. Extending on previous cross-sectional findings from the same cohort, we show that a task-based measure of trust increases longitudinally from adolescence into young adulthood. Computational modelling suggests this is due to a decrease in social risk aversion. Self-reported family adversity attenuates this developmental gain in trust behaviour, and within our computational model, this relates to a higher 'irritability' parameter in those reporting greater adversity. Unconditional trust at measurement time point T1 predicts the longitudinal trajectory of self-reported peer relation quality, particularly so for those with higher family adversity, consistent with trust acting as a resilience factor.
Affiliation(s)
- Andrea M F Reiter
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
  - Department of Child and Adolescence Psychiatry, Psychosomatics and Psychotherapy, Centre of Mental Health, University Hospital Würzburg, Würzburg, Germany
  - Department of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
  - CRC Cognitive Control, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Andreas Hula
  - Austrian Institute of Technology, Vienna, Austria
- Lucy Vanes
  - Department of Neuroimaging, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Tobias U Hauser
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
  - Department of Psychiatry and Psychotherapy, Medical School and University Hospital, Eberhard Karls University of Tübingen, Tübingen, Germany
  - German Center for Mental Health (DZPG), Tübingen, Germany
- Danae Kokorikou
  - Department of Clinical, Educational and Health Psychology, University College London, London, UK
- Ian M Goodyer
  - Department of Psychiatry, University of Cambridge, Cambridge, UK
- Peter Fonagy
  - Department of Clinical, Educational and Health Psychology, University College London, London, UK
- Michael Moutoussis
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
- Raymond J Dolan
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
|
4
|
Alon N, Schulz L, Rosenschein JS, Dayan P. A (Dis-)information Theory of Revealed and Unrevealed Preferences: Emerging Deception and Skepticism via Theory of Mind. Open Mind (Camb) 2023; 7:608-624. [PMID: 37840764 PMCID: PMC10575559 DOI: 10.1162/opmi_a_00097] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/21/2023] [Accepted: 07/19/2023] [Indexed: 10/17/2023] Open
Abstract
In complex situations involving communication, agents might attempt to mask their intentions, exploiting Shannon's theory of information as a theory of misinformation. Here, we introduce and analyze a simple multiagent reinforcement learning task where a buyer sends signals to a seller via its actions, and in which both agents are endowed with a recursive theory of mind. We show that this theory of mind, coupled with pure reward-maximization, gives rise to agents that selectively distort messages and become skeptical towards one another. Using information theory to analyze these interactions, we show how savvy buyers reduce mutual information between their preferences and actions, and how suspicious sellers learn to reinterpret or discard buyers' signals in a strategic manner.
Affiliation(s)
- Nitay Alon
  - Department of Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Lion Schulz
  - Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Peter Dayan
  - Department of Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Department of Computer Science, University of Tübingen, Tübingen, Germany
|
5
|
Pan Y, Zhang H, Zeng Y, Ma B, Tang J, Ming Z. Diversifying agent's behaviors in interactive decision models. INT J INTELL SYST 2022. [DOI: 10.1002/int.23075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Yinghui Pan
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Hanyi Zhang
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Yifeng Zeng
  - Department of Computer and Information Sciences, Northumbria University, Newcastle-upon-Tyne, UK
- Biyang Ma
  - Department of Computer Sciences, Minnan Normal University, Zhangzhou, China
- Jing Tang
  - Newcastle Business School, Northumbria University, Newcastle-upon-Tyne, UK
- Zhong Ming
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
|
6
|
Steixner-Kumar S, Rusch T, Doshi P, Spezio M, Gläscher J. Humans depart from optimal computational models of interactive decision-making during competition under partial information. Sci Rep 2022; 12:289. [PMID: 34997138 PMCID: PMC8741801 DOI: 10.1038/s41598-021-04272-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2021] [Accepted: 12/14/2021] [Indexed: 11/10/2022] Open
Abstract
Decision making under uncertainty in multiagent settings is of increasing interest in decision science. The degree to which human agents depart from computationally optimal solutions in socially interactive settings is generally unknown. Such understanding provides insight into how social contexts affect human interaction and the underlying contributions of Theory of Mind. In this paper, we adapt the well-known ‘Tiger Problem’ from artificial-agent research to human participants in solo and interactive settings. Compared to computationally optimal solutions, participants gathered less information before outcome-related decisions when competing than when cooperating with others. These departures from optimality were not haphazard but showed evidence of improved performance through learning. Costly errors emerged under conditions of competition, yielding lower rates of rewarding actions and lower accuracy in predicting others. Taken together, this work provides a novel approach and insights into studying human social interaction when shared information is partial.
Affiliation(s)
- Saurabh Steixner-Kumar
  - Institute of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Tessa Rusch
  - Institute of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
- Prashant Doshi
  - Department of Computer Science, University of Georgia, Athens, GA, USA
- Michael Spezio
  - Institute of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Psychology, Neuroscience, and Data Science, Scripps College, Claremont, CA, USA
- Jan Gläscher
  - Institute of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
|
7
|
Pan Y, Ma B, Tang J, Zeng Y. Behavioral model summarisation for other agents under uncertainty. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.09.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
|
8
|
Na S, Chung D, Hula A, Perl O, Jung J, Heflin M, Blackmore S, Fiore VG, Dayan P, Gu X. Humans use forward thinking to exploit social controllability. eLife 2021; 10:64983. [PMID: 34711304 PMCID: PMC8555988 DOI: 10.7554/elife.64983] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/18/2020] [Accepted: 09/30/2021] [Indexed: 12/27/2022] Open
Abstract
The controllability of our social environment has a profound impact on our behavior and mental health. Nevertheless, neurocomputational mechanisms underlying social controllability remain elusive. Here, 48 participants performed a task where their current choices either did (Controllable), or did not (Uncontrollable), influence partners’ future proposals. Computational modeling revealed that people engaged a mental model of forward thinking (FT; i.e., calculating the downstream effects of current actions) to estimate social controllability in both Controllable and Uncontrollable conditions. A large-scale online replication study (n=1342) supported this finding. Using functional magnetic resonance imaging (n=48), we further demonstrated that the ventromedial prefrontal cortex (vmPFC) computed the projected total values of current actions during forward planning, supporting the neural realization of the forward-thinking model. These findings demonstrate that humans use vmPFC-dependent FT to estimate and exploit social controllability, expanding the role of this neurocomputational mechanism beyond spatial and cognitive contexts.
Affiliation(s)
- Soojung Na
  - The Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, United States
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, United States
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
- Dongil Chung
  - Department of Biomedical Engineering, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
- Andreas Hula
  - Austrian Institute of Technology, Seibersdorf, Austria
- Ofer Perl
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
- Jennifer Jung
  - School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, United States
- Matthew Heflin
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
- Sylvia Blackmore
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
  - Queen Square Institute of Neurology, University College London, London, United Kingdom
- Vincenzo G Fiore
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
- Peter Dayan
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - University of Tübingen, Tübingen, Germany
- Xiaosi Gu
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, United States
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, United States
|
9
|
Multi-Agent Distributed Deep Deterministic Policy Gradient for Partially Observable Tracking. ACTUATORS 2021. [DOI: 10.3390/act10100268] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
In many existing multi-agent reinforcement learning tasks, each agent observes all the other agents from its own perspective. In addition, the training process is centralized, namely the critic of each agent can access the policies of all the agents. This scheme has certain limitations since every single agent can only obtain the information of its neighbor agents due to the communication range in practical applications. Therefore, in this paper, a multi-agent distributed deep deterministic policy gradient (MAD3PG) approach is presented with decentralized actors and distributed critics to realize multi-agent distributed tracking. The distinguishing feature of the proposed framework is that we adopted the multi-agent distributed training with decentralized execution, where each critic only takes the agent’s and the neighbor agents’ policies into account. Experiments were conducted in the distributed tracking tasks based on multi-agent particle environments where N agents (N = 3 or N = 5) track a target agent with partial observation. The results showed that the proposed method achieves a higher reward with a shorter training time compared to other methods, including MADDPG, DDPG, PPO, and DQN. The proposed novel method leads to more efficient and effective multi-agent tracking.
|
10
|
Hula A, Moutoussis M, Will GJ, Kokorikou D, Reiter AM, Ziegler G, Bullmore ED, Jones PB, Goodyer I, Fonagy P, Montague PR, Dolan RJ. Multi-Round Trust Game Quantifies Inter-Individual Differences in Social Exchange from Adolescence to Adulthood. COMPUTATIONAL PSYCHIATRY (CAMBRIDGE, MASS.) 2021; 5:102-118. [PMID: 35656356 PMCID: PMC7612797 DOI: 10.5334/cpsy.65] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/20/2022]
Abstract
Investing in strangers in a socio-economic exchange is risky, as we may be uncertain whether they will reciprocate. Nevertheless, the potential rewards for cooperating can be great. Here, we used a cross-sectional sample (n = 784) to study how the challenges of cooperation versus defection are negotiated across an important period of the lifespan: from adolescence to young adulthood (ages 14 to 25). We quantified social behaviour using a multi-round investor-trustee task, phenotyping individuals using a validated model whose parameters characterise patterns of real exchange and constitute latent social characteristics. We found highly significant differences in investment behaviour according to age, sex, socio-economic status and IQ. Consistent with the literature, we showed an overall trend towards higher trust from adolescence to young adulthood but, in a novel finding, we characterized key cognitive mechanisms explaining this, especially regarding socio-economic risk aversion. Males showed lower risk aversion, associated with greater investments. We also found that inequality aversion was higher in females and, in a novel relation, that socio-economic deprivation was associated with more risk-averse play.
Affiliation(s)
- Andreas Hula
  - Austrian Institute of Technology, Vienna, Austria
- Michael Moutoussis
  - Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Geert-Jan Will
  - Institute of Psychology, Leiden University, Leiden, the Netherlands
- Andrea M Reiter
  - Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
  - Lifespan Developmental Neuroscience, Faculty of Psychology, Technische Universität Dresden, Germany
  - Department of Neurology, Max-Planck-Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Gabriel Ziegler
  - Centre for Cognitive Neurology and Dementia Research, Magdeburg, Germany
  - German Center for Neurodegenerative Diseases, Magdeburg, Germany
- E D Bullmore
  - Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  - Cambridgeshire and Peterborough National Health Service Foundation Trust, Cambridge, United Kingdom
  - Medical Research Council/Wellcome Trust Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, United Kingdom
  - Max Planck University College London Centre for Computational Psychiatry, London, United Kingdom
- Peter B Jones
  - Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  - Cambridgeshire and Peterborough National Health Service Foundation Trust, Cambridge, United Kingdom
- Ian Goodyer
  - Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  - Cambridgeshire and Peterborough National Health Service Foundation Trust, Cambridge, United Kingdom
- Peter Fonagy
  - Anna Freud Centre, London, United Kingdom
  - Research Department of Clinical, Educational and Health Psychology, University College London, United Kingdom
- P Read Montague
  - Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
  - Human Neuroimaging Laboratory, Virginia Tech Carilion Research Institute, Roanoke, Virginia, United States of America
  - Department of Physics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Raymond J Dolan
  - Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
  - Max Planck University College London Centre for Computational Psychiatry, London, United Kingdom
|
11
|
Nakahashi R, Yamada S. Balancing Performance and Human Autonomy With Implicit Guidance Agent. Front Artif Intell 2021; 4:736321. [PMID: 34622202 PMCID: PMC8490733 DOI: 10.3389/frai.2021.736321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2021] [Accepted: 08/31/2021] [Indexed: 11/13/2022] Open
Abstract
The human-agent team, a problem setting in which humans and autonomous agents collaborate to achieve one task, is typical of human-AI collaboration. For effective collaboration, humans want to have an effective plan, but in realistic situations, they might have difficulty calculating the best plan due to cognitive limitations. In this case, guidance from an agent that has many computational resources may be useful. However, if an agent guides the human behavior explicitly, the human may feel that they have lost autonomy and are being controlled by the agent. We therefore investigated implicit guidance offered by means of an agent's behavior. With this type of guidance, the agent acts in a way that makes it easy for the human to find an effective plan for a collaborative task, and the human can then improve the plan. Since the human improves their plan voluntarily, they maintain autonomy. We modeled a collaborative agent with implicit guidance by integrating the Bayesian Theory of Mind into existing collaborative-planning algorithms and demonstrated through a behavioral experiment that implicit guidance is effective for enabling humans to maintain a balance between improving their plans and retaining autonomy.
Affiliation(s)
- Ryo Nakahashi
  - Department of Informatics, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies (SOKENDAI), Chiyoda, Japan
- Seiji Yamada
  - Department of Informatics, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies (SOKENDAI), Chiyoda, Japan
  - Digital Content and Media Sciences Research Division, National Institute of Informatics, Chiyoda, Japan
|
12
|
Toward data-driven solutions to interactive dynamic influence diagrams. Knowl Inf Syst 2021. [DOI: 10.1007/s10115-021-01600-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
With the availability of a significant amount of data, data-driven decision making becomes an alternative way of solving complex multiagent decision problems. Instead of using domain knowledge to explicitly build decision models, the data-driven approach learns decisions (probably optimal ones) from available data. This removes the knowledge bottleneck in traditional knowledge-driven decision making, which requires strong support from domain experts. In this paper, we study data-driven decision making in the context of interactive dynamic influence diagrams (I-DIDs), a general framework for multiagent sequential decision making under uncertainty. We propose a data-driven framework to solve the I-DID model and focus on learning the behavior of other agents in problem domains. The challenge lies in learning, from limited data, a complete policy tree that will be embedded in the I-DID model. We propose two new methods to develop complete policy trees for the other agents in the I-DIDs. The first method uses a simple clustering process, while the second one employs sophisticated statistical checks. We analyze the proposed algorithms theoretically and evaluate them experimentally on two problem domains.
|
13
|
Arora S, Doshi P. A survey of inverse reinforcement learning: Challenges, methods and progress. ARTIF INTELL 2021. [DOI: 10.1016/j.artint.2021.103500] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 10/21/2022]
|
14
|
Ma B, Tang J, Chen B, Pan Y, Zeng Y. Tensor optimization with group lasso for multi-agent predictive state representation. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106893] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
|
15
|
Neumeyer C, Oliehoek FA, Gavrila DM. General-Sum Multi-Agent Continuous Inverse Optimal Control. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3060411] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
|
16
|
Abstract
The central theme of this review is the dynamic interaction between information selection and learning. We pose a fundamental question about this interaction: How do we learn what features of our experiences are worth learning about? In humans, this process depends on attention and memory, two cognitive functions that together constrain representations of the world to features that are relevant for goal attainment. Recent evidence suggests that the representations shaped by attention and memory are themselves inferred from experience with each task. We review this evidence and place it in the context of work that has explicitly characterized representation learning as statistical inference. We discuss how inference can be scaled to real-world decisions by approximating beliefs based on a small number of experiences. Finally, we highlight some implications of this inference process for human decision-making in social environments.
Affiliation(s)
- Angela Radulescu
  - Department of Psychology, Princeton University, Princeton, New Jersey 08544, USA
  - Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey 08544, USA
- Yeon Soon Shin
  - Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey 08544, USA
- Yael Niv
  - Department of Psychology, Princeton University, Princeton, New Jersey 08544, USA
  - Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey 08544, USA
|
17
|
Ceren R, He K, Doshi P, Banerjee B. PALO bounds for reinforcement learning in partially observable stochastic games. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.08.054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
|
18
|
Du W, Ding S. A survey on multi-agent deep reinforcement learning: from the perspective of challenges and applications. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09938-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 10/22/2022]
|
19
|
Abstract
PURPOSE OF REVIEW: To assess the state-of-the-art in research on trust in robots and to examine if recent methodological advances can aid in the development of trustworthy robots.
RECENT FINDINGS: While traditional work in trustworthy robotics has focused on studying the antecedents and consequences of trust in robots, recent work has gravitated towards the development of strategies for robots to actively gain, calibrate, and maintain the human user's trust. Among these works, there is emphasis on endowing robotic agents with reasoning capabilities (e.g., via probabilistic modeling).
SUMMARY: The state-of-the-art in trust research provides roboticists with a large trove of tools to develop trustworthy robots. However, challenges remain when it comes to trust in real-world human-robot interaction (HRI) settings: there exist outstanding issues in trust measurement, guarantees on robot behavior (e.g., with respect to user privacy), and handling rich multidimensional data. We examine how recent advances in psychometrics, trustworthy systems, robot-ethics, and deep learning can provide resolution to each of these issues. In conclusion, we are of the opinion that these methodological advances could pave the way for the creation of truly autonomous, trustworthy social robots.
Affiliation(s)
- Bing Cai Kok
- Dept. of Computer Science, School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 119077 Singapore
| | - Harold Soh
- Dept. of Computer Science, School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 119077 Singapore
|
20
|
Rusch T, Steixner-Kumar S, Doshi P, Spezio M, Gläscher J. Theory of mind and decision science: Towards a typology of tasks and computational models. Neuropsychologia 2020; 146:107488. [PMID: 32407906 DOI: 10.1016/j.neuropsychologia.2020.107488] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/30/2019] [Revised: 04/27/2020] [Accepted: 05/04/2020] [Indexed: 01/27/2023]
Abstract
The ability to form a Theory of Mind (ToM), i.e., to theorize about others' mental states to explain and predict behavior in relation to attributed intentional states, constitutes a hallmark of human cognition. These abilities are multi-faceted and include a variety of different cognitive sub-functions. Here, we focus on decision processes in social contexts and review a number of experimental and computational modeling approaches in this field. We provide an overview of experimental accounts and formal computational models with respect to two dimensions: interactivity and uncertainty. Thereby, we aim at capturing the nuances of ToM functions in the context of social decision processes. We suggest there to be an increase in ToM engagement and multiplexing as social cognitive decision-making tasks become more interactive and uncertain. We propose that representing others as intentional and goal directed agents who perform consequential actions is elicited only at the edges of these two dimensions. Further, we argue that computational models of valuation and beliefs follow these dimensions to best allow researchers to effectively model sophisticated ToM-processes. Finally, we relate this typology to neuroimaging findings in neurotypical (NT) humans, studies of persons with autism spectrum (AS), and studies of nonhuman primates.
Affiliation(s)
- Tessa Rusch
- Institute of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20251, Hamburg, Germany; Division of the Humanities and Social Sciences, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA, 91125, USA.
| | - Saurabh Steixner-Kumar
- Institute of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20251, Hamburg, Germany
| | - Prashant Doshi
- Department of Computer Science, University of Georgia, 539 Boyd GSRC, Athens, GA, 30602, USA
| | - Michael Spezio
- Institute of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20251, Hamburg, Germany; Psychology, Neuroscience, and Data Science, Scripps College, 1030 N Columbia Ave, Claremont, CA, 91711, USA.
| | - Jan Gläscher
- Institute of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20251, Hamburg, Germany
|
21
|
Hayashi A, Ruiken D, Hasegawa T, Goerick C. Reasoning about uncertain parameters and agent behaviors through encoded experiences and belief planning. Artif Intell 2020. [DOI: 10.1016/j.artint.2019.103228] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
|
22
|
Bard N, Foerster JN, Chandar S, Burch N, Lanctot M, Song HF, Parisotto E, Dumoulin V, Moitra S, Hughes E, Dunning I, Mourad S, Larochelle H, Bellemare MG, Bowling M. The Hanabi challenge: A new frontier for AI research. Artif Intell 2020. [DOI: 10.1016/j.artint.2019.103216] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 11/29/2022]
|
23
|
Doshi P, Gmytrasiewicz P, Durfee E. Recursively modeling other agents for decision making: A research perspective. Artif Intell 2020. [DOI: 10.1016/j.artint.2019.103202] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/25/2022]
|
24
|
Cognitive bots and algorithmic humans: toward a shared understanding of social intelligence. Curr Opin Behav Sci 2019. [DOI: 10.1016/j.cobeha.2019.04.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
25
|
Chinchali SP, Livingston SC, Chen M, Pavone M. Multi-objective optimal control for proactive decision making with temporal logic models. Int J Rob Res 2019. [DOI: 10.1177/0278364919868290] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
The operation of today’s robots entails interactions with humans, e.g., in autonomous driving amidst human-driven vehicles. To effectively do so, robots must proactively decode the intent of humans and concurrently leverage this knowledge for safe, cooperative task satisfaction: a problem we refer to as proactive decision making. However, simultaneous intent decoding and robotic control requires reasoning over several possible human behavioral models, resulting in high-dimensional state trajectories. In this paper, we address the proactive decision-making problem using a novel combination of formal methods, control, and data mining techniques. First, we distill high-dimensional state trajectories of human–robot interaction into concise, symbolic behavioral summaries that can be learned from data. Second, we leverage formal methods to model high-level agent goals, safe interaction, and information-seeking behavior with temporal logic formulas. Finally, we design a novel decision-making scheme that maintains a belief distribution over models of human behavior, and proactively plans informative actions. After showing several desirable theoretical properties, we apply our framework to a dataset of humans driving in crowded merging scenarios. For it, temporal logic models are generated and used to synthesize control strategies using tree-based value iteration and deep reinforcement learning. In addition, we illustrate how data-driven models of human responses to informative robot probes, such as from generative models such as conditional variational autoencoders, can be clustered with formal specifications. Results from simulated self-driving car scenarios demonstrate that data-driven strategies enable safe interaction, correct model identification, and significant dimensionality reduction.
Affiliation(s)
- Mo Chen
- Simon Fraser University, Burnaby, BC, Canada
|
26
|
Luo Y, Hétu S, Lohrenz T, Hula A, Dayan P, Ramey SL, Sonnier-Netto L, Lisinski J, LaConte S, Nolte T, Fonagy P, Rahmani E, Montague PR, Ramey C. Early childhood investment impacts social decision-making four decades later. Nat Commun 2018; 9:4705. [PMID: 30459305 PMCID: PMC6246600 DOI: 10.1038/s41467-018-07138-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 05/22/2018] [Accepted: 10/11/2018] [Indexed: 11/09/2022] Open
Abstract
Early childhood educational investment produces positive effects on cognitive and non-cognitive skills, health, and socio-economic success. However, the effects of such interventions on social decision-making later in life are unknown. We recalled participants from one of the oldest randomized controlled studies of early childhood investment, the Abecedarian Project (ABC), to participate in well-validated interactive economic games that probe social norm enforcement and planning. We show that in a repeated-play ultimatum game, ABC participants who received high-quality early interventions strongly reject unequal division of money across players (disadvantageous or advantageous) even at significant cost to themselves. Using a multi-round trust game and computational modeling of social exchange, we show that the same intervention participants also plan further into the future. These findings suggest that high-quality early childhood investment can result in long-term changes in social decision-making and promote social norm enforcement in order to reap future benefits.
Affiliation(s)
- Yi Luo
- Virginia Tech Carilion Research Institute, Roanoke, VA, 24016, USA
| | - Sébastien Hétu
- Virginia Tech Carilion Research Institute, Roanoke, VA, 24016, USA; Université de Montréal, Montreal, QC, H3C 3J7, Canada
| | - Terry Lohrenz
- Virginia Tech Carilion Research Institute, Roanoke, VA, 24016, USA
| | - Andreas Hula
- Austrian Institute of Technology, 1210, Vienna, Austria
| | - Peter Dayan
- Wellcome Trust Centre for Neuroimaging, University College London, 12 Queen Square, London, WC1E 6BT, UK; Gatsby Computational Neuroscience Unit, University College London, London, WC1E 6BT, UK
| | - Stephen LaConte
- Virginia Tech Carilion Research Institute, Roanoke, VA, 24016, USA
| | - Tobias Nolte
- Wellcome Trust Centre for Neuroimaging, University College London, 12 Queen Square, London, WC1E 6BT, UK; Anna Freud National Centre for Children and Families, 21 Maresfield Gardens, London, NW3 5SD, UK
| | - Peter Fonagy
- Anna Freud National Centre for Children and Families, 21 Maresfield Gardens, London, NW3 5SD, UK; Research Department of Clinical, Educational and Health Psychology, University College London, Gower Street, London, WC1E 6BT, UK
| | - Elham Rahmani
- Psychiatry Department, Virginia Tech Carilion School of Medicine, Roanoke, VA, 24016, USA
| | - P Read Montague
- Virginia Tech Carilion Research Institute, Roanoke, VA, 24016, USA; Wellcome Trust Centre for Neuroimaging, University College London, 12 Queen Square, London, WC1E 6BT, UK
| | - Craig Ramey
- Virginia Tech Carilion Research Institute, Roanoke, VA, 24016, USA
|
27
|
Hoey J, Schröder T, Morgan J, Rogers KB, Rishi D, Nagappan M. Artificial Intelligence and Social Simulation: Studying Group Dynamics on a Massive Scale. Small Group Research 2018. [DOI: 10.1177/1046496418802362] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/17/2022]
Abstract
Recent advances in artificial intelligence and computer science can be used by social scientists in their study of groups and teams. Here, we explain how developments in machine learning and simulations with artificially intelligent agents can help group and team scholars to overcome two major problems they face when studying group dynamics. First, because empirical research on groups relies on manual coding, it is hard to study groups in large numbers (the scaling problem). Second, conventional statistical methods in behavioral science often fail to capture the nonlinear interaction dynamics occurring in small groups (the dynamics problem). Machine learning helps to address the scaling problem, as massive computing power can be harnessed to multiply manual codings of group interactions. Computer simulations with artificially intelligent agents help to address the dynamics problem by implementing social psychological theory in data-generating algorithms that allow for sophisticated statements and tests of theory. We describe an ongoing research project aimed at computational analysis of virtual software development teams.
|
28
|
|
29
|
Hula A, Vilares I, Lohrenz T, Dayan P, Montague PR. A model of risk and mental state shifts during social interaction. PLoS Comput Biol 2018; 14:e1005935. [PMID: 29447153 PMCID: PMC5831643 DOI: 10.1371/journal.pcbi.1005935] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 08/17/2017] [Revised: 02/28/2018] [Accepted: 12/19/2017] [Indexed: 11/18/2022] Open
Abstract
Cooperation and competition between human players in repeated microeconomic games offer a window onto social phenomena such as the establishment, breakdown and repair of trust. However, although a suitable starting point for the quantitative analysis of such games exists, namely the Interactive Partially Observable Markov Decision Process (I-POMDP), computational considerations and structural limitations have limited its application, and left unmodelled critical features of behavior in a canonical trust task. Here, we provide the first analysis of two central phenomena: a form of social risk-aversion exhibited by the player who is in control of the interaction in the game; and irritation or anger, potentially exhibited by both players. Irritation arises when partners apparently defect, and it potentially causes a precipitate breakdown in cooperation. Failing to model one’s partner’s propensity for it leads to substantial economic inefficiency. We illustrate these behaviours using evidence drawn from the play of large cohorts of healthy volunteers and patients. We show that for both cohorts, a particular subtype of player is largely responsible for the breakdown of trust, a finding which sheds new light on borderline personality disorder. In multi-round games in which players can benefit by trusting each other, swift and catastrophic breakdowns can arise amidst otherwise efficient cooperation. We present a model that quantifies this as a form of anger, and we exploit novel algorithmic improvements in inference based on the model to examine exchanges involving healthy volunteers and people suffering from personality disorders. This provides a new view on the problems that can underlie social interactions.
Affiliation(s)
- Andreas Hula
- Austrian Institute of Technology, Vienna, Austria
| | - Iris Vilares
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Human Neuroimaging Laboratory, Virginia Tech Carilion Research Institute, Roanoke, Virginia, United States of America
| | - Terry Lohrenz
- Human Neuroimaging Laboratory, Virginia Tech Carilion Research Institute, Roanoke, Virginia, United States of America
| | - Peter Dayan
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Gatsby Computational Unit, University College London, London, United Kingdom
| | - P. Read Montague
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Human Neuroimaging Laboratory, Virginia Tech Carilion Research Institute, Roanoke, Virginia, United States of America
- Department of Physics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
|
30
|
Moënne-Loccoz C, Vergara RC, López V, Mery D, Cosmelli D. Modeling Search Behaviors during the Acquisition of Expertise in a Sequential Decision-Making Task. Front Comput Neurosci 2017; 11:80. [PMID: 28943847 PMCID: PMC5596102 DOI: 10.3389/fncom.2017.00080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/27/2017] [Accepted: 08/04/2017] [Indexed: 11/13/2022] Open
Abstract
Our daily interaction with the world is full of situations in which we develop expertise through self-motivated repetition of the same task. In many of these interactions, and especially when dealing with computer and machine interfaces, we must deal with sequences of decisions and actions. For instance, when drawing cash from an ATM, choices are presented in a step-by-step fashion and a specific sequence of choices must be performed in order to produce the expected outcome. But, as we become experts in the use of such interfaces, is it possible to identify specific search and learning strategies? And if so, can we use this information to predict future actions? In addition to better understanding the cognitive processes underlying sequential decision making, this could allow building adaptive interfaces that can facilitate interaction at different moments of the learning curve. Here we tackle the question of modeling sequential decision-making behavior in a simple human-computer interface that instantiates a 4-level binary decision tree (BDT) task. We record behavioral data from voluntary participants while they attempt to solve the task. Using a Hidden Markov Model-based approach that capitalizes on the hierarchical structure of behavior, we then model their performance during the interaction. Our results show that partitioning the problem space into a small set of hierarchically related stereotyped strategies can potentially capture a host of individual decision-making policies. This allows us to follow how participants learn and develop expertise in the use of the interface. Moreover, using a Mixture of Experts based on these stereotyped strategies, the model is able to predict the behavior of participants who master the task.
Affiliation(s)
- Cristóbal Moënne-Loccoz
- Department of Computer Science, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Rodrigo C. Vergara
- Facultad de Medicina, Biomedical Neuroscience Institute, Universidad de Chile, Santiago, Chile
| | - Vladimir López
- Center for Interdisciplinary Neuroscience, Pontificia Universidad Católica de Chile, Santiago, Chile
- School of Psychology, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Domingo Mery
- Department of Computer Science, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Diego Cosmelli
- Center for Interdisciplinary Neuroscience, Pontificia Universidad Católica de Chile, Santiago, Chile
- School of Psychology, Pontificia Universidad Católica de Chile, Santiago, Chile
|
31
|
|
32
|
Huang X, Zhang S, Shang Y, Zhang W, Liu J. Creating Affective Autonomous Characters Using Planning in Partially Observable Stochastic Domains. IEEE Trans Comput Intell AI Games 2017. [DOI: 10.1109/tciaig.2015.2494599] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
|
33
|
Barrett S, Rosenfeld A, Kraus S, Stone P. Making friends on the fly: Cooperating with new teammates. Artif Intell 2017. [DOI: 10.1016/j.artint.2016.10.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
|
34
|
Wu H, Luo J. Efficient solutions of interactive dynamic influence diagrams using model identification. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.07.052] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
|
35
|
|
36
|
Pereira Dimuro G, da Rocha Costa AC. Regulating social exchanges in open MAS: The problem of reciprocal conversions between POMDPs and HMMs. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2015.06.023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
|
37
|
Hula A, Montague PR, Dayan P. Monte Carlo Planning Method Estimates Planning Horizons during Interactive Social Exchange. PLoS Comput Biol 2015; 11:e1004254. [PMID: 26053429 PMCID: PMC4460182 DOI: 10.1371/journal.pcbi.1004254] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/13/2014] [Accepted: 03/23/2015] [Indexed: 11/29/2022] Open
Abstract
Reciprocating interactions represent a central feature of all human exchanges. They have been the target of various recent experiments, with healthy participants and psychiatric populations engaging as dyads in multi-round exchanges such as a repeated trust task. Behaviour in such exchanges involves complexities related to each agent’s preference for equity with their partner, beliefs about the partner’s appetite for equity, beliefs about the partner’s model of their partner, and so on. Agents may also plan different numbers of steps into the future. Providing a computationally precise account of the behaviour is an essential step towards understanding what underlies choices. A natural framework for this is that of an interactive partially observable Markov decision process (IPOMDP). However, the various complexities make IPOMDPs inordinately computationally challenging. Here, we show how to approximate the solution for the multi-round trust task using a variant of the Monte-Carlo tree search algorithm. We demonstrate that the algorithm is efficient and effective, and therefore can be used to invert observations of behavioural choices. We use generated behaviour to elucidate the richness and sophistication of interactive inference. Agents interacting in games with multiple rounds must model their partner’s thought processes over extended time horizons. This poses a substantial computational challenge that has restricted previous behavioural analyses. By taking advantage of recent advances in algorithms for planning in the face of uncertainty, we demonstrate how these formal methods can be extended. We use a well studied social exchange game called the trust task to illustrate the power of our method, showing how agents with particular cognitive and social characteristics can be expected to interact, and how to infer the properties of individuals from observing their behaviour.
Affiliation(s)
- Andreas Hula
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
| | - P. Read Montague
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Human Neuroimaging Laboratory, Virginia Tech Carilion Research Institute, Roanoke, Virginia, United States of America
- Department of Physics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
| | - Peter Dayan
- Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
|
38
|
|
39
|
Pan Y, Zeng Y, Xiang Y, Sun L, Chen X. Time-critical interactive dynamic influence diagram. Int J Approx Reason 2015. [DOI: 10.1016/j.ijar.2014.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
40
|
|
41
|
Pynadath DV, Rosenbloom PS, Marsella SC. Reinforcement Learning for Adaptive Theory of Mind in the Sigma Cognitive Architecture. Artificial General Intelligence 2014. [DOI: 10.1007/978-3-319-09274-4_14] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 02/11/2023]
|
42
|
Zhang S, Sridharan M, Washington C. Active Visual Planning for Mobile Robot Teams Using Hierarchical POMDPs. IEEE Trans Robot 2013. [DOI: 10.1109/tro.2013.2252252] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/05/2022]
|
43
|
Capitan J, Spaan MT, Merino L, Ollero A. Decentralized multi-robot cooperation with auctioned POMDPs. Int J Rob Res 2013. [DOI: 10.1177/0278364913483345] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Indexed: 11/17/2022]
Abstract
Planning under uncertainty faces a scalability problem when considering multi-robot teams, as the information space scales exponentially with the number of robots. To address this issue, this paper proposes to decentralize multi-robot partially observable Markov decision processes (POMDPs) while maintaining cooperation between robots by using POMDP policy auctions. Auctions provide a flexible way of coordinating individual policies modeled by POMDPs and have low communication requirements. In addition, communication models in the multi-agent POMDP literature severely mismatch with real inter-robot communication. We address this issue by exploiting a decentralized data fusion method in order to efficiently maintain a joint belief state among the robots. The paper presents two different applications: environmental monitoring with unmanned aerial vehicles (UAVs); and cooperative tracking, in which several robots have to jointly track a moving target of interest. The first one is used as a proof of concept and illustrates the proposed ideas through different simulations. The second one adds real multi-robot experiments, showcasing the flexibility and robust coordination that our techniques can provide.
|
44
|
de Weerd H, Verbrugge R, Verheij B. How much does it help to know what she knows you know? An agent-based simulation study. Artif Intell 2013. [DOI: 10.1016/j.artint.2013.05.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]
|
45
|
Bayesian-Game-Based Fuzzy Reinforcement Learning Control for Decentralized POMDPs. IEEE Trans Comput Intell AI Games 2012. [DOI: 10.1109/tciaig.2012.2212279] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/06/2022]
|
46
|
Doshi P, Qu X, Goodie AS, Young DL. Modeling Human Recursive Reasoning Using Empirically Informed Interactive Partially Observable Markov Decision Processes. IEEE Trans Syst Man Cybern A Syst Hum 2012. [DOI: 10.1109/tsmca.2012.2199484] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
|
47
|
Torreño A, Onaindia E, Sapena Ó. A flexible coupling approach to multi-agent planning under incomplete information. Knowl Inf Syst 2012. [DOI: 10.1007/s10115-012-0569-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
|
48
|
|
49
|
Søndberg-Jeppesen N, Jensen FV. A PGM framework for recursive modeling of players in simple sequential Bayesian games. Int J Approx Reason 2010. [DOI: 10.1016/j.ijar.2010.01.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
|
50
|
Abstract
This paper introduces a model of ‘theory of mind’, namely, how we represent the intentions and goals of others to optimise our mutual interactions. We draw on ideas from optimum control and game theory to provide a ‘game theory of mind’. First, we consider the representations of goals in terms of value functions that are prescribed by utility or rewards. Critically, the joint value functions and ensuing behaviour are optimised recursively, under the assumption that I represent your value function, your representation of mine, your representation of my representation of yours, and so on ad infinitum. However, if we assume that the degree of recursion is bounded, then players need to estimate the opponent's degree of recursion (i.e., sophistication) to respond optimally. This induces a problem of inferring the opponent's sophistication, given behavioural exchanges. We show it is possible to deduce whether players make inferences about each other and quantify their sophistication on the basis of choices in sequential games. This rests on comparing generative models of choices with, and without, inference. Model comparison is demonstrated using simulated and real data from a ‘stag-hunt’. Finally, we note that exactly the same sophisticated behaviour can be achieved by optimising the utility function itself (through prosocial utility), producing unsophisticated but apparently altruistic agents. This may be relevant ethologically in hierarchal game theory and coevolution. The ability to work out what other people are thinking is essential for effective social interactions, be they cooperative or competitive. A widely used example is cooperative hunting: large prey is difficult to catch alone, but we can circumvent this by cooperating with others. However, hunting can pit private goals to catch smaller prey that can be caught alone against mutually beneficial goals that require cooperation. 
Understanding how we work out optimal strategies that balance cooperation and competition has remained a central puzzle in game theory. Exploiting insights from computer science and behavioural economics, we suggest a model of ‘theory of mind’ using ‘recursive sophistication’ in which my model of your goals includes a model of your model of my goals, and so on ad infinitum. By studying experimental data in which people played a computer-based group hunting game, we show that the model offers a good account of individual decisions in this context, suggesting that such a formal ‘theory of mind’ model can cast light on how people build internal representations of other people in social interactions.
Affiliation(s)
- Wako Yoshida
- The Wellcome Trust Centre for Neuroimaging, University College London, UK.
|