1
Galstyan V, Saakian DB. Quantifying the stochasticity of policy parameters in reinforcement learning problems. Phys Rev E 2023; 107:034112. [PMID: 37072940] [DOI: 10.1103/physreve.107.034112]
Abstract
The stochastic dynamics of reinforcement learning is studied using a master equation formalism. We consider two different problems: Q-learning for a two-agent game, and the multiarmed bandit problem with policy gradient as the learning method. The master equation is constructed by introducing a probability distribution over continuous policy parameters or over both continuous policy parameters and discrete state variables (a more advanced case). We use a version of the moment closure approximation to solve for the stochastic dynamics of the models. Our method gives accurate estimates for the mean and the (co)variance of policy variables. For the case of the two-agent game, we find that the variance terms are finite at steady state and derive a system of algebraic equations for computing them directly.
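As a rough sketch of the moment-closure step this abstract refers to (the notation and the second-order Gaussian closure are illustrative assumptions, not taken from the paper), the moment hierarchy for a single policy parameter θ with drift f and diffusion D can be truncated by discarding third and higher central moments:

```latex
% Generic second-order (Gaussian) moment closure, illustrative only:
% third and higher central moments of theta are set to zero.
\begin{aligned}
\frac{d\langle\theta\rangle}{dt} &\approx f(\langle\theta\rangle)
    + \tfrac{1}{2}\, f''(\langle\theta\rangle)\,\sigma^{2}, \\
\frac{d\sigma^{2}}{dt} &\approx 2 f'(\langle\theta\rangle)\,\sigma^{2}
    + D(\langle\theta\rangle),
\end{aligned}
```

where σ² = Var(θ). Closing the hierarchy at second order is what makes the mean and (co)variance directly computable.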
Affiliation(s)
- Vahe Galstyan
- AMOLF, Science Park 104, 1098 XG Amsterdam, Netherlands
- A.I. Alikhanyan National Science Laboratory (Yerevan Physics Institute) Foundation, 2 Alikhanian Brothers Street, Yerevan 375036, Armenia
- David B Saakian
- A.I. Alikhanyan National Science Laboratory (Yerevan Physics Institute) Foundation, 2 Alikhanian Brothers Street, Yerevan 375036, Armenia
2
Leonardos S, Sakos J, Courcoubetis C, Piliouras G. Catastrophe by Design in Population Games: A Mechanism to Destabilize Inefficient Locked-in Technologies. ACM TRANSACTIONS ON ECONOMICS AND COMPUTATION 2023. [DOI: 10.1145/3583782]
Abstract
In multi-agent environments in which coordination is desirable, the history of play often causes lock-in at sub-optimal outcomes. Notoriously, technologies with significant environmental footprint or high social cost persist despite the successful development of more environmentally friendly and/or socially efficient alternatives. The displacement of the status quo is hindered by entrenched economic interests and network effects. To exacerbate matters, the standard mechanism design approaches based on centralized authorities with the capacity to use preferential subsidies to effectively dictate system outcomes are not always applicable to modern decentralised economies. What other types of mechanisms are feasible?
In this paper, we develop and analyze a mechanism which induces transitions from inefficient lock-ins to superior alternatives. This mechanism does not exogenously favor one option over another; instead, the phase transition emerges endogenously via a standard evolutionary learning model, Q-learning, where agents trade off exploration and exploitation. Exerting the same transient influence on both the efficient and inefficient technologies encourages exploration and results in irreversible phase transitions and permanent stabilization of the efficient one. On a technical level, our work is based on bifurcation and catastrophe theory, a branch of mathematics that deals with changes in the number and stability properties of equilibria. Critically, our analysis is shown to be structurally robust to significant and even adversarially chosen perturbations to the parameters of both our game and our behavioral model.
Affiliation(s)
- Joseph Sakos
- Singapore University of Technology and Design, Singapore
3
Banisch S, Gaisbauer F, Olbrich E. Modelling Spirals of Silence and Echo Chambers by Learning from the Feedback of Others. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1484. [PMID: 37420504] [DOI: 10.3390/e24101484]
Abstract
What are the mechanisms by which groups with certain opinions gain public voice and force others holding a different view into silence? Furthermore, how does social media play into this? Drawing on neuroscientific insights into the processing of social feedback, we develop a theoretical model that allows us to address these questions. In repeated interactions, individuals learn whether their opinion meets public approval and refrain from expressing their standpoint if it is socially sanctioned. In a social network sorted around opinions, an agent forms a distorted impression of public opinion enforced by the communicative activity of the different camps. Even strong majorities can be forced into silence if a minority acts as a cohesive whole. On the other hand, the strong social organisation around opinions enabled by digital platforms favours collective regimes in which opposing voices are expressed and compete for primacy in public. This paper highlights the role that the basic mechanisms of social information processing play in massive computer-mediated interactions on opinions.
Affiliation(s)
- Sven Banisch
- Institute of Technology Futures, Karlsruhe Institute of Technology, 76133 Karlsruhe, Germany
- Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
- Felix Gaisbauer
- Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
- Eckehard Olbrich
- Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
4
Barfuss W. Dynamical systems as a level of cognitive analysis of multi-agent learning: Algorithmic foundations of temporal-difference learning dynamics. Neural Comput Appl 2022; 34:1653-1671. [PMID: 35221541] [PMCID: PMC8827307] [DOI: 10.1007/s00521-021-06117-0]
Abstract
A dynamical systems perspective on multi-agent learning, based on the link between evolutionary game theory and reinforcement learning, provides an improved, qualitative understanding of the emerging collective learning dynamics. However, confusion exists with respect to how this dynamical systems account of multi-agent learning should be interpreted. In this article, I propose to embed the dynamical systems description of multi-agent learning into different abstraction levels of cognitive analysis. The purpose of this work is to make the connections between these levels explicit in order to gain improved insight into multi-agent learning. I demonstrate the usefulness of this framework with the general and widespread class of temporal-difference reinforcement learning. I find that its deterministic dynamical systems description follows a minimum free-energy principle and unifies a boundedly rational account of game theory with decision-making under uncertainty. I then propose an on-line sample-batch temporal-difference algorithm which is characterized by the combination of applying a memory-batch and separated state-action value estimation. I find that this algorithm serves as a micro-foundation of the deterministic learning equations by showing that its learning trajectories approach the ones of the deterministic learning equations under large batch sizes. Ultimately, this framework of embedding a dynamical systems description into different abstraction levels gives guidance on how to unleash the full potential of the dynamical systems approach to multi-agent learning.
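As a toy illustration of the kind of learner these dynamics describe, here is a minimal stateless TD(0)/Q-learning sketch with a Boltzmann (softmax) policy. It is not the paper's sample-batch algorithm; the reward values, learning rate, and intensity-of-choice parameter are illustrative assumptions.

```python
import math
import random

def softmax_policy(q, beta):
    """Boltzmann (softmax) action probabilities with intensity of choice beta."""
    weights = [math.exp(beta * qi) for qi in q]
    total = sum(weights)
    return [w / total for w in weights]

def sample(probs, rng):
    """Draw an index from a discrete probability distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def run_td_bandit(rewards, beta=2.0, alpha=0.1, steps=5000, seed=0):
    """Stateless TD(0)/Q-learning; rewards[a] is the deterministic payoff of action a."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)
    for _ in range(steps):
        a = sample(softmax_policy(q, beta), rng)
        q[a] += alpha * (rewards[a] - q[a])  # TD error with zero bootstrap term
    return q

q = run_td_bandit([1.0, 0.5])
```

With deterministic payoffs, each Q-value converges to the payoff of its action, while the softmax policy keeps exploring both actions at a rate set by beta.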
Affiliation(s)
- Wolfram Barfuss
- School of Mathematics, University of Leeds, Leeds, UK
- Tübingen AI Center, University of Tübingen, Tübingen, Germany
5
Leonardos S, Piliouras G. Exploration-exploitation in multi-agent learning: Catastrophe theory meets game theory. ARTIF INTELL 2022. [DOI: 10.1016/j.artint.2021.103653]
6
Lauffenburger JC, Yom-Tov E, Keller PA, McDonnell ME, Bessette LG, Fontanet CP, Sears ES, Kim E, Hanken K, Buckley JJ, Barlev RA, Haff N, Choudhry NK. REinforcement learning to improve non-adherence for diabetes treatments by Optimising Response and Customising Engagement (REINFORCE): study protocol of a pragmatic randomised trial. BMJ Open 2021; 11:e052091. [PMID: 34862289] [PMCID: PMC8647547] [DOI: 10.1136/bmjopen-2021-052091]
Abstract
INTRODUCTION: Achieving optimal diabetes control requires several daily self-management behaviours, especially adherence to medication. Evidence supports the use of text messages to support adherence, but there remains much opportunity to improve their effectiveness. One key limitation is that message content has been generic. By contrast, reinforcement learning is a machine learning method that can be used to identify individuals' patterns of responsiveness by observing their response to cues and then optimising them accordingly. Despite its demonstrated benefits outside of healthcare, its application to tailoring communication for patients has received limited attention. The objective of this trial is to test the impact of a reinforcement learning-based text messaging programme on adherence to medication for patients with type 2 diabetes.
METHODS AND ANALYSIS: In the REinforcement learning to Improve Non-adherence For diabetes treatments by Optimising Response and Customising Engagement (REINFORCE) trial, we are randomising 60 patients with suboptimal diabetes control treated with oral diabetes medications to receive a reinforcement learning intervention or control. Subjects in both arms will receive electronic pill bottles to use, and those in the intervention arm will receive up to daily text messages. The messages will be individually adapted using a reinforcement learning prediction algorithm based on daily adherence measurements from the pill bottles. The trial's primary outcome is average adherence to medication over the 6-month follow-up period. Secondary outcomes include diabetes control, measured by glycated haemoglobin A1c, and self-reported adherence. In sum, the REINFORCE trial will evaluate the effect of personalising the framing of text messages for patients to support medication adherence and provide insight into how this could be adapted at scale to improve other self-management interventions.
ETHICS AND DISSEMINATION: This study was approved by the Mass General Brigham Institutional Review Board (IRB) (USA). Findings will be disseminated through peer-reviewed journals, clinicaltrials.gov reporting and conferences.
TRIAL REGISTRATION NUMBER: Clinicaltrials.gov (NCT04473326).
Affiliation(s)
- Julie C Lauffenburger
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Elad Yom-Tov
- Microsoft Research, Microsoft, Herzeliya, Israel
- Punam A Keller
- Tuck School of Business, Dartmouth College, Hanover, NH, USA
- Marie E McDonnell
- Endocrinology, Diabetes and Hypertension, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Lily G Bessette
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Constance P Fontanet
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Ellen S Sears
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Erin Kim
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Kaitlin Hanken
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- J Joseph Buckley
- Division of Sleep Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Renee A Barlev
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Nancy Haff
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Niteesh K Choudhry
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
7
Gaisbauer F, Olbrich E, Banisch S. Dynamics of opinion expression. Phys Rev E 2020; 102:042303. [PMID: 33212677] [DOI: 10.1103/physreve.102.042303]
Abstract
Modeling efforts in opinion dynamics have to a large extent ignored that opinion exchange between individuals can also affect how willing they are to express their opinion publicly. Here, we introduce a model of public opinion expression. Two groups of agents with different opinions on an issue interact with each other, changing their willingness to express their opinion according to whether they perceive themselves as part of the majority or minority. We formulate the model as a multigroup majority game and investigate the Nash equilibria. We also provide a dynamical systems perspective: using the reinforcement learning algorithm of Q-learning, we reduce the N-agent system in a mean-field approach to two dimensions which represent the two opinion groups. This two-dimensional system is analyzed in a comprehensive bifurcation analysis of its parameters. The model identifies social-structural conditions for public opinion predominance of different groups. Among other findings, we show under which circumstances a minority can dominate public discourse.
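Mean-field reductions of Q-learning of the kind mentioned here are commonly built on the continuous-time limit of Boltzmann Q-learning, which couples a replicator (exploitation) term with an entropic exploration term. The constants and notation below follow the common convention in that literature and are illustrative, not necessarily the paper's:

```latex
% Continuous-time Boltzmann Q-learning dynamics (illustrative notation):
% x, y are the two groups' mixed strategies, A a payoff matrix,
% beta an intensity-of-choice (inverse exploration) parameter.
\dot{x}_i \;=\;
  \underbrace{x_i\,\beta\big[(A y)_i - x^{\mathsf T} A y\big]}_{\text{exploitation (replicator)}}
  \;+\;
  \underbrace{x_i \sum_j x_j \ln\frac{x_j}{x_i}}_{\text{exploration (entropy)}}
```

Varying beta is what typically produces the bifurcations studied in such models.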
Affiliation(s)
- Felix Gaisbauer
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103 Leipzig, Germany
- Eckehard Olbrich
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103 Leipzig, Germany
- Sven Banisch
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103 Leipzig, Germany
8
Zhang SP, Dong JQ, Liu L, Huang ZG, Huang L, Lai YC. Reinforcement learning meets minority game: Toward optimal resource allocation. Phys Rev E 2019; 99:032302. [PMID: 30999513] [DOI: 10.1103/physreve.99.032302]
Abstract
This paper shows that reinforcement learning (RL) from artificial intelligence (AI) can eliminate herding, without any external control, in complex resource allocation systems. In particular, we demonstrate that when agents are empowered with RL (e.g., the popular Q-learning algorithm in AI), in that they gradually become familiar with the unknown game environment and attempt to deliver the optimal actions to maximize the payoff, herding can effectively be eliminated. Furthermore, computations reveal the striking phenomenon that, regardless of the initial state, the system evolves persistently and relentlessly toward the optimal state in which all resources are used efficiently. However, the evolution process is not without interruptions: large fluctuations occur, but only intermittently in time. The statistical distribution of the time between two successive fluctuating events is found to depend on the parity of the evolution, i.e., whether the number of time steps in between is odd or even. We develop a physical analysis and derive mean-field equations to gain an understanding of these phenomena. Since AI is becoming increasingly widespread, we expect our RL-empowered minority game system to have broad applications.
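A minimal toy version of RL agents in a minority game can make the setup concrete. This is not the paper's model: the 5-agent size, epsilon-greedy exploration, and all parameter values are illustrative assumptions.

```python
import random

class QAgent:
    """Stateless epsilon-greedy Q-learner with two actions (0 and 1)."""
    def __init__(self, rng, alpha=0.1, epsilon=0.05):
        self.q = [0.0, 0.0]
        self.rng = rng
        self.alpha = alpha
        self.epsilon = epsilon

    def act(self):
        # explore with probability epsilon, or break ties randomly
        if self.rng.random() < self.epsilon or self.q[0] == self.q[1]:
            return self.rng.randrange(2)
        return 0 if self.q[0] > self.q[1] else 1

    def learn(self, action, reward):
        self.q[action] += self.alpha * (reward - self.q[action])

def play_minority_game(n_agents=5, rounds=4000, seed=1):
    """Repeated minority game: agents on the minority side earn reward 1.

    Returns the minority-side size for each round (optimal is n_agents // 2).
    """
    rng = random.Random(seed)
    agents = [QAgent(random.Random(seed + i + 1)) for i in range(n_agents)]
    history = []
    for _ in range(rounds):
        actions = [a.act() for a in agents]
        ones = sum(actions)
        minority = 1 if ones < n_agents - ones else 0
        for agent, action in zip(agents, actions):
            agent.learn(action, 1.0 if action == minority else 0.0)
        history.append(min(ones, n_agents - ones))
    return history

history = play_minority_game()
```

Tracking the minority size over time is a simple way to see whether learning pushes the population away from herding (minority size 0) and toward the efficient split.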
Affiliation(s)
- Si-Ping Zhang
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, The Key Laboratory of Neuro-informatics & Rehabilitation Engineering of Ministry of Civil Affairs, and Institute of Health and Rehabilitation Science, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Institute of Computational Physics and Complex Systems, Lanzhou University, Lanzhou 730000, China
- Jia-Qi Dong
- Institute of Computational Physics and Complex Systems, Lanzhou University, Lanzhou 730000, China
- Institute of Theoretical Physics, Key Laboratory of Theoretical Physics, Chinese Academy of Sciences, P.O. Box 2735, Beijing 100190, China
- Li Liu
- School of Software Engineering, Chongqing University, Chongqing 400044, China
- Zi-Gang Huang
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, The Key Laboratory of Neuro-informatics & Rehabilitation Engineering of Ministry of Civil Affairs, and Institute of Health and Rehabilitation Science, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Liang Huang
- Institute of Computational Physics and Complex Systems, Lanzhou University, Lanzhou 730000, China
- Ying-Cheng Lai
- School of Electrical, Computer and Energy Engineering, Department of Physics, Arizona State University, Tempe, Arizona 85287, USA
9
Path planning of a mobile robot in a free-space environment using Q-learning. PROGRESS IN ARTIFICIAL INTELLIGENCE 2018. [DOI: 10.1007/s13748-018-00168-6]
10
11
Bifurcation Mechanism Design—From Optimal Flat Taxes to Better Cancer Treatments. GAMES 2018. [DOI: 10.3390/g9020021]
12
Li X, Cao R, Hao J. An Adaptive Learning Based Network Selection Approach for 5G Dynamic Environments. ENTROPY 2018; 20:e20040236. [PMID: 33265327] [PMCID: PMC7512751] [DOI: 10.3390/e20040236]
Abstract
Networks will continue to become increasingly heterogeneous as we move toward 5G. Meanwhile, the intelligent programming of the core network makes the available radio resources changeable rather than static. In such a dynamic and heterogeneous network environment, helping terminal users select the optimal network to access is challenging. Prior implementations of network selection are usually applicable to environments with static radio resources and cannot handle the unpredictable dynamics of 5G network environments. To this end, this paper considers both the fluctuation of radio resources and the variation of user demand. We model the access network selection scenario as a multiagent coordination problem, in which a set of rational terminal users compete to maximize their benefits with incomplete information about the environment (no prior knowledge of network resources or other users' choices). Then, an adaptive learning based strategy is proposed, which enables users to adaptively adjust their selections in response to the gradually or abruptly changing environment. The system is experimentally shown to converge to a Nash equilibrium, which also turns out to be both Pareto optimal and socially optimal. Extensive simulation results show that our approach achieves significantly better performance compared with two learning and non-learning based approaches in terms of load balancing, user payoff and overall bandwidth utilization efficiency. In addition, the system remains robust in the presence of non-compliant terminal users.
Affiliation(s)
- Xiaohong Li
- School of Computer Science and Technology, Tianjin University, Tianjin 300000, China
- Ru Cao
- School of Computer Science and Technology, Tianjin University, Tianjin 300000, China
- Jianye Hao
- School of Software, Tianjin University, Tianjin 300000, China
13
Zhang Z, Zhao D, Gao J, Wang D, Dai Y. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:1367-1379. [PMID: 27101627] [DOI: 10.1109/tcyb.2016.2544866]
Abstract
In this paper, we propose a multiagent reinforcement learning algorithm for fully cooperative tasks, called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward, rather than the immediate reward itself, is used as the reinforcement signal. With FMRQ, each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action games and one case of a three-player two-action game. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted: one is box-pushing and the other is a distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the alternatives.
14
Abstract
Melioration learning is an empirically well-grounded model of reinforcement learning. By means of computer simulations, this paper derives predictions for several repeatedly played two-person games from this model. The results indicate a likely convergence to a pure Nash equilibrium of the game. If no pure equilibrium exists, the relative frequencies of choice may approach the predictions of the mixed Nash equilibrium. Yet in some games, no stable state is reached.
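To make the melioration rule concrete, here is a minimal stateless sketch: the learner tracks the average reward earned per choice of each action and shifts choice probability toward whichever action currently has the higher average. This is a generic illustration, not the paper's simulation model; the payoffs, step size, and round count are illustrative assumptions.

```python
import random

def melioration_play(payoffs, rounds=3000, delta=0.02, seed=3):
    """Melioration learning on a two-action task with deterministic payoffs.

    Returns the final probability of choosing action 0.
    """
    rng = random.Random(seed)
    p = 0.5                               # probability of choosing action 0
    totals = [payoffs[0], payoffs[1]]     # prime each action with one sample
    counts = [1.0, 1.0]
    for _ in range(rounds):
        a = 0 if rng.random() < p else 1
        totals[a] += payoffs[a]
        counts[a] += 1
        means = [totals[i] / counts[i] for i in range(2)]
        # melioration: shift toward the action with the higher mean reward
        if means[0] > means[1]:
            p = min(1.0, p + delta)
        elif means[1] > means[0]:
            p = max(0.0, p - delta)
    return p

p = melioration_play([1.0, 0.2])
```

With a strictly dominant action, this rule converges to a pure choice, which mirrors the paper's finding that play often settles on a pure Nash equilibrium.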
Affiliation(s)
- Johannes Zschache
- Institute of Sociology, Leipzig University, Leipzig, Germany
15
Sun J, Wang L. The interaction between BIM's promotion and interest game under information asymmetry. Journal of Industrial and Management Optimization 2015. [DOI: 10.3934/jimo.2015.11.1301]
16
Kianercy A, Veltri R, Pienta KJ. Critical transitions in a game theoretic model of tumour metabolism. Interface Focus 2014; 4:20140014. [PMID: 25097747] [PMCID: PMC4071509] [DOI: 10.1098/rsfs.2014.0014]
Abstract
Tumour proliferation is promoted by an intratumoral metabolic symbiosis in which lactate from stromal cells fuels energy generation in the oxygenated domain of the tumour. Furthermore, empirical data show that tumour cells adopt an intermediate metabolic state between lactate respiration and glycolysis. This study models the metabolic symbiosis in the tumour through the formalism of evolutionary game theory. Our game model of metabolic symbiosis in cancer considers two types of tumour cells, hypoxic and oxygenated, while glucose and lactate are considered as the two main sources of energy within the tumour. The model confirms the presence of multiple intermediate stable states and hybrid energy strategies in the tumour. It predicts that nonlinear interaction between two subpopulations leads to tumour metabolic critical transitions and that tumours can obtain different intermediate states between glycolysis and respiration which can be regulated by the genomic mutation rate. The model can be applied to epithelial-stromal metabolic decoupling therapy.
Affiliation(s)
- Ardeshir Kianercy
- Brady Urological Institute, Johns Hopkins Hospital, Baltimore, MD 21287, USA
- Robert Veltri
- Brady Urological Institute, Johns Hopkins Hospital, Baltimore, MD 21287, USA
- Kenneth J Pienta
- Brady Urological Institute, Johns Hopkins Hospital, Baltimore, MD 21287, USA
17
Juul J, Kianercy A, Bernhardsson S, Pigolotti S. Replicator dynamics with turnover of players. Phys Rev E 2013; 88:022806. [PMID: 24032882] [DOI: 10.1103/physreve.88.022806]
Abstract
We study adaptive dynamics in games where players abandon the population at a given rate and are replaced by naive players characterized by a prior distribution over the admitted strategies. We demonstrate how such a process leads macroscopically to a variant of the replicator equation, with an additional term accounting for player turnover. We study how Nash equilibria and the dynamics of the system are modified by this additional term for prototypical examples such as the rock-paper-scissors game and different classes of two-action games played between two distinct populations. We conclude by showing how player turnover can account for nontrivial departures from Nash equilibria observed in data from lowest unique bid auctions.
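One plausible shape for a turnover-corrected replicator equation of the kind this abstract describes is a standard replicator term plus a relaxation toward the entrants' prior. The symbols λ and p below are illustrative; the paper's exact form may differ:

```latex
% Replicator dynamics with player turnover (illustrative sketch):
% lambda is the departure/replacement rate, p the prior strategy
% distribution of incoming naive players, A the payoff matrix.
\dot{x}_i \;=\; x_i\big[(A x)_i - x^{\mathsf T} A x\big]
  \;+\; \lambda\,(p_i - x_i)
```

For λ = 0 the ordinary replicator equation is recovered, while λ > 0 pulls rest points away from the Nash equilibria toward the entrants' prior, which is the kind of departure the abstract attributes to turnover.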
Affiliation(s)
- Jeppe Juul
- Niels Bohr Institute, Blegdamsvej 17, DK-2100 Copenhagen, Denmark
18
Kianercy A, Galstyan A. Coevolutionary networks of reinforcement-learning agents. Phys Rev E 2013; 88:012815. [PMID: 23944526] [DOI: 10.1103/physreve.88.012815]
Abstract
This paper presents a model of network formation in repeated games where the players adapt their strategies and network ties simultaneously using a simple reinforcement-learning scheme. It is demonstrated that the coevolutionary dynamics of such systems can be described via coupled replicator equations. We provide a comprehensive analysis for three-player two-action games, which is the minimum system size with nontrivial structural dynamics. In particular, we characterize the Nash equilibria (NE) in such games and examine the local stability of the rest points corresponding to those equilibria. We also study general n-player networks via both simulations and analytical methods and find that, in the absence of exploration, the stable equilibria consist of star motifs as the main building blocks of the network. Furthermore, in all stable equilibria the agents play pure strategies, even when the game allows mixed NE. Finally, we study the impact of exploration on learning outcomes and observe that there is a critical exploration rate above which the symmetric and uniformly connected network topology becomes stable.
Collapse
Affiliation(s)
- Ardeshir Kianercy
- Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA