1. Yao Q, Wang Y, Xiong X, Wang P, Li Y. Adversarial Decision-Making for Moving Target Defense: A Multi-Agent Markov Game and Reinforcement Learning Approach. Entropy (Basel, Switzerland) 2023; 25:e25040605. [PMID: 37190393; PMCID: PMC10137508; DOI: 10.3390/e25040605]
Abstract
Reinforcement learning has shown great ability and has defeated human players in real-time strategy games. In recent years, reinforcement learning has also been used in cyberspace to carry out automated, intelligent attacks. Traditional defense methods are not sufficient to deal with this threat, so it is necessary to design defense agents that counter intelligent attacks. The interaction between the attack agent and the defense agent can be modeled as a multi-agent Markov game. In this paper, an adversarial decision-making approach that combines the Bayesian Strong Stackelberg and WoLF algorithms is proposed to obtain the equilibrium point of multi-agent Markov games. With this method, the defense agent can obtain an adversarial decision-making strategy and continuously adjust it in cyberspace. As verified in experiments, the defense agent should attach importance to short-term rewards during a real-time game between the attack agent and the defense agent. The proposed approach obtains the largest rewards for the defense agent compared with the classic Nash-Q and URS-Q algorithms. In addition, the proposed approach adjusts the action selection probability dynamically, so that the decision entropy of the optimal action gradually decreases.
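Since the abstract names WoLF ("Win or Learn Fast") as the strategy-adjustment mechanism, a minimal sketch of the classic WoLF policy-hill-climbing update may help make the idea concrete. This is not the paper's implementation (which couples WoLF with a Bayesian Strong Stackelberg solver); the tabular setting and all hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict

class WoLFPHCAgent:
    """Sketch of WoLF policy hill-climbing: use a small policy step when
    winning and a larger one when losing, judged against an average policy."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.n = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.d_win, self.d_lose = delta_win, delta_lose
        self.Q = defaultdict(lambda: [0.0] * n_actions)
        self.pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.avg_pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.visits = defaultdict(int)

    def act(self, s):
        return random.choices(range(self.n), weights=self.pi[s])[0]

    def update(self, s, a, r, s_next):
        q = self.Q[s]
        q[a] += self.alpha * (r + self.gamma * max(self.Q[s_next]) - q[a])
        self.visits[s] += 1
        for i in range(self.n):  # incremental estimate of the average policy
            self.avg_pi[s][i] += (self.pi[s][i] - self.avg_pi[s][i]) / self.visits[s]
        winning = sum(p * v for p, v in zip(self.pi[s], q)) > \
                  sum(p * v for p, v in zip(self.avg_pi[s], q))
        delta = self.d_win if winning else self.d_lose  # learn fast when losing
        best = max(range(self.n), key=q.__getitem__)
        for i in range(self.n):
            self.pi[s][i] += delta if i == best else -delta / (self.n - 1)
        total = sum(max(p, 0.0) for p in self.pi[s])    # project back onto simplex
        self.pi[s] = [max(p, 0.0) / total for p in self.pi[s]]
```

The variable learning rate is what lets the defender keep adjusting its mixed strategy against a nonstationary attacker, which is the role the abstract assigns to WoLF.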
Affiliation(s)
- Qian Yao, Yongjie Wang, Xinli Xiong, Peng Wang, and Yang Li: College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China; Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China
2. Shi H, Li J, Mao J, Hwang KS. Lateral Transfer Learning for Multiagent Reinforcement Learning. IEEE Transactions on Cybernetics 2023; 53:1699-1711. [PMID: 34506297; DOI: 10.1109/tcyb.2021.3108237]
Abstract
Some researchers have introduced transfer learning mechanisms into multiagent reinforcement learning (MARL). However, existing work on cross-task transfer for multiagent systems is designed only for homogeneous agents or similar domains. This work proposes an all-purpose cross-task transfer method, called multiagent lateral transfer (MALT), which helps MARL alleviate the training burden. We discuss several challenges in developing an all-purpose multiagent cross-task transfer learning method and provide a feasible way of reusing knowledge for MARL. In the developed method, inspired by the progressive network, we take features rather than policies or experiences as the transfer object. To achieve more efficient transfer, we assign pretrained policy networks to agents based on clustering, and an attention module is introduced to enhance the transfer framework. The proposed method places no strict requirements on the source and target tasks. Compared with existing work, our method can transfer knowledge among heterogeneous agents and also avoids negative transfer when tasks are fully different. To the best of our knowledge, this article is the first work devoted to all-purpose cross-task transfer for MARL. Several experiments in various scenarios compare the performance of the proposed method with baselines. The results demonstrate that the method is sufficiently flexible for most settings, including cooperative, competitive, homogeneous, and heterogeneous configurations.
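A minimal sketch of the lateral-transfer idea described above, assuming a progressive-network-style design: frozen source columns supply features, and an attention module weights them before fusion with the target column. The layer sizes, gating form, and class names are illustrative guesses, not the authors' architecture.

```python
import torch
import torch.nn as nn

class LateralTransferPolicy(nn.Module):
    """Target policy that fuses its own features with attention-weighted
    features from frozen, pretrained source encoders."""
    def __init__(self, obs_dim, n_actions, source_encoders, hidden=64):
        super().__init__()
        self.sources = nn.ModuleList(source_encoders)  # pretrained, kept frozen
        for enc in self.sources:
            for p in enc.parameters():
                p.requires_grad_(False)
        self.own = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.attn = nn.Linear(hidden, len(source_encoders))  # one score per source
        self.head = nn.Linear(2 * hidden, n_actions)

    def forward(self, obs):
        own_feat = self.own(obs)                                  # (B, H)
        src = torch.stack([enc(obs) for enc in self.sources], 1)  # (B, S, H)
        w = torch.softmax(self.attn(own_feat), dim=-1)            # (B, S)
        transferred = (w.unsqueeze(-1) * src).sum(1)              # (B, H)
        return self.head(torch.cat([own_feat, transferred], -1))

# toy usage: two frozen source encoders feeding a new 4-action target task
enc = lambda: nn.Sequential(nn.Linear(8, 64), nn.ReLU())
policy = LateralTransferPolicy(8, 4, [enc(), enc()])
logits = policy(torch.randn(5, 8))  # (5, 4) action logits
```

Because the attention weights are learned on the target task, an unhelpful source column can be down-weighted toward zero, which is one plausible way the framework avoids negative transfer.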
3. Modeling opponent learning in multiagent repeated games. Applied Intelligence 2022. [DOI: 10.1007/s10489-022-04249-x]
Abstract
Multiagent reinforcement learning (MARL) has been used extensively in game environments. One of the main challenges in MARL is that the environment of the agent system is dynamic and the other agents are also updating their strategies. Therefore, modeling the opponents' learning process and adopting specific strategies to shape that learning is an effective way to obtain better training results. Previous studies such as DRON, LOLA, and SOS approximated the opponent's learning process and demonstrated effective applications. However, these studies modeled only transient changes in opponent strategies and lacked stability in improving equilibrium efficiency. In this article, we design the MOL (modeling opponent learning) method based on the Stackelberg game. We use best-response theory to approximate the opponents' preferences for different actions and to explore stable equilibria with higher rewards. We find that MOL achieves better results in several games with classical structures (the Prisoner's Dilemma, the Stackelberg Leader game, and Stag Hunt with three players), and in randomly generated bimatrix games. MOL performs well in competitive games played against different opponents and converges to stable points that score above the Nash equilibrium in repeated game environments. The results may provide a reference for the definition of equilibrium in multiagent reinforcement learning systems and contribute to the design of learning objectives in MARL that avoid locally disadvantageous equilibria and improve general efficiency.
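To make the Stackelberg/best-response machinery concrete, here is a minimal sketch of leader commitment against a best-responding follower in a bimatrix game. The payoff matrices are illustrative, not data from the paper, and MOL itself learns these opponent preferences rather than reading them from a known matrix.

```python
import numpy as np

# Illustrative Stackelberg Leader game: entry [i, j] is the payoff when the
# leader plays i and the follower plays j.
leader_payoff = np.array([[3.0, 1.0],
                          [4.0, 2.0]])
follower_payoff = np.array([[2.0, 0.0],
                            [1.0, 3.0]])

def follower_best_response(a_leader):
    # the follower observes the leader's action and maximizes its own row
    return int(np.argmax(follower_payoff[a_leader]))

def stackelberg_leader_action():
    # the leader anticipates the best response and commits to the action
    # that yields the highest leader payoff after the follower replies
    values = [leader_payoff[a, follower_best_response(a)]
              for a in range(leader_payoff.shape[0])]
    return int(np.argmax(values)), max(values)

action, value = stackelberg_leader_action()
print(f"leader commits to action {action} with payoff {value}")  # action 0, payoff 3.0
```

The point of the commitment structure is that the leader can secure a payoff above what simultaneous-move Nash play would give, which is the kind of above-Nash stable point the abstract reports.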
4. Min–Max Q-learning for multi-player pursuit-evasion games. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.12.025]
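This entry carries no abstract, but the title points to minimax Q-learning in the Littman tradition, whose core step is computing the security value max over the protagonist's mixed strategy of the worst-case expected Q-value, via a linear program. A minimal sketch of that step follows; it is a generic textbook construction, not necessarily the formulation used in this paper.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_value(Q_s):
    """Solve max_pi min_o sum_a pi[a] * Q_s[a, o] as a linear program.
    Q_s is the |A| x |O| payoff table for one state (pursuer vs. evader)."""
    n_a, n_o = Q_s.shape
    # decision variables: pi[0..n_a-1] and the game value v; minimize -v
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])    # v <= pi @ Q_s[:, o] for all o
    b_ub = np.zeros(n_o)
    A_eq = np.ones((1, n_a + 1))
    A_eq[0, -1] = 0.0                                # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n_a + [(None, None)])
    return res.x[:n_a], res.x[-1]                    # mixed policy, game value

# matching pennies as a smoke test: optimal play is 50/50 with value 0
pi, v = minimax_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(pi, v)  # ~[0.5, 0.5], ~0.0
```

In the full algorithm this value backs up the table as Q(s, a, o) <- (1 - alpha) Q(s, a, o) + alpha (r + gamma V(s')), with one LP per visited state.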
5. Multi-Agent Reinforcement Learning Approach for Residential Microgrid Energy Scheduling. Energies 2019. [DOI: 10.3390/en13010123]
Abstract
The residential microgrid is widely considered a new paradigm of the home energy management system. The complexity of Microgrid Energy Scheduling (MES) is increasing with the integration of Electric Vehicles (EVs) and Renewable Generations (RGs). Moreover, it is challenging to determine optimal scheduling strategies that guarantee the efficiency of the microgrid market and balance all market participants' benefits. In this paper, a Multi-Agent Reinforcement Learning (MARL) approach for residential MES is proposed to promote the autonomy and fairness of microgrid market operation. First, a multi-agent-based residential microgrid model including Vehicle-to-Grid (V2G) and RGs is constructed, and an auction-based microgrid market is built. Then, in contrast to Single-Agent Reinforcement Learning (SARL), MARL can achieve distributed autonomous learning for each agent and realize an equilibrium among all agents' benefits; we therefore formulate an equilibrium-based MARL framework according to each participant's market orientation. Finally, to guarantee the fairness and privacy of the MARL process, we propose an improved optimal Equilibrium Selection-MARL (ES-MARL) algorithm based on two mechanisms: private negotiation and maximum average reward. Simulation results demonstrate that the overall performance and efficiency of the proposed MARL approach are superior to those of SARL. In addition, it is verified that the improved ES-MARL achieves a higher average profit that balances all agents.
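A minimal sketch of the "maximum average reward" selection mechanism the abstract names, reduced to the pure-strategy Nash equilibria of a toy two-player game: enumerate the equilibria, then keep the one with the highest mean payoff. The payoffs and the restriction to pure strategies are illustrative simplifications, not the paper's market model.

```python
import numpy as np
from itertools import product

# Illustrative stag-hunt-style payoffs: R1 for the row agent, R2 for the column agent.
R1 = np.array([[4.0, 0.0],
               [3.0, 2.0]])
R2 = np.array([[4.0, 3.0],
               [0.0, 2.0]])

def pure_nash_equilibria(R1, R2):
    """A joint action is a pure Nash equilibrium when neither agent can
    gain by unilaterally deviating."""
    eqs = []
    for a1, a2 in product(range(R1.shape[0]), range(R1.shape[1])):
        if R1[a1, a2] >= R1[:, a2].max() and R2[a1, a2] >= R2[a1, :].max():
            eqs.append((a1, a2))
    return eqs

eqs = pure_nash_equilibria(R1, R2)
best = max(eqs, key=lambda e: (R1[e] + R2[e]) / 2)  # maximum average reward
print("equilibria:", eqs, "selected:", best)         # picks (0, 0): average 4.0 over 2.0
```

When a game has several equilibria, ranking them by average reward gives all agents a shared, fairness-oriented tie-breaking rule, which matches the fairness goal stated in the abstract.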
6. Da Silva FL, Glatt R, Costa AHR. MOO-MDP: An Object-Oriented Representation for Cooperative Multiagent Reinforcement Learning. IEEE Transactions on Cybernetics 2019; 49:567-579. [PMID: 29990289; DOI: 10.1109/tcyb.2017.2781130]
Abstract
Reinforcement learning (RL) is a widely known technique for enabling autonomous learning. Even though RL methods have achieved success in increasingly large and complex problems, scaling solutions remains a challenge. One way to simplify (and consequently accelerate) learning is to exploit regularities in a domain, which allows generalization and reduction of the learning space. While object-oriented Markov decision processes (OO-MDPs) provide such generalization opportunities, we argue that the learning process may be further simplified by dividing the workload of tasks among multiple agents, solving problems as multiagent systems (MASs). In this paper, we propose a novel combination of OO-MDPs and MASs, called the multiagent OO-MDP (MOO-MDP). Our proposal accrues the benefits of both OO-MDPs and MASs, better addressing scalability issues. We formalize the general MOO-MDP model and present an algorithm to solve deterministic cooperative MOO-MDPs. We show that our algorithm learns optimal policies while reducing the learning space by exploiting state abstractions. We experimentally compare our results with earlier approaches in three domains and evaluate the advantages of our approach in terms of sample efficiency and memory requirements.
7. Niu L, Ren F, Zhang M, Bai Q. A Concurrent Multiple Negotiation Protocol Based on Colored Petri Nets. IEEE Transactions on Cybernetics 2017; 47:3692-3705. [PMID: 27337734; DOI: 10.1109/tcyb.2016.2577635]
Abstract
Concurrent multiple negotiation (CMN) provides a mechanism for an agent to conduct more than one negotiation simultaneously. Different interdependency relationships may exist among these negotiations, and these relationships can affect the negotiations' outcomes. The outcomes of these concurrent negotiations together determine whether the agent achieves its overall negotiation goal. Handling a CMN while accounting for the interdependency relationships among multiple negotiations is a challenging research problem. This paper: 1) comprehensively highlights research problems at the concurrent negotiation level; 2) provides a graph-based CMN model that takes the interdependency relationships into consideration; and 3) proposes a colored Petri net-based negotiation protocol for conducting CMNs. With the proposed protocol, a CMN can be processed efficiently and concurrently, and negotiation agreements can be reached efficiently. Experimental results indicate the effectiveness and efficiency of the proposed protocol in terms of negotiation success rate, negotiation time, and negotiation outcome.
8. Zhang Z, Zhao D, Gao J, Wang D, Dai Y. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks. IEEE Transactions on Cybernetics 2017; 47:1367-1379. [PMID: 27101627; DOI: 10.1109/tcyb.2016.2544866]
Abstract
In this paper, we propose a multiagent reinforcement learning algorithm for fully cooperative tasks, called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to reach one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward, rather than the immediate reward itself, is used as the reinforcement signal. With FMRQ, each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four two-player two-action cases and one three-player two-action case. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments are conducted on tasks with multiple states and finite steps: one is box-pushing and the other is a distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the alternatives.
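A minimal sketch of the frequency-of-maximum-reward signal described above, under the simplifying assumption of a tabular agent that tracks, per state-action pair, how often that pair coincided with the best-known global reward. The hyperparameters and exact bookkeeping are illustrative, not the paper's specification.

```python
import random
from collections import defaultdict

class FMRQAgent:
    """Sketch: the reinforcement signal is the empirical frequency with which
    an action yielded the highest global immediate reward, not the reward."""
    def __init__(self, actions, alpha=0.1, epsilon=0.1):
        self.actions = actions
        self.alpha, self.epsilon = alpha, epsilon
        self.q = defaultdict(float)       # q[(state, action)]
        self.plays = defaultdict(int)     # times (state, action) was tried
        self.max_hits = defaultdict(int)  # times it produced the max reward

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, r_max_seen):
        # r_max_seen is the highest global immediate reward observed so far,
        # which each agent can maintain from the shared reward alone
        self.plays[(state, action)] += 1
        if reward >= r_max_seen:
            self.max_hits[(state, action)] += 1
        freq = self.max_hits[(state, action)] / self.plays[(state, action)]
        self.q[(state, action)] += self.alpha * (freq - self.q[(state, action)])
```

Because the signal only requires the shared global reward, this style of update is consistent with the abstract's claim that agents never observe each other's actions.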
9. Zha W, Chen J, Peng Z, Gu D. Construction of Barrier in a Fishing Game With Point Capture. IEEE Transactions on Cybernetics 2017; 47:1409-1422. [PMID: 27071205; DOI: 10.1109/tcyb.2016.2546381]
Abstract
This paper addresses a particular pursuit-evasion game, called the "fishing game," in which a faster evader attempts to pass through the gap between two pursuers. We are concerned with the conditions under which the evader or the pursuers can win the game. This is a game of kind, whose essential construct, the barrier, separates the state space into disjoint parts associated with each player's winning region. We present an explicit-policy method to construct the barrier. This method divides the fishing game into two subgames, related respectively to the included angle and the relative distances between the evader and the pursuers, and then analyzes the possibility of capture or escape in each subgame to derive the analytical form of the barrier. Furthermore, we fuse the games of kind and degree by solving each player's minimum-time optimal control strategy when the initial state lies in its winning region. Along with the optimal strategies, the players' trajectories are delineated and upper bounds on their winning times are derived.
10. Zhou L, Yang P, Chen C, Gao Y. Multiagent Reinforcement Learning With Sparse Interactions by Negotiation and Knowledge Transfer. IEEE Transactions on Cybernetics 2017; 47:1238-1250. [PMID: 27046917; DOI: 10.1109/tcyb.2016.2543238]
Abstract
Reinforcement learning has significant applications in multiagent systems, especially in unknown dynamic environments. However, most multiagent reinforcement learning (MARL) algorithms suffer from problems such as exponential computational complexity in the joint state-action space, which makes it difficult to scale up to realistic multiagent problems. In this paper, a novel algorithm named negotiation-based MARL with sparse interactions (NegoSI) is presented. In contrast to traditional sparse-interaction-based MARL algorithms, NegoSI adopts the equilibrium concept and makes it possible for agents to select the nonstrict equilibrium-dominating strategy profile (nonstrict EDSP) or meta equilibrium for their joint actions. The presented NegoSI algorithm consists of four parts: 1) the equilibrium-based framework for sparse interactions; 2) the negotiation for the equilibrium set; 3) the minimum-variance method for selecting one joint action; and 4) the knowledge transfer of local Q-values. In this integrated algorithm, three techniques, i.e., unshared value functions, equilibrium solutions, and sparse interactions, are adopted to achieve privacy protection, better coordination, and lower computational complexity, respectively. To evaluate the performance of the presented NegoSI algorithm, two groups of experiments are carried out with respect to three criteria: 1) steps per episode; 2) rewards per episode; and 3) average runtime. The first group of experiments, conducted on six grid-world games, shows the fast convergence and high scalability of the presented algorithm. In the second group of experiments, NegoSI is applied to an intelligent warehouse problem, and simulation results demonstrate its effectiveness compared with other state-of-the-art MARL algorithms.
11. Wu Y, Su H, Shi P, Shu Z, Wu ZG. Consensus of Multiagent Systems Using Aperiodic Sampled-Data Control. IEEE Transactions on Cybernetics 2016; 46:2132-2143. [PMID: 26316291; DOI: 10.1109/tcyb.2015.2466115]
Abstract
This paper is concerned with the consensus of multiagent systems with nonlinear dynamics under aperiodic sampled-data controllers, which are more flexible than classical periodic sampled-data controllers. Using the input-delay approach, the resulting sampled-data system is reformulated as a continuous system with a time-varying delay in the control input. A continuous Lyapunov functional, which captures the information of the sampling pattern, together with the free-weighting matrix method, is then used to establish a sufficient condition for consensusability. For the more general case in which the sampled-data controllers are subject to constant input delays, a novel discontinuous Lyapunov functional is introduced on the basis of the vector extension of Wirtinger's inequality. This functional leads to simplified and efficient stability conditions for computation and optimization. Further results on estimating the upper bound of the maximal allowable sampling interval are given as well. A numerical example is provided to show the effectiveness and merits of the proposed protocol.
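A minimal simulation sketch of aperiodic sampled-data consensus for single-integrator agents. The paper treats nonlinear dynamics and derives Lyapunov-based conditions; this toy only illustrates the zero-order-hold control held constant between irregular sampling instants. The graph, gain, and sampling-interval range are arbitrary choices.

```python
import numpy as np

# Undirected 4-cycle communication graph and its Laplacian.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

x = np.array([3.0, -1.0, 4.0, 0.5])  # initial agent states
u = np.zeros_like(x)
k, dt = 0.5, 1e-3                    # control gain, integration step
t, t_next = 0.0, 0.0
rng = np.random.default_rng(0)

while t < 10.0:
    if t >= t_next:                            # aperiodic sampling instant t_k
        u = -k * (L @ x)                       # control held until the next sample
        t_next = t + rng.uniform(0.05, 0.3)    # irregular inter-sample interval
    x += dt * u                                # single-integrator agent dynamics
    t += dt

print("final states:", x)  # all near the initial average, 1.625
```

Because the graph is undirected and the control sums to zero across agents, the state average is invariant, so the agents agree on the mean of their initial values as long as the sampling intervals stay short enough for stability, which is exactly the quantity the paper's bounds characterize.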
12. Ye M, Hu G. Solving Potential Games With Dynamical Constraint. IEEE Transactions on Cybernetics 2016; 46:1156-1164. [PMID: 25974960; DOI: 10.1109/tcyb.2015.2425411]
Abstract
We solve N-player potential games with dynamical constraints in this paper. Potential games with stable dynamics are considered first, followed by one type of potential game without inherently stable dynamics. Unlike most existing Nash-seeking methods, we provide an extremum-seeking-based method that does not require explicit information on the game dynamics or the payoff functions; only measurements of the payoff functions are needed for the game strategy synthesis. Lie bracket approximation is used to analyze the proposed Nash-seeking scheme. A semi-globally practically uniformly asymptotically stable result is presented for potential games with stable dynamics, and an ultimate-boundedness result is provided for potential games without inherently stable dynamics. For first-order perturbed integrator-type dynamics, we employ an extended-state observer to handle the disturbance so that better convergence is achievable. Stability of the closed-loop system is proven and the ultimate bound is quantified. Numerical examples are presented to verify the effectiveness of the proposed methods.
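A minimal sketch of sinusoidal extremum seeking for Nash seeking in a two-player potential game, using only payoff measurements as the abstract describes. The quadratic payoffs, dither frequencies, and gains are illustrative; the paper's Lie-bracket analysis and observer-based extension are not reproduced here.

```python
import numpy as np

# A two-player potential game: equal cross-derivatives (-0.5) make it a
# potential game, with Nash equilibrium at approximately (2.4, -1.6).
def payoff1(a1, a2): return -(a1 - 2.0) ** 2 - 0.5 * a1 * a2
def payoff2(a1, a2): return -(a2 + 1.0) ** 2 - 0.5 * a1 * a2

theta = np.array([0.0, 0.0])   # players' action estimates
amp, k = 0.1, 0.5              # dither amplitude, adaptation gain
w = np.array([7.0, 11.0])      # distinct dither frequencies per player
dt = 1e-3

for step in range(int(200 / dt)):
    t = step * dt
    a = theta + amp * np.sin(w * t)          # actually played actions
    J = np.array([payoff1(*a), payoff2(*a)])  # each player measures only its payoff
    # demodulation: correlating the measured payoff with the player's own
    # dither approximates its payoff gradient, yielding gradient ascent
    theta += dt * k * J * np.sin(w * t)

print("approximate Nash strategies:", theta)  # close to (2.4, -1.6)
```

On average, each player climbs its own payoff gradient without ever knowing the payoff function's form, which is the model-free property the abstract emphasizes.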
13. Yu C, Zhang M, Ren F, Tan G. Multiagent Learning of Coordination in Loosely Coupled Multiagent Systems. IEEE Transactions on Cybernetics 2015; 45:2853-2867. [PMID: 25594993; DOI: 10.1109/tcyb.2014.2387277]
Abstract
Multiagent learning (MAL) is a promising technique for agents to learn efficient coordinated behaviors in multiagent systems (MASs). In MAL, multiple concurrent distributed learning processes can make the learning environment nonstationary for each individual learner. Developing an efficient learning approach to coordinate agents' behaviors in this dynamic environment is a difficult problem, especially when agents do not know the domain structure and have only local observability of the environment. In this paper, a coordinated MAL approach is proposed that enables agents to learn efficient coordinated behaviors by exploiting agent independence in loosely coupled MASs. The main feature of the proposed approach is to explicitly quantify and dynamically adapt agent independence during learning, so that agents can trade off between a single-agent learning process and a coordinated learning process for efficient decision making. The proposed approach is employed to solve two-robot navigation problems in domains of different scales. Experimental results show that agents using the proposed approach learn to act in concert or independently in different areas of the environment, which results in great computational savings and near-optimal performance.