1. Fang H, Zhang M, He S, Luan X, Liu F, Ding Z. Solving the Zero-Sum Control Problem for Tidal Turbine System: An Online Reinforcement Learning Approach. IEEE Transactions on Cybernetics 2023; 53:7635-7647. PMID: 35839191. DOI: 10.1109/tcyb.2022.3186886.
Abstract
A novel completely mode-free integral reinforcement learning (CMFIRL) iteration algorithm is proposed in this article to solve two-player zero-sum games and Nash equilibrium problems, that is, to compute the optimal control policy pairs, for a tidal turbine system modeled as a continuous-time Markov jump linear system with exact transition probabilities and completely unknown dynamics. First, the tidal turbine system is modeled as a Markov jump linear system, and a subsystem transformation technique is designed to decouple the jumping modes. Then, a completely mode-free reinforcement learning algorithm is employed to solve the game-coupled algebraic Riccati equations without any knowledge of the system dynamics, thereby reaching the Nash equilibrium. The learning algorithm uses a single iteration loop that updates the control policy and the disturbance policy simultaneously. An exploration signal is added to excite the system, and the convergence of the CMFIRL iteration algorithm is rigorously proved. Finally, a simulation example illustrates the effectiveness and applicability of the control design approach.
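The fixed point that such iterations approximate can be illustrated with a model-based policy-iteration sketch on a scalar zero-sum linear-quadratic game. Unlike the paper's mode-free algorithm, this sketch uses the model explicitly, and the dynamics (a, b, d) and weights (q, r, gamma) are invented for illustration:

```python
# Model-based policy iteration on the game algebraic Riccati equation (GARE)
# for a scalar continuous-time zero-sum LQ game: dx = (a*x + b*u + d*w) dt.
# Illustrative sketch only; a mode-free scheme reaches the same fixed point
# without knowing (a, b, d). All constants here are made up.

def zero_sum_policy_iteration(a, b, d, q, r, gamma, iters=50):
    K, L = 0.0, 0.0                       # initial control / disturbance gains
    P = 0.0
    for _ in range(iters):
        ac = a - b * K + d * L            # closed-loop dynamics
        assert ac < 0, "closed loop must remain stable"
        # Scalar Lyapunov equation: 2*ac*P + q + r*K^2 - gamma^2*L^2 = 0
        P = (q + r * K * K - gamma**2 * L * L) / (-2.0 * ac)
        K = b * P / r                     # control policy update
        L = d * P / gamma**2              # disturbance policy update
    return P, K, L

a, b, d, q, r, gamma = -1.0, 1.0, 0.5, 1.0, 1.0, 2.0
P, K, L = zero_sum_policy_iteration(a, b, d, q, r, gamma)
# At the Nash solution the GARE residual vanishes:
residual = 2 * a * P + q - (b * P) ** 2 / r + (d * P / gamma) ** 2
```

Both policies are updated in every pass of the single loop, mirroring the simultaneous control/disturbance update described in the abstract.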
2. Li B, Yang Q, Duan L, Sun Y. Operator-as-a-Consumer: A Novel Energy Storage Sharing Approach Under Demand Charge. IEEE Transactions on Cybernetics 2023; 53:941-953. PMID: 34398773. DOI: 10.1109/tcyb.2021.3088221.
Abstract
Energy storage system (ESS)-based demand response (DR) is an appealing way for consumers to reduce electricity bills under demand charge and time-of-use (TOU) pricing. To counteract the high investment cost of ESSs, a novel operator-enabled ESS sharing scheme, namely "operator-as-a-consumer (OaaC)," is proposed and investigated in this article. In this scheme, the users and the operator form a Stackelberg game. The users send ESS orders to the operator and apply their own ESS dispatching strategies for their own purposes. Meanwhile, the operator maximizes its profit through optimal ESS sizing and scheduling, as well as through pricing of the users' ESS orders. The feasibility and economic performance of OaaC are further analyzed by solving a bilevel joint optimization problem of ESS pricing, sizing, and scheduling. To make the analysis tractable, the bilevel model is first transformed into its single-level mathematical program with equilibrium constraints (MPEC) formulation and is then linearized into a mixed-integer linear programming (MILP) problem using multiple linearization methods. Case studies with actual data demonstrate the profitability for the operator and, simultaneously, the bill savings achievable by the users under the proposed OaaC scheme.
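The bilevel leader-follower structure can be sketched on a toy single-user instance in which the follower's best response has a closed form, so the leader's problem reduces to a one-dimensional search (standing in for the MPEC-to-MILP machinery of the paper). The valuation, saturation, and cost constants are hypothetical:

```python
# Toy Stackelberg game in the spirit of OaaC: the operator (leader) prices
# ESS capacity; a user (follower) orders q to maximize its own surplus.
# v (user valuation), a (saturation), c (operator unit cost) are invented.

def follower_order(price, v=10.0, a=1.0):
    """User best response: argmax_q  v*q - 0.5*a*q**2 - price*q,  q >= 0."""
    return max(0.0, (v - price) / a)

def leader_profit(price, c=2.0):
    return (price - c) * follower_order(price)

# The leader enumerates candidate prices (a grid stands in for the MILP).
prices = [p / 100.0 for p in range(0, 1001)]
best_price = max(prices, key=leader_profit)
best_profit = leader_profit(best_price)
```

For this quadratic follower the analytic optimum is price (v + c) / 2 = 6 with profit 16, which the grid search recovers.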
3. Li J, Ma Y, Gao R, Cao Z, Lim A, Song W, Zhang J. Deep Reinforcement Learning for Solving the Heterogeneous Capacitated Vehicle Routing Problem. IEEE Transactions on Cybernetics 2022; 52:13572-13585. PMID: 34554923. DOI: 10.1109/tcyb.2021.3111082.
Abstract
Existing deep reinforcement learning (DRL)-based methods for solving the capacitated vehicle routing problem (CVRP) intrinsically assume a homogeneous vehicle fleet, in which the fleet is treated as repetitions of a single vehicle. Hence, the key to constructing a solution lies solely in selecting the next node (customer) to visit, with no need to select a vehicle. However, vehicles in real-world scenarios are likely to be heterogeneous, with different characteristics that affect their capacity (or travel speed), rendering existing DRL methods less effective. In this article, we tackle the heterogeneous CVRP (HCVRP), where vehicles are mainly characterized by different capacities. We consider both min-max and min-sum objectives for the HCVRP, which aim to minimize the longest or the total travel time of the vehicles in the fleet, respectively. To solve these problems, we propose a DRL method based on the attention mechanism, with a vehicle selection decoder accounting for the heterogeneous fleet constraint and a node selection decoder accounting for the route construction, which learns to construct a solution by automatically selecting both a vehicle and a node for that vehicle at each step. Experimental results on randomly generated instances show that, with desirable generalization to various problem sizes, our method outperforms the state-of-the-art DRL method and most of the conventional heuristics, and also delivers competitive performance against the state-of-the-art heuristic method, slack induction by string removal. In addition, extended experiments demonstrate that our method can also solve CVRPLib instances with satisfactory performance.
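The distinctive action space here, selecting a (vehicle, node) pair at each construction step, can be mimicked with a simple myopic greedy for the min-max objective. Coordinates, demands, and capacities below are invented, and the learned attention policy would replace the greedy rule (which, unlike a trained model, can deadlock on tightly capacitated instances):

```python
# Greedy HCVRP baseline sharing the paper's action space: at every step
# pick a (vehicle, node) pair, not only the next node. All instance data
# are invented for illustration.
import math

depot = (0.0, 0.0)
customers = {1: ((2.0, 1.0), 3), 2: ((-1.0, 2.0), 2),
             3: ((1.0, -2.0), 4), 4: ((-2.0, -1.0), 1)}  # id: (xy, demand)
fleet = {"A": 5, "B": 8}                                  # id: capacity

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

pos = {v: depot for v in fleet}
load = dict(fleet)                        # remaining capacity per vehicle
travel = {v: 0.0 for v in fleet}          # travel time so far
routes = {v: [] for v in fleet}
unserved = set(customers)

while unserved:
    # Min-max flavour: among feasible (vehicle, node) pairs, pick the one
    # minimizing the resulting maximum route time.
    choices = [(max(travel[v] + dist(pos[v], customers[n][0]),
                    *(travel[u] for u in fleet if u != v)), v, n)
               for v in fleet for n in unserved
               if customers[n][1] <= load[v]]
    _, v, n = min(choices)
    xy, demand = customers[n]
    travel[v] += dist(pos[v], xy)
    pos[v] = xy
    load[v] -= demand
    routes[v].append(n)
    unserved.remove(n)

makespan = max(travel[v] + dist(pos[v], depot) for v in fleet)
```

The tuple `(cost, vehicle, node)` plays the role of the joint action scored by the two decoders; the DRL method learns this scoring instead of hand-coding it.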
4. Chen S, Xu C, Yan Z, Guan X, Le X. Accommodating Strategic Players in Distributed Algorithms for Power Dispatch Problems. IEEE Transactions on Cybernetics 2022; 52:12594-12603. PMID: 34166217. DOI: 10.1109/tcyb.2021.3085400.
Abstract
Distributed algorithms are gaining increasing research interest in the area of power system optimization and dispatch. Existing distributed power dispatch algorithms (DPDAs) usually assume that suppliers and consumers bid truthfully. However, this article shows the need for DPDAs to consider strategic players and to account for deviations of their behavior from what the DPDAs expect. To address this, we propose a distributed strategy update algorithm (DSUA) on top of a DPDA. The DSUA considers strategic suppliers who optimize their bids in a DPDA, using only the information accessible from the DPDA, that is, the price. The DSUA also considers the cases in which suppliers update their bids alternately or simultaneously. In both cases, we show the closeness of the supplier bids to the Nash equilibrium via game-theoretic analysis as well as simulation.
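The alternate and simultaneous update cases can be illustrated with best-response iterations in a Cournot-style quantity game, a stand-in for the price-mediated bidding of the paper; the demand intercept a = 12 and cost c = 0 are invented:

```python
# Two strategic suppliers updating bids either alternately (Gauss-Seidel)
# or simultaneously (Jacobi). Cournot best response q_i = (a - c - q_j)/2;
# both schemes converge to the Nash equilibrium (a - c)/3 in this game.

def best_response(q_other, a=12.0, c=0.0):
    return (a - c - q_other) / 2.0

def iterate(mode, steps=60):
    q1 = q2 = 0.0
    for _ in range(steps):
        if mode == "alternate":           # players move one after the other
            q1 = best_response(q2)
            q2 = best_response(q1)
        else:                             # "simultaneous": both use old bids
            q1, q2 = best_response(q2), best_response(q1)
    return q1, q2

nash = 12.0 / 3.0                         # analytic equilibrium (a - c) / 3
alt = iterate("alternate")
sim = iterate("simultaneous")
```

The Jacobi variant oscillates around the equilibrium while the Gauss-Seidel variant approaches it monotonically, but with best-response slope 1/2 both converge geometrically.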
5. Yuwen C, Wang X, Liu S, Zhang X, Sun B. Distributed Nash equilibrium seeking strategy with incomplete information. ISA Transactions 2022; 129:372-379. PMID: 35125213. DOI: 10.1016/j.isatra.2022.01.022.
Abstract
In this paper, two kinds of distributed Nash equilibrium seeking strategies based on the Kalman filter are proposed for non-cooperative games with incomplete information. In a discrete-time system with process and measurement noises, each player, selfishly considering only its own profit, uses the gradient method to maximize its benefit. Since the payoff function depends on all players' states, a Kalman filter and leader-following consensus are used to estimate the states in the network. Furthermore, considering the trade-off between the precision of the Nash equilibrium strategy and the communication rate, another Nash equilibrium seeking method is proposed by introducing an event-based scheduler. The convergence of both Nash equilibrium seeking strategies is analyzed using the Lyapunov method. It is proved that both strategies are bounded in the mean square sense. Simulation examples are given to verify their effectiveness.
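The gradient-play core of such a seeking strategy can be sketched for a two-player quadratic game, with a constant-gain filter as a crude stand-in for the Kalman-filter and consensus estimation of the paper; all payoff coefficients and noise levels are invented:

```python
# Gradient play in a two-player quadratic game with noisy observations.
# Player i maximizes J_i = -(x_i - 0.5*x_j - 1)^2 using a filtered estimate
# of the opponent's state; the constant-gain filter below is only a crude
# stand-in for a Kalman filter. All constants are invented.
import random

random.seed(0)
x = [0.0, 0.0]                 # true states
est = [0.0, 0.0]               # each player's estimate of the *other* state
eta, gain, noise = 0.05, 0.2, 0.01

for _ in range(2000):
    for i in (0, 1):
        j = 1 - i
        measurement = x[j] + random.gauss(0.0, noise)
        est[i] += gain * (measurement - est[i])        # filter update
        grad = -2.0 * (x[i] - 0.5 * est[i] - 1.0)      # dJ_i/dx_i
        x[i] += eta * grad                             # gradient ascent

# Analytic Nash equilibrium of this game: x1 = x2 = 2.
```

The mean-square boundedness claimed in the abstract shows up here as a small residual fluctuation around the equilibrium driven by the measurement noise.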
Affiliation(s)
- Cheng Yuwen, Xiaowen Wang, Shuai Liu, Xianfu Zhang, and Bo Sun: School of Control Science and Engineering, Shandong University, Jinan 250012, China.
6. Zhang H, Yue D, Dou C, Hancke GP. Resilient Optimal Defensive Strategy of Micro-Grids System via Distributed Deep Reinforcement Learning Approach Against FDI Attack. IEEE Transactions on Neural Networks and Learning Systems 2022; PP:598-608. PMID: 35622801. DOI: 10.1109/tnnls.2022.3175917.
Abstract
The ever-increasing threat of false data injection (FDI) attacks on the demand side brings great challenges to the energy management of interconnected microgrids. To address this, this article proposes a resilient optimal defensive strategy based on a distributed deep reinforcement learning (DRL) approach. To evaluate the impact of FDI attacks on demand response (DR), an online evaluation approach based on the recursive least-squares (RLS) method is proposed to assess the extent to which the supply security and voltage stability of the microgrid system are affected by the attack. On the basis of the evaluated security confidence, a distributed actor network learning approach is proposed to obtain optimal network weights, which generate an optimal defensive scheme addressing both the economic and the security objectives of the microgrid system. From a methodological point of view, the approach also enhances the autonomy of each microgrid and accelerates DRL efficiency. Simulation results reveal that the proposed method evaluates the impact of FDI attacks well and that the improved distributed DRL approach is a viable and promising way to optimally defend microgrids against FDI attacks on the demand side.
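The RLS evaluation step can be sketched as a standard recursive least-squares estimator tracking a scalar demand model online, so that FDI-induced deviations would show up as abnormal parameter drift; the model, forgetting factor, and data are synthetic stand-ins:

```python
# Recursive least squares (RLS) on a synthetic scalar demand model
# y = theta * phi + noise. All constants are invented for illustration.
import random

random.seed(7)
theta_true = 2.5                 # unknown sensitivity (invented)
theta, P, lam = 0.0, 1e3, 0.99   # estimate, covariance, forgetting factor

for _ in range(400):
    phi = random.uniform(0.5, 1.5)               # regressor (e.g. price signal)
    y = theta_true * phi + random.gauss(0.0, 0.01)
    K = P * phi / (lam + phi * P * phi)          # gain
    theta += K * (y - phi * theta)               # parameter update
    P = (1.0 - K * phi) * P / lam                # covariance update
```

The forgetting factor below 1 keeps the estimator responsive, which is what makes a sudden, attack-induced change in the demand relationship detectable online.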
7. Incentive Mechanisms for Smart Grid: State of the Art, Challenges, Open Issues, Future Directions. Big Data and Cognitive Computing 2022. DOI: 10.3390/bdcc6020047.
Abstract
Smart grids (SG) are electricity grids that communicate with each other, provide reliable information, and enable administrators to operate energy supplies across the country, ensuring optimized reliability and efficiency. The smart grid contains sensors that measure and transmit data to adjust the flow of electricity automatically based on supply and demand, so responding to problems becomes quicker and easier. This also plays a crucial role in controlling carbon emissions, by avoiding energy losses during peak load hours and ensuring optimal energy management. The scope of big data analytics in smart grids is huge, as raw data are collected and intelligent information is derived from them. However, these benefits of the smart grid depend on the active and voluntary participation of consumers in real time. Consumers need to be motivated and conscious to avail themselves of the achievable benefits. Incentivizing the appropriate actor is an absolute necessity to encourage prosumers to generate renewable energy sources (RES) and to motivate industries to establish plants that support sustainable and green-energy-based processes or products. The current study emphasizes these aspects and presents a comprehensive survey of the state-of-the-art contributions pertinent to incentive mechanisms in smart grids, which can be used to optimize power distribution during peak times and to reduce carbon emissions. The various technologies used in implementing incentive mechanisms in smart grids, such as game theory, blockchain, and artificial intelligence, are discussed, followed by different incentive projects being implemented across the globe. The lessons learnt, the challenges faced in such implementations, and open issues such as data quality, privacy, security, and pricing related to incentive mechanisms in SG are identified to guide the future scope of research in this sector.
8. Li J, Gu C, Wu Z, Huang T. Online Learning Algorithm for Distributed Convex Optimization With Time-Varying Coupled Constraints and Bandit Feedback. IEEE Transactions on Cybernetics 2022; 52:1009-1020. PMID: 32452789. DOI: 10.1109/tcyb.2020.2990796.
Abstract
This article focuses on multiagent distributed constrained optimization problems in a dynamic environment, in which a group of agents aims to cooperatively optimize a sum of time-changing local cost functions subject to time-varying coupled constraints. Both the local cost functions and the constraint functions are not revealed to an individual agent until an action is submitted. We first investigate a gradient-feedback scenario, where each agent can access both the values and the gradients of its own cost and constraint functions at the chosen action. We then design a distributed primal-dual online learning algorithm and show that it achieves sublinear bounds for both the regret and the constraint violations. Furthermore, we extend the gradient-feedback algorithm to a gradient-free setup, where an individual agent obtains only the values of its local cost and constraint functions at two queried points near the selected action. We develop a bandit version of the previous method and establish explicit sublinear bounds on the expected regret and expected constraint violations. The results indicate that the bandit algorithm can achieve almost the same performance as the gradient-feedback algorithm under mild conditions. Finally, numerical simulations on an electric vehicle charging problem demonstrate the effectiveness of the proposed algorithms.
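A single-agent, single-constraint slice of the primal-dual online update can be sketched as follows; the cost f(x) = (x - 1)^2 and constraint g(x) = x - 0.5 <= 0 are invented stand-ins for the time-varying functions in the article:

```python
# Primal-dual online update: projected gradient descent on the Lagrangian
# for the primal variable, dual ascent on the constraint. Cost and
# constraint here are fixed, invented stand-ins; in the article both vary
# over time and are revealed only after each action.

x, lam, eta = 0.0, 0.0, 0.05

for _ in range(3000):
    grad_f = 2.0 * (x - 1.0)               # gradient of f(x) = (x - 1)^2
    grad_g = 1.0                           # gradient of g(x) = x - 0.5
    x -= eta * (grad_f + lam * grad_g)     # primal descent on the Lagrangian
    x = min(max(x, -2.0), 2.0)             # projection onto the action set
    lam = max(0.0, lam + eta * (x - 0.5))  # dual ascent, kept nonnegative
```

The constrained optimum is x = 0.5 with multiplier 1, and the iterates settle there; the bandit version of the article would replace `grad_f` with a two-point gradient estimate.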
9. Bai W, Zhou Q, Li T, Li H. Adaptive Reinforcement Learning Neural Network Control for Uncertain Nonlinear System With Input Saturation. IEEE Transactions on Cybernetics 2020; 50:3433-3443. PMID: 31251205. DOI: 10.1109/tcyb.2019.2921057.
Abstract
In this paper, an adaptive neural network (NN) control problem is investigated for discrete-time nonlinear systems with input saturation. Radial-basis-function (RBF) NNs, including critic NNs and action NNs, are employed to approximate the utility functions and the system uncertainties, respectively. In previous works, a gradient descent scheme is applied to update the weight vectors, which may become trapped in local optima. To circumvent this problem, a multigradient recursive (MGR) reinforcement learning scheme is proposed, which utilizes both the current gradient and past gradients. As a consequence, the MGR scheme not only avoids the local-optimum problem but also guarantees a faster convergence rate than the gradient descent scheme. Moreover, the constraint of actuator input saturation is considered. Closed-loop stability is established using Lyapunov stability theory, and it is proved that all signals in the closed-loop system are semiglobally uniformly ultimately bounded (SGUUB). Finally, the effectiveness of the proposed approach is further validated via simulation results.
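The MGR idea, replacing the current gradient with an average over the last m gradients, can be sketched on a scalar quadratic objective; the objective, step size, and window length are invented, and the paper applies the idea to RBF-NN weight vectors rather than a scalar:

```python
# Multigradient recursive (MGR) style update: step along the average of the
# last m gradients instead of the current gradient alone. Objective and
# constants are invented for illustration.
from collections import deque

def grad(w):                      # d/dw of f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w, eta, m = 0.0, 0.1, 5
history = deque(maxlen=m)         # sliding window of past gradients

for _ in range(300):
    history.append(grad(w))
    w -= eta * sum(history) / len(history)   # MGR step: averaged gradient
```

On this convex toy problem both plain gradient descent and MGR reach the minimizer; the averaging is meant to smooth the update direction, which is where the claimed robustness against local optima comes from in the nonconvex NN setting.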
10. Wang Z, Li HX, Chen C. Reinforcement Learning-Based Optimal Sensor Placement for Spatiotemporal Modeling. IEEE Transactions on Cybernetics 2020; 50:2861-2871. PMID: 30892267. DOI: 10.1109/tcyb.2019.2901897.
Abstract
A reinforcement learning-based method is proposed for optimal sensor placement in the spatial domain for modeling distributed parameter systems (DPSs). First, a low-dimensional subspace, derived by Karhunen-Loève decomposition, is identified to capture the dominant dynamic features of the DPS. Second, a spatial objective function is proposed for the sensor placement. This function is defined in the obtained low-dimensional subspace by exploiting the time-space separation property of distributed processes and aims at minimizing the modeling error over the entire time and space domain. Third, the sensor placement problem is mathematically formulated as a Markov decision process (MDP) with specified elements. Finally, the sensor locations are optimized by learning the optimal policies of the MDP according to the spatial objective function. Experimental results from a simulated catalytic rod and a real snap curing oven system demonstrate the feasibility and efficiency of the proposed method in solving combinatorial optimization problems such as optimal sensor placement.
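The Karhunen-Loève step can be sketched as an SVD of a snapshot matrix; the synthetic two-mode field below is invented, so two KL modes should capture nearly all of its energy:

```python
# Karhunen-Loeve (POD) decomposition of snapshots of a distributed process,
# computed via SVD. The "field" below is a synthetic two-mode signal plus
# small noise, invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0.0, np.pi, 40)               # spatial grid
snapshots = np.array([
    np.sin(xs) * np.sin(0.1 * t) + 0.5 * np.sin(2 * xs) * np.cos(0.2 * t)
    + 0.001 * rng.normal(size=xs.size)
    for t in range(200)
])                                              # shape (time, space)

# KL decomposition = SVD of the snapshot matrix.
U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.99) + 1)      # modes for 99% of the energy

recon = U[:, :k] * s[:k] @ Vt[:k]               # rank-k reconstruction
rel_err = np.linalg.norm(snapshots - recon) / np.linalg.norm(snapshots)
```

The rows of `Vt[:k]` are the dominant spatial modes; the sensor-placement objective of the paper is then defined in this low-dimensional subspace rather than on the full field.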
11. An Adaptive Fuzzy Predictive Controller with Hysteresis Compensation for Piezoelectric Actuators. Cognitive Computation 2020. DOI: 10.1007/s12559-020-09722-8.
12. Multi-Agent Reinforcement Learning Approach for Residential Microgrid Energy Scheduling. Energies 2019. DOI: 10.3390/en13010123.
Abstract
The residential microgrid is widely considered a new paradigm of the home energy management system. The complexity of Microgrid Energy Scheduling (MES) is increasing with the integration of Electric Vehicles (EVs) and Renewable Generations (RGs). Moreover, it is challenging to determine optimal scheduling strategies that guarantee the efficiency of the microgrid market and balance all market participants' benefits. In this paper, a Multi-Agent Reinforcement Learning (MARL) approach for residential MES is proposed to promote the autonomy and fairness of microgrid market operation. First, a multi-agent based residential microgrid model including Vehicle-to-Grid (V2G) and RGs is constructed, and an auction-based microgrid market is built. Then, in contrast to Single-Agent Reinforcement Learning (SARL), MARL achieves distributed autonomous learning for each agent and realizes an equilibrium among all agents' benefits; we therefore formulate an equilibrium-based MARL framework according to each participant's market orientation. Finally, to guarantee the fairness and privacy of the MARL process, we propose an improved optimal Equilibrium Selection-MARL (ES-MARL) algorithm based on two mechanisms: private negotiation and maximum average reward. Simulation results demonstrate that the overall performance and efficiency of the proposed MARL are superior to those of SARL. Moreover, it is verified that the improved ES-MARL achieves a higher average profit while balancing all agents.
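The maximum-average-reward equilibrium selection can be sketched on a bimatrix game: enumerate the pure Nash equilibria, then pick the one with the highest average payoff. The stag-hunt payoffs are a textbook example, not taken from the paper:

```python
# Equilibrium selection by maximum average reward, reduced to a 2x2
# bimatrix game. payoffs[(a1, a2)] = (reward_agent1, reward_agent2);
# the stag-hunt numbers are a stock example, not from the paper.

payoffs = {("stag", "stag"): (4, 4), ("stag", "hare"): (0, 3),
           ("hare", "stag"): (3, 0), ("hare", "hare"): (3, 3)}
actions = ("stag", "hare")

def is_nash(a1, a2):
    u1, u2 = payoffs[(a1, a2)]
    no_dev1 = all(payoffs[(d, a2)][0] <= u1 for d in actions)  # agent 1
    no_dev2 = all(payoffs[(a1, d)][1] <= u2 for d in actions)  # agent 2
    return no_dev1 and no_dev2

equilibria = [(a1, a2) for a1 in actions for a2 in actions if is_nash(a1, a2)]
selected = max(equilibria, key=lambda e: sum(payoffs[e]) / 2.0)
```

The game has two pure equilibria, and the maximum-average-reward rule picks the payoff-dominant one, which is the role this mechanism plays in the ES-MARL algorithm.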
13. A Q-Cube Framework of Reinforcement Learning Algorithm for Continuous Double Auction among Microgrids. Energies 2019. DOI: 10.3390/en12152891.
Abstract
Decision-making by microgrids in a dynamic, uncertain bidding environment has always been a significant subject of interest in the context of energy markets. The emerging application of reinforcement learning algorithms in energy markets provides solutions to this problem. In this paper, we investigate the potential of applying a Q-learning algorithm to a continuous double auction mechanism. By choosing the global supply and demand relationship as the state and considering both bidding price and quantity as actions, a new Q-learning architecture is proposed to better reflect personalized bidding preferences and respond to real-time market conditions. A battery energy storage system provides an alternative form of demand response by exploiting its available capacity. A Q-cube framework is designed to describe the iteration of the Q-value distribution. Results from a case study on 14 microgrids in Guizhou Province, China indicate that the proposed Q-cube framework is capable of making rational bidding decisions and raising the microgrids' profits.
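The Q-cube indexing, a Q-table over (market state, bid price, bid quantity), can be sketched with a toy clearing rule; the states, price and quantity grids, clearing prices, and cost below are all invented:

```python
# Skeleton of the Q-cube idea: one Q-value per (state, price, quantity)
# triple, so an action is a *pair* of bid price and bid quantity. The
# clearing rule and all numbers are toy stand-ins for the continuous
# double auction of the paper.
import random

random.seed(3)
states = ("shortage", "balanced", "surplus")      # supply/demand relationship
prices = (3.0, 4.0, 5.0)                          # candidate bid prices
quantities = (1.0, 2.0)                           # candidate bid quantities
clearing = {"shortage": 5.5, "balanced": 4.5, "surplus": 3.5}
COST = 2.0

Q = {(s, p, q): 0.0 for s in states for p in prices for q in quantities}
alpha, eps = 0.1, 0.2

for _ in range(20000):
    s = random.choice(states)
    if random.random() < eps:                     # epsilon-greedy exploration
        p, q = random.choice(prices), random.choice(quantities)
    else:
        p, q = max(((Q[(s, pp, qq)], pp, qq) for pp in prices
                    for qq in quantities))[1:]
    sold = p <= clearing[s]                       # toy acceptance rule
    reward = (p - COST) * q if sold else 0.0
    Q[(s, p, q)] += alpha * (reward - Q[(s, p, q)])   # bandit-style update

best_surplus = max(((Q[("surplus", p, q)], p, q) for p in prices
                    for q in quantities))[1:]
```

After training, the greedy slice of the cube for the "surplus" state bids low (price 3.0) at full quantity, while in "shortage" the highest price is learned to be best.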
14. Ma K, Wang C, Yang J, Hua C, Guan X. Pricing Mechanism With Noncooperative Game and Revenue Sharing Contract in Electricity Market. IEEE Transactions on Cybernetics 2019; 49:97-106. PMID: 29990181. DOI: 10.1109/tcyb.2017.2766171.
Abstract
In this paper, a pricing mechanism is proposed for the electricity supply chain, which consists of one generation company (GC), multiple consumers, and competing utility companies (UCs). The UCs participate in electricity supply chain management through a revenue sharing contract (RSC). In the electricity supply chain, real-time balance between supply and demand plays an important role in the stable operation of the power system; we therefore introduce demand response into the supply chain to match supply with demand under forecast errors. We formulate a noncooperative game to characterize the interactions among the multiple competing UCs, which set retail prices to maximize their profits. In addition, the UCs select their preferred contractual terms from those offered by the GC, which maximizes profits and coordinates the electricity supply chain simultaneously. The existence and uniqueness of the Nash equilibrium (NE) are examined, and an iterative algorithm is developed to obtain the NE. Furthermore, we analyze how the RSC coordinates the electricity supply chain and aligns the NE with the cooperative optimum. Finally, numerical results demonstrate the superiority of the proposed model and the influence of market demand disruptions on the profits of the UCs, the GC, and the supply chain.
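The iterative algorithm for reaching the NE can be sketched with best-response iterations for two UCs facing linear demand; the demand coefficients and wholesale price are invented, and the RSC and demand-response layers of the paper are omitted:

```python
# Best-response iteration to the retail-price Nash equilibrium of two
# competing UCs with linear demand d_i = a - b*p_i + c*p_j and wholesale
# price w. All coefficients are invented for illustration.

a, b, c, w = 10.0, 2.0, 1.0, 1.0

def best_price(p_other):
    # argmax_p (p - w) * (a - b*p + c*p_other), from the first-order condition
    return (a + b * w + c * p_other) / (2.0 * b)

p1 = p2 = 0.0
for _ in range(100):
    p1, p2 = best_price(p2), best_price(p1)

nash = (a + b * w) / (2.0 * b - c)    # symmetric closed-form equilibrium
```

Because the best-response slope c / (2b) is below one, the iteration is a contraction and converges to the unique NE, mirroring the existence and uniqueness analysis in the abstract.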
15. Luo B, Liu D, Wu HN, Wang D, Lewis FL. Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control. IEEE Transactions on Cybernetics 2017; 47:3341-3354. PMID: 27893404. DOI: 10.1109/tcyb.2016.2623859.
Abstract
The model-free optimal control problem for general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal controller. By using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, an adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, showing that the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
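The data-based flavour of PGADP, improving a policy from sampled transitions and a learned Q-function alone, can be sketched with tabular Q-learning on a small deterministic chain; the actor-critic NN structure of the paper is reduced to a Q-table here, and the MDP is invented:

```python
# Data-based policy improvement in the spirit of PGADP: learn Q purely from
# sampled (state, action, reward, next state) tuples, never from the model,
# then act greedily. A 4-state deterministic chain stands in for the general
# nonlinear system; all numbers are invented.
import random

random.seed(5)
N_STATES, ACTIONS = 4, (-1, +1)       # move left / right on a chain
GOAL = 3                              # absorbing rewarding state

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else -0.1)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
gamma, alpha = 0.9, 0.2

# Learn from randomly sampled data, as in an offline phase.
for _ in range(5000):
    s = random.randrange(N_STATES)
    a = random.choice(ACTIONS)
    s2, r = step(s, a)
    target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

The greedy step over the learned Q-function plays the role of the actor update, while the temporal-difference target plays the role of the critic; the paper replaces both tables with NN approximators fitted by weighted residuals.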