1. Chen L, Dai SL, Dong C. Adaptive Optimal Tracking Control of an Underactuated Surface Vessel Using Actor-Critic Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7520-7533. [PMID: 36449582] [DOI: 10.1109/tnnls.2022.3214681]
Abstract
In this article, we present an adaptive reinforcement learning optimal tracking control (RLOTC) algorithm for an underactuated surface vessel subject to modeling uncertainties and time-varying external disturbances. By integrating the backstepping technique with optimized control design, we show that the desired optimal tracking performance of vessel control is guaranteed because the virtual and actual control inputs are designed as optimized solutions of each subsystem. To enhance the robustness of the vessel control system, we employ neural network (NN) approximators to approximate uncertain vessel dynamics and present an adaptive control technique to estimate the upper bounds of the external disturbances. Under the reinforcement learning framework, we construct actor-critic networks to solve the Hamilton-Jacobi-Bellman equations corresponding to the subsystems of the surface vessel and thereby achieve optimized control. The optimized control algorithm synchronously trains the adaptive parameters not only of the actor-critic networks but also of the NN approximators and the adaptive control law. By the Lyapunov stability theorem, we show that the RLOTC algorithm ensures semiglobal uniform ultimate boundedness of the closed-loop system. Compared with existing reinforcement learning control results, the presented RLOTC algorithm compensates for uncertain vessel dynamics and unknown disturbances, and obtains optimized control performance by considering optimization in every backstepping design step. Simulation studies on an underactuated surface vessel illustrate the effectiveness of the RLOTC algorithm.
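As general orientation for this and several later entries, the per-subsystem optimization described here follows the standard continuous-time Hamilton-Jacobi-Bellman (HJB) template; the sketch below assumes generic affine dynamics and a quadratic cost in our own notation, not the paper's exact subsystem equations:

```latex
% Affine dynamics \dot{x} = f(x) + g(x)u, cost J = \int_0^\infty \big(x^\top Q x + u^\top R u\big)\,dt.
0 = \min_{u}\Big[\, x^\top Q x + u^\top R u + \big(\nabla V^*(x)\big)^{\!\top}\big(f(x) + g(x)u\big) \Big],
\qquad
u^*(x) = -\tfrac{1}{2}\,R^{-1} g(x)^\top \nabla V^*(x).
```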
2. Zhu L, Guo P, Wei Q. Synergetic learning for unknown nonlinear H∞ control using neural networks. Neural Networks 2023; 168:287-299. [PMID: 37774514] [DOI: 10.1016/j.neunet.2023.09.029]
Abstract
The well-known H∞ control design gives robustness to a controller by rejecting perturbations from the external environment, which is difficult to achieve for completely unknown affine nonlinear systems. Accordingly, the immediate objective of this paper is to develop an online, real-time synergetic learning algorithm so that a data-driven H∞ controller can be obtained. By converting the H∞ control problem into a two-player zero-sum game, a model-free Hamilton-Jacobi-Isaacs equation (MF-HJIE) is first derived using off-policy reinforcement learning, followed by a proof of equivalence between the MF-HJIE and the conventional HJIE. Next, by applying temporal differences to the MF-HJIE, a synergetic evolutionary rule with experience replay is designed to learn the optimal value function, the optimal control, and the worst-case perturbation, all of which can be performed online and in real time along the system state trajectory. It is proven that the synergetic learning system constructed by the system plant and the evolutionary rule is uniformly ultimately bounded. Finally, simulation results on an F16 aircraft system and a nonlinear system back up the tractability of the proposed method.
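For reference, the two-player zero-sum formulation mentioned in this abstract rests on an HJI equation that, under the usual affine-dynamics assumptions (our notation, with penalty output h(x) and attenuation level γ), reads:

```latex
% Dynamics \dot{x} = f(x) + g(x)u + k(x)w, attenuation level \gamma, penalty output h(x).
0 = \big(\nabla V\big)^{\!\top}\big(f + g\,u^* + k\,w^*\big) + h^\top h + {u^*}^{\top} R\, u^* - \gamma^2 \|w^*\|^2,
\qquad
u^* = -\tfrac{1}{2}R^{-1}g^\top \nabla V,
\quad
w^* = \tfrac{1}{2\gamma^2}k^\top \nabla V.
```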
Affiliation(s)
- Liao Zhu
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai, 519087, Guangdong, China; School of Systems Science, Beijing Normal University, Beijing, 100875, China.
- Ping Guo
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai, 519087, Guangdong, China; School of Systems Science, Beijing Normal University, Beijing, 100875, China.
- Qinglai Wei
- The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Systems Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China.
3. Liu G, Sun Q, Wang R, Hu X. Nonzero-Sum Game-Based Voltage Recovery Consensus Optimal Control for Nonlinear Microgrids System. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:8617-8629. [PMID: 35275823] [DOI: 10.1109/tnnls.2022.3151650]
Abstract
Most existing microgrid (MG) models are nonlinear, which can cause controller oscillation and excessive line loss and also complicates controller design for MG systems. This article therefore studies the distributed voltage recovery consensus optimal control problem for a nonlinear MG system with N distributed generations (DGs) while maintaining stringent real power sharing. First, based on the distributed cooperative control concept of multiagent systems and critic neural networks (NNs), a novel distributed secondary voltage recovery consensus optimal control protocol is constructed by applying the backstepping technique and a nonzero-sum (NZS) differential game strategy to realize voltage recovery of islanded MGs. Meanwhile, a model identifier based on a three-layer NN is established to reconstruct the unknown NZS game system. Then, a critic NN weight adaptive tuning law is proposed to ensure convergence of the cost functions and stability of the closed-loop system. Furthermore, according to Lyapunov stability theory, it is proven that all signals in the closed-loop system are uniformly ultimately bounded and that the voltage recovery synchronization error converges to an arbitrarily small neighborhood of the origin. Finally, simulation results in MATLAB illustrate the validity of the proposed control strategy.
4. Luo R, Peng Z, Hu J, Ghosh BK. Adaptive optimal control of affine nonlinear systems via identifier-critic neural network approximation with relaxed PE conditions. Neural Networks 2023; 167:588-600. [PMID: 37703669] [DOI: 10.1016/j.neunet.2023.08.044]
Abstract
This paper considers the optimal control of an affine nonlinear system with unknown system dynamics. A new identifier-critic (IC) framework is proposed to solve the optimal control problem. Firstly, a neural network (NN) identifier is built to estimate the unknown system dynamics, and a critic NN is constructed to solve the Hamilton-Jacobi-Bellman equation associated with the optimal control problem. A dynamic regressor extension and mixing technique is applied to design the weight update laws with relaxed persistence of excitation conditions for the two classes of neural networks. The parameter estimation of the update laws and the stability of the closed-loop system under the adaptive optimal control are analyzed using a Lyapunov function method. Numerical simulation results demonstrate the effectiveness of the proposed IC learning-based optimal control algorithm for the affine nonlinear system.
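To make the critic half of such a scheme concrete, here is a minimal normalized-gradient update on the HJB residual for a toy plant; the dynamics, basis, gains, and probing noise are illustrative placeholders, and the paper's dynamic regressor extension and mixing update is deliberately not reproduced:

```python
import numpy as np

# Illustrative 2-D affine plant (placeholder dynamics, not the paper's system).
f = lambda x: np.array([x[1], -x[0] - 0.5 * x[1]])
g = lambda x: np.array([0.0, 1.0])
Q, R = np.eye(2), 1.0

# Quadratic critic basis phi(x) = [x1^2, x1*x2, x2^2], so V(x) ~ W^T phi(x).
phi_grad = lambda x: np.array([[2 * x[0], 0.0], [x[1], x[0]], [0.0, 2 * x[1]]])

W = np.zeros(3)            # critic weights
alpha, dt = 2.0, 0.005
x, t = np.array([1.0, -1.0]), 0.0

for _ in range(40000):
    gradV = phi_grad(x).T @ W
    u = -0.5 / R * g(x) @ gradV + 0.2 * np.sin(7 * t)   # probing noise for excitation
    xdot = f(x) + g(x) * u
    # HJB/Bellman residual e; drive it toward zero by normalized gradient descent.
    e = x @ Q @ x + R * u**2 + gradV @ xdot
    sigma = phi_grad(x) @ xdot
    W -= alpha * dt * sigma * e / (1.0 + sigma @ sigma) ** 2
    x, t = x + xdot * dt, t + dt

print("learned critic weights:", W)
```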
Affiliation(s)
- Rui Luo
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Zhinan Peng
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Jiangping Hu
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou 313001, China.
- Bijoy Kumar Ghosh
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, 79409-1042, USA
5. Lv Y, Na J, Zhao X, Huang Y, Ren X. Multi-H∞ Controls for Unknown Input-Interference Nonlinear System With Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5601-5613. [PMID: 34874874] [DOI: 10.1109/tnnls.2021.3130092]
Abstract
This article studies the multi-H∞ controls for input-interference nonlinear systems via an adaptive dynamic programming (ADP) method, which allows multiple inputs to have individual selfish strategy components to resist weighted interference. In this line, the ADP scheme is used to learn the Nash-optimization solutions of the input-interference nonlinear system such that multiple H∞ performance indices can reach the defined Nash equilibrium. First, the input-interference nonlinear system is given and the Nash equilibrium is defined. An adaptive neural network (NN) observer is introduced to identify the input-interference nonlinear dynamics. Then, critic NNs are used to learn the multiple H∞ performance indices. A novel adaptive law is designed to update the critic NN weights by minimizing the Hamilton-Jacobi-Isaacs (HJI) equation, which can be used to directly and effectively calculate the multi-H∞ controls from input-output data, so that the actor structure is avoided. Moreover, the control system stability and the convergence of the updated parameters are proved. Finally, two numerical examples are simulated to verify the proposed ADP scheme for the input-interference nonlinear system.
6. Zhao S, Wang J, Xu H, Wang B. Composite Observer-Based Optimal Attitude-Tracking Control With Reinforcement Learning for Hypersonic Vehicles. IEEE Transactions on Cybernetics 2023; 53:913-926. [PMID: 35969557] [DOI: 10.1109/tcyb.2022.3192871]
Abstract
This article proposes an observer-based reinforcement learning (RL) control approach to address the optimal attitude-tracking problem and its application to hypersonic vehicles in the reentry phase. Because of the unknown uncertainty and nonlinearity caused by parameter perturbation and external disturbance, accurate model information of hypersonic vehicles in the reentry phase is generally unavailable. For this reason, a novel synchronous estimation is proposed to construct a composite observer for hypersonic vehicles, which consists of a neural-network (NN)-based Luenberger-type observer and a synchronous disturbance observer. This solves the identification problem of the nonlinear dynamics in the reference control and realizes the estimation of the system state when unknown nonlinear dynamics and unknown disturbances exist at the same time. By synthesizing the information from the composite observer, an RL tracking controller is developed to solve the optimal attitude-tracking control problem. To improve the convergence of the critic network weights, concurrent learning is employed, replacing the traditional persistent excitation condition with replay of historical experience. In addition, this article proves that the weight estimation error is bounded when the learning rate satisfies the given sufficient condition. Finally, numerical simulation demonstrates the effectiveness and superiority of the proposed approach for attitude-tracking control of hypersonic vehicles.
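The concurrent-learning idea mentioned here, replacing persistent excitation with replayed recorded data, is commonly written as a weight update of the following form (a generic sketch in our notation, not the paper's exact law):

```latex
\dot{\hat W} = -\,\alpha\,\frac{\sigma(t)\,e(t)}{\big(1+\sigma(t)^{\!\top}\sigma(t)\big)^{2}}
\;-\;\alpha \sum_{k=1}^{M} \frac{\sigma_k\, e_k}{\big(1+\sigma_k^{\top}\sigma_k\big)^{2}},
```

where σ(t) is the current regressor, e(t) the current Bellman residual, and (σ_k, e_k) are recorded pairs whose stacked regressors satisfy a rank condition in place of persistent excitation.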
7. Lin M, Zhao B, Liu D. Policy gradient adaptive dynamic programming for nonlinear discrete-time zero-sum games with unknown dynamics. Soft Computing 2023. [DOI: 10.1007/s00500-023-07817-6]
8. Wu Q, Zhao B, Liu D, Polycarpou MM. Event-triggered adaptive dynamic programming for decentralized tracking control of input constrained unknown nonlinear interconnected systems. Neural Networks 2022; 157:336-349. [DOI: 10.1016/j.neunet.2022.10.025]
9. Xue S, Luo B, Liu D, Gao Y. Neural network-based event-triggered integral reinforcement learning for constrained H∞ tracking control with experience replay. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.119]
10. Wang K, Mu C. Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system. ISA Transactions 2022; 129:295-308. [PMID: 35216805] [DOI: 10.1016/j.isatra.2022.02.007]
Abstract
In this paper, based on an actor-critic neural network structure and a reinforcement learning scheme, a novel asynchronous learning algorithm with event communication is developed to solve the Nash equilibrium of a multiplayer nonzero-sum differential game in an adaptive fashion. From the optimal control point of view, each player, or local controller, wants to minimize its individual infinite-horizon cost function by finding an optimal policy. In this novel learning framework, each player consists of one critic and one actor and implements distributed asynchronous policy iteration to optimize its decision-making process. In addition, the communication burden between the system and the players is effectively reduced by setting up a central event generator. The critic network executes fast updates by gradient-descent adaptation, while the actor network gives event-induced updates using gradient projection. Closed-loop asymptotic stability is ensured along with uniform ultimate convergence. The effectiveness of the proposed algorithm is then substantiated on a four-player nonlinear system, revealing that it can significantly reduce the number of samples without impairing learning accuracy. Finally, by leveraging the nonzero-sum game idea, the proposed learning scheme is also applied to solve the lateral-directional stability of a linear aircraft system, and is further extended to a nonlinear vehicle system for achieving adaptive cruise control.
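A common way to implement the kind of central event generator described here is a threshold test on the gap between the current state and the last broadcast state; the sketch below is a generic static triggering rule of our own choosing, not the paper's condition:

```python
import numpy as np

def make_event_generator(beta: float = 0.5):
    """Return a trigger(x) implementing ||x - x_last|| > beta * ||x||."""
    x_last = {"value": None}

    def trigger(x: np.ndarray) -> bool:
        if x_last["value"] is None:
            x_last["value"] = x.copy()
            return True
        gap = np.linalg.norm(x - x_last["value"])   # measurement error since last event
        if gap > beta * np.linalg.norm(x):          # static threshold condition
            x_last["value"] = x.copy()              # broadcast: reset the gap
            return True
        return False

    return trigger

# Usage: players update their actors only at event instants.
gen = make_event_generator(beta=0.3)
for x in [np.array([1.0, 0.5]), np.array([0.98, 0.51]), np.array([0.2, 0.1])]:
    print(gen(x))   # True, False, True
```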
Affiliation(s)
- Ke Wang
- School of Electrical and Information Engineering, Tianjin University, Tianjin, China.
- Chaoxu Mu
- School of Electrical and Information Engineering, Tianjin University, Tianjin, China.
11. Gao X, Si J, Wen Y, Li M, Huang H. Reinforcement Learning Control of Robotic Knee With Human-in-the-Loop by Flexible Policy Iteration. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5873-5887. [PMID: 33956634] [DOI: 10.1109/tnnls.2021.3071727]
Abstract
We are motivated by the real challenges presented in a human-robot system to develop new designs that are efficient at the data level and offer performance guarantees, such as stability and optimality, at the system level. Existing approximate/adaptive dynamic programming (ADP) results that consider system performance theoretically do not readily provide practically useful learning control algorithms for this problem, and reinforcement learning (RL) algorithms that address data efficiency usually lack performance guarantees for the controlled system. This study fills these important voids by introducing innovative features to the policy iteration algorithm. We introduce flexible policy iteration (FPI), which can flexibly and organically integrate experience replay and supplemental values from prior experience into the RL controller. We show system-level performance, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system. We demonstrate the effectiveness of FPI via realistic simulations of the human-robot system. Notably, the problem we face in this study may be difficult to address by design methods based on classical control theory, as it is nearly impossible to obtain a customized mathematical model of a human-robot system either online or offline. The results we have obtained also indicate the great potential of RL control for solving realistic and challenging problems with high-dimensional control inputs.
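For readers unfamiliar with the policy-iteration backbone that FPI extends, a minimal discrete-time LQR policy iteration (policy evaluation via a Lyapunov equation, then greedy improvement) looks like this; the plant matrices are arbitrary illustrative values, not the knee-robot model:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative linear plant x_{k+1} = A x_k + B u_k with quadratic cost.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[1.0, 1.0]])  # initial stabilizing gain (u = -K x)

for i in range(20):
    Acl = A - B @ K
    # Policy evaluation: P solves Acl^T P Acl - P + Q + K^T R K = 0.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement: greedy gain w.r.t. the evaluated value function.
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-10:
        break
    K = K_new

print("converged gain:\n", K)
```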
12. Xue S, Luo B, Liu D, Gao Y. Event-Triggered ADP for Tracking Control of Partially Unknown Constrained Uncertain Systems. IEEE Transactions on Cybernetics 2022; 52:9001-9012. [PMID: 33661749] [DOI: 10.1109/tcyb.2021.3054626]
Abstract
An event-triggered adaptive dynamic programming (ADP) algorithm is developed in this article to solve the tracking control problem for partially unknown constrained uncertain systems. First, an augmented system is constructed, and the solution of the optimal tracking control problem of the uncertain system is transformed into an optimal regulation of the nominal augmented system with a discounted value function. Integral reinforcement learning is employed to avoid the requirement for the augmented drift dynamics. Second, event-triggered ADP is adopted for the implementation, where the learning of the neural network weights not only relaxes the requirement of an initial admissible control but also executes only when the predefined execution rule is violated. Third, the tracking error and the weight estimation error are proven to be uniformly ultimately bounded, and the existence of a lower bound for the interexecution times is analyzed. Finally, simulation results demonstrate the effectiveness of the present event-triggered ADP method.
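The augmented-system construction mentioned in this abstract is standard in optimal tracking ADP and, under the usual assumptions, can be sketched as follows (our notation):

```latex
% Tracking error e = x - x_d, reference generator \dot{x}_d = h(x_d),
% augmented state z = [e^\top,\, x_d^\top]^\top:
\dot{z} = F(z) + G(z)u, \qquad
V(z(t)) = \int_t^{\infty} e^{-\lambda(\tau - t)}\,\big(e^\top Q\, e + U(u)\big)\, d\tau,
```

with discount λ > 0 and a nonquadratic penalty U(u) encoding the input constraint; the discount keeps the value finite even though the reference does not decay.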
13. Event-triggered integral reinforcement learning for nonzero-sum games with asymmetric input saturation. Neural Networks 2022; 152:212-223. [DOI: 10.1016/j.neunet.2022.04.013]
14. Peng Z, Luo R, Hu J, Shi K, Nguang SK, Ghosh BK. Optimal Tracking Control of Nonlinear Multiagent Systems Using Internal Reinforce Q-Learning. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4043-4055. [PMID: 33587710] [DOI: 10.1109/tnnls.2021.3055761]
Abstract
In this article, a novel reinforcement learning (RL) method is developed to solve the optimal tracking control problem of unknown nonlinear multiagent systems (MASs). Different from representative RL-based optimal control algorithms, an internal reinforce Q-learning (IrQL) method is proposed, in which an internal reinforce reward (IRR) function is introduced for each agent to improve its capability of receiving more long-term information from the local environment. In the IrQL design, a Q-function is defined on the basis of the IRR function, and an iterative IrQL algorithm is developed to learn the optimal distributed control scheme, followed by rigorous convergence and stability analysis. Furthermore, a distributed online learning framework, namely, reinforce-critic-actor neural networks, is established in the implementation of the proposed approach, aimed at estimating the IRR function, the Q-function, and the optimal control scheme, respectively. The implementation is designed in a data-driven way, without needing knowledge of the system dynamics. Finally, simulations and comparison results with the classical method are given to demonstrate the effectiveness of the proposed tracking control method.
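As a rough illustration of the internal-reinforce idea (an intermediate reward signal that itself aggregates longer-term local information before feeding the Q-update), the sketch below layers a learned internal reward on top of ordinary tabular Q-learning; the environment and all constants are invented placeholders, not the paper's MAS setup:

```python
import numpy as np

n_states, n_actions = 5, 2
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
R_int = np.zeros(n_states)            # internal reinforce reward (IRR) estimate
alpha, beta, g_r, g_q = 0.1, 0.1, 0.9, 0.95

def step(s, a):
    # Toy chain environment (placeholder): action 1 moves right, reward at the end.
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, 1.0 if s2 == n_states - 1 else 0.0

s = 0
for _ in range(5000):
    a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
    s2, r = step(s, a)
    # IRR update: the internal reward accumulates discounted local rewards.
    R_int[s] += beta * (r + g_r * R_int[s2] - R_int[s])
    # Q-learning driven by the internal reward instead of the raw reward.
    Q[s, a] += alpha * (R_int[s] + g_q * Q[s2].max() - Q[s, a])
    s = s2 if s2 != n_states - 1 else 0   # reset at the terminal state

print(np.argmax(Q, axis=1))   # greedy policy per state
```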
15. Sun J, He H, Yi J, Pu Z. Finite-Time Command-Filtered Composite Adaptive Neural Control of Uncertain Nonlinear Systems. IEEE Transactions on Cybernetics 2022; 52:6809-6821. [PMID: 33301412] [DOI: 10.1109/tcyb.2020.3032096]
Abstract
This article presents a new command-filtered composite adaptive neural control scheme for uncertain nonlinear systems. Compared with existing works, this approach focuses on achieving finite-time convergent composite adaptive control for higher-order nonlinear systems with unknown nonlinearities, parameter uncertainties, and external disturbances. First, radial basis function neural networks (NNs) are utilized to approximate the unknown functions of the considered uncertain nonlinear system. By constructing prediction errors from serial-parallel nonsmooth estimation models, the prediction errors and the tracking errors are fused to update the weights of the NNs. Afterward, the composite adaptive neural backstepping control scheme is proposed via nonsmooth command filter and adaptive disturbance estimation techniques. The proposed control scheme ensures that high-precision tracking performance and NN approximation performance can be achieved simultaneously, while avoiding the singularity problem in the finite-time backstepping framework. Moreover, it is proved that all signals in the closed-loop control system converge in finite time. Finally, simulation results are given to illustrate the effectiveness of the proposed control scheme.
16. Trajectory Tracking within a Hierarchical Primitive-Based Learning Approach. Entropy 2022; 24:e24070889. [PMID: 35885112] [PMCID: PMC9321877] [DOI: 10.3390/e24070889]
Abstract
A hierarchical learning control framework (HLF) has been validated on two affordable control laboratories: an active temperature control system (ATCS) and an electrical rheostatic braking system (EBS). The proposed HLF is data-driven and model-free, while being applicable to general tracking control tasks, which are omnipresent. At the lowermost level, L1, virtual state-feedback control is learned from input-output data, using the recently proposed virtual state-feedback reference tuning (VSFRT) principle. L1 ensures linear reference model tracking (or matching) and thus indirect closed-loop control system (CLCS) linearization. On top of L1, experiment-driven model-free iterative learning control (EDMFILC) is applied for learning reference input-controlled output pairs, coined primitives. The primitives' signals at the L2 level encode the CLCS dynamics, which are not explicitly used in the learning phase. Data reusability is applied to derive monotonic and safely guaranteed learning convergence. The learned primitives at the L2 level are finally used at the uppermost L3 level, where a decomposition/recomposition operation enables prediction of the optimal reference input assuring optimal tracking of a previously unseen trajectory, without relearning by repetitions as at level L2. Hence, the HLF enables control systems to generalize their tracking behavior to new scenarios by extrapolating their current knowledge base. The proposed HLF endows CLCSs with the learning, memorization, and generalization features that are specific to intelligent organisms. This may be considered an advancement toward intelligent, generalizable, and adaptive control systems.
17. Liu P, Zhang H, Sun J, Tan Z. Event-triggered adaptive integral reinforcement learning method for zero-sum differential games of nonlinear systems with incomplete known dynamics. Neural Computing and Applications 2022. [DOI: 10.1007/s00521-022-07010-0]
18. Robust Tracking Control for Non-Zero-Sum Games of Continuous-Time Uncertain Nonlinear Systems. Mathematics 2022. [DOI: 10.3390/math10111904]
Abstract
In this paper, a new adaptive critic design is proposed to approximate the online Nash equilibrium solution for the robust trajectory tracking control of non-zero-sum (NZS) games for continuous-time uncertain nonlinear systems. First, an augmented system was constructed by combining the tracking error and the reference trajectory. By modifying the cost function, the robust tracking control problem was transformed into an optimal tracking control problem. Based on adaptive dynamic programming (ADP), a single critic neural network (NN) was applied for each player to solve the coupled Hamilton-Jacobi-Bellman (HJB) equations approximately, and the obtained control laws were regarded as the feedback Nash equilibrium. Two additional terms were introduced in the weight update law of each critic NN, which strengthened the weight update process and eliminated the strict requirement for an initial stabilizing control policy. More importantly, through Lyapunov theory, the stability of the closed-loop system was guaranteed and the robust tracking performance was analyzed. Finally, the effectiveness of the proposed scheme was verified by two examples.
19. Off-policy algorithm based hierarchical optimal control for completely unknown dynamic systems. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.11.077]
20. Dong F, Jin D, Zhao X, Han J, Lu W. A non-cooperative game approach to the robust control design for a class of fuzzy dynamical systems. ISA Transactions 2022; 125:119-133. [PMID: 34238520] [DOI: 10.1016/j.isatra.2021.06.031]
Abstract
To address the time-varying bounded uncertainty in mechanical systems and the problem of excessive control gain caused by worst-case design, a robust control with tunable parameters that compensates for the uncertainty is proposed. The proposed control guarantees that the system is uniformly bounded and uniformly ultimately bounded. In addition, fuzzy set theory is used to describe the uncertainty, avoiding the introduction of IF-THEN fuzzy logic rules or probability theory. Based on non-cooperative game theory, the parameters of the proposed control are optimized: the mutually independent parameters to be optimized are treated as players, a performance index is designed for each parameter as its cost function, and the optimal result is solved through the definition of the Nash equilibrium. Finally, taking the trajectory tracking control of an omnidirectional mobile platform as an example, the feasibility of the optimization method and the effectiveness of the control are verified by comparison with two other control methods.
Affiliation(s)
- Fangfang Dong
- School of Mechanical Engineering, Hefei University of Technology, Hefei, Anhui, 230009, PR China; Anhui Engineering Laboratory of Intelligent CNC Technology and Equipment, Hefei, Anhui, 230009, PR China
- Dong Jin
- School of Mechanical Engineering, Hefei University of Technology, Hefei, Anhui, 230009, PR China; Anhui Engineering Laboratory of Intelligent CNC Technology and Equipment, Hefei, Anhui, 230009, PR China
- Xiaomin Zhao
- School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei, Anhui, 230009, PR China.
- Jiang Han
- School of Mechanical Engineering, Hefei University of Technology, Hefei, Anhui, 230009, PR China; Anhui Engineering Laboratory of Intelligent CNC Technology and Equipment, Hefei, Anhui, 230009, PR China
- Wei Lu
- Anhui Jiexun Optoelectronic Technology Co., Ltd, Hefei, Anhui, 230009, PR China
22. Yang X, He H. Event-Driven H∞-Constrained Control Using Adaptive Critic Learning. IEEE Transactions on Cybernetics 2021; 51:4860-4872. [PMID: 32112694] [DOI: 10.1109/tcyb.2020.2972748]
Abstract
This article considers an event-driven H∞ control problem of continuous-time nonlinear systems with asymmetric input constraints. Initially, the H∞-constrained control problem is converted into a two-person zero-sum game with a discounted nonquadratic cost function. Then, we present the event-driven Hamilton-Jacobi-Isaacs equation (HJIE) associated with the two-person zero-sum game. Meanwhile, we develop a novel event-triggering condition that excludes Zeno behavior. The present event-triggering condition differs from the existing literature in that it keeps the triggering threshold non-negative without requiring a properly selected prescribed level of disturbance attenuation. After that, under the framework of adaptive critic learning, we use a single critic network to solve the event-driven HJIE and tune its weight parameters by using historical and instantaneous state data simultaneously. Based on the Lyapunov approach, we demonstrate that the uniform ultimate boundedness of all the signals in the closed-loop system is guaranteed. Finally, simulations of a nonlinear plant are presented to validate the developed event-driven H∞ control strategy.
23. Online event-based adaptive critic design with experience replay to solve partially unknown multi-player nonzero-sum games. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.087]
24. Xue S, Luo B, Liu D. Event-Triggered Adaptive Dynamic Programming for Unmatched Uncertain Nonlinear Continuous-Time Systems. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:2939-2951. [PMID: 32721899] [DOI: 10.1109/tnnls.2020.3009015]
Abstract
In this article, an event-triggered adaptive dynamic programming (ADP) method is proposed to solve the robust control problem of unmatched uncertain systems. First, the robust control problem with unmatched uncertainties is transformed into the optimal control design for an auxiliary system. Subsequently, to reduce controller executions and save computational and communication resources, an event-triggering mechanism is introduced. By using a critic neural network (NN) to approximate the value function, novel concurrent learning is developed to learn NN weights, which avoids the requirement of an initial admissible control and the persistence of excitation condition. Moreover, it is proven that the developed event-triggered ADP controller guarantees the robustness of the uncertain system and the uniform ultimate boundedness of the NN weight estimation error. Finally, by using the F-16 aircraft and the inverted pendulum with unmatched uncertainties as examples, the simulation results show the effectiveness of the developed event-triggered ADP method.
25. Song R, Wei Q, Zhang H, Lewis FL. Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics. IEEE Transactions on Cybernetics 2021; 51:2929-2943. [PMID: 31902792] [DOI: 10.1109/tcyb.2019.2957406]
Abstract
In this article, an off-policy reinforcement learning (RL) algorithm is established to solve discrete-time N-player nonzero-sum (NZS) games with completely unknown dynamics. The N coupled generalized algebraic Riccati equations (GARE) are derived, and a policy iteration (PI) algorithm is then used to obtain the N-tuple of iterative controls and iterative value functions. Because knowledge of the system dynamics is required by the PI algorithm, an off-policy RL method is developed for the discrete-time N-player NZS games. The off-policy N-coupled Hamilton-Jacobi (HJ) equation is derived based on quadratic value functions. Using the Kronecker product, the N-coupled HJ equation is decomposed into an unknown-parameter part and a system-operation-data part, which makes solving the N-coupled HJ equation independent of the system dynamics. Least squares is used to calculate the iterative value functions and the N-tuple of iterative controls. The existence of the Nash equilibrium is proved. The performance of the proposed method for discrete-time NZS games with unknown dynamics is demonstrated by simulation examples.
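The Kronecker-product step referenced here turns a quadratic value function V(x) = xᵀPx into a linear regression in the stacked entries of P; a minimal sketch of that identification step on simulated data (not the paper's full algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
P_true = np.array([[2.0, 0.3, 0.0], [0.3, 1.5, 0.2], [0.0, 0.2, 1.0]])

# Collect samples of x and the scalar target y = x^T P x (here simulated).
X = rng.normal(size=(200, n))
y = np.einsum("bi,ij,bj->b", X, P_true, X)

# Since x^T P x = (x kron x)^T vec(P), regress y on the Kronecker features.
Phi = np.stack([np.kron(x, x) for x in X])          # shape (200, n*n)
vecP, *_ = np.linalg.lstsq(Phi, y, rcond=None)
P_hat = vecP.reshape(n, n)
P_hat = 0.5 * (P_hat + P_hat.T)                     # symmetrize the estimate

print(np.allclose(P_hat, P_true, atol=1e-8))        # True
```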
26. Mu C, Peng J, Tang Y. Learning-based control for discrete-time constrained nonzero-sum games. CAAI Transactions on Intelligence Technology 2021. [DOI: 10.1049/cit2.12015]
Affiliation(s)
- Chaoxu Mu
- School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Jiangwen Peng
- School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Yufei Tang
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, USA
27. Li H, Wu Y, Chen M. Adaptive Fault-Tolerant Tracking Control for Discrete-Time Multiagent Systems via Reinforcement Learning Algorithm. IEEE Transactions on Cybernetics 2021; 51:1163-1174. [PMID: 32386171] [DOI: 10.1109/tcyb.2020.2982168]
Abstract
This article investigates the adaptive fault-tolerant tracking control problem for a class of discrete-time multiagent systems via a reinforcement learning algorithm. Action neural networks (NNs) are used to approximate the unknown and desired control input signals, and critic NNs are employed to estimate the cost function in the design procedure. Furthermore, direct adaptive optimal controllers are designed by combining the backstepping technique with the reinforcement learning algorithm. Compared with existing reinforcement learning algorithms, the computational burden is effectively reduced by using fewer learning parameters. Adaptive auxiliary signals are established to compensate for the influence of dead zones and actuator faults on the control performance. Based on Lyapunov stability theory, it is proved that all signals of the closed-loop system are semiglobally uniformly ultimately bounded. Finally, simulation results are presented to illustrate the effectiveness of the proposed approach.
28. Yang X, He H. Decentralized Event-Triggered Control for a Class of Nonlinear-Interconnected Systems Using Reinforcement Learning. IEEE Transactions on Cybernetics 2021; 51:635-648. [PMID: 31670691] [DOI: 10.1109/tcyb.2019.2946122]
Abstract
In this article, we propose a novel decentralized event-triggered control (ETC) scheme for a class of continuous-time nonlinear systems with matched interconnections. The present interconnected systems differ from most existing interconnected plants in that their equilibrium points are no longer assumed to be zero. Initially, we establish a theorem showing that the decentralized ETC law for the overall system can be represented by an array of optimal ETC laws for the nominal subsystems. Then, to obtain these optimal ETC laws, we develop a reinforcement learning (RL)-based method to solve the Hamilton-Jacobi-Bellman equations arising in the discounted-cost optimal ETC problems of the nominal subsystems. Meanwhile, we only use critic networks to implement the RL-based approach and tune the critic network weight vectors by using the gradient descent method and the concurrent learning technique together. With the proposed tuning rule, we are able not only to relax the persistence of excitation condition but also to ensure that the critic network weight vectors are uniformly ultimately bounded. Moreover, by utilizing the Lyapunov method, we prove that the obtained decentralized ETC law can force the entire system to be stable in the sense of uniform ultimate boundedness. Finally, we validate the proposed decentralized ETC strategy through simulations of nonlinear-interconnected systems derived from two inverted pendulums connected via a spring.
29. Zhang Y, Zhao B, Liu D. Event-triggered adaptive dynamic programming for multi-player zero-sum games with unknown dynamics. Soft Computing 2021. [DOI: 10.1007/s00500-020-05293-w]
30. Su H, Zhang H, Jiang H, Wen Y. Decentralized Event-Triggered Adaptive Control of Discrete-Time Nonzero-Sum Games Over Wireless Sensor-Actuator Networks With Input Constraints. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:4254-4266. [PMID: 31940556] [DOI: 10.1109/tnnls.2019.2953613]
Abstract
This article studies an event-triggered communication and adaptive dynamic programming (ADP) co-design control method for multiplayer nonzero-sum (NZS) games of a class of nonlinear discrete-time wireless sensor-actuator network (WSAN) systems subject to input constraints. By virtue of the ADP algorithm, critic and actor networks are established to attain the approximate Nash equilibrium solution in the context of the constrained control mechanism. Simultaneously, as the sensors and actuators are physically distributed, a decentralized event-triggered communication protocol is presented, accompanied by a dead-zone operation that avoids unnecessary events. By predefining the triggering thresholds and compensation values, a novel adaptive triggering condition is derived to guarantee the stability of the event-based closed-loop control system. Then, resorting to Lyapunov theory, the system states and the critic/actor network weight estimation errors are proven to be ultimately bounded. Moreover, an explicit analysis of the nontriviality of the interevent times is also provided. Finally, two numerical examples are conducted to validate the effectiveness of the proposed method.
31. Wei Q, Song R, Liao Z, Li B, Lewis FL. Discrete-Time Impulsive Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2020; 50:4293-4306. [PMID: 30990209] [DOI: 10.1109/tcyb.2019.2906694]
Abstract
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite-horizon discrete-time nonlinear systems. Considering the constraint on the impulsive interval, in each iteration the iterative impulsive value function under each possible impulsive interval is obtained, and then the iterative value function and iterative control law are derived. A new convergence analysis method is developed, which proves that the iterative value function converges to the optimum as the iteration index increases to infinity. The properties of the iterative control law are analyzed, and a detailed implementation of the optimal impulsive control law is presented. Finally, two simulation examples with comparisons are given to show the effectiveness of the developed method.
32. Neural networks-based optimal tracking control for nonzero-sum games of multi-player continuous-time nonlinear systems via reinforcement learning. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.083]
33. Jiang H, Zhang H, Xie X. Critic-only adaptive dynamic programming algorithms' applications to the secure control of cyber-physical systems. ISA Transactions 2020; 104:138-144. [PMID: 30853105] [DOI: 10.1016/j.isatra.2019.02.012]
Abstract
Industrial cyber-physical systems generally suffer from malicious attacks and unmatched perturbations, and thus security is a core research topic in the related fields. This paper proposes a novel intelligent secure control scheme that integrates optimal control theory, zero-sum game theory, reinforcement learning, and neural networks. First, the secure control problem of the compromised system is converted into a zero-sum game for a nominal auxiliary system, and then both policy-iteration-based and value-iteration-based adaptive dynamic programming methods are introduced to solve the Hamilton-Jacobi-Isaacs equations. The proposed secure control scheme can mitigate the effects of actuator attacks and unmatched perturbations and stabilize the compromised cyber-physical system by tuning the system performance parameters, which is proved through Lyapunov stability theory. Finally, the proposed approach is applied to the Quanser helicopter to verify its effectiveness.
Affiliation(s)
- He Jiang
- College of Information Science and Engineering, Northeastern University, Box 134, 110819, Shenyang, PR China.
- Huaguang Zhang
- College of Information Science and Engineering, Northeastern University, Box 134, 110819, Shenyang, PR China.
- Xiangpeng Xie
- Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, 210003, Nanjing, PR China.
34. Li H, Zhang Q, Zhao D. Deep Reinforcement Learning-Based Automatic Exploration for Navigation in Unknown Environment. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2064-2076. [PMID: 31398138] [DOI: 10.1109/tnnls.2019.2927869]
Abstract
This paper investigates the automatic exploration problem in unknown environments, which is key to applying robotic systems to social tasks. Solutions that stack decision rules cannot cover the variety of environments and sensor properties. Learning-based control methods are adaptive to these scenarios but are hampered by low learning efficiency and awkward transferability from simulation to reality. In this paper, we construct a general exploration framework by decomposing the exploration process into decision, planning, and mapping modules, which increases the modularity of the robotic system. Based on this framework, we propose a deep reinforcement learning-based decision algorithm that uses a deep neural network to learn an exploration strategy from the partial map. The results show that the proposed algorithm has better learning efficiency and adaptability for unknown environments. In addition, we conduct experiments on a physical robot, and the results suggest that the learned policy can be well transferred from simulation to the real robot.
35. Mu C, Wang K, Zhang Q, Zhao D. Hierarchical optimal control for input-affine nonlinear systems through the formulation of Stackelberg game. Information Sciences 2020. [DOI: 10.1016/j.ins.2019.12.078]
36. Lv Y, Ren X, Na J. Online Nash-optimization tracking control of multi-motor driven load system with simplified RL scheme. ISA Transactions 2020; 98:251-262. [PMID: 31439393] [DOI: 10.1016/j.isatra.2019.08.025]
Abstract
Although the optimal tracking control problem (OTCP) has been addressed recently, only single-input systems are considered in the recent literature. In this paper, the OTCP of unknown multi-motor driven load systems (MMDLS) is addressed based on a simplified reinforcement learning (RL) structure, where all the motor inputs with different dynamics are obtained as a Nash equilibrium, so that the performance index associated with each input is optimized as an outcome of that equilibrium. Firstly, we use an identifier to reconstruct the MMDLS dynamics, avoiding the accurate model required in conventional control design. We use the identified dynamics to derive the Nash-optimization inputs, which consist of steady-state controls and RL-based controls. The steady-state controls are designed with the identified system model, while the RL-based controls are designed using the optimization method with simplified RL-based critic NN schemes, in which the simplified RL structures approximate the cost function of each motor input. The NN weights of both the identification algorithm and the simplified RL-based structure are updated by a novel adaptation algorithm whose learning gains can be optimized adaptively. Convergence of the weights and stability of the Nash-optimized MMDLS are proved. Finally, numerical MMDLS simulations demonstrate the correctness and the improved performance of the proposed methods.
Affiliation(s)
- Yongfeng Lv
- School of Automation, Beijing Institute of Technology, Beijing 100081, China
- Xuemei Ren
- School of Automation, Beijing Institute of Technology, Beijing 100081, China.
- Jing Na
- Faculty of Mechanical & Electrical Engineering, Kunming University of Science & Technology, Kunming 650500, China
37. Sahoo A, Narayanan V. Differential-game for resource aware approximate optimal control of large-scale nonlinear systems with multiple players. Neural Networks 2020; 124:95-108. [PMID: 31986447] [DOI: 10.1016/j.neunet.2019.12.031]
Abstract
In this paper, we propose a novel differential-game-based neural network (NN) control architecture to solve an optimal control problem for a class of large-scale nonlinear systems involving N players. We focus on optimizing the usage of computational resources and the system performance simultaneously. In particular, the N players' control policies are designed such that they cooperatively optimize the large-scale system performance, while the sampling intervals for each player are designed to reduce the frequency of feedback execution. To develop a unified design framework that achieves both objectives, we propose an optimal control problem that integrates both design requirements, which leads to a multi-player differential game. A solution to this problem is obtained numerically by solving the associated Hamilton-Jacobi (HJ) equation using event-driven approximate dynamic programming (E-ADP) and artificial NNs online and forward in time. We employ critic neural networks to approximate the solution to the HJ equation, i.e., the optimal value function, with aperiodically available feedback information. Using the NN-approximated value function, we design the control policies and the sampling schemes. Finally, the event-driven N-player system is remodeled as a hybrid dynamical system with impulsive weight update rules to analyze its stability and convergence properties. The closed-loop practical stability of the system and the Zeno-free behavior of the sampling scheme are demonstrated using the Lyapunov method. Simulation results using a numerical example are also included to substantiate the analytical results.
Affiliation(s)
- Avimanyu Sahoo
- 555 Engineering North, Division of Engineering Technology, Oklahoma State University, Stillwater, OK 74078, United States of America.
38. Ni Z, Malla N, Zhong X. Prioritizing Useful Experience Replay for Heuristic Dynamic Programming-Based Learning Systems. IEEE Transactions on Cybernetics 2019; 49:3911-3922. [PMID: 30059327] [DOI: 10.1109/tcyb.2018.2853582]
Abstract
The adaptive dynamic programming controller usually needs a long training period because data usage efficiency is relatively low when samples are discarded once used. Prioritized experience replay (ER) promotes important experiences and makes learning the control process more efficient. This paper proposes integrating the efficient learning capability of prioritized ER into heuristic dynamic programming (HDP). First, a one-time-step-backward state-action pair is used to design the ER tuple, which avoids the model network. Second, a systematic approach is proposed to integrate ER into both the critic and action networks of the HDP controller design. The proposed approach is tested on two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For fair comparison, we set the same initial weight parameters and initial starting states for both traditional HDP and the proposed approach under the same simulation environment. The proposed approach improves the required average number of trials to succeed by 60.56% for the cart-pole task and 56.89% for the triple-link balancing task in comparison with the traditional HDP approach. Results of ER-based HDP are also included for comparison. Moreover, a theoretical convergence analysis is presented to guarantee the stability of the proposed control design.
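A minimal prioritized replay buffer of the kind this abstract builds on, with TD-error-proportional sampling (a generic sketch; the paper's tuple design and priority rule differ in detail):

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized experience replay (simplified, no sum-tree)."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-3):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.prios = [], []

    def add(self, transition, td_error: float):
        if len(self.data) >= self.capacity:      # evict oldest when full
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size: int, rng=np.random.default_rng()):
        p = np.asarray(self.prios)
        p = p / p.sum()                           # probability proportional to priority
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):          # refresh priorities after replay
            self.prios[i] = (abs(e) + self.eps) ** self.alpha

# Usage: add (s, a, r, s') tuples with their TD errors, then sample minibatches.
buf = PrioritizedReplay(capacity=1000)
buf.add(("s0", "a0", 1.0, "s1"), td_error=0.5)
idx, batch = buf.sample(1)
```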
39. Zhang P, Yuan Y, Yang H, Liu H. Near-Nash Equilibrium Control Strategy for Discrete-Time Nonlinear Systems With Round-Robin Protocol. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:2478-2492. [PMID: 30602423] [DOI: 10.1109/tnnls.2018.2884674]
Abstract
In this paper, near-Nash equilibrium (NE) control strategies are investigated for a class of discrete-time nonlinear systems subject to the round-robin protocol (RRP). In the studied systems, three types of complexities, namely, the additive nonlinearities, the RRP, and the output-feedback form of the controllers, are simultaneously taken into consideration. To tackle these complexities, an approximate dynamic programming (ADP) algorithm is first developed to obtain NE control strategies by solving the coupled Bellman equation. Then, a Luenberger-type observer is designed under the RRP scheduling to estimate the system states. The near-NE control strategies are implemented via actor-critic neural networks. More importantly, a stability analysis of the closed-loop system is conducted to guarantee that the studied system with the proposed control strategies is boundedly stable. Finally, simulation results are provided to demonstrate the validity of the proposed method.
40. Zhang Q, Zhao D. Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics. IEEE Transactions on Cybernetics 2019; 49:2874-2885. [PMID: 29994780] [DOI: 10.1109/tcyb.2018.2830820]
Abstract
This paper is concerned with the nonlinear optimization problem of nonzero-sum (NZS) games with unknown drift dynamics. A data-based integral reinforcement learning (IRL) method is proposed to approximate the Nash equilibrium of NZS games iteratively. Furthermore, we prove that the data-based IRL method is equivalent to the model-based policy iteration algorithm, which guarantees convergence of the proposed method. For implementation, a single-critic neural network structure for the NZS games is given. To enhance the applicability of the data-based IRL method, we design the updating laws of the critic weights based on offline and online iterative learning, respectively. Note that the experience replay technique is introduced in the online iterative learning, which can improve the convergence rate of the critic weights during learning. The uniform ultimate boundedness of the critic weights is guaranteed using the Lyapunov method. Finally, numerical results demonstrate the effectiveness of the data-based IRL algorithm for nonlinear NZS games with unknown drift dynamics.
41. Neural-network-based learning algorithms for cooperative games of discrete-time multi-player systems with control constraints via adaptive dynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.02.107]
42. Yang X, He H. Adaptive Critic Designs for Event-Triggered Robust Control of Nonlinear Systems With Unknown Dynamics. IEEE Transactions on Cybernetics 2019; 49:2255-2267. [PMID: 29993650] [DOI: 10.1109/tcyb.2018.2823199]
Abstract
This paper develops a novel event-triggered robust control strategy for continuous-time nonlinear systems with unknown dynamics. To begin with, the event-triggered robust nonlinear control problem is transformed into an event-triggered nonlinear optimal control problem by introducing an infinite-horizon integral cost for the nominal system. Then, a recurrent neural network (RNN) and adaptive critic designs (ACDs) are employed to solve the derived event-triggered nonlinear optimal control problem. The RNN is applied to reconstruct the system dynamics from collected system data. After the knowledge of the system dynamics is acquired, a unique critic network is proposed to obtain the approximate solution of the event-triggered Hamilton-Jacobi-Bellman equation within the framework of ACDs. The critic network is updated by simultaneously using historical and instantaneous state data. An advantage of the present critic network update law is that it relaxes the persistence-of-excitation condition. Meanwhile, under a newly developed event-triggering condition, the proposed critic network tuning rule not only guarantees that the critic network weights converge to their optimal values but also ensures that the nominal system states are uniformly ultimately bounded. Moreover, by using the Lyapunov method, it is proved that the derived optimal event-triggered control (ETC) guarantees uniform ultimate boundedness of all the signals in the original system. Finally, a nonlinear oscillator and an unstable power system are provided to validate the developed robust ETC scheme.
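A minimal sketch of the event-triggered loop structure the abstract describes, assuming a generic state-dependent triggering condition ‖e‖ > σ‖x‖ (the paper's own condition and learning machinery are not reproduced here); the plant, the gain K, and σ are made-up placeholders.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 1.0]])          # stabilizing gain (assumed)
dt, sigma = 0.01, 0.3               # integration step, triggering parameter

x = np.array([1.0, 0.0])
x_s = x.copy()                      # state at the last triggering instant
u = -(K @ x_s)
events = 0

for k in range(2000):
    e = x - x_s                     # gap since the last event
    if np.linalg.norm(e) > sigma * np.linalg.norm(x):   # triggering condition
        x_s = x.copy()
        u = -(K @ x_s)              # control recomputed only at events
        events += 1
    x = x + dt * (A @ x + (B @ u))  # forward-Euler plant step

print(f"events: {events} of 2000 steps, final |x| = {np.linalg.norm(x):.4f}")
```

The point of the sketch is the sparsity of updates: the control input is held between events, so far fewer than 2000 control computations occur while the state still converges.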
|
43
|
Li D, Zhao D, Zhang Q, Chen Y. Reinforcement Learning and Deep Learning Based Lateral Control for Autonomous Driving [Application Notes]. IEEE COMPUT INTELL M 2019. [DOI: 10.1109/mci.2019.2901089] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
44
|
Song R, Zhu L. Stable value iteration for two-player zero-sum game of discrete-time nonlinear systems based on adaptive dynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.03.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
45
|
Yan R, Shi Z, Zhong Y. Reach-Avoid Games With Two Defenders and One Attacker: An Analytical Approach. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:1035-1046. [PMID: 29994434 DOI: 10.1109/tcyb.2018.2794769] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper considers a reach-avoid game on a rectangular domain with two defenders and one attacker. The attacker aims to reach a specified edge of the game domain boundary, while the defenders strive to prevent this by capturing the attacker. First, we are concerned with the barrier, the boundary of the reach-avoid set, which splits the state space into two disjoint parts: 1) the defender dominance region (DDR) and 2) the attacker dominance region (ADR). For initial states lying in the DDR, there exists a strategy for the defenders to intercept the attacker regardless of the attacker's best effort, while for initial states lying in the ADR, the attacker can always find a successful attack strategy. We propose an attack region method to construct the barrier analytically, employing the Voronoi diagram and the Apollonius circle for two kinds of speed ratios. Then, taking practical payoff functions into consideration, we present optimal strategies for the players when their initial states lie in their winning regions, and show that the ADR is divided into several parts corresponding to different strategies for the players. Numerical approaches, which suffer from inherent inaccuracy, have already been used for multiplayer reach-avoid games, but their computational complexity complicates solving such games and consequently hinders efficient online application. In contrast, the proposed method obtains the exact formulation of the barrier and is applicable to real-time updates.
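The Apollonius circle mentioned above has a compact closed form worth recording: for attacker position a, defender position d, and speed ratio α = v_attacker/v_defender with α ≠ 1, the set {p : |p − a| ≤ α|p − d|} of points the attacker reaches first is a disk with center (a − α²d)/(1 − α²) and radius α|a − d|/|1 − α²|. A small self-check in code (the positions and ratio are made-up inputs):

```python
import numpy as np

def apollonius_circle(a, d, alpha):
    """Center and radius of {p : |p - a| = alpha * |p - d|}, alpha != 1."""
    a, d = np.asarray(a, float), np.asarray(d, float)
    center = (a - alpha**2 * d) / (1.0 - alpha**2)
    radius = alpha * np.linalg.norm(a - d) / abs(1.0 - alpha**2)
    return center, radius

center, radius = apollonius_circle(a=[0.0, 0.0], d=[4.0, 0.0], alpha=0.5)
print("center:", center, "radius:", radius)

# sanity check: a point on the circle satisfies |p - a| = alpha * |p - d|
p = center + np.array([radius, 0.0])
print(np.linalg.norm(p - [0, 0]), 0.5 * np.linalg.norm(p - [4, 0]))
```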
|
46
|
Qu Q, Zhang H, Luo C, Yu R. Robust control design for multi-player nonlinear systems with input disturbances via adaptive dynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.11.054] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
47
|
Luo B, Yang Y, Liu D. Adaptive Q-Learning for Data-Based Optimal Output Regulation With Experience Replay. IEEE TRANSACTIONS ON CYBERNETICS 2018; 48:3337-3348. [PMID: 29994038 DOI: 10.1109/tcyb.2018.2821369] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In this paper, the data-based optimal output regulation problem of discrete-time systems is investigated. An off-policy adaptive Q-learning (QL) method is developed using real system data, without requiring knowledge of the system dynamics or a mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter in the policy evaluation is used to achieve a tradeoff between the current and future Q-functions. The convergence of the adaptive QL algorithm is proved and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, an actor-critic neural network (NN) structure is developed. A least-squares scheme and a batch gradient descent method are developed to update the critic and actor NN weights, respectively. The experience replay technique is employed in the learning process, which leads to a simple and convenient implementation of the adaptive QL method. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
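To make the replay mechanism concrete, here is a generic off-policy Q-learning loop with experience replay on a toy chain MDP. Note that the cited adaptive QL method uses a least-squares critic for output regulation, not this tabular form, so this is illustration of the replay idea only.

```python
import numpy as np
import random
from collections import deque

n_states, n_actions, gamma, alpha = 5, 2, 0.9, 0.1
Q = np.zeros((n_states, n_actions))
buffer = deque(maxlen=1000)
rng = random.Random(0)

def step(s, a):
    """Chain MDP: action 0 moves left, action 1 moves right; reward 1 at the right end."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

s = 0
for t in range(5000):
    a = rng.randrange(n_actions)                 # behavior policy: uniform random
    s2, r = step(s, a)
    buffer.append((s, a, r, s2))
    s = 0 if s2 == n_states - 1 else s2          # restart after reaching the goal
    # replay a minibatch of stored transitions (off-policy updates)
    for (ss, aa, rr, ss2) in rng.sample(list(buffer), min(8, len(buffer))):
        target = rr + gamma * Q[ss2].max()
        Q[ss, aa] += alpha * (target - Q[ss, aa])

print("greedy policy (1 = move right):", Q.argmax(axis=1))
```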
|
48
|
Tang L, Liu YJ, Chen CLP. Adaptive Critic Design for Pure-Feedback Discrete-Time MIMO Systems Preceded by Unknown Backlashlike Hysteresis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:5681-5690. [PMID: 29993785 DOI: 10.1109/tnnls.2018.2805689] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper concentrates on the adaptive critic design (ACD) issue for a class of uncertain multi-input multi-output (MIMO) nonlinear discrete-time systems preceded by unknown backlashlike hysteresis. The considered systems are in a block-triangular pure-feedback form, in which nonaffine functions and couplings between states and inputs exist. This makes the ACD-based optimal control design difficult and complicated. To this end, the mean value theorem is employed to transform the original systems into input-output models. Based on a reinforcement learning algorithm, the optimal control strategy is established with an actor-critic structure. Not only is the stability of the systems ensured, but the performance index is also minimized. In contrast to previous results, the main contributions are: 1) an ACD framework is built for the first time for such MIMO systems with unknown hysteresis and 2) an adaptive auxiliary signal is developed to compensate for the influence of the hysteresis. In the end, a numerical study is provided to demonstrate the effectiveness of the present method.
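For readers unfamiliar with backlash-like hysteresis, the sketch below simulates a Duhem-type model commonly used in this literature, dv/dt = α|du/dt|(cu − v) + b·du/dt, where u is the commanded input and v the actuator output; the constants and test signal are assumptions for illustration, not values from the paper.

```python
import numpy as np

# Assumed example constants and a sinusoidal test input
alpha_h, c, b = 1.0, 3.1635, 0.345
dt = 1e-3
t = np.arange(0.0, 10.0, dt)
u = 2.5 * np.sin(2.3 * t)

v = np.zeros_like(u)
for k in range(len(t) - 1):
    du = (u[k + 1] - u[k]) / dt
    # Duhem-type backlash-like hysteresis dynamics
    dv = alpha_h * abs(du) * (c * u[k] - v[k]) + b * du
    v[k + 1] = v[k] + dt * dv       # forward-Euler integration

# Plotting v against u would trace the backlash-like loop; here we just
# report the output range reached by the hysteretic actuator.
print("v range:", v.min(), v.max())
```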
|
49
|
Jiang H, Zhang H, Han J, Zhang K. Iterative adaptive dynamic programming methods with neural network implementation for multi-player zero-sum games. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.04.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
50
|
Pan J, Wang X, Cheng Y, Yu Q. Multisource Transfer Double DQN Based on Actor Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2227-2238. [PMID: 29771674 DOI: 10.1109/tnnls.2018.2806087] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Deep reinforcement learning (RL) combines the "trial and error" and "reward and punishment" mechanisms of RL with the powerful feature representation and nonlinear mapping of deep learning, and currently plays an essential role in artificial intelligence and machine learning. Since an RL agent must constantly interact with its surroundings, the deep Q network (DQN) inevitably has to learn numerous network parameters, which results in low learning efficiency. In this paper, a multisource transfer double DQN (MTDDQN) based on actor learning is proposed. The transfer learning technique is integrated with deep RL so that the RL agent can collect, summarize, and transfer action knowledge, including policy mimicry and feature regression, to the training of related tasks. DQN suffers from action overestimation, i.e., the lower probability limit of the action corresponding to the maximum Q value is nonzero. Therefore, the transfer network is trained using double DQN to eliminate the error accumulation caused by action overestimation. In addition, to avoid negative transfer, i.e., to ensure strong correlations between source and target tasks, a multisource transfer learning mechanism is applied. Atari 2600 games are tested on the Arcade Learning Environment platform to evaluate the feasibility and performance of MTDDQN by comparing it with mainstream approaches such as DQN and double DQN. Experiments show that MTDDQN achieves not only human-like actor-learning transfer capability but also the desired learning efficiency and testing accuracy on the target task.
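The double-DQN target that MTDDQN relies on to curb action overestimation is standard and compact: the online network selects the greedy action, and the target network evaluates it. A minimal sketch follows, with arrays standing in for the two networks (the state/action sizes and values are made up):

```python
import numpy as np

def double_dqn_target(q_online, q_target, r, s_next, gamma, done):
    """r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal s'."""
    if done:
        return r
    a_star = int(np.argmax(q_online[s_next]))    # action chosen by the online net
    return r + gamma * q_target[s_next, a_star]  # ... evaluated by the target net

# toy check with random "network" outputs over 4 states and 3 actions
rng = np.random.default_rng(1)
q_online = rng.normal(size=(4, 3))
q_target = rng.normal(size=(4, 3))
y = double_dqn_target(q_online, q_target, r=1.0, s_next=2, gamma=0.99, done=False)
print("double-DQN target:", y)
```

Decoupling selection from evaluation is what removes the upward bias of the plain DQN target max_a Q_target(s', a).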
|