1. Wang Y, Wang D, Zhao M, Liu N, Qiao J. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate. Neural Netw 2024;175:106274. PMID: 38583264. DOI: 10.1016/j.neunet.2024.106274.
Abstract
In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem, which can accelerate the convergence rate of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under some conditions. Moreover, by employing neural networks, the model-free tracking control problem for zero-sum games can be addressed. Second, two practical algorithms are designed to guarantee convergence with accelerated learning. In one algorithm, an adjustable acceleration phase is added to the iteration process of Q-learning, which can be adaptively terminated with a convergence guarantee. In the other algorithm, a novel acceleration function is developed, which adjusts the relaxation factor to ensure convergence. Finally, through a simulation example with a practical physical background, the performance of the developed algorithm is demonstrated with neural networks.
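The Q-function iteration for zero-sum games that this abstract builds on can be illustrated in tabular form (a minimal sketch on a hypothetical random finite game, not the paper's accelerated neural scheme):

```python
import numpy as np

# Tabular Q-iteration for a finite zero-sum game (hypothetical random instance):
#   Q_{k+1}(x, u, w) = r(x, u, w) + gamma * min_u' max_w' Q_k(x', u', w'),
# where the control u minimizes and the disturbance w maximizes the cost.
rng = np.random.default_rng(0)
nS, nU, nW, gamma = 5, 3, 2, 0.9
r = rng.uniform(0.0, 1.0, (nS, nU, nW))      # stage cost
nxt = rng.integers(0, nS, (nS, nU, nW))      # deterministic transition x' = f(x, u, w)

Q = np.zeros((nS, nU, nW))
for k in range(1000):
    V = Q.max(axis=2).min(axis=1)            # saddle value min_u max_w Q(x, u, .)
    Q_new = r + gamma * V[nxt]
    if np.max(np.abs(Q_new - Q)) < 1e-12:    # gamma < 1 makes the update a contraction
        break
    Q = Q_new
```

Starting from Q = 0 with nonnegative stage costs, the iterates are also monotonically nondecreasing, which mirrors the monotonicity analysis the abstract mentions.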
Affiliation(s)
- Yuan Wang, Ding Wang, Mingming Zhao, Nan Liu, Junfei Qiao
- Faculty of Information Technology; Beijing Key Laboratory of Computational Intelligence and Intelligent System; Beijing Institute of Artificial Intelligence; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
2. Liu Q, Yan H, Wang M, Li Z, Liu S. Data-Driven Optimal Bipartite Consensus Control for Second-Order Multiagent Systems via Policy Gradient Reinforcement Learning. IEEE Trans Cybern 2024;54:3468-3478. PMID: 37307179. DOI: 10.1109/tcyb.2023.3276797.
Abstract
This article investigates the optimal bipartite consensus control (OBCC) problem for unknown second-order discrete-time multiagent systems (MASs). First, a coopetition network is constructed to describe the cooperative and competitive relationships between agents, and the OBCC problem is formulated via the tracking error and an associated performance index function. Based on distributed policy gradient reinforcement learning (RL) theory, a data-driven distributed optimal control strategy is obtained to guarantee the bipartite consensus of all agents' position and velocity states. In addition, offline data sets, generated by running the system in real time, ensure the learning efficiency of the scheme. The designed algorithm is also asynchronous, which is essential for handling the differences in computational ability between nodes in MASs. Then, by means of functional analysis and Lyapunov theory, the stability of the MASs and the convergence of the learning process are analyzed. Furthermore, an actor-critic structure containing two neural networks is used to implement the proposed methods. Finally, a numerical simulation shows the effectiveness and validity of the results.
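The bipartite consensus objective over a signed coopetition network can be made concrete with a small check (a hypothetical 4-agent signed graph; the error definition below is the standard signed-graph disagreement, not necessarily the paper's exact formulation):

```python
import numpy as np

# Signed "coopetition" adjacency: positive = cooperation, negative = competition
# (hypothetical, structurally balanced into camps {0, 1} and {2, 3})
A = np.array([[ 0,  1, -1,  0],
              [ 1,  0,  0, -1],
              [-1,  0,  0,  1],
              [ 0, -1,  1,  0]], dtype=float)

def bipartite_error(x, A):
    # e_i = sum_j |a_ij| * (x_i - sign(a_ij) * x_j): zero exactly when all states
    # agree in magnitude, with signs split across the two camps
    n = len(x)
    return np.array([sum(abs(A[i, j]) * (x[i] - np.sign(A[i, j]) * x[j])
                         for j in range(n) if A[i, j] != 0) for i in range(n)])

x = np.array([2.0, 2.0, -2.0, -2.0])   # a bipartite consensus state
print(bipartite_error(x, A))           # each component is zero at bipartite consensus
```

An ordinary (all-positive) consensus state such as `np.ones(4)` leaves this error nonzero, since the competitive edges demand sign-opposite agreement.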
3. Mu C, Peng J, Sun C. Hierarchical Multiagent Formation Control Scheme via Actor-Critic Learning. IEEE Trans Neural Netw Learn Syst 2023;34:8764-8777. PMID: 35302940. DOI: 10.1109/tnnls.2022.3153028.
Abstract
This article presents a nearly optimal solution to the cooperative formation control problem for large-scale multiagent systems (MASs). First, although the multigroup technique is widely used to decompose large-scale problems, it does not by itself achieve consensus between different subgroups. Inspired by the hierarchical structure applied in MASs, a hierarchical leader-following formation control structure with the multigroup technique is constructed, in which two layers and three types of agents are designed. Second, the adaptive dynamic programming technique is applied to the optimal formation control problem through the establishment of a performance index function. Based on the traditional generalized policy iteration (PI) algorithm, multistep generalized policy iteration (MsGPI) is developed by modifying the policy evaluation step. The novel algorithm not only inherits the high convergence speed and low computational complexity of the generalized PI algorithm but also further accelerates convergence and reduces run time. Besides, stability, convergence, and optimality analyses are given for the proposed multistep PI algorithm. Afterward, a neural network-based actor-critic structure is built to approximate the iterative control policies and value functions. Finally, a large-scale formation control problem is provided to demonstrate the performance of the developed hierarchical leader-following formation control structure and the MsGPI algorithm.
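The acceleration from multistep policy evaluation can be seen on a scalar caricature (a hedged sketch with hypothetical values; the paper's MsGPI operates on neural critics, not closed-form recursions). For a fixed policy u = -Kx on x+ = ax + bu with cost qx² + ru², the value is Px² with P the fixed point of P ← c + sP, and an N-step sweep simply applies that recursion N times per evaluation:

```python
# Multistep policy evaluation on a scalar linear system (hypothetical values):
#   P_{j+1} = c + s * P_j,  c = q + r K^2,  s = (a - b K)^2 < 1 for admissible K.
a, b, q, r, K = 1.2, 1.0, 1.0, 1.0, 0.9
c = q + r * K**2
s = (a - b * K)**2                 # 0.09 here, so the recursion contracts

def sweeps_to_converge(N, tol=1e-9):
    """Number of evaluation sweeps needed when each sweep does N recursion steps."""
    P, count = 0.0, 0
    P_star = c / (1 - s)           # fixed point (exact value of the policy)
    while abs(P - P_star) > tol:
        for _ in range(N):         # N-step policy evaluation per sweep
            P = c + s * P
        count += 1
    return count

print(sweeps_to_converge(1), sweeps_to_converge(5))  # multistep needs fewer sweeps
```

The per-sweep contraction factor improves from s to s^N, which is the essence of the claimed faster convergence at modest extra cost per sweep.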
4. Yang X, Zhang H, Wang Z, Yan H, Zhang C. Data-Based Predictive Control via Multistep Policy Gradient Reinforcement Learning. IEEE Trans Cybern 2023;53:2818-2828. PMID: 34752414. DOI: 10.1109/tcyb.2021.3121078.
Abstract
In this article, a model-free predictive control algorithm for real-time systems is presented. The algorithm is data driven and improves system performance based on multistep policy gradient reinforcement learning. By learning from an offline dataset and real-time data, knowledge of the system dynamics is not required in algorithm design and application. Cooperative games of multiple players over the time horizon are formulated to model predictive control as a multiagent optimization problem and to guarantee the optimality of the predictive control policy. To implement the algorithm, neural networks are used to approximate the action-state value function and the predictive control policy, respectively. The weights are determined by the method of weighted residuals. Numerical results show the effectiveness of the proposed algorithm.
5. Li M, Wang D, Zhao M, Qiao J. Event-triggered constrained neural critic control of nonlinear continuous-time multiplayer nonzero-sum games. Inf Sci 2023. DOI: 10.1016/j.ins.2023.02.081.
6. Lin M, Zhao B, Liu D. Policy gradient adaptive dynamic programming for nonlinear discrete-time zero-sum games with unknown dynamics. Soft Comput 2023. DOI: 10.1007/s00500-023-07817-6.
7. Liu M, Cai Q, Li D, Meng W, Fu M. Output feedback Q-learning for discrete-time finite-horizon zero-sum games with application to the H∞ control. Neurocomputing 2023. DOI: 10.1016/j.neucom.2023.01.050.
8. Ha M, Wang D, Liu D. Offline and Online Adaptive Critic Control Designs With Stability Guarantee Through Value Iteration. IEEE Trans Cybern 2022;52:13262-13274. PMID: 34516384. DOI: 10.1109/tcyb.2021.3107801.
Abstract
This article is concerned with the stability of the closed-loop system under the various control policies generated by value iteration (VI). Several stability properties, involving admissibility criteria and the attraction domain, are investigated. An offline integrated VI scheme with a stability guarantee is developed by combining the advantages of VI and policy iteration, which makes it convenient to obtain admissible control policies. Also, based on the concept of the attraction domain, an online adaptive dynamic programming (ADP) algorithm using immature control policies is developed. Remarkably, the state trajectory under the online algorithm is ensured to converge to the origin. In particular, for linear systems, the online ADP algorithm with a general scheme possesses an enhanced stability property. The theoretical results reveal that the stability of a linear system can be guaranteed even if the control policy sequence includes finitely many unstable elements. The numerical results verify the effectiveness of the presented algorithms.
9. Optimal antisynchronization control for unknown multiagent systems with deep deterministic policy gradient approach. Inf Sci 2022. DOI: 10.1016/j.ins.2022.12.008.
10. Xue S, Luo B, Liu D, Gao Y. Neural network-based event-triggered integral reinforcement learning for constrained H∞ tracking control with experience replay. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.09.119.
11. Zhang Z, Xu J, Fu M. Q-Learning for Feedback Nash Strategy of Finite-Horizon Nonzero-Sum Difference Games. IEEE Trans Cybern 2022;52:9170-9178. PMID: 33710965. DOI: 10.1109/tcyb.2021.3052832.
Abstract
In this article, we study the feedback Nash strategy of the model-free nonzero-sum difference game. The main contribution is a Q-learning algorithm for the linear quadratic game that requires no prior knowledge of the system model. Notably, the studied game has a finite horizon, which is novel relative to the learning algorithms in the literature, most of which target the infinite-horizon Nash strategy. The key is to characterize the Q-factors in terms of arbitrary control inputs and state information. A numerical example is given to verify the effectiveness of the proposed algorithm.
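The finite-horizon structure behind this abstract is the backward dynamic-programming recursion; a single-player scalar LQ sketch (hypothetical values) shows its distinguishing feature, time-varying gains. The paper's nonzero-sum game couples one such recursion per player through both players' gains:

```python
# Backward Riccati recursion for a finite-horizon scalar LQ problem
# (single-player sketch; x+ = A x + B u, stage cost Q x^2 + R u^2,
# terminal cost QN x^2, horizon N — all values hypothetical).
A, B, Q, R, QN, N = 1.1, 1.0, 1.0, 0.5, 1.0, 20

P = QN                                     # terminal value weight P_N
gains = [0.0] * N
for t in reversed(range(N)):
    K = B * P * A / (R + B * P * B)        # stage-t feedback gain, u_t = -K_t x_t
    P = Q + A * P * A - A * P * B * K      # Riccati update for P_t
    gains[t] = K
```

Unlike the infinite-horizon case, the gain sequence varies with the stage: the early gains approach the stationary Riccati gain while the last gain is shaped by the terminal weight alone.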
12. Zhang Y, Li S, Weng J. Learning and Near-Optimal Control of Underactuated Surface Vessels With Periodic Disturbances. IEEE Trans Cybern 2022;52:7453-7463. PMID: 33400666. DOI: 10.1109/tcyb.2020.3041368.
Abstract
In this article, we propose a novel learning and near-optimal control approach for underactuated surface vessels (USVs) with unknown mismatched periodic external disturbances and unknown hydrodynamic parameters. Given prior knowledge of the periods of the disturbances, an analytical near-optimal control law is derived through the approximation of an integral-type quadratic performance index with respect to the tracking error, where the equivalent unknown parameters are generated online by an auxiliary system that learns the dynamics of the controlled system. It is proved that the state differences between the auxiliary system and the corresponding controlled USV are globally asymptotically convergent to zero. Besides, the approach theoretically guarantees asymptotic optimality of the performance index. The efficacy of the method is demonstrated via simulations based on the real parameters of a USV.
13. Yang X, Zhang H, Wang Z. Data-Based Optimal Consensus Control for Multiagent Systems With Policy Gradient Reinforcement Learning. IEEE Trans Neural Netw Learn Syst 2022;33:3872-3883. PMID: 33587707. DOI: 10.1109/tnnls.2021.3054685.
Abstract
This article investigates the optimal distributed consensus control problem for discrete-time multiagent systems with completely unknown dynamics and differences in computational ability. The problem can be viewed as solving nonzero-sum games with distributed reinforcement learning (RL), in which each agent is a player. First, to guarantee the real-time performance of the learning algorithms, a data-based distributed control algorithm is proposed for multiagent systems using offline system-interaction data sets. By utilizing the interactive data produced during the run of the real-time system, the proposed algorithm improves system performance based on distributed policy gradient RL. Convergence and stability are guaranteed based on functional analysis and the Lyapunov method. Second, to address the asynchronous learning caused by computational ability differences in multiagent systems, the proposed algorithm is extended to an asynchronous version in which whether each agent executes policy improvement is independent of its neighbors. Furthermore, an actor-critic structure, which contains two neural networks, is developed to implement the proposed algorithm in the synchronous and asynchronous cases. Based on the method of weighted residuals, the convergence and optimality of the neural networks are guaranteed by proving that the approximation errors converge to zero. Finally, simulations are conducted to show the effectiveness of the proposed algorithm.
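The policy-gradient improvement step underlying this line of work can be sketched on a scalar LQR (a generic single-agent illustration with hypothetical values, not the article's multiagent nonzero-sum setting): descend the closed-form cost of a linear policy and compare with the Riccati optimum.

```python
# Policy-gradient policy search on a scalar LQR: x+ = a x + b u, cost q x^2 + r u^2.
a, b, q, r = 1.2, 1.0, 1.0, 1.0

def J(K):
    """Closed-form cost of the linear policy u = -K x from unit initial state."""
    s = (a - b * K) ** 2
    return float("inf") if s >= 1.0 else (q + r * K * K) / (1.0 - s)

K, lr, eps = 0.9, 0.02, 1e-6       # start from an admissible (stabilizing) gain
for _ in range(3000):
    grad = (J(K + eps) - J(K - eps)) / (2 * eps)   # numerical policy gradient
    K -= lr * grad

# For comparison: the optimal gain from the discrete-time Riccati equation,
# obtained here by fixed-point iteration.
P = 1.0
for _ in range(2000):
    P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
K_opt = a * b * P / (r + b * b * P)
```

Gradient descent on the policy parameter recovers the Riccati-optimal gain, which is the basic mechanism that actor networks implement with sampled data instead of closed-form costs.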
14. A DRL based cooperative approach for parking space allocation in an automated valet parking system. Appl Intell 2022. DOI: 10.1007/s10489-022-03757-0.
15. Yuan L, Li T, Tong S, Xiao Y, Gao X. NN adaptive optimal tracking control for a class of uncertain nonstrict feedback nonlinear systems. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.03.049.
16. Huang M, Jiang ZP, Ozbay K. Learning-Based Adaptive Optimal Control for Connected Vehicles in Mixed Traffic: Robustness to Driver Reaction Time. IEEE Trans Cybern 2022;52:5267-5277. PMID: 33170792. DOI: 10.1109/tcyb.2020.3029077.
Abstract
Through vehicle-to-vehicle (V2V) communication, both human-driven and autonomous vehicles can actively exchange data, such as velocities and bumper-to-bumper distances. Employing the shared data, control laws with improved performance can be designed for connected and autonomous vehicles (CAVs). In this article, taking into account human-vehicle interaction and heterogeneous driver behavior, an adaptive optimal control design method is proposed for a platoon mixed with multiple preceding human-driven vehicles and one CAV at the tail. It is shown that by using reinforcement learning and adaptive dynamic programming techniques, a near-optimal controller can be learned from real-time data for the CAV with V2V communications, but without precise knowledge of the car-following parameters of any driver in the platoon. The proposed method allows the CAV controller to adapt to different platoon dynamics caused by the unknown and heterogeneous driver-dependent parameters. To improve the safety performance during the learning process, our off-policy learning algorithm can leverage both the historical data and the data collected in real time, which leads to a considerably reduced learning time. The effectiveness and efficiency of our proposed method are demonstrated by rigorous proofs and microscopic traffic simulations.
17.
18. Ye J, Bian Y, Luo B, Hu M, Xu B, Ding R. Costate-Supplement ADP for Model-Free Optimal Control of Discrete-Time Nonlinear Systems. IEEE Trans Neural Netw Learn Syst 2022;PP:45-59. PMID: 35544498. DOI: 10.1109/tnnls.2022.3172126.
Abstract
In this article, an adaptive dynamic programming (ADP) scheme utilizing a costate function is proposed for the optimal control of unknown discrete-time nonlinear systems. The state-action data are obtained by interacting with the environment under the iterative scheme without any model information. In contrast with the traditional ADP scheme, the collected data in the proposed algorithm are generated with different policies, which improves data utilization in the learning process. To approximate the cost function more accurately and to achieve a better policy improvement direction when data are insufficient, a separate costate network is introduced under the actor-critic framework, and the costate is utilized as supplementary information to estimate the cost function more precisely. Furthermore, the convergence properties of the proposed algorithm are analyzed: under a mild assumption, the costate function is shown to play a positive role in the convergence of the cost function, based on the alternating iteration of the costate function and the cost function. The uniformly ultimately bounded (UUB) property of all the variables is proven by using the Lyapunov approach. Finally, two numerical examples are presented to demonstrate the effectiveness and computational efficiency of the proposed method.
19. Din AFU, Mir I, Gul F, Al Nasar MR, Abualigah L. Reinforced Learning-Based Robust Control Design for Unmanned Aerial Vehicle. Arab J Sci Eng 2022. DOI: 10.1007/s13369-022-06746-0.
20. Mehrafrooz A, He F, Lalbakhsh A. Introducing a Novel Model-Free Multivariable Adaptive Neural Network Controller for Square MIMO Systems. Sensors (Basel) 2022;22:2089. PMID: 35336257. PMCID: PMC8948623. DOI: 10.3390/s22062089.
Abstract
In this study, a novel Multivariable Adaptive Neural Network Controller (MANNC) is developed for coupled model-free n-input n-output systems. The learning algorithm of the proposed controller does not rely on a model of the system and uses only the history of the system inputs and outputs. The system is considered a 'black box' with no prior knowledge of its internal structure. By online monitoring and processing of the system inputs and outputs, the parameters of the controller are adjusted. Using the accumulated gradient of the system error along with Lyapunov stability analysis, the convergence of the controller's weight adjustment can be observed, and an optimal training number for the controller can be selected. The Lyapunov stability of the system is checked during the entire weight training process to enable the controller to handle any possible nonlinearities of the system. The effectiveness of the MANNC in controlling nonlinear square multiple-input multiple-output (MIMO) systems is demonstrated via three simulation studies covering a time-invariant nonlinear MIMO system, a time-variant nonlinear MIMO system, and a hybrid MIMO system, respectively. In each case, the performance of the MANNC is compared with that of a properly selected existing counterpart. Simulation results demonstrate that the proposed MANNC is capable of controlling various types of square MIMO systems with much improved performance over its existing counterparts. These unique properties make the MANNC a suitable candidate for many industrial applications.
Collapse
Affiliation(s)
- Arash Mehrafrooz: Macquarie University College, Macquarie University, Sydney, NSW 2113, Australia
- Fangpo He: Advanced Control Systems Research Group, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Ali Lalbakhsh: School of Engineering, Macquarie University, Ryde, NSW 2109, Australia; School of Electrical & Data Engineering, University of Technology Sydney, Sydney, NSW 2007, Australia
21. Han Z, Pedrycz W, Zhao J, Wang W. Hierarchical Granular Computing-Based Model and Its Reinforcement Structural Learning for Construction of Long-Term Prediction Intervals. IEEE Trans Cybern 2022;52:666-676. PMID: 32011274. DOI: 10.1109/tcyb.2020.2964011.
Abstract
As one of the most essential sources of energy, byproduct gas plays a pivotal role in the steel industry, where its flow tendency is generally regarded as guidance for planning and scheduling in real production. To obtain a numeric estimation along with its reliability, the construction of prediction intervals (PIs) is in high demand for practical applications, and long-term PIs provide more information on future trends. Bearing this in mind, in this article, a hierarchical granular computing (HGrC)-based model is established for constructing long-term PIs, in which probabilistic modeling produces a long horizon of numeric predictions, and the hierarchical deployment of information granularities extends the result to an interval-valued format. Considering that the structure of this model has a direct impact on its performance, Monte-Carlo search with a policy gradient technique is then applied for reinforcement structural learning. Compared with existing methods, the sizes (lengths) of the granules in the proposed approach are unequal, so it is effective for nonperiodic as well as periodic data. Furthermore, with the use of a parallel strategy, efficiency can also be guaranteed for real-world applications. The experimental results demonstrate that the proposed method is superior to other commonly encountered techniques, and the stability of the structural learning process is better than that of other reinforcement learning approaches.
22. Ha M, Wang D, Liu D. Neural-network-based discounted optimal control via an integrated value iteration with accuracy guarantee. Neural Netw 2021;144:176-186. PMID: 34500256. DOI: 10.1016/j.neunet.2021.08.025.
Abstract
A data-based value iteration algorithm with the bidirectional approximation feature is developed for discounted optimal control. The unknown nonlinear system dynamics is first identified by establishing a model neural network. To improve the identification precision, biases are introduced to the model network. The model network with biases is trained by the gradient descent algorithm, where the weights and biases across all layers are updated. The uniform ultimate boundedness stability with a proper learning rate is analyzed by using the Lyapunov approach. Moreover, an integrated value iteration with the discounted cost is developed to fully guarantee the approximation accuracy of the optimal value function. Then, the effectiveness of the proposed algorithm is demonstrated by carrying out two simulation examples with physical backgrounds.
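Why the bias terms mentioned here improve identification precision can be seen even in a linear least-squares caricature (a hypothetical affine system; the paper identifies nonlinear dynamics with a multilayer model network): a bias-free model cannot represent an affine drift at all.

```python
import numpy as np

# Identify x+ = 0.8 x + 0.5 (hypothetical affine dynamics) from sampled data.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
x_next = 0.8 * x + 0.5

# Model without bias: x+ ≈ w x  (structurally unable to capture the offset)
w_nobias = np.linalg.lstsq(x[:, None], x_next, rcond=None)[0]
err_nobias = np.max(np.abs(w_nobias[0] * x - x_next))

# Model with bias: x+ ≈ w x + b  (fits the affine dynamics exactly)
Phi = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(Phi, x_next, rcond=None)[0]
err_bias = np.max(np.abs(w * x + b - x_next))
print(err_nobias, err_bias)
```

The same representational argument carries over to each layer of a neural model network: biases shift the operating point so the nonlinearity need not pass through the origin.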
Affiliation(s)
- Mingming Ha: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Ding Wang: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China
- Derong Liu: Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA
23. Sun C, Li X, Sun Y. A Parallel Framework of Adaptive Dynamic Programming Algorithm With Off-Policy Learning. IEEE Trans Neural Netw Learn Syst 2021;32:3578-3587. PMID: 32833647. DOI: 10.1109/tnnls.2020.3015767.
Abstract
In this article, a model-free online adaptive dynamic programming (ADP) approach is developed for solving the optimal control problem of nonaffine nonlinear systems. Combining the off-policy learning mechanism with a parallel paradigm, multithread agents are employed to collect transitions by interacting with the environment, which significantly augments the amount of sampled data. Moreover, each thread agent explores the environment from different initial states under its own behavior policy, which enhances the exploration capability and alleviates the correlation between sampled data. After the policy evaluation process, only a one-step update is required for policy improvement based on the policy gradient method. The stability of the system under the iterative control laws is guaranteed. Moreover, a convergence analysis proves that the iterative Q-function is monotonically nonincreasing and finally converges to the solution of the Hamilton-Jacobi-Bellman (HJB) equation. For implementation, the actor-critic (AC) structure is utilized with two neural networks (NNs) to approximate the Q-function and the control policy. Finally, the effectiveness of the proposed algorithm is verified by two numerical examples.
24. Yang X, He H, Zhong X. Approximate Dynamic Programming for Nonlinear-Constrained Optimizations. IEEE Trans Cybern 2021;51:2419-2432. PMID: 31329149. DOI: 10.1109/tcyb.2019.2926248.
Abstract
In this paper, we study the constrained optimization problem of a class of uncertain nonlinear interconnected systems. First, we prove that the solution of the constrained optimization problem can be obtained through solving an array of optimal control problems of constrained auxiliary subsystems. Then, under the framework of approximate dynamic programming, we present a simultaneous policy iteration (SPI) algorithm to solve the Hamilton-Jacobi-Bellman equations corresponding to the constrained auxiliary subsystems. By building an equivalence relationship, we demonstrate the convergence of the SPI algorithm. Meanwhile, we implement the SPI algorithm via an actor-critic structure, where actor networks are used to approximate optimal control policies and critic networks are applied to estimate optimal value functions. By using the least squares method and the Monte Carlo integration technique together, we are able to determine the weight vectors of actor and critic networks. Finally, we validate the developed control method through the simulation of a nonlinear interconnected plant.
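The least-squares weight determination for critic networks mentioned here can be sketched in its simplest form as a Bellman-residual least-squares fit of a quadratic critic on a scalar system (hypothetical values; the paper fits actor and critic networks over interconnected subsystems):

```python
import numpy as np

# Least-squares policy evaluation: fit V(x) = w * x^2 for the fixed policy
# u = -K x on x+ = a x + b u with stage cost q x^2 + r u^2.
a, b, q, r, K = 1.2, 1.0, 1.0, 1.0, 0.9
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 100)
u = -K * x
x_next = a * x + b * u
cost = q * x**2 + r * u**2

phi = lambda z: z**2                        # critic feature vector (here: one feature)
A_mat = (phi(x) - phi(x_next))[:, None]     # Bellman-residual regressor per sample
w = np.linalg.lstsq(A_mat, cost, rcond=None)[0][0]

P_true = (q + r * K**2) / (1 - (a - b * K)**2)  # closed-form value of this policy
print(w, P_true)
```

Because the feature class contains the true value function, the least-squares solution recovers the analytic weight exactly; with richer bases or Monte Carlo integration over the state space, the same normal-equation machinery yields approximate critic weights.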
25. Wei Q, Liao Z, Yang Z, Li B, Liu D. Continuous-Time Time-Varying Policy Iteration. IEEE Trans Cybern 2020;50:4958-4971. PMID: 31329153. DOI: 10.1109/tcyb.2019.2926631.
Abstract
A novel policy iteration algorithm, called the continuous-time time-varying (CTTV) policy iteration algorithm, is presented in this paper to obtain optimal control laws for infinite-horizon CTTV nonlinear systems. The adaptive dynamic programming (ADP) technique is utilized to obtain the iterative control laws that optimize the performance index function. The monotonicity, convergence, and optimality of the iterative value function are analyzed, and the iterative value function is proven to converge monotonically to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Furthermore, the iterative control laws are guaranteed to be admissible and to stabilize the nonlinear systems. In the implementation of the presented CTTV policy iteration algorithm, the approximate iterative control laws and iterative value function are obtained by neural networks. Finally, numerical results are given to verify the effectiveness of the presented method.
|
26
|
|
27
|
Nguyen TT, Nguyen ND, Nahavandi S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:3826-3839. [PMID: 32203045 DOI: 10.1109/tcyb.2020.2977374] [Citation(s) in RCA: 99] [Impact Index Per Article: 24.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Reinforcement learning (RL) algorithms have been around for decades and have been employed to solve various sequential decision-making problems. These algorithms, however, face great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated, capable agents that can perform efficiently in these challenging environments. This article addresses an important aspect of deep RL: situations that require multiple agents to communicate and cooperate to solve complex tasks. A survey of approaches to problems in multiagent deep RL (MADRL) is presented, including nonstationarity, partial observability, continuous state and action spaces, multiagent training schemes, and multiagent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, and their corresponding applications are explored. It is envisaged that this review provides insights into various MADRL methods and can lead to the future development of more robust and highly useful multiagent learning methods for solving real-world problems.
|
28
|
Jiang H, Zhang H, Xie X. Critic-only adaptive dynamic programming algorithms' applications to the secure control of cyber-physical systems. ISA TRANSACTIONS 2020; 104:138-144. [PMID: 30853105 DOI: 10.1016/j.isatra.2019.02.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Revised: 01/22/2019] [Accepted: 02/14/2019] [Indexed: 06/09/2023]
Abstract
Industrial cyber-physical systems generally suffer from malicious attacks and unmatched perturbations, and thus security is a core research topic in the related fields. This paper proposes a novel intelligent secure control scheme that integrates optimal control theory, zero-sum game theory, reinforcement learning, and neural networks. First, the secure control problem of the compromised system is converted into a zero-sum game problem for the nominal auxiliary system, and then both policy-iteration-based and value-iteration-based adaptive dynamic programming methods are introduced to solve the Hamilton-Jacobi-Isaacs equations. The proposed secure control scheme can mitigate the effects of actuator attacks and unmatched perturbations and stabilize the compromised cyber-physical systems by tuning the system performance parameters, which is proved through Lyapunov stability theory. Finally, the proposed approach is applied to a Quanser helicopter to verify its effectiveness.
Affiliation(s)
- He Jiang
- College of Information Science and Engineering, Northeastern University, Box 134, 110819, Shenyang, PR China.
- Huaguang Zhang
- College of Information Science and Engineering, Northeastern University, Box 134, 110819, Shenyang, PR China.
- Xiangpeng Xie
- Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, 210003, Nanjing, PR China.
|
29
|
Köpf F, Westermann J, Flad M, Hohmann S. Adaptive optimal control for reference tracking independent of exo-system dynamics. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.140] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
30
|
Integral reinforcement learning based event-triggered control with input saturation. Neural Netw 2020; 131:144-153. [PMID: 32771844 DOI: 10.1016/j.neunet.2020.07.016] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2020] [Revised: 06/13/2020] [Accepted: 07/10/2020] [Indexed: 11/20/2022]
Abstract
In this paper, a novel integral reinforcement learning (IRL)-based event-triggered adaptive dynamic programming scheme is developed for input-saturated continuous-time nonlinear systems. By using the IRL technique, the learning system does not require knowledge of the drift dynamics. A single critic neural network is then designed to approximate the unknown value function, and its learning is not subject to the requirement of an initial admissible control. To reduce computational and communication costs, an event-triggered control law is designed, and the triggering threshold is given to guarantee the asymptotic stability of the control system. Two examples are employed in the simulation studies, and the results verify the effectiveness of the developed IRL-based event-triggered control method.
|
31
|
Robust optimal control for a class of nonlinear systems with unknown disturbances based on disturbance observer and policy iteration. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.082] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
32
|
Zhang Y, Zhao B, Liu D. Deterministic policy gradient adaptive dynamic programming for model-free optimal control. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.032] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
33
|
Wei C, Luo J, Dai H, Duan G. Learning-Based Adaptive Attitude Control of Spacecraft Formation With Guaranteed Prescribed Performance. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:4004-4016. [PMID: 30072354 DOI: 10.1109/tcyb.2018.2857400] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper investigates a novel leader-following attitude control approach for spacecraft formation under preassigned two-layer performance, with consideration of unknown inertial parameters, external disturbance torque, and unmodeled uncertainty. First, two-layer prescribed performance is preselected for both the attitude angle and angular velocity tracking errors. Subsequently, a distributed two-layer performance controller is devised, which guarantees that all the involved closed-loop signals are uniformly ultimately bounded. To overcome the limitations of the static two-layer performance controller, a learning-based control strategy based on the adaptive dynamic programming technique is introduced as an adaptive supplementary controller. This dramatically enhances the adaptiveness of the static two-layer performance controller with respect to unexpected uncertainty, without any prior knowledge of the inertial information. Furthermore, by employing robustly positively invariant set theory, input-to-state stability is rigorously proven under the designed learning-based distributed controller. Finally, two groups of simulation examples are organized to validate the feasibility and effectiveness of the proposed distributed control approach.
|
34
|
Yang X, He H. Adaptive Critic Designs for Event-Triggered Robust Control of Nonlinear Systems With Unknown Dynamics. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:2255-2267. [PMID: 29993650 DOI: 10.1109/tcyb.2018.2823199] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper develops a novel event-triggered robust control strategy for continuous-time nonlinear systems with unknown dynamics. To begin with, the event-triggered robust nonlinear control problem is transformed into an event-triggered nonlinear optimal control problem by introducing an infinite-horizon integral cost for the nominal system. Then, a recurrent neural network (RNN) and adaptive critic designs (ACDs) are employed to solve the derived event-triggered nonlinear optimal control problem. The RNN is applied to reconstruct the system dynamics based on collected system data. After acquiring the knowledge of the system dynamics, a unique critic network is proposed to obtain the approximate solution of the event-triggered Hamilton-Jacobi-Bellman equation within the framework of ACDs. The critic network is updated by using historical and instantaneous state data simultaneously. An advantage of the present critic network update law is that it can relax the persistence of excitation condition. Meanwhile, under a newly developed event-triggering condition, the proposed critic network tuning rule not only guarantees that the critic network weights converge to their optimal values but also ensures that the nominal system states are uniformly ultimately bounded. Moreover, by using the Lyapunov method, it is proved that the derived optimal event-triggered control (ETC) guarantees uniform ultimate boundedness of all the signals in the original system. Finally, a nonlinear oscillator and an unstable power system are provided to validate the developed robust ETC scheme.
|
35
|
New insight into the simultaneous policy update algorithms related to H∞ state feedback control. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.01.060] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
36
|
|
37
|
Li Y, Sun K, Tong S. Observer-Based Adaptive Fuzzy Fault-Tolerant Optimal Control for SISO Nonlinear Systems. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:649-661. [PMID: 29993971 DOI: 10.1109/tcyb.2017.2785801] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper investigates the adaptive fuzzy output-feedback fault-tolerant optimal control problem for a class of single-input single-output nonlinear systems in strict-feedback form. The considered nonlinear systems contain unknown nonaffine nonlinear faults and unmeasured states. Fuzzy logic systems are used to approximate the cost function and the unknown nonlinear functions, respectively. Since the states of the systems to be controlled are assumed to be unmeasurable, an adaptive state observer is developed. To solve the nonaffine nonlinear fault control design problem, filtered signals are introduced into the adaptive backstepping control design procedure, and, in the framework of the adaptive critic technique and fault-tolerant control, a novel adaptive fuzzy fault-tolerant optimal control scheme is developed. The stability of the closed-loop system is proved by using Lyapunov stability theory. The simulation results verify the effectiveness of the proposed control strategy.
|
38
|
Li L, Li D, Song T, Xu X. Actor-Critic Learning Control Based on -Regularized Temporal-Difference Prediction With Gradient Correction. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:5899-5909. [PMID: 29993664 DOI: 10.1109/tnnls.2018.2808203] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Actor-critic methods based on the policy gradient (PG-based AC) have been widely studied to solve learning control problems. To increase the data efficiency of learning prediction in the critic of PG-based AC, studies on how to use recursive least-squares temporal difference (RLS-TD) algorithms for policy evaluation have been conducted in recent years. In such contexts, the RLS-TD critic evaluates an unknown mixed policy generated by a series of different actors, rather than one fixed policy generated by the current actor. Therefore, this AC framework with an RLS-TD critic cannot be proved to converge to the optimal fixed point of the learning problem. To address this problem, this paper proposes a new AC framework named critic-iteration PG (CIPG), which learns the state-value function of the current policy in an on-policy way and performs gradient ascent in the direction of improving the discounted total reward. During each iteration, CIPG keeps the policy parameters fixed and evaluates the resulting fixed policy by a regularized RLS-TD critic. Our convergence analysis extends previous convergence analyses of PG with function approximation to the case of an RLS-TD critic. The simulation results demonstrate that the regularization term in the critic of CIPG is undamped during the learning process, and that CIPG has better learning efficiency and a faster convergence rate than conventional AC learning control methods.
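The actor-critic loop this entry builds on can be sketched in its simplest form: a softmax actor updated by the policy gradient and a TD(0) critic updated by plain stochastic approximation (not the RLS-TD or regularized critic of the paper). The three-state chain environment and all learning constants below are invented for illustration.

```python
import numpy as np

# Hedged sketch: one-step actor-critic with a softmax actor and a TD(0)
# critic. This is a generic textbook scheme, not the CIPG algorithm.
def actor_critic(step_fn, n_states, n_actions, episodes=500,
                 gamma=0.9, a_lr=0.1, c_lr=0.2, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_states, n_actions))  # actor: softmax preferences
    v = np.zeros(n_states)                   # critic: state-value estimates
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            p = np.exp(theta[s] - theta[s].max())
            p /= p.sum()
            a = rng.choice(n_actions, p=p)   # sample from softmax policy
            s2, r = step_fn(s, a)
            delta = r + gamma * v[s2] - v[s]  # TD error
            v[s] += c_lr * delta              # critic update
            grad = -p
            grad[a] += 1.0                    # gradient of log-softmax
            theta[s] += a_lr * delta * grad   # actor update along PG direction
            s = s2
    return theta, v
```

The critic's TD error plays the role of the policy-evaluation signal that the paper instead computes with a regularized RLS-TD update for better data efficiency.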
|
39
|
Luo B, Yang Y, Liu D. Adaptive Q-Learning for Data-Based Optimal Output Regulation With Experience Replay. IEEE TRANSACTIONS ON CYBERNETICS 2018; 48:3337-3348. [PMID: 29994038 DOI: 10.1109/tcyb.2018.2821369] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In this paper, the data-based optimal output regulation problem of discrete-time systems is investigated. An off-policy adaptive Q-learning (QL) method is developed by using real system data, without requiring knowledge of the system dynamics or the mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter in the policy evaluation is used to achieve a tradeoff between the current and future Q-functions. The convergence of the adaptive QL algorithm is proved, and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, an actor-critic neural network (NN) structure is developed. A least-squares scheme and the batch gradient descent method are used to update the critic and actor NN weights, respectively. The experience replay technique is employed in the learning process, which leads to a simple and convenient implementation of the adaptive QL method. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
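The combination of off-policy Q-learning with experience replay described in this abstract can be illustrated in tabular form: transitions are stored in a buffer and revisited in mini-batches. This is a generic tabular sketch, not the cited actor-critic NN implementation; the chain environment and all constants are invented.

```python
import random
import numpy as np

# Hedged sketch: tabular off-policy Q-learning with a replay buffer.
def q_learning_replay(step_fn, n_states, n_actions, episodes=200,
                      gamma=0.95, alpha=0.2, eps=0.1, batch=16, seed=0):
    rng = random.Random(seed)
    Q = np.zeros((n_states, n_actions))
    buffer = []                                   # stored (s, a, r, s') tuples
    for _ in range(episodes):
        s = 0
        for _ in range(30):
            # Epsilon-greedy behavior policy (off-policy w.r.t. the greedy target)
            a = rng.randrange(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s2, r = step_fn(s, a)
            buffer.append((s, a, r, s2))
            # Replay a mini-batch of past transitions
            for (bs, ba, br, bs2) in rng.sample(buffer, min(batch, len(buffer))):
                Q[bs, ba] += alpha * (br + gamma * Q[bs2].max() - Q[bs, ba])
            s = s2
    return Q
```

Replaying stored transitions reuses each data point many times, which is the data-efficiency benefit the abstract attributes to experience replay.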
|
40
|
Training a robust reinforcement learning controller for the uncertain system based on policy gradient method. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.08.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
41
|
Tang L, Liu YJ, Chen CLP. Adaptive Critic Design for Pure-Feedback Discrete-Time MIMO Systems Preceded by Unknown Backlashlike Hysteresis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:5681-5690. [PMID: 29993785 DOI: 10.1109/tnnls.2018.2805689] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper concentrates on the adaptive critic design (ACD) issue for a class of uncertain multi-input multi-output (MIMO) nonlinear discrete-time systems preceded by unknown backlash-like hysteresis. The considered systems are in a block-triangular pure-feedback form, in which there exist nonaffine functions and couplings between states and inputs, making ACD-based optimal control very difficult and complicated. To this end, the mean value theorem is employed to transform the original systems into input-output models. Based on the reinforcement learning algorithm, the optimal control strategy is established with an actor-critic structure; not only is the stability of the systems ensured, but the performance index is also minimized. In contrast to previous results, the main contributions are: 1) this is the first time an ACD framework has been built for such MIMO systems with unknown hysteresis and 2) an adaptive auxiliary signal is developed to compensate for the influence of the hysteresis. In the end, a numerical study is provided to demonstrate the effectiveness of the present method.
|
42
|
Model-free optimal containment control of multi-agent systems based on actor-critic framework. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.06.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
43
|
Zuo S, Song Y, Lewis FL, Davoudi A. Optimal Robust Output Containment of Unknown Heterogeneous Multiagent System Using Off-Policy Reinforcement Learning. IEEE TRANSACTIONS ON CYBERNETICS 2018; 48:3197-3207. [PMID: 29989978 DOI: 10.1109/tcyb.2017.2761878] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper investigates the optimal robust output containment problem of general linear heterogeneous multiagent systems (MAS) with completely unknown dynamics. A model-based algorithm using offline policy iteration (PI) is first developed, where the p-copy internal model principle is utilized to address the system parameter variations. This offline PI algorithm requires the nominal model of each agent, which may not be available in most real-world applications. To address this issue, a discounted performance function is introduced to express the optimal robust output containment problem as an optimal output-feedback design problem with bounded L2-gain. To solve this problem online in real time, a Bellman equation is first developed to evaluate a given control policy and simultaneously find the updated control policies, using only the state/output information measured online. Then, using this Bellman equation, a model-free off-policy integral reinforcement learning algorithm is proposed to solve the optimal robust output containment problem of heterogeneous MAS in real time, without requiring any knowledge of the system dynamics. Simulation results are provided to verify the effectiveness of the proposed method.
|
44
|
A data-driven online ADP control method for nonlinear system based on policy iteration and nonlinear MIMO decoupling ADRC. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.04.024] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
45
|
Luo B, Liu D, Wu HN. Adaptive Constrained Optimal Control Design for Data-Based Nonlinear Discrete-Time Systems With Critic-Only Structure. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2099-2111. [PMID: 28981435 DOI: 10.1109/tnnls.2017.2751018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem of nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed by using value iteration-based Q-learning (VIQL) with a critic-only structure. Most existing constrained control methods require the use of a certain performance index and are only suitable for linear or affine nonlinear systems, which is restrictive in practice. To overcome this limitation, a system transformation is first introduced with a general performance index, and the constrained optimal control problem is converted into an unconstrained optimal control problem. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. The convergence results of the VIQL algorithm are established under an easy-to-realize initial condition. To implement the VIQL algorithm, the critic-only structure is developed, where only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on the gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is tested on three examples with computer simulation.
|
46
|
Zhang H, Qu Q, Xiao G, Cui Y. Optimal Guaranteed Cost Sliding Mode Control for Constrained-Input Nonlinear Systems With Matched and Unmatched Disturbances. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2112-2126. [PMID: 29771665 DOI: 10.1109/tnnls.2018.2791419] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Based on integral sliding mode and approximate dynamic programming (ADP) theory, a novel optimal guaranteed cost sliding mode control is designed for constrained-input nonlinear systems with matched and unmatched disturbances. When the system moves on the sliding surface, the optimal guaranteed cost control problem of the sliding mode dynamics is transformed into the optimal control problem of a reformulated auxiliary system with a modified cost function. An ADP algorithm based on a single critic neural network (NN) is applied to obtain the approximate optimal control law for the auxiliary system. Lyapunov techniques are used to demonstrate the convergence of the NN weight errors. In addition, the derived approximate optimal control is verified to guarantee that the sliding mode dynamics are stable in the sense of uniform ultimate boundedness. Some simulation results are presented to verify the feasibility of the proposed control scheme.
|
47
|
Yang X, He H. Adaptive critic designs for optimal control of uncertain nonlinear systems with unmatched interconnections. Neural Netw 2018; 105:142-153. [PMID: 29843095 DOI: 10.1016/j.neunet.2018.05.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2018] [Revised: 04/13/2018] [Accepted: 05/04/2018] [Indexed: 10/16/2022]
Abstract
In this paper, we develop a novel optimal control strategy for a class of uncertain nonlinear systems with unmatched interconnections. To begin with, we present a stabilizing feedback controller for the interconnected nonlinear systems by modifying an array of optimal control laws of auxiliary subsystems. We also prove that this feedback controller ensures a specified cost function to achieve optimality. Then, under the framework of adaptive critic designs, we use critic networks to solve the Hamilton-Jacobi-Bellman equations associated with auxiliary subsystem optimal control laws. The critic network weights are tuned through the gradient descent method combined with an additional stabilizing term. By using the newly established weight tuning rules, we no longer need the initial admissible control condition. In addition, we demonstrate that all signals in the closed-loop auxiliary subsystems are stable in the sense of uniform ultimate boundedness by using classic Lyapunov techniques. Finally, we provide an interconnected nonlinear plant to validate the present control scheme.
Affiliation(s)
- Xiong Yang
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA.
- Haibo He
- Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA.
|