1. Wallace BA, Si J. Continuous-Time Reinforcement Learning Control: A Review of Theoretical Results, Insights on Performance, and Needs for New Designs. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:10199-10219. [PMID: 37027747] [DOI: 10.1109/tnnls.2023.3245980]
Abstract
This exposition discusses continuous-time reinforcement learning (CT-RL) for the control of affine nonlinear systems. We review four seminal methods that are the centerpieces of the most recent results on CT-RL control. We survey the theoretical results of the four methods, highlighting their fundamental importance and successes through discussions of problem formulation, key assumptions, algorithm procedures, and theoretical guarantees. We then evaluate the performance of the control designs to provide analyses and insights on the feasibility of these design methods for applications from a control designer's point of view. Through systematic evaluations, we point out when theory diverges from practical controller synthesis. Furthermore, we introduce a new quantitative analytical framework to diagnose the observed discrepancies. Based on the analyses and the insights gained through quantitative evaluations, we point out potential future research directions to unleash the potential of CT-RL control algorithms in addressing the identified challenges.
2. Fang H, Zhang M, He S, Luan X, Liu F, Ding Z. Solving the Zero-Sum Control Problem for Tidal Turbine System: An Online Reinforcement Learning Approach. IEEE Transactions on Cybernetics 2023; 53:7635-7647. [PMID: 35839191] [DOI: 10.1109/tcyb.2022.3186886]
Abstract
A novel completely mode-free integral reinforcement learning (CMFIRL)-based iteration algorithm is proposed in this article to solve the two-player zero-sum game and Nash equilibrium problem, that is, to compute the optimal control policy pair, for a tidal turbine system described by a continuous-time Markov jump linear model with exactly known transition probabilities and completely unknown dynamics. First, the tidal turbine system is modeled as a Markov jump linear system, and a subsystem transformation technique is designed to decouple the jumping modes. Then, a completely mode-free reinforcement learning algorithm is employed to solve the game-coupled algebraic Riccati equations without using any information about the system dynamics, so that the Nash equilibrium is reached. The learning algorithm uses a single iteration loop that updates the control policy and the disturbance policy simultaneously. An exploration signal is added to excite the system, and the convergence of the CMFIRL iteration algorithm is rigorously proved. Finally, a simulation example illustrates the effectiveness and applicability of the control design approach.
3. Wu L, Li Z, Liu S, Li Z, Sun D. An improved compact-form antisaturation model-free adaptive control algorithm for a class of nonlinear systems with time delays. Science Progress 2023; 106:368504231210361. [PMID: 37933475] [PMCID: PMC10631356] [DOI: 10.1177/00368504231210361]
Abstract
To solve the time-delay problem and actuator saturation problem of nonlinear plants in industrial processes, an improved compact-form antisaturation model-free adaptive control (ICF-AS-MFAC) method is proposed in this work. The ICF-AS-MFAC scheme is based on the concept of the pseudo partial derivative (PPD) and adopts equivalent dynamic linearization technology. Then, a tracking differentiator is used to predict the future output of a time-delay system to effectively control the system. Additionally, the concept of the saturation parameter is proposed, and the ICF-AS-MFAC controller is designed to ensure that the control system will not exhibit actuator saturation. The proposed algorithm is more flexible, has faster output responses for time-delay systems, and solves the problem of actuator saturation. The convergence and stability of the proposed method are rigorously proven mathematically. The effectiveness of the proposed method is verified by numerical simulations, and the applicability of the proposed method is verified by a series of experimental results based on double tanks.
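As a concrete illustration of the compact-form dynamic linearization idea this abstract builds on, the following is a minimal sketch of a standard CFDL-MFAC loop: the pseudo partial derivative (PPD) is estimated by a projection algorithm, and the control increment is driven by the tracking error. The toy plant, gains, and the simple clipping used as a stand-in for the paper's saturation-parameter mechanism are all assumptions for illustration, not the authors' design.

```python
import numpy as np

# Standard CFDL-MFAC sketch: model y(k+1) ≈ y(k) + phi(k)*du(k),
# with phi the pseudo partial derivative (PPD), estimated online.
eta, mu = 0.8, 1.0          # PPD-estimator step size and penalty (assumed)
rho, lam = 0.6, 0.1         # controller step size and penalty (assumed)
u_max = 3.0                 # crude saturation limit (illustrative stand-in)

def plant(y, u):            # hypothetical nonlinear plant, illustration only
    return 0.6 * y + 0.3 * u / (1.0 + y * y)

phi, u, du = 1.0, 0.0, 0.0
y, yd = [0.0], 1.0          # output history and desired output
for k in range(60):
    dy = y[-1] - (y[-2] if len(y) > 1 else 0.0)
    # PPD estimation by the projection algorithm
    phi += eta * du / (mu + du * du) * (dy - phi * du)
    # control increment driven by the tracking error
    du = rho * phi / (lam + phi * phi) * (yd - y[-1])
    u = float(np.clip(u + du, -u_max, u_max))   # clipping stands in for antisaturation
    y.append(plant(y[-1], u))
print(f"final output {y[-1]:.3f}, target {yd}")
```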
Affiliation(s)
- Lipu Wu, School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
- Zhen Li, School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
- Shida Liu, School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
- Zhijun Li, School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
- Dehui Sun, School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
4. Wang T, Wang Y, Yang X, Yang J. Further Results on Optimal Tracking Control for Nonlinear Systems With Nonzero Equilibrium via Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1900-1910. [PMID: 34428163] [DOI: 10.1109/tnnls.2021.3105646]
Abstract
This article develops a novel cost function (performance index function) to overcome the obstacles in solving the optimal tracking control problem for a class of nonlinear systems with known system dynamics via the adaptive dynamic programming (ADP) technique. In traditional optimal control problems, the assumption that the controlled system has zero equilibrium is generally required to guarantee the finiteness of an infinite-horizon cost function and the uniqueness of its solution. To solve the optimal tracking control problem for nonlinear systems with nonzero equilibrium, a specific cost function related to the tracking errors and their derivatives is designed in this article, which removes the aforementioned assumption and related obstacles and simplifies the controller design process. Finally, comparative simulations on an inverted pendulum system illustrate the effectiveness and advantages of the proposed optimal tracking control strategy.
5. Shi H, Wang M, Wang C. Leader-Follower Formation Learning Control of Discrete-Time Nonlinear Multiagent Systems. IEEE Transactions on Cybernetics 2023; 53:1184-1194. [PMID: 34606467] [DOI: 10.1109/tcyb.2021.3110645]
Abstract
This article investigates the leader-follower formation learning control (FLC) problem for discrete-time strict-feedback multiagent systems (MASs). The objective is to acquire experience knowledge from the stable leader-follower adaptive formation control process and to improve the control performance by reusing this experiential knowledge. First, a two-layer control scheme is proposed to solve the leader-follower formation control problem. In the first layer, by combining adaptive distributed observers with constructed n-step predictors, the leader's future state is predicted by the followers in a distributed manner. In the second layer, adaptive neural network (NN) controllers are constructed for the followers to ensure that all the followers track the predicted output of the leader. In the stable formation control process, the NN weights are shown to converge exponentially to their optimal values by developing an extended stability corollary for linear time-varying (LTV) systems. Second, by constructing specific "learning rules," the NN weights with convergent sequences are acquired and stored in the followers as experience knowledge. The stored knowledge is then reused to construct the FLC. The proposed FLC method not only solves the leader-follower formation problem but also improves the transient control performance. Finally, the validity of the presented FLC scheme is illustrated by simulations.
6. Yang Y, Modares H, Vamvoudakis KG, He W, Xu CZ, Wunsch DC. Hamiltonian-Driven Adaptive Dynamic Programming With Approximation Errors. IEEE Transactions on Cybernetics 2022; 52:13762-13773. [PMID: 34495864] [DOI: 10.1109/tcyb.2021.3108034]
Abstract
In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control problem in continuous time for nonlinear systems. First, a novel function, "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that takes into consideration the approximation errors during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient to guarantee closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.
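For reference, the classical Hamiltonian that the "min-Hamiltonian" construction starts from can be written, for a continuous-time affine system with dynamics x-dot = f(x) + g(x)u and quadratic running cost, in the standard form below (generic notation assumed here; the paper's min-Hamiltonian refines this object):

```latex
H(x,u,\nabla V) \;=\; \nabla V(x)^{\top}\bigl(f(x)+g(x)u\bigr) \;+\; x^{\top}Qx \;+\; u^{\top}Ru ,
\qquad
\min_{u} H\bigl(x,u,\nabla V^{*}\bigr)=0 ,
\qquad
u^{*}(x) \;=\; -\tfrac{1}{2}\,R^{-1}g(x)^{\top}\nabla V^{*}(x).
```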
7. Li J, Ding J, Chai T, Lewis FL, Jagannathan S. Adaptive Interleaved Reinforcement Learning: Robust Stability of Affine Nonlinear Systems With Unknown Uncertainty. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:270-280. [PMID: 33112750] [DOI: 10.1109/tnnls.2020.3027653]
Abstract
This article investigates adaptive robust controller design for discrete-time (DT) affine nonlinear systems using adaptive dynamic programming. A novel adaptive interleaved reinforcement learning algorithm is developed for finding a robust controller of DT affine nonlinear systems subject to matched or unmatched uncertainties. To this end, the robust control problem is converted into an optimal control problem for the nominal system by selecting an appropriate utility function. The performance evaluation and control policy update, combined with neural network approximation, are implemented alternately at each time step to solve a simplified Hamilton-Jacobi-Bellman (HJB) equation, so that uniformly ultimately bounded (UUB) stability of the DT affine nonlinear system can be guaranteed for all realizations of the unknown bounded uncertainties. Rigorous theoretical proofs of the convergence of the proposed interleaved RL algorithm and of the UUB stability of the uncertain system are provided. Simulation results verify the effectiveness of the proposed method.
8. Song R, Wei Q, Zhang H, Lewis FL. Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics. IEEE Transactions on Cybernetics 2021; 51:2929-2943. [PMID: 31902792] [DOI: 10.1109/tcyb.2019.2957406]
Abstract
In this article, an off-policy reinforcement learning (RL) algorithm is established to solve discrete-time N-player nonzero-sum (NZS) games with completely unknown dynamics. The N coupled generalized algebraic Riccati equations (GARE) are derived, and a policy iteration (PI) algorithm is then used to obtain the N-tuple of iterative control policies and iterative value functions. Because the system dynamics are required by the PI algorithm, an off-policy RL method is developed for the discrete-time N-player NZS games. The off-policy N-coupled Hamilton-Jacobi (HJ) equation is derived based on quadratic value functions. Using the Kronecker product, the N-coupled HJ equation is decomposed into an unknown-parameter part and a system-operation-data part, which makes its solution independent of the system dynamics. Least squares is used to calculate the iterative value functions and the N-tuple of iterative control policies. The existence of the Nash equilibrium is proved. Simulation examples demonstrate the proposed method for discrete-time NZS games with unknown dynamics.
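The Kronecker-product step mentioned in the abstract is the standard vectorization identity vec(APB) = (B^T ⊗ A) vec(P), which turns a linear matrix equation in an unknown value matrix into an ordinary least-squares problem over measured data. A minimal sketch with arbitrary illustrative matrices (not the paper's game matrices):

```python
import numpy as np

# Recover an unknown matrix P from the linear relation C = A P B by
# vectorization: vec(A P B) = (B^T kron A) vec(P), then least squares.
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
P_true = rng.standard_normal((n, n))
C = A @ P_true @ B                       # "data" generated by the unknown P

M = np.kron(B.T, A)                      # coefficient matrix acting on vec(P)
vecP, *_ = np.linalg.lstsq(M, C.flatten(order="F"), rcond=None)
P = vecP.reshape((n, n), order="F")      # column-major vec convention
print(np.allclose(P, P_true))            # True: P recovered from data
```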
10. Xiong W, Ho DWC, Xu L. Multilayered Sampled-Data Iterative Learning Tracking for Discrete Systems With Cooperative-Antagonistic Interactions. IEEE Transactions on Cybernetics 2020; 50:4420-4429. [PMID: 31150352] [DOI: 10.1109/tcyb.2019.2915664]
Abstract
The tracking problem for discrete systems is discussed in this paper by designing two kinds of multilayered iterative learning schemes with cooperative-antagonistic interactions. The definition of the signed graph is presented, and iterative learning schemes are then designed to be multilayered and to incorporate cooperative-antagonistic interactions. Moreover, considering the limited bandwidth of information storage, the state information of these controllers is updated in light of several previous learning iterations rather than depending only on the last iteration. Two simple criteria are presented to analyze the tracking of discrete systems under the multilayered, cooperative-antagonistic iterative schemes. Simulation results demonstrate the validity of the given criteria.
11. Wei Q, Song R, Liao Z, Li B, Lewis FL. Discrete-Time Impulsive Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2020; 50:4293-4306. [PMID: 30990209] [DOI: 10.1109/tcyb.2019.2906694]
Abstract
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite-horizon discrete-time nonlinear systems. Considering the constraint on the impulsive interval, in each iteration the iterative impulsive value function under each possible impulsive interval is obtained, and the iterative value function and iterative control law are then derived. A new convergence analysis method is developed, which proves that the iterative value function converges to the optimum as the iteration index increases to infinity. The properties of the iterative control law are analyzed, and a detailed implementation of the optimal impulsive control law is presented. Finally, two simulation examples with comparisons are given to show the effectiveness of the developed method.
12. Adaptive dynamic programming based event-triggered control for unknown continuous-time nonlinear systems with input constraints. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2018.09.097]
13. Robust optimal control for a class of nonlinear systems with unknown disturbances based on disturbance observer and policy iteration. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.082]
14. Reddy TK, Arora V, Behera L. HJB-Equation-Based Optimal Learning Scheme for Neural Networks With Applications in Brain–Computer Interface. IEEE Transactions on Emerging Topics in Computational Intelligence 2020. [DOI: 10.1109/tetci.2018.2858761]
15. Wei C, Luo J, Dai H, Duan G. Learning-Based Adaptive Attitude Control of Spacecraft Formation With Guaranteed Prescribed Performance. IEEE Transactions on Cybernetics 2019; 49:4004-4016. [PMID: 30072354] [DOI: 10.1109/tcyb.2018.2857400]
Abstract
This paper investigates a novel leader-following attitude control approach for spacecraft formation under preassigned two-layer performance, taking into account unknown inertial parameters, external disturbance torque, and unmodeled uncertainty. First, two-layer prescribed performance is preselected for both the attitude angle and the angular velocity tracking errors. Subsequently, a distributed two-layer performance controller is devised, which guarantees that all the involved closed-loop signals are uniformly ultimately bounded. To overcome the limitations of the static two-layer performance controller, a learning-based control strategy based on the adaptive dynamic programming technique is introduced as an adaptive supplementary controller. This dramatically enhances the adaptiveness of the static two-layer performance controller with respect to unexpected uncertainty, without any prior knowledge of the inertial information. Furthermore, by employing robustly positively invariant theory, input-to-state stability is rigorously proven under the designed learning-based distributed controller. Finally, two groups of simulation examples validate the feasibility and effectiveness of the proposed distributed control approach.
16. Neural-network-based learning algorithms for cooperative games of discrete-time multi-player systems with control constraints via adaptive dynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.02.107]
17. Chen X, Wang W, Cao W, Wu M. Gaussian-kernel-based adaptive critic design using two-phase value iteration. Information Sciences 2019. [DOI: 10.1016/j.ins.2018.12.019]
18. Qu Q, Zhang H, Luo C, Yu R. Robust control design for multi-player nonlinear systems with input disturbances via adaptive dynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.11.054]
19. Jiang H, Zhang H, Han J, Zhang K. Iterative adaptive dynamic programming methods with neural network implementation for multi-player zero-sum games. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.04.005]
20. Guo W, Si J, Liu F, Mei S. Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2794-2807. [PMID: 28600262] [DOI: 10.1109/tnnls.2017.2702566]
Abstract
Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which has been extensively covered in controls literature for its good theoretical properties and for its success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples including a practical problem of excitation control of a hydrogenerator.
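The evaluate/improve loop underlying policy iteration approximate DP is easiest to see in the discrete-time LQR special case, where policy evaluation reduces to a Lyapunov equation and policy improvement is a closed-form gain update. A minimal sketch under assumed system matrices (the paper's setting is nonlinear with Volterra-series policy approximation; this is only a didactic stand-in):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Discrete-time LQR policy iteration: evaluate the current policy u = -Kx,
# then improve the gain, and repeat until convergence.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[1.0, 1.0]])               # initial stabilizing policy (assumed)
for _ in range(30):
    Ac = A - B @ K
    # policy evaluation: solve P = Q + K'RK + Ac' P Ac (discrete Lyapunov eq.)
    P = solve_discrete_lyapunov(Ac.T, Q + K.T @ R @ K)
    # policy improvement: K <- (R + B'PB)^{-1} B'PA
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-10:
        break
    K = K_new
print("converged gain:", K)
```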
21. Luo B, Liu D, Wu HN. Adaptive Constrained Optimal Control Design for Data-Based Nonlinear Discrete-Time Systems With Critic-Only Structure. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2099-2111. [PMID: 28981435] [DOI: 10.1109/tnnls.2017.2751018]
Abstract
Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem for nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed by using value iteration-based Q-learning (VIQL) with a critic-only structure. Most existing constrained control methods require the use of a specific performance index and suit only linear or affine nonlinear systems, which is restrictive in practice. To overcome this limitation, a system transformation with a general performance index is first introduced, converting the constrained optimal control problem into an unconstrained one. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. The convergence results of the VIQL algorithm are established under an easy-to-realize initial condition. To implement the VIQL algorithm, the critic-only structure is developed, where only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on a gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is tested on three examples with computer simulation.
22. Wang D, Mu C, Liu D, Ma H. On Mixed Data and Event Driven Design for Adaptive-Critic-Based Nonlinear H∞ Control. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:993-1005. [PMID: 28166505] [DOI: 10.1109/tnnls.2016.2642128]
Abstract
In this paper, based on the adaptive critic learning technique, the control of a class of unknown nonlinear dynamic systems is investigated by adopting a mixed data- and event-driven design approach. The nonlinear H∞ control problem is formulated as a two-player zero-sum differential game, and the adaptive critic method is employed to cope with the data-based optimization. The novelty lies in combining the data-driven learning identifier with the event-driven design formulation to develop the adaptive critic controller, thereby accomplishing the nonlinear control. The event-driven optimal control law and the time-driven worst-case disturbance law are approximated by constructing and tuning a critic neural network. Applying the event-driven feedback control, the closed-loop system is built with stability analysis. Simulation studies are conducted to verify the theoretical results and illustrate the control performance. Notably, the present research provides a new avenue for integrating data-based control and event-triggering mechanisms into advanced adaptive critic systems.
23. Wei Q, Liu D, Lin Q, Song R. Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:957-969. [PMID: 28141530] [DOI: 10.1109/tnnls.2016.2638863]
Abstract
In this paper, a novel adaptive dynamic programming (ADP) algorithm, called the "iterative zero-sum ADP algorithm," is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee that the upper and lower iterative value functions converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, where existence criteria for the saddle-point equilibrium are not required. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, and are proved not to be equivalent. Finally, simulation results and comparisons illustrate the performance of the present method.
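The upper and lower iterations the abstract refers to correspond to the standard upper and lower values of a quadratic zero-sum game; with generic assumed notation (Q, R, attenuation level γ, disturbance w) they can be written as below. The lower value never exceeds the upper value, and the saddle-point equilibrium exists exactly when the two coincide:

```latex
\overline{V}(x_k) \;=\; \min_{u}\,\max_{w}\;\sum_{i=k}^{\infty}\bigl(x_i^{\top}Qx_i + u_i^{\top}Ru_i - \gamma^{2}w_i^{\top}w_i\bigr),
\qquad
\underline{V}(x_k) \;=\; \max_{w}\,\min_{u}\;\sum_{i=k}^{\infty}\bigl(x_i^{\top}Qx_i + u_i^{\top}Ru_i - \gamma^{2}w_i^{\top}w_i\bigr),
\qquad
\underline{V}\;\le\;\overline{V}.
```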
24. Zhang H, Cui X, Luo Y, Jiang H. Finite-Horizon H∞ Tracking Control for Unknown Nonlinear Systems With Saturating Actuators. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1200-1212. [PMID: 28362620] [DOI: 10.1109/tnnls.2017.2669099]
Abstract
In this paper, a neural network (NN)-based online model-free integral reinforcement learning algorithm is developed to solve the finite-horizon optimal tracking control problem for completely unknown nonlinear continuous-time systems with disturbance and saturating actuators (constrained control input). An augmented system is constructed with the tracking error system and the command generator system. A time-varying Hamilton-Jacobi-Isaacs (HJI) equation is formulated for the augmented problem, which is extremely difficult or impossible to solve due to its time-dependent property and nonlinearity. Then, an actor-critic-disturbance NN structure-based scheme is proposed to learn the time-varying solution to the HJI equation in real time without using the knowledge of system dynamics. Since the solution to the HJI equation is time-dependent, the form of NNs representation with constant weights and time-dependent activation functions is considered. Furthermore, an extra error is incorporated in order to satisfy the terminal constraints in the weight update law. Convergence and stability proofs are given based on the Lyapunov theory for nonautonomous systems. Two simulation examples are provided to demonstrate the effectiveness of the designed algorithm.
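The time-varying Hamilton-Jacobi-Isaacs equation described in the abstract has, for augmented dynamics of the generic affine form x-dot = f(x) + g(x)u + k(x)w and attenuation level γ, the finite-horizon form sketched below (notation assumed for illustration, not taken from the paper); the explicit time dependence and terminal condition are what motivate NN representations with time-dependent activation functions:

```latex
-\frac{\partial V}{\partial t}
\;=\;
\min_{u}\,\max_{w}\Bigl[\, r(x,u) \;-\; \gamma^{2}\|w\|^{2}
\;+\; \nabla V^{\top}\bigl(f(x)+g(x)u+k(x)w\bigr) \Bigr],
\qquad
V(x,t_f)=\psi(x).
```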
25. Li H, Liu D, Wang D. Manifold Regularized Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:932-943. [PMID: 28141532] [DOI: 10.1109/tnnls.2017.2650943]
Abstract
This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.
26. Wei Q, Li B, Song R. Discrete-Time Stable Generalized Self-Learning Optimal Control With Approximation Errors. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1226-1238. [PMID: 28362617] [DOI: 10.1109/tnnls.2017.2661865]
Abstract
In this paper, a generalized policy iteration (GPI) algorithm with approximation errors is developed for solving infinite-horizon optimal control problems for nonlinear systems. The developed stable GPI algorithm provides a general structure for discrete-time iterative adaptive dynamic programming algorithms, by which most discrete-time reinforcement learning algorithms can be described. Approximation errors are explicitly considered in the GPI algorithm for the first time, and the properties of the stable GPI algorithm with approximation errors are analyzed. The admissibility of the approximate iterative control law can be guaranteed if the approximation errors satisfy the admissibility criteria. The convergence of the developed algorithm is established, showing that the iterative value function converges to a finite neighborhood of the optimal performance index function if the approximation errors satisfy the convergence criterion. Finally, numerical examples and comparisons are presented.
27. Gorodetsky A, Karaman S, Marzouk Y. High-dimensional stochastic optimal control using continuous tensor decompositions. International Journal of Robotics Research 2018. [DOI: 10.1177/0278364917753994]
Affiliation(s)
- Alex Gorodetsky, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sertac Karaman, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Youssef Marzouk, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA, USA
28. Heydari A. Optimal Switching of DC-DC Power Converters Using Approximate Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:586-596. [PMID: 28055919] [DOI: 10.1109/tnnls.2016.2635586]
Abstract
Optimal switching between different topologies in step-down dc-dc voltage converters, with nonideal inductors and capacitors, is investigated in this paper. Challenges including constraint on the inductor current and voltage leakages across the capacitor (due to switching) are incorporated. The objective is generating the desired voltage with low ripples and high robustness toward line and load disturbances. A previously developed tool, which is based on approximate dynamic programming, is adapted for this application. The scheme leads to tuning a parametric function approximator to provide optimal switching in a feedback form. No fixed cycle time is assumed, as the cycle time and the duty ratio will be adjusted on the fly in an optimal fashion. The controller demonstrates good capabilities in controlling the system even under parameter uncertainties. Finally, some modifications on the scheme are conducted to handle optimal switching problems with state jumps at the switching times.
29. Jiang H, Zhang H. Iterative ADP learning algorithms for discrete-time multi-player games. Artificial Intelligence Review 2018. [DOI: 10.1007/s10462-017-9603-1]
30. Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.09.020]
31. Data-based approximate optimal control for nonzero-sum games of multi-player systems using adaptive dynamic programming. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.05.086]
32. Lin H, Su H, Shi P, Lu R, Wu ZG. Estimation and LQG Control Over Unreliable Network With Acknowledgment Randomly Lost. IEEE Transactions on Cybernetics 2017; 47:4074-4085. [PMID: 28113691] [DOI: 10.1109/tcyb.2016.2597259]
Abstract
In this paper, we study the state estimation and optimal control [i.e., linear quadratic Gaussian (LQG) control] problems for networked control systems in which control inputs, observations, and packet acknowledgments (ACKs) are randomly lost. The packet ACK is a signal transmitted from the actuator to notify the estimator of the occurrence of control packet loss. For such systems, we obtain the optimal estimator, which consists of an exponentially increasing number of terms. Regarding the solvability of the LQG problem, we conclude that, in general, even when the optimal LQG controller exists, obtaining it is impractical and unnecessary, as its calculation is not only technically difficult but also computationally prohibitive. This issue motivates us to design a suboptimal LQG controller for the underlying systems. We first develop a suboptimal estimator by using the estimator gain in each term of the optimal estimator. Then we derive a suboptimal LQG controller and establish conditions for the stability of the closed-loop system. Examples are given to illustrate the effectiveness and advantages of the proposed design scheme.
33. Bounded robust control design for uncertain nonlinear systems using single-network adaptive dynamic programming. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.05.030]
34. Wei Q, Liu D, Lin Q. Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Admissibility and Termination Analysis. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:2490-2502. [PMID: 27529879] [DOI: 10.1109/tnnls.2016.2593743]
Abstract
In this paper, a novel local value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon optimal control problems for discrete-time nonlinear systems. The focuses of this paper are to study admissibility properties and the termination criteria of discrete-time local value iteration ADP algorithms. In the discrete-time local value iteration ADP algorithm, the iterative value functions and the iterative control laws are both updated in a given subset of the state space in each iteration, instead of the whole state space. For the first time, admissibility properties of iterative control laws are analyzed for the local value iteration ADP algorithm. New termination criteria are established, which terminate the iterative local ADP algorithm with an admissible approximate optimal control law. Finally, simulation results are given to illustrate the performance of the developed algorithm.
Affiliation(s)
- Qinglai Wei, The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Derong Liu, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
- Qiao Lin, The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
35. Wang D, He H, Liu D. Improving the Critic Learning for Event-Based Nonlinear H∞ Control Design. IEEE Transactions on Cybernetics 2017; 47:3417-3428. [PMID: 28166513] [DOI: 10.1109/tcyb.2017.2653800]
Abstract
In this paper, we aim at improving the critic learning criterion to cope with event-based nonlinear H∞ state feedback control design. First, the H∞ control problem is regarded as a two-player zero-sum game, and the adaptive critic mechanism is used to achieve minimax optimization in an event-based environment. Then, based on an improved updating rule, the event-based optimal control law and the time-based worst-case disturbance law are obtained approximately by training a single critic neural network. An initial stabilizing control is no longer required during the implementation of the new algorithm. Next, the closed-loop system is formulated as an impulsive model, and its stability is handled by incorporating the improved learning criterion. The infamous Zeno behavior of the present event-based design is also avoided through theoretical analysis of the lower bound on the minimal intersample time. Finally, applications to aircraft dynamics and a robot arm plant verify the efficient performance of the present design method.
36. Zhang H, Jiang H, Luo C, Xiao G. Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms. IEEE Transactions on Cybernetics 2017; 47:3331-3340. [PMID: 28113535] [DOI: 10.1109/tcyb.2016.2611613]
Abstract
In this paper, we investigate nonzero-sum games for a class of discrete-time (DT) nonlinear systems by using a novel policy iteration (PI) adaptive dynamic programming (ADP) method. The main idea of our proposed PI scheme is to utilize the iterative ADP algorithm to obtain iterative control policies that not only ensure system stability but also minimize the performance index function for each player. This paper integrates game theory, optimal control theory, and reinforcement learning techniques to formulate and handle DT nonzero-sum games for multiple players. First, we design three actor-critic algorithms, one offline and two online, for the PI scheme. Subsequently, neural networks are employed to implement these algorithms, and the corresponding stability analysis is provided via Lyapunov theory. Finally, a numerical simulation example demonstrates the effectiveness of our proposed approach.
37. Wei Q, Liu D, Lin Q, Song R. Discrete-Time Optimal Control via Local Policy Iteration Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2017; 47:3367-3379. [PMID: 27448382] [DOI: 10.1109/tcyb.2016.2586082]
Abstract
In this paper, a discrete-time optimal control scheme is developed via a novel local policy iteration adaptive dynamic programming algorithm. In the discrete-time local policy iteration algorithm, the iterative value function and iterative control law can be updated in a subset of the state space, where the computational burden is relaxed compared with the traditional policy iteration algorithm. Convergence properties of the local policy iteration algorithm are presented to show that the iterative value function is monotonically nonincreasing and converges to the optimum under some mild conditions. The admissibility of the iterative control law is proven, which shows that the control system can be stabilized under any of the iterative control laws, even if the iterative control law is updated in a subset of the state space. Finally, two simulation examples are given to illustrate the performance of the developed method.
38. Luo B, Liu D, Wu HN, Wang D, Lewis FL. Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control. IEEE Transactions on Cybernetics 2017; 47:3341-3354. [PMID: 27893404] [DOI: 10.1109/tcyb.2016.2623859]
Abstract
The model-free optimal control problem of general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal control method. By using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, an adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, showing that the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
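The core idea of improving a policy by gradient descent on its cost can be shown in a toy form. The paper's PGADP uses an actor-critic with a learned Q-function and the method of weighted residuals; the scalar finite-difference version below is only a didactic stand-in, with all parameters assumed:

```python
# Improve the linear policy u = -k*x by descending the gradient of its cost.
a, b = 0.9, 0.5                       # scalar system x_{k+1} = a*x_k + b*u_k

def cost(k, T=200, x0=1.0):
    """Finite-horizon quadratic cost of the policy u = -k*x."""
    x, J = x0, 0.0
    for _ in range(T):
        u = -k * x
        J += x * x + u * u
        x = a * x + b * u
    return J

k, alpha, eps = 0.0, 0.01, 1e-4       # gain, step size, FD perturbation (assumed)
for _ in range(500):
    grad = (cost(k + eps) - cost(k - eps)) / (2 * eps)  # numerical policy gradient
    k -= alpha * grad                                    # gradient descent step
print(f"learned gain k ≈ {k:.3f}")
```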
39. Wang Z, Liu X, Liu K, Li S, Wang H. Backstepping-Based Lyapunov Function Construction Using Approximate Dynamic Programming and Sum of Square Techniques. IEEE Transactions on Cybernetics 2017; 47:3393-3403. [PMID: 27337732] [DOI: 10.1109/tcyb.2016.2574747]
Abstract
In this paper, backstepping for a class of block strict-feedback nonlinear systems is considered. Since the input function could be zero at each backstepping step, the backstepping technique cannot be applied directly. Based on the assumption that the nonlinear systems are polynomial, for each backstepping step a Lyapunov function can be constructed in polynomial form by the sum-of-squares (SOS) technique. The virtual control can be obtained by the Sontag feedback formula, which is equivalent to an optimal control: the solution of a Hamilton-Jacobi-Bellman equation. Thus, approximate dynamic programming (ADP) can be used to estimate the value functions (Lyapunov functions) instead of SOS. Through the backstepping technique, the control Lyapunov function (CLF) of the full system is finally constructed by making use of the strict-feedback structure, and a stabilizing controller can be obtained through the constructed CLF. The contributions of the proposed method are twofold. On one hand, introducing ADP into backstepping broadens the application of the backstepping technique: a class of block strict-feedback systems can be handled, and the requirement of a nonzero input function at each backstepping step can be relaxed. On the other hand, backstepping with dynamic surface control reduces the computational complexity of ADP by constructing one part of the CLF through semidefinite programming using SOS. Simulation results verify the contributions of the proposed method.
40. Tracking control optimization scheme of continuous-time nonlinear system via online single network adaptive critic design method. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.04.008]
41. Hou Z, Liu S, Tian T. Lazy-Learning-Based Data-Driven Model-Free Adaptive Predictive Control for a Class of Discrete-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:1914-1928. [PMID: 28113442] [DOI: 10.1109/tnnls.2016.2561702]
Abstract
In this paper, a novel data-driven model-free adaptive predictive control method based on the lazy learning technique is proposed for a class of discrete-time single-input single-output nonlinear systems. The feature of the proposed approach is that the controller is designed using only the input-output (I/O) measurement data of the system, by means of a novel dynamic linearization technique with a new concept termed the pseudogradient (PG). Moreover, the predictive function is implemented in the controller using a lazy-learning (LL)-based PG predictive algorithm, so that the controller not only shows good robustness but also realizes model-free adaptive prediction for sudden changes of the desired signal. Further, since the LL technique has the characteristics of database queries, both online and offline I/O measurement data are fully and simultaneously utilized to adjust the controller parameters in real time during the control process. The stability of the proposed method is guaranteed by rigorous mathematical analysis. Both numerical simulations and laboratory experiments on a practical three-tank water level control system verify the effectiveness of the proposed approach.
42. Chen CLP. Neural Approximation-Based Adaptive Control for a Class of Nonlinear Nonstrict Feedback Discrete-Time Systems. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:1531-1541. [PMID: 28113479] [DOI: 10.1109/tnnls.2016.2531089]
Abstract
In this paper, a neural-approximation-based adaptive control approach is developed for a class of uncertain nonlinear discrete-time (DT) systems. The main characteristic of the considered systems is that they can be viewed as a class of multi-input multi-output systems in nonstrict feedback structure. A similar control problem for this class of systems has been addressed in the past, but it focused on continuous-time systems. The complexity of the system structure makes the controller design and stability analysis more difficult. To stabilize this class of systems, a new recursive procedure is developed, and the effect caused by the noncausal problem in the nonstrict feedback DT structure is resolved using a semirecurrent neural approximation. Based on the Lyapunov difference approach, it is proved that all the signals of the closed-loop system are semiglobally uniformly ultimately bounded, and good tracking performance can be guaranteed. The feasibility of the proposed controllers is validated by a simulation example.
43. Song R, Wei Q, Xiao W. Off-policy neuro-optimal control for unknown complex-valued nonlinear systems based on policy iteration. Neural Computing and Applications 2017. [DOI: 10.1007/s00521-015-2144-0]
44. Song R, Wei Q, Song B. Neural-network-based synchronous iteration learning method for multi-player zero-sum games. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.02.051]
45. Wei Q, Lewis FL, Sun Q, Yan P, Song R. Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis. IEEE Transactions on Cybernetics 2017; 47:1224-1237. [PMID: 27093714] [DOI: 10.1109/tcyb.2016.2542923]
Abstract
In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q-function is updated for all states and controls, instead of for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q-function converges to the optimum, which simplifies the convergence criterion on the learning rates used in traditional Q-learning algorithms. During the convergence analysis, upper and lower bounds of the iterative Q-function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q-function itself. For convenience of analysis, the convergence properties for the undiscounted case of the deterministic Q-learning algorithm are developed first; then, considering the discount factor, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q-function and to compute the iterative control law, facilitating the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons illustrate the performance of the developed algorithm.
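A whole-space deterministic Q-iteration of the kind analyzed here can be sketched on a gridded scalar example: every (state, control) pair is updated in each sweep, rather than a single visited pair as in classical Q-learning. The grids, dynamics, and discount factor below are illustrative assumptions (the paper also treats the undiscounted case, which this sketch does not):

```python
import numpy as np

# Deterministic Q-iteration over the whole discretized state-control space.
xs = np.linspace(-1, 1, 41)            # state grid (assumed)
us = np.linspace(-1, 1, 21)            # control grid (assumed)
f = lambda x, u: np.clip(0.8 * x + 0.5 * u, -1, 1)   # dynamics, kept on grid
r = lambda x, u: x**2 + u**2                          # stage cost
gamma = 0.95                                          # discount (assumed)

Q = np.zeros((len(xs), len(us)))
for _ in range(200):
    V = Q.min(axis=1)                  # greedy cost-to-go at each grid state
    Qn = np.empty_like(Q)
    for i, x in enumerate(xs):
        for j, u in enumerate(us):
            k = np.abs(xs - f(x, u)).argmin()   # nearest-neighbor projection
            Qn[i, j] = r(x, u) + gamma * V[k]   # full-space Bellman update
    if np.max(np.abs(Qn - Q)) < 1e-8:
        break
    Q = Qn
print("optimal cost-to-go from x = 1:", Q[-1].min())
```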
46. Zhao B, Liu D, Li Y. Observer based adaptive dynamic programming for fault tolerant control of a class of nonlinear systems. Information Sciences 2017. [DOI: 10.1016/j.ins.2016.12.016]
47. Zhang H, Feng T, Liang H, Luo Y. LQR-Based Optimal Distributed Cooperative Design for Linear Discrete-Time Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:599-611. [PMID: 26540717] [DOI: 10.1109/tnnls.2015.2490072]
Abstract
In this paper, a novel linear quadratic regulator (LQR)-based optimal distributed cooperative design method is developed for synchronization control of general linear discrete-time multiagent systems on a fixed, directed graph. Sufficient conditions are derived for synchronization, which restrict the graph eigenvalues into a bounded circular region in the complex plane. The synchronizing speed issue is also considered, and it turns out that the synchronizing region reduces as the synchronizing speed becomes faster. To obtain more desirable synchronizing capacity, the weighting matrices are selected by sufficiently utilizing the guaranteed gain margin of the optimal regulators. Based on the developed LQR-based cooperative design framework, an approximate dynamic programming technique is successfully introduced to overcome the (partially or completely) model-free cooperative design for linear multiagent systems. Finally, two numerical examples are given to illustrate the effectiveness of the proposed design methods.
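The single-agent LQR computation that such cooperative designs build on can be sketched by iterating the discrete algebraic Riccati equation to a fixed point; in the distributed setting the resulting gain is then scaled by coupling weights tied to the graph eigenvalues (that scaling rule is simplified away here; the system matrices are assumed for illustration):

```python
import numpy as np

# Fixed-point iteration of the discrete algebraic Riccati equation (DARE),
# then the LQR gain K; agents would apply u_i = -c*K*(local consensus error).
A = np.array([[1.0, 0.2], [0.0, 1.0]])
B = np.array([[0.02], [0.2]])
Q, R = np.eye(2), np.eye(1)

P = np.eye(2)
for _ in range(500):
    Pn = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        R + B.T @ P @ B, B.T @ P @ A)
    if np.max(np.abs(Pn - P)) < 1e-12:
        break
    P = Pn
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("LQR gain K =", K)
```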
48. Bertsekas DP. Value and Policy Iterations in Optimal Control and Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:500-509. [PMID: 28055911] [DOI: 10.1109/tnnls.2015.2503980]
Abstract
In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. These are the problems that are often taken as the starting point for adaptive dynamic programming. Under very general assumptions, we establish the uniqueness of the solution of Bellman's equation, and we provide convergence results for value and policy iterations.
49. Mu C, Ni Z, Sun C, He H. Air-Breathing Hypersonic Vehicle Tracking Control Based on Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:584-598. [PMID: 26863677] [DOI: 10.1109/tnnls.2016.2516948]
Abstract
In this paper, we propose a data-driven supplementary control approach with adaptive learning capability for air-breathing hypersonic vehicle tracking control based on action-dependent heuristic dynamic programming (ADHDP). The control action is generated by the combination of sliding mode control (SMC) and the ADHDP controller to track the desired velocity and altitude. In particular, the ADHDP controller observes the differences between the actual and desired velocity/altitude and then provides a supplementary control action accordingly. The ADHDP controller does not rely on an accurate mathematical model and is data driven. Meanwhile, it is capable of adjusting its parameters online over time under various working conditions, which makes it well suited to hypersonic vehicle systems with parameter uncertainties and disturbances. We compare the adaptive supplementary control approach against traditional SMC in cruising flight and provide three simulation studies to illustrate the improved performance of the proposed approach.
50. Zhu Y, Zhao D, Li X. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:714-725. [PMID: 27249839] [DOI: 10.1109/tnnls.2016.2561300]
Abstract
H∞ control is a powerful method for solving the disturbance attenuation problems that occur in some control systems. The design of such controllers relies on solving the zero-sum game (ZSG). In practical applications, however, the exact dynamics are mostly unknown, and identifying the dynamics produces errors that are detrimental to control performance. To overcome this problem, an iterative adaptive dynamic programming algorithm is proposed in this paper to solve the continuous-time, unknown nonlinear ZSG using only online data. A model-free approach to the Hamilton-Jacobi-Isaacs equation is developed based on the policy iteration method. The control and disturbance policies and the value are approximated by neural networks (NNs) under a critic-actor-disturber structure. The NN weights are solved by the least-squares method. According to the theoretical analysis, our algorithm is equivalent to a Gauss-Newton method for solving an optimization problem, and it converges uniformly to the optimal solution. The online data can also be used repeatedly, which is highly efficient. Simulation results demonstrate its feasibility for solving the unknown nonlinear ZSG. Compared with other algorithms, it saves a significant amount of online measurement time.