51
|
Yang X, Liu D, Luo B, Li C. Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.07.051] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Indexed: 10/21/2022]
|
52
|
Luo B, Liu D, Huang T, Wang D. Model-Free Optimal Tracking Control via Critic-Only Q-Learning. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:2134-2144. [PMID: 27416608] [DOI: 10.1109/tnnls.2016.2585520] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Indexed: 06/06/2023]
Abstract
Model-free control is an important and promising topic in the control field that has attracted extensive attention in recent years. In this paper, we aim to solve the model-free optimal tracking control problem for nonaffine nonlinear discrete-time systems. A critic-only Q-learning (CoQL) method is developed, which learns the optimal tracking control from real system data and thus avoids solving the tracking Hamilton-Jacobi-Bellman equation. First, the Q-learning algorithm is proposed based on the augmented system, and its convergence is established. Using only one neural network for approximating the Q-function, the CoQL method is developed to implement the Q-learning algorithm. Furthermore, the convergence of the CoQL method is proved with the consideration of the neural network approximation error. With the convergent Q-function obtained from the CoQL method, the adaptive optimal tracking control is designed based on a gradient descent scheme. Finally, the effectiveness of the developed CoQL method is demonstrated through simulation studies. Because the developed CoQL method learns from off-policy data and is implemented with a critic-only structure, it is easy to realize and overcomes the inadequate-exploration problem.
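As a rough, self-contained illustration of the critic-only idea (not the authors' implementation), the Python sketch below learns a quadratic-feature Q-function for an augmented tracking state from off-policy data and extracts the control by gradient descent on the learned Q-function; the plant, reference generator, feature map, and learning rates are all placeholder assumptions.

```python
import numpy as np

# Placeholder plant and reference generator (assumptions, not the paper's benchmark).
def plant(x, u):
    return np.array([0.9 * x[0] + 0.1 * x[1],
                     -0.2 * np.sin(x[0]) + 0.8 * x[1] + u[0]])

def reference(r):
    return np.array([0.99 * r[0], 0.99 * r[1]])                # slowly decaying reference signal

def features(z, u):                                            # quadratic features of (x, r, u)
    v = np.concatenate([z, u])
    return np.concatenate([np.outer(v, v)[np.triu_indices(len(v))], v, [1.0]])

def control(z, w, steps=50, lr=0.1):
    """Gradient descent on u -> Q(z, u) using only the learned critic."""
    u = np.zeros(1)
    for _ in range(steps):
        eps = 1e-4
        grad = (w @ features(z, u + eps) - w @ features(z, u - eps)) / (2 * eps)
        u -= lr * grad
    return u

w = np.zeros(features(np.zeros(4), np.zeros(1)).shape)         # single critic weight vector
Qx, Ru, gamma, alpha = np.eye(2), np.eye(1), 0.95, 0.01

x, r = np.array([1.0, -1.0]), np.array([0.5, 0.0])
for k in range(2000):
    u = 0.5 * np.random.randn(1)                               # off-policy (exploratory) input
    e = x - r
    cost = e @ Qx @ e + u @ Ru @ u
    xn, rn = plant(x, u), reference(r)
    z, zn = np.concatenate([x, r]), np.concatenate([xn, rn])
    td = cost + gamma * (w @ features(zn, control(zn, w))) - w @ features(z, u)
    w += alpha * td * features(z, u)                           # critic-only temporal-difference update
    x, r = xn, rn
```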
|
53
|
Neural network-based online H∞ control for discrete-time affine nonlinear system using adaptive dynamic programming. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.08.120] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 11/18/2022]
|
54
|
Jiang H, Zhang H, Luo Y, Wang J. Optimal tracking control for completely unknown nonlinear discrete-time Markov jump systems using data-based reinforcement learning method. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.02.029] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
|
55
|
Online approximate solution of HJI equation for unknown constrained-input nonlinear continuous-time systems. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.09.001] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Indexed: 11/17/2022]
|
56
|
Yan P, Liu D, Wang D, Ma H. Data-driven controller design for general MIMO nonlinear systems via virtual reference feedback tuning and neural networks. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.07.017] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Indexed: 10/23/2022]
|
57
|
Fu Y, Fu J, Chai T. Robust Adaptive Dynamic Programming of Two-Player Zero-Sum Games for Continuous-Time Linear Systems. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:3314-3319. [PMID: 26540715] [DOI: 10.1109/tnnls.2015.2461452] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 06/05/2023]
Abstract
In this brief, an online robust adaptive dynamic programming algorithm is proposed for two-player zero-sum games of continuous-time linear systems with unknown dynamics and matched uncertainties, where the uncertainties are functions of the system outputs and of the states of a completely unknown exosystem. The online algorithm is developed using the policy iteration (PI) scheme with only one iteration loop. A new analytical method is proposed for the convergence proof of the PI scheme. Sufficient conditions are given to guarantee global asymptotic stability and suboptimality of the closed-loop system. Simulation studies are conducted to illustrate the effectiveness of the proposed method.
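For intuition, the sketch below runs the underlying model-based policy iteration for a two-player zero-sum game on an assumed small linear system, alternating a Lyapunov-equation policy evaluation with simultaneous updates of the control and disturbance gains; the paper's actual contribution is an online version that does not require the system matrices.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Example matrices are assumptions; A must be Hurwitz so that the zero initial gains are admissible.
A  = np.array([[-1.0, 0.5], [0.2, -2.0]])
B1 = np.array([[1.0], [0.0]])        # disturbance channel
B2 = np.array([[0.0], [1.0]])        # control channel
Q, R, gamma = np.eye(2), np.eye(1), 5.0

K = np.zeros((1, 2))                 # minimizing player: u = -K x
L = np.zeros((1, 2))                 # maximizing player: w =  L x
for i in range(100):
    Ac = A - B2 @ K + B1 @ L
    M  = Q + K.T @ R @ K - gamma**2 * (L.T @ L)
    P  = solve_continuous_lyapunov(Ac.T, -M)           # policy evaluation (Lyapunov equation)
    K_new = np.linalg.solve(R, B2.T @ P)               # control policy improvement
    L_new = (B1.T @ P) / gamma**2                      # disturbance policy improvement
    if np.linalg.norm(K_new - K) + np.linalg.norm(L_new - L) < 1e-10:
        break
    K, L = K_new, L_new

print("game value matrix P =\n", P)
```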
|
58
|
Luo B, Huang T, Wu HN, Yang X. Data-Driven H∞ Control for Nonlinear Distributed Parameter Systems. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:2949-2961. [PMID: 26277007] [DOI: 10.1109/tnnls.2015.2461023] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 06/04/2023]
Abstract
The data-driven H∞ control problem of nonlinear distributed parameter systems is considered in this paper. An off-policy learning method is developed to learn the H∞ control policy from real system data rather than from a mathematical model. First, Karhunen-Loève decomposition is used to compute the empirical eigenfunctions, which are then employed to derive a reduced-order model (ROM) of the slow subsystem based on singular perturbation theory. The H∞ control problem is reformulated on the ROM, where it can, in theory, be solved through the Hamilton-Jacobi-Isaacs (HJI) equation. To learn the solution of the HJI equation from real system data, a data-driven off-policy learning approach is proposed based on the simultaneous policy update algorithm, and its convergence is proved. For implementation purposes, a neural network (NN)-based actor-critic structure is developed, where a critic NN and two action NNs are employed to approximate the value function, the control policy, and the disturbance policy, respectively. Subsequently, a least-squares NN weight-tuning rule is derived with the method of weighted residuals. Finally, the developed data-driven off-policy learning approach is applied to a nonlinear diffusion-reaction process, and the obtained results demonstrate its effectiveness.
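The first stage of the method, the Karhunen-Loève (proper orthogonal) decomposition, can be sketched as follows on synthetic snapshot data (the snapshot generator and energy threshold are placeholder assumptions); the H∞ design in the paper is then carried out on the resulting reduced-order slow subsystem.

```python
import numpy as np

# Synthetic snapshot matrix: each column is the PDE state sampled on a spatial grid at one time
# instant (placeholder data; a real application would use measurements of the process).
n_space, n_snap = 200, 400
z = np.linspace(0, 1, n_space)
snapshots = np.array([np.sin(np.pi * z) * np.exp(-0.5 * t) +
                      0.3 * np.sin(2 * np.pi * z) * np.exp(-2.0 * t) +
                      0.01 * np.random.randn(n_space)
                      for t in np.linspace(0, 4, n_snap)]).T

# Karhunen-Loeve (POD) decomposition via SVD: left singular vectors are the empirical eigenfunctions.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999) + 1)        # keep enough modes to capture 99.9% of the energy
Phi = U[:, :r]                                     # dominant (slow) empirical eigenfunctions

# Galerkin projection onto the slow modes gives the reduced-order state a(t) = Phi^T x(t);
# the H-infinity design is then carried out on this low-dimensional subsystem.
a = Phi.T @ snapshots
print("retained modes:", r, "reconstruction error:",
      np.linalg.norm(snapshots - Phi @ a) / np.linalg.norm(snapshots))
```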
|
59
|
Xu B, Fan Y, Zhang S. Minimal-learning-parameter technique based adaptive neural control of hypersonic flight dynamics without back-stepping. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.02.069] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Indexed: 11/25/2022]
|
60
|
Ni Z, He H, Zhong X, Prokhorov DV. Model-Free Dual Heuristic Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:1834-1839. [PMID: 25955997] [DOI: 10.1109/tnnls.2015.2424971] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 06/04/2023]
Abstract
Model-based dual heuristic dynamic programming (MB-DHP) is a popular approach for approximating optimal solutions in control problems. However, it usually requires offline training of a model network, which results in extra computational cost. In this brief, we propose a model-free DHP (MF-DHP) design based on a finite-difference technique. In particular, we adopt a multilayer perceptron with one hidden layer for both the action and the critic networks, and use delayed objective functions to train both networks online over time. We test both the MF-DHP and MB-DHP approaches on a discrete-time example and a continuous-time example under the same parameter settings. Our simulation results demonstrate that the MF-DHP approach can achieve control performance competitive with that of the traditional MB-DHP approach while requiring fewer computational resources.
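A minimal sketch of the finite-difference idea, with an assumed plant and linear critic/actor in place of the paper's MLPs: the transition derivatives that MB-DHP would take from an offline-trained model network are estimated numerically and plugged into standard DHP critic and actor updates.

```python
import numpy as np

# Placeholder discrete-time plant; its equations stand in for measured transitions and are never
# differentiated analytically.
def step(x, u):
    return np.array([x[0] + 0.1 * x[1],
                     x[1] + 0.1 * (-np.sin(x[0]) - 0.5 * x[1] + u)])

gamma, alpha = 0.95, 0.01
Wc = np.zeros((2, 2))              # critic: lambda(x) = Wc @ x approximates dJ/dx
Wa = np.zeros(2)                   # actor:  u(x) = Wa @ x

x = np.array([1.0, 0.0])
for k in range(5000):
    u  = Wa @ x + 0.1 * np.random.randn()
    xn = step(x, u)

    eps = 1e-4                     # finite-difference Jacobians of the transition
    Jx = np.column_stack([(step(x + eps * e, u) - step(x - eps * e, u)) / (2 * eps)
                          for e in np.eye(2)])
    Ju = (step(x, u + eps) - step(x, u - eps)) / (2 * eps)

    dUdx, dUdu = 2 * x, 0.2 * u    # utility U(x, u) = x'x + 0.1 u^2
    dxn_dx = Jx + np.outer(Ju, Wa)                     # total derivative through u(x) = Wa @ x
    lam_target = dUdx + dUdu * Wa + gamma * dxn_dx.T @ (Wc @ xn)
    Wc += alpha * np.outer(lam_target - Wc @ x, x)     # DHP critic update toward the costate target

    Wa -= alpha * (dUdu + gamma * Ju @ (Wc @ xn)) * x  # actor update along dJ/du
    x = xn if np.linalg.norm(xn) < 10 else np.array([1.0, 0.0])   # reset if the state wanders off
```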
|
61
|
Liu D, Li H, Wang D. Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:1323-1334. [PMID: 25751878] [DOI: 10.1109/tnnls.2015.2402203] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 06/04/2023]
Abstract
In this paper, we establish error bounds of adaptive dynamic programming algorithms for solving undiscounted infinite-horizon optimal control problems of discrete-time deterministic nonlinear systems. We consider approximation errors in the update equations of both the value function and the control policy. We utilize a new assumption in place of the contraction assumption used in discounted optimal control problems, and establish error bounds for approximate value iteration based on a new error condition. We also establish error bounds for the approximate policy iteration and approximate optimistic policy iteration algorithms. It is shown that the iterative approximate value function converges to a finite neighborhood of the optimal value function under some conditions. To implement the developed algorithms, critic and action neural networks are used to approximate the value function and control policy, respectively. Finally, a simulation example is given to demonstrate the effectiveness of the developed algorithms.
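The phenomenon the paper bounds analytically can be illustrated numerically: the hedged sketch below runs exact and error-injected value iteration on a tiny undiscounted control task (all problem data are assumptions) and reports how far the approximate value stays from the exact one for a given per-iteration error bound.

```python
import numpy as np

# Tiny deterministic control task (an illustration, not the paper's example): an integrator on a
# grid of states, undiscounted cost U(x, u) = x^2 + u^2, absorbing zero-cost goal at x = 0.
states = np.arange(-5, 6)
actions = np.array([-2, -1, 0, 1, 2])

def next_index(i, a):
    return int(np.clip(states[i] + a, states[0], states[-1]) - states[0])

def vi_sweep(V, eps=0.0):
    """One sweep of value iteration; eps injects a bounded per-iteration approximation error."""
    Vn = np.empty_like(V)
    for i, x in enumerate(states):
        Vn[i] = min(x**2 + a**2 + V[next_index(i, a)] for a in actions)
    Vn[states == 0] = 0.0                                  # the goal state costs nothing
    return Vn + eps * np.random.uniform(-1, 1, size=Vn.shape)

V_exact = np.zeros(len(states), dtype=float)
for _ in range(200):
    V_exact = vi_sweep(V_exact)                            # exact value iteration

eps = 0.05
V_apprx = np.zeros(len(states), dtype=float)
for _ in range(200):
    V_apprx = vi_sweep(V_apprx, eps)                       # approximate value iteration

print("max |V_apprx - V*| =", np.max(np.abs(V_apprx - V_exact)),
      "with per-iteration error bound", eps)
```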
|
62
|
Wei Q, Liu D, Yang X. Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:866-879. [PMID: 25751877] [DOI: 10.1109/tnnls.2015.2401334] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 06/04/2023]
Abstract
In this paper, a novel iterative adaptive dynamic programming (ADP)-based infinite-horizon self-learning optimal control algorithm, called the generalized policy iteration algorithm, is developed for nonaffine discrete-time (DT) nonlinear systems. Generalized policy iteration is a general scheme in which the policy iteration and value iteration algorithms of ADP interact. The developed algorithm permits an arbitrary positive semidefinite function to initialize it and uses two iteration indices, one for policy improvement and one for policy evaluation. This is the first time that the convergence, admissibility, and optimality properties of the generalized policy iteration algorithm for DT nonlinear systems are analyzed. Neural networks are used to implement the developed algorithm. Finally, numerical examples are presented to illustrate its performance.
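As a concrete picture of the two-index structure, the sketch below applies a generalized policy iteration schedule to an assumed discrete-time linear-quadratic problem: an inner loop performs a fixed number of policy-evaluation sweeps and an outer loop performs policy improvement, with the value matrix initialized to an arbitrary positive semidefinite choice (zero).

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Assumed stable linear plant and quadratic cost, used purely to show the iteration schedule.
A = np.array([[0.95, 0.1], [0.0, 0.95]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

P, j_max = np.zeros((2, 2)), 3                 # j_max = 1 behaves like value iteration,
                                               # j_max -> infinity like full policy iteration
for i in range(200):                           # policy-improvement index
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)       # improved policy u = -K x
    Ak = A - B @ K
    for j in range(j_max):                     # partial policy evaluation under the fixed policy
        P = Q + K.T @ R @ K + Ak.T @ P @ Ak

print("||P - P*|| =", np.linalg.norm(P - solve_discrete_are(A, B, Q, R)))
```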
|
63
|
Song R, Lewis F, Wei Q, Zhang HG, Jiang ZP, Levine D. Multiple actor-critic structures for continuous-time optimal control using input-output data. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:851-865. [PMID: 25730830] [DOI: 10.1109/tnnls.2015.2399020] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Indexed: 06/04/2023]
Abstract
In industrial process control, there may be multiple performance objectives, depending on salient features of the input-output data. To address this situation, this paper proposes multiple actor-critic structures to obtain the optimal control via input-output data for unknown nonlinear systems. The shunting inhibitory artificial neural network (SIANN) is used to classify the input-output data into one of several categories, and different performance measure functions may be defined for the different categories. The approximate dynamic programming algorithm, which contains a model module, a critic network, and an action network, is used to establish the optimal control in each category. A recurrent neural network (RNN) model is used to reconstruct the unknown system dynamics from input-output data, and neural networks are used to implement the critic and action networks. It is proven that the model error and the closed-loop system states are uniformly ultimately bounded. Simulation results demonstrate the performance of the proposed optimal control scheme for the unknown nonlinear system.
|
64
|
Ni Z, He H, Zhao D, Xu X, Prokhorov DV. GrDHP: a general utility function representation for dual heuristic dynamic programming. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:614-627. [PMID: 25014969] [DOI: 10.1109/tnnls.2014.2329942] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 06/03/2023]
Abstract
A general utility function representation is proposed to provide the derivable and adjustable utility function required by the dual heuristic dynamic programming (DHP) design. Goal representation DHP (GrDHP) is presented with a goal network placed on top of the traditional DHP design. This goal network provides a general mapping between the system states and the derivatives of the utility function, so the required derivatives can be obtained directly from the goal network. In addition, instead of the fixed, predefined utility function common in the literature, an online learning process is conducted for the goal network so that the derivatives of the utility function can be adaptively tuned over time. We compare the control performance of the proposed GrDHP and the traditional DHP approaches under the same environment and parameter settings. Statistical simulation results and snapshots of the system variables are presented to demonstrate the improved learning and control performance. We also apply both approaches to a power system example to further demonstrate the control capabilities of the GrDHP approach.
|
65
|
|
66
|
Wei Q, Liu D. Neural-network-based adaptive optimal tracking control scheme for discrete-time nonlinear systems with approximation errors. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2013.09.069] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Indexed: 11/30/2022]
|
67
|
Luo B, Wu HN, Huang T. Off-policy reinforcement learning for H∞ control design. IEEE Transactions on Cybernetics 2015; 45:65-76. [PMID: 25532162] [DOI: 10.1109/tcyb.2014.2319577] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Indexed: 06/04/2023]
Abstract
The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Worse, model-based approaches cannot be used to solve the HJI equation approximately when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the evaluating policy, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN)-based actor-critic structure is employed, and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant and further applied to a rotational/translational actuator system.
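A hedged sketch of the off-policy idea for the linear special case (assumed system matrices, not the paper's F16 plant): a single batch of data generated with arbitrary exploratory inputs is reused at every iteration to solve, by least squares, for the current value matrix together with the improved control and disturbance gains.

```python
import numpy as np

# Assumed small linear plant dx/dt = A x + B2 u + B1 w, used only to sketch the off-policy structure.
A  = np.array([[-1.0, 1.0], [0.5, -2.0]])
B2 = np.array([[0.0], [1.0]])                  # control channel
B1 = np.array([[1.0], [0.0]])                  # disturbance channel
Q, R, gamma = np.eye(2), np.eye(1), 3.0
dt, T, n_win = 1e-3, 0.05, 60                  # Euler step, window length, number of data windows
steps, r = int(T / dt), float(R[0, 0])

def phi(x):                                    # quadratic value basis: V(x) = p . phi(x)
    return np.array([x[0]**2, x[0] * x[1], x[1]**2])

# ---- collect off-policy data once, with exploratory behavior inputs ----
tg, x, windows = 0.0, np.array([1.0, -1.0]), []
for _ in range(n_win):
    x0, I_xx, I_ux, I_wx = x.copy(), np.zeros((2, 2)), np.zeros(2), np.zeros(2)
    for _ in range(steps):
        ub = 0.8 * np.sin(7 * tg) + 0.5 * np.sin(13 * tg + 1.0)   # behavior control input
        wb = 0.3 * np.sin(5 * tg + 0.5)                           # behavior disturbance
        I_xx += np.outer(x, x) * dt
        I_ux += ub * x * dt
        I_wx += wb * x * dt
        x = x + dt * (A @ x + B2[:, 0] * ub + B1[:, 0] * wb)
        tg += dt
    windows.append((x0, x.copy(), I_xx, I_ux, I_wx))

# ---- off-policy simultaneous policy update from the stored data ----
K, L = np.zeros((1, 2)), np.zeros((1, 2))      # u = -K x, w = L x; K = L = 0 admissible (A is Hurwitz)
for it in range(15):
    M = Q + K.T @ R @ K - gamma**2 * (L.T @ L)
    rows, rhs = [], []
    for (x0, xT, I_xx, I_ux, I_wx) in windows:
        c_k = 2 * r * (I_ux + I_xx @ K.flatten())             # coefficients of the improved control gain
        c_l = -2 * gamma**2 * (I_wx - I_xx @ L.flatten())     # coefficients of the improved disturbance gain
        rows.append(np.concatenate([phi(xT) - phi(x0), c_k, c_l]))
        rhs.append(-np.sum(M * I_xx))
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    K, L = theta[3:5].reshape(1, 2), theta[5:7].reshape(1, 2)

p = theta[:3]
print("P =\n", np.array([[p[0], p[1] / 2], [p[1] / 2, p[2]]]), "\nK =", K, "\nL =", L)
```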
|
68
|
Liu D, Wang D, Wang FY, Li H, Yang X. Neural-network-based online HJB solution for optimal robust guaranteed cost control of continuous-time uncertain nonlinear systems. IEEE Transactions on Cybernetics 2014; 44:2834-2847. [PMID: 25415951] [DOI: 10.1109/tcyb.2014.2357896] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Indexed: 06/04/2023]
Abstract
In this paper, the infinite-horizon optimal robust guaranteed cost control of continuous-time uncertain nonlinear systems is investigated using a neural-network-based online solution of the Hamilton-Jacobi-Bellman (HJB) equation. By establishing an appropriate bounded function and defining a modified cost function, the optimal robust guaranteed cost control problem is transformed into an optimal control problem. It can be observed that the optimal cost function of the nominal system is exactly the optimal guaranteed cost of the original uncertain system. A critic neural network is constructed to facilitate the solution of the modified HJB equation corresponding to the nominal system. More importantly, an additional stabilizing term is introduced to help verify stability; it reinforces the updating process of the weight vector and reduces the requirement for an initial stabilizing control. The uniform ultimate boundedness of the closed-loop system is analyzed using the Lyapunov approach as well. Two simulation examples are provided to verify the effectiveness of the present control approach.
|
69
|
Zhang H, Qin C, Jiang B, Luo Y. Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems. IEEE Transactions on Cybernetics 2014; 44:2706-2718. [PMID: 25095274] [DOI: 10.1109/tcyb.2014.2313915] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Indexed: 06/03/2023]
Abstract
The problem of H∞ state feedback control of affine nonlinear discrete-time systems with unknown dynamics is investigated in this paper. An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed for learning, in real time, the solution to the Hamilton-Jacobi-Isaacs (HJI) equation that appears in the H∞ control problem. In the proposed algorithm, three neural networks (NNs) are utilized to find suitable approximations of the optimal value function and the saddle-point feedback control and disturbance policies. Novel weight updating laws are given to tune the critic, actor, and disturbance NNs simultaneously, using data generated in real time along the system trajectories. Considering NN approximation errors, we provide a stability analysis of the proposed algorithm with a Lyapunov approach. Moreover, the need for the system input dynamics in the proposed algorithm is relaxed by using an NN identification scheme. Finally, simulation examples show the effectiveness of the proposed algorithm.
|
70
|
Zhong X, He H, Zhang H, Wang Z. Optimal control for unknown discrete-time nonlinear Markov jump systems using adaptive dynamic programming. IEEE Transactions on Neural Networks and Learning Systems 2014; 25:2141-2155. [PMID: 25420238] [DOI: 10.1109/tnnls.2014.2305841] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 06/04/2023]
Abstract
In this paper, we develop and analyze an optimal control method for a class of discrete-time nonlinear Markov jump systems (MJSs) with unknown system dynamics. Specifically, an identifier is established for the unknown systems to approximate the system states, and an optimal control approach for nonlinear MJSs is developed to solve the Hamilton-Jacobi-Bellman equation based on the adaptive dynamic programming technique. We also provide a detailed stability analysis of the control approach, including the convergence of the performance index function for nonlinear MJSs and the existence of the corresponding admissible control. Neural network techniques are used to approximate the proposed performance index function and the control law. To demonstrate the effectiveness of our approach, three simulation studies (a linear case, a nonlinear case, and a single-link robot arm case) are used to validate the performance of the proposed optimal control method.
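For intuition only, the sketch below handles the linear, known-model special case: value iteration on the coupled Riccati recursion of a two-mode discrete-time Markov jump linear system (mode dynamics and transition probabilities are assumptions); the paper's method targets the nonlinear, unknown-dynamics case with an identifier and neural networks.

```python
import numpy as np

# Two assumed modes and an assumed mode transition probability matrix.
A = [np.array([[1.0, 0.2], [0.0, 0.9]]), np.array([[0.9, 0.1], [0.1, 1.05]])]
B = [np.array([[0.0], [1.0]]),           np.array([[0.5], [1.0]])]
Pr = np.array([[0.9, 0.1], [0.3, 0.7]])
Q, R = np.eye(2), np.eye(1)

P = [np.zeros((2, 2)) for _ in range(2)]         # mode-dependent value matrices, V_m(x) = x' P_m x
for _ in range(500):
    E = [sum(Pr[m, j] * P[j] for j in range(2)) for m in range(2)]   # expected next-step value
    P = [Q + A[m].T @ E[m] @ A[m]
         - A[m].T @ E[m] @ B[m]
           @ np.linalg.inv(R + B[m].T @ E[m] @ B[m])
           @ B[m].T @ E[m] @ A[m]
         for m in range(2)]

E = [sum(Pr[m, j] * P[j] for j in range(2)) for m in range(2)]
K = [np.linalg.solve(R + B[m].T @ E[m] @ B[m], B[m].T @ E[m] @ A[m]) for m in range(2)]
print("mode-dependent feedback gains u = -K[m] x:", K)
```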
|
71
|
Wang D, Liu D, Li H, Ma H, Li C. A neural-network-based online optimal control approach for nonlinear robust decentralized stabilization. Soft Comput 2014. [DOI: 10.1007/s00500-014-1534-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 11/24/2022]
|
72
|
Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Inf Sci (N Y) 2014. [DOI: 10.1016/j.ins.2014.05.050] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Indexed: 11/16/2022]
|
73
|
Liu D, Wei Q. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems 2014; 25:621-634. [PMID: 24807455] [DOI: 10.1109/tnnls.2013.2281663] [Citation(s) in RCA: 174] [Impact Index Per Article: 17.4] [Indexed: 06/03/2023]
Abstract
This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law that optimizes the iterative performance index function. The main contribution of this paper is to analyze, for the first time, the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative performance index function converges nonincreasingly to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear system. Neural networks are used to approximate the performance index function and to compute the optimal control law, facilitating the implementation of the iterative ADP algorithm; the convergence of the weight matrices is analyzed as well. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
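A tabular sketch of the discrete-time policy iteration scheme analyzed in the paper, on an assumed scalar nonlinear plant with gridded state and action spaces (the paper itself uses critic and action neural networks): the initial zero control law is admissible here because the uncontrolled dynamics are contractive.

```python
import numpy as np

# Example dynamics and cost (assumptions, not from the paper).
xs = np.linspace(-2, 2, 81)                    # state grid
us = np.linspace(-1, 1, 41)                    # action grid

def F(x, u):
    return 0.8 * np.sin(x) + u

def U(x, u):
    return x**2 + u**2

def interp(V, x):                              # piecewise-linear interpolation of grid values
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, V)

policy = np.zeros_like(xs)                     # initial admissible control law u = 0
V = np.zeros_like(xs)
for i in range(30):                            # policy iteration index
    for _ in range(200):                       # policy evaluation: V(x) = U(x, v(x)) + V(F(x, v(x)))
        V = U(xs, policy) + interp(V, F(xs, policy))
    q = U(xs[:, None], us[None, :]) + interp(V, F(xs[:, None], us[None, :]))
    policy = us[np.argmin(q, axis=1)]          # policy improvement: greedy w.r.t. the current V

print("V(0) =", interp(V, 0.0), "  u at x = 1:", interp(policy, 1.0))
```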
|