1. Wu W, Hu J, Zhu Z, Zhang F, Xu J, Wang C. Deterministic learning-based neural identification and knowledge fusion. Neural Netw 2024;169:165-180. PMID: 37890366. DOI: 10.1016/j.neunet.2023.10.004.
Abstract
Recent deterministic learning methods have achieved locally-accurate identification of unknown system dynamics. However, the locally-accurate identification means that the neural networks can only capture the local dynamics knowledge along the system trajectory. In order to capture a broader knowledge region, this article investigates the knowledge fusion problem of deterministic learning, that is, the integration of different knowledge regions along different individual trajectories. Specifically, two kinds of knowledge fusion schemes are systematically introduced: an online fusion scheme and an offline fusion scheme. The online scheme can be viewed as an extension of distributed cooperative learning control to cooperative neural identification for sampled-data systems. By designing an auxiliary information transmission strategy to enable the neural network to receive information learned from other tasks while learning its own task, it is proven that the weights of all localized RBF networks exponentially converge to their common true/ideal values. The offline scheme can be regarded as a knowledge distillation strategy, in which the fused network is obtained by offline training through the knowledge learned from all individual system trajectories via deterministic learning. A novel weight fusion algorithm with low computational complexity is proposed based on the least squares solution under subspace constraints. Simulation studies show that the proposed fusion schemes can successfully integrate the knowledge regions of different individual trajectories while maintaining the learning performance, thereby greatly expanding the knowledge region learned from deterministic learning.
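As a toy illustration of the localized RBF approximation that deterministic learning builds on (this is not the paper's identification or fusion scheme; the centers, width, and target function below are illustrative assumptions), a Gaussian RBF network can recover an unknown one-dimensional map from data sampled in a region by least squares:

```python
import numpy as np

# Hypothetical setup: 15 Gaussian RBF centers spread over the region of interest.
centers = np.linspace(-2.0, 2.0, 15)
width = 0.3

def rbf_features(x):
    """Localized Gaussian basis, as used in RBF network identification."""
    return np.exp(-(x - centers) ** 2 / (2.0 * width ** 2))

# Stand-in for unknown system dynamics sampled along a trajectory.
xs = np.linspace(-2.0, 2.0, 200)
f_true = np.sin(2.0 * xs)

Phi = np.stack([rbf_features(x) for x in xs])       # (200, 15) regressor matrix
W, *_ = np.linalg.lstsq(Phi, f_true, rcond=None)    # batch estimate of the ideal weights

max_err = np.max(np.abs(Phi @ W - f_true))
print(f"max approximation error over sampled region: {max_err:.2e}")
```

Outside the sampled region the Gaussian features vanish and the approximation degrades quickly, which is exactly the "local knowledge" limitation that the fusion schemes described above are designed to overcome.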
Affiliation(s)
- Weiming Wu, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Jingtao Hu, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Zejian Zhu, School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
- Fukai Zhang, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Juanjuan Xu, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Cong Wang, School of Control Science and Engineering, Shandong University, Jinan 250061, China
2. Zhang H, Ming Z, Yan Y, Wang W. Data-Driven Finite-Horizon H∞ Tracking Control With Event-Triggered Mechanism for the Continuous-Time Nonlinear Systems. IEEE Trans Neural Netw Learn Syst 2023;34:4687-4701. PMID: 34633936. DOI: 10.1109/tnnls.2021.3116464.
Abstract
In this article, the neural network (NN)-based adaptive dynamic programming (ADP) event-triggered control method is presented to obtain the near-optimal control policy for the model-free finite-horizon H∞ optimal tracking control problem with constrained control input. First, using available input-output data, a data-driven model is established by a recurrent NN (RNN) to reconstruct the unknown system. Then, an augmented system with event-triggered mechanism is obtained by a tracking error system and a command generator. We present a novel event-triggering condition without Zeno behavior. On this basis, the relationship between event-triggered Hamilton-Jacobi-Isaacs (HJI) equation and time-triggered HJI equation is given in Theorem 3. Since the solution of the HJI equation is time-dependent for the augmented system, the time-dependent activation functions of NNs are considered. Moreover, an extra error is incorporated to satisfy the terminal constraints of cost function. This adaptive control pattern finds, in real time, approximations of the optimal value while also ensuring the uniform ultimate boundedness of the closed-loop system. Finally, the effectiveness of the proposed near-optimal control pattern is verified by two simulation examples.
3. Safe Reinforcement Learning for Affine Nonlinear Systems with State Constraints and Input Saturation Using Control Barrier Functions. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.11.006.
4. Song S, Zhu M, Dai X, Gong D. Model-Free Optimal Tracking Control of Nonlinear Input-Affine Discrete-Time Systems via an Iterative Deterministic Q-Learning Algorithm. IEEE Trans Neural Netw Learn Syst 2022;PP:999-1012. PMID: 35657846. DOI: 10.1109/tnnls.2022.3178746.
Abstract
In this article, a novel model-free dynamic inversion-based Q-learning (DIQL) algorithm is proposed to solve the optimal tracking control (OTC) problem of unknown nonlinear input-affine discrete-time (DT) systems. Compared with the existing DIQL algorithm and the discount factor-based Q-learning (DFQL) algorithm, the proposed algorithm can eliminate the tracking error while ensuring that it is model-free and off-policy. First, a new deterministic Q-learning iterative scheme is presented, and based on this scheme, a model-based off-policy DIQL algorithm is designed. The advantage of this new scheme is that it can avoid the training of unusual data and improve data utilization, thereby saving computing resources. Simultaneously, the convergence and stability of the designed algorithm are analyzed, and the proof that adding probing noise into the behavior policy does not affect the convergence is presented. Then, by introducing neural networks (NNs), the model-free version of the designed algorithm is further proposed so that the OTC problem can be solved without any knowledge about the system dynamics. Finally, three simulation examples are given to demonstrate the effectiveness of the proposed algorithm.
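The flavor of model-free, off-policy Q-learning for optimal control can be sketched on a scalar linear-quadratic problem (a generic Q-learning policy iteration under the assumed plant x⁺ = 0.9x + u with cost x² + u², chosen for checkability; this is not the DIQL algorithm of the paper):

```python
import numpy as np

# Hidden plant (unknown to the learner): x_{k+1} = a*x_k + b*u_k, stage cost q*x^2 + r*u^2.
a, b, q, r = 0.9, 1.0, 1.0, 1.0
rng = np.random.default_rng(0)

# Exploration data collected once with probing inputs, reused off-policy at every iteration.
x = rng.uniform(-1, 1, 60)
u = rng.uniform(-1, 1, 60)
x_next = a * x + b * u

def features(x, u):
    # Quadratic Q-function parameterization: Q(x,u) = w1*x^2 + w2*x*u + w3*u^2.
    return np.stack([x * x, x * u, u * u], axis=1)

k = 0.0  # initial stabilizing feedback u = -k*x (the open loop is stable here)
for _ in range(10):
    u_next = -k * x_next
    # Bellman equation, linear in the weights: (phi(x,u) - phi(x',u')) @ w = cost(x,u).
    A = features(x, u) - features(x_next, u_next)
    c = q * x * x + r * u * u
    w = np.linalg.lstsq(A, c, rcond=None)[0]
    k = w[1] / (2.0 * w[2])  # greedy improvement: argmin_u Q(x,u) = -(w2 / (2*w3)) * x

print(f"learned gain k = {k:.4f}")
```

No model is ever identified: only the sampled transitions enter the update, which is the sense in which such schemes are model-free and off-policy.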
5. Duan J, Liu Z, Li SE, Sun Q, Jia Z, Cheng B. Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.04.134.
6.
7. Sun B, van Kampen EJ. Event-triggered constrained control using explainable global dual heuristic programming for nonlinear discrete-time systems. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.10.046.
8. Li S, Ding L, Gao H, Liu YJ, Huang L, Deng Z. ADP-Based Online Tracking Control of Partially Uncertain Time-Delayed Nonlinear System and Application to Wheeled Mobile Robots. IEEE Trans Cybern 2020;50:3182-3194. PMID: 30872249. DOI: 10.1109/tcyb.2019.2900326.
Abstract
In this paper, an adaptive dynamic programming-based online adaptive tracking control algorithm is proposed to solve the tracking problem of a partially uncertain time-delayed nonlinear affine system with uncertain resistance. Using the discrete-time Hamilton-Jacobi-Bellman function, the input time-delay separation lemma, and Lyapunov-Krasovskii functionals, the partial state and input time delays can be handled. With the approximations provided by the action, critic, and resistance neural networks, a near-optimal controller and appropriate adaptive laws are designed to guarantee the uniform ultimate boundedness of all signals in the target system and the convergence of the tracking error to a small compact set around zero. A numerical simulation of a wheeled mobile robotic system is presented to verify the validity of the proposed method.
9. Deptula P, Chen HY, Licitra RA, Rosenfeld JA, Dixon WE. Approximate Optimal Motion Planning to Avoid Unknown Moving Avoidance Regions. IEEE Trans Robot 2020. DOI: 10.1109/tro.2019.2955321.
10. Song R, Xie Y, Zhang Z. Data-driven finite-horizon optimal tracking control scheme for completely unknown discrete-time nonlinear systems. Neurocomputing 2019. DOI: 10.1016/j.neucom.2019.05.026.
11. Rosenfeld JA, Kamalapurkar R, Dixon WE. The State Following Approximation Method. IEEE Trans Neural Netw Learn Syst 2019;30:1716-1730. PMID: 30369450. DOI: 10.1109/tnnls.2018.2870040.
Abstract
A function approximation method is developed which aims to approximate a function in a small neighborhood of a state that travels within a compact set. The method provides a novel approximation strategy for the efficient approximation of nonlinear functions for real-time simulations and experiments. The development is based on the theory of universal reproducing kernel Hilbert spaces over the n-dimensional Euclidean space. Several theorems are introduced which support the development of this state following (StaF) method. In particular, it is shown that there is a bound on the number of kernel functions required for the maintenance of an accurate function approximation as a state moves through a compact set. In addition, a weight update law, based on gradient descent, is introduced where arbitrarily close accuracy can be achieved provided the weight update law is iterated at a sufficient frequency, as detailed in Theorem 4. An experience-based approximation method is presented which utilizes the samples of the estimations of the ideal weights to generate a global approximation of a function. The experience-based approximation interpolates the samples of the weight estimates using radial basis functions. To illustrate the StaF method, it is utilized for derivative estimation and function approximation, and is applied to an adaptive dynamic programming problem where it is demonstrated that stability is maintained with a reduced number of basis functions.
12. Li X, Xue L, Sun C. Linear quadratic tracking control of unknown discrete-time systems using value iteration algorithm. Neurocomputing 2018. DOI: 10.1016/j.neucom.2018.05.111.
13. Mu C, Wang D, He H. Data-Driven Finite-Horizon Approximate Optimal Control for Discrete-Time Nonlinear Systems Using Iterative HDP Approach. IEEE Trans Cybern 2018;48:2948-2961. PMID: 29028219. DOI: 10.1109/tcyb.2017.2752845.
Abstract
This paper presents a data-based finite-horizon optimal control approach for discrete-time nonlinear affine systems. Iterative adaptive dynamic programming (ADP) is used to approximately solve the Hamilton-Jacobi-Bellman equation by minimizing the cost function in finite time. The idea is implemented with a heuristic dynamic programming (HDP) structure involving a model network, which allows the iterative control at the first step to be obtained without the system function; meanwhile, the action network is used to obtain the approximate optimal control law and the critic network is utilized to approximate the optimal cost function. The convergence of the iterative ADP algorithm and the stability of the weight estimation errors based on the HDP structure are intensively analyzed. Finally, two simulation examples are provided to demonstrate the theoretical results and show the performance of the proposed method.
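For a scalar linear-quadratic special case (a stand-in chosen here because its answer is checkable in closed form, not the paper's neural HDP implementation), the finite-horizon cost-to-go can be computed by the same backward Bellman recursion that iterative ADP approximates, and it approaches the infinite-horizon solution as the horizon grows:

```python
# Scalar LQ example: x_{k+1} = a*x_k + b*u_k, stage cost q*x^2 + r*u^2.
a, b, q, r = 0.9, 1.0, 1.0, 1.0

def backward_recursion(N):
    """V_k(x) = p_k * x^2; exact dynamic programming over horizon N."""
    p = 0.0  # zero terminal cost
    for _ in range(N):
        # Minimizing over u in q*x^2 + r*u^2 + p*(a*x + b*u)^2 gives the Riccati step.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p

for N in (1, 5, 20, 80):
    print(f"horizon N={N:3d}: value coefficient p = {backward_recursion(N):.6f}")
```

The printed coefficients increase monotonically with the horizon and settle at the fixed point of the scalar discrete-time Riccati equation, mirroring the kind of convergence result established for iterative HDP.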
14. Deptula P, Rosenfeld JA, Kamalapurkar R, Dixon WE. Approximate Dynamic Programming: Combining Regional and Local State Following Approximations. IEEE Trans Neural Netw Learn Syst 2018;29:2154-2166. PMID: 29771668. DOI: 10.1109/tnnls.2018.2808102.
Abstract
An infinite-horizon optimal regulation problem for a control-affine deterministic system is solved online using a local state following (StaF) kernel and a regional model-based reinforcement learning (R-MBRL) method to approximate the value function. Unlike traditional methods such as R-MBRL that aim to approximate the value function over a large compact set, the StaF kernel approach aims to approximate the value function in a local neighborhood of the state that travels within a compact set. In this paper, the value function is approximated using a state-dependent convex combination of the StaF-based and the R-MBRL-based approximations. As the state enters a neighborhood containing the origin, the value function transitions from being approximated by the StaF approach to the R-MBRL approach. Semiglobal uniformly ultimately bounded (SGUUB) convergence of the system states to the origin is established using a Lyapunov-based analysis. Simulation results are provided for two-, three-, six-, and ten-state dynamical systems to demonstrate the scalability and performance of the developed method.
15. Wang D, Mu C, Liu D, Ma H. On Mixed Data and Event Driven Design for Adaptive-Critic-Based Nonlinear H∞ Control. IEEE Trans Neural Netw Learn Syst 2018;29:993-1005. PMID: 28166505. DOI: 10.1109/tnnls.2016.2642128.
Abstract
In this paper, based on the adaptive critic learning technique, the control for a class of unknown nonlinear dynamic systems is investigated by adopting a mixed data and event driven design approach. The nonlinear control problem is formulated as a two-player zero-sum differential game and the adaptive critic method is employed to cope with the data-based optimization. The novelty lies in that the data driven learning identifier is combined with the event driven design formulation, in order to develop the adaptive critic controller, thereby accomplishing the nonlinear control. The event driven optimal control law and the time driven worst case disturbance law are approximated by constructing and tuning a critic neural network. Applying the event driven feedback control, the closed-loop system is built with stability analysis. Simulation studies are conducted to verify the theoretical results and illustrate the control performance. It is significant to observe that the present research provides a new avenue of integrating data-based control and event-triggering mechanism into establishing advanced adaptive critic systems.
16. Wang B, Zhao D, Cheng J. Adaptive cruise control via adaptive dynamic programming with experience replay. Soft Comput 2018. DOI: 10.1007/s00500-018-3063-7.
17. Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.09.020.
18. Jiang H, Zhang H, Cui Y, Xiao G. Robust control scheme for a class of uncertain nonlinear systems with completely unknown dynamics using data-driven reinforcement learning method. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.07.058.
19. Wang D, Mu C. A novel neural optimal control framework with nonlinear dynamics: Closed-loop stability and simulation verification. Neurocomputing 2017. DOI: 10.1016/j.neucom.2017.05.051.
20. Wang D, He H, Liu D. Improving the Critic Learning for Event-Based Nonlinear H∞ Control Design. IEEE Trans Cybern 2017;47:3417-3428. PMID: 28166513. DOI: 10.1109/tcyb.2017.2653800.
Abstract
In this paper, we aim at improving the critic learning criterion to cope with the event-based nonlinear H∞ state feedback control design. First of all, the H∞ control problem is regarded as a two-player zero-sum game and the adaptive critic mechanism is used to achieve the minimax optimization under an event-based environment. Then, based on an improved updating rule, the event-based optimal control law and the time-based worst-case disturbance law are obtained approximately by training a single critic neural network. An initial stabilizing control is no longer required during the implementation process of the new algorithm. Next, the closed-loop system is formulated as an impulsive model and its stability issue is handled by incorporating the improved learning criterion. The infamous Zeno behavior of the present event-based design is also avoided through theoretical analysis of the lower bound on the minimal intersample time. Finally, applications to an aircraft dynamics system and a robot arm plant are carried out to verify the efficient performance of the present novel design method.
21. Zhang H, Jiang H, Luo C, Xiao G. Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms. IEEE Trans Cybern 2017;47:3331-3340. PMID: 28113535. DOI: 10.1109/tcyb.2016.2611613.
Abstract
In this paper, we investigate nonzero-sum games for a class of discrete-time (DT) nonlinear systems by using a novel policy iteration (PI) adaptive dynamic programming (ADP) method. The main idea of our proposed PI scheme is to utilize the iterative ADP algorithm to obtain the iterative control policies, which not only ensure that the system achieves stability but also minimize the performance index function for each player. This paper integrates game theory, optimal control theory, and reinforcement learning techniques to formulate and handle DT nonzero-sum games for multiple players. First, we design three actor-critic algorithms, an offline one and two online ones, for the PI scheme. Subsequently, neural networks are employed to implement these algorithms, and the corresponding stability analysis is provided via Lyapunov theory. Finally, a numerical simulation example is presented to demonstrate the effectiveness of our proposed approach.
22. Wang D, He H, Liu D. Adaptive Critic Nonlinear Robust Control: A Survey. IEEE Trans Cybern 2017;47:3429-3451. PMID: 28682269. DOI: 10.1109/tcyb.2017.2712188.
Abstract
Adaptive dynamic programming (ADP) and reinforcement learning are closely related when performing intelligent optimization. Both are regarded as promising methods built around the components of evaluation and improvement, against the background of modern information technology such as artificial intelligence, big data, and deep learning. Although great progress has been achieved and surveyed in addressing nonlinear optimal control problems, research on the robustness of ADP-based control strategies under uncertain environments has not been fully summarized. Hence, this survey reviews the recent main results of adaptive-critic-based robust control design for continuous-time nonlinear systems. ADP-based nonlinear optimal regulation is reviewed, followed by robust stabilization of nonlinear systems with matched uncertainties, guaranteed cost control design of unmatched plants, and decentralized stabilization of interconnected systems. Additionally, further comprehensive discussions are presented, including event-based robust control design, improvement of the critic learning rule, nonlinear H∞ control design, and several notes on future perspectives. By applying the ADP-based optimal and robust control methods to a practical power system and an overhead crane plant, two typical examples are provided to verify the effectiveness of the theoretical results. Overall, this survey is intended to promote the development of adaptive critic control methods with robustness guarantees and the construction of higher-level intelligent systems.
23. Esfandiari K, Abdollahi F, Talebi HA. Adaptive near-optimal neuro controller for continuous-time nonaffine nonlinear systems with constrained input. Neural Netw 2017. DOI: 10.1016/j.neunet.2017.05.013.
24. Jiang H, Zhang H, Luo Y, Cui X. H∞ control with constrained input for completely unknown nonlinear systems using data-driven reinforcement learning method. Neurocomputing 2017. DOI: 10.1016/j.neucom.2016.11.041.
25. Kamalapurkar R, Andrews L, Walters P, Dixon WE. Model-Based Reinforcement Learning for Infinite-Horizon Approximate Optimal Tracking. IEEE Trans Neural Netw Learn Syst 2017;28:753-758. PMID: 26863674. DOI: 10.1109/tnnls.2015.2511658.
Abstract
This brief paper provides an approximate online adaptive solution to the infinite-horizon optimal tracking problem for control-affine continuous-time nonlinear systems with unknown drift dynamics. To relax the persistence of excitation condition, model-based reinforcement learning is implemented using a concurrent-learning-based system identifier to simulate experience by evaluating the Bellman error over unexplored areas of the state space. Tracking of the desired trajectory and convergence of the developed policy to a neighborhood of the optimal policy are established via Lyapunov-based stability analysis. Simulation results demonstrate the effectiveness of the developed technique.
26. Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning. Comput Intell Neurosci 2016;2016:4824072. PMID: 27795704. PMCID: PMC5066029. DOI: 10.1155/2016/4824072.
Abstract
To improve the convergence rate and the sample efficiency, two efficient learning methods, AC-HMLP and RAC-HMLP (AC-HMLP with ℓ2-regularization), are proposed by combining the actor-critic algorithm with hierarchical model learning and planning. The hierarchical models, consisting of a local model and a global model that are learned at the same time as the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both models are applied to generate samples for planning: the local model is used only if the state-prediction error does not surpass a threshold at each time step, while the global model is utilized at the end of each episode. The purpose of using both models is to improve sample efficiency and accelerate the convergence rate of the whole algorithm by fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two reinforcement learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency.
27. Jiang H, Zhang H, Luo Y, Wang J. Optimal tracking control for completely unknown nonlinear discrete-time Markov jump systems using data-based reinforcement learning method. Neurocomputing 2016. DOI: 10.1016/j.neucom.2016.02.029.
28. Jin X, Shin YC. Nonlinear discrete time optimal control based on Fuzzy Models. J Intell Fuzzy Syst 2015. DOI: 10.3233/ifs-141376.
29. Online optimal control of unknown discrete-time nonlinear systems by using time-based adaptive dynamic programming. Neurocomputing 2015. DOI: 10.1016/j.neucom.2015.03.006.
30. Han H, Zhou W, Qiao J, Feng G. A direct self-constructing neural controller design for a class of nonlinear systems. IEEE Trans Neural Netw Learn Syst 2015;26:1312-1322. PMID: 25706896. DOI: 10.1109/tnnls.2015.2401395.
Abstract
This paper is concerned with the problem of adaptive neural control for a class of uncertain or ill-defined nonaffine nonlinear systems. Using a self-organizing radial basis function neural network (RBFNN), a direct self-constructing neural controller (DSNC) is designed so that unknown nonlinearities can be approximated and the closed-loop system is stable. The key features of the proposed DSNC design scheme can be summarized as follows. First, different from the existing results in the literature, a self-organizing RBFNN with an adaptive threshold is constructed online for the DSNC to improve the control performance. Second, the control law and the adaptive law for the weights of the RBFNN are established so that the closed-loop system is stable in terms of Lyapunov stability theory. Third, the tracking error is guaranteed to uniformly asymptotically converge to zero with the aid of an additional robustifying control term. An example is finally given to demonstrate the design procedure and the performance of the proposed method. Simulation results reveal the effectiveness of the proposed method.
31. Liu D, Li H, Wang D. Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems. IEEE Trans Neural Netw Learn Syst 2015;26:1323-1334. PMID: 25751878. DOI: 10.1109/tnnls.2015.2402203.
Abstract
In this paper, we establish error bounds of adaptive dynamic programming algorithms for solving undiscounted infinite-horizon optimal control problems of discrete-time deterministic nonlinear systems. We consider approximation errors in the update equations of both value function and control policy. We utilize a new assumption instead of the contraction assumption in discounted optimal control problems. We establish the error bounds for approximate value iteration based on a new error condition. Furthermore, we also establish the error bounds for approximate policy iteration and approximate optimistic policy iteration algorithms. It is shown that the iterative approximate value function can converge to a finite neighborhood of the optimal value function under some conditions. To implement the developed algorithms, critic and action neural networks are used to approximate the value function and control policy, respectively. Finally, a simulation example is given to demonstrate the effectiveness of the developed algorithms.
32. Zhao Q, Xu H, Jagannathan S. Neural network-based finite-horizon optimal control of uncertain affine nonlinear discrete-time systems. IEEE Trans Neural Netw Learn Syst 2015;26:486-499. PMID: 25720005. DOI: 10.1109/tnnls.2014.2315646.
Abstract
In this paper, the finite-horizon optimal control design for nonlinear discrete-time systems in affine form is presented. In contrast with the traditional approximate dynamic programming methodology, which requires at least partial knowledge of the system dynamics, here the requirement of complete system dynamics is relaxed by utilizing a neural network (NN)-based identifier to learn the control coefficient matrix. The identifier is then used together with an actor-critic-based scheme to learn the time-varying solution, referred to as the value function, of the Hamilton-Jacobi-Bellman (HJB) equation in an online and forward-in-time manner. Since the solution of the HJB equation is time-varying, NNs with constant weights and time-varying activation functions are considered. To properly satisfy the terminal constraint, an additional error term is incorporated in the novel update law such that the terminal constraint error is also minimized over time. Policy and/or value iterations are not needed, and the NN weights are updated once per sampling instant. The uniform ultimate boundedness of the closed-loop system is verified by standard Lyapunov stability theory under nonautonomous analysis. Numerical examples are provided to illustrate the effectiveness of the proposed method.
33
Heydari A, Balakrishnan S. Optimal switching between controlled subsystems with free mode sequence. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.08.030]
34
Heydari A. Revisiting approximate dynamic programming and its convergence. IEEE Trans Cybern 2014; 44:2733-2743. [PMID: 24846687 DOI: 10.1109/tcyb.2014.2314612]
Abstract
Value iteration-based approximate/adaptive dynamic programming (ADP) as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces is investigated. The learning iterations are decomposed into an outer loop and an inner loop. A relatively simple proof for the convergence of the outer-loop iterations to the optimal solution is provided using a novel idea with some new features. It presents an analogy between the value function during the iterations and the value function of a fixed-final-time optimal control problem. The inner loop is utilized to avoid the need for solving a set of nonlinear equations or a nonlinear optimization problem numerically, at each iteration of ADP for the policy update. Sufficient conditions for the uniqueness of the solution to the policy update equation and for the convergence of the inner-loop iterations to the solution are obtained. Afterwards, the results are formed as a learning algorithm for training a neurocontroller or creating a look-up table to be used for optimal control of nonlinear systems with different initial conditions. Finally, some of the features of the investigated method are numerically analyzed.
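The outer-loop/inner-loop decomposition this abstract describes can be sketched on a scalar linear-quadratic problem. Everything below is an illustrative assumption rather than the paper's formulation: the outer loop is value iteration on the quadratic coefficient p in V(x) = p x^2, and the inner loop solves the policy-update minimization by gradient descent instead of a closed-form solve.

```python
# Scalar LQ sketch of the outer/inner loop structure. A, B, Q, R and the
# inner-loop step size are assumed values chosen for illustration.
A, B = 1.2, 1.0      # dynamics x_{k+1} = A x + B u
Q, R = 1.0, 1.0      # stage cost Q x^2 + R u^2

def inner_loop(p, x, steps=200, lr=0.1):
    """Gradient descent on u -> R u^2 + p (A x + B u)^2, which is convex in u."""
    u = 0.0
    for _ in range(steps):
        grad = 2.0 * R * u + 2.0 * p * B * (A * x + B * u)
        u -= lr * grad
    return u

p = 0.0  # outer-loop iterate; V_k(x) = p x^2
for _ in range(100):
    x = 1.0  # any nonzero state determines p in the scalar LQ case
    u = inner_loop(p, x)
    p = (Q * x * x + R * u * u + p * (A * x + B * u) ** 2) / (x * x)

# p approaches the discrete-time Riccati fixed point (about 1.9522 here)
```

The inner loop converges because the policy-update objective is strongly convex in u for these values, mirroring the abstract's sufficient conditions for uniqueness and inner-loop convergence.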
35
Heydari A, Balakrishnan S. Global optimality of approximate dynamic programming and its use in non-convex function minimization. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2014.07.003]
36
Heydari A, Balakrishnan S. Fixed-final-time optimal tracking control of input-affine nonlinear systems. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.09.006]
37
Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2012.07.047]
38
Neuro-optimal control for a class of unknown nonlinear dynamic systems using SN-DHP technique. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2013.04.006]
39

40
Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.11.021]
41
Zhang H, Cui L, Luo Y. Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybern 2013; 43:206-216. [PMID: 22759477 DOI: 10.1109/tsmcb.2012.2203336]
Abstract
In this paper, a near-optimal control scheme is proposed to solve the nonzero-sum differential games of continuous-time nonlinear systems. Single-network adaptive dynamic programming (ADP) is utilized to obtain the optimal control policies that make the cost functions reach the Nash equilibrium of the nonzero-sum differential games, where only one critic network is used for each player instead of the action-critic dual network used in a typical ADP architecture. Furthermore, novel weight tuning laws for the critic neural networks are proposed, which not only ensure that the Nash equilibrium is reached but also guarantee that the system is stable. No initial stabilizing control policy is required for each player. Moreover, Lyapunov theory is utilized to demonstrate the uniform ultimate boundedness of the closed-loop system. Finally, a simulation example is given to verify the effectiveness of the proposed near-optimal control scheme.
42
Liu D, Wang D, Yang X. An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs. Inf Sci (N Y) 2013. [DOI: 10.1016/j.ins.2012.07.006]
43
Chemachema M. Output feedback direct adaptive neural network control for uncertain SISO nonlinear systems using a fuzzy estimator of the control error. Neural Netw 2012; 36:25-34. [DOI: 10.1016/j.neunet.2012.08.010]
44
Adaptive dynamic programming-based optimal control of unknown nonaffine nonlinear discrete-time systems with proof of convergence. Neurocomputing 2012. [DOI: 10.1016/j.neucom.2012.01.025]
45
Wei Q, Liu D. An iterative ε-optimal control scheme for a class of discrete-time nonlinear systems with unfixed initial state. Neural Netw 2012; 32:236-244. [DOI: 10.1016/j.neunet.2012.02.027]
46
Zhang H, Cui L, Zhang X, Luo Y. Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans Neural Netw 2011; 22:2226-2236. [DOI: 10.1109/tnn.2011.2168538]
47

48
Online optimal control of nonlinear discrete-time systems using approximate dynamic programming. J Control Theory Appl 2011. [DOI: 10.1007/s11768-011-0178-0]
49

50
Gribovskaya E, Khansari-Zadeh S, Billard A. Learning non-linear multivariate dynamics of motion in robotic manipulators. Int J Rob Res 2010. [DOI: 10.1177/0278364910376251]
Abstract
Motion imitation requires reproduction of a dynamical signature of a movement, i.e. a robot should be able to encode and reproduce a particular path together with a specific velocity and/or acceleration profile. Furthermore, a human provides only a few demonstrations, which cannot cover all possible contexts in which the robot will need to reproduce the motion autonomously. Therefore, the encoding should be able to efficiently generalize knowledge by generating similar motions in unseen contexts. This work follows a recent trend in programming by demonstration in which the dynamics of the motion is learned. We present an algorithm to estimate multivariate robot motions through a mixture of Gaussians. The strengths of the proposed encoding are three-fold: (i) it allows a generalization of motion to unseen contexts; (ii) it provides fast on-line replanning of the motion in case of spatio-temporal perturbations; (iii) it may embed different types of dynamics, governed by different attractors. The generality of the method to estimate arbitrary non-linear motion dynamics is demonstrated by accurately estimating a set of known non-linear dynamical systems. The platform-independence and real-time performance of the method are further validated by learning the non-linear motion dynamics of manipulation tasks on different robotic platforms. We provide an experimental comparison of our approach with a related state-of-the-art method.
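The core idea of encoding motion dynamics in a Gaussian mixture and reproducing them from an unseen start can be sketched in one dimension. Note the mixture components below are hand-set for clarity rather than fitted to demonstrations by EM as in the paper, and the gains, variances, and attractor are illustrative assumptions:

```python
import math

# Each component: (weight, mean_x, var_x, gain); a component's conditional
# prediction is xdot = gain * x, so the mixture blends a fast far-from-goal
# regime with a slow near-goal regime. The attractor is at x = 0.
COMPONENTS = [
    (0.5, 1.0, 0.50, -1.0),  # far from the attractor: move fast
    (0.5, 0.1, 0.05, -0.5),  # near the attractor: slow down
]

def f_hat(x):
    """Mixture-regression estimate of xdot at state x."""
    resp = [w * math.exp(-(x - mx) ** 2 / (2.0 * vx)) / math.sqrt(vx)
            for (w, mx, vx, _) in COMPONENTS]
    z = sum(resp)
    return sum(r * g * x for r, (_, _, _, g) in zip(resp, COMPONENTS)) / z

# reproduce the motion from a start not seen in any demonstration;
# the rollout converges toward the attractor at x = 0
x, dt = 1.5, 0.05
for _ in range(400):
    x += dt * f_hat(x)
```

Because every component's conditional dynamics are stable toward the same attractor, the blended estimate generalizes to unseen starting states, which is the generalization property the abstract emphasizes.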
Affiliation(s)
- E. Gribovskaya, EPFL-STI-I2S-LASA, Station 9, CH 1015, Lausanne, Switzerland
- A. Billard, EPFL-STI-I2S-LASA, Station 9, CH 1015, Lausanne, Switzerland