1
Wang D, Wu J, Ha M, Zhao M, Li M, Qiao J. Advanced Optimal Tracking Control With Stability Guarantee via Novel Value Learning Formulation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:8254-8265. [PMID: 37015365 DOI: 10.1109/tnnls.2022.3226518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
In this article, to solve the optimal tracking control problem (OTCP) for discrete-time (DT) nonlinear systems, a general value iteration (GVI) scheme and online value iteration (VI) algorithms with a novel value function are discussed. First, the disadvantage of the traditional value function for the OTCP is presented and the novel value function is introduced. Second, we analyze the monotonicity and convergence of GVI and establish the admissibility condition of GVI to evaluate the admissibility of the current iterative control. Note that a novel approach is introduced to analyze the admissibility. Third, based on the attraction domain, improved control policies with online VI can be obtained by judging the location of the current tracking error and reference point. Finally, the stability of the online VI-based control system is guaranteed. Besides, we provide two simulation examples to show the performance of the proposed methods.
2
Qiao J, Li M, Wang D. Asymmetric Constrained Optimal Tracking Control With Critic Learning of Nonlinear Multiplayer Zero-Sum Games. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:5671-5683. [PMID: 36191112 DOI: 10.1109/tnnls.2022.3208611] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
By utilizing a neural-network-based adaptive critic mechanism, the optimal tracking control problem is investigated for nonlinear continuous-time (CT) multiplayer zero-sum games (ZSGs) with asymmetric constraints. Initially, we build an augmented system with the tracking error system and the reference system. Moreover, a novel nonquadratic function is introduced to address asymmetric constraints. Then, we derive the tracking Hamilton-Jacobi-Isaacs (HJI) equation of the constrained nonlinear multiplayer ZSG. However, it is extremely hard to get the analytical solution to the HJI equation. Hence, an adaptive critic mechanism based on neural networks is established to estimate the optimal cost function, so as to obtain the near-optimal control policy set and the near worst disturbance policy set. In the process of neural critic learning, we only utilize one critic neural network and develop a new weight updating rule. After that, by using the Lyapunov approach, the uniform ultimate boundedness stability of the tracking error in the augmented system and the weight estimation error of the critic network is verified. Finally, two simulation examples are provided to demonstrate the efficacy of the established mechanism.
3
Zhang H, Zhao X, Wang H, Zong G, Xu N. Hierarchical Sliding-Mode Surface-Based Adaptive Actor-Critic Optimal Control for Switched Nonlinear Systems With Unknown Perturbation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:1559-1571. [PMID: 35834452 DOI: 10.1109/tnnls.2022.3183991] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 06/15/2023]
Abstract
This article studies the hierarchical sliding-mode surface (HSMS)-based adaptive optimal control problem for a class of switched continuous-time (CT) nonlinear systems with unknown perturbation under an actor-critic (AC) neural networks (NNs) architecture. First, a novel perturbation observer with a nested parameter adaptive law is designed to estimate the unknown perturbation. Then, by constructing a special cost function related to HSMS, the original control issue is further converted into the problem of finding a series of optimal control policies. The solution to the HJB equation is identified by the HSMS-based AC NNs, where the actor and critic updating laws are developed to implement the reinforcement learning (RL) strategy simultaneously. The critic update law is designed via the gradient descent approach and the principle of standardization, such that the persistence of excitation (PE) condition is no longer needed. Based on the Lyapunov stability theory, all the signals of the closed-loop switched nonlinear systems are rigorously proved to be bounded in the sense of uniformly ultimate boundedness (UUB). Finally, the simulation results are presented to verify the validity of the proposed adaptive optimal control scheme.
4
Zhang L, Che WW, Deng C, Wu ZG. Optimized Adaptive Fuzzy Security Control of Nonlinear Systems With Prescribed Tracking Performance. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:7868-7880. [PMID: 37022031 DOI: 10.1109/tcyb.2023.3234295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
This article studies the optimized fuzzy prescribed performance control problem for nonlinear nonstrict-feedback systems under denial-of-service (DoS) attacks. A fuzzy estimator is delicately designed to model the immeasurable system states in the presence of DoS attacks. To achieve the preset tracking performance, a simpler prescribed performance error transformation is constructed considering the characteristics of DoS attacks, which helps obtain a novel Hamilton-Jacobi-Bellman equation to derive the optimized prescribed performance controller. Furthermore, the fuzzy-logic system, combined with the reinforcement learning (RL) technique, is employed to approximate the unknown nonlinearity existing in the prescribed performance controller design process. An optimized adaptive fuzzy security control law is then proposed for the considered nonlinear nonstrict-feedback systems subject to DoS attacks. Through the Lyapunov stability analysis, the tracking error is proved to approach the predefined region within the preset finite time, even in the presence of DoS attacks. Meanwhile, the consumed control resources are minimized due to the RL-based optimized algorithm. Finally, an actual example with comparisons verifies the effectiveness of the proposed control algorithm.
5
Xu Y, Li T, Yang Y, Shan Q, Tong S, Chen CLP. Anti-Attack Event-Triggered Control for Nonlinear Multi-Agent Systems With Input Quantization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:10105-10115. [PMID: 35442892 DOI: 10.1109/tnnls.2022.3164881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this article, an anti-attack event-triggered secure control scheme for a class of nonlinear multi-agent systems with input quantization is developed. With the help of neural networks approximating unknown nonlinear functions, unknown states are obtained by designing an adaptive neural state observer. Then, a relative threshold event-triggered control strategy is introduced to save communication resources including network bandwidth and computational capabilities. Furthermore, a quantizer, represented by the so-called hysteresis quantizer, is employed to provide sufficient accuracy under the requirement of a low transmission rate. Meanwhile, to resist attacks in the multi-agent network, a predictor is designed to record whether an edge is attacked or not. Through the Lyapunov analysis, the proposed secure control protocol can ensure that all the closed-loop signals remain bounded under attacks. Finally, the effectiveness of the designed scheme is verified by simulation results.
6
Huang Y, Zhang Z. Neural Adaptive H∞ Sliding-Mode Control for Uncertain Nonlinear Systems with Disturbances Using Adaptive Dynamic Programming. ENTROPY (BASEL, SWITZERLAND) 2023; 25:1570. [PMID: 38136450 PMCID: PMC10742753 DOI: 10.3390/e25121570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 12/24/2023]
Abstract
This paper focuses on a neural adaptive H∞ sliding-mode control scheme for a class of uncertain nonlinear systems subject to external disturbances with the aid of adaptive dynamic programming (ADP). First, by combining the neural network (NN) approximation method with a nonlinear disturbance observer, an enhanced observer framework is developed for estimating the system uncertainties and observing the external disturbances simultaneously. Then, based on the reliable estimations provided by the enhanced observer, an adaptive sliding-mode controller is meticulously designed, which can effectively counteract the effects of the system uncertainties and the separated matched disturbances, even in the absence of prior knowledge regarding their upper bounds. Meanwhile, the remaining unmatched disturbances are attenuated by means of H∞ control performance on the sliding surface. Moreover, a single critic network-based ADP algorithm is employed to learn the cost function related to the Hamilton-Jacobi-Isaacs equation, and thus, the H∞ optimal control is obtained. An update law for the critic NN is proposed not only to achieve the Nash equilibrium, but also to stabilize the sliding-mode dynamics without the need for an initial stabilizing control. In addition, we analyze the uniform ultimate boundedness stability of the resultant closed-loop system via Lyapunov's method. Finally, the effectiveness of the proposed scheme is verified through simulations of a single-link robot arm and a power system.
Affiliation(s)
- Yuzhu Huang
- College of Electronic and Information Engineering, Hebei University, Baoding 071002, China;
7
Wang QG, Lim LHI, Ye Z, Nie ZY, Yang D. LQR approach to robust stabilization of state space systems with matched uncertainties. ISA TRANSACTIONS 2023; 142:420-426. [PMID: 37544823 DOI: 10.1016/j.isatra.2023.07.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Revised: 06/09/2023] [Accepted: 07/21/2023] [Indexed: 08/08/2023]
Abstract
This note shows an elegant relationship between the quadratic optimal control and robust stabilization for linear time-invariant (LTI) systems, where the former control can robustly stabilize the latter system, provided that the matched uncertainty is bounded. Through reviewing the relevant literature, some common mistakes in regard to this relationship are found. The correct results are obtained and proved in both frequency and time domains. The results are applicable to both single- and multi-input cases. They are significant as the simple LQR design for the nominal system can be utilized to directly solve, with no further effort, the complex robust stabilization problem for a class of linear uncertain systems.
Affiliation(s)
- Qing-Guo Wang
- Institute of AI and Future Networks, Beijing Normal University, Guangdong Key Lab of AI and MM Data Processing, Guangdong Provincial Key Laboratory IRADS, IAS, DST, BNU-HKBU United International College, Zhuhai, 519087, PR China.
- Li Hong Idris Lim
- Department of Electronic Systems, University of Glasgow, Glasgow, G12 8QQ, UK
- Zhen Ye
- R&D Department of Leica Instruments Singapore Pte. Ltd., Singapore 608924, Singapore
- Zhuo-Yun Nie
- School of Information Science and Engineering, National Huaqiao University, Xiamen, 361021, PR China
- Dazhi Yang
- School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, Heilongjiang, PR China
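As a rough illustration of the kind of link between LQR design and robustness that the abstract of entry 7 refers to, the sketch below uses the classical textbook gain-margin property of LQR on a scalar plant. This is not necessarily the result proved in the note: the scalar plant, the numerical values, and the `gain_factor` model of a multiplicative matched uncertainty are all assumptions made for illustration.

```python
import math

def scalar_lqr_gain(a, b, q, r):
    """Return (p, k) for dx/dt = a*x + b*u with cost integral of q*x^2 + r*u^2.

    p is the positive root of the scalar Riccati equation
    2*a*p - (b**2 / r) * p**2 + q = 0, and k = b*p/r gives u = -k*x.
    """
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k

def closed_loop_pole(a, b, k, gain_factor=1.0):
    """Pole of dx/dt = (a - gain_factor*b*k) * x.

    gain_factor is a hypothetical multiplicative uncertainty acting in the
    input channel (gain_factor = 1.0 is the nominal plant).
    """
    return a - gain_factor * b * k

# Illustrative open-loop unstable plant (a > 0); the numbers are assumptions.
a, b, q, r = 1.0, 2.0, 3.0, 0.5
p, k = scalar_lqr_gain(a, b, q, r)

# Nominal LQR gain applied to plants whose input gain is scaled by 0.5 to 10.
poles = {g: closed_loop_pole(a, b, k, g) for g in (0.5, 1.0, 2.0, 10.0)}
for g, pole in poles.items():
    print(f"gain_factor={g}: closed-loop pole = {pole}")
```

In this scalar setting the nominal gain keeps every perturbed closed loop stable for input-gain factors of one half and above, which is the usual statement of the LQR gain margin.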
8
Xu Y, Zhao Z, Yin S. Performance Optimization and Fault-Tolerance of Highly Dynamic Systems Via Q-Learning With an Incrementally Attached Controller Gain System. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:9128-9138. [PMID: 35290189 DOI: 10.1109/tnnls.2022.3155876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
High-performance and reliable control of systems that are highly dynamic and open-loop unstable is challenging but of considerable practical interest. Thus, this article investigates the performance optimization and fault tolerance of highly dynamic systems. First, an incremental control structure is proposed, where a controller gain system is attached to the predesigned controller, and by reconfiguring the controller gain system, the performance can be optimized equivalently to reconfiguring the predesigned one. The incremental attachment of the controller gain system does not modify the existing control system, and it can be easily attached via various communication channels. Second, a structure integrating fault-tolerance strategy and hardware redundancy is proposed. Under this structure, command fusion and fault-tolerance strategies are developed where the control commands from different control units are optimally fused, and each control unit can be reconfigured w.r.t. the performance of the other ones. Furthermore, Q-learning algorithms are developed to realize the proposed structures and strategies in real time and in a model-free manner. As such, varying operational conditions of the highly dynamic system can be tackled. Finally, the proposed structures and algorithms are validated case by case to show their effectiveness.
9
Wang D, Zhao M, Ha M, Qiao J. Stability and Admissibility Analysis for Zero-Sum Games Under General Value Iteration Formulation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:8707-8718. [PMID: 35239493 DOI: 10.1109/tnnls.2022.3152268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this article, the general value iteration (GVI) algorithm for discrete-time zero-sum games is investigated. The theoretical analysis focuses on stability properties of the systems and also the admissibility properties of the iterative policy pair. A new criterion is established to determine the admissibility of the current policy pair. Besides, based on the admissibility criterion, the improved GVI algorithm toward zero-sum games is developed to guarantee that all iterative policy pairs are admissible if the current policy pair satisfies the criterion. On the basis of the attraction domain, we demonstrate that the state trajectory will stay in the region using the fixed or the evolving policy pair if the initial state belongs to the domain. It is emphasized that the evolving policy pair can stabilize the controlled system. These theoretical results are applied to linear and nonlinear systems via offline and online critic control design.
10
Wang X, Xu B, Cheng Y, Wang H, Sun F. Robust Adaptive Learning Control of Space Robot for Target Capturing Using Neural Network. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:7567-7577. [PMID: 35157591 DOI: 10.1109/tnnls.2022.3144569] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
This article investigates the robust adaptive learning control for space robots with target capturing. Based on the momentum conservation theory, the impact dynamics is constructed to derive the relationship of generalized velocity in the pre-impact and post-impact phase. Considering the nonlinear dynamics with contact impact, the robust control using nonsingular terminal sliding mode (NTSM) and fast NTSM is designed to achieve the fast realization of the desired states. Furthermore, for the unknown dynamics of the combination system after capturing a target, the adaptive learning control is developed based on neural network and disturbance observer. Through the serial-parallel estimation model, the prediction error is constructed for the update of adaptive law. The system signals involved in the Lyapunov function are proved to be bounded and the sliding mode surface converges in finite time. Simulation studies present the desired tracking and learning performance.
11
Ha M, Wang D, Liu D. A Novel Value Iteration Scheme With Adjustable Convergence Rate. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:7430-7442. [PMID: 35089866 DOI: 10.1109/tnnls.2022.3143527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this article, a novel value iteration scheme is developed with convergence and stability discussions. A relaxation factor is introduced to adjust the convergence rate of the value function sequence. The convergence conditions with respect to the relaxation factor are given. The stability of the closed-loop system using the control policies generated by the present VI algorithm is investigated. Moreover, an integrated VI approach is developed to accelerate and guarantee the convergence by combining the advantages of the present and traditional value iterations. Also, a relaxation function is designed to adaptively make the developed value iteration scheme possess fast convergence property. Finally, the theoretical results and the effectiveness of the present algorithm are validated by numerical examples.
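To make the idea of a relaxation factor concrete, here is a minimal sketch on a scalar linear-quadratic problem. It assumes the factor `omega` enters as a weighted step toward the standard Bellman update, `p_next = p + omega * (T(p) - p)`. That update form, the plant and cost values, and the function names are illustrative assumptions, not the scheme as defined in the article.

```python
import math

def bellman_update(p, a, b, q, r):
    """One standard VI step for V(x) = p*x^2 under x+ = a*x + b*u and
    stage cost q*x^2 + r*u^2 (the minimization over u is done in closed form)."""
    return q + a * a * p * r / (r + b * b * p)

def relaxed_value_iteration(omega, a=1.0, b=1.0, q=1.0, r=1.0,
                            tol=1e-12, max_iter=10_000):
    """Iterate p <- p + omega * (T(p) - p) from p = 0 until the step is below tol.

    omega = 1 recovers standard value iteration; omega is the assumed
    relaxation factor. Returns (p, number_of_iterations).
    """
    p = 0.0
    for i in range(1, max_iter + 1):
        p_next = p + omega * (bellman_update(p, a, b, q, r) - p)
        if abs(p_next - p) < tol:
            return p_next, i
        p = p_next
    raise RuntimeError("value iteration did not converge")

# Compare under-relaxed, standard, and mildly over-relaxed iterations.
results = {omega: relaxed_value_iteration(omega) for omega in (0.5, 1.0, 1.1)}
for omega, (p, iters) in results.items():
    print(f"omega={omega}: p={p:.10f} after {iters} iterations")
```

For these particular values all three runs reach the same fixed point of the scalar Riccati recursion, and the number of iterations changes with `omega`, which is the adjustable-convergence-rate effect the abstract describes. Larger values of `omega` are not guaranteed to converge in general.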
12
Yang X, Zhou Y, Gao Z. Reinforcement learning for robust stabilization of nonlinear systems with asymmetric saturating actuators. Neural Netw 2023; 158:132-141. [PMID: 36455428 DOI: 10.1016/j.neunet.2022.11.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/26/2022] [Revised: 08/11/2022] [Accepted: 11/07/2022] [Indexed: 11/17/2022]
Abstract
We study the robust stabilization problem of a class of nonlinear systems with asymmetric saturating actuators and mismatched disturbances. Initially, we convert such a robust stabilization problem into a nonlinear-constrained optimal control problem by constructing a discounted cost function for the auxiliary system. Then, for the purpose of solving the nonlinear-constrained optimal control problem, we develop a simultaneous policy iteration (PI) in the reinforcement learning framework. The implementation of the simultaneous PI relies on an actor-critic architecture, which employs actor and critic neural networks (NNs) to separately approximate the control policy and the value function. To determine the actor and critic NNs' weights, we use the approach of weighted residuals together with the typical Monte-Carlo integration technique. Finally, we perform simulations of two nonlinear plants to validate the established theoretical claims.
Affiliation(s)
- Xiong Yang
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China.
- Yingjiang Zhou
- College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
- Zhongke Gao
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China.
13
Guo J, Tan N, Zhang Y. General ELLRFS-DAZN algorithm for solving future linear equation system under various noises. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2022.10.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
14
Wang D, Ren J, Ha M. Discounted linear Q-learning control with novel tracking cost and its stability. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
15
Zhao M, Wang D, Ha M, Qiao J. Evolving and Incremental Value Iteration Schemes for Nonlinear Discrete-Time Zero-Sum Games. IEEE TRANSACTIONS ON CYBERNETICS 2022; PP:4487-4499. [PMID: 36063514 DOI: 10.1109/tcyb.2022.3198078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
In this article, evolving and incremental value iteration (VI) frameworks are constructed to address the discrete-time zero-sum game problem. First, the evolving scheme means that the closed-loop system is regulated by using the evolving policy pair. During the control stage, we are committed to establishing the stability criterion in order to guarantee the availability of evolving policy pairs. Second, a novel incremental VI algorithm, which takes the historical information of the iterative process into account, is developed to solve the regulation and tracking problems for the nonlinear zero-sum game. By introducing different incremental factors, we can adjust the convergence rate of the iterative cost function sequence. Finally, two simulation examples, including linear and nonlinear systems, are conducted to demonstrate the performance and the validity of the proposed evolving and incremental VI schemes.
16
Fan ZX, Adhikary AC, Li S, Liu R. Disturbance observer based inverse optimal control for a class of nonlinear systems. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
17
Chen Z, Chen SZ, Chen K, Zhang Y. Constrained Decoupling Adaptive Dynamic Programming for A Partially Uncontrollable Time-Delayed Model of Energy Systems. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.032] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/01/2022]
18
Wang D, Zhao H, Zhao M, Ren J. Novel optimal trajectory tracking for nonlinear affine systems with an advanced critic learning structure. Neural Netw 2022; 154:131-140. [PMID: 35882081 DOI: 10.1016/j.neunet.2022.07.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Revised: 07/03/2022] [Accepted: 07/12/2022] [Indexed: 10/17/2022]
Abstract
In this paper, a critic learning structure based on the novel utility function is developed to solve the optimal tracking control problem with the discount factor of affine nonlinear systems. The utility function is defined as the quadratic form of the error at the next moment, which can not only avoid solving the stable control input, but also effectively eliminate the tracking error. Next, the theoretical derivation of the method under value iteration is given in detail with convergence and stability analysis. Then, the dual heuristic dynamic programming (DHP) algorithm via a single neural network is introduced to reduce the amount of computation. The polynomial is used to approximate the costate function during the DHP implementation. The weighted residual method is used to update the weight matrix. During simulation, the convergence speed of the given strategy is compared with the heuristic dynamic programming (HDP) algorithm. The experimental results show that the proposed method converges faster than the HDP algorithm. Besides, the proposed method is compared with the traditional tracking control approach to verify its tracking performance. The experimental results show that the proposed method can avoid solving the stable control input, and the tracking error is closer to zero than with the traditional strategy.
Affiliation(s)
- Ding Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Huiling Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Mingming Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Jin Ren
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
19
20
Jiang X, Yang L, Liu S, Liu M. Consensus control protocol for stochastic multiagents with predictors. Soft comput 2022. [DOI: 10.1007/s00500-021-06430-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
21
Ha M, Wang D, Liu D. Neural-network-based discounted optimal control via an integrated value iteration with accuracy guarantee. Neural Netw 2021; 144:176-186. [PMID: 34500256 DOI: 10.1016/j.neunet.2021.08.025] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/28/2021] [Revised: 08/19/2021] [Accepted: 08/19/2021] [Indexed: 10/20/2022]
Abstract
A data-based value iteration algorithm with the bidirectional approximation feature is developed for discounted optimal control. The unknown nonlinear system dynamics is first identified by establishing a model neural network. To improve the identification precision, biases are introduced to the model network. The model network with biases is trained by the gradient descent algorithm, where the weights and biases across all layers are updated. The uniform ultimate boundedness stability with a proper learning rate is analyzed, by using the Lyapunov approach. Moreover, an integrated value iteration with the discounted cost is developed to fully guarantee the approximation accuracy of the optimal value function. Then, the effectiveness of the proposed algorithm is demonstrated by carrying out two simulation examples with physical backgrounds.
Affiliation(s)
- Mingming Ha
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Ding Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China.
- Derong Liu
- Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA.