1. Liu L, Song R. Adaptive sampling artificial-actual control for non-zero-sum games of constrained systems. Neural Netw 2024; 178:106413. PMID: 38850637. DOI: 10.1016/j.neunet.2024.106413.
Abstract
Considering the physical constraints encountered by actuators, this paper addresses the non-zero-sum game for continuous nonlinear systems with symmetric and asymmetric input constraints through aperiodic-sampling artificial-actual control. First, an artificial system built from improved Elman dynamic neural networks (EDNNs) interacts with the physical system, providing a new perspective for predicting the system state. By continually learning and adjusting its parameters, the EDNN gradually approximates the dynamic behavior of the real system, enabling more effective control. To accommodate diverse input constraints, a non-quadratic value function constructed from a smoothly bounded function is devised. Polynomial-parameterized adaptive dynamic programming (ADP) is then employed to approximate the solution of the coupled Hamilton-Jacobi equation (HJE), yielding optimal control laws for the two players. To improve the efficiency of data communication, three adaptive sampling mechanisms are introduced in turn during the iterative learning of the control sequences: an event-triggered mechanism (ETM) with a relative threshold, a dynamic ETM (DETM), and a self-triggered mechanism (STM). The DETM further extends sampling intervals by incorporating an internal dynamic variable, while the STM determines the next trigger time by computation alone, without hardware monitoring. All three trigger modes guarantee system stability while avoiding the Zeno phenomenon, and the relevant proofs are given. Finally, simulations validate the effectiveness of the designed algorithm and highlight the distinctive characteristics of each trigger mode.
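The abstract does not spell out the "non-quadratic value function constructed from a smoothly bounded function." A common choice in the constrained-ADP literature (Abu-Khalaf-Lewis style) is the integrand U(u) = 2 ∫₀ᵘ λ·atanh(v/λ) dv, which is finite only for |u| < λ and so encodes a symmetric input bound; the paper's exact function may differ. A minimal sketch under that assumption:

```python
import numpy as np

# Hypothetical sketch of a non-quadratic control penalty for a symmetric
# input bound |u| < lam: U(u) = 2 * integral_0^u lam * atanh(v/lam) dv.
# Closed form: 2*lam*u*atanh(u/lam) + lam**2 * log(1 - (u/lam)**2).
def input_penalty(u, lam=1.0):
    u = np.clip(u, -lam + 1e-9, lam - 1e-9)   # stay strictly inside the bound
    return 2.0 * lam * u * np.arctanh(u / lam) + lam**2 * np.log1p(-(u / lam) ** 2)
```

The penalty is zero at u = 0, symmetric, and grows without bound as |u| approaches λ, which is what steers the minimizing control law away from saturation; an asymmetric bound would use a shifted bounded function in the same construction.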
Affiliation(s)
- Lu Liu: Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Ruizhuo Song: Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
2. Wang T, Niu B, Xu N, Zhang L. ADP-based online compensation hierarchical sliding-mode control for partially unknown switched nonlinear systems with actuator failures. ISA Trans 2024:1-13. PMID: 39304368. DOI: 10.1016/j.isatra.2024.09.011.
Abstract
This article investigates an adaptive dynamic programming-based online compensation hierarchical sliding-mode control problem for a class of partially unknown switched nonlinear systems with actuator failures and uncertain perturbations under an identifier-critic neural network architecture. First, by introducing a cost function related to the hierarchical sliding-mode surfaces for the nominal system, the original control problem is equivalently converted into an optimal control problem. To obtain the optimal control policy, the Hamilton-Jacobi-Bellman equation is solved through an adaptive dynamic programming method. Compared with conventional adaptive dynamic programming methods, the identifier-critic architecture not only overcomes the limitation of unknown internal dynamics but also eliminates the approximation error arising from the actor network. The weights of the critic network are tuned via gradient descent together with an experience-replay technique, so that the persistence-of-excitation condition can be relaxed. A compensation term containing the hierarchical sliding-mode surfaces is then used to offset uncertain actuator failures without a fault detection and isolation unit. Based on Lyapunov stability theory, all states of the closed-loop nonlinear system are uniformly ultimately bounded. Finally, numerical and practical examples demonstrate the effectiveness of the presented online compensation control strategy.
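The paper's exact critic update is not reproduced here, but the idea of "gradient descent plus experience replay to relax persistence of excitation" can be sketched: past (regressor, target) pairs are replayed in every step, so the recorded data stays exciting even when the current trajectory is not. All names below are illustrative, not the authors':

```python
import numpy as np

# Hypothetical sketch: one critic weight update using gradient descent over
# a replay buffer of recorded (regressor phi_k, target) pairs.
def critic_step(w, buffer, lr=0.1):
    grad = np.zeros_like(w)
    for phi_k, target_k in buffer:
        e_k = phi_k @ w - target_k      # Bellman-residual-style error, sample k
        grad += e_k * phi_k             # gradient of 0.5 * e_k**2 w.r.t. w
    return w - lr * grad
```

Repeating this step drives w to the least-squares fit over the stored data, provided the buffered regressors span the weight space, which is the replay-based substitute for the PE condition.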
Affiliation(s)
- Tengda Wang: College of Control Science and Engineering, Bohai University, Jinzhou 121013, Liaoning, China.
- Ben Niu: Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China.
- Ning Xu: College of Information Science and Technology, Bohai University, Jinzhou 121013, Liaoning, China.
- Liang Zhang: College of Control Science and Engineering, Bohai University, Jinzhou 121013, Liaoning, China.
3. Guo Z, Li H, Ma H, Meng W. Distributed Optimal Attitude Synchronization Control of Multiple QUAVs via Adaptive Dynamic Programming. IEEE Trans Neural Netw Learn Syst 2024; 35:8053-8063. PMID: 36446013. DOI: 10.1109/tnnls.2022.3224029.
Abstract
This article proposes a distributed optimal attitude synchronization control strategy for multiple quadrotor unmanned aerial vehicles (QUAVs) through the adaptive dynamic programming (ADP) algorithm. The attitude systems of QUAVs are modeled as affine nominal systems subject to parameter uncertainties and external disturbances. Considering attitude constraints in complex flying environments, a one-to-one mapping technique is utilized to transform the constrained systems into equivalent unconstrained systems. An improved nonquadratic cost function is constructed for each QUAV, which reflects the requirements of robustness and the constraints of control input simultaneously. To overcome the issue that the persistence of excitation (PE) condition is difficult to meet, a novel tuning rule of critic neural network (NN) weights is developed via the concurrent learning (CL) technique. In terms of the Lyapunov stability theorem, the stability of the closed-loop system and the convergence of critic NN weights are proved. Finally, simulation results on multiple QUAVs show the effectiveness of the proposed control strategy.
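The "one-to-one mapping technique" that converts a constrained attitude system into an unconstrained one is not given in the abstract; one common instance is a logarithmic bijection between a bounded interval (a, b) and the whole real line. A sketch under that assumption (the paper's actual mapping may differ):

```python
import math

# Hypothetical sketch of a one-to-one mapping between a constrained
# attitude variable x in (a, b) and an unconstrained variable s in R.
def to_unconstrained(x, a, b):
    return math.log((x - a) / (b - x))

def to_constrained(s, a, b):
    return (b * math.exp(s) + a) / (math.exp(s) + 1.0)
```

Designing the controller for the unconstrained variable s and recovering x = x(s) guarantees x never leaves (a, b), so the attitude constraint holds by construction rather than by penalty.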
4. Lin G, Li H, Ahn CK, Yao D. Event-Based Finite-Time Neural Control for Human-in-the-Loop UAV Attitude Systems. IEEE Trans Neural Netw Learn Syst 2023; 34:10387-10397. PMID: 35511837. DOI: 10.1109/tnnls.2022.3166531.
Abstract
This article focuses on the event-based finite-time neural attitude consensus control problem for six-rotor unmanned aerial vehicle (UAV) systems with unknown disturbances. The six-rotor UAV systems are assumed to be controlled by a human operator who sends command signals to the leader. A disturbance observer and radial basis function neural networks (RBF NNs) are applied to handle external disturbances and uncertain nonlinear dynamics, respectively. The proposed finite-time command filtered (FTCF) backstepping method effectively manages the "explosion of complexity" issue, with filtering errors eliminated by an error compensation mechanism. Moreover, an event-triggered mechanism is adopted to alleviate the communication burden between the controller and the actuator in practice. It is shown that all signals of the six-rotor UAV systems are bounded and the consensus errors converge to a small neighborhood of the origin in finite time. Finally, simulation results demonstrate the effectiveness of the proposed control scheme.
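The communication saving from an event-triggered mechanism can be illustrated on a toy plant: the actuator holds the last transmitted state and the controller retransmits only when the measurement error exceeds a threshold. The plant, gains, and threshold below are all made up for illustration, not taken from the paper:

```python
# Hypothetical sketch: event-triggered updates on the scalar system
# x' = -x + u with u = -2*x_held; the held state is refreshed only when
# the measurement error |x - x_held| exceeds a fixed threshold.
def simulate_etm(x0=1.0, dt=0.01, steps=500, threshold=0.05):
    x, x_held, triggers = x0, x0, 0
    for _ in range(steps):
        if abs(x - x_held) > threshold:      # event condition fires
            x_held, triggers = x, triggers + 1
        u = -2.0 * x_held                    # control built from held state
        x += dt * (-x + u)                   # Euler step of the plant
    return x, triggers
```

The state still converges to a small neighborhood of the origin while the number of transmissions stays far below the number of simulation steps, which is the qualitative point of the event-triggered design.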
5. Zhou Y. Efficient Online Globalized Dual Heuristic Programming With an Associated Dual Network. IEEE Trans Neural Netw Learn Syst 2023; 34:10079-10090. PMID: 35436197. DOI: 10.1109/tnnls.2022.3164727.
Abstract
Globalized dual heuristic programming (GDHP) is the most comprehensive adaptive critic design, which employs its critic to minimize the error with respect to both the cost-to-go and its derivatives simultaneously. Its implementation, however, confronts a dilemma of either introducing more computational load by explicitly calculating the second partial derivative term or sacrificing the accuracy by loosening the association between the cost-to-go and its derivatives. This article aims at increasing the online learning efficiency of GDHP while retaining its analytical accuracy by introducing a novel GDHP design based on a critic network and an associated dual network. This associated dual network is derived from the critic network explicitly and precisely, and its structure is in the same level of complexity as dual heuristic programming critics. Three simulation experiments are conducted to validate the learning ability, efficiency, and feasibility of the proposed GDHP critic design.
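The core idea of an "associated dual network derived from the critic explicitly and precisely" can be sketched with a one-hidden-layer critic: the dual output is the analytic gradient of the critic, computed from the same weights, so the cost-to-go and its derivatives stay consistent by construction. The network shape here is an assumption for illustration:

```python
import numpy as np

# Hypothetical sketch: critic J(x) = w2 . tanh(W1 x), and its associated
# dual network that outputs dJ/dx exactly from the same weights.
def critic(x, W1, w2):
    return w2 @ np.tanh(W1 @ x)

def dual(x, W1, w2):
    h = np.tanh(W1 @ x)
    return W1.T @ (w2 * (1.0 - h ** 2))   # exact gradient of the critic
```

Because the dual shares the critic's weights, it costs no more than a DHP-style derivative critic to evaluate, yet introduces no separate approximation of the derivatives, which is the trade-off the article targets.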
6. Yuan X, Wang Y, Liu J, Sun C. Action Mapping: A Reinforcement Learning Method for Constrained-Input Systems. IEEE Trans Neural Netw Learn Syst 2023; 34:7145-7157. PMID: 35025751. DOI: 10.1109/tnnls.2021.3138924.
Abstract
Existing approaches to constrained-input optimal control problems mainly focus on systems with input saturation, whereas other constraints, such as combined inequality constraints and state-dependent constraints, are seldom discussed. In this article, a reinforcement learning (RL)-based algorithm is developed for constrained-input optimal control of discrete-time (DT) systems. The deterministic policy gradient (DPG) is introduced to iteratively search for the optimal solution to the Hamilton-Jacobi-Bellman (HJB) equation. To deal with input constraints, an action mapping (AM) mechanism is proposed. The objective of this mechanism is to transform the exploration space from the subspace generated by the given inequality constraints to the standard Cartesian product space, which can be searched effectively by existing algorithms. By using the proposed architecture, the learned policy outputs control signals satisfying the given constraints, and the original reward function can be kept unchanged. A convergence analysis is given, showing that the iterative algorithm converges to the optimal solution of the HJB equation. In addition, the continuity of the iteratively estimated Q-function is investigated. Two numerical examples are provided to demonstrate the effectiveness of the approach.
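The action-mapping idea can be sketched for the simplest case of a state-dependent interval constraint: the policy explores the standard box [-1, 1], and a fixed mapping squashes its output onto the feasible interval, so every emitted control is feasible by construction and the reward needs no penalty terms. The interval form is an assumed illustration, not the paper's general constraint class:

```python
# Hypothetical sketch of an action-mapping layer: policy output a in [-1, 1]
# is mapped onto a (possibly state-dependent) feasible interval [lo, hi].
def action_map(a, lo, hi):
    a = max(-1.0, min(1.0, a))            # clamp to the standard box
    return lo + 0.5 * (a + 1.0) * (hi - lo)
```

For combined or state-dependent inequality constraints, lo and hi become functions of the state; the mapping stays one-to-one on its box, which is what lets standard exploration algorithms search the transformed space.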
7. Wang D, Ren J, Ha M, Qiao J. System Stability of Learning-Based Linear Optimal Control With General Discounted Value Iteration. IEEE Trans Neural Netw Learn Syst 2023; 34:6504-6514. PMID: 34986105. DOI: 10.1109/tnnls.2021.3137524.
Abstract
For discounted optimal regulation design, the stability of the controlled system is affected by the discount factor: if an inappropriate discount factor is employed, the optimal control policy might be unstabilizing. Therefore, in this article, the effect of the discount factor on the stabilization of control strategies is discussed. We develop a system stability criterion and selection rules for the discount factor with respect to the linear quadratic regulator problem under the general discounted value iteration algorithm. Based on the monotonicity of the value function sequence, a method to judge the stability of the controlled system during the iteration process is established. In addition, once certain stability conditions are satisfied at some iteration step, all control policies after that step are stabilizing. Furthermore, combined with the undiscounted optimal control problem, a practical rule for selecting an appropriate discount factor is constructed. Finally, several simulation examples with physical backgrounds are conducted to demonstrate the theoretical results.
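The abstract's selection rules are not reproduced here, but the phenomenon itself is easy to exhibit on a scalar toy problem: run the discounted value iteration for the LQR cost Σₖ γᵏ(qx² + ru²) and check whether the resulting closed loop |a - bk| is inside the unit circle. The numbers below are illustrative only:

```python
# Sketch of general discounted value iteration on a scalar LQR toy problem
# x_{k+1} = a*x_k + b*u_k, cost sum_k gamma^k * (q*x^2 + r*u^2).
def discounted_vi(a, b, q, r, gamma, iters=200):
    p = 0.0                                             # V_0 = 0
    for _ in range(iters):
        p = q + gamma * a * a * p \
            - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)
    k = gamma * a * b * p / (r + gamma * b * b * p)     # greedy gain
    return p, k

p, k = discounted_vi(a=1.1, b=1.0, q=1.0, r=1.0, gamma=0.95)
closed_loop = 1.1 - 1.0 * k      # must satisfy |a - b*k| < 1 to stabilize
```

With gamma = 0.95 the closed loop is comfortably stable (|a - bk| ≈ 0.41), but shrinking the discount to, say, gamma = 0.05 leaves |a - bk| > 1 for the same plant: the "optimal" discounted policy no longer stabilizes, which is exactly the hazard the article's criterion is built to detect.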
8. Sun W, Diao S, Su SF, Sun ZY. Fixed-Time Adaptive Neural Network Control for Nonlinear Systems With Input Saturation. IEEE Trans Neural Netw Learn Syst 2023; 34:1911-1920. PMID: 34464271. DOI: 10.1109/tnnls.2021.3105664.
Abstract
This study concentrates on the tracking control problem for nonlinear systems subject to actuator saturation. To improve the performance of the controller, we propose a fixed-time tracking control scheme, in which the upper bound of the convergence time is independent of the initial conditions. In the control scheme, first, a smooth nonlinear function is employed to approximate the saturation function so that the controller can be designed under the framework of backstepping. Then, the effect of input saturation is compensated by introducing an auxiliary system. Furthermore, a fixed-time adaptive neural network control method is given with the help of fixed-time control theory, in which the dynamic order of controllers is reduced to a certain extent since there is only one updating law in the entire control design. Through rigorous theoretical analysis, it is concluded that the proposed control scheme can guarantee that: 1) the output tracking error can converge to a small neighborhood near the origin in a fixed time and 2) all signals in the closed-loop system are bounded. Finally, a numerical example and a practical example based on the single-link manipulator are provided to verify the effectiveness of the proposed method.
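The abstract's "smooth nonlinear function employed to approximate the saturation function" is needed because backstepping differentiates the control law. One common smooth surrogate is u_max·tanh(v/u_max); the paper's exact choice is not stated here, so this is an assumed illustration:

```python
import math

# Hypothetical sketch: a smooth stand-in for the hard saturation
# sat(v) = clamp(v, -u_max, u_max), so backstepping can differentiate it.
def smooth_sat(v, u_max=2.0):
    return u_max * math.tanh(v / u_max)

def hard_sat(v, u_max=2.0):
    return max(-u_max, min(u_max, v))
```

The surrogate is bounded by u_max, matches the hard saturation near the origin (unit slope at v = 0), and is smooth everywhere; the residual between the two is then handled by the auxiliary compensation system the abstract mentions.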
9. Li K, Li Y. Adaptive NN Optimal Consensus Fault-Tolerant Control for Stochastic Nonlinear Multiagent Systems. IEEE Trans Neural Netw Learn Syst 2023; 34:947-957. PMID: 34432637. DOI: 10.1109/tnnls.2021.3104839.
Abstract
This article investigates adaptive neural network (NN) optimal consensus tracking control for nonlinear multiagent systems (MASs) with stochastic disturbances and actuator bias faults. In the control design, an NN is adopted to approximate the unknown nonlinear dynamics, and a state identifier is constructed. A fault estimator is designed to handle the time-varying actuator bias faults. By utilizing adaptive dynamic programming (ADP) in an identifier-critic-actor construction, an adaptive NN optimal consensus fault-tolerant control algorithm is presented. It is proven that all signals of the controlled system are uniformly ultimately bounded (UUB) in probability and that the states of the follower agents remain in consensus with the leader's state. Finally, simulation results illustrate the effectiveness of the developed optimal consensus control scheme.
10. Bai W, Li T, Long Y, Chen CLP. Event-Triggered Multigradient Recursive Reinforcement Learning Tracking Control for Multiagent Systems. IEEE Trans Neural Netw Learn Syst 2023; 34:366-379. PMID: 34270435. DOI: 10.1109/tnnls.2021.3094901.
Abstract
In this article, the event-triggered multigradient recursive reinforcement learning tracking control problem is investigated for nonlinear multiagent systems (MASs). Attention is focused on a distributed reinforcement learning approach for MASs. A critic neural network (NN) is applied to estimate the long-term strategic utility function, and an actor NN is designed to approximate the uncertain dynamics in MASs. The multigradient recursive (MGR) strategy is tailored to learn the NN weight vectors, which eliminates the local-optimum problem inherent in the gradient descent method and decreases the dependence on initial values. Furthermore, reinforcement learning and the event-triggered mechanism improve the energy efficiency of MASs by decreasing the amplitude of the controller signal and the controller update frequency, respectively. It is proved that all signals in the MASs are semiglobally uniformly ultimately bounded (SGUUB) according to Lyapunov theory. Simulation results demonstrate the effectiveness of the proposed strategy.
11. Liu P, Zhang H, Sun J, Tan Z. Event-triggered adaptive integral reinforcement learning method for zero-sum differential games of nonlinear systems with incomplete known dynamics. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-07010-0.
12. Mohammadi M, Arefi MM, Vafamand N, Kaynak O. Control of an AUV with completely unknown dynamics and multi-asymmetric input constraints via off-policy reinforcement learning. Neural Comput Appl 2022. DOI: 10.1007/s00521-021-06476-8.
13. Sun B, van Kampen EJ. Event-triggered constrained control using explainable global dual heuristic programming for nonlinear discrete-time systems. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.10.046.
14. On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning. Energies 2020. DOI: 10.3390/en13195069.
Abstract
In this paper, a tracking control approach is developed based on an adaptive reinforcement learning algorithm with a bounded cost function for perturbed nonlinear switched systems, which provide a useful framework for modelling power converters such as DC-DC and multi-level converters. An optimal control method is derived for the nominal system to solve the tracking control problem, which amounts to solving a Hamilton-Jacobi-Bellman (HJB) equation. It is shown that the optimal controller obtained by solving the HJB equation can stabilize the perturbed nonlinear switched system. To approximate the solution of the translated HJB equation, the proposed neural networks are trained to minimize the squared Bellman residual error in the critic, derived from the Hamiltonian. Theoretical analysis shows that all closed-loop signals are uniformly ultimately bounded (UUB) and that the proposed controller converges to the optimal control law. Simulation results for two scenarios demonstrate the effectiveness of the proposed controller.