1. Liang Y, Zhang H, Zhang J, Ming Z. Event-Triggered Guarantee Cost Control for Partially Unknown Stochastic Systems via Explorized Integral Reinforcement Learning Strategy. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7830-7844. PMID: 36395138. DOI: 10.1109/TNNLS.2022.3221105.
Abstract
In this article, an integral reinforcement learning (IRL)-based event-triggered guarantee cost control (GCC) approach is proposed for stochastic systems modulated by randomly time-varying parameters. First, with the aid of the RL algorithm, the optimal GCC (OGCC) problem is converted into an optimal zero-sum game by solving a modified Hamilton-Jacobi-Isaacs (HJI) equation of the auxiliary system. Moreover, to address the stochastic zero-sum game, an on-policy IRL-based control approach is proposed that incorporates the multivariate probabilistic collocation method (MPCM), which can accurately predict the mean value of uncertain functions with randomly time-varying parameters. Furthermore, a novel GCC method, which combines the explorized IRL algorithm and the MPCM, is designed to relax the requirement of knowing the system dynamics for this class of stochastic systems. On this foundation, to reduce computation cost and avoid the waste of resources, an event-triggered GCC approach based on explorized IRL and the MPCM is proposed, using critic-actor-disturbance neural networks (NNs). The weight vectors of the three NNs are updated simultaneously and aperiodically according to the designed triggering condition. The ultimate boundedness (UB) of the controlled systems is proved by means of the Lyapunov theorem. Finally, the effectiveness of the developed GCC algorithms is illustrated via two simulation examples.
2. Wang R, Wang Z, Liu S, Li T, Li F, Qin B, Wei Q. Optimal Spin Polarization Control for the Spin-Exchange Relaxation-Free System Using Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:5835-5847. PMID: 37015668. DOI: 10.1109/TNNLS.2022.3230200.
Abstract
This work is the first to solve the 3-D spin polarization control (3DSPC) problem of atomic ensembles, which steers the spin polarization to arbitrary states through the cooperation of multiphysics fields. First, a novel adaptive dynamic programming (ADP) structure is proposed based on the developed multicritic multiaction neural network (MCMANN) structure with nonquadratic performance functions, in order to solve the multiplayer nonzero-sum game (MP-NZSG) problem in 3DSPC under asymmetric saturation input constraints. Then, the MCMANNs are used to implement the multicritic multiaction ADP (MCMA-ADP) algorithm, whose convergence is proven by the contraction mapping principle. Finally, MCMA-ADP is deployed in the spin-exchange relaxation-free (SERF) system to provide a set of control laws for 3DSPC that fully exploit the multiphysics fields to achieve arbitrary spin polarization states. Numerical simulations support the theoretical results.
3. Zhang Y, Zhang J, Weng J. Dynamic Moore-Penrose Inversion With Unknown Derivatives: Gradient Neural Network Approach. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:10919-10929. PMID: 35536807. DOI: 10.1109/TNNLS.2022.3171715.
Abstract
Finding dynamic Moore-Penrose inverses (DMPIs) in real time is challenging due to the time-varying nature of the inverse. Traditional numerical methods for the static Moore-Penrose inverse are inefficient for calculating DMPIs and are restricted by serial processing. The current state-of-the-art method for finding DMPIs, the zeroing neural network (ZNN), requires that the time derivative of the associated matrix be available throughout the solution process. In practice, however, this derivative may not be available in real time, or it may be corrupted by noise introduced by differentiators. In this article, we propose a novel gradient-based neural network (GNN) method for computing DMPIs that does not need the time derivative of the associated dynamic matrix. In particular, the neural state matrix of the proposed GNN converges to the theoretical DMPI in finite time. The finite-time convergence is retained, by simply setting a large design parameter, even when additive noises are present in the implementation of the GNN model. Simulation results demonstrate the efficacy and superiority of the proposed GNN method.
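The gradient-dynamics idea behind such GNN models can be illustrated on a static snapshot: for a full-column-rank matrix A, drive the state X along the negative gradient of the energy ½‖AX − I‖²_F, whose equilibrium is the Moore-Penrose inverse. The sketch below is a minimal illustration of that principle, not the authors' noise-tolerant finite-time model; the gain, step size, and test matrix are arbitrary choices.

```python
import numpy as np

def gnn_pinv(A, gamma=1.0, step=0.01, iters=5000):
    """Gradient-neural-network sketch for a static Moore-Penrose inverse.

    Forward-Euler integration of dX/dt = -gamma * A.T @ (A @ X - I);
    for full-column-rank A the equilibrium satisfies A.T A X = A.T,
    i.e. X = pinv(A).
    """
    m, n = A.shape
    X = np.zeros((n, m))          # neural state matrix
    I = np.eye(m)
    for _ in range(iters):
        X -= step * gamma * A.T @ (A @ X - I)   # negative-gradient update
    return X

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X = gnn_pinv(A)
# X approaches np.linalg.pinv(A) as the iteration count grows
```

The step size must keep `step * gamma` below 2 divided by the largest eigenvalue of AᵀA, otherwise the Euler iteration diverges.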
4. Liu Y, Zhang H, Shi Z, Gao Z. Neural-Network-Based Finite-Time Bipartite Containment Control for Fractional-Order Multi-Agent Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7418-7429. PMID: 35100125. DOI: 10.1109/TNNLS.2022.3143494.
Abstract
This article focuses on the adaptive bipartite containment control problem for nonaffine fractional-order multi-agent systems (FOMASs) with disturbances and completely unknown high-order dynamics. Different from the existing finite-time theory for fractional-order systems, a lemma is developed that achieves finite-time bipartite containment for the considered FOMASs, in which the settling time and convergence accuracy can be estimated. By applying the mean-value theorem, the difficulty in controller design caused by the nonaffine nonlinear term is overcome. A neural network (NN) is employed to approximate the ideal input signal instead of the unknown nonaffine function, and a distributed adaptive NN bipartite containment controller for the FOMASs is then developed within the backstepping framework. It is proved that the bipartite containment error under the proposed control scheme achieves finite-time convergence even though the follower agents are subject to completely unknown dynamics and disturbances. Finally, the feasibility and validity of the obtained results are demonstrated by simulation examples.
5. Lin D, Xue S, Liu D, Liang M, Wang Y. Adaptive dynamic programming-based hierarchical decision-making of non-affine systems. Neural Networks 2023; 167:331-341. PMID: 37673023. DOI: 10.1016/j.neunet.2023.07.044.
Abstract
In this paper, the multiplayer hierarchical decision-making problem for non-affine systems is solved by adaptive dynamic programming. First, control dynamics are obtained from dynamic feedback theory and combined with the original system dynamics to construct an affine augmented system, so that the non-affine multiplayer system is transformed into a general affine form. The hierarchical decision problem is then modeled as a Stackelberg game: the leader makes a decision based on the information of all followers, whereas the followers do not know each other's information and obtain their optimal control strategies based only on the leader's decision. Next, the augmented system is reconstructed by a neural network (NN) using input-output data. Moreover, a single critic NN is used to approximate the value function and obtain the optimal control strategy for each player. An extra term added to the weight update law removes the need for an initially admissible control law. According to Lyapunov theory, the state of the system and the NN weight errors are both uniformly ultimately bounded. Finally, the feasibility and validity of the algorithm are confirmed by simulation.
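The leader-follower information pattern described above can be seen in a toy static Stackelberg game (a scalar, quadratic sketch of our own construction, not the paper's augmented-system formulation): the follower best-responds to the announced leader action, and the leader optimizes while anticipating that response.

```python
import numpy as np

def follower_best_response(u):
    """Follower minimizes (v - u)^2: it tracks the leader's action."""
    return u

def leader_cost(u):
    """Leader minimizes u^2 + (v - 1)^2, anticipating v = v*(u)."""
    v = follower_best_response(u)
    return u**2 + (v - 1.0)**2

# The leader searches over its action, knowing the follower's reaction curve.
grid = np.linspace(-2.0, 2.0, 4001)
u_star = grid[np.argmin([leader_cost(u) for u in grid])]
v_star = follower_best_response(u_star)
# Analytically, J(u) = u^2 + (u - 1)^2 is minimized at u = 0.5, so
# the Stackelberg solution here is (u*, v*) = (0.5, 0.5).
```

The same anticipate-then-optimize structure is what the paper's critic networks approximate in the dynamic, multiplayer setting.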
Affiliation(s)
- Danyu Lin: School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
- Shan Xue: School of Information and Communication Engineering, Hainan University, Haikou 570100, China.
- Derong Liu: School of System Design and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen 518055, China; Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, IL 60607, USA.
- Mingming Liang: School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
- Yonghua Wang: School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
6. Zhang H, Ming Z, Yan Y, Wang W. Data-Driven Finite-Horizon H∞ Tracking Control With Event-Triggered Mechanism for the Continuous-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:4687-4701. PMID: 34633936. DOI: 10.1109/TNNLS.2021.3116464.
Abstract
In this article, a neural network (NN)-based adaptive dynamic programming (ADP) event-triggered control method is presented to obtain a near-optimal control policy for the model-free finite-horizon H∞ optimal tracking control problem with constrained control input. First, using available input-output data, a data-driven model is established by a recurrent NN (RNN) to reconstruct the unknown system. Then, an augmented system with an event-triggered mechanism is obtained from a tracking error system and a command generator, and a novel event-triggering condition without Zeno behavior is presented. On this basis, the relationship between the event-triggered Hamilton-Jacobi-Isaacs (HJI) equation and the time-triggered HJI equation is given in Theorem 3. Since the solution of the HJI equation is time-dependent for the augmented system, time-dependent activation functions are considered for the NNs. Moreover, an extra error term is incorporated to satisfy the terminal constraints of the cost function. This adaptive control scheme finds, in real time, approximations of the optimal value while ensuring the uniform ultimate boundedness of the closed-loop system. Finally, the effectiveness of the proposed near-optimal control scheme is verified by two simulation examples.
7. Sun J, Zhang H, Yan Y, Xu S, Fan X. Optimal Regulation Strategy for Nonzero-Sum Games of the Immune System Using Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2023; 53:1475-1484. PMID: 34464283. DOI: 10.1109/TCYB.2021.3103820.
Abstract
This article investigates the optimal control strategy problem for nonzero-sum games of the immune system based on adaptive dynamic programming (ADP). First, the main objective is to approximate a Nash equilibrium between the tumor cells and the immune cell population, which are regulated through chemotherapy drugs and immunoagents guided by the mathematical growth model of the tumor cells. Second, a novel nonzero-sum-game-based ADP scheme is put forward to solve the optimization problem by reducing the growth rate of tumor cells while minimizing the chemotherapy and immunotherapy drug doses. Meanwhile, the convergence analysis of the iterative ADP algorithm is given to prove its feasibility. Finally, simulation examples are presented to demonstrate the applicability and effectiveness of the proposed methodology.
8. Sun J, Dai J, Zhang H, Yu S, Xu S, Wang J. Neural-Network-Based Immune Optimization Regulation Using Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2023; 53:1944-1953. PMID: 35767503. DOI: 10.1109/TCYB.2022.3179302.
Abstract
This article investigates an optimal regulation scheme between tumor and immune cells based on the adaptive dynamic programming (ADP) approach. The therapeutic goal is to inhibit the growth of tumor cells to an allowable injury degree while maximizing the number of immune cells. A reliable controller is derived through the ADP approach to drive the cell populations to the desired states. First, the main objective is to weaken the negative effects of chemotherapy and immunotherapy, which means that minimal doses of chemotherapeutic and immunotherapeutic drugs are used in the treatment process. Second, according to the nonlinear dynamical mathematical model of tumor cells, the chemotherapy and immunotherapeutic drugs act as the regulatory inputs of a closed-loop control scheme. Finally, the system states and critic weight errors are proved to be uniformly ultimately bounded under the proposed optimal control strategy, and simulation results demonstrate the effectiveness of the methodology.
9. Yang X, Zhou Y, Gao Z. Reinforcement learning for robust stabilization of nonlinear systems with asymmetric saturating actuators. Neural Networks 2023; 158:132-141. PMID: 36455428. DOI: 10.1016/j.neunet.2022.11.012.
Abstract
We study the robust stabilization problem for a class of nonlinear systems with asymmetric saturating actuators and mismatched disturbances. First, we convert this robust stabilization problem into a nonlinear-constrained optimal control problem by constructing a discounted cost function for the auxiliary system. Then, to solve the constrained optimal control problem, we develop a simultaneous policy iteration (PI) algorithm in the reinforcement learning framework. The implementation of the simultaneous PI relies on an actor-critic architecture, which employs actor and critic neural networks (NNs) to separately approximate the control policy and the value function. To determine the actor and critic NN weights, we use the method of weighted residuals together with the typical Monte Carlo integration technique. Finally, we perform simulations on two nonlinear plants to validate the established theoretical claims.
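For the unconstrained linear-quadratic special case, the policy-evaluation/policy-improvement loop that such PI schemes generalize reduces to Kleinman's algorithm: evaluate a stabilizing gain by solving a Lyapunov equation, then improve the gain from the resulting value matrix. The sketch below is a simplified LQR analogue with arbitrarily chosen system matrices, not the paper's saturated nonlinear setting; it converges to the Riccati solution.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # open-loop stable, so K0 = 0 is admissible
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.zeros((1, 2))                        # initial admissible policy
for _ in range(10):
    Ak = A - B @ K
    # Policy evaluation: solve Ak' P + P Ak + Q + K' R K = 0 for P
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B' P
    K = np.linalg.solve(R, B.T @ P)

P_are = solve_continuous_are(A, B, Q, R)    # reference Riccati solution
```

Each Lyapunov solve plays the role of the critic (evaluating the current policy), and the gain update plays the role of the actor; the NN machinery in the paper replaces these exact solves with function approximation.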
Affiliation(s)
- Xiong Yang: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China.
- Yingjiang Zhou: College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
- Zhongke Gao: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China.
10. Wang K, Mu C. Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system. ISA Transactions 2022; 129:295-308. PMID: 35216805. DOI: 10.1016/j.isatra.2022.02.007.
Abstract
In this paper, based on an actor-critic neural network structure and a reinforcement learning scheme, a novel asynchronous learning algorithm with event-based communication is developed to solve for the Nash equilibrium of a multiplayer nonzero-sum differential game in an adaptive fashion. From the optimal control point of view, each player, or local controller, wants to minimize its individual infinite-horizon cost function by finding an optimal policy. In this learning framework, each player consists of one critic and one actor and implements distributed asynchronous policy iteration to optimize its decision-making process. In addition, the communication burden between the system and the players is effectively reduced by a central event generator. The critic network executes fast updates by gradient-descent adaptation, while the actor network performs event-induced updates using gradient projection. Closed-loop asymptotic stability is ensured along with uniform ultimate boundedness. The effectiveness of the proposed algorithm is then substantiated on a four-player nonlinear system, showing that it can significantly reduce the number of samples without impairing learning accuracy. Finally, leveraging the nonzero-sum game idea, the proposed learning scheme is applied to the lateral-directional stability of a linear aircraft system, and is further extended to a nonlinear vehicle system to achieve adaptive cruise control.
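The sampling economy of event-triggered updating can be seen on a scalar toy loop (our own minimal sketch, unrelated to the paper's four-player game or its NN learning rules): the controller reuses the last transmitted state until the triggering error exceeds a threshold, so transmissions are far sparser than simulation steps while the state still settles into a small ultimate bound.

```python
def event_triggered_run(x0=1.0, delta=0.05, dt=0.01, steps=1000):
    """Simulate x' = -x + u with u = -2*x_hat, where x_hat is the state
    value transmitted at the last triggering instant (|x - x_hat| > delta)."""
    x, x_hat, events = x0, x0, 0
    for _ in range(steps):
        if abs(x - x_hat) > delta:   # triggering condition -> transmit
            x_hat = x
            events += 1
        u = -2.0 * x_hat
        x += dt * (-x + u)           # forward-Euler plant step
    return x, events

x_final, events = event_triggered_run()
# Transmissions (events) are far fewer than the 1000 integration steps,
# and x_final stays in a small neighborhood of the origin set by delta.
```

Writing the closed loop as x' = -3x + 2(x - x_hat) with |x - x_hat| bounded by roughly delta shows why the state is ultimately bounded rather than asymptotically zero, which mirrors the uniform-ultimate-boundedness guarantees typical of event-triggered designs.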
Affiliation(s)
- Ke Wang: School of Electrical and Information Engineering, Tianjin University, Tianjin, China.
- Chaoxu Mu: School of Electrical and Information Engineering, Tianjin University, Tianjin, China.
11. Wei Q, Yang Z, Su H, Wang L. Monte Carlo-based Reinforcement Learning Control for Unmanned Aerial Vehicle Systems. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.08.011.
12. Prescribed-Time Convergent Adaptive ZNN for Time-Varying Matrix Inversion under Harmonic Noise. Electronics 2022. DOI: 10.3390/electronics11101636.
Abstract
Harmonic noises are widespread in industrial settings and degrade the computational accuracy of neural network models. The existing original adaptive zeroing neural network (OAZNN) model can effectively suppress harmonic noises; nevertheless, it achieves only exponential convergence, so its convergence speed is strongly affected by the initial state. To tackle this issue, this work combines the dynamic characteristics of harmonic signals with a prescribed-time convergent activation function and proposes a prescribed-time convergent adaptive ZNN (PTCAZNN) for solving the time-varying matrix inverse problem (TVMIP) under harmonic noises. Because the nonlinear activation function itself rejects noise and the adaptive term compensates for the influence of noise, the PTCAZNN model realizes double noise suppression. More importantly, a theoretical analysis of the PTCAZNN model's prescribed-time convergence and robustness is provided. Finally, comparative simulations varying a series of conditions, such as the frequency of single harmonic noise, the frequency of multi-harmonic noise, and the initial value and dimension of the matrix, further confirm the effectiveness and superiority of the PTCAZNN model.
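The zeroing-neural-network design principle that the PTCAZNN builds on can be sketched in its basic noise-free form (the original ZNN with a linear activation, not the prescribed-time adaptive model of the paper; the test matrix and gains are arbitrary): impose exponential decay on the error E(t) = A(t)X(t) − I and integrate the induced state dynamics, using the current state as a stand-in for A⁻¹.

```python
import numpy as np

def A_of(t):
    """Time-varying test matrix; diagonally dominant, so invertible for all t."""
    return np.array([[3.0 + np.sin(t), 1.0],
                     [1.0, 3.0 + np.cos(t)]])

def A_dot(t):
    return np.array([[np.cos(t), 0.0],
                     [0.0, -np.sin(t)]])

def znn_track_inverse(gamma=100.0, dt=1e-3, t_end=5.0):
    """Basic ZNN: imposing E' = -gamma*E on E = A X - I and substituting
    X for inv(A) yields X' = -X A' X - gamma * X (A X - I)."""
    t = 0.0
    X = np.linalg.inv(A_of(0.0))   # start on the theoretical solution
    I = np.eye(2)
    while t < t_end:
        A, Ad = A_of(t), A_dot(t)
        X = X + dt * (-X @ Ad @ X - gamma * X @ (A @ X - I))
        t += dt
    return X

X = znn_track_inverse()
# X tracks inv(A(t)); at t = 5 the residual A(5) X - I remains small.
```

The design gain gamma sets the (exponential) error-decay rate; the paper's contribution is precisely to replace this exponential rate with convergence inside a user-prescribed time, independent of the initial state.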
13. Wei Q, Han L, Zhang T. Spiking Adaptive Dynamic Programming Based on Poisson Process for Discrete-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1846-1856. PMID: 34143743. DOI: 10.1109/TNNLS.2021.3085781.
Abstract
In this article, a new iterative spiking adaptive dynamic programming (SADP) method based on the Poisson process is developed to solve optimal impulsive control problems. For a fixed time interval, combining the Poisson process with maximum likelihood estimation (MLE), the three-tuple of state, spiking interval, and Poisson-distribution probability can be computed, from which the iterative value functions and iterative control laws are obtained. A property analysis shows that the value functions converge to the optimal performance index function as the iteration index increases from zero to infinity. Finally, two simulation examples are given to verify the effectiveness of the developed algorithm.
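The Poisson/MLE ingredient can be illustrated independently of the control loop (a self-contained sketch with an arbitrary rate, not the paper's SADP iteration): simulate spike times by drawing exponential inter-spike intervals, then recover the intensity by its maximum likelihood estimate, the spike count divided by the elapsed time.

```python
import numpy as np

rng = np.random.default_rng(0)
rate_true = 5.0                 # Poisson intensity (spikes per unit time)
n_spikes = 2000

# Inter-spike intervals of a homogeneous Poisson process are
# exponentially distributed with mean 1/rate.
intervals = rng.exponential(1.0 / rate_true, size=n_spikes)
elapsed = intervals.sum()

# MLE of the rate over the observed window: event count / elapsed time.
rate_mle = n_spikes / elapsed
# rate_mle lies close to 5.0 (standard error ~ rate/sqrt(n) ~ 0.11 here)
```

In the paper this estimated spiking statistic feeds the iterative value-function updates; here it simply demonstrates that the rate of the underlying process is identifiable from observed inter-spike intervals.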
14. Zhong W, Wang M, Wei Q, Lu J. A New Neuro-Optimal Nonlinear Tracking Control Method via Integral Reinforcement Learning with Applications to Nuclear Systems. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.01.034.
15. Ma YS, Che WW, Deng C. Dynamic event-triggered model-free adaptive control for nonlinear CPSs under aperiodic DoS attacks. Information Sciences 2022. DOI: 10.1016/j.ins.2022.01.009.
16. Cai Y, Zhang H, Zhang J, Wang W. Fixed-time leader-following/containment consensus for a class of nonlinear multi-agent systems. Information Sciences 2021. DOI: 10.1016/j.ins.2020.12.064.
17. Liang M, Wei Q. A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.11.014.