1
Wang Y, Wang D, Zhao M, Liu N, Qiao J. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate. Neural Netw 2024; 175:106274. [PMID: 38583264] [DOI: 10.1016/j.neunet.2024.106274]
Abstract
In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem, which can accelerate the convergence rate of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under some conditions. Moreover, by employing neural networks, the model-free tracking control problem for zero-sum games can be handled. Second, two practical algorithms are designed to guarantee convergence with accelerated learning. In one algorithm, an adjustable acceleration phase is added to the iteration process of Q-learning, which can be adaptively terminated with a convergence guarantee. In the other algorithm, a novel acceleration function is developed, which adjusts the relaxation factor to ensure convergence. Finally, through a simulation example with a practical physical background, the excellent performance of the developed algorithms is demonstrated with neural networks.
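As an illustrative aside (not the paper's actual scheme), the relaxation-factor idea behind an adjustable convergence rate can be sketched on a toy scalar contraction standing in for the zero-sum Q-Bellman operator; `bellman_op`, `omega`, and all constants below are hypothetical stand-ins:

```python
def bellman_op(q, r=1.0, gamma=0.9):
    # Toy contraction standing in for the Q-Bellman operator;
    # its fixed point is q* = r / (1 - gamma) = 10.
    return r + gamma * q

def iterate(q0, steps, omega=1.0):
    # omega = 1 is the plain iterative Q-function update; choosing
    # 1 < omega < 2 / (1 - gamma) over-relaxes it and speeds convergence.
    q = q0
    for _ in range(steps):
        q = (1.0 - omega) * q + omega * bellman_op(q)
    return q

plain = iterate(0.0, steps=20, omega=1.0)
fast = iterate(0.0, steps=20, omega=1.5)
```

With gamma = 0.9 the relaxed map contracts with factor 1 - omega*(1 - gamma) = 0.85 instead of 0.9, so `fast` ends nearer the fixed point 10 after the same 20 sweeps.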
Affiliation(s)
- Yuan Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Mingming Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Nan Liu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
2
Zhang J, Zhang K, An Y, Luo H, Yin S. An Integrated Multitasking Intelligent Bearing Fault Diagnosis Scheme Based on Representation Learning Under Imbalanced Sample Condition. IEEE Trans Neural Netw Learn Syst 2024; 35:6231-6242. [PMID: 37018605] [DOI: 10.1109/tnnls.2022.3232147]
Abstract
Accurate bearing fault diagnosis is of great significance to the safety and reliability of rotary mechanical systems. In practice, the sample proportion between faulty data and healthy data in rotating mechanical systems is imbalanced. Furthermore, there are commonalities among the bearing fault detection, classification, and identification tasks. Based on these observations, this article proposes a novel integrated multitasking intelligent bearing fault diagnosis scheme with the aid of representation learning under the imbalanced sample condition, which realizes bearing fault detection, classification, and unknown fault identification. Specifically, in the unsupervised condition, a bearing fault detection approach based on a modified denoising autoencoder (DAE) with a self-attention mechanism for the bottleneck layer (MDAE-SAMB) is proposed in the integrated scheme, which uses only the healthy data for training. The self-attention mechanism is introduced into the bottleneck layer, so that different weights can be assigned to the neurons in this layer. Moreover, transfer learning based on representation learning is proposed for few-shot fault classification. Only a few fault samples are used for offline training, and high-accuracy online bearing fault classification is achieved. Finally, according to the known fault data, unknown bearing faults can be effectively identified. A bearing dataset generated by a rotor dynamics experiment rig (RDER) and a public bearing dataset demonstrate the applicability of the proposed integrated fault diagnosis scheme.
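The healthy-data-only detection logic described here can be sketched with a deliberately simplified stand-in: a principal-subspace reconstruction in place of the trained MDAE-SAMB, with the alarm threshold calibrated on healthy reconstruction errors only. All data, dimensions, and the 1.1 margin are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: healthy signals live near a 2-D subspace of R^8;
# faults push samples off that subspace.
basis = rng.standard_normal((8, 2))
healthy = rng.standard_normal((200, 2)) @ basis.T + 0.01 * rng.standard_normal((200, 8))
faulty = healthy[:20] + rng.standard_normal((20, 8))  # off-subspace disturbance

# Stand-in for the trained autoencoder: reconstruct via the top-2
# principal subspace of the healthy data and score reconstruction error.
mean = healthy.mean(axis=0)
_, _, vt = np.linalg.svd(healthy - mean, full_matrices=False)
proj = vt[:2].T @ vt[:2]

def recon_error(x):
    z = x - mean
    return np.linalg.norm(z - z @ proj, axis=1)

# Threshold calibrated on healthy data only, as in unsupervised detection.
threshold = 1.1 * recon_error(healthy).max()
alarms = recon_error(faulty) > threshold
```

Faulty samples reconstruct poorly and trip the alarm, while every healthy sample stays below the threshold by construction.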
3
Luo R, Peng Z, Hu J. Optimal Robust Control of Nonlinear Systems with Unknown Dynamics via NN Learning with Relaxed Excitation. Entropy (Basel) 2024; 26:72. [PMID: 38248197] [PMCID: PMC11154462] [DOI: 10.3390/e26010072]
Abstract
This paper presents an adaptive learning structure based on neural networks (NNs) to solve the optimal robust control problem for nonlinear continuous-time systems with unknown dynamics and disturbances. First, a system identifier is introduced to approximate the unknown system matrices and disturbances with the help of NNs and parameter estimation techniques. To obtain the solution of the optimal robust control problem, a critic learning control structure is proposed to compute the approximate controller. Unlike existing identifier-critic NN learning control methods, novel adaptive tuning laws based on Kreisselmeier's regressor extension and mixing technique are designed to estimate the unknown parameters of the two NNs under relaxed persistence-of-excitation conditions. Furthermore, theoretical analysis is given to prove that the proposed convergence conditions are significantly relaxed. Finally, the effectiveness of the proposed learning approach is demonstrated via a simulation study.
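The regressor extension and mixing idea that relaxes the excitation requirement can be sketched in a discrete toy form (not the paper's continuous-time Kreisselmeier filters): stacked regression equations are "mixed" by the adjugate of the extended regressor, decoupling them into one scalar regression per unknown parameter. All numbers are made up:

```python
import numpy as np

# Two stacked regression equations Y = Phi @ theta for an unknown theta.
theta_true = np.array([2.0, -1.0])
Phi = np.array([[1.0, 0.5],
                [0.2, 1.0]])            # extended regressor (two equations)
Y = Phi @ theta_true

# Mixing step: multiply by the adjugate of Phi, so that
# Ymix[i] = det(Phi) * theta_true[i] -- one scalar equation per parameter.
adj = np.array([[Phi[1, 1], -Phi[0, 1]],
                [-Phi[1, 0], Phi[0, 0]]])
delta = np.linalg.det(Phi)              # scalar excitation level
Ymix = adj @ Y

# Scalar gradient estimators: each parameter converges whenever delta != 0,
# a much weaker requirement than persistent excitation of Phi itself.
theta_hat = np.zeros(2)
gamma = 0.5
for _ in range(200):
    theta_hat = theta_hat + gamma * delta * (Ymix - delta * theta_hat)
```

Each estimation error shrinks by the factor 1 - gamma*delta**2 per step, so both parameters are recovered independently.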
Affiliation(s)
- Rui Luo
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (R.L.); (J.H.)
- Zhinan Peng
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (R.L.); (J.H.)
- Institute of Electronic and Information Engineering, University of Electronic Science and Technology of China, Dongguan 523808, China
- Jiangping Hu
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (R.L.); (J.H.)
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou 313001, China
4
Sun J, Ming Z. Cooperative Differential Game-Based Distributed Optimal Synchronization Control of Heterogeneous Nonlinear Multiagent Systems. IEEE Trans Cybern 2023; 53:7933-7942. [PMID: 37022861] [DOI: 10.1109/tcyb.2023.3240983]
Abstract
This article presents an online off-policy policy iteration (PI) algorithm using reinforcement learning (RL) to optimize the distributed synchronization problem for nonlinear multiagent systems (MASs). First, considering that not every follower can directly obtain the leader's information, a novel adaptive model-free observer based on neural networks (NNs) is designed, and the feasibility of the observer is strictly proved. Subsequently, combining the observer and the follower dynamics, an augmented system and a distributed cooperative performance index with discount factors are established. On this basis, the optimal distributed cooperative synchronization problem reduces to solving the Hamilton-Jacobi-Bellman (HJB) equation numerically. Finally, an online off-policy algorithm is proposed, which can optimize the distributed synchronization problem of the MASs in real time based on measured data. To make the stability and convergence of the online off-policy algorithm easier to establish, an offline on-policy algorithm, whose stability and convergence are proved, is presented first, together with a novel mathematical analysis method for establishing the stability of the algorithm. The effectiveness of the theory is verified by simulation results.
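The observer premise, that followers without direct access to the leader can still recover its state through neighbors, can be sketched with a deliberately simple pinned consensus observer (a static scalar leader, a 4-agent line graph, and made-up gains, not the paper's adaptive NN observer):

```python
import numpy as np

# Line graph 0-1-2-3; only agent 0 measures the (static) leader state.
x_leader = 5.0
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])     # adjacency matrix
g = np.array([1.0, 0.0, 0.0, 0.0])   # pinning gains (agent 0 sees the leader)
xhat = np.zeros(4)                   # each agent's estimate of the leader
mu = 0.2                             # observer step size
for _ in range(500):
    # consensus[i] = sum_j a_ij * (xhat_j - xhat_i)
    consensus = A @ xhat - A.sum(axis=1) * xhat
    xhat = xhat + mu * (consensus + g * (x_leader - xhat))
```

Because the pinned Laplacian of a connected graph is positive definite, every estimate converges to the leader state even though agents 1-3 never measure it directly.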
5
Lin D, Xue S, Liu D, Liang M, Wang Y. Adaptive dynamic programming-based hierarchical decision-making of non-affine systems. Neural Netw 2023; 167:331-341. [PMID: 37673023] [DOI: 10.1016/j.neunet.2023.07.044]
Abstract
In this paper, the multiplayer hierarchical decision-making problem for non-affine systems is solved by adaptive dynamic programming. Firstly, the control dynamics are obtained according to the theory of dynamic feedback and combined with the original system dynamics to construct an affine augmented system; thus, the non-affine multiplayer system is transformed into a general affine form. Then, the hierarchical decision problem is modeled as a Stackelberg game, in which the leader makes a decision based on the information of all followers, whereas the followers do not know each other's information and obtain their optimal control strategies based only on the leader's decision. Next, the augmented system is reconstructed by a neural network (NN) using input-output data, and a single critic NN is used to approximate the value function to obtain the optimal control strategy for each player. An extra term added to the weight update law removes the need for an initial admissible control law. According to Lyapunov theory, the state of the system and the NN weight errors are both uniformly ultimately bounded. Finally, the feasibility and validity of the algorithm are confirmed by simulation.
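The Stackelberg information structure described here, followers best-respond to the leader only, while the leader anticipates all replies, can be illustrated with a discrete toy game (not the paper's continuous-time dynamic setting); all action counts and random costs are made up:

```python
import numpy as np

# One leader with 3 actions; two followers with 2 actions each.
rng = np.random.default_rng(1)
n_leader = 3
cost_f = rng.random((2, n_leader, 2))   # cost_f[i, a_l, a_i] for follower i
cost_l = rng.random((n_leader, 2, 2))   # leader cost given both replies

def follower_reply(a_l):
    # Each follower minimizes its own cost given only the leader's action;
    # followers do not observe each other.
    return tuple(int(np.argmin(cost_f[i, a_l])) for i in range(2))

# The leader minimizes its cost over its actions, anticipating the replies.
best_al = min(range(n_leader), key=lambda a: cost_l[(a, *follower_reply(a))])
replies = follower_reply(best_al)
```

The asymmetry is the point: `follower_reply` sees only `a_l`, while the leader's `min` ranges over its actions with the induced replies substituted in.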
Affiliation(s)
- Danyu Lin
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
- Shan Xue
- School of Information and Communication Engineering, Hainan University, Haikou 570100, China.
- Derong Liu
- School of System Design and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen 518055, China; Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, IL 60607, USA.
- Mingming Liang
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
- Yonghua Wang
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
6
Luo R, Peng Z, Hu J, Ghosh BK. Adaptive optimal control of affine nonlinear systems via identifier-critic neural network approximation with relaxed PE conditions. Neural Netw 2023; 167:588-600. [PMID: 37703669] [DOI: 10.1016/j.neunet.2023.08.044]
Abstract
This paper considers the optimal control of an affine nonlinear system with unknown system dynamics. A new identifier-critic (IC) framework is proposed to solve the optimal control problem. Firstly, a neural network (NN) identifier is built to estimate the unknown system dynamics, and a critic NN is constructed to solve the Hamilton-Jacobi-Bellman equation associated with the optimal control problem. A dynamic regressor extension and mixing technique is applied to design the weight update laws with relaxed persistence-of-excitation conditions for the two classes of neural networks. The parameter estimation of the update laws and the stability of the closed-loop system under the adaptive optimal control are analyzed using a Lyapunov function method. Numerical simulation results are presented to demonstrate the effectiveness of the proposed IC learning-based optimal control algorithm for the affine nonlinear system.
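The identifier-then-critic pipeline can be sketched on a linear stand-in for the affine nonlinear system: a least-squares identifier recovers the unknown dynamics from data, and a Riccati recursion (the linear analogue of solving the HJB equation) plays the critic. The plant, data sizes, and Q, R weights below are made up:

```python
import numpy as np

rng = np.random.default_rng(2)

# Unknown (to the learner) discrete-time linear plant x+ = A x + B u.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])

# Identifier: least-squares fit of [A | B] from state-input-successor data.
X = rng.standard_normal((2, 50))
U = rng.standard_normal((1, 50))
Xn = A @ X + B @ U
Theta = Xn @ np.linalg.pinv(np.vstack([X, U]))
Ah, Bh = Theta[:, :2], Theta[:, 2:]

# Critic stand-in: Riccati recursion on the identified model, Q = I, R = I.
Q, R = np.eye(2), np.eye(1)
P = np.eye(2)
for _ in range(200):
    K = np.linalg.solve(R + Bh.T @ P @ Bh, Bh.T @ P @ Ah)
    P = Q + Ah.T @ P @ (Ah - Bh @ K)
```

With noiseless data the identifier is exact, and the resulting gain `K` stabilizes the true plant.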
Affiliation(s)
- Rui Luo
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Zhinan Peng
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Jiangping Hu
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou 313001, China.
- Bijoy Kumar Ghosh
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, 79409-1042, USA
7
Peng Z, Ji H, Zou C, Kuang Y, Cheng H, Shi K, Ghosh BK. Optimal H∞ tracking control of nonlinear systems with zero-equilibrium-free via novel adaptive critic designs. Neural Netw 2023; 164:105-114. [PMID: 37148606] [DOI: 10.1016/j.neunet.2023.04.021]
Abstract
In this paper, a novel adaptive critic control method is designed to solve the optimal H∞ tracking control problem for continuous-time nonlinear systems with nonzero equilibrium, based on adaptive dynamic programming (ADP). To guarantee the finiteness of the cost function, traditional methods generally assume that the controlled system has a zero equilibrium point, which is not true of practical systems. In order to overcome such an obstacle and realize H∞ optimal tracking control, this paper proposes a novel cost function design with respect to the disturbance, the tracking error, and the derivative of the tracking error. Based on the designed cost function, the H∞ control problem is formulated as a two-player zero-sum differential game, and then a policy iteration (PI) algorithm is proposed to solve the corresponding Hamilton-Jacobi-Isaacs (HJI) equation. In order to obtain the online solution to the HJI equation, a single-critic neural network structure based on the PI algorithm is established to learn the optimal control policy and the worst-case disturbance law. It is worth mentioning that the proposed adaptive critic control method simplifies the controller design process when the equilibrium of the system is not zero. Finally, simulations are conducted to evaluate the tracking performance of the proposed control methods.
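The PI-on-HJI idea reduces, for a scalar linear plant, to policy iteration on the H∞ game Riccati equation, which makes it easy to sketch (this is a generic zero-sum LQ illustration, not the paper's nonzero-equilibrium design; all constants are made up):

```python
# Scalar plant dx/dt = a*x + b*u + c*d with cost
# integral of q*x^2 + r*u^2 - gamma^2 * d^2 dt.
a, b, c = 1.0, 1.0, 1.0
q, r, gamma = 1.0, 1.0, 2.0

P = 2.0   # initial value-function coefficient; its induced control stabilizes
for _ in range(50):
    # Policies induced by current P: u = -(b/r)*P*x (control player),
    # d = (c/gamma^2)*P*x (worst-case disturbance player).
    a_cl = a - (b * b / r) * P + (c * c / gamma ** 2) * P
    # Policy evaluation: scalar Lyapunov equation
    # 0 = 2*a_cl*P' + q + (b^2/r - c^2/gamma^2) * P^2.
    P = -(q + (b * b / r - c * c / gamma ** 2) * P * P) / (2 * a_cl)
```

The iterates converge to the stabilizing root of the game Riccati equation 0 = 2aP + q - (b²/r - c²/γ²)P², here (2 + √7)/1.5 ≈ 3.097, with the closed loop remaining stable at every step.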
Affiliation(s)
- Zhinan Peng
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Hanqi Ji
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Chaobin Zou
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Yiqun Kuang
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
- Hong Cheng
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Kaibo Shi
- School of Information Science and Engineering, Chengdu University, Chengdu, 610106, China
- Bijoy Kumar Ghosh
- Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, 79409-1042, USA
8
Dynamic event-triggered-based single-network ADP optimal tracking control for the unknown nonlinear system with constrained input. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.015]
9
Huang Z, Bai W, Li T, Long Y, Chen CP, Liang H, Yang H. Adaptive Reinforcement Learning Optimal Tracking Control for Strict-Feedback Nonlinear Systems with Prescribed Performance. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.109]
10
Song S, Zhu M, Dai X, Gong D. Model-Free Optimal Tracking Control of Nonlinear Input-Affine Discrete-Time Systems via an Iterative Deterministic Q-Learning Algorithm. IEEE Trans Neural Netw Learn Syst 2022; PP:999-1012. [PMID: 35657846] [DOI: 10.1109/tnnls.2022.3178746]
Abstract
In this article, a novel model-free dynamic inversion-based Q-learning (DIQL) algorithm is proposed to solve the optimal tracking control (OTC) problem of unknown nonlinear input-affine discrete-time (DT) systems. Compared with the existing DIQL algorithm and the discount factor-based Q-learning (DFQL) algorithm, the proposed algorithm can eliminate the tracking error while remaining model-free and off-policy. First, a new deterministic Q-learning iterative scheme is presented, and based on this scheme, a model-based off-policy DIQL algorithm is designed. The advantage of this new scheme is that it avoids training on unusual data and improves data utilization, thereby saving computing resources. Simultaneously, the convergence and stability of the designed algorithm are analyzed, and it is proved that adding probing noise to the behavior policy does not affect the convergence. Then, by introducing neural networks (NNs), a model-free version of the designed algorithm is further proposed so that the OTC problem can be solved without any knowledge of the system dynamics. Finally, three simulation examples are given to demonstrate the effectiveness of the proposed algorithm.
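For a scalar LQ toy problem, a deterministic Q-learning value iteration of the kind this abstract builds on collapses to a Riccati recursion, which makes the iterative scheme easy to sketch (regulation core only; the paper's tracking setup augments the state with the reference dynamics, and all constants here are made up):

```python
# Scalar plant x+ = a*x + b*u with stage cost q*x^2 + r*u^2.
a, b, q, r = 0.9, 1.0, 1.0, 1.0

# With Q_k(x, u) = q*x^2 + r*u^2 + P_k*(a*x + b*u)^2, minimizing over u
# turns the Q-update Q_{k+1} = cost + min_u' Q_k into a recursion on P.
P = 0.0
for _ in range(200):
    P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
K = a * b * P / (r + b * b * P)   # greedy policy u = -K*x
```

At convergence P satisfies P² - 0.81P - 1 = 0 (for these constants), and the greedy gain places the closed-loop pole a - bK well inside the unit circle.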
11
Ye J, Bian Y, Luo B, Hu M, Xu B, Ding R. Costate-Supplement ADP for Model-Free Optimal Control of Discrete-Time Nonlinear Systems. IEEE Trans Neural Netw Learn Syst 2022; PP:45-59. [PMID: 35544498] [DOI: 10.1109/tnnls.2022.3172126]
Abstract
In this article, an adaptive dynamic programming (ADP) scheme utilizing a costate function is proposed for the optimal control of unknown discrete-time nonlinear systems. The state-action data are obtained by interacting with the environment under the iterative scheme without any model information. In contrast with the traditional ADP scheme, the collected data in the proposed algorithm are generated with different policies, which improves data utilization in the learning process. In order to approximate the cost function more accurately and to achieve a better policy improvement direction in the case of insufficient data, a separate costate network is introduced to approximate the costate function under the actor-critic framework, and the costate is utilized as supplementary information in estimating the cost function. Furthermore, convergence properties of the proposed algorithm are analyzed to demonstrate that, under a mild assumption, the costate function plays a positive role in the convergence of the cost function through the alternate iteration of the costate and cost functions. The uniformly ultimately bounded (UUB) property of all the variables is proven by using the Lyapunov approach. Finally, two numerical examples are presented to demonstrate the effectiveness and computational efficiency of the proposed method.
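Why costate samples help under insufficient data can be shown with a tiny regression sketch (my illustration, not the paper's networks): fitting a two-weight critic from a single state sample is under-determined, but the costate (the value gradient) at the same state supplies a second independent equation. The parameterization and numbers are made up:

```python
import numpy as np

# Critic parameterization V(x) = w1*x^2 + w2*x with unknown weights.
w_true = np.array([1.5, -0.7])
x0 = 2.0
v0 = w_true @ np.array([x0 ** 2, x0])       # cost (critic) sample V(x0)
lam0 = w_true @ np.array([2 * x0, 1.0])     # costate sample dV/dx at x0

# One value row alone is rank-1; adding the costate row makes the
# regression square and invertible, pinning down both weights.
A = np.array([[x0 ** 2, x0],
              [2 * x0, 1.0]])
w_hat = np.linalg.solve(A, np.array([v0, lam0]))
```

With only the value row, any weights on the line 4*w1 + 2*w2 = v0 would fit; the costate row removes that ambiguity, which mirrors the "supplementary information" role described above.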