1
Ma YS, Xu Y, Sun J, Dou LH. Data-driven optimal cooperative tracking control for heterogeneous multi-agent systems. ISA Transactions 2024:1-9. PMID: 39266336. DOI: 10.1016/j.isatra.2024.08.026. Received: 01/12/2024; Revised: 05/20/2024; Accepted: 08/23/2024.
Abstract
This paper presents a novel hierarchical control scheme for the data-driven optimal cooperative tracking control problem of heterogeneous multi-agent systems. Since the followers cannot communicate with the leader directly, a prescribed-time fully distributed observer is devised so that each follower can estimate the leader's state. A data-driven decentralized controller is then designed to ensure that each follower's output tracks the leader's output. Compared with existing results, the designed distributed observer has two advantages: the prescribed convergence time is completely predetermined by the designer, and the observer gain is designed independently of global topology information. In addition, the designed decentralized controller requires neither the follower's system model nor a known initial stabilizing control policy. Finally, simulation results demonstrate the advantages of the proposed method.
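The prescribed-time design is this paper's contribution, but the basic mechanism of a distributed leader-state observer can be sketched with a conventional asymptotic version: each follower runs a copy of the leader's dynamics and corrects it with neighbor information, so the leader's state propagates through the graph even to followers with no direct leader link. The leader matrix S, the line-graph topology, and the coupling gain mu below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Leader dynamics: a slow discrete-time rotation, so the leader state
# stays bounded while the observers chase it.
th = 0.05
S = np.array([[np.cos(th), np.sin(th)], [-np.sin(th), np.cos(th)]])

n_followers, mu = 4, 0.5
rng = np.random.default_rng(3)
eta = rng.normal(size=(n_followers, 2))   # each follower's leader-state estimate
x0 = np.array([1.0, 0.0])                 # true leader state

# Line graph: only follower 0 hears the leader; follower i relays from i-1.
for _ in range(2000):
    new = np.empty_like(eta)
    for i in range(n_followers):
        neighbor = x0 if i == 0 else eta[i - 1]
        new[i] = S @ (eta[i] + mu * (neighbor - eta[i]))
    eta = new
    x0 = S @ x0
```

The estimation error obeys a cascade of contractions with factor (1 - mu), so every follower's estimate converges to the leader's state, only asymptotically rather than in a prescribed time.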
Affiliation(s)
- Yong-Sheng Ma
- National Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China.
- Yong Xu
- National Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China.
- Jian Sun
- National Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China.
- Li-Hua Dou
- National Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China.
2
Jiang X, Huang M, Shi H, Wang X, Zhang Y. Off-policy two-dimensional reinforcement learning for optimal tracking control of batch processes with network-induced dropout and disturbances. ISA Transactions 2024; 144:228-244. PMID: 38030447. DOI: 10.1016/j.isatra.2023.11.011. Received: 08/28/2023; Revised: 10/07/2023; Accepted: 11/03/2023.
Abstract
In this paper, a new off-policy two-dimensional (2D) reinforcement learning approach is proposed for the optimal tracking control (OTC) problem of batch processes with network-induced dropout and disturbances. A dropout 2D augmented Smith predictor is first devised to estimate the current extended state from past data along both the time and batch directions. The dropout 2D value function and Q-function are then defined, and their relation is analyzed to characterize optimal performance. On this basis, the dropout 2D Bellman equation is derived from the Q-function. To address the dropout 2D OTC problem of batch processes, two algorithms are presented: an off-line 2D policy iteration algorithm and an off-policy 2D Q-learning algorithm. The latter uses only the input and the estimated state, without knowledge of the underlying system. Analyses of the unbiasedness of the solutions and of convergence are given separately. The effectiveness of the proposed methodologies is finally validated on a simulated case of a filling process.
Affiliation(s)
- Xueying Jiang
- College of Information Science and Engineering, Northeastern University, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, China
- Min Huang
- College of Information Science and Engineering, Northeastern University, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, China.
- Huiyuan Shi
- State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, China; School of Information and Control Engineering, Liaoning Petrochemical University, China
- Xingwei Wang
- College of Computer Science and Engineering, Northeastern University, China
- Yanfeng Zhang
- College of Computer Science and Engineering, Northeastern University, China
3
Xue W, Lian B, Fan J, Kolaric P, Chai T, Lewis FL. Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:2386-2399. PMID: 34520364. DOI: 10.1109/tnnls.2021.3106635.
Abstract
In inverse reinforcement learning (RL), there are two agents. An expert target agent has a performance cost function and exhibits control and state behaviors to a learner. The learner agent does not know the expert's performance cost function but seeks to reconstruct it by observing the expert's behaviors, and tries to imitate those behaviors optimally with its own responses. In this article, we formulate an imitation problem in which the optimal performance intent of a discrete-time (DT) expert target agent is unknown to a DT learner agent. Using only the observed expert behavior trajectory, the learner seeks a cost function that yields the same optimal feedback gain as the expert's and thus imitates the expert's optimal response. We develop an inverse RL approach with a new scheme to solve this behavior imitation problem. The approach consists of a cost-function update based on an extension of RL policy iteration and inverse optimal control, and a control-policy update based on optimal control. Under this scheme, we then develop an inverse reinforcement Q-learning algorithm, an extension of RL Q-learning that requires no knowledge of the agent dynamics. Proofs of stability, convergence, and optimality are given, and a key property concerning the nonuniqueness of the solution is shown. Finally, simulation experiments demonstrate the effectiveness of the new approach.
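The nonuniqueness noted above is easy to see in the simplest linear-quadratic case: scaling the state and control weights together leaves the optimal gain unchanged, so the learner can at best recover a cost up to scale. Below is a minimal scalar sketch of the cost-reconstruction step; it is model-based, unlike the paper's model-free Q-learning, and the system (a, b), the fixed control weight r, and the bisection search are all illustrative assumptions.

```python
import numpy as np

def lqr_gain(a, b, q, r, iters=500):
    """Solve the scalar discrete-time Riccati equation by fixed-point
    iteration and return the optimal feedback gain k (u = -k x)."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def inverse_lqr(a, b, k_expert, r=1.0, lo=1e-6, hi=1e6, tol=1e-10):
    """Recover a state weight q (r fixed to resolve the scale ambiguity)
    whose optimal gain reproduces the expert's; uses that the gain is
    monotone increasing in q, so bisection applies."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lqr_gain(a, b, mid, r) < k_expert:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

a, b = 0.9, 1.0
q_true = 2.0
k_expert = lqr_gain(a, b, q_true, 1.0)   # "observed" expert behavior
q_hat = inverse_lqr(a, b, k_expert)      # learner's reconstructed weight
```

The learner never sees q_true, only the expert's gain, yet recovers a cost consistent with it; any positive rescaling of (q, r) would serve equally well, which is the nonuniqueness the paper characterizes.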
4
Rizvi SAA, Pertzborn AJ, Lin Z. Reinforcement Learning Based Optimal Tracking Control Under Unmeasurable Disturbances With Application to HVAC Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:7523-7533. PMID: 34129505. PMCID: PMC9703879. DOI: 10.1109/tnnls.2021.3085358.
Abstract
This paper presents the design of an optimal controller for tracking problems subject to unmeasurable disturbances and unknown system dynamics using reinforcement learning (RL). Many existing RL control methods account for the disturbance by directly measuring it and manipulating it for exploration during learning, thereby preventing any disturbance-induced bias in the control estimates. In most practical scenarios, however, the disturbance is neither measurable nor manipulable. The main contribution of this article is a combination of a bias-compensation mechanism and integral action within the Q-learning framework that removes the need to measure or manipulate the disturbance while still preventing disturbance-induced bias in the optimal control estimates. A bias-compensated Q-learning scheme is presented that learns the disturbance-induced bias terms separately from the optimal control parameters and ensures convergence of the control parameters to the optimal solution even in the presence of unmeasurable disturbances. Both state-feedback and output-feedback algorithms are developed, based on policy iteration (PI) and value iteration (VI), that guarantee convergence of the tracking error to zero. The feasibility of the design is validated on a practical optimal control application: a heating, ventilating, and air conditioning (HVAC) zone controller.
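The core of the bias-compensation idea can be illustrated in identification terms: a constant unmeasured disturbance shows up as an offset in the data, and estimating that offset as a separate parameter keeps the remaining estimates unbiased. The scalar system and all numbers below are illustrative assumptions, not the paper's Q-learning recursion.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, d_true = 0.8, 0.5, 0.3   # d is the unmeasured constant disturbance

# Collect data from x+ = a x + b u + d under exploratory inputs.
N = 200
x = np.zeros(N + 1)
u = rng.uniform(-1, 1, N)
for k in range(N):
    x[k + 1] = a_true * x[k] + b_true * u[k] + d_true

# Bias-compensated regression: the constant column absorbs the
# disturbance-induced offset, so (a, b) are estimated without bias.
Phi = np.column_stack([x[:N], u, np.ones(N)])
theta, *_ = np.linalg.lstsq(Phi, x[1:], rcond=None)
a_hat, b_hat, d_hat = theta
```

Dropping the constant column would fold the disturbance into the estimates of a and b; carrying it as a separate parameter is the regression analogue of learning the bias terms separately from the control parameters.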
5
Dong S, Liu L, Feng G, Liu M, Wu ZG. Quantized Fuzzy Cooperative Output Regulation for Heterogeneous Nonlinear Multiagent Systems With Directed Fixed/Switching Topologies. IEEE Transactions on Cybernetics 2022; 52:12393-12402. PMID: 34166214. DOI: 10.1109/tcyb.2021.3082164.
Abstract
This article investigates the cooperative output regulation problem for heterogeneous nonlinear multiagent systems subject to disturbances and quantization. The agent dynamics are modeled by the well-known Takagi-Sugeno fuzzy systems. Distributed reference generators are first devised to estimate the state of the exosystem under directed fixed and switching communication graphs, respectively. Distributed fuzzy cooperative controllers are then designed for the individual agents. Via the Lyapunov technique, sufficient conditions are obtained that guarantee output synchronization of the resulting closed-loop multiagent system. Finally, the viability of the proposed design approaches is demonstrated by an example of multiple single-link robot arms.
6
Xue W, Kolaric P, Fan J, Lian B, Chai T, Lewis FL. Inverse Reinforcement Learning in Tracking Control Based on Inverse Optimal Control. IEEE Transactions on Cybernetics 2022; 52:10570-10581. PMID: 33877993. DOI: 10.1109/tcyb.2021.3062856.
Abstract
This article provides a novel inverse reinforcement learning (RL) algorithm that learns an unknown performance objective function for tracking control. The algorithm combines three steps: 1) an optimal control update; 2) a gradient-descent correction step; and 3) an inverse optimal control (IOC) update. The new algorithm clarifies the relation between inverse RL and IOC. It is shown that the reward weight of an unknown performance objective that generates a target control policy may not be unique, and we characterize the set of all weights that generate the same target control policy. We develop a model-based algorithm and, further, two model-free algorithms for systems with unknown model information. Finally, simulation experiments demonstrate the effectiveness of the proposed algorithms.
7
Ma X, Qian F, Zhang S, Wu L, Liu L. Adaptive dual control with online outlier detection for uncertain systems. ISA Transactions 2022; 129:157-168. PMID: 35131093. DOI: 10.1016/j.isatra.2022.01.021. Received: 08/12/2020; Revised: 01/18/2022; Accepted: 01/18/2022.
Abstract
This paper proposes an adaptive dual control scheme with online outlier detection that is robust to the occurrence of outliers in uncertain systems. Outliers occasionally appear in the system process noise and observation noise, and they can cause poor parameter estimation and degraded control performance. For this reason, we devise an online outlier detection mechanism that filters the outliers so as to enhance the parameter estimation of uncertain systems. The mechanism makes outlier-detection decisions using predicted regions in which newly arriving data are expected to lie, and these predicted regions are updated in real time from the historical data. The detection mechanism is integrated into the design of the adaptive dual control, which is derived based on the bicriterial method. Compared with classical dual control, which considers only uncertainty in the input and output data streams, we are the first to include uncontrollable excitations in the structure of dual control to fit practical scenarios; this inclusion also provides broader coverage of the outliers to be detected. The improved performance of the proposed approach is verified on a mathematical model through single-run and Monte Carlo simulations under different conditions, and we also evaluate our method on the control of a fermentation sterilization process for more convincing results.
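A minimal running-statistics version of the predicted-region idea is sketched below; the paper derives its regions within the dual-control formulation, whereas the Gaussian data stream, the 3-sigma threshold, and the warm-up length here are assumptions for illustration.

```python
import numpy as np

def detect_outliers(data, k=3.0, warmup=10):
    """Flag points falling outside a running mean +/- k*std predicted
    region; the region is refined online using only accepted (inlier)
    data, so detected outliers do not corrupt the statistics."""
    mean, m2, n = 0.0, 0.0, 0   # Welford running mean and sum of squares
    flags = []
    for y in data:
        if n >= warmup:
            std = (m2 / (n - 1)) ** 0.5
            is_outlier = abs(y - mean) > k * std
        else:
            is_outlier = False   # region not trustworthy yet
        flags.append(is_outlier)
        if not is_outlier:       # only inliers update the region
            n += 1
            delta = y - mean
            mean += delta / n
            m2 += delta * (y - mean)
    return flags

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, 100)
y[50] += 12.0                    # inject a gross outlier
flags = detect_outliers(y)
```

Excluding flagged points from the update is what keeps the parameter (here, mean and variance) estimates robust, mirroring how the paper filters outliers before they reach the parameter estimator.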
Affiliation(s)
- Xuehui Ma
- School of Automation and Information Engineering, Xi'an University of Technology, China.
- Fucai Qian
- School of Automation and Information Engineering, Xi'an University of Technology, China.
- Shiliang Zhang
- School of Computer Science and Engineering, Chalmers University of Technology, Sweden.
- Li Wu
- School of Automation and Information Engineering, Xi'an University of Technology, China.
- Lei Liu
- School of Automation and Information Engineering, Xi'an University of Technology, China.
8
Jiang Y, Zhang K, Wu J, Zhang C, Xue W, Chai T, Lewis FL. H∞-Based Minimal Energy Adaptive Control With Preset Convergence Rate. IEEE Transactions on Cybernetics 2022; 52:10078-10088. PMID: 33750726. DOI: 10.1109/tcyb.2021.3061894.
Abstract
This work studies the H∞-based minimal energy control with preset convergence rate (PCR) problem for a class of linear time-invariant continuous-time systems with matched external disturbance. The goal is to design an optimal controller so that the energy of the control input satisfies a predetermined requirement while asymptotic stability of the closed-loop system with PCR is ensured simultaneously. To this end, a modified game algebraic Riccati equation (MGARE) is proposed; it differs from the game algebraic Riccati equation of the traditional H∞ control problem because the state cost is absent. The unique positive-definite solution of the MGARE is therefore analyzed theoretically, together with the conditions for its existence. Based on this formulation, a novel approach is also proposed to solve the actuator magnitude saturation problem when the system dynamics are exactly known. To relax the requirement of knowing the system dynamics, a model-free policy iteration approach is proposed to compute the solution of this problem. Finally, the effectiveness of the proposed approaches is verified through two simulation examples.
9
Ma X, Qian F, Zhang S, Wu L. Adaptive quantile control for stochastic system. ISA Transactions 2022; 123:110-121. PMID: 34090667. DOI: 10.1016/j.isatra.2021.05.032. Received: 07/08/2020; Revised: 05/20/2021; Accepted: 05/21/2021.
Abstract
Adaptive control has been successfully developed for deriving control laws for stochastic systems with unknown parameters, and generating a reasonable control law depends on accurate parameter estimation. Recursive least squares is widely used to estimate unknown parameters for stochastic systems; however, this approach only fits systems with Gaussian noise. In this paper, adaptive quantile control is proposed, for the first time, to cover the case where the stochastic system noise follows a sharp-peaked, heavy-tailed distribution rather than a Gaussian distribution. In the proposed approach, the system noise is modeled by the asymmetric Laplace distribution, and the unknown parameter is estimated online by our Bayesian quantile sum estimator, which combines recursive quantile estimates weighted by Bayesian posterior probabilities. With the parameter estimated in real time, the adaptive quantile control law is constructed based on the certainty equivalence principle. The proposed estimator and controller are computationally inexpensive and can easily be implemented on a microcontroller unit to fit practical applications. Comparisons with several dominant controllers for the unknown stochastic system verify the effectiveness of the adaptive quantile control.
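The recursive quantile estimation underlying this approach can be sketched as a stochastic-approximation update of the check-loss minimizer associated with the asymmetric Laplace model. This toy version estimates a distribution quantile from an i.i.d. stream; the step-size schedule and the Laplace test data are assumptions, and the paper's estimator additionally applies Bayesian weighting across quantile levels.

```python
import numpy as np

def recursive_quantile(data, tau, step=1.0):
    """Stochastic-approximation estimate of the tau-quantile:
    q <- q + gamma_k * (tau - 1{y < q}), the recursive counterpart of
    minimizing the asymmetric check loss behind the ALD noise model."""
    q = 0.0
    for k, y in enumerate(data, start=1):
        gamma = step / k ** 0.7          # diminishing step size
        q += gamma * (tau - (1.0 if y < q else 0.0))
    return q

rng = np.random.default_rng(2)
# Heavy-tailed (Laplace) noise stream centered at 1.0, scale 0.5.
y = rng.laplace(loc=1.0, scale=0.5, size=20000)
med = recursive_quantile(y, tau=0.5)     # estimate of the true median, 1.0
```

Unlike recursive least squares, this update reacts only to the sign of the residual, which is why quantile-based estimation remains well behaved when the noise has heavy tails.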
Affiliation(s)
- Xuehui Ma
- School of Automation and Information Engineering, Xi'an University of Technology, China.
- Fucai Qian
- School of Automation and Information Engineering, Xi'an University of Technology, China.
- Shiliang Zhang
- Department of Computer Science and Engineering, Chalmers University of Technology, Sweden.
- Li Wu
- School of Automation and Information Engineering, Xi'an University of Technology, China.