1. Peng G, Chen CLP, Yang C. Robust Admittance Control of Optimized Robot-Environment Interaction Using Reference Adaptation. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5804-5815. [PMID: 34982696] [DOI: 10.1109/tnnls.2021.3131261]
Abstract
In this article, a robust control scheme is proposed for robots to achieve optimal performance when interacting with external forces from the environment. The environmental dynamics are modeled as a linear system, and the interaction performance is evaluated by a cost function composed of trajectory errors and force regulation. Based on admittance control, a reference adaptation method is used to minimize the cost function and achieve the optimal interaction performance. To make the trajectory tracking controller robust to unknown disturbances in the internal system dynamics, an auxiliary system is defined and an approximate optimal controller is designed. Experiments on the Baxter robot are conducted to verify the effectiveness of the proposed method.
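A minimal sketch of the admittance relation this abstract builds on, assuming a standard second-order linear admittance model with illustrative gains; the paper's reference-adaptation law and exact parameters are not reproduced here.

```python
# Hedged sketch: a discrete-time admittance filter of the generic form
# M*xdd + D*xd + K*(x - x_ref) = f_ext. Gains M, D, K and the time step
# are illustrative assumptions, not the paper's values.

def admittance_step(x, xdot, x_ref, f_ext, M=1.0, D=8.0, K=16.0, dt=0.01):
    """One semi-implicit Euler step of the admittance dynamics."""
    xdd = (f_ext - D * xdot - K * (x - x_ref)) / M
    xdot_next = xdot + dt * xdd
    x_next = x + dt * xdot_next
    return x_next, xdot_next

# With zero external force, the compliant trajectory settles at the reference.
x, xd = 0.0, 0.0
for _ in range(5000):
    x, xd = admittance_step(x, xd, x_ref=0.5, f_ext=0.0)
print(round(x, 3))  # 0.5
```

Under a sustained external force, the same filter would settle at an offset from the reference, which is the compliance that admittance control trades against tracking accuracy.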
2. Wei Y, Yu X, Feng Y, Chen Q, Ou L, Zhou L. Event-triggered adaptive optimal tracking control for nonlinear stochastic systems with dynamic state constraints. ISA Transactions 2023; 139:60-70. [PMID: 37076372] [DOI: 10.1016/j.isatra.2023.04.009]
Abstract
This paper investigates event-triggered adaptive optimal tracking control for uncertain nonlinear systems with stochastic disturbances and dynamic state constraints. To handle the dynamic state constraints, a novel unified tangent-type nonlinear mapping function is proposed. A neural network (NN)-based identifier is designed to cope with the stochastic disturbances. By combining adaptive dynamic programming (ADP) with an identifier-actor-critic architecture and an event-triggering mechanism, an adaptive optimized event-triggered control (ETC) approach for nonlinear stochastic systems is proposed for the first time. It is proven that the designed approach guarantees the robustness of the stochastic systems, ensures that the NN adaptive estimation errors are semi-globally uniformly ultimately bounded in the mean square, and avoids Zeno behavior. Simulations are offered to illustrate the effectiveness of the proposed control approach.
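The event-triggering idea in this abstract can be illustrated with the basic static trigger (update the control only when the measurement error exceeds a threshold). This is a generic sketch, not the paper's adaptive/optimized trigger; the scalar plant, gain, and threshold are assumptions.

```python
# Hedged sketch: static event-triggered control of an unstable scalar
# plant xdot = x + u. The controller only resamples the state when the
# measurement error |x - x_event| exceeds a fixed threshold.

def simulate(threshold=0.05, dt=0.01, steps=600):
    x, x_event, triggers = 1.0, 1.0, 0
    for _ in range(steps):
        if abs(x - x_event) > threshold:   # event generator fires
            x_event = x                    # sample and hold the state
            triggers += 1
        u = -2.0 * x_event                 # control uses last sampled state
        x += dt * (x + u)                  # plant integration step
        # a real design also enforces a minimum inter-event time (no Zeno)
    return x, triggers

x_final, n = simulate()
print(abs(x_final) < 0.2, n < 600)  # stabilized with far fewer updates than steps
```

The point of the trigger is the second output: the state stays bounded while the controller updates only a fraction of the time steps.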
Affiliation(s)
- Yan Wei
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 30032, China
- Xinyi Yu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 30032, China
- Yu Feng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 30032, China
- Qiang Chen
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 30032, China
- Linlin Ou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 30032, China
- Libo Zhou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 30032, China
3. Li G, Ma X, Li Z, Guo P, Li Y. Kinematic coupling-based trajectory planning for rotary crane system with double-pendulum effects and output constraints. Journal of Field Robotics 2022. [DOI: 10.1002/rob.22130]
Affiliation(s)
- Gang Li
- School of Control Science and Engineering, Shandong University, Jinan, China
- Xin Ma
- School of Control Science and Engineering, Shandong University, Jinan, China
- Zhi Li
- School of Control Science and Engineering, Shandong University, Jinan, China
- Peijun Guo
- Research and Development Department, Shandong Offshore Research Institute Co., Ltd., Qingdao, China
- Yibin Li
- School of Control Science and Engineering, Shandong University, Jinan, China
4. Safe Reinforcement Learning for Affine Nonlinear Systems with State Constraints and Input Saturation Using Control Barrier Functions. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.006]
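The control-barrier-function idea named in this title can be sketched in its simplest scalar form: the CBF condition becomes an explicit bound on the input, and input saturation is a clip. This illustrates the general CBF mechanism only, not the paper's RL-based design; the integrator plant, barrier, and gains are assumptions.

```python
# Hedged sketch: a scalar CBF safety filter with input saturation for
# the plant xdot = u and safe set h(x) = 1 - x >= 0. The CBF condition
# hdot + alpha*h >= 0 reduces to u <= alpha * (1 - x).

def safe_input(x, u_nom, alpha=5.0, u_max=2.0):
    u = min(u_nom, alpha * (1.0 - x))      # CBF condition as an input bound
    return max(-u_max, min(u_max, u))      # input saturation

x, dt = 0.0, 0.01
for _ in range(1000):
    u = safe_input(x, u_nom=2.0)           # nominal input pushes past x = 1
    x += dt * u
print(x <= 1.0 + 1e-9)  # True: the filtered input never crosses the barrier
```

In higher dimensions the same condition is typically enforced as a quadratic program that minimally modifies the nominal (here, learned) control.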
5. Peng G, Chen CLP, Yang C. Neural Networks Enhanced Optimal Admittance Control of Robot-Environment Interaction Using Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4551-4561. [PMID: 33651696] [DOI: 10.1109/tnnls.2021.3057958]
Abstract
In this paper, an adaptive admittance control scheme is developed for robots to interact with time-varying environments. Admittance control is adopted to achieve a compliant physical robot-environment interaction, and the uncertain environment with time-varying dynamics is defined as a linear system. A critic learning method is used to obtain the desired admittance parameters based on the cost function composed of interaction force and trajectory tracking without the knowledge of the environmental dynamics. To deal with dynamic uncertainties in the control system, a neural-network (NN)-based adaptive controller with a dynamic learning framework is developed to guarantee the trajectory tracking performance. Experiments are conducted and the results have verified the effectiveness of the proposed method.
6. Yao S, Liu X, Zhang Y, Cui Z. An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning. Mathematical Biosciences and Engineering 2022; 19:9258-9290. [PMID: 35942758] [DOI: 10.3934/mbe.2022430]
Abstract
In recent years, dynamic programming and reinforcement learning theory have been widely used to solve nonlinear control systems (NCS). Much progress has been made on network-model construction and system stability analysis, but little research has addressed building control strategies around the detailed requirements of the control process. Motivated by this gap, this paper proposes a detail-reward mechanism (DRM): a reward function composed of individual detail evaluation functions that replaces the utility function in the Hamilton-Jacobi-Bellman (HJB) equation. The method is then applied across a range of deep reinforcement learning algorithms to solve optimization problems in NCS. After a mathematical description of the relevant characteristics of NCS, the stability of the iterative control law is proved via a Lyapunov function. Taking the inverted pendulum system as the experimental object, a dynamic environment is designed and the reward function is established using the DRM. Finally, three deep reinforcement learning models are built in this environment, based on Deep Q-Networks, policy gradient, and actor-critic, and the effects of different reward functions on experimental accuracy are compared. The results show that, in NCS, replacing the utility function in the HJB equation with the DRM better reflects the designer's detailed requirements for the whole control process. By observing the characteristics of the system, designing the reward function, and selecting an appropriate deep reinforcement learning model, the optimization problem of an NCS can be solved.
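A reward composed of individual detail evaluation functions, as the DRM describes, can be sketched for the inverted pendulum. The particular detail functions and weights below are illustrative assumptions, not the paper's.

```python
# Hedged sketch: a detail-reward composed of separate evaluation
# functions for angle, angular velocity, and control effort.
import math

def detail_angle(theta):          # reward keeping the pole upright
    return math.cos(theta)

def detail_velocity(theta_dot):   # penalize fast swings
    return -0.1 * theta_dot ** 2

def detail_effort(u):             # penalize large control inputs
    return -0.01 * u ** 2

def detail_reward(theta, theta_dot, u, weights=(1.0, 1.0, 1.0)):
    details = (detail_angle(theta), detail_velocity(theta_dot), detail_effort(u))
    return sum(w * d for w, d in zip(weights, details))

# Upright, still, and with no input earns the maximum reward of 1.0.
print(detail_reward(0.0, 0.0, 0.0))  # 1.0
```

Each weight exposes one "detail" of the control process to the designer, which is the mechanism's stated advantage over a single monolithic utility function.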
Affiliation(s)
- Shixuan Yao
- School of Software Engineering, Dalian University of Foreign Languages, Dalian 116044, China
- Xiaochen Liu
- School of Mechanical Engineering, Dalian Jiaotong University, Dalian 116028, China
- Yinghui Zhang
- School of Mechanical Engineering, Dalian Jiaotong University, Dalian 116028, China
- Ze Cui
- School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China
7. Toward reliable designs of data-driven reinforcement learning tracking control for Euler–Lagrange systems. Neural Networks 2022; 153:564-575. [DOI: 10.1016/j.neunet.2022.05.017]
8. Duan J, Liu Z, Li SE, Sun Q, Jia Z, Cheng B. Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.04.134]
9. Chen Q, Jin Y, Song Y. Fault-tolerant adaptive tracking control of Euler-Lagrange systems – An echo state network approach driven by reinforcement learning. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.10.083]
10. Wei Q, Han L, Zhang T. Spiking Adaptive Dynamic Programming Based on Poisson Process for Discrete-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1846-1856. [PMID: 34143743] [DOI: 10.1109/tnnls.2021.3085781]
Abstract
In this article, a new iterative spiking adaptive dynamic programming (SADP) method based on the Poisson process is developed to solve optimal impulsive control problems. For a fixed time interval, combining the Poisson process with maximum likelihood estimation (MLE), the three-tuple of state, spiking interval, and Poisson-distribution probability can be computed, from which the iterative value functions and iterative control laws are obtained. A property analysis shows that the value functions converge to the optimal performance index function as the iteration index increases from zero to infinity. Finally, two simulation examples are given to verify the effectiveness of the developed algorithm.
11. Ouyang Y, Dong L, Sun C. Critic Learning-Based Control for Robotic Manipulators With Prescribed Constraints. IEEE Transactions on Cybernetics 2022; 52:2274-2283. [PMID: 32649288] [DOI: 10.1109/tcyb.2020.3003550]
Abstract
In this article, the optimal control problem for robotic manipulators (RMs) with prescribed constraints is addressed. Considering the environmental conditions and requirements of practical applications, prescribed constraints are imposed on the system states to guarantee the control performance and normal operation of the robotic system. Accordingly, an error transformation function is adopted to cope with the prescribed constraints and generate an equivalent unconstrained error for the convenience of the intelligent control design. In order to improve the learning ability and optimize the control performance, critic learning (CL) is introduced to the control design of the constrained RM based on the transformed equivalent unconstrained system. In addition, the stability analysis is given to illustrate the feasibility of the proposed CL-based control. Finally, simulations are conducted on a two-degree-of-freedom (DOF)-constrained RM to further validate the effectiveness of the proposed controller.
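The error transformation this abstract mentions is commonly realized by mapping a tracking error constrained to |e| < rho(t) into an unconstrained variable. The atanh-based map and the bound parameters below are one standard choice, assumed for illustration; the paper's exact transformation function may differ.

```python
# Hedged sketch: a prescribed-performance error transformation. The
# constrained error |e| < rho(t) is mapped to an unconstrained variable
# that blows up as the error approaches the shrinking bound.
import math

def rho(t, rho0=1.0, rho_inf=0.1, k=1.0):
    """Exponentially shrinking performance bound (assumed parameters)."""
    return (rho0 - rho_inf) * math.exp(-k * t) + rho_inf

def transform(e, t):
    """Equivalent unconstrained error for controller design."""
    return math.atanh(e / rho(t))

print(transform(0.0, 0.0))                            # 0.0: zero error maps to zero
print(transform(0.0999, 5.0) > transform(0.05, 5.0))  # True: grows near the bound
```

Keeping the transformed error bounded then automatically keeps the original error inside the prescribed envelope, which is what lets an unconstrained design tool (here, critic learning) be applied.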
12. Yang Y, Fan X, Xu C, Wu J, Sun B. State consensus cooperative control for a class of nonlinear multi-agent systems with output constraints via ADP approach. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.046]
13. Yang Y, Vamvoudakis KG, Modares H, Yin Y, Wunsch DC. Safe Intermittent Reinforcement Learning With Static and Dynamic Event Generators. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5441-5455. [PMID: 32054590] [DOI: 10.1109/tnnls.2020.2967871]
Abstract
In this article, we present an intermittent framework for safe reinforcement learning (RL) algorithms. First, we develop a barrier function-based system transformation to impose state constraints while converting the original problem to an unconstrained optimization problem. Second, based on optimal derived policies, two types of intermittent feedback RL algorithms are presented, namely, a static and a dynamic one. We finally leverage an actor/critic structure to solve the problem online while guaranteeing optimality, stability, and safety. Simulation results show the efficacy of the proposed approach.
14. Wei Q, Wang L, Liu Y, Polycarpou MM. Optimal Elevator Group Control via Deep Asynchronous Actor-Critic Learning. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5245-5256. [PMID: 32071000] [DOI: 10.1109/tnnls.2020.2965208]
Abstract
In this article, a new deep reinforcement learning (RL) method, called asynchronous advantage actor-critic (A3C) method, is developed to solve the optimal control problem of elevator group control systems (EGCSs). The main contribution of this article is that the optimal control law of EGCSs is designed via a new deep RL method, such that the elevator system sends passengers to the desired destination floors as soon as possible. Deep convolutional and recurrent neural networks, which can update themselves during applications, are designed to dispatch elevators. Then, the structure of the A3C method is developed, and the training phase for the learning optimal law is discussed. Finally, simulation results illustrate that the developed method effectively reduces the average waiting time in a complex building environment. Comparisons with traditional algorithms further verify the effectiveness of the developed method.
15. Xu W, Liu X, Wang H, Zhou Y. Event-based optimal output-feedback control of nonlinear discrete-time systems. Information Sciences 2020. [DOI: 10.1016/j.ins.2020.05.098]
16. Lan X, Liu Y, Zhao Z. Cooperative control for swarming systems based on reinforcement learning in unknown dynamic environment. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.038]
17. On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning. Energies 2020. [DOI: 10.3390/en13195069]
Abstract
In this paper, a tracking control approach is developed based on an adaptive reinforcement learning algorithm with a bounded cost function for perturbed nonlinear switched systems, which provide a useful framework for modelling power converters such as DC–DC and multi-level converters. An optimal control method is derived for the nominal systems to solve the tracking control problem, which reduces to solving a Hamilton–Jacobi–Bellman (HJB) equation. It is shown that the optimal controller obtained by solving the HJB equation can stabilize the perturbed nonlinear switched systems. To approximate the solution of the translated HJB equation, the proposed neural networks are trained to minimize the squared Bellman residual error of the critic term derived from the Hamiltonian. Theoretical analysis shows that all closed-loop system signals are uniformly ultimately bounded (UUB) and that the proposed controller converges to the optimal control law. Simulation results for two scenarios demonstrate the effectiveness of the proposed controller.
18. Zheng Z, Ruan L, Zhu M, Guo X. Reinforcement learning control for underactuated surface vessel with output error constraints and uncertainties. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.021]
19. Li J, Yang Q, Fan B, Sun Y. Robust State/Output-Feedback Control of Coaxial-Rotor MAVs Based on Adaptive NN Approach. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:3547-3557. [PMID: 31095501] [DOI: 10.1109/tnnls.2019.2911649]
Abstract
Coaxial-rotor micro-aerial vehicles (CRMAVs) have proven to be a powerful tool for small, agile manned-unmanned hybrid applications. However, their operation is usually subject to unpredictable time-varying aerodynamic disturbances and model uncertainties. In this paper, an adaptive robust controller based on a neural network (NN) approach is proposed to reject such perturbations and track both the desired position and orientation trajectories. A complete dynamic model of a CRMAV is first constructed. Assuming all system states are available, an NN-based state-feedback controller is derived through feedback linearization and Lyapunov analysis. To overcome the practical challenge that certain states are not measurable, a high-gain observer is introduced to estimate the unavailable states, and an output-feedback controller is then developed. Rigorous theoretical analysis verifies the stability of the entire closed-loop system, and extensive simulation studies validate the feasibility of the proposed scheme.
20. Luo B, Yang Y, Liu D. Adaptive Q-Learning for Data-Based Optimal Output Regulation With Experience Replay. IEEE Transactions on Cybernetics 2018; 48:3337-3348. [PMID: 29994038] [DOI: 10.1109/tcyb.2018.2821369]
Abstract
In this paper, the data-based optimal output regulation problem of discrete-time systems is investigated. An off-policy adaptive Q-learning (QL) method is developed using real system data, without requiring knowledge of the system dynamics or a mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter in the policy evaluation is used to achieve a tradeoff between the current and future Q-functions. The convergence of the adaptive QL algorithm is proved and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, an actor-critic neural network (NN) structure is developed; a least-squares scheme and the batch gradient descent method update the critic and actor NN weights, respectively. The experience replay technique is employed in the learning process, which leads to a simple and convenient implementation of the adaptive QL method. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
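The experience-replay idea this abstract employs can be shown in its simplest form: tabular Q-learning on a toy two-state chain, with stored transitions replayed off-policy. This illustrates replay only; the paper's adaptive QL with actor-critic NNs is not reproduced, and the toy MDP is an assumption.

```python
# Hedged sketch: tabular Q-learning with experience replay on a 2-state
# chain. In state 1, action 1 stays put and pays reward 1; everything
# else pays 0.
import random

random.seed(0)
n_states, n_actions, gamma, lr = 2, 2, 0.9, 0.5
Q = [[0.0] * n_actions for _ in range(n_states)]
replay = []

def step(s, a):
    s2 = min(s + a, n_states - 1)
    r = 1.0 if (s == 1 and a == 1) else 0.0
    return r, s2

for episode in range(200):
    s = 0
    for _ in range(5):
        a = random.randrange(n_actions)            # exploratory behavior policy
        r, s2 = step(s, a)
        replay.append((s, a, r, s2))               # store the transition
        batch = random.sample(replay, min(8, len(replay)))
        for (bs, ba, br, bs2) in batch:            # replayed off-policy updates
            target = br + gamma * max(Q[bs2])
            Q[bs][ba] += lr * (target - Q[bs][ba])
        s = s2

print(Q[1][1] > Q[1][0])  # True: the rewarding action dominates in state 1
```

Replaying stored transitions lets every interaction be reused many times, which is what makes the data-based setting practical when real system data are expensive.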