1. Wallace BA, Si J. Continuous-Time Reinforcement Learning Control: A Review of Theoretical Results, Insights on Performance, and Needs for New Designs. IEEE Transactions on Neural Networks and Learning Systems 2024;35:10199-10219. PMID: 37027747. DOI: 10.1109/tnnls.2023.3245980.
Abstract
This exposition discusses continuous-time reinforcement learning (CT-RL) for the control of affine nonlinear systems. We review four seminal methods that are the centerpieces of the most recent results on CT-RL control. We survey the theoretical results of the four methods, highlighting their fundamental importance and successes through discussions of problem formulation, key assumptions, algorithm procedures, and theoretical guarantees. We then evaluate the performance of the control designs to provide analyses and insights into the feasibility of these design methods for applications from a control designer's point of view. Through systematic evaluations, we point out where theory diverges from practical controller synthesis, and we introduce a new quantitative analytical framework to diagnose the observed discrepancies. Based on these analyses and the insights gained through quantitative evaluations, we suggest future research directions for unleashing the potential of CT-RL control algorithms to address the identified challenges.
2. Gao X, Si J, Wen Y, Li M, Huang H. Reinforcement Learning Control of Robotic Knee With Human-in-the-Loop by Flexible Policy Iteration. IEEE Transactions on Neural Networks and Learning Systems 2022;33:5873-5887. PMID: 33956634. DOI: 10.1109/tnnls.2021.3071727.
Abstract
We are motivated by the real challenges presented in a human-robot system to develop new designs that are data-efficient and carry system-level performance guarantees such as stability and optimality. Existing approximate/adaptive dynamic programming (ADP) results that consider system performance theoretically do not readily provide practically useful learning control algorithms for this problem, and reinforcement learning (RL) algorithms that address data efficiency usually lack performance guarantees for the controlled system. This study fills these important voids by introducing innovative features to the policy iteration algorithm. We introduce flexible policy iteration (FPI), which can flexibly and organically integrate experience replay and supplemental values from prior experience into the RL controller. We show system-level performance, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system. We demonstrate the effectiveness of FPI via realistic simulations of the human-robot system. Notably, the problem we face in this study may be difficult to address with design methods based on classical control theory, as it is nearly impossible to obtain a customized mathematical model of a human-robot system either online or offline. The results also indicate the great potential of RL control for solving realistic and challenging problems with high-dimensional control inputs.
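The abstract above centers on integrating experience replay into a policy-iteration-style RL controller for data efficiency. As a rough illustration of that idea only (not the paper's FPI algorithm: the chain MDP, step sizes, and replay scheme below are all invented for the sketch), a tabular Q-learning loop that reuses stored transitions might look like:

```python
import random

def q_learning_with_replay(n_states=4, episodes=300, batch=8, seed=0):
    """Tabular Q-learning on a toy chain MDP, augmented with experience replay.

    The agent starts in state 0; action 1 moves right, action 0 moves left
    (state 0 is reflecting), and reaching the last state pays reward 1 and
    ends the episode. Each real transition is stored, and a small minibatch
    of past transitions is replayed alongside every fresh update.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]   # Q[state][action]
    buffer = []                                  # stored (s, a, r, s') tuples
    alpha, gamma, eps = 0.2, 0.9, 0.4
    for _ in range(episodes):
        s = 0
        for _ in range(300):                     # cap on episode length
            if s == n_states - 1:
                break
            a = rng.randrange(2) if rng.random() < eps else (1 if Q[s][1] > Q[s][0] else 0)
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            buffer.append((s, a, r, s2))
            if len(buffer) > 500:                # keep only recent experience
                buffer.pop(0)
            # update on the fresh transition plus a replayed minibatch
            for bs, ba, br, bs2 in [(s, a, r, s2)] + rng.sample(buffer, min(batch, len(buffer))):
                target = br + gamma * max(Q[bs2])
                Q[bs][ba] += alpha * (target - Q[bs][ba])
            s = s2
    return Q
```

The learned greedy policy should move right from every non-terminal state; the point of the replay loop is that each stored transition is reused many times, which is the data-efficiency benefit FPI builds on.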
3. Overvoltage Prevention and Curtailment Reduction Using Adaptive Droop-Based Supplementary Control in Smart Inverters. Applied Sciences (Basel) 2021. DOI: 10.3390/app11177900.
Abstract
Recent developments in the renewable energy sector have seen unprecedented growth in residential photovoltaic (PV) installations. However, high PV penetration levels often lead to overvoltage problems in low-voltage (LV) distribution feeders. Smart inverter control, such as active power curtailment (APC)-based overvoltage control, can be implemented to overcome these challenges. The APC technique utilizes a constant droop-based approach that curtails power rigidly, which can lead to significant energy curtailment in LV distribution feeders. In this paper, variations of the APC technique with linear, quadratic, and exponential droops are analyzed from the point of view of energy curtailment for an LV distribution network in North America. Further, a combinatorial approach using the droop-based APC methods in conjunction with adaptive dynamic programming (ADP) as a supplementary control scheme is also proposed. The proposed approach minimizes energy curtailment in the LV distribution network by adjusting the droop gains. Simulation results show that ADP in conjunction with the exponential droop reduces energy curtailment to approximately 50% of that incurred with the standard linear droop.
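For intuition on why the droop shape matters for curtailed energy, the sketch below compares linear, quadratic, and exponential droop curves mapping per-unit feeder voltage to a curtailment fraction. The thresholds (1.03 and 1.06 p.u.) and the exponential steepness `k` are illustrative assumptions, not the parameters used in the cited paper.

```python
import math

def curtailment_fraction(v, shape="linear", v_start=1.03, v_max=1.06):
    """Fraction of available PV power to curtail at per-unit voltage v.

    Illustrative droop shapes only: v_start, v_max, and the exponential
    steepness are assumed values for the sketch.
    """
    if v <= v_start:
        return 0.0          # no overvoltage: no curtailment
    if v >= v_max:
        return 1.0          # full curtailment above the upper limit
    x = (v - v_start) / (v_max - v_start)   # normalized overvoltage in (0, 1)
    if shape == "linear":
        return x
    if shape == "quadratic":
        return x ** 2       # curtails less than linear until near v_max
    if shape == "exponential":
        k = 5.0             # assumed steepness of the exponential droop
        return (math.exp(k * x) - 1.0) / (math.exp(k) - 1.0)
    raise ValueError(f"unknown droop shape: {shape}")
```

At a moderate overvoltage of 1.045 p.u. (the midpoint), the linear droop curtails 50% of available power, the quadratic 25%, and the exponential only about 8%, which illustrates why the softer droop shapes waste less energy while still curtailing fully near the upper voltage limit.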
4. Khooban MH, Gheisarnejad M. A Novel Deep Reinforcement Learning Controller Based Type-II Fuzzy System: Frequency Regulation in Microgrids. IEEE Transactions on Emerging Topics in Computational Intelligence 2021. DOI: 10.1109/tetci.2020.2964886.
5. Wen Y, Si J, Brandt A, Gao X, Huang HH. Online Reinforcement Learning Control for the Personalization of a Robotic Knee Prosthesis. IEEE Transactions on Cybernetics 2020;50:2346-2356. PMID: 30668514. DOI: 10.1109/tcyb.2019.2890974.
Abstract
Robotic prostheses deliver greater function than passive prostheses, but they present the challenge of tuning a large number of control parameters to personalize the device for individual amputee users. This problem is not easily solved by traditional control designs or the latest robotic technology. Reinforcement learning (RL) is naturally appealing: the recent, unprecedented success of AlphaZero demonstrated RL as a feasible, large-scale problem solver. However, the prosthesis-tuning problem involves several unaddressed issues: it does not have a known, stable model; its continuous states and controls may result in a curse of dimensionality; and the human-prosthesis system is constantly subject to measurement noise, environmental change, and variations introduced by the human body. In this paper, we demonstrate the feasibility of direct heuristic dynamic programming, an approximate dynamic programming (ADP) approach, for automatically tuning the 12 robotic knee prosthesis parameters to meet individual human users' needs. We tested the ADP-tuner on two subjects (one able-bodied and one amputee) walking at a fixed speed on a treadmill. The ADP-tuner learned to reach target gait kinematics in an average of 300 gait cycles, or 10 min of walking. We observed improved tuning performance when a previously learned ADP controller was transferred to a new learning session with the same subject. To the best of our knowledge, this approach to personalizing robotic prostheses is the first implementation of online ADP learning control in a clinical problem involving human subjects.
6. Liu XK, Jiang H, Wang YW, He H. A Distributed Iterative Learning Framework for DC Microgrids: Current Sharing and Voltage Regulation. IEEE Transactions on Emerging Topics in Computational Intelligence 2020. DOI: 10.1109/tetci.2018.2863747.
7. Strenge L, Schultz P, Kurths J, Raisch J, Hellmann F. A multiplex, multi-timescale model approach for economic and frequency control in power grids. Chaos 2020;30:033138. PMID: 32237782. DOI: 10.1063/1.5132335.
Abstract
Power systems are subject to fundamental changes due to the increasing infeed of decentralized renewable energy sources and storage. The decentralized nature of the new actors in the system requires new concepts for structuring the power grid and achieving a wide range of control tasks on timescales ranging from seconds to days. Here, we introduce a multiplex dynamical network model covering all control timescales. Crucially, we combine a decentralized, self-organized low-level control with a smart grid layer of devices that can aggregate information from remote sources. The safety-critical task of frequency control is performed by the former and the economic objective of demand-matching dispatch by the latter. Having both aspects present in the same model allows us to study the interaction between the layers. Remarkably, we find that adding communication in the form of aggregation does not improve performance in the cases considered. Instead, the self-organized state of the system already contains the information required to learn the demand structure in the entire grid. The model introduced here is highly flexible and can accommodate a wide range of scenarios relevant to future power grids. We expect it to be especially useful in the context of low-energy microgrids with distributed generation.
Affiliation(s)
- Lia Strenge: Control Systems Group, Technische Universität Berlin, Einsteinufer 17, 10587 Berlin, Germany
- Paul Schultz: Research Department 4 Complexity Science, Potsdam Institute for Climate Impact Research, Telegraphenberg A 31, 14473 Potsdam, Brandenburg, Germany
- Jürgen Kurths: Research Department 4 Complexity Science, Potsdam Institute for Climate Impact Research, Telegraphenberg A 31, 14473 Potsdam, Brandenburg, Germany
- Jörg Raisch: Control Systems Group, Technische Universität Berlin, Einsteinufer 17, 10587 Berlin, Germany
- Frank Hellmann: Research Department 4 Complexity Science, Potsdam Institute for Climate Impact Research, Telegraphenberg A 31, 14473 Potsdam, Brandenburg, Germany
8. Wei C, Luo J, Dai H, Duan G. Learning-Based Adaptive Attitude Control of Spacecraft Formation With Guaranteed Prescribed Performance. IEEE Transactions on Cybernetics 2019;49:4004-4016. PMID: 30072354. DOI: 10.1109/tcyb.2018.2857400.
Abstract
This paper investigates a novel leader-following attitude control approach for spacecraft formation under preassigned two-layer performance, with consideration of unknown inertial parameters, external disturbance torque, and unmodeled uncertainty. First, two-layer prescribed performance is preselected for both the attitude angle and angular velocity tracking errors. Subsequently, a distributed two-layer performance controller is devised, which guarantees that all involved closed-loop signals are uniformly ultimately bounded. To overcome the limitations of the static two-layer performance controller, a learning-based control strategy based on the adaptive dynamic programming technique is introduced as an adaptive supplementary controller. This dramatically enhances the adaptiveness of the static two-layer performance controller with respect to unexpected uncertainty, without any prior knowledge of the inertial information. Furthermore, by employing robustly positively invariant set theory, input-to-state stability is rigorously proven under the designed learning-based distributed controller. Finally, two groups of simulation examples are conducted to validate the feasibility and effectiveness of the proposed distributed control approach.
9. Online event-triggered adaptive critic design for non-zero-sum games of partially unknown networked systems. Neurocomputing 2019. DOI: 10.1016/j.neucom.2019.07.029.
10. Adaptive deep dynamic programming for integrated frequency control of multi-area multi-microgrid systems. Neurocomputing 2019. DOI: 10.1016/j.neucom.2018.06.092.
11.
12. Liu YJ, Li S, Tong S, Chen CLP. Adaptive Reinforcement Learning Control Based on Neural Approximation for Nonlinear Discrete-Time Systems With Unknown Nonaffine Dead-Zone Input. IEEE Transactions on Neural Networks and Learning Systems 2019;30:295-305. PMID: 29994726. DOI: 10.1109/tnnls.2018.2844165.
Abstract
In this paper, an optimal control algorithm is designed for uncertain discrete-time nonlinear systems in nonaffine form with an unknown dead-zone. The main contributions of this paper are that an optimal control algorithm is framed for the first time for nonlinear systems with a nonaffine dead-zone, and that the adaptive parameter law for the dead-zone is derived using gradient rules. The mean value theorem is employed to deal with the nonaffine dead-zone input, and the implicit function theorem, combined with reinforcement learning, is appropriately introduced to find an unknown ideal controller, which is approximated using the action network. Other neural networks serve as critic networks to approximate the strategic utility functions. Based on Lyapunov stability analysis, we prove the stability of the system: the optimal control laws guarantee that all signals in the closed-loop system are bounded and that the tracking errors converge to a small compact set. Finally, two simulation examples demonstrate the effectiveness of the designed algorithm.
13. A data-driven online ADP control method for nonlinear system based on policy iteration and nonlinear MIMO decoupling ADRC. Neurocomputing 2018. DOI: 10.1016/j.neucom.2018.04.024.
14. Guo W, Si J, Liu F, Mei S. Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2018;29:2794-2807. PMID: 28600262. DOI: 10.1109/tnnls.2017.2702566.
Abstract
Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking the policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which have been extensively covered in the controls literature for their good theoretical properties and success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples, including a practical problem of excitation control of a hydrogenerator.
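The classical policy iteration scheme this work builds on alternates exact policy evaluation with greedy policy improvement. A minimal tabular sketch follows; it illustrates only the standard algorithm structure, not the paper's Volterra-series policy approximation or its error analysis, and the two-state MDP in the usage note is an invented example.

```python
def policy_iteration(P, R, gamma=0.9, tol=1e-8):
    """Classical tabular policy iteration.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    immediate reward. Alternates exact (iterative) policy evaluation with
    greedy policy improvement until the policy is stable.
    """
    n, m = len(R), len(R[0])
    pi = [0] * n                      # start from an arbitrary policy
    while True:
        # policy evaluation: fixed-point iteration on V = R_pi + gamma * P_pi V
        V = [0.0] * n
        while True:
            Vn = [R[s][pi[s]] + gamma * sum(p * V[t] for p, t in P[s][pi[s]])
                  for s in range(n)]
            delta = max(abs(x - y) for x, y in zip(Vn, V))
            V = Vn
            if delta < tol:
                break
        # policy improvement: act greedily with respect to V
        new_pi = [max(range(m),
                      key=lambda a: R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a]))
                  for s in range(n)]
        if new_pi == pi:
            return pi, V              # policy stable: optimal for this MDP
        pi = new_pi
```

On a two-state example where state 0 pays reward 1 for staying but state 1 pays 2, the algorithm correctly forgoes the immediate reward at state 0 and moves to state 1, returning the policy [1, 0] with values [18, 20] at gamma = 0.9.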
15. General value iteration based reinforcement learning for solving optimal tracking control problem of continuous-time affine nonlinear systems. Neurocomputing 2017. DOI: 10.1016/j.neucom.2017.03.038.