1
Sun T, Yang J, Pan Y, Yu H. Repetitive Impedance Learning-Based Physically Human-Robot Interactive Control. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:10629-10638. [PMID: 37027552] [DOI: 10.1109/tnnls.2023.3243091]
Abstract
Model-based impedance learning control can provide variable impedance regulation for robots through online impedance learning without interaction force sensing. However, the existing results only guarantee that the closed-loop control systems are uniformly ultimately bounded (UUB) and require the human impedance profiles to be periodic, iteration-dependent, or slowly varying. In this article, a repetitive impedance learning control approach is proposed for physical human-robot interaction (PHRI) in repetitive tasks. The proposed control is composed of a proportional-differential (PD) control term, an adaptive control term, and a repetitive impedance learning term. Differential adaptation with projection modification is designed to estimate robotic parameter uncertainties in the time domain, while fully saturated repetitive learning is proposed to estimate time-varying human impedance uncertainties in the iteration domain. Uniform convergence of the tracking errors is guaranteed by the PD control and by the use of projection and full saturation in the uncertainty estimation, and is theoretically proved through a Lyapunov-like analysis. In the impedance profiles, the stiffness and damping consist of an iteration-independent term and an iteration-dependent disturbance, which are estimated by the repetitive learning and suppressed by the PD control, respectively. The developed approach can therefore be applied to PHRI where iteration-dependent disturbances exist in the stiffness and damping. The control effectiveness and advantages are validated by simulations on a parallel robot performing a repetitive following task.
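As background for readers, the kind of target impedance relation such controllers regulate can be sketched as a mass-damper-spring law linking position error to interaction force. This is a generic illustration only; the gains M, D, K, the constant forcing, and the Euler integration are assumptions for the sketch, not the authors' control law.

```python
def simulate_impedance(M, D, K, f_ext, dt=0.001, steps=2000):
    """Integrate a target impedance model M*e'' + D*e' + K*e = f(t),
    where e is the position error driven by the interaction force
    f_ext(t), using semi-implicit Euler."""
    e, de = 0.0, 0.0
    for i in range(steps):
        dde = (f_ext(i * dt) - D * de - K * e) / M
        de += dde * dt  # update velocity first (semi-implicit)
        e += de * dt
    return e

# A constant 1 N interaction force against stiffness K = 50 N/m settles
# near the static deflection e = F/K = 0.02 m after the transient.
e_ss = simulate_impedance(M=1.0, D=20.0, K=50.0, f_ext=lambda t: 1.0)
```

In a learning controller, K and D would themselves be time-varying estimates rather than the fixed constants used here.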
2
Diaz MA, Vos M, Dillen A, Tassignon B, Flynn L, Geeroms J, Meeusen R, Verstraten T, Babic J, Beckerle P, De Pauw K. Human-in-the-Loop Optimization of Wearable Robotic Devices to Improve Human-Robot Interaction: A Systematic Review. IEEE Transactions on Cybernetics 2023; 53:7483-7496. [PMID: 37015459] [DOI: 10.1109/tcyb.2022.3224895]
Abstract
This article presents a systematic review of wearable robotic devices that use human-in-the-loop optimization (HILO) strategies to improve human-robot interaction. A total of 46 HILO studies were identified and divided into upper- and lower-limb robotic devices. The main aspects of HILO were identified, reviewed, and classified into four areas: 1) human-machine systems; 2) optimization methods; 3) control strategies; and 4) experimental protocols. A variety of objective functions (physiological, biomechanical, and subjective), optimization strategies, and optimized control parameter configurations used in different control strategies are presented and analyzed. An overview of experimental protocols is provided, including the metrics, tasks, and conditions tested, and the relevance given to training or adaptation periods is explored. We outline an HILO framework that encompasses current wearable robots, optimization strategies, objective functions, control strategies, and experimental protocols, and conclude by highlighting current research gaps and defining future directions for the development of advanced HILO strategies in upper- and lower-limb wearable robots.
3
Gu S, Kshirsagar A, Du Y, Chen G, Peters J, Knoll A. A human-centered safe robot reinforcement learning framework with interactive behaviors. Front Neurorobot 2023; 17:1280341. [PMID: 38023448] [PMCID: PMC10665848] [DOI: 10.3389/fnbot.2023.1280341]
Abstract
Deploying Reinforcement Learning (RL) algorithms for real-world robotics applications requires ensuring the safety of the robot and its environment. Safe Robot RL (SRRL) is a crucial step toward achieving human-robot coexistence. In this paper, we envision a human-centered SRRL framework consisting of three stages: safe exploration, safety value alignment, and safe collaboration. We examine the research gaps in these areas and propose leveraging interactive behaviors for SRRL. Interactive behaviors enable bi-directional information transfer between humans and robots, as in conversational robots powered by ChatGPT. We argue that interactive behaviors deserve further attention from the SRRL community, and we discuss four open challenges related to the robustness, efficiency, transparency, and adaptability of SRRL with interactive behaviors.
Affiliation(s)
- Shangding Gu
- Department of Computer Science, Technical University of Munich, Munich, Germany
- Alap Kshirsagar
- Department of Computer Science, Technical University of Darmstadt, Darmstadt, Germany
- Yali Du
- Department of Informatics, King's College London, London, United Kingdom
- Guang Chen
- College of Electronic and Information Engineering, Tongji University, Shanghai, China
- Jan Peters
- Department of Computer Science, Technical University of Darmstadt, Darmstadt, Germany
- Alois Knoll
- Department of Computer Science, Technical University of Munich, Munich, Germany
4
Yang J, Sun T, Yang H. Spatial hybrid adaptive impedance learning control for robots in repetitive interactive tasks. ISA Transactions 2023; 138:151-159. [PMID: 36828703] [DOI: 10.1016/j.isatra.2023.02.021]
Abstract
Existing model-based impedance learning control methods can provide variable impedance regulation for physical human-robot interaction (PHRI) in repetitive tasks without interaction force sensing. However, these methods require the repetitive tasks to be completed in constant time, which restricts their applications. For PHRI in repetitive tasks with varying completion times, this paper proposes a spatial hybrid adaptive impedance learning control (SHAILC) strategy that exploits the spatial periodicity of the tasks. In the spatial hybrid adaptation, spatial periodic adaptation estimates the time-varying human impedance, while differential adaptation estimates the robot's unknown constant parameters. Deadzone modifications in the hybrid adaptation maintain the accuracy of the parameter estimation when the tracking error is small relative to the modeling error. The control stability is established through a Lyapunov-based analysis in the spatial domain, and the control effectiveness and advantages are illustrated on a parallel robot in repetitive tasks with different completion times.
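The deadzone modification mentioned above admits a few-line sketch: a gradient-type parameter update is applied only while the tracking error exceeds a threshold, so noise-level errors do not make the estimate drift. The gain, threshold, and scalar regressor form here are illustrative assumptions, not the paper's exact update law.

```python
def deadzone_adapt(theta_hat, error, regressor, gain=0.5, deadzone=0.05):
    """Gradient-type parameter adaptation with a deadzone modification:
    freeze the update when |error| falls within the deadzone, where the
    tracking error is dominated by modeling error and noise."""
    if abs(error) <= deadzone:
        return theta_hat  # no adaptation inside the deadzone
    return theta_hat + gain * error * regressor

theta = 1.0
theta = deadzone_adapt(theta, error=0.01, regressor=2.0)  # frozen: stays 1.0
theta = deadzone_adapt(theta, error=0.10, regressor=2.0)  # updated to 1.1
```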
Affiliation(s)
- Jiantao Yang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Tairen Sun
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Hongjun Yang
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
5
Qin L, Ji H, Chen M, Wang K. A Self-Coordinating Controller with Balance-Guiding Ability for Lower-Limb Rehabilitation Exoskeleton Robot. Sensors (Basel) 2023; 23:5311. [PMID: 37300038] [DOI: 10.3390/s23115311]
Abstract
The restricted posture and unrestricted compliance imposed by the controller during human-exoskeleton interaction (HEI) can cause patients to lose balance or even fall. In this article, a self-coordinated velocity vector (SCVV) double-layer controller with balance-guiding ability was developed for a lower-limb rehabilitation exoskeleton robot (LLRER). In the outer loop, an adaptive trajectory generator that follows the gait cycle was devised to generate a harmonious hip-knee reference trajectory in a non-time-varying (NTV) phase space. In the inner loop, velocity control was adopted. By searching for the minimum L2 norm between the reference phase trajectory and the current configuration, desired velocity vectors were obtained whose encouraging and correcting effects are self-coordinated according to the L2 norm. The controller was simulated using an electromechanical coupling model, and experiments were carried out with a self-developed exoskeleton device. Both the simulations and the experiments validated the effectiveness of the controller.
Affiliation(s)
- Li Qin
- School of Electrical Engineering, Yanshan University, Qinhuangdao 066012, China
- Houzhao Ji
- School of Electrical Engineering, Yanshan University, Qinhuangdao 066012, China
- Minghao Chen
- School of Electrical Engineering, Yanshan University, Qinhuangdao 066012, China
- Ke Wang
- School of Electrical Engineering, Yanshan University, Qinhuangdao 066012, China
6
Yang R, Zheng J, Song R. Continuous mode adaptation for cable-driven rehabilitation robot using reinforcement learning. Front Neurorobot 2022; 16:1068706. [PMID: 36620486] [PMCID: PMC9813438] [DOI: 10.3389/fnbot.2022.1068706]
Abstract
Continuous mode adaptation is important for satisfying different user rehabilitation needs and improving human-robot interaction (HRI) performance in rehabilitation robots. Hence, we propose a reinforcement-learning-based optimal admittance control (RLOAC) strategy for a cable-driven rehabilitation robot (CDRR), which realizes continuous mode adaptation between the passive and active working modes. First, to obviate the need for knowledge of the human and robot dynamics models, a reinforcement learning algorithm is employed to obtain the optimal admittance parameters by minimizing a cost function composed of the trajectory error and the human voluntary force. Second, the contribution weights of the cost function are modulated according to the human voluntary force, which enables the CDRR to adapt continuously between the passive and active working modes. Finally, simulations and experiments with 10 subjects were conducted to investigate the feasibility and effectiveness of the RLOAC strategy. The experimental results indicated that the desired performance was obtained; moreover, the tracking error and energy per unit distance of the RLOAC strategy were notably lower than those of the traditional admittance control method. The RLOAC strategy is thus effective in improving tracking accuracy and robot compliance, and has potential for use in rehabilitation robots.
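The two ingredients described above, an admittance model mapping human force to a trajectory offset and a force-dependent weighting that trades the tracking-error term against the human-force term of the cost, can be sketched as follows. The virtual mass and damping stand in for the parameters the RLOAC strategy tunes online; all gains and the blending function are assumptions for illustration, not the paper's learned values.

```python
class Admittance:
    """Minimal admittance filter: the measured human force f_h drives a
    virtual mass-damper (m*v' + d*v = f_h) whose output velocity is
    integrated into an offset added to the reference trajectory."""
    def __init__(self, m=1.0, d=10.0):
        self.m, self.d = m, d
        self.v, self.x = 0.0, 0.0

    def step(self, f_h, dt=0.01):
        self.v += (f_h - self.d * self.v) / self.m * dt
        self.x += self.v * dt
        return self.x

def tracking_weight(f_h, f_scale=5.0):
    """Weight on the trajectory-error term of the cost: near 1 when the
    human is passive (robot leads, passive mode), decaying toward 0 as
    the voluntary force grows (human leads, active mode)."""
    return 1.0 / (1.0 + (abs(f_h) / f_scale) ** 2)

adm = Admittance()
for _ in range(500):   # a constant 10 N push for 5 s
    adm.step(10.0)     # offset velocity settles near f_h / d = 1.0 m/s
```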
Affiliation(s)
- Renyu Yang
- Key Laboratory of Sensing Technology and Biomedical Instrument of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou, China; School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
- Jianlin Zheng
- Key Laboratory of Sensing Technology and Biomedical Instrument of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou, China; School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
- Rong Song (corresponding author)
- Key Laboratory of Sensing Technology and Biomedical Instrument of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou, China; School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
7
Li J, Ma Y, Gao R, Cao Z, Lim A, Song W, Zhang J. Deep Reinforcement Learning for Solving the Heterogeneous Capacitated Vehicle Routing Problem. IEEE Transactions on Cybernetics 2022; 52:13572-13585. [PMID: 34554923] [DOI: 10.1109/tcyb.2021.3111082]
Abstract
Existing deep reinforcement learning (DRL)-based methods for solving the capacitated vehicle routing problem (CVRP) intrinsically assume a homogeneous vehicle fleet, treated as repetitions of a single vehicle. Hence, constructing a solution reduces to selecting the next node (customer) to visit, with no vehicle selection involved. However, vehicles in real-world scenarios are likely to be heterogeneous, with different characteristics affecting their capacity (or travel speed), which renders existing DRL methods less effective. In this article, we tackle the heterogeneous CVRP (HCVRP), where vehicles are mainly characterized by different capacities. We consider both min-max and min-sum objectives, which aim to minimize the longest or the total travel time of the vehicles in the fleet, respectively. To solve these problems, we propose a DRL method based on the attention mechanism, with a vehicle selection decoder accounting for the heterogeneous fleet constraint and a node selection decoder accounting for route construction; the model learns to construct a solution by automatically selecting both a vehicle and a node for that vehicle at each step. Experimental results on randomly generated instances show that, with desirable generalization to various problem sizes, our method outperforms the state-of-the-art DRL method and most conventional heuristics, and delivers competitive performance against the state-of-the-art heuristic method, slack induction by string removal. Extended experiments further demonstrate that our method can also solve CVRPLib instances with satisfactory performance.
8
Incorporating rivalry in reinforcement learning for a competitive game. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07746-9]
Abstract
Recent advances in reinforcement learning with social agents have allowed such models to achieve human-level performance on certain interaction tasks. However, most interactive scenarios do not have performance alone as an end goal; the social impact of these agents when interacting with humans is equally important and largely unexplored. In this regard, this work proposes a novel reinforcement learning mechanism based on the social impact of rivalry behavior. Our proposed model aggregates objective and social perception mechanisms to derive a rivalry score that modulates the learning of artificial agents. To investigate the model, we design an interactive game scenario using the Chef's Hat Card Game and examine how rivalry modulation changes the agent's playing style and how this affects the experience of human players. Our results show that humans can detect specific social characteristics when playing against rival agents compared with common agents, which directly affects the human players' performance in subsequent games. We conclude by discussing how the different social and objective features that compose the artificial rivalry score contribute to these results.
9
Hu B, Guan ZH, Chen G, Chen CLP. Neuroscience and Network Dynamics Toward Brain-Inspired Intelligence. IEEE Transactions on Cybernetics 2022; 52:10214-10227. [PMID: 33909581] [DOI: 10.1109/tcyb.2021.3071110]
Abstract
This article surveys the interdisciplinary research of neuroscience, network science, and dynamic systems, with emphasis on the emergence of brain-inspired intelligence. To replicate brain intelligence, a practical way is to reconstruct cortical networks with dynamic activities that nourish the brain functions, instead of using only artificial computing networks. The survey provides a complex network and spatiotemporal dynamics (abbr. network dynamics) perspective for understanding the brain and cortical networks and, furthermore, develops integrated approaches of neuroscience and network dynamics toward building brain-inspired intelligence with learning and resilience functions. Presented are fundamental concepts and principles of complex networks, neuroscience, and hybrid dynamic systems, as well as relevant studies about the brain and intelligence. Other promising research directions, such as brain science, data science, quantum information science, and machine behavior are also briefly discussed toward future applications.
10
Asymmetric constrained control scheme design with discrete output feedback in unknown robot–environment interaction system. Robotica 2022. [DOI: 10.1017/s0263574722001138]
Abstract
In this paper, an overall structure with an asymmetric constrained controller is constructed for human–robot interaction in uncertain environments. The control structure consists of two decoupled loops. In the outer loop, a discrete output feedback adaptive dynamic programming (OPFB ADP) algorithm is proposed to deal with the unknown environment dynamics and the unobservable environment position, and a discount factor is added to the algorithm to improve its convergence speed. In the inner loop, a constrained controller is developed on the basis of an asymmetric barrier Lyapunov function, and a neural network is applied to approximate the dynamic characteristics of the uncertain system model. With this controller, the robot can track the prescribed trajectory precisely within a security boundary. Simulation and experimental results demonstrate the effectiveness of the proposed controller.
11
Cognitive Learning and Robotics: Innovative Teaching for Inclusivity. Multimodal Technologies and Interaction 2022. [DOI: 10.3390/mti6080065]
Abstract
We present the interdisciplinary CoWriting Kazakh project, in which a social robot acts as a peer in learning the new Kazakh Latin alphabet, to which Kazakhstan is going to shift from the current Kazakh Cyrillic by 2030. We discuss the past literature on cognitive learning and script acquisition in depth and present a theoretical framing for this study. The results of word and letter analyses from two user studies conducted between 2019 and 2020 are presented. Learning the new alphabet through Kazakh words with two or more syllables and special native letters resulted in significant learning gains. These results suggest that reciprocal Cyrillic-to-Latin script learning yields considerable cognitive benefits owing to mental conversion, word choice, and handwriting practice. Overall, the system enables school-age children to practice the new Kazakh Latin script in an engaging learning scenario. The proposed theoretical framework illuminates teaching and learning within the multimodal robot-assisted script learning scenario and beyond.
12
Wu HN, Zhang XM, Li RG. Synthesis With Guaranteed Cost and Less Human Intervention for Human-in-the-Loop Control Systems. IEEE Transactions on Cybernetics 2022; 52:7541-7551. [PMID: 33417574] [DOI: 10.1109/tcyb.2020.3041033]
Abstract
This article studies the problem of synthesis with guaranteed cost and less human intervention for linear human-in-the-loop (HiTL) control systems. First, human behavior is modeled via a hidden controlled Markov process, which not only captures the stochasticity of inference and the uncertainty of observation of the human internal state but also takes the control input to the human into account. Then, to integrate the human and machine models as well as their interaction, a hidden controlled Markov jump system (HCMJS) is constructed. With the aid of a stochastic Lyapunov functional together with the bilinear matrix inequality technique, a sufficient condition for the existence of human-assistance controllers is derived on the basis of the HCMJS model, which guarantees the stochastic stability of the closed-loop HiTL system and provides a prescribed upper bound on the quadratic cost function. Moreover, to achieve less human intervention while meeting the desired cost level, an algorithm combining particle swarm optimization with the linear matrix inequality technique is proposed to seek a suitable feedback control law for the human and a human-assistance control law for the machine. Finally, the proposed method is applied to a driver-assistance system to verify its effectiveness.
13
Kobayashi T. Optimistic reinforcement learning by forward Kullback–Leibler divergence optimization. Neural Netw 2022; 152:169-180. [DOI: 10.1016/j.neunet.2022.04.021]
14
Sharifi M, Zakerimanesh A, Mehr JK, Torabi A, Mushahwar VK, Tavakoli M. Impedance Variation and Learning Strategies in Human-Robot Interaction. IEEE Transactions on Cybernetics 2022; 52:6462-6475. [PMID: 33449901] [DOI: 10.1109/tcyb.2020.3043798]
Abstract
In this survey, various concepts and methodologies developed over the past two decades for varying and learning the impedance or admittance of robotic systems that physically interact with humans are explored. For this purpose, the assumptions and mathematical formulations for the online adjustment of impedance models and controllers for physical human-robot interaction (HRI) are categorized and compared. In this systematic review, studies on: 1) variation and 2) learning of appropriate impedance elements are taken into account. These strategies are classified and described in terms of their objectives, points of view (approaches), and signal requirements (including position, HRI force, and electromyography activity). Different methods involving linear/nonlinear analyses (e.g., optimal control design and nonlinear Lyapunov-based stability guarantee) and the Gaussian approximation algorithms (e.g., Gaussian mixture model-based and dynamic movement primitives-based strategies) are reviewed. Current challenges and research trends in physical HRI are finally discussed.
15
Yang J, Sun T. Finite-Time Interactive Control of Robots with Multiple Interaction Modes. Sensors (Basel) 2022; 22:3668. [PMID: 35632080] [PMCID: PMC9147656] [DOI: 10.3390/s22103668]
Abstract
This paper proposes a finite-time multi-modal robotic control strategy for physical human-robot interaction. The proposed multi-modal controller consists of a modified super-twisting-based finite-time control term designed for each interaction mode and a continuity-guaranteed control term. The finite-time control term guarantees finite-time achievement of the desired impedance dynamics in the active interaction mode (AIM), drives the tracking error of the reference trajectory to zero in finite time in the passive interaction mode (PIM), and guarantees that the robot's motion stops in finite time in the safety-stop mode (SSM). Meanwhile, the continuity-guaranteed control term ensures control input continuity and steady transitions between interaction modes. The finite-time closed-loop stability and control effectiveness are validated by Lyapunov-based theoretical analysis and simulations on a robot manipulator.
16
Lv P, Wang X, Cheng Y, Duan Z, Chen CLP. Integrated Double Estimator Architecture for Reinforcement Learning. IEEE Transactions on Cybernetics 2022; 52:3111-3122. [PMID: 33027028] [DOI: 10.1109/tcyb.2020.3023033]
Abstract
Estimation bias is an important index for evaluating the performance of reinforcement learning (RL) algorithms. Popular RL algorithms, such as Q-learning and the deep Q-network (DQN), often suffer from overestimation due to the maximum operation used to estimate the maximum expected action values of the next states, while double Q-learning (DQ) and double DQN may fall into underestimation by using a double estimator (DE) to avoid overestimation. To balance overestimation and underestimation, we propose a novel integrated DE (IDE) architecture that combines the maximum operation and the DE operation to estimate the maximum expected action value. Based on IDE, two RL algorithms are proposed: 1) integrated DQ (IDQ) and 2) its deep network version, integrated double DQN (IDDQN). The main idea is that the maximum and DE operations are integrated to eliminate the estimation bias: one estimator is stochastically used to perform action selection based on the maximum operation, and a convex combination of the two estimators is used to carry out action evaluation. We theoretically analyze why estimation bias arises from using a nonmaximum operation to estimate the maximum expected value, investigate the possible causes of underestimation in DQ, and prove the unbiasedness of IDE and the convergence of IDQ. Experiments on grid-world and Atari 2600 games indicate that IDQ and IDDQN can reduce or even eliminate estimation bias effectively, make learning more stable and balanced, and improve performance.
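The selection/evaluation split described above can be sketched as a tabular update: the greedy next action comes from one stochastically chosen estimator (the maximum operation), while its value is evaluated with a convex combination of both estimators. The dict-based tables, the choice of which table receives the TD update, and all hyperparameters are sketch-level assumptions, not the paper's exact formulation.

```python
import random

def ide_q_update(Q_A, Q_B, s, a, r, s_next, alpha=0.1, gamma=0.99, beta=0.5):
    """One tabular update in the spirit of the integrated double estimator.
    Tables are dicts keyed by (state, action); unseen pairs default to 0."""
    # Stochastically pick the estimator used for greedy action selection;
    # the same table receives the TD update here (a sketch-level choice).
    Q_sel = Q_A if random.random() < 0.5 else Q_B
    actions = [act for (st, act) in Q_sel if st == s_next]
    if actions:
        # Maximum operation on the selected estimator...
        a_star = max(actions, key=lambda act: Q_sel[(s_next, act)])
        # ...but evaluation by a convex combination of both estimators.
        q_next = (beta * Q_A.get((s_next, a_star), 0.0)
                  + (1 - beta) * Q_B.get((s_next, a_star), 0.0))
    else:
        q_next = 0.0  # terminal or unvisited next state
    old = Q_sel.get((s, a), 0.0)
    Q_sel[(s, a)] = old + alpha * (r + gamma * q_next - old)
    return Q_sel[(s, a)]
```

With beta = 1 this collapses toward single-estimator evaluation (overestimation-prone), and with the evaluation moved entirely to the other table it recovers double Q-learning, which is the trade-off the IDE architecture interpolates.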
17
Jin Z, Liu A, Zhang WA, Yu L. An Optimal Variable Impedance Control With Consideration of the Stability. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3141759]
18
Zhang L, Zhang R, Wu T, Weng R, Han M, Zhao Y. Safe Reinforcement Learning With Stability Guarantee for Motion Planning of Autonomous Vehicles. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:5435-5444. [PMID: 34242172] [DOI: 10.1109/tnnls.2021.3084685]
Abstract
Reinforcement learning with safety constraints is promising for autonomous vehicles, where various failures may result in disastrous losses. In general, a safe policy is trained by constrained optimization algorithms, in which the average constraint return, as a function of states and actions, must remain below a predefined bound. However, most existing safe learning-based algorithms capture states via multiple high-precision sensors, which complicates the hardware system and consumes power. This article focuses on safe motion planning with a stability guarantee for autonomous vehicles of limited size and power. To this end, a risk-identification method and a Lyapunov function are integrated with the well-known soft actor-critic (SAC) algorithm. By borrowing the concept of Lyapunov functions from control theory, the learned policy can theoretically guarantee that the state trajectory always stays in a safe area. A novel risk-sensitive learning-based algorithm with a stability guarantee is proposed to train policies for the motion planning of autonomous vehicles. The learned policy is implemented on a differential-drive vehicle in a simulation environment, and the experimental results show that the proposed algorithm achieves a higher success rate than SAC.
19
Bahrami V, Kalhor A, Masouleh MT. Dynamic model estimating and designing controller for the 2-DoF planar robot in interaction with cable-driven robot based on adaptive neural network. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-210180]
Abstract
This study investigates dynamic model estimation and the design of an adaptive neural network-based controller for a passive planar robot performing a 2-DoF motion pattern in interaction with an actuated cable-driven robot. The main goal of this structure is to use a number of light cables to drive the serial robot's links so that the robot's end-effector tracks a desired reference model. The system under study can serve as a rehabilitation setup for people with arm disabilities. Upon applying sliding-mode error dynamics, it is necessary to determine a vector containing the matrices of the robot dynamics. Finding these matrices, however, requires computational approaches such as the Newton-Euler or Lagrange formulations, and as the number of links and degrees of freedom increases, deriving the robot dynamics becomes more difficult. Therefore, an Adaptive Neural Network (ANN) with specific inputs is used to estimate the unknown matrices of the system, and the controller is designed on this basis. The main motivation for an adaptive controller is that no prior knowledge of the system's dynamic model is available, since the human arm can have different dynamic properties. The controller is thus formed by an ANN and a robust term. The adaptation laws of the parameters are derived via a Lyapunov approach, which guarantees the asymptotic stability of the whole system. Simulation results confirm the efficiency of the proposed method. Finally, using the Root Mean Square Error (RMSE) criterion, it is shown that, in the presence of bounded disturbances of different amplitudes, adding the robust term to the controller improves the tracking error by about 34% and 62%, respectively.
Affiliation(s)
- Vahid Bahrami, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran, Human and Robot Interaction Laboratory
- Ahmad Kalhor, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran, Human and Robot Interaction Laboratory
- Mehdi Tale Masouleh, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran, Human and Robot Interaction Laboratory
20

Singh B, Kumar R, Singh VP. Reinforcement learning in robotic applications: a comprehensive survey. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-09997-9]
21

Yu X, He W, Li Y, Xue C, Li J, Zou J, Yang C. Bayesian Estimation of Human Impedance and Motion Intention for Human-Robot Collaboration. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:1822-1834. [PMID: 31647450 DOI: 10.1109/tcyb.2019.2940276]
Abstract
This article proposes a Bayesian method to estimate human impedance and motion intention in a human-robot collaborative task. By combining prior knowledge of human stiffness with measurements, Bayesian estimation yields a stiffness estimate obeying a Gaussian distribution, from which human motion intention can also be estimated. An adaptive impedance control strategy is employed to track a target impedance model, and neural networks are used to compensate for uncertainties in the robot dynamics. Comparative simulations verify the effectiveness of the estimation method and emphasize the advantages of the proposed control strategy. An experiment performed on the Baxter robot platform illustrates good system performance.
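The conjugate-Gaussian update behind this kind of stiffness estimation can be sketched in a few lines. This is a generic illustration, not the authors' estimator: the scalar stiffness model, the prior values, and the noise variance below are all assumptions made for the sketch.

```python
import numpy as np

def bayes_stiffness_update(mu0, var0, obs, obs_var):
    """Conjugate-Gaussian update of a scalar stiffness estimate.

    mu0, var0 : prior mean/variance of human stiffness
    obs       : array of noisy stiffness observations
    obs_var   : assumed measurement-noise variance
    Returns the posterior mean and variance.
    """
    n = len(obs)
    post_var = 1.0 / (1.0 / var0 + n / obs_var)
    post_mu = post_var * (mu0 / var0 + np.sum(obs) / obs_var)
    return post_mu, post_var

# Hypothetical numbers: prior belief ~300 N/m, ten noisy samples near 350 N/m
# pull the estimate toward the data while shrinking its variance.
mu, var = bayes_stiffness_update(300.0, 50.0**2, np.full(10, 350.0), 20.0**2)
```

The posterior variance is always smaller than the prior variance, which is what makes the estimate usable online: each interaction sample tightens the Gaussian belief.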
22

23

Kobayashi T, Ilboudo WEL. t-soft update of target network for deep reinforcement learning. Neural Netw 2021; 136:63-71. [PMID: 33450653 DOI: 10.1016/j.neunet.2020.12.023]
Abstract
This paper proposes a new robust update rule for the target network in deep reinforcement learning (DRL), replacing the conventional update rule given as an exponential moving average. The target network smoothly generates the reference signals for the main network in DRL, thereby reducing learning variance. The problem with the conventional update rule is that all parameters are copied smoothly at the same speed from the main network, even when some of them are updating in the wrong direction; this behavior increases the risk of generating wrong reference signals. Although slowing down the overall update speed is a naive way to mitigate wrong updates, it would also decrease learning speed. To update the parameters robustly while maintaining learning speed, a t-soft update method, inspired by the Student-t distribution, is derived from the analogy between the exponential moving average and the normal distribution. Through analysis of the derived t-soft update, we show that it inherits the properties of the Student-t distribution. Specifically, owing to the heavy-tailed property of the Student-t distribution, the t-soft update automatically excludes extreme updates that differ from past experiences; when the updates are similar to past experiences, it can mitigate learning delay by increasing the update amount. In PyBullet robotics simulations for DRL, an online actor-critic algorithm with the t-soft update outperformed the conventional methods in terms of the obtained return and/or its variance. From the training process, we found that the t-soft update is globally consistent with the standard soft update, with the update rates adjusted locally for acceleration or suppression.
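The contrast between the conventional soft (EMA) update and a heavy-tailed alternative can be sketched as follows. The weighting here only follows the spirit of the t-soft rule described in the abstract; the paper's exact formulation differs, and the `sigma2` deviation tracking is a simplified stand-in.

```python
import numpy as np

def soft_update(target, main, tau=0.01):
    """Conventional EMA update: every parameter moves at the same rate tau."""
    return (1.0 - tau) * target + tau * main

def t_soft_update(target, main, sigma2, tau=0.01, nu=1.0):
    """Heavy-tailed variant in the spirit of the t-soft rule.

    Deviations that are large relative to the running scale sigma2 receive a
    small Student-t weight, so outlier updates are automatically damped.
    Returns the new target parameters and the updated scale estimate.
    """
    delta = main - target
    w = (nu + 1.0) / (nu + delta**2 / sigma2)        # per-parameter t-weight
    new_target = target + tau * w * delta
    new_sigma2 = (1.0 - tau) * sigma2 + tau * delta**2  # track deviation scale
    return new_target, new_sigma2

target = np.zeros(3)
main = np.array([0.1, 0.1, 50.0])    # last entry mimics an extreme update
new_t, _ = t_soft_update(target, main, sigma2=np.full(3, 0.01))
```

With these numbers the two ordinary parameters move at essentially the plain soft-update rate, while the outlier's step is cut by several orders of magnitude, which is the qualitative behavior the abstract describes.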
24

Xu J, Xu L, Li Y, Cheng G, Shi J, Liu J, Chen S. A Multi-Channel Reinforcement Learning Framework for Robotic Mirror Therapy. IEEE Robot Autom Lett 2020. [DOI: 10.1109/lra.2020.3007408]
25

Köpf F, Westermann J, Flad M, Hohmann S. Adaptive optimal control for reference tracking independent of exo-system dynamics. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.140]
26

Khoramshahi M, Billard A. A dynamical system approach for detection and reaction to human guidance in physical human–robot interaction. Auton Robots 2020. [DOI: 10.1007/s10514-020-09934-9]
Abstract
A seamless interaction requires two robotic behaviors: the leader role, where the robot rejects external perturbations and focuses on autonomous execution of the task, and the follower role, where the robot ignores the task and complies with intentional human forces. The goal of this work is to provide (1) a unified robotic architecture to produce these two roles and (2) a human-guidance detection algorithm to switch between them. In the absence of human guidance, the robot performs its task autonomously; upon detection of such guidance, the robot passively follows the human motions. We employ dynamical systems to generate task-specific motion and admittance control to generate reactive motions toward the human guidance. This structure enables the robot to reject undesirable perturbations, track motions precisely, react to human guidance with properly compliant behavior, and re-plan the motion reactively. We provide an analytical investigation of our method in terms of tracking and compliant behavior. Finally, we evaluate our method experimentally using a 6-DoF manipulator.
27

Compliant Manipulation Method for a Nursing Robot Based on Physical Structure of Human Limb. J INTELL ROBOT SYST 2020. [DOI: 10.1007/s10846-020-01221-0]
28

Li Y, Wen Y, Tao D, Guan K. Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:2002-2013. [PMID: 31352360 DOI: 10.1109/tcyb.2019.2927410]
Abstract
Data centers (DCs) play an important role in supporting services such as e-commerce and cloud computing. The energy consumption resulting from this growing market has drawn significant attention, and notably almost half of the energy cost is used to cool the DC to a particular temperature. It is thus a critical operational challenge to curb the cooling energy cost without sacrificing the thermal safety of a DC. Existing solutions typically follow a two-step approach in which the system is first modeled based on expert knowledge and the operational actions are then determined with heuristics and/or best practices. These approaches are often hard to generalize and may yield suboptimal performance due to intrinsic model errors in large-scale systems. In this paper, we propose optimizing DC cooling control via the emerging deep reinforcement learning (DRL) framework. Compared to existing approaches, our solution provides an end-to-end cooling control algorithm (CCA) via an off-policy, offline version of the deep deterministic policy gradient (DDPG) algorithm, in which an evaluation network is trained to predict the DC energy cost along with the resulting cooling effects, and a policy network is trained to gauge optimized control settings. Moreover, we introduce a de-underestimation (DUE) validation mechanism for the critic network to reduce the potential underestimation of risk caused by neural approximation. The proposed algorithm is evaluated on an EnergyPlus simulation platform and on a real data trace collected from the National Super Computing Centre (NSCC) of Singapore. The numerical results show that the proposed CCA can achieve up to 11% cooling cost reduction on the simulation platform compared with a manually configured baseline control algorithm; in the conservative trace-based study, it achieves about 15% cooling energy savings on the NSCC data trace. Our approach sheds new light on the application of DRL to optimize and automate DC operations and management.
29

Wan Z, Jiang C, Fahad M, Ni Z, Guo Y, He H. Robot-Assisted Pedestrian Regulation Based on Deep Reinforcement Learning. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:1669-1682. [PMID: 30475740 DOI: 10.1109/tcyb.2018.2878977]
Abstract
Pedestrian regulation can prevent crowd accidents and improve crowd safety in densely populated areas. Recent studies use mobile robots to regulate pedestrian flows for desired collective motion through the effect of passive human-robot interaction (HRI). This paper formulates a robot motion planning problem for the optimization of two merging pedestrian flows moving through a bottleneck exit. To address the challenge of feature representation of complex human motion dynamics under the effect of HRI, we propose using a deep neural network to model the mapping from the image input of pedestrian environments to the output of robot motion decisions. The robot motion planner is trained end-to-end using a deep reinforcement learning algorithm, which avoids hand-crafted feature detection and extraction, thus improving the learning capability for complex dynamic problems. Our proposed approach is validated in simulated experiments, and its performance is evaluated. The results demonstrate that the robot is able to find optimal motion decisions that maximize the pedestrian outflow in different flow conditions, and the pedestrian-accumulated outflow increases significantly compared to cases without robot regulation and with random robot motion.
30

Gui K, Tan UX, Liu H, Zhang D. A New Impedance Controller Based on Nonlinear Model Reference Adaptive Control for Exoskeleton Systems. INT J HUM ROBOT 2019. [DOI: 10.1142/s0219843619500208]
Abstract
Robotic exoskeletons are expected to show high compliance and low impedance in human–robot interaction (HRI). Our study introduces a novel method based on nonlinear model reference adaptive control (MRAC) to reduce the inherent impedance and replace the traditional impedance controller in HRI. The control law and adaptive law are designed according to a candidate Lyapunov function. A simple system identification and initialization method for the nonlinear MRAC is put forward, providing a set of better initial values for the controller. Simulation and experimental results show that our controller can reduce the mechanical impedance and achieve high compliance for HRI. Both adaptive control and compliance control are achieved by the proposed nonlinear MRAC framework.
Affiliation(s)
- Kai Gui, State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- U-Xuan Tan, Singapore University of Technology and Design, Singapore
- Honghai Liu, State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- Dingguo Zhang, State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, P. R. China; Department of Electronic & Electrical Engineering, University of Bath, UK

31
Abstract
The optimal tracking problem is addressed in the robotics literature using a variety of robust and adaptive control approaches. However, these schemes face implementation limitations, such as applicability in uncertain dynamical environments with completely or partially model-based control structures, complexity and integrity in discrete-time environments, and scalability in complex coupled dynamical systems. An online adaptive learning mechanism is developed to tackle these limitations and provide a generalized solution platform for a class of tracking control problems. This scheme minimizes the tracking errors and optimizes the overall dynamical behavior using simultaneous linear feedback control strategies. Reinforcement learning approaches based on value iteration are adopted to solve the underlying Bellman optimality equations. The resulting control strategies are updated in real time in an interactive manner without requiring any information about the dynamics of the underlying systems. Adaptive critics are employed to approximate the optimal value functions and the associated control strategies in real time. The proposed adaptive tracking mechanism is illustrated in simulation by controlling a flexible-wing aircraft in an uncertain aerodynamic learning environment.
32

Song R, Xie Y, Zhang Z. Data-driven finite-horizon optimal tracking control scheme for completely unknown discrete-time nonlinear systems. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.026]
33
Abstract
Cooperation between humans and robots is becoming increasingly important in our society. Consequently, there is growing interest in models that can enhance and enrich the interaction between humans and robots. A key challenge in the Human-Robot Interaction (HRI) field is to provide robots with cognitive and affective capabilities by developing architectures that let them establish empathetic relationships with users. Over the last several years, multiple models have been proposed to address this open challenge, and this work surveys the most relevant of them. In detail, it offers an overview of the architectures in the literature, focusing on three specific aspects of HRI: the development of adaptive behavioral models, the design of cognitive architectures, and the ability to establish empathy with the user. The research was conducted within two databases: Scopus and Web of Science. Accurate exclusion criteria were applied to screen the 4916 articles found; in the end, 56 articles were selected. Each work's model is evaluated, and its pros and cons are detailed by analyzing the aspects that can be improved to establish an enjoyable interaction between robots and users.
34

Hentout A, Aouache M, Maoudj A, Akli I. Human–robot interaction in industrial collaborative robotics: a literature review of the decade 2008–2017. Adv Robot 2019. [DOI: 10.1080/01691864.2019.1636714]
Affiliation(s)
- Abdelfetah Hentout, Centre de Développement des Technologies Avancées (CDTA), Algiers, Algeria; Division Productique et Robotique (DPR), Algiers, Algeria
- Mustapha Aouache, Centre de Développement des Technologies Avancées (CDTA), Algiers, Algeria; Division Telecom (DT), Algiers, Algeria
- Abderraouf Maoudj, Centre de Développement des Technologies Avancées (CDTA), Algiers, Algeria; Division Productique et Robotique (DPR), Algiers, Algeria
- Isma Akli, Centre de Développement des Technologies Avancées (CDTA), Algiers, Algeria; Division Productique et Robotique (DPR), Algiers, Algeria

35

Senoo T, Murakami K, Ishikawa M. Deformation Control of a Manipulator Based on the Zener Model. JOURNAL OF ROBOTICS AND MECHATRONICS 2019. [DOI: 10.20965/jrm.2019.p0263]
Abstract
In this study, passive dynamic control of a manipulator is designed and realized. According to the control strategy, the shift in the position and orientation of an end effector attributable to an external force is regarded as deformation of the robot. The Zener model, known as a standard linear solid model, is used to generate the deformable behavior, which describes the combination of plastic and elastic deformation. Based on the relation analysis between the Zener model and two other deformable models, two types of control methods are proposed in terms of the model’s expression. Physical simulations with a robotic arm are executed to validate the proposed control laws.
36

Yu J, Ji J, Miao Z, Zhou J. Neural network-based region reaching formation control for multi-robot systems in obstacle environment. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.12.051]
37

Improved learning algorithm for two-layer neural networks for identification of nonlinear systems. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.10.008]
38

Ghannadi B, Sharif Razavian R, McPhee J. Configuration-Dependent Optimal Impedance Control of an Upper Extremity Stroke Rehabilitation Manipulandum. Front Robot AI 2018; 5:124. [PMID: 33501003 PMCID: PMC7805823 DOI: 10.3389/frobt.2018.00124]
Abstract
Robots are becoming a popular means of rehabilitation since they can decrease the laborious work of a therapist and the associated costs while providing well-controlled, repeatable tasks. Many researchers have postulated that human motor control can be represented mathematically using optimal control theories, whereby some cost function is effectively maximized or minimized; such abilities, however, are compromised in stroke patients. In this study, a rehabilitation robot controller has been developed using optimal control theory to promote stroke rehabilitation. Despite numerous studies of control strategies for rehabilitation, only a limited number of rehabilitation robots use optimal control theory. The main idea of this work is to show that impedance control gains cannot be kept constant for optimal performance of the robot under a feedback linearization approach. Hence, a general method for the real-time, optimal impedance control of an end-effector-based rehabilitation robot is proposed. The controller is developed for a 2 degree-of-freedom upper extremity stroke rehabilitation robot and compared to a feedback linearization approach that uses the standard optimal impedance derived from covariance propagation equations. The new method assigns optimal impedance gains at each configuration of the robot while performing a rehabilitation task. The proposed controller is a linear quadratic regulator mapped from the operational space to the joint space. The parameters of the two controllers were tuned using a unified biomechatronic model of the human and robot, and the controllers' performance was compared while operating the robot under four conditions of human movement (impaired, healthy, delayed, and time-advanced) along a reference trajectory, in both simulations and experiments. Despite the idealized and approximate nature of the human-robot model, the proposed controller worked well in experiments. Simulation and experimental results showed that, compared to the standard optimal controller, the rehabilitation system with the proposed optimal controller assists more in active-assist therapy while resisting in the active-constrained case. Furthermore, in passive therapy, the proposed optimal controller keeps the position error and interaction forces in safer regions. This is the result of updating the impedance in the operational space using a linear time-variant impedance model.
39

Mu C, Wang D, He H. Data-Driven Finite-Horizon Approximate Optimal Control for Discrete-Time Nonlinear Systems Using Iterative HDP Approach. IEEE TRANSACTIONS ON CYBERNETICS 2018; 48:2948-2961. [PMID: 29028219 DOI: 10.1109/tcyb.2017.2752845]
Abstract
This paper presents a data-based finite-horizon optimal control approach for discrete-time nonlinear affine systems. Iterative adaptive dynamic programming (ADP) is used to approximately solve the Hamilton-Jacobi-Bellman equation by minimizing the cost function in finite time. The idea is implemented with heuristic dynamic programming (HDP) involving a model network, so that the iterative control at the first step can be obtained without knowledge of the system function; meanwhile, the action network is used to obtain the approximate optimal control law, and the critic network is utilized to approximate the optimal cost function. The convergence of the iterative ADP algorithm and the stability of the weight estimation errors based on the HDP structure are analyzed in depth. Finally, two simulation examples are provided to demonstrate the theoretical results and show the performance of the proposed method.
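For the linear-quadratic special case, the backward Bellman recursion that iterative ADP approximates for general nonlinear systems can be solved exactly. A scalar sketch of that textbook special case (not the paper's neural-network implementation) is:

```python
def finite_horizon_lqr(a, b, q, r, horizon):
    """Backward dynamic-programming (Riccati) recursion for the scalar
    system x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2.

    Returns the time-varying feedback gains for u[k] = -K[k]*x[k].
    """
    p = q            # terminal cost weight
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)   # optimal gain at this stage
        p = q + a * p * a - a * p * b * k   # Riccati backward step
        gains.append(k)
    gains.reverse()                         # gains[0] applies at k = 0
    return gains

# Hypothetical unstable scalar plant; early gains settle to the
# stationary (infinite-horizon) value, late gains shrink.
gains = finite_horizon_lqr(a=1.1, b=1.0, q=1.0, r=1.0, horizon=20)
```

The point of the finite horizon shows up in the gain schedule: far from the terminal time the recursion has converged to the stationary gain, while near the end the gains drop because future cost no longer accumulates.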
40

Decentralized robust optimal control for modular robot manipulators via critic-identifier structure-based adaptive dynamic programming. Neural Comput Appl 2018. [DOI: 10.1007/s00521-018-3714-8]
41

42

43

Learning assistive strategies for exoskeleton robots from user-robot physical interaction. Pattern Recognit Lett 2017. [DOI: 10.1016/j.patrec.2017.04.007]
44

General value iteration based reinforcement learning for solving optimal tracking control problem of continuous-time affine nonlinear systems. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.03.038]
45

Variable Admittance Control Based on Fuzzy Reinforcement Learning for Minimally Invasive Surgery Manipulator. SENSORS 2017; 17:s17040844. [PMID: 28417944 PMCID: PMC5424721 DOI: 10.3390/s17040844]
Abstract
In order to get natural and intuitive physical interaction in the pose adjustment of the minimally invasive surgery manipulator, a hybrid variable admittance model based on Fuzzy Sarsa(λ)-learning is proposed in this paper. The proposed model provides continuous variable virtual damping to the admittance controller to respond to human intentions, and it effectively enhances the comfort level during the task execution by modifying the generated virtual damping dynamically. A fuzzy partition defined over the state space is used to capture the characteristics of the operator in physical human-robot interaction. For the purpose of maximizing the performance index in the long run, according to the identification of the current state input, the virtual damping compensations are determined by a trained strategy which can be learned through the experience generated from interaction with humans, and the influence caused by humans and the changing dynamics in the robot are also considered in the learning process. To evaluate the performance of the proposed model, some comparative experiments in joint space are conducted on our experimental minimally invasive surgical manipulator.
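The admittance relation that such a controller modulates can be sketched as follows. The `variable_damping` schedule is a hypothetical stand-in for the learned Fuzzy Sarsa(λ) policy, and all numeric parameters are assumptions for illustration.

```python
def admittance_step(f_ext, v, damping, mass=2.0, dt=0.01):
    """One Euler step of the admittance model  m*dv/dt + d*v = f_ext.

    f_ext   : measured human interaction force (N)
    v       : current commanded velocity (m/s)
    damping : virtual damping d (N*s/m), the quantity varied online
    Returns the next commanded velocity.
    """
    dv = (f_ext - damping * v) / mass
    return v + dv * dt

def variable_damping(v, d_min=5.0, d_max=40.0, v_ref=0.2):
    """Hypothetical stand-in for the learned policy: high damping at low
    speed (precise positioning), low damping at high speed (easy motion)."""
    scale = max(0.0, 1.0 - abs(v) / v_ref)
    return d_min + (d_max - d_min) * scale

v = 0.0
for _ in range(200):                 # constant 5 N human push for 2 s
    v = admittance_step(5.0, v, variable_damping(v))
```

Lowering the virtual damping as the operator speeds up lets the same force produce a larger velocity, which is the comfort effect the abstract attributes to the learned damping compensation.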
46

Erden MS, Billard A. Robotic Assistance by Impedance Compensation for Hand Movements While Manual Welding. IEEE TRANSACTIONS ON CYBERNETICS 2016; 46:2459-2472. [PMID: 26452294 DOI: 10.1109/tcyb.2015.2478656]
Abstract
In this paper, we present a robotic assistance scheme which allows for impedance compensation with stiffness, damping, and mass parameters for hand manipulation tasks and we apply it to manual welding. The impedance compensation does not assume a preprogrammed hand trajectory. Rather, the intention of the human for the hand movement is estimated in real time using a smooth Kalman filter. The movement is restricted by compensatory virtual impedance in the directions perpendicular to the estimated direction of movement. With airbrush painting experiments, we test three sets of values for the impedance parameters as inspired from impedance measurements with manual welding. We apply the best of the tested sets for assistance in manual welding and perform welding experiments with professional and novice welders. We contrast three conditions: 1) welding with the robot's assistance; 2) with the robot when the robot is passive; and 3) welding without the robot. We demonstrate the effectiveness of the assistance through quantitative measures of both task performance and perceived user's satisfaction. The performance of both the novice and professional welders improves significantly with robotic assistance compared to welding with a passive robot. The assessment of user satisfaction shows that all novice and most professional welders appreciate the robotic assistance as it suppresses the tremors in the directions perpendicular to the movement for welding.
47

Modares H, Lewis FL, Jiang ZP. H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2015; 26:2550-2562. [PMID: 26111401 DOI: 10.1109/tnnls.2015.2441749]
Abstract
This paper deals with the design of an H ∞ tracking controller for nonlinear continuous-time systems with completely unknown dynamics. A general bounded L2 -gain tracking problem with a discounted performance function is introduced for the H ∞ tracking. A tracking Hamilton-Jacobi-Isaac (HJI) equation is then developed that gives a Nash equilibrium solution to the associated min-max optimization problem. A rigorous analysis of bounded L2 -gain and stability of the control solution obtained by solving the tracking HJI equation is provided. An upper-bound is found for the discount factor to assure local asymptotic stability of the tracking error dynamics. An off-policy reinforcement learning algorithm is used to learn the solution to the tracking HJI equation online without requiring any knowledge of the system dynamics. Convergence of the proposed algorithm to the solution to the tracking HJI equation is shown. Simulation examples are provided to verify the effectiveness of the proposed method.