1
Zhou Z, Lu Y, Tortós PE, Qin R, Kokubu S, Matsunaga F, Xie Q, Yu W. Addressing data imbalance in Sim2Real: ImbalSim2Real scheme and its application in finger joint stiffness self-sensing for soft robot-assisted rehabilitation. Front Bioeng Biotechnol 2024;12:1334643. [PMID: 38948382] [PMCID: PMC11212110] [DOI: 10.3389/fbioe.2024.1334643]
Abstract
The simulation-to-reality (sim2real) problem is a common issue when deploying simulation-trained models to real-world scenarios, especially given the extreme imbalance between simulation and real-world data (scarce real-world data). Although the cycle-consistent generative adversarial network (CycleGAN) has shown promise on some sim2real issues, it struggles under data imbalance because of the limited capacity of the discriminator and the indeterminacy of the learned sim2real mapping. To overcome these problems, we proposed the imbalanced Sim2Real scheme (ImbalSim2Real). Unlike CycleGAN, the ImbalSim2Real scheme segments the dataset into paired and unpaired data for two-fold training. The unpaired data incorporate discriminator-enhanced samples that further shrink the solution space of the discriminator, enhancing its discriminative ability. For the paired data, a targeted regression loss term is integrated to ensure a specific, quantitative mapping and to further reduce the solution space of the generator. The ImbalSim2Real scheme was validated through numerical experiments, demonstrating its superiority over conventional sim2real methods. As an application of the proposed scheme, we designed a finger joint stiffness self-sensing framework in which the validation loss for estimating real-world finger joint stiffness was reduced by roughly 41% compared with supervised learning trained on scarce real-world data, and by 56% relative to CycleGAN trained on the imbalanced dataset. The proposed scheme and framework are potentially applicable to bio-signal estimation under imbalanced sim2real conditions.
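The two-fold training described in the abstract pairs an adversarial term (from unpaired data) with a regression term (from the scarce paired data). A minimal numeric sketch of such a combined generator objective, assuming an MSE form for the targeted regression loss and treating the adversarial term as a precomputed scalar; the function names and the weight `lam` are illustrative, not the paper's definitions:

```python
import numpy as np

def targeted_regression_loss(pred, target):
    """MSE over the scarce paired subset (hypothetical form; the
    paper's exact loss definition may differ)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def generator_objective(adversarial_loss, pred_paired, target_paired, lam=10.0):
    """Two-fold objective: adversarial term learned from unpaired data
    plus a weighted regression term that pins the sim2real mapping to
    the paired samples. `lam` is an illustrative weight."""
    return adversarial_loss + lam * targeted_regression_loss(pred_paired, target_paired)
```

The regression term is what removes the mapping indeterminacy: with paired data in the objective, the generator can no longer satisfy the discriminator with an arbitrary sim-to-real correspondence.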
Affiliation(s)
- Zhongchao Zhou: Department of Medical System Engineering, Chiba University, Chiba, Japan
- Yuxi Lu: Department of Medical System Engineering, Chiba University, Chiba, Japan
- Ruian Qin: Department of Medical System Engineering, Chiba University, Chiba, Japan
- Shota Kokubu: Department of Medical System Engineering, Chiba University, Chiba, Japan
- Fuko Matsunaga: Department of Medical System Engineering, Chiba University, Chiba, Japan
- Qiaolian Xie: Department of Medical System Engineering, Chiba University, Chiba, Japan; Institute of Rehabilitation Engineering and Technology, University of Shanghai for Science and Technology, Shanghai, China
- Wenwei Yu: Department of Medical System Engineering, Chiba University, Chiba, Japan; Center for Frontier Medical Engineering, Chiba University, Chiba, Japan
2
Haarnoja T, Moran B, Lever G, Huang SH, Tirumala D, Humplik J, Wulfmeier M, Tunyasuvunakool S, Siegel NY, Hafner R, Bloesch M, Hartikainen K, Byravan A, Hasenclever L, Tassa Y, Sadeghi F, Batchelor N, Casarini F, Saliceti S, Game C, Sreendra N, Patel K, Gwira M, Huber A, Hurley N, Nori F, Hadsell R, Heess N. Learning agile soccer skills for a bipedal robot with deep reinforcement learning. Sci Robot 2024;9:eadi8022. [PMID: 38598610] [DOI: 10.1126/scirobotics.adi8022]
Abstract
We investigated whether deep reinforcement learning (deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies. We used deep RL to train a humanoid robot to play a simplified one-versus-one soccer game. The resulting agent exhibits robust and dynamic movement skills, such as rapid fall recovery, walking, turning, and kicking, and it transitions between them in a smooth and efficient manner. It also learned to anticipate ball movements and block opponent shots. The agent's tactical behavior adapts to specific game contexts in a way that would be impractical to manually design. Our agent was trained in simulation and transferred to real robots zero-shot. A combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training enabled good-quality transfer. In experiments, the agent walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline.
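The "targeted dynamics randomization" mentioned in the abstract means resampling the simulator's physical parameters every episode so the policy cannot overfit to one set of dynamics. A minimal sketch of per-episode randomization; the parameter names and ranges are illustrative, not the values used for the soccer agent:

```python
import random

def randomize_dynamics(base_params, rel_ranges, rng):
    """Draw per-episode dynamics by scaling each nominal parameter with a
    factor sampled uniformly in [1 - r, 1 + r]. Ranges are illustrative."""
    return {name: value * rng.uniform(1.0 - rel_ranges[name], 1.0 + rel_ranges[name])
            for name, value in base_params.items()}

# Hypothetical nominal dynamics and relative randomization ranges.
nominal = {"torso_mass_kg": 3.5, "ground_friction": 0.8, "joint_damping": 0.05}
ranges = {"torso_mass_kg": 0.10, "ground_friction": 0.30, "joint_damping": 0.20}
episode_dynamics = randomize_dynamics(nominal, ranges, random.Random(0))
```

Training across many such draws (plus external perturbations) is what made the zero-shot transfer to the physical robot work without any real-world fine-tuning.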
Affiliation(s)
- Dhruva Tirumala: Google DeepMind, London, UK; University College London, London, UK
- Neil Sreendra: Google DeepMind, London, UK; Proactive Global, London, UK
- Kushal Patel: Google DeepMind, London, UK; Proactive Global, London, UK
- Marlon Gwira: Google DeepMind, London, UK; Proactive Global, London, UK
3
Hsu KC, Ren AZ, Nguyen DP, Majumdar A, Fisac JF. Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees. Artif Intell 2022. [DOI: 10.1016/j.artint.2022.103811]
4
Analysis of Mobile Robot Control by Reinforcement Learning Algorithm. Electronics 2022. [DOI: 10.3390/electronics11111754]
Abstract
This work presents a Deep Reinforcement Learning algorithm to control a differentially driven mobile robot. The study seeks to explain how different definitions of the environment around a mobile robot influence the learning process. We focus on Deep Deterministic Policy Gradient (DDPG), a Reinforcement Learning algorithm applicable to continuous action problems, and investigate the effectiveness of different noises, inputs, and cost functions in the neural network learning process. To examine the features of the presented algorithm, a number of simulations were run, and their results are presented. In the simulations, the mobile robot had to reach a target position while minimizing distance error. Our goal was to optimize the learning process; by analyzing the results, we aim to recommend a more efficient choice of inputs and cost functions for future research.
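The goal-reaching task above is driven by a distance-error cost, which DDPG needs as a continuous reward signal. A minimal sketch of one common shaping choice (negative Euclidean distance to the goal); the tolerance and function names are illustrative, since the paper compares several cost-function variants:

```python
import math

def distance_error_reward(robot_xy, target_xy):
    """Negative Euclidean distance to the target: a common reward
    shaping for goal-reaching with continuous-action RL such as DDPG."""
    return -math.dist(robot_xy, target_xy)

def reached(robot_xy, target_xy, tol=0.05):
    """Episode-termination check within an illustrative tolerance (meters)."""
    return math.dist(robot_xy, target_xy) <= tol
```

Because the reward grows smoothly as the robot approaches the goal, the critic receives a dense gradient at every step, which is what makes this family of cost functions attractive for continuous control.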