1. Zhang S, Tang W, Li P, Zha F. Mapless Path Planning for Mobile Robot Based on Improved Deep Deterministic Policy Gradient Algorithm. Sensors (Basel) 2024; 24:5667. PMID: 39275578; PMCID: PMC11398278; DOI: 10.3390/s24175667.
Abstract
With the traditional Deep Deterministic Policy Gradient (DDPG) algorithm, path planning for mobile robots in mapless environments still encounters challenges in learning efficiency and navigation performance, particularly adaptability and robustness to static and dynamic obstacles. To address these issues, this study proposes an improved algorithm framework that redesigns the state and action spaces, introduces a multi-step update strategy and a dual-noise mechanism, and improves the reward function. These improvements significantly enhance the algorithm's learning efficiency and navigation performance, rendering it more adaptable and robust in complex mapless environments. Compared to the traditional DDPG algorithm, the improved algorithm shows a 20% increase in the stability of the navigation success rate with static obstacles, along with a 25% reduction in pathfinding steps for smoother paths. In environments with dynamic obstacles, there is a remarkable 45% improvement in success rate. Real-world mobile robot tests further validated the feasibility and effectiveness of the algorithm in true mapless environments.
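As an illustration of the kind of dual-noise exploration the abstract mentions (a minimal generic sketch, not the authors' exact formulation; the noise scales and the two-dimensional velocity action are assumed values):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""
    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1e-2):
        self.dim, self.theta, self.sigma, self.dt = dim, theta, sigma, dt
        self.x = np.zeros(dim)

    def sample(self):
        self.x += self.theta * (-self.x) * self.dt + \
                  self.sigma * np.sqrt(self.dt) * np.random.randn(self.dim)
        return self.x

def explore(action, ou, gauss_sigma=0.1, low=-1.0, high=1.0):
    """Dual-noise exploration: add correlated OU noise plus white Gaussian noise."""
    noisy = action + ou.sample() + np.random.normal(0.0, gauss_sigma, size=action.shape)
    return np.clip(noisy, low, high)

ou = OUNoise(dim=2)                    # e.g. (linear velocity, angular velocity)
a = explore(np.array([0.5, 0.0]), ou)  # deterministic policy output -> exploratory action
print(a)
```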
Affiliation(s)
- Shuzhen Zhang: School of Mechanical and Electrical Engineering, Lanzhou University of Technology, Lanzhou 730000, China
- Wei Tang: School of Mechanical and Electrical Engineering, Lanzhou University of Technology, Lanzhou 730000, China
- Panpan Li: School of Mechanical and Electrical Engineering, Lanzhou University of Technology, Lanzhou 730000, China
- Fusheng Zha: State Key Laboratory of Robotics and System (HIT), Harbin Institute of Technology, Harbin 150001, China

2. Ou Y, Cai Y, Sun Y, Qin T. Autonomous Navigation by Mobile Robot with Sensor Fusion Based on Deep Reinforcement Learning. Sensors (Basel) 2024; 24:3895. PMID: 38931679; PMCID: PMC11207251; DOI: 10.3390/s24123895.
Abstract
In the domain of mobile robot navigation, conventional path-planning algorithms typically rely on predefined rules and prior map information, which exhibit significant limitations when confronting unknown, intricate environments. With the rapid evolution of artificial intelligence technology, deep reinforcement learning (DRL) algorithms have demonstrated considerable effectiveness across various application scenarios. In this investigation, we introduce a self-exploration and navigation approach based on a deep reinforcement learning framework, aimed at resolving the navigation challenges of mobile robots in unfamiliar environments. Firstly, we fuse data from the robot's onboard lidar sensors and camera and integrate odometer readings with target coordinates to establish the instantaneous state of the decision environment. Subsequently, a deep neural network processes these composite inputs to generate motion control strategies, which are then integrated into the local planning component of the robot's navigation stack. Finally, we employ an innovative heuristic function capable of synthesizing map information and global objectives to select the optimal local navigation points, thereby guiding the robot progressively toward its global target point. In practical experiments, our methodology demonstrates superior performance compared to similar navigation methods in complex, unknown environments devoid of predefined map information.
Affiliation(s)
- Yang Ou: School of Computer and Electronic Information, Guangxi University, Nanning 530004, China; The Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
- Yiyi Cai: School of Computer and Electronic Information, Guangxi University, Nanning 530004, China; The Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China; School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China
- Youming Sun: School of Computer and Electronic Information, Guangxi University, Nanning 530004, China; The Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
- Tuanfa Qin: School of Computer and Electronic Information, Guangxi University, Nanning 530004, China; The Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China; School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China

3. Li P, Chen D, Wang Y, Zhang L, Zhao S. Path planning of mobile robot based on improved TD3 algorithm in dynamic environment. Heliyon 2024; 10:e32167. PMID: 38912483; PMCID: PMC11190599; DOI: 10.1016/j.heliyon.2024.e32167.
Abstract
This paper proposes an improved TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm to address the low success rate and slow training speed of the original TD3 algorithm in mobile robot path planning in dynamic environments. Firstly, prioritized experience replay and transfer learning are introduced to enhance learning efficiency: the probability of beneficial experiences being sampled from the experience pool is increased, and a model pre-trained in an obstacle-free environment is used as the initial model for training in the dynamic environment. Secondly, a dynamic delay update strategy is devised and OU noise is added to improve the success rate of path planning: the probability of missing high-quality value estimates is reduced by changing the delay update interval dynamically, and the OU noise provides temporally correlated exploration suited to the inertia of the mobile robot in the dynamic environment. The algorithm is tested in simulation, with the TurtleBot3 robot model as the training object and the ROS Melodic operating system and Gazebo simulation software as the experimental environment. The results show that the improved TD3 algorithm achieves a 16.6% increase in success rate and a 23.5% reduction in training time. Finally, a generalization experiment was designed, which indicates that the improved TD3 algorithm achieves superior generalization performance in mobile robot path planning with continuous action spaces.
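The "dynamic delay update strategy" is described only at a high level; a hedged sketch of one way to vary TD3's policy-update delay with the recent critic TD error (the schedule and thresholds below are assumptions for illustration, not the paper's rule) might look like:

```python
def dynamic_policy_delay(td_error, base_delay=2, max_delay=6, low=0.05, high=0.5):
    """Map the recent critic TD error to a policy-update interval.

    Small TD error -> value estimates look reliable -> update the actor more often;
    large TD error -> wait longer so the actor does not chase noisy Q-values.
    """
    if td_error < low:
        return base_delay
    if td_error > high:
        return max_delay
    frac = (td_error - low) / (high - low)   # linear interpolation between extremes
    return int(round(base_delay + frac * (max_delay - base_delay)))

# usage inside a training loop (sketch):
#   delay = dynamic_policy_delay(recent_td_error)
#   if step % delay == 0:
#       update_actor_and_targets()
print(dynamic_policy_delay(0.3))  # -> an interval between 2 and 6
```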
Affiliation(s)
- Peng Li: College of Intelligent Systems Science and Engineering, Harbin Engineering University, No.145 Nantong Street, Harbin, Heilongjiang Province, 15001, China
- Donghui Chen: College of Intelligent Systems Science and Engineering, Harbin Engineering University, No.145 Nantong Street, Harbin, Heilongjiang Province, 15001, China
- Yuchen Wang: College of Intelligent Systems Science and Engineering, Harbin Engineering University, No.145 Nantong Street, Harbin, Heilongjiang Province, 15001, China
- Lanyong Zhang: College of Intelligent Systems Science and Engineering, Harbin Engineering University, No.145 Nantong Street, Harbin, Heilongjiang Province, 15001, China
- Shiquan Zhao: College of Intelligent Systems Science and Engineering, Harbin Engineering University, No.145 Nantong Street, Harbin, Heilongjiang Province, 15001, China

4. Huang B, Xie J, Yan J. Inspection Robot Navigation Based on Improved TD3 Algorithm. Sensors (Basel) 2024; 24:2525. PMID: 38676143; PMCID: PMC11053717; DOI: 10.3390/s24082525.
Abstract
The swift advancements in robotics have rendered navigation an essential task for mobile robots. While map-based navigation methods depend on global environmental maps for decision-making, their efficacy in unfamiliar or dynamic settings falls short. Current deep reinforcement learning navigation strategies can navigate successfully without pre-existing map data, yet they grapple with issues like inefficient training, slow convergence, and infrequent rewards. To tackle these challenges, this study introduces an improved twin-delayed deep deterministic policy gradient algorithm (LP-TD3) for local planning navigation. Initially, the long short-term memory (LSTM) module and the Prioritized Experience Replay (PER) mechanism are integrated into the existing TD3 framework to optimize training and improve the efficiency of experience data utilization. Furthermore, an Intrinsic Curiosity Module (ICM) merges intrinsic with extrinsic rewards to tackle sparse reward problems and enhance exploratory behavior. Experimental evaluations using the ROS and Gazebo simulators demonstrate that the proposed method outperforms the original on various performance metrics.
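To make concrete how an Intrinsic Curiosity Module can be merged with the environment reward (a generic sketch, not LP-TD3's exact design; the feature dimension and the scaling factor eta are assumed hyperparameters):

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next state feature from the current feature and action."""
    def __init__(self, feat_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, feat, action):
        return self.net(torch.cat([feat, action], dim=-1))

def curiosity_reward(fwd, feat, action, next_feat, eta=0.1):
    """Intrinsic reward = scaled prediction error of the forward model."""
    pred = fwd(feat, action)
    return eta * 0.5 * (pred - next_feat).pow(2).sum(dim=-1)

# total reward used for the TD3 update (sketch): r_total = r_extrinsic + r_intrinsic
fwd = ForwardModel(feat_dim=16, act_dim=2)
r_int = curiosity_reward(fwd, torch.zeros(1, 16), torch.zeros(1, 2), torch.ones(1, 16))
print(r_int)
```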
Affiliation(s)
- Bo Huang: School of Mechanical Engineering, Sichuan University of Science and Engineering, Zigong 643099, China
- Jiacheng Xie: School of Mechanical Engineering, Sichuan University of Science and Engineering, Zigong 643099, China

5. Yao Z, Zhao C, Zhang T. Agricultural machinery automatic navigation technology. iScience 2024; 27:108714. PMID: 38292432; PMCID: PMC10827555; DOI: 10.1016/j.isci.2023.108714.
Abstract
In this paper, we review, compare, and analyze previous studies on agricultural machinery automatic navigation and path planning technologies. First, the paper introduces the fundamental components of agricultural machinery autonomous driving, including automatic navigation, path planning, control systems, and communication modules. Generally, the methods for automatic navigation technology can be divided into three categories: Global Navigation Satellite System (GNSS), Machine Vision, and Laser Radar. The structures, advantages, and disadvantages of different methods and the technical difficulties of current research are summarized and compared. At present, the more successful way is to use GNSS combined with machine vision to provide guarantee for agricultural machinery to avoid obstacles and generate the optimal path. Then the path planning methods are described, including four path planning algorithms based on graph search, sampling, optimization, and learning. This paper proposes 22 available algorithms according to different application scenarios and summarizes the challenges and difficulties that have not been completely solved in the current research. Finally, some suggestions on the difficulties arising in these studies are proposed for further research.
Affiliation(s)
- Zhixin Yao: College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China; Engineering Research Center of Intelligent Agriculture, Ministry of Education, Urumqi 830052, China
- Chunjiang Zhao: College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100083, China
- Taihong Zhang: College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China; Engineering Research Center of Intelligent Agriculture, Ministry of Education, Urumqi 830052, China

6. Tanaka T, Malki H. A Deep Learning Approach to Lunar Rover Global Path Planning Using Environmental Constraints and the Rover Internal Resource Status. Sensors (Basel) 2024; 24:844. PMID: 38339561; PMCID: PMC10857624; DOI: 10.3390/s24030844.
Abstract
This research proposes a novel approach to global path and resource planning for lunar rovers. The proposed method incorporates a range of constraints, including static, time-variant, and path-dependent factors related to environmental conditions and the rover's internal resource status. These constraints are integrated into a grid map as a penalty function, and a reinforcement learning-based framework is employed to address the resource constrained shortest path problem (RCSP). Compared to existing approaches referenced in the literature, our proposed method enables the simultaneous consideration of a broader spectrum of constraints. This enhanced flexibility leads to improved path search optimality. To evaluate the performance of our approach, this research applied the proposed learning architecture to lunar rover path search problems, generated based on real lunar digital elevation data. The simulation results demonstrate that our architecture successfully identifies a rover path while consistently adhering to user-defined environmental and rover resource safety criteria across all positions and time epochs. Furthermore, the simulation results indicate that our approach surpasses conventional methods that solely rely on environmental constraints.
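A minimal illustration of folding environmental and resource constraints into a grid map as a penalty term, as the abstract describes in general terms (the weights, thresholds, and penalty shapes below are assumptions for illustration, not the paper's calibrated values):

```python
import numpy as np

def penalized_cost(slope_deg, shadow_frac, battery_soc,
                   w_slope=0.2, w_shadow=1.0, soc_min=0.2, big=1e3):
    """Step cost for one grid cell: base move cost plus soft penalties,
    with a large penalty when a hard safety limit is violated."""
    cost = 1.0                                    # nominal cost of traversing a cell
    cost += w_slope * max(0.0, slope_deg - 10.0)  # penalize slopes above 10 degrees
    cost += w_shadow * shadow_frac                # penalize shadowed (no-solar) cells
    if battery_soc < soc_min:                     # resource constraint: low state of charge
        cost += big
    return cost

# toy 2x2 map of (slope in degrees, shadow fraction) pairs at 80% state of charge
cells = [[(5, 0.0), (15, 0.3)], [(2, 1.0), (25, 0.0)]]
grid_cost = np.array([[penalized_cost(s, sh, 0.8) for s, sh in row] for row in cells])
print(grid_cost)
```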
Affiliation(s)
- Toshiki Tanaka: Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004, USA
- Heidar Malki: Department of Engineering Technology, University of Houston, Houston, TX 77004, USA

7. Zhang Y, Chen P. Path Planning of a Mobile Robot for a Dynamic Indoor Environment Based on an SAC-LSTM Algorithm. Sensors (Basel) 2023; 23:9802. PMID: 38139648; PMCID: PMC10747912; DOI: 10.3390/s23249802.
Abstract
This paper proposes an improved Soft Actor-Critic Long Short-Term Memory (SAC-LSTM) algorithm for fast path planning of mobile robots in dynamic environments. To achieve continuous motion and better decision making by incorporating historical and current states, a long short-term memory network (LSTM) with memory was integrated into the SAC algorithm. To mitigate the memory depreciation issue caused by resetting the LSTM's hidden states to zero during training, a burn-in training method was adopted to boost the performance. Moreover, a prioritized experience replay mechanism was implemented to enhance sampling efficiency and speed up convergence. Based on the SAC-LSTM framework, a motion model for the Turtlebot3 mobile robot was established by designing the state space, action space, reward function, and overall planning process. Three simulation experiments were conducted in obstacle-free, static obstacle, and dynamic obstacle environments using the ROS platform and Gazebo9 software. The results were compared with the SAC algorithm. In all scenarios, the SAC-LSTM algorithm demonstrated a faster convergence rate and a higher path planning success rate, registering a significant 10.5 percentage point improvement in the success rate of reaching the target point in the dynamic obstacle environment. Additionally, the time taken for path planning was shorter, and the planned paths were more concise.
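The burn-in idea mentioned for SAC-LSTM can be sketched generically: replay a short prefix of each sampled sequence only to warm up the LSTM hidden state, then compute losses on the remainder (a minimal sketch; the sequence length, feature size, and burn-in length are assumed values):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)

def burn_in_forward(seq, burn_in=10):
    """seq: (batch, time, features). Warm up the hidden state on the first
    `burn_in` steps without gradients, then run the training segment."""
    with torch.no_grad():                      # burn-in: no gradient, just state warm-up
        _, hidden = lstm(seq[:, :burn_in])
    train_out, _ = lstm(seq[:, burn_in:], hidden)
    return train_out                           # fed to the actor/critic losses

out = burn_in_forward(torch.randn(4, 40, 8))
print(out.shape)  # (4, 30, 32)
```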
Affiliation(s)
- Pengzhan Chen: School of Intelligent Manufacturing, Taizhou University, Taizhou 318000, China

8. Jeng SL, Chiang C. End-to-End Autonomous Navigation Based on Deep Reinforcement Learning with a Survival Penalty Function. Sensors (Basel) 2023; 23:8651. PMID: 37896743; PMCID: PMC10610759; DOI: 10.3390/s23208651.
Abstract
An end-to-end approach to autonomous navigation that is based on deep reinforcement learning (DRL) with a survival penalty function is proposed in this paper. Two actor-critic (AC) frameworks, namely, deep deterministic policy gradient (DDPG) and twin-delayed DDPG (TD3), are employed to enable a nonholonomic wheeled mobile robot (WMR) to perform navigation in dynamic environments containing obstacles and for which no maps are available. A comprehensive reward based on the survival penalty function is introduced; this approach effectively solves the sparse reward problem and enables the WMR to move toward its target. Consecutive episodes are connected to increase the cumulative penalty for scenarios involving obstacles; this method prevents training failure and enables the WMR to plan a collision-free path. Simulations are conducted for four scenarios-movement in an obstacle-free space, in a parking lot, at an intersection without and with a central obstacle, and in a multiple obstacle space-to demonstrate the efficiency and operational safety of our method. For the same navigation environment, compared with the DDPG algorithm, the TD3 algorithm exhibits faster numerical convergence and higher stability in the training phase, as well as a higher task execution success rate in the evaluation phase.
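A hedged sketch of what a survival-penalty style reward might look like (the constants and terms below are illustrative assumptions, not the values or exact form used in the paper):

```python
def survival_penalty_reward(dist_to_goal, prev_dist, min_obstacle_dist,
                            reached, collided,
                            step_penalty=-0.05, goal_bonus=100.0, crash_cost=-100.0):
    """Dense reward: a small penalty every step (the 'survival penalty') pushes the
    robot toward the goal, progress is rewarded, and terminal events dominate."""
    if reached:
        return goal_bonus
    if collided:
        return crash_cost
    progress = prev_dist - dist_to_goal                    # positive when approaching the goal
    proximity = -0.1 if min_obstacle_dist < 0.3 else 0.0   # discourage hugging obstacles
    return step_penalty + 5.0 * progress + proximity

print(survival_penalty_reward(2.0, 2.1, 0.8, False, False))
```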
Affiliation(s)
- Shyr-Long Jeng: Department of Mechanical Engineering, Lunghwa University of Science and Technology, Taoyuan City 333326, Taiwan
- Chienhsun Chiang: Department of Mechanical Engineering, National Yang Ming Chiao Tung University, Hsinchu City 300093, Taiwan

9. Zhao T, Wang M, Zhao Q, Zheng X, Gao H. A Path-Planning Method Based on Improved Soft Actor-Critic Algorithm for Mobile Robots. Biomimetics (Basel) 2023; 8:481. PMID: 37887612; PMCID: PMC10604071; DOI: 10.3390/biomimetics8060481.
Abstract
The path planning problem has gained more attention due to the gradual popularization of mobile robots. The utilization of reinforcement learning techniques facilitates the ability of mobile robots to successfully navigate through an environment containing obstacles and effectively plan their path. This is achieved by the robots' interaction with the environment, even in situations when the environment is unfamiliar. Consequently, we provide a refined deep reinforcement learning algorithm that builds upon the soft actor-critic (SAC) algorithm, incorporating the concept of maximum entropy for the purpose of path planning. The objective of this strategy is to mitigate the constraints inherent in conventional reinforcement learning, enhance the efficacy of the learning process, and accommodate intricate situations. In the context of reinforcement learning, two significant issues arise: inadequate incentives and inefficient sample use during the training phase. To address these challenges, the hindsight experience replay (HER) mechanism has been presented as a potential solution. The HER mechanism aims to enhance algorithm performance by effectively reusing past experiences. Through the utilization of simulation studies, it can be demonstrated that the enhanced algorithm exhibits superior performance in comparison with the pre-existing method.
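To show the general mechanism of hindsight experience replay referenced here (a generic relabeling sketch under an assumed transition format, not the paper's implementation):

```python
def her_relabel(episode, compute_reward):
    """Relabel a (possibly failed) episode with the finally achieved state as the
    goal, so that every episode yields useful learning signal."""
    achieved_goal = episode[-1]["achieved_next"]        # where the robot actually ended up
    relabeled = []
    for t in episode:
        r = compute_reward(t["achieved_next"], achieved_goal)
        relabeled.append({**t, "goal": achieved_goal, "reward": r})
    return relabeled

# toy usage: a two-step episode with sparse 0/-1 reward after relabeling
ep = [{"achieved": (0, 0), "achieved_next": (1, 0)},
      {"achieved": (1, 0), "achieved_next": (1, 1)}]
print(her_relabel(ep, lambda a, g: 0.0 if a == g else -1.0))
```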
Affiliation(s)
- Tinglong Zhao: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
- Ming Wang: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
- Qianchuan Zhao: Department of Automation, Tsinghua University, Beijing 100018, China
- Xuehan Zheng: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
- He Gao: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China; Shandong Zhengchen Technology Co., Ltd., Jinan 250000, China

10. Han H, Wang J, Kuang L, Han X, Xue H. Improved Robot Path Planning Method Based on Deep Reinforcement Learning. Sensors (Basel) 2023; 23:5622. PMID: 37420785; DOI: 10.3390/s23125622.
Abstract
With the advancement of robotics, the field of path planning is currently experiencing a period of prosperity. Researchers strive to address this nonlinear problem and have achieved remarkable results through the implementation of the Deep Reinforcement Learning (DRL) algorithm DQN (Deep Q-Network). However, persistent challenges remain, including the curse of dimensionality, difficulties of model convergence and sparsity in rewards. To tackle these problems, this paper proposes an enhanced DDQN (Double DQN) path planning approach, in which the information after dimensionality reduction is fed into a two-branch network that incorporates expert knowledge and an optimized reward function to guide the training process. The data generated during the training phase are initially discretized into corresponding low-dimensional spaces. An "expert experience" module is introduced to facilitate the model's early-stage training acceleration in the Epsilon-Greedy algorithm. To tackle navigation and obstacle avoidance separately, a dual-branch network structure is presented. We further optimize the reward function enabling intelligent agents to receive prompt feedback from the environment after performing each action. Experiments conducted in both virtual and real-world environments have demonstrated that the enhanced algorithm can accelerate model convergence, improve training stability and generate a smooth, shorter and collision-free path.
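One plausible reading of an "expert experience" module inside epsilon-greedy action selection is sketched below; the expert here is a stand-in suggestion, and the probabilities are assumptions rather than the authors' design:

```python
import random

def select_action(q_values, expert_action, epsilon, expert_prob=0.5):
    """Epsilon-greedy with an expert prior: when exploring, sometimes take the
    expert-suggested action instead of a uniformly random one, which can speed
    up early-stage training."""
    if random.random() < epsilon:                        # exploration branch
        if random.random() < expert_prob:
            return expert_action                         # guided exploration
        return random.randrange(len(q_values))           # random exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy branch

print(select_action([0.1, 0.7, 0.2], expert_action=1, epsilon=0.3))
```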
Affiliation(s)
- Huiyan Han: School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China; Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
- Jiaqi Wang: School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China; Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
- Liqun Kuang: School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China; Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
- Xie Han: School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China; Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
- Hongxin Xue: School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China; Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China

11. Liu C, Xie S, Sui X, Huang Y, Ma X, Guo N, Yang F. PRM-D* Method for Mobile Robot Path Planning. Sensors (Basel) 2023; 23:3512. PMID: 37050570; PMCID: PMC10098883; DOI: 10.3390/s23073512.
Abstract
Various navigation tasks involving dynamic scenarios require mobile robots to meet the requirements of a high planning success rate, fast planning, dynamic obstacle avoidance, and shortest path. PRM (probabilistic roadmap method), as one of the classical path planning methods, is characterized by simple principles, probabilistic completeness, fast planning speed, and the formation of asymptotically optimal paths, but has poor performance in dynamic obstacle avoidance. In this study, we use the idea of hierarchical planning to improve the dynamic obstacle avoidance performance of PRM by introducing D* into the network construction and planning process of PRM. To demonstrate the feasibility of the proposed method, we conducted simulation experiments using the proposed PRM-D* (probabilistic roadmap method and D*) method for maps of different complexity and compared the results with those obtained by classical methods such as SPARS2 (improving sparse roadmap spanners). The experiments demonstrate that our method is non-optimal in terms of path length but second only to graph search methods; it outperforms other methods in static planning, with an average planning time of less than 1 s, and in terms of the dynamic planning speed, our method is two orders of magnitude faster than the SPARS2 method, with a single dynamic planning time of less than 0.02 s. Finally, we deployed the proposed PRM-D* algorithm on a real vehicle for experimental validation. The experimental results show that the proposed method was able to perform the navigation task in a real-world scenario.
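For readers unfamiliar with the PRM side of PRM-D*, a compact, generic roadmap-construction sketch follows (uniform sampling plus k-nearest-neighbour connection; the map bounds, sample count, and trivially permissive collision checks are assumptions for illustration):

```python
import math
import random

def build_prm(n_samples, k, is_free, collision_free, bounds=(0.0, 10.0)):
    """Sample collision-free nodes and connect each to its k nearest neighbours."""
    lo, hi = bounds
    nodes = []
    while len(nodes) < n_samples:
        p = (random.uniform(lo, hi), random.uniform(lo, hi))
        if is_free(p):
            nodes.append(p)
    edges = {i: [] for i in range(len(nodes))}
    for i, p in enumerate(nodes):
        nearest = sorted(range(len(nodes)), key=lambda j: math.dist(p, nodes[j]))
        for j in nearest[1:k + 1]:                    # skip index 0 (the node itself)
            if collision_free(p, nodes[j]):
                edges[i].append(j)
    return nodes, edges

# toy usage: an empty map where every point and every segment is free
nodes, edges = build_prm(50, 5, lambda p: True, lambda a, b: True)
print(len(nodes), sum(len(v) for v in edges.values()))
```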
Affiliation(s)
- Chunyang Liu: School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China; Longmen Laboratory, Luoyang 471000, China
- Saibao Xie: School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Xin Sui: School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China; Key Laboratory of Mechanical Design and Transmission System of Henan Province, Henan University of Science and Technology, Luoyang 471003, China
- Yan Huang: School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Xiqiang Ma: School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China; Longmen Laboratory, Luoyang 471000, China
- Nan Guo: School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Fang Yang: School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China; Longmen Laboratory, Luoyang 471000, China

12. Sánchez M, Morales J, Martínez JL. Reinforcement and Curriculum Learning for Off-Road Navigation of an UGV with a 3D LiDAR. Sensors (Basel) 2023; 23:3239. PMID: 36991950; PMCID: PMC10057611; DOI: 10.3390/s23063239.
Abstract
This paper presents the use of deep Reinforcement Learning (RL) for autonomous navigation of an Unmanned Ground Vehicle (UGV) with an onboard three-dimensional (3D) Light Detection and Ranging (LiDAR) sensor in off-road environments. For training, both the robotic simulator Gazebo and the Curriculum Learning paradigm are applied. Furthermore, an Actor-Critic Neural Network (NN) scheme is chosen with a suitable state and a custom reward function. To employ the 3D LiDAR data as part of the input state of the NNs, a virtual two-dimensional (2D) traversability scanner is developed. The resulting Actor NN has been successfully tested in both real and simulated experiments and favorably compared with a previous reactive navigation approach on the same UGV.

13. Park M, Lee SY, Hong JS, Kwon NK. Deep Deterministic Policy Gradient-Based Autonomous Driving for Mobile Robots in Sparse Reward Environments. Sensors (Basel) 2022; 22:9574. PMID: 36559941; PMCID: PMC9787388; DOI: 10.3390/s22249574.
Abstract
In this paper, we propose a deep deterministic policy gradient (DDPG)-based path-planning method for mobile robots by applying the hindsight experience replay (HER) technique to overcome the performance degradation resulting from sparse reward problems occurring in autonomous driving mobile robots. The mobile robot in our analysis was a robot operating system-based TurtleBot3, and the experimental environment was a virtual simulation based on Gazebo. A fully connected neural network was used as the DDPG network based on the actor-critic architecture. Noise was added to the actor network. The robot recognized an unknown environment by measuring distances using a laser sensor and determined the optimized policy to reach its destination. The HER technique improved the learning performance by generating three new episodes with normal experience from a failed episode. The proposed method demonstrated that the HER technique could help mitigate the sparse reward problem; this was further corroborated by the successful autonomous driving results obtained after applying the proposed method to two reward systems, as well as actual experimental results.
Affiliation(s)
- Minjae Park: Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
- Seok Young Lee: Department of Electronic Engineering, Soonchunhyang University, Asan 31538, Republic of Korea
- Jin Seok Hong: Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
- Nam Kyu Kwon: Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea

14. Zhou Y, Shu J, Zheng X, Hao H, Song H. Real-time route planning of unmanned aerial vehicles based on improved soft actor-critic algorithm. Front Neurorobot 2022; 16:1025817. PMID: 36545396; PMCID: PMC9762480; DOI: 10.3389/fnbot.2022.1025817.
Abstract
With the application and development of UAV technology and of navigation and positioning technology, higher requirements are placed on UAV obstacle avoidance and real-time route planning. For the problem of real-time UAV route planning in unknown environments, this paper draws on the artificial potential field method to modify the state observation and reward function, which alleviates the sparse reward problem of the reinforcement learning algorithm and improves its convergence speed; the generalization of the algorithm is further improved by step-by-step training, based on the ideas of curriculum learning and transfer learning, ordered by task difficulty. The simulation results show that the improved SAC algorithm has fast convergence, good timeliness, and strong generalization, and can better complete the UAV route planning task.

15. Cimurs R, Merchán-Cruz EA. Leveraging Expert Demonstration Features for Deep Reinforcement Learning in Floor Cleaning Robot Navigation. Sensors (Basel) 2022; 22:7750. PMID: 36298101; PMCID: PMC9611158; DOI: 10.3390/s22207750.
Abstract
In this paper, a Deep Reinforcement Learning (DRL)-based approach for learning mobile cleaning robot navigation commands that leverage experience from expert demonstrations is presented. First, expert demonstrations of robot motion trajectories in simulation in the cleaning robot domain are collected. The relevant motion features with regard to the distance to obstacles and the heading difference towards the navigation goal are extracted. Each feature weight is optimized with respect to the collected data, and the obtained values are assumed as representing the optimal motion of the expert navigation. A reward function is created based on the feature values to train a policy with semi-supervised DRL, where an immediate reward is calculated based on the closeness to the expert navigation. The presented results show the viability of this approach with regard to robot navigation as well as the reduced training time.
Affiliation(s)
- Emmanuel Alejandro Merchán-Cruz: SIA Robotic Solutions, LV-1039 Riga, Latvia; Transport and Telecommunication Institute, Engineering Faculty, LV-1019 Riga, Latvia

16. Shahi S, Lee H. Autonomous Rear Parking via Rapidly Exploring Random-Tree-Based Reinforcement Learning. Sensors (Basel) 2022; 22:6655. PMID: 36081115; PMCID: PMC9460702; DOI: 10.3390/s22176655.
Abstract
This study addresses the problem of autonomous rear parking (ARP) for car-like nonholonomic vehicles. ARP includes path planning to generate an efficient collision-free path from the start point to the target parking slot and path following to produce control inputs to stably follow the generated path. This paper proposes an efficient ARP method that consists of the following five components: (1) OpenAI Gym environment for training the reinforcement learning agent, (2) path planning based on rapidly exploring random trees, (3) path following based on model predictive control, (4) reinforcement learning based on the Markov decision process, and (5) travel length estimation between the start and the goal points. The evaluation results in OpenAI Gym show that the proposed ARP method can successfully be used by minimizing the difference between the reference points and trajectories produced by the proposed method.

17. Li Q, Xu Y, Bu S, Yang J. Smart Vehicle Path Planning Based on Modified PRM Algorithm. Sensors (Basel) 2022; 22:6581. PMID: 36081038; PMCID: PMC9460667; DOI: 10.3390/s22176581.
Abstract
Path planning is a very important step for mobile smart vehicles in complex environments. Sampling-based planners such as the Probabilistic Roadmap Method (PRM) have been widely used for smart vehicle applications. However, there are some shortcomings, such as low efficiency, a low reuse rate of the roadmap, and a lack of guidance in the selection of sampling points. To solve these problems, we designed a pseudo-random sampling strategy with the main spatial axis as the reference axis. We optimized the generation of sampling points, removed redundant sampling points, set a distance threshold between road points, adopted a two-way incremental method for collision detection, and optimized the number of collision detection calls to improve the construction efficiency of the roadmap. The key road points of the planned path were extracted as discrete control points of a Bézier curve, and the paths were smoothed to make the generated paths more consistent with the driving conditions of vehicles. The correctness of the modified PRM was verified and analyzed on a test platform built with MATLAB and ROS. Compared with the basic PRM algorithm, the modified PRM algorithm offers advantages in roadmap construction speed, path planning speed, and path length.
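As an illustration of the Bézier smoothing step described above (a generic De Casteljau evaluation over extracted key waypoints, not the authors' exact construction; the sample waypoints are assumed):

```python
def bezier_point(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] via De Casteljau's algorithm."""
    pts = list(control_points)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

def smooth_path(waypoints, samples=20):
    """Turn sparse key waypoints (e.g. roadmap inflection points) into a smooth path."""
    return [bezier_point(waypoints, i / (samples - 1)) for i in range(samples)]

print(smooth_path([(0, 0), (2, 3), (5, 3), (7, 0)], samples=5))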

18. Xiang D, Lin H, Ouyang J, Huang D. Combined improved A* and greedy algorithm for path planning of multi-objective mobile robot. Sci Rep 2022; 12:13273. PMID: 35918508; PMCID: PMC9345932; DOI: 10.1038/s41598-022-17684-0.
Abstract
With the development of artificial intelligence, path planning for Autonomous Mobile Robots (AMRs) has been a research hotspot in recent years. This paper proposes an improved A* algorithm combined with the greedy algorithm as a multi-objective path planning strategy. Firstly, the evaluation function is improved to make the A* algorithm converge faster. Secondly, unnecessary nodes of the A* algorithm are removed, and only the necessary inflection points are retained for path planning. Thirdly, the improved A* algorithm combined with the greedy algorithm is applied to multi-objective point planning. Finally, path planning is performed for five target nodes in a warehouse environment to compare path lengths, turn angles, and other parameters. The simulation results show that the path planned by the proposed algorithm is smoother and its length is reduced by about 5%.
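A minimal weighted-A* sketch showing how an evaluation function f = g + w*h can be biased toward faster convergence, which is the general idea behind an improved evaluation function (the grid, weight, and Manhattan heuristic here are assumptions, not the paper's specific formulation):

```python
import heapq

def weighted_astar(grid, start, goal, w=1.5):
    """A* on a 4-connected grid with an inflated heuristic f = g + w*h
    (w >= 1 trades optimality for fewer node expansions)."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan distance
    open_set = [(w * h(start), 0, start, [start])]
    seen = set()
    while open_set:
        _, g, cur, path = heapq.heappop(open_set)
        if cur == goal:
            return path
        if cur in seen:
            continue
        seen.add(cur)
        x, y = cur
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]) and grid[nx][ny] == 0:
                heapq.heappush(open_set, (g + 1 + w * h((nx, ny)), g + 1,
                                          (nx, ny), path + [(nx, ny)]))
    return None

grid = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]   # 0 = free cell, 1 = obstacle
print(weighted_astar(grid, (0, 0), (2, 0)))
```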
Affiliation(s)
- Dan Xiang: School of Automation, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China; School of Computer Science and Information Engineering, Guangzhou Maritime University, Guangzhou 510725, Guangdong, China
- Hanxi Lin: School of Automation, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China
- Jian Ouyang: Industrial Training Center, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China
- Dan Huang: School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510641, Guangdong, China

19. Robot Path Planning Method Based on Indoor Spacetime Grid Model. Remote Sensing 2022. DOI: 10.3390/rs14102357.
Abstract
In the context of digital twins, smart city construction and artificial intelligence technology are developing rapidly, and more and more mobile robots are performing tasks in complex, time-varying indoor environments; unified modeling, dynamic expression, visualization of operation, and wide applicability between robots and indoor environments have therefore become pressing problems. This paper studies this issue in depth and summarizes three major types of modeling methods, namely geometric modeling, topological modeling, and raster modeling, and points out the advantages and disadvantages of each. In view of the current pain points of robots operating in complex, time-varying indoor environments, this paper proposes an indoor spacetime grid model based on a three-dimensional division framework of Earth space, and innovatively integrates time division on top of space division. On the basis of this model, a dynamic path planning algorithm for robots in complex, time-varying indoor environments is designed, namely the Spacetime-A* algorithm (STA* for short). Finally, an indoor spacetime grid modeling experiment is carried out with real data, which verifies the feasibility and correctness of the spacetime relationship calculation algorithm encoded by the indoor spacetime grid model. Experiments with multiple groups of robot path planning algorithms on the spacetime grid then verify the feasibility of the STA* algorithm on the indoor spacetime grid and the superiority of the spacetime grid.

20. Gong H, Wang P, Ni C, Cheng N. Efficient Path Planning for Mobile Robot Based on Deep Deterministic Policy Gradient. Sensors (Basel) 2022; 22:3579. PMID: 35591271; PMCID: PMC9102217; DOI: 10.3390/s22093579.
Abstract
When a traditional Deep Deterministic Policy Gradient (DDPG) algorithm is used in mobile robot path planning, due to the limited observable environment of mobile robots, the training efficiency of the path planning model is low, and the convergence speed is slow. In this paper, Long Short-Term Memory (LSTM) is introduced into the DDPG network, the former and current states of the mobile robot are combined to determine the actions of the robot, and a Batch Norm layer is added after each layer of the Actor network. At the same time, the reward function is optimized to guide the mobile robot to move faster towards the target point. In order to improve the learning efficiency, different normalization methods are used to normalize the distance and angle between the mobile robot and the target point, which are used as the input of the DDPG network model. When the model outputs the next action of the mobile robot, mixed noise composed of Gaussian noise and Ornstein–Uhlenbeck (OU) noise is added. Finally, the simulation environment built by a ROS system and a Gazebo platform is used for experiments. The results show that the proposed algorithm can accelerate the convergence speed of DDPG, improve the generalization ability of the path planning model and improve the efficiency and success rate of mobile robot path planning.
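The normalization of the distance and heading inputs described above can be illustrated generically; the paper only states that different normalization methods were used, so the scaling constants here are assumptions:

```python
import math

def normalize_state(dist, angle, max_dist=10.0):
    """Scale the goal distance to [0, 1] and the heading error to [-1, 1] before
    feeding them to the DDPG network, so both inputs share a comparable range."""
    d = min(dist, max_dist) / max_dist
    a = math.atan2(math.sin(angle), math.cos(angle)) / math.pi  # wrap to (-pi, pi], then scale
    return d, a

print(normalize_state(3.0, 4.0))  # a heading error of 4.0 rad wraps to about -2.28 rad
```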
Affiliation(s)
- Hui Gong: Information Science and Electrical Engineering, Shandong Jiao Tong University, Jinan 250357, China
- Peng Wang: Information Science and Electrical Engineering, Shandong Jiao Tong University, Jinan 250357, China; Institute of Automation, Shandong Academy of Sciences, Jinan 250013, China
- Cui Ni: Information Science and Electrical Engineering, Shandong Jiao Tong University, Jinan 250357, China
- Nuo Cheng: Information Science and Electrical Engineering, Shandong Jiao Tong University, Jinan 250357, China

21. Wang Y, Yang X, Wang L, Hong Z, Zou W. Return Strategy and Machine Learning Optimization of Tennis Sports Robot for Human Motion Recognition. Front Neurorobot 2022; 16:857595. PMID: 35574231; PMCID: PMC9097601; DOI: 10.3389/fnbot.2022.857595.
Abstract
At present, there are many kinds of intelligent training equipment in tennis, but they all require human control. If a single tennis player uses a robot to return the ball, some human resources can be saved. This study aims to improve the recognition rate of the return action and the return strategy of tennis sports robots. Human-oriented motion recognition of the tennis sports robot is taken as the starting point to recognize and analyze the return action. The OpenPose traversal dataset is used to recognize and extract human motion features under different classifications. According to the return characteristics of the tennis sports robot, a tennis return strategy method based on the support vector machine (SVM) is established, and the SVM algorithm in machine learning is optimized. Finally, the return strategy of the tennis sports robot under eight return actions is analyzed. The results reveal that the tennis sports robot based on the SVM-Optimization (SVM-O) algorithm has the highest return recognition rate, with an average of 88.61%. The error rates for the backswing, forward swing, and volatilization are high in the return strategy. The preparation action, backswing, and volatilization achieve more objective results in the return strategy analysis, exceeding 90%. As the number of iterations increases, the SVM-O-based model achieves the best simulation results, which suggests that the proposed algorithm yields reliable return strategy accuracy and meets the research requirements. Human motion recognition is integrated with the return motion of tennis sports robots. Applying the SVM-O algorithm to return action recognition is practical and addresses the difficulty of meeting real-time requirements with the optimization algorithm, and it has research significance for the application of an optimized SVM algorithm in sports action recognition.
Affiliation(s)
- Yuxuan Wang: Sports Institute, Nanchang JiaoTong Institute, Nanchang, China; Graduate School, University of Perpetual Help System Dalta, Las Piñas, Philippines
- Xiaoming Yang: Faculty of Educational Studies, Universiti Putra Malaysia, Kuala Lumpur, Malaysia; College of Physical Education, East China University of Technology, Nanchang, China
- Lili Wang: College of Physical Education, East China University of Technology, Nanchang, China
- Zheng Hong: School of Software, Nanchang University, Nanchang, China
- Wenjun Zou: Sports Institute, Nanchang JiaoTong Institute, Nanchang, China

22.
Abstract
Path planning is a key technology for the autonomous mobility of intelligent robots. However, there are few studies on how to carry out path planning in real time under the confrontation environment. Therefore, based on the deep deterministic policy gradient (DDPG) algorithm, this paper designs the reward function and adopts the incremental training and reward compensation method to improve the training efficiency and obtain the penetration strategy. The Monte Carlo experiment results show that the algorithm can effectively avoid static obstacles, break through the interception, and finally reach the target area. Moreover, the algorithm is also validated in the Webots simulator.

23. Segato A, Marzo MD, Zucchelli S, Galvan S, Secoli R, De Momi E. Inverse Reinforcement Learning Intra-operative Path Planning for Steerable Needle. IEEE Trans Biomed Eng 2021; 69:1995-2005. PMID: 34882540; DOI: 10.1109/tbme.2021.3133075.
Abstract
OBJECTIVE This paper presents a safe and effective keyhole neurosurgery intra-operative planning framework for flexible neurosurgical robots. The framework is intended to support neurosurgeons during the intraoperative procedure to react to a dynamic environment. METHODS The proposed system integrates an inverse reinforcement learning path planning algorithm combined with 1) a pre-operative path planning framework for fast and intuitive user interaction, 2) a realistic, time-bounded simulator based on Position-based Dynamics (PBD) simulation that mocks brain deformations due to catheter insertion and 3) a simulated robotic system. RESULTS Simulation results performed on a human brain dataset show that the inverse reinforcement learning intra-operative planning method can guide a steerable needle with bounded curvature to a predefined target pose with an average targeting error of 1.34 ± 0.52 (25th=1.02, 75th=1.36) mm in position and 3.16 ± 1.06 (25th=2, 75th=4.94) degrees in orientation under a deformable simulated environment, with a re-planning time of 0.02 sec and a success rate of 100%. CONCLUSION With this work, we demonstrate that the presented intra-operative steerable needle path planner is able to avoid anatomical obstacles while optimising surgical criteria. SIGNIFICANCE The results demonstrate that the proposed method is fast and can securely steer flexible needles with high accuracy and robustness.

24. Overcoming Challenges of Applying Reinforcement Learning for Intelligent Vehicle Control. Sensors (Basel) 2021; 21:7829. PMID: 34883832; PMCID: PMC8659501; DOI: 10.3390/s21237829.
Abstract
Reinforcement learning (RL) is a booming area in artificial intelligence. The applications of RL are endless nowadays, ranging from fields such as medicine or finance to manufacturing or the gaming industry. Although multiple works argue that RL can be key to a great part of intelligent vehicle control related problems, there are many practical problems that need to be addressed, such as safety related problems that can result from non-optimal training in RL. For instance, for an RL agent to be effective it should first cover all the situations during training that it may face later. This is often difficult when applied to the real world. In this work we investigate the impact of RL applied to the context of intelligent vehicle control. We analyse the implications of RL in path planning tasks and we discuss two possible approaches to overcome the gap between the theoretical developments of RL and its practical applications. Specifically, this paper first discusses the role of Curriculum Learning (CL) to structure the learning process of intelligent vehicle control in a gradual way. The results show how CL can play an important role in training agents in such a context. Secondly, we discuss a method of transferring RL policies from simulation to reality in order to make the agent experience situations in simulation, so it knows how to react to them in reality. For that, we use Arduino Yún controlled robots as our platforms. The results demonstrate the effectiveness of the presented approach and show how RL policies can be transferred from simulation to reality even when the platforms are resource limited.

25. Reinforcement-Learning-Based Route Generation for Heavy-Traffic Autonomous Mobile Robot Systems. Sensors (Basel) 2021; 21:4809. PMID: 34300548; PMCID: PMC8309928; DOI: 10.3390/s21144809.
Abstract
Autonomous mobile robots (AMRs) are increasingly used in modern intralogistics systems as complexity and performance requirements become more stringent. One way to increase performance is to improve the operation and cooperation of multiple robots in their shared environment. The paper addresses these problems with a method for off-line route planning and on-line route execution. In the proposed approach, pre-computation of routes for frequent pick-up and drop-off locations limits the movements of AMRs to avoid conflict situations between them. The paper proposes a reinforcement learning approach where an agent builds the routes on a given layout while being rewarded according to different criteria based on the desired characteristics of the system. The results show that the proposed approach performs better in terms of throughput and reliability than the commonly used shortest-path-based approach for a large number of AMRs operating in the system. The use of the proposed approach is recommended when the need for high throughput requires the operation of a relatively large number of AMRs in relation to the size of the space in which the robots operate.

26. Wang J, Zhang T, Ma N, Li Z, Ma H, Meng F, Meng MQ. A survey of learning-based robot motion planning. IET Cyber-Systems and Robotics 2021. DOI: 10.1049/csy2.12020.
Affiliation(s)
- Jiankun Wang: Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
- Tianyi Zhang: Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
- Nachuan Ma: Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
- Zhaoting Li: Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
- Han Ma: Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Fei Meng: Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Max Q.-H. Meng: Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China; Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China; Shenzhen Research Institute of the Chinese University of Hong Kong, Shenzhen, China

27. Xie J, Chen G, Liu S. Intelligent Badminton Training Robot in Athlete Injury Prevention Under Machine Learning. Front Neurorobot 2021; 15:621196. PMID: 33776677; PMCID: PMC7994274; DOI: 10.3389/fnbot.2021.621196.
Abstract
This study was developed to explore the role of the intelligent badminton training robot (IBTR) in preventing badminton player injuries based on a machine learning algorithm. An IBTR is designed from the perspectives of hardware and software systems, and the movements of the athletes are recognized and analyzed with the hidden Markov model (HMM) under machine learning. After the design was completed, it was simulated on a computer to analyze its performance. The results show that, after the HMM is optimized, the recognition accuracy of the data pre-processing algorithm based on sliding-window segmentation at the moment of hitting reaches 96.03%, and the recognition rate of the improved HMM on the robot reaches 94.5%, showing a good recognition effect on the training set samples. In addition, when the accuracy of the robot is analyzed for different dataset sizes, the accuracy rate is essentially stable once the training data reach 120 sets. Therefore, the designed IBTR has a high recognition rate and stable accuracy, which can provide experimental references for injury prevention in athlete training.
Affiliation(s)
- Jun Xie: School of Physical Education, East China University of Technology, Nanchang, China
- Guohua Chen: School of Physical Education, East China University of Technology, Nanchang, China
- Shuang Liu: College of Physical Education, Jinggangshan University, Ji'an, China

28. Grigore LȘ, Gorgoteanu D, Molder C, Alexa O, Oncioiu I, Ștefan A, Constantin D, Lupoae M, Bălașa RI. A Dynamic Motion Analysis of a Six-Wheel Ground Vehicle for Emergency Intervention Actions. Sensors (Basel) 2021; 21:1618. PMID: 33669001; PMCID: PMC7956183; DOI: 10.3390/s21051618.
Abstract
To protect the personnel of the intervention units operating in high-risk areas, it is necessary to introduce (autonomous/semi-autonomous) robotic intervention systems. Previous studies have shown that robotic intervention systems should be as versatile as possible. Here, we focused on the idea of a robotic system composed of two vectors: a carrier vector and an operational vector. The proposed system particularly relates to the carrier vector. A simple analytical model was developed to enable the entire robotic assembly to be autonomous. To validate the analytical-numerical model regarding the kinematics and dynamics of the carrier vector, two of the following applications are presented: intervention for extinguishing a fire and performing measurements for monitoring gamma radiation in a public enclosure. The results show that the chosen carrier vector solution, i.e., the ground vehicle with six-wheel drive, satisfies the requirements related to the mobility of the robotic intervention system. In addition, the conclusions present the elements of the kinematics and dynamics of the robot.
Affiliation(s)
- Lucian Ștefăniță Grigore: Military Technical Academy “FERDINAND I”, 39–49 George Coșbuc Av., 050141 Bucharest, Romania
- Damian Gorgoteanu: Military Technical Academy “FERDINAND I”, 39–49 George Coșbuc Av., 050141 Bucharest, Romania
- Cristian Molder: Military Technical Academy “FERDINAND I”, 39–49 George Coșbuc Av., 050141 Bucharest, Romania
- Octavian Alexa: Military Technical Academy “FERDINAND I”, 39–49 George Coșbuc Av., 050141 Bucharest, Romania
- Ionica Oncioiu: Faculty of Finance-Banking, Accountancy and Business Administration, Titu Maiorescu University, 040051 Bucharest, Romania
- Amado Ștefan: Military Technical Academy “FERDINAND I”, 39–49 George Coșbuc Av., 050141 Bucharest, Romania
- Daniel Constantin: Military Technical Academy “FERDINAND I”, 39–49 George Coșbuc Av., 050141 Bucharest, Romania
- Marin Lupoae: Military Technical Academy “FERDINAND I”, 39–49 George Coșbuc Av., 050141 Bucharest, Romania
- Răzvan-Ionuț Bălașa: Military Technical Academy “FERDINAND I”, 39–49 George Coșbuc Av., 050141 Bucharest, Romania

29. Yu X, Wang P, Zhang Z. Learning-Based End-to-End Path Planning for Lunar Rovers with Safety Constraints. Sensors (Basel) 2021; 21:796. PMID: 33504073; PMCID: PMC7866010; DOI: 10.3390/s21030796.
Abstract
Path planning is an essential technology for a lunar rover to achieve safe and efficient autonomous exploration; this paper proposes a learning-based end-to-end path planning algorithm for lunar rovers with safety constraints. Firstly, a training environment integrating real lunar surface terrain data was built in the Gazebo simulation environment, and a lunar rover simulator was created in it to simulate the real lunar surface environment and the lunar rover system. Then, an end-to-end path planning algorithm based on deep reinforcement learning is designed, including the state space, action space, network structure, a reward function that accounts for slip behavior, and a training method based on proximal policy optimization. In addition, to improve generalization to different lunar surface topographies and environment scales, a variety of training scenarios were set up to train the network model following the idea of curriculum learning. The simulation results show that the proposed planning algorithm successfully achieves end-to-end path planning for the lunar rover, and the generated path offers a higher safety guarantee compared with classical path planning algorithms.
Affiliation(s)
- Xiaoqiang Yu: School of Astronautics, Harbin Institute of Technology, Harbin 150002, China
- Ping Wang: China Academy of Space Technology, Beijing 100094, China
- Zexu Zhang: School of Astronautics, Harbin Institute of Technology, Harbin 150002, China