1
Chai R, Niu H, Carrasco J, Arvin F, Yin H, Lennox B. Design and Experimental Validation of Deep Reinforcement Learning-Based Fast Trajectory Planning and Control for Mobile Robot in Unknown Environment. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:5778-5792. PMID: 36215389. DOI: 10.1109/tnnls.2022.3209154.
Abstract
This article addresses the problem of planning optimal maneuver trajectories and guiding a mobile robot toward target positions in uncertain environments for exploration purposes. A hierarchical deep learning-based control framework is proposed, consisting of an upper-level motion planning layer and a lower-level waypoint tracking layer. In the motion planning phase, a recurrent deep neural network (RDNN)-based algorithm is adopted to predict the optimal maneuver profiles for the mobile robot. This approach builds on the recently proposed idea of using deep neural networks (DNNs) to approximate optimal motion trajectories, which has been shown to achieve fast approximation performance. To further enhance prediction performance, a recurrent network model capable of fully exploiting the inherent relationship between preoptimized system state and control pairs is advocated. At the lower level, a deep reinforcement learning (DRL)-based collision-free control algorithm is established to achieve the waypoint tracking task in an uncertain environment (e.g., in the presence of unexpected obstacles). Since this approach allows the control policy to learn directly from human demonstration data, the time required for training can be significantly reduced. Moreover, a noisy prioritized experience replay (PER) algorithm is proposed to improve the exploration rate of the control policy. The effectiveness of the proposed deep learning-based control is validated through a number of simulation and experimental case studies. The simulation results show that the proposed DRL method outperforms the vanilla PER algorithm in terms of training speed. Experimental videos are also uploaded, and the corresponding results confirm that the proposed strategy is able to fulfill the autonomous exploration mission with improved motion planning performance, enhanced collision avoidance ability, and less training time.
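The noisy PER mechanism described above can be illustrated with a minimal proportional-priority replay buffer. This is a sketch under stated assumptions: the class name, the uniform perturbation used to model the "noisy" component, and all hyperparameter values are illustrative, not taken from the paper.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (PER) sketch."""

    def __init__(self, capacity, alpha=0.6, noise=0.01):
        self.capacity = capacity
        self.alpha = alpha    # how strongly TD-error skews sampling
        self.noise = noise    # assumed noise injected into priorities
        self.buffer = []      # list of (transition, priority) pairs

    def add(self, transition, td_error):
        # Proportional priority, plus a small random perturbation as a
        # stand-in for the paper's noise-based exploration boost.
        priority = (abs(td_error) + 1e-6) ** self.alpha
        priority += random.uniform(0.0, self.noise)
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)          # drop the oldest transition
        self.buffer.append((transition, priority))

    def sample(self, batch_size):
        # Sample transitions with probability proportional to priority.
        transitions, priorities = zip(*self.buffer)
        return random.choices(transitions, weights=priorities,
                              k=min(batch_size, len(transitions)))
```

High-TD-error transitions are replayed more often, which is what speeds up learning relative to uniform replay.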
2
Chen L, Wang Q, Deng C, Xie B, Tuo X, Jiang G. Improved Double Deep Q-Network Algorithm Applied to Multi-Dimensional Environment Path Planning of Hexapod Robots. Sensors 2024; 24:2061. PMID: 38610271. PMCID: PMC11013983. DOI: 10.3390/s24072061.
Abstract
Detecting transportation pipeline leakage points within chemical plants is difficult due to complex pathways, multi-dimensional survey points, and highly dynamic scenarios. However, the maneuverability and adaptability of hexapod robots make them ideal candidates for conducting surveys across different planes. The path-planning problem of hexapod robots in multi-dimensional environments is a significant challenge, especially when identifying suitable transition points and planning shorter paths to reach survey points while traversing multi-level environments. This study proposes a Particle Swarm Optimization (PSO)-guided Double Deep Q-Network (DDQN) approach, namely the PG-DDQN algorithm, for solving this problem. The proposed algorithm incorporates the PSO algorithm to supplant the traditional random selection strategy, and the data obtained from this guided approach are subsequently employed to train the DDQN neural network. The multi-dimensional random environment is abstracted into localized maps comprising the current and next-level planes. Comparative experiments were performed with PG-DDQN, standard DQN, and standard DDQN on multiple randomly generated localized maps, recording the total reward values and completion times for each algorithm at every iteration. The results demonstrate that PG-DDQN converges faster under an equivalent iteration count. Compared with standard DQN and standard DDQN, reductions in path-planning time of at least 33.94% and 42.60%, respectively, were observed, significantly improving the robot's mobility. Finally, the PG-DDQN algorithm was integrated with sensors on a hexapod robot, and validation was performed through Gazebo simulations and physical experiments. The results show that applying PG-DDQN to control hexapod robots provides valuable insights for path planning to reach transportation pipeline leakage points within chemical plants.
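The core DDQN update that the study builds on decouples action selection (online network) from action evaluation (target network), which curbs the Q-value overestimation of vanilla DQN. A minimal sketch of that target computation follows; the function name and numbers are illustrative, not from the paper.

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online net picks the greedy next action,
    the target net supplies its value estimate."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

The PG-DDQN variant would additionally replace the random (epsilon-greedy) action choice during data collection with PSO-guided selection, which is outside this sketch.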
Affiliation(s)
- Liuhongxu Chen
- School of Computer Science and Engineering, Sichuan University of Science and Engineering, Zigong 643000, China
- Qibiao Wang
- School of Computer Science and Engineering, Sichuan University of Science and Engineering, Zigong 643000, China
- School of Physics and Electronic Engineering, Sichuan University of Science and Engineering, Zigong 643000, China
- Chao Deng
- School of Physics and Electronic Engineering, Sichuan University of Science and Engineering, Zigong 643000, China
- Bo Xie
- School of Physics and Electronic Engineering, Sichuan University of Science and Engineering, Zigong 643000, China
- Xianguo Tuo
- School of Physics and Electronic Engineering, Sichuan University of Science and Engineering, Zigong 643000, China
- Gang Jiang
- School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China
3
Wen S, Jiang Y, Cui B, Gao K, Wang F. A Hierarchical Path Planning Approach with Multi-SARSA Based on Topological Map. Sensors 2022; 22:2367. PMID: 35336535. PMCID: PMC8954451. DOI: 10.3390/s22062367.
Abstract
In this paper, a novel path planning algorithm based on reinforcement learning and a topological map is proposed. The proposed algorithm has a two-level structure. At the first level, the method generates topological areas using a region dynamic growth algorithm based on the grid map. At the next level, a two-layer Multi-SARSA method is applied to find a near-optimal global planning path: the artificial potential field method is first used to initialize the first Q-table for faster learning, and the second Q-table is then initialized with the connected domains obtained from the topological map, which provide prior information. The combination of the two initializations makes the algorithm easier to converge. Simulation experiments for path planning have been executed. The results indicate that the proposed method can find an optimal path with a shorter path length, demonstrating its effectiveness.
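The potential-field Q-table initialization and the SARSA update it feeds can be sketched as below. The Manhattan-distance attractive potential and all constants are assumptions for illustration, not the paper's exact formulation.

```python
def apf_init_q(width, height, goal, scale=1.0):
    """Initialize a Q-table from a simple attractive potential field:
    the closer a grid state is to the goal, the larger its initial value."""
    q = {}
    for x in range(width):
        for y in range(height):
            dist = abs(x - goal[0]) + abs(y - goal[1])   # Manhattan distance
            for action in range(4):                      # up/down/left/right
                q[((x, y), action)] = -scale * dist
    return q

def sarsa_update(q, s, a, r, s2, a2, alpha=0.1, gamma=0.95):
    """On-policy SARSA: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    q[(s, a)] += alpha * (r + gamma * q[(s2, a2)] - q[(s, a)])
```

Because goal-ward states start with higher values, early greedy action choices are already biased toward the goal, which is what accelerates convergence relative to a zero-initialized table.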
Affiliation(s)
- Shiguang Wen
- Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110169, China
- Yufan Jiang
- College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
- Correspondence: (Y.J.); (F.W.); Tel.: +86-135-5576-7827 (Y.J.); +86-248-368-8325 (F.W.)
- Ben Cui
- Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110169, China
- Ke Gao
- College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
- Fei Wang
- Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110169, China
4
Path Planning Strategy for a Manipulator Based on a Heuristically Constructed Network. Machines 2022. DOI: 10.3390/machines10020071.
Abstract
Collision-free path planning of manipulators is becoming indispensable for space exploration and on-orbit operation. Manipulators in these scenarios are constrained in computing resources and storage, so the path planning method used in such tasks is usually limited in operating time and the amount of data transmission. In this paper, a heuristically constructed network (HCN) construction strategy is proposed. HCN construction consists of three steps: determining the number of hub configurations, selecting the hub configurations, and connecting them. Considering the connection time and connectivity of the HCN, the number of hub configurations is determined first. The selection of hub configurations includes the division of the workspace and the optimization of the hub configurations. The workspace is divided by comprehensively considering the similarity among configurations within the same region, the dissimilarity among all regions, and the correlation among adjacent regions. The hub configurations are selected by establishing and solving an optimization model. Finally, these hub configurations are connected to obtain the HCN. Simulations indicate that the number of path points and the planning time are decreased by 45.5% and 48.4%, respectively, which verifies the correctness and effectiveness of the proposed path planning strategy based on the HCN.
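As a rough illustration of the construction steps, the sketch below selects hub configurations with a greedy farthest-point heuristic and connects each hub to its nearest neighbors. This is a deliberate simplification: the paper selects hubs by solving an optimization model over a divided workspace, which is not reproduced here.

```python
import math

def build_hcn(configs, n_hubs, k=3):
    """Toy HCN: greedy farthest-point hub selection, then k-nearest linking.
    `configs` are low-dimensional points standing in for joint configurations."""
    hubs = [configs[0]]
    while len(hubs) < n_hubs:
        # Next hub: the configuration farthest from all chosen hubs,
        # a crude proxy for covering distinct workspace regions.
        nxt = max(configs, key=lambda c: min(math.dist(c, h) for h in hubs))
        hubs.append(nxt)
    edges = []
    for h in hubs:
        neighbors = sorted((o for o in hubs if o != h),
                           key=lambda o: math.dist(h, o))
        for o in neighbors[:k]:
            edges.append((h, o))
    return hubs, edges
```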
5
Wu Z, Qiu K, Yuan T, Chen H. A method to keep autonomous vehicles steadily drive based on lane detection. International Journal of Advanced Robotic Systems 2021. DOI: 10.1177/17298814211002974.
Abstract
Existing studies on autonomous driving methods focus on the fusion of onboard sensor data. However, driving behavior might be unsteady because of environmental uncertainties. In this article, an expectation line is proposed to quantify driving behavior, motivated by the driving continuity of human drivers. Smooth driving can then be achieved by predicting the future trajectory of the expectation line. First, a convolutional neural network-based method is applied to detect lanes in images sampled from driving video. Second, the expectation line is defined to model the driving behavior of an autonomous vehicle. Finally, a long short-term memory-based method is applied to the expectation line so that the future trajectory of the vehicle can be predicted. By incorporating the convolutional neural network- and long short-term memory-based methods, autonomous vehicles can drive smoothly thanks to this prior information. The proposed method is evaluated using driving video data, and the experimental results demonstrate that it outperforms methods without trajectory prediction.
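One plausible reading of the expectation line is the pointwise midline between the two detected lane boundaries; the exact construction in the article may differ, so treat this as an assumed definition for illustration.

```python
def expectation_line(left_lane, right_lane):
    """Midline between two detected lane boundaries, each given as a list
    of (x, y) image points ordered from near to far."""
    return [((lx + rx) / 2.0, (ly + ry) / 2.0)
            for (lx, ly), (rx, ry) in zip(left_lane, right_lane)]
```

A sequence of such midlines over consecutive frames is what the LSTM would consume to predict the vehicle's future trajectory.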
Affiliation(s)
- Zhenyu Wu
- School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, China
- Kai Qiu
- School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, China
- Tingning Yuan
- School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, China
- Hongmei Chen
- College of Electrical Engineering, Henan University of Technology, Zhengzhou, Henan, China
6
Zhang X, Huang Y, Rong Y, Li G, Wang H, Liu C. Optimal Trajectory Planning for Wheeled Mobile Robots under Localization Uncertainty and Energy Efficiency Constraints. Sensors 2021; 21:335. PMID: 33419009. PMCID: PMC7825277. DOI: 10.3390/s21020335.
Abstract
With the rapid development of robotics, wheeled mobile robots are widely used in smart factories to perform navigation tasks. In this paper, an optimal trajectory planning method based on an improved dolphin swarm algorithm is proposed to balance localization uncertainty and energy efficiency, such that a minimum-total-cost trajectory is obtained for wheeled mobile robots. Since environmental information affects the robot localization process differently at different positions, a novel localizability measure based on the likelihood function is presented to explicitly quantify the localization ability of the robot over a prior map. To generate the robot trajectory, localizability and energy efficiency criteria are incorporated into the parameterized trajectory as the cost function. For the trajectory optimization itself, an improved dolphin swarm algorithm is then proposed to generate trajectories with better localization performance and higher energy efficiency. It utilizes the proposed adaptive step strategy and learning strategy to minimize the cost function during robot motion. Simulations are carried out in various autonomous navigation scenarios to validate the efficiency of the proposed trajectory planning method. Experiments are performed on the prototype “Forbot” four-wheel independently driven-steered mobile robot; the results demonstrate that the proposed method effectively improves energy efficiency while reducing localization errors along the generated trajectory.
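The combined cost could take a weighted-sum form like the sketch below: a localizability penalty along the waypoints plus segment length as an energy proxy. The weights, the penalty form, and the energy proxy are all illustrative assumptions, not the paper's cost function.

```python
import math

def trajectory_cost(waypoints, localizability, w_loc=0.5, w_energy=0.5):
    """Weighted sum of a localizability penalty and an energy proxy.
    `localizability(p)` should return a value in [0, 1], 1 = best."""
    energy = sum(math.dist(waypoints[i], waypoints[i + 1])
                 for i in range(len(waypoints) - 1))
    loc_penalty = sum(1.0 - localizability(p) for p in waypoints)
    return w_loc * loc_penalty + w_energy * energy
```

A swarm optimizer (here, the improved dolphin swarm algorithm) would then search the trajectory parameters that minimize this scalar.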
Affiliation(s)
- Xiaolong Zhang
- School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
- Yu Huang
- School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
- Youmin Rong
- School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
- Gen Li
- Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou 511400, China
- Hui Wang
- School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
- Chao Liu
- School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
- Correspondence:
7
Zhuge C, Liu J, Guo D, Cui Y. Phototropism rapidly exploring random tree: An efficient rapidly exploring random tree approach based on the phototropism of plants. International Journal of Advanced Robotic Systems 2020. DOI: 10.1177/1729881420945213.
Abstract
Inspired by the phototropism of plants, a novel variant of the rapidly exploring random tree (RRT) algorithm, called the phototropism rapidly exploring random tree, is proposed. The phototropism RRT algorithm expands fewer tree nodes during exploration and yields shorter paths than the original RRT algorithm. In the algorithm, a virtual light source is placed at the goal point, and a light beam propagation method is applied to the map to generate a light intensity distribution. The phototropism RRT expands nodes toward regions of higher light intensity, whereas the original RRT expands nodes using a uniform sampling strategy. The performance of the phototropism RRT is tested in three scenarios covering both simulated and real-world environments. The experimental results show that the proposed phototropism RRT algorithm has higher sampling efficiency than the original RRT, and its path length is close to the optimal solution of the RRT* algorithm without considering non-holonomic constraints.
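The light-biased sampling can be sketched by scoring candidate samples with a light-intensity function peaked at the goal. The inverse-square falloff below is a stand-in for the paper's beam-propagation map, which additionally accounts for occlusion by obstacles.

```python
import random

def light_intensity(point, source):
    """Virtual light from the goal; intensity decays with squared distance.
    (No occlusion handling, unlike the paper's beam-propagation map.)"""
    d2 = (point[0] - source[0]) ** 2 + (point[1] - source[1]) ** 2
    return 1.0 / (1.0 + d2)

def sample_biased(bounds, goal, n_candidates=10):
    """Draw uniform candidates and keep the brightest one, biasing RRT
    growth toward the lit (goal) region instead of sampling uniformly."""
    (xmin, xmax), (ymin, ymax) = bounds
    candidates = [(random.uniform(xmin, xmax), random.uniform(ymin, ymax))
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda p: light_intensity(p, goal))
```

Feeding `sample_biased` into an otherwise standard RRT loop in place of uniform sampling reproduces the goal-ward bias the abstract describes.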
Affiliation(s)
- Chengchen Zhuge
- Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing, China
- Jiayin Liu
- Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing, China
- Dongyan Guo
- College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, China
- Ying Cui
- College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, China
8
Li J, Yao L, Xu X, Cheng B, Ren J. Deep reinforcement learning for pedestrian collision avoidance and human-machine cooperative driving. Information Sciences 2020. DOI: 10.1016/j.ins.2020.03.105.
9
Ali H, Gong D, Wang M, Dai X. Path Planning of Mobile Robot With Improved Ant Colony Algorithm and MDP to Produce Smooth Trajectory in Grid-Based Environment. Frontiers in Neurorobotics 2020; 14:44. PMID: 32733227. PMCID: PMC7363842. DOI: 10.3389/fnbot.2020.00044.
Abstract
This approach is derived mainly to improve the quality and efficiency of global path planning for a mobile robot with unknown static obstacle avoidance in a grid-based environment. The quality of the global path, in terms of smoothness, path consistency, and safety, can affect the autonomous behavior of a robot. In this paper, the efficiency of the Ant Colony Optimization (ACO) algorithm is improved with the assistance of a multi-directional A* algorithm. First, the multi-directional A* algorithm searches the map and stores the region of best nodes between start and destination according to the heuristic value; this region is then used for the ACO path search to avoid blind search in the initial iterations. The path obtained in the grid-based environment consists of points in Cartesian coordinates connected through line segments with sharp bends. Therefore, a Markov Decision Process (MDP) trajectory evaluation model is introduced with a novel reward policy to filter waypoints and reduce the sharpness of the global path. With arc-length parameterization, a curvilinear smooth route is generated through the filtered waypoints, producing consistency and smoothness in the global path. To achieve a comfortable and safe drive for the robot, lateral and longitudinal control is utilized to form a set of optimal trajectories along the reference route while minimizing total cost, which includes curvature, lateral, and longitudinal coordinate constraints. Additionally, for collision detection, the set of optimal local trajectories is checked at every step for any unexpected obstacle. The results are verified through MATLAB simulations and compared with previous global path planning algorithms to demonstrate the efficiency and quality of the proposed approach in differently constrained environments.
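For reference, the pheromone update at the heart of any ACO variant (evaporation followed by deposit proportional to path quality) looks like the sketch below; the constants are illustrative, and the paper's improvement lies in restricting where ants search (the A*-selected node area), not in this update itself.

```python
def update_pheromone(tau, ant_paths, rho=0.5, q=100.0):
    """Standard ACO pheromone update: evaporate every edge by (1 - rho),
    then deposit q / L on each edge of every ant's path (L = path length)."""
    for edge in tau:
        tau[edge] *= (1.0 - rho)
    for path in ant_paths:
        length = len(path) - 1
        for i in range(length):
            edge = (path[i], path[i + 1])
            tau[edge] = tau.get(edge, 0.0) + q / length
    return tau
```

Shorter paths deposit more pheromone per edge, so subsequent ants are drawn toward them.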
Affiliation(s)
- Hub Ali
- School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Dawei Gong
- School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Meng Wang
- School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Xiaolin Dai
- School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China
10
Shi Z, Pan Q, Xu M. LSTM-Cubic A*-based auxiliary decision support system in air traffic management. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.12.062.
11
D’Angelo G, Tipaldi M, Palmieri F, Glielmo L. A data-driven approximate dynamic programming approach based on association rule learning: Spacecraft autonomy as a case study. Information Sciences 2019. DOI: 10.1016/j.ins.2019.07.067.
12
Hierarchical automatic curriculum learning: Converting a sparse reward navigation task into dense reward. Neurocomputing 2019. DOI: 10.1016/j.neucom.2019.06.024.
13
Integrating a Path Planner and an Adaptive Motion Controller for Navigation in Dynamic Environments. Applied Sciences 2019. DOI: 10.3390/app9071384.
Abstract
Since an individual approach can hardly navigate robots through complex environments, we present a novel two-level hierarchical framework called JPS-IA3C (Jump Point Search improved Asynchronous Advantage Actor-Critic) in this paper for robot navigation in dynamic environments through continuous control signals. Its global planner, JPS+ (P), is a variant of JPS (Jump Point Search) that efficiently computes an abstract path of neighboring jump points. These nodes, treated as subgoals, completely rid Deep Reinforcement Learning (DRL)-based controllers of notorious local minima. To satisfy kinetic constraints and adapt to changing environments, we propose an improved A3C (IA3C) algorithm to learn the control policies of the robots’ local motion. Moreover, the combination of modified curriculum learning and reward shaping helps IA3C build a novel reward function framework that avoids the learning inefficiency caused by sparse rewards. We additionally strengthen the robots’ temporal reasoning about the environments with a memory-based network. These improvements make the IA3C controller converge faster and become more adaptive to the incomplete, noisy information caused by partial observability. Simulated experiments show that, compared with existing methods, the JPS-IA3C hierarchy successfully outputs continuous commands to accomplish large-range navigation tasks with shorter paths and less time through reasonable subgoal selection and rational motions.
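The reward-shaping idea can be sketched as a dense, progress-based reward toward the current subgoal plus a terminal bonus; the paper's full reward framework also weaves in curriculum learning, and the constants here are assumptions.

```python
def shaped_reward(dist_prev, dist_now, reached, step_cost=-0.01,
                  bonus=10.0, k=1.0):
    """Dense shaped reward: small per-step cost plus reward proportional
    to progress toward the subgoal, and a large bonus on reaching it."""
    if reached:
        return bonus
    return step_cost + k * (dist_prev - dist_now)
```

Every step toward the subgoal now yields a signal, so the controller learns long before it first stumbles onto the sparse terminal reward.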
14
Zhou Y, van Kampen EJ, Chu Q. Hybrid Hierarchical Reinforcement Learning for online guidance and navigation with partial observability. Neurocomputing 2019. DOI: 10.1016/j.neucom.2018.11.072.
15
Mac TT, Copot C, Tran DT, Keyser RD. A hierarchical global path planning approach for mobile robots based on multi-objective particle swarm optimization. Applied Soft Computing 2017. DOI: 10.1016/j.asoc.2017.05.012.