1. Object Semantic Grid Mapping with 2D LiDAR and RGB-D Camera for Domestic Robot Navigation. Applied Sciences (Basel) 2020. DOI: 10.3390/app10175782
Abstract
Occupancy grid maps are sufficient for mobile robots to complete metric navigation tasks in domestic environments, but they lack the semantic information needed for social goal selection and human-friendly operation modes. In this paper, we propose an object semantic grid mapping system with 2D Light Detection and Ranging (LiDAR) and RGB-D sensors to solve this problem. First, we use laser-based Simultaneous Localization and Mapping (SLAM) to generate an occupancy grid map and obtain the robot trajectory. Then, we employ object detection to obtain the semantics of objects in color images and use joint interpolation to refine the camera poses. From the object detections, depth images, and interpolated poses, we build a point cloud with object instances. To generate object-oriented minimum bounding rectangles, we propose a method for extracting the dominant directions of the room. Furthermore, we build object goal spaces to help robots select navigation goals conveniently and socially. We have verified the system on the Robot@Home dataset, and the results show that it is effective.
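The two geometric steps named in the abstract, extracting the room's dominant directions and fitting object-oriented minimum bounding rectangles, can be pictured with a short sketch. The Python below is our own minimal illustration under a Manhattan-world assumption (a 90°-periodic orientation histogram over wall segments); the function names and histogram approach are assumptions, not necessarily the authors' exact method.

```python
import numpy as np

def dominant_direction(wall_points, bins=90):
    """Estimate the room's dominant wall orientation in [0, pi/2).

    wall_points: (N, 2) occupied cells from the grid map, assumed roughly
    ordered along walls. Segment orientations are folded to 90-degree
    periodicity (Manhattan-world assumption) and the histogram peak wins.
    """
    d = np.diff(wall_points, axis=0)
    theta = np.arctan2(d[:, 1], d[:, 0]) % (np.pi / 2)
    hist, edges = np.histogram(theta, bins=bins, range=(0.0, np.pi / 2))
    k = np.argmax(hist)
    return 0.5 * (edges[k] + edges[k + 1])

def oriented_bounding_rectangle(obj_points, angle):
    """Minimum bounding rectangle of an object's 2D footprint,
    aligned to the room's dominant directions."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])       # rotation by +angle
    p = obj_points @ R                    # world frame -> room frame
    lo, hi = p.min(axis=0), p.max(axis=0)
    corners = np.array([[lo[0], lo[1]], [hi[0], lo[1]],
                        [hi[0], hi[1]], [lo[0], hi[1]]])
    return corners @ R.T                  # room frame -> world frame
```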
2. Arkin J, Park D, Roy S, Walter MR, Roy N, Howard TM, Paul R. Multimodal estimation and communication of latent semantic knowledge for robust execution of robot instructions. International Journal of Robotics Research 2020. DOI: 10.1177/0278364920917755
Abstract
The goal of this article is to enable robots to robustly execute human instructions in partially observable environments. A robot's ability to interpret and execute commands is fundamentally tied to its semantic world knowledge. Commonly, robots use exteroceptive sensors, such as cameras or LiDAR, to detect entities in the workspace and infer their visual properties and spatial relationships. However, semantic world properties are often visually imperceptible. We posit the use of non-exteroceptive modalities, including physical proprioception, factual descriptions, and domain knowledge, as mechanisms for inferring semantic properties of objects. We introduce a probabilistic model that fuses linguistic knowledge with visual and haptic observations into a cumulative belief over latent world attributes, in order to infer the meaning of instructions and execute the instructed tasks in a manner robust to erroneous, noisy, or contradictory evidence. In addition, we provide a method that allows the robot to communicate knowledge dissonance back to the human as a means of correcting errors in the operator's world model. Finally, we propose an efficient framework that anticipates possible linguistic interactions and infers the associated groundings for the current world state, thereby bootstrapping both language understanding and generation. We present experiments on manipulators for tasks that require inference over partially observed semantic properties. We evaluate our framework's ability to exploit expressed information and knowledge bases to facilitate convergence, and to generate statements that correct declared facts found to be inconsistent with the robot's estimate of object properties.
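The cumulative belief over a latent attribute is, at its core, a recursive Bayesian update fusing heterogeneous likelihoods. Below is a minimal sketch of that idea for a single binary property; the attribute and all likelihood values are invented for illustration, and the paper's actual model is considerably richer.

```python
# Recursive Bayesian fusion of heterogeneous evidence over one latent
# binary attribute, illustrating the cumulative-belief idea.

def update_belief(prior: float, likelihood_true: float, likelihood_false: float) -> float:
    """Posterior P(attr=True | evidence) from a Bernoulli prior and the
    likelihood of the observation under each hypothesis."""
    num = likelihood_true * prior
    den = num + likelihood_false * (1.0 - prior)
    return num / den if den > 0 else prior

belief = 0.5  # uninformative prior over "the bottle is full"

# Each observation is (P(obs | full), P(obs | empty)); values are illustrative.
observations = [
    (0.6, 0.4),   # vision: liquid line barely visible
    (0.9, 0.2),   # haptics: measured wrench consistent with a heavy object
    (0.8, 0.3),   # language: operator states "I think that one is full"
]
for l_true, l_false in observations:
    belief = update_belief(belief, l_true, l_false)

print(f"P(full | all evidence) = {belief:.3f}")  # -> 0.947
```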
Affiliation(s)
- Jacob Arkin: Robotics and Artificial Intelligence Laboratory, University of Rochester, USA
- Daehyung Park: Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA
- Subhro Roy: Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA
- Matthew R Walter: Robot Intelligence through Perception Laboratory, Toyota Technological Institute at Chicago, USA
- Nicholas Roy: Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA
- Thomas M Howard: Robotics and Artificial Intelligence Laboratory, University of Rochester, USA
- Rohan Paul: Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA; Department of Computer Science and Engineering, Indian Institute of Technology Delhi, India
3. Qi X, Wang W, Yuan M, Wang Y, Li M, Xue L, Sun Y. Building semantic grid maps for domestic robot navigation. International Journal of Advanced Robotic Systems 2020. DOI: 10.1177/1729881419900066. Open access.
Abstract
This article proposes a semantic grid mapping method for domestic robot navigation. Occupancy grid maps are sufficient for mobile robots to complete point-to-point navigation tasks in small-scale 2-D environments. However, in real domestic scenes, grid maps lack the semantic information end users need to specify navigation tasks conveniently. Semantic grid maps, which enhance the occupancy grid map with the semantics of objects and rooms, endow robots with robust navigation skills and human-friendly operation modes, and are proposed here to overcome this limitation. In our method, an object semantic grid map is built with low-cost sonar and binocular stereovision sensors by fusing the occupancy grid map with object point clouds. Topological spaces of each object are defined so that robots can autonomously select navigation destinations. Based on domestic common sense about the relationship between rooms and objects, topological segmentation is used to obtain room semantics. Our method is evaluated in a real homelike environment, and the results show that the generated map is of satisfactory precision and enables a domestic mobile robot to complete navigation tasks commanded in natural language with a high success rate.
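The fusion step described here, combining the occupancy grid map with object point clouds and then defining per-object goal regions, reduces to rasterizing labeled points into grid cells and collecting nearby free cells. The sketch below is our own sensor-agnostic illustration with invented function names, not the authors' implementation.

```python
import numpy as np

def project_objects_to_grid(points, labels, origin, resolution, grid_shape):
    """Rasterize labeled object points onto the occupancy grid.

    points:  (N, 2) object-point coordinates in metres (x, y)
    labels:  (N,) integer object-instance labels
    origin:  (2,) world coordinates of grid cell (0, 0)
    Returns a semantic layer with one label per cell (0 = no object).
    """
    semantic = np.zeros(grid_shape, dtype=np.int32)
    cells = np.floor((points - origin) / resolution).astype(int)
    h, w = grid_shape
    ok = (cells[:, 0] >= 0) & (cells[:, 0] < w) & (cells[:, 1] >= 0) & (cells[:, 1] < h)
    semantic[cells[ok, 1], cells[ok, 0]] = labels[ok]
    return semantic

def goal_space(semantic, occupancy, label, radius_cells=5):
    """Candidate navigation goals 'near' one object: free cells within
    radius_cells of the object's footprint (brute force, fine for a sketch)."""
    oy, ox = np.nonzero(semantic == label)
    goals = []
    for fy, fx in zip(*np.nonzero(occupancy == 0)):        # 0 = free cell
        if oy.size and ((oy - fy) ** 2 + (ox - fx) ** 2).min() <= radius_cells ** 2:
            goals.append((fy, fx))
    return goals
```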
Affiliation(s)
- Xianyu Qi: Robotics Institute, Beihang University, Beijing, China
- Wei Wang: Robotics Institute, Beihang University, Beijing, China
- Mei Yuan: Beijing Evolver Robotics Technology Co., Ltd, Beijing, China
- Yuliang Wang: Robotics Institute, Beihang University, Beijing, China
- Mingbo Li: Beijing Evolver Robotics Technology Co., Ltd, Beijing, China
- Lin Xue: Beijing Evolver Robotics Technology Co., Ltd, Beijing, China
- Yingpin Sun: Beijing Evolver Robotics Technology Co., Ltd, Beijing, China
4. PDDL Planning with Natural Language-Based Scene Understanding for UAV-UGV Cooperation. Applied Sciences (Basel) 2019. DOI: 10.3390/app9183789
Abstract
Natural-language-based scene understanding can enable heterogeneous robots to cooperate efficiently in large, unstructured environments. However, studies on symbolic planning rarely consider the problem of acquiring semantic knowledge about the surrounding environment, even though recent deep learning methods show outstanding performance at semantic scene understanding from natural language. In this paper, a cooperation framework that connects deep learning techniques with a symbolic planner for heterogeneous robots is proposed. The framework consists of a scene understanding engine, a planning agent, and a knowledge engine. We employ neural networks for natural-language-based scene understanding so that environmental information can be shared among robots, and we generate a sequence of actions for each robot using a Planning Domain Definition Language (PDDL) planner. Jena TDB is used to store the acquired knowledge. The proposed method is validated in simulation with one unmanned aerial vehicle and three unmanned ground vehicles.
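The interface between the scene-understanding engine and the symbolic planner can be pictured as generating a PDDL problem from the facts accumulated in the knowledge base. The Python sketch below is our own illustration of that handoff; the domain name, predicates, and object names are invented, not the paper's actual schema.

```python
# Sketch: turn scene facts produced by a scene-understanding engine into a
# PDDL problem string that a domain-independent planner can consume.

def make_pddl_problem(objects: dict, facts: list, goals: list) -> str:
    objs = "\n    ".join(f"{name} - {typ}" for name, typ in objects.items())
    init = "\n    ".join(f"({f})" for f in facts)
    goal = "\n    ".join(f"({g})" for g in goals)
    return f"""(define (problem survey-mission)
  (:domain uav-ugv-cooperation)
  (:objects
    {objs})
  (:init
    {init})
  (:goal (and
    {goal})))"""

problem = make_pddl_problem(
    objects={"uav1": "uav", "ugv1": "ugv", "ugv2": "ugv",
             "zoneA": "region", "zoneB": "region"},
    facts=["at uav1 zoneA", "at ugv1 zoneA", "at ugv2 zoneA",
           "surveyed zoneA"],          # e.g. asserted from NL scene reports
    goals=["surveyed zoneB", "at ugv1 zoneB"],
)
print(problem)  # feed to any PDDL planner, e.g. via a subprocess call
```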
5. Tamosiunaite M, Aein MJ, Braun JM, Kulvicius T, Markievicz I, Kapociute-Dzikiene J, Valteryte R, Haidu A, Chrysostomou D, Ridge B, Krilavicius T, Vitkute-Adzgauskiene D, Beetz M, Madsen O, Ude A, Krüger N, Wörgötter F. Cut & recombine: reuse of robot action components based on simple language instructions. International Journal of Robotics Research 2019. DOI: 10.1177/0278364919865594
Abstract
Human beings can generalize from one action to similar ones; robots cannot, and progress on transferring information between robotic actions is slow. We have designed a system that generalizes manipulation actions across different scenarios. It relies on an action representation over which we perform code-snippet replacement, combining information from different actions to form new ones. The system interprets human instructions via a parser using simplified language, and uses action and object names to index action data tables (ADTs), where execution-relevant information is stored. We have created an ADT database from three different sources (KUKA LWR, UR5, and simulation) and show how a new ADT is generated by cutting and recombining data from existing ADTs. To achieve this, a small set of action templates is used. After parsing a new instruction, index-based search finds similar ADTs in the database. The action template of the new action is then matched against the information in the similar ADTs; code snippets are extracted and ranked by matching quality, and the new ADT is created by concatenating snippets from the best matches. For execution, only coordinate transforms are needed to account for the poses of the objects in the new scene. The system was evaluated, without additional error correction, on 45 unknown objects in 81 new action executions, with 80% success. Extending the method with more detailed shape information further reduced errors. This demonstrates that cut & recombine is a viable approach to action generalization in service robotics applications.
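The indexing and recombination steps can be sketched as follows. Here an ADT is reduced to a dictionary of named phase snippets and the matching score to a string-similarity heuristic; both are simplified stand-ins for the paper's representation, and all names are our own.

```python
# Sketch of cut & recombine: action data tables (ADTs) are indexed by
# (action, object); a new instruction is served by retrieving similar ADTs
# and concatenating their best-matching snippets per template phase.

from difflib import SequenceMatcher

# Tiny ADT "database": (action, object) -> ordered phase snippets.
ADT_DB = {
    ("pick", "cup"):    {"approach": "move_to(cup)",    "grasp": "close_gripper(0.04)"},
    ("pick", "bottle"): {"approach": "move_to(bottle)", "grasp": "close_gripper(0.06)"},
    ("place", "cup"):   {"transport": "move_to(tray)",  "release": "open_gripper()"},
}

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def recombine(action: str, obj: str, template: list) -> dict:
    """Build a new ADT for (action, obj) by taking, for each phase in the
    action template, the snippet from the most similar existing ADT."""
    new_adt = {}
    for phase in template:
        candidates = [
            (similarity(action, a) + similarity(obj, o), adt[phase])
            for (a, o), adt in ADT_DB.items() if phase in adt
        ]
        if candidates:
            new_adt[phase] = max(candidates)[1]   # best-ranked snippet
    return new_adt

# A previously unseen object reuses the closest existing snippets.
print(recombine("pick", "mug", template=["approach", "grasp"]))
```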
Affiliation(s)
- Minija Tamosiunaite: Department for Computational Neuroscience, Inst. Physics-3, Georg-August-Universität Göttingen, Germany; Faculty of Informatics, Vytautas Magnus University, Lithuania
- Mohamad Javad Aein: Department for Computational Neuroscience, Inst. Physics-3, Georg-August-Universität Göttingen, Germany
- Jan Matthias Braun: Department for Computational Neuroscience, Inst. Physics-3, Georg-August-Universität Göttingen, Germany
- Tomas Kulvicius: Department for Computational Neuroscience, Inst. Physics-3, Georg-August-Universität Göttingen, Germany
- Rita Valteryte: Faculty of Informatics, Vytautas Magnus University, Lithuania
- Andrei Haidu: Institute for Artificial Intelligence, University of Bremen, Germany
- Dimitrios Chrysostomou: Department of Materials & Production, Robotics and Automation Group, Aalborg University, Denmark
- Barry Ridge: Department of Automatics, Biocybernetics, and Robotics, Jožef Stefan Institute, Slovenia
- Michael Beetz: Institute for Artificial Intelligence, University of Bremen, Germany
- Ole Madsen: Department of Materials & Production, Robotics and Automation Group, Aalborg University, Denmark
- Ales Ude: Department of Automatics, Biocybernetics, and Robotics, Jožef Stefan Institute, Slovenia
- Norbert Krüger: Maersk Mc-Kinney Moeller Institut, South Denmark University, Denmark
- Florentin Wörgötter: Department for Computational Neuroscience, Inst. Physics-3, Georg-August-Universität Göttingen, Germany
6. Flores JG, Meza I, Colin É, Gardent C, Gangemi A, Pineda LA. Robot experience stories: first-person generation of robotic task narratives in SitLog. Journal of Intelligent & Fuzzy Systems 2018. DOI: 10.3233/jifs-169511
Affiliation(s)
- Jorge Garcia Flores: LIPN-CNRS-Université Paris 13, 99 av. Jean-Baptiste Clément, Villetaneuse, France
- Iván Meza: IIMAS-UNAM, Circuito Escolar 3000, Cd. Universitaria, 04510 CDMX, Mexico
- Émilie Colin: CNRS-LORIA, Campus Scientifique BP 239, 54506 Vandoeuvre-lès-Nancy Cedex, France
- Claire Gardent: CNRS-LORIA, Campus Scientifique BP 239, 54506 Vandoeuvre-lès-Nancy Cedex, France
- Aldo Gangemi: LIPN-CNRS-Université Paris 13, 99 av. Jean-Baptiste Clément, Villetaneuse, France
- Luis A. Pineda: IIMAS-UNAM, Circuito Escolar 3000, Cd. Universitaria, 04510 CDMX, Mexico
7. Landsiedel C, Rieser V, Walter M, Wollherr D. A review of spatial reasoning and interaction for real-world robotics. Advanced Robotics 2017. DOI: 10.1080/01691864.2016.1277554
Affiliation(s)
- C. Landsiedel: Chair of Automatic Control Engineering, Technical University of Munich, Munich, Germany
- V. Rieser: School of Mathematical and Computer Sciences (MACS), Heriot-Watt University, Edinburgh, UK
- M. Walter: Toyota Technological Institute at Chicago, Chicago, IL, USA
- D. Wollherr: Chair of Automatic Control Engineering, Technical University of Munich, Munich, Germany
8. Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera. Sensors 2016, 16(12), 2117. PMID: 27983604; PMCID: PMC5191097. DOI: 10.3390/s16122117
Abstract
Controlling robots by natural language (NL) is attracting increasing attention because it is versatile, convenient, and requires no extensive user training. Grounding, connecting NL instructions from humans to the robot's perception of the world, is a crucial challenge for this problem. This paper explores the object grounding problem, concretely studying how to detect target objects specified by NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. With the metric information of all segmented objects, object attributes and inter-object relations are extracted. NL instructions that incorporate multiple cues for object specification are parsed into domain-specific annotations. The annotations from NL and the information extracted from the RGB-D camera are matched in a computational state-estimation framework that searches all possible object grounding states, and the final grounding is accomplished by selecting the states with the maximum probabilities. An RGB-D scene dataset associated with different groups of NL instructions, based on different cognition levels of the robot, was collected. Quantitative evaluations on the dataset illustrate the advantages of the proposed method, and experiments with NL-controlled object manipulation and NL-based task programming on a mobile manipulator show its effectiveness and practicality in robotic applications.
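The matching step, scoring how well each candidate object satisfies the parsed annotations and selecting the maximum-probability grounding, can be pictured as a product of per-cue scores with an argmax over objects. This is a minimal sketch with invented attribute names and score tables, not the paper's state-estimation framework.

```python
# Sketch: ground an NL object description by scoring every segmented object
# against the parsed annotations and keeping the maximum-probability match.

import math

def cue_score(obj: dict, cue: str, value) -> float:
    """Probability-like score that `obj` satisfies one annotation cue."""
    if cue == "color":
        return 0.9 if obj["color"] == value else 0.05
    if cue == "size":   # compare measured height (m) against a coarse class
        return math.exp(-abs(obj["height"] - {"small": 0.08, "tall": 0.25}[value]) / 0.05)
    if cue == "left_of":  # simple metric relation between centroids
        return 0.9 if obj["x"] < value["x"] else 0.05
    return 1.0  # unknown cues are uninformative

def ground(objects: list, annotations: dict) -> dict:
    def score(obj):
        p = 1.0
        for cue, value in annotations.items():
            p *= cue_score(obj, cue, value)
        return p
    return max(objects, key=score)

scene = [
    {"id": 1, "color": "red",  "height": 0.09, "x": 0.2},
    {"id": 2, "color": "red",  "height": 0.26, "x": 0.5},
    {"id": 3, "color": "blue", "height": 0.24, "x": 0.8},
]
# "the tall red one left of the blue object"
target = ground(scene, {"color": "red", "size": "tall", "left_of": scene[2]})
print(target["id"])  # -> 2
```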