1. Liu X, Zhang T, Liu M. Joint estimation of pose, depth, and optical flow with a competition-cooperation transformer network. Neural Netw 2024;171:263-275. PMID: 38103436; DOI: 10.1016/j.neunet.2023.12.020.
Abstract
Estimating depth, ego-motion, and optical flow from consecutive frames is a critical task in robot navigation and has received significant attention in recent years. In this study, we propose PDF-Former, an unsupervised joint estimation network comprising a fully transformer-based framework together with a competition-cooperation mechanism. The transformer framework captures global feature dependencies and is customized for the different task types, thereby improving performance on sequential tasks. The competition and cooperation mechanisms enable the network to obtain additional supervisory information at different training stages. Specifically, the competition mechanism is applied early in training to iteratively optimize, in a competitive manner, the estimates of 6-DOF poses (rotation and translation from the target image to the two reference images), the depth of the target image, and optical flow (from the target image to the two reference images). In contrast, the cooperation mechanism is applied later in training to pass results among the three networks so that they mutually refine one another's estimates. We conducted experiments on the KITTI dataset, and the results indicate that PDF-Former has significant potential to enhance the accuracy and robustness of sequential tasks in robot navigation.
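The two-phase schedule described in the abstract maps naturally onto a phase-dependent loss. Below is a minimal PyTorch sketch of that idea: in the competition phase each network minimizes only its own objective, and in the cooperation phase a cross-task consistency term (predicted flow versus the rigid flow implied by pose and depth, a standard construction in unsupervised depth/ego-motion work) couples the three estimates. The switch epoch, the loss composition, and the helper names are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def rigid_flow(depth, R, t, K, K_inv):
    """Flow implied by depth and a 6-DOF pose: back-project pixels using depth,
    apply the rigid motion (R, t), re-project, and subtract the pixel grid."""
    B, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float().reshape(3, -1)
    cam = (K_inv @ pix) * depth.reshape(B, 1, -1)      # 3-D points, camera frame
    cam2 = R @ cam + t.reshape(B, 3, 1)                # points after rigid motion
    pix2 = K @ cam2
    pix2 = pix2[:, :2] / pix2[:, 2:].clamp(min=1e-6)   # perspective divide
    return (pix2 - pix[:2]).reshape(B, 2, H, W)

def total_loss(pose_loss, depth_loss, flow_loss, flow_pred,
               flow_from_pose_depth, epoch, switch_epoch=10):
    # Competition phase: each network minimizes only its own objective,
    # so the three estimators improve independently.
    loss = pose_loss + depth_loss + flow_loss
    if epoch >= switch_epoch:
        # Cooperation phase: results are exchanged, and a consistency term
        # between the predicted flow and the flow implied by pose + depth
        # lets each network refine the others.
        loss = loss + (flow_pred - flow_from_pose_depth).abs().mean()
    return loss
```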
Affiliation(s)
- Xiaochen Liu
- School of Instrument Science & Engineering, Southeast University, Nanjing, 210096, China; Key Laboratory of Micro-Inertial Instrument and Advanced Navigation Technology, Ministry of Education, Southeast University, Nanjing, 210096, Jiangsu, China
- Tao Zhang
- School of Instrument Science & Engineering, Southeast University, Nanjing, 210096, China; Key Laboratory of Micro-Inertial Instrument and Advanced Navigation Technology, Ministry of Education, Southeast University, Nanjing, 210096, Jiangsu, China
- Mingming Liu
- Department of Orthopedic Surgery, The Second People's Hospital of Lianyungang, Lianyungang, 222003, Jiangsu, China; Department of Orthopedic Surgery, The First People's Hospital of Xining, Xining, 810000, Qinghai, China
2. Blackwell KT, Doya K. Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks. PLoS Comput Biol 2023;19:e1011385. PMID: 37594982; PMCID: PMC10479916; DOI: 10.1371/journal.pcbi.1011385.
Abstract
A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia, with two Q matrices, one representing direct-pathway neurons (G) and another representing indirect-pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that the G and N matrices are updated using the temporal-difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and differences are then resolved by a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks, including extinction, renewal, and discrimination; switching-reward-probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents' in choice and sequence-learning tasks, and that use of the temporal-difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance on the sequence-learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the roles of direct- and indirect-pathway striatal neurons.
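The abstract describes the algorithm precisely enough to sketch its core loop: two Q matrices updated from the same temporal-difference reward prediction error, plus a two-stage action selection that combines the softmax policies of the two pathways. The sketch below follows that description; the sign conventions, the opponent update on N, and the rule for resolving the two action distributions are illustrative assumptions, not the authors' exact equations.

```python
import numpy as np

class TD2QSketch:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        self.G = np.zeros((n_states, n_actions))  # direct pathway ("Go")
        self.N = np.zeros((n_states, n_actions))  # indirect pathway ("NoGo")
        self.alpha, self.gamma = alpha, gamma
        self.beta = 1.0  # inverse temperature; assumed adapted from recent reward

    def _softmax(self, q):
        z = np.exp(self.beta * (q - q.max()))
        return z / z.sum()

    def act(self, s):
        p_g = self._softmax(self.G[s])    # first stage: each matrix proposes
        p_n = self._softmax(-self.N[s])   # N scores suppress actions (assumption)
        p = p_g * p_n                     # second stage: resolve the two proposals
        p /= p.sum()
        return np.random.choice(len(p), p=p)

    def update(self, s, a, r, s_next):
        # Both matrices learn from the same TD reward prediction error.
        delta = r + self.gamma * self.G[s_next].max() - self.G[s, a]
        self.G[s, a] += self.alpha * delta
        self.N[s, a] -= self.alpha * delta  # opponent-sign update (assumption)
```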
Affiliation(s)
- Kim T Blackwell
- Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia, United States of America
- Kenji Doya
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
3. Tsantekidis A, Passalis N, Tefas A. Modeling limit order trading with a continuous action policy for deep reinforcement learning. Neural Netw 2023;165:506-515. PMID: 37348431; DOI: 10.1016/j.neunet.2023.05.051.
Abstract
Limit orders allow buyers and sellers to set a "limit price" they are willing to accept in a trade. Market orders, on the other hand, allow immediate execution at any price. Thus, market orders are susceptible to slippage: the additional cost incurred due to the unfavorable execution of a trade order. As a result, limit orders are often preferred, since they protect traders from excessive slippage costs caused by larger-than-expected price fluctuations. Despite their price guarantees, limit orders are more complex than market orders: orders with overly optimistic limit prices might never be executed, which increases the risk of employing limit orders in machine learning (ML)-based trading systems. Indeed, the current ML literature for trading relies almost exclusively on market orders. To overcome this limitation, a deep reinforcement learning (DRL) approach is proposed to model trading agents that use limit orders. The proposed method (a) employs a continuous probability distribution to model limit prices, and (b) provides the ability to place market orders when the risk of no execution outweighs the cost of slippage. Extensive experiments are conducted on multiple currency pairs using hourly price intervals, validating the effectiveness of the proposed method and paving the way for limit-order modeling in DRL-based trading.
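A policy with the two properties the abstract names, (a) a continuous distribution over limit prices and (b) an explicit option to fall back to a market order, can be sketched with one Gaussian head and one Bernoulli head. The network shape, the Gaussian parameterization, and all names below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LimitOrderPolicy(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, 1)            # mean limit-price offset from mid-price
        self.log_std = nn.Parameter(torch.zeros(1))
        self.market_logit = nn.Linear(hidden, 1)  # probability of a market order instead

    def forward(self, obs):
        h = self.body(obs)
        price_dist = torch.distributions.Normal(self.mu(h), self.log_std.exp())
        market_dist = torch.distributions.Bernoulli(logits=self.market_logit(h))
        return price_dist, market_dist

# Usage: sample an action and its log-probability for a policy-gradient update.
policy = LimitOrderPolicy(obs_dim=10)
obs = torch.randn(1, 10)
price_dist, market_dist = policy(obs)
offset = price_dist.sample()        # continuous limit-price offset
use_market = market_dist.sample()   # 1 -> market order, 0 -> limit order
log_prob = price_dist.log_prob(offset) + market_dist.log_prob(use_market)
```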
Affiliation(s)
- Avraam Tsantekidis
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece.
- Nikolaos Passalis
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece.
- Anastasios Tefas
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece.
4. Dulberg Z, Dubey R, Berwian IM, Cohen JD. Having multiple selves helps learning agents explore and adapt in complex changing worlds. Proc Natl Acad Sci U S A 2023;120:e2221180120. PMID: 37399387; PMCID: PMC10334746; DOI: 10.1073/pnas.2221180120.
Abstract
Satisfying a variety of conflicting needs in a changing environment is a fundamental challenge for any adaptive agent. Here, we show that designing an agent in a modular fashion, as a collection of subagents each dedicated to a separate need, powerfully enhances the agent's capacity to satisfy its overall needs. We used the formalism of deep reinforcement learning to investigate a biologically relevant multiobjective task: continually maintaining homeostasis of a set of physiologic variables. We then conducted simulations in a variety of environments and compared how modular agents performed relative to standard monolithic agents (i.e., agents that aimed to satisfy all needs in an integrated manner using a single aggregate measure of success). Simulations revealed that modular agents a) exhibited a form of exploration that was intrinsic and emergent rather than extrinsically imposed; b) were robust to changes in nonstationary environments; and c) scaled gracefully in their ability to maintain homeostasis as the number of conflicting objectives increased. Supporting analysis suggested that the robustness to changing environments and to increasing numbers of needs was due to the intrinsic exploration and the efficiency of representation afforded by the modular architecture. These results suggest that the normative principles by which agents have adapted to complex changing environments may also explain why humans have long been described as consisting of "multiple selves."
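A minimal tabular sketch of the modular-versus-monolithic contrast: the monolithic agent learns one value table over an aggregate reward, while the modular agent learns one table per need, each trained only on its own reward, with actions chosen by pooling the sub-agents' preferences. The pooling rule (summing Q-values before a softmax) is one simple assumption; the paper's deep-RL architecture and selection mechanism differ.

```python
import numpy as np

n_states, n_actions, n_needs = 16, 4, 3
rng = np.random.default_rng(0)

# Monolithic: one table, trained on the sum of all need-specific rewards.
Q_mono = np.zeros((n_states, n_actions))

# Modular: one "sub-agent" table per need, each trained only on its own reward.
Q_mod = np.zeros((n_needs, n_states, n_actions))

def monolithic_update(s, a, rewards, s2, alpha=0.1, gamma=0.9):
    target = sum(rewards) + gamma * Q_mono[s2].max()
    Q_mono[s, a] += alpha * (target - Q_mono[s, a])

def modular_update(s, a, rewards, s2, alpha=0.1, gamma=0.9):
    for k in range(n_needs):  # each sub-agent sees only its own need's reward
        target = rewards[k] + gamma * Q_mod[k, s2].max()
        Q_mod[k, s, a] += alpha * (target - Q_mod[k, s, a])

def modular_action(s, beta=1.0):
    # Sub-agents vote by summing their Q-values; disagreement flattens the
    # softmax, giving the emergent exploration the abstract describes.
    prefs = Q_mod[:, s, :].sum(axis=0)
    p = np.exp(beta * (prefs - prefs.max()))
    p /= p.sum()
    return rng.choice(n_actions, p=p)
```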
Affiliation(s)
- Zack Dulberg
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- Rachit Dubey
- Department of Computer Science, Princeton University, Princeton, NJ 08544
- Isabel M. Berwian
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- Jonathan D. Cohen
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
5. Short WD, Olutoye OO, Padon BW, Parikh UM, Colchado D, Vangapandu H, Shams S, Chi T, Jung JP, Balaji S. Advances in non-invasive biosensing measures to monitor wound healing progression. Front Bioeng Biotechnol 2022;10:952198. PMID: 36213059; PMCID: PMC9539744; DOI: 10.3389/fbioe.2022.952198.
Abstract
Impaired wound healing is a significant financial and medical burden. The synthesis and deposition of extracellular matrix (ECM) in a new wound is a dynamic process, constantly changing and adapting to biochemical and biomechanical signaling from the extracellular microenvironments of the wound. This drives either a regenerative or a fibrotic, scar-forming healing outcome. Disruptions in ECM deposition, structure, and composition lead to impaired healing in diseased states, such as diabetes. Valid measures of the principal determinants of successful ECM deposition and wound healing include lack of bacterial contamination, good tissue perfusion, and reduced mechanical injury and strain. These measures are used by wound-care providers to intervene upon the healing wound, steering healing toward a more functional phenotype with improved structural integrity and better outcomes, and preventing adverse developments. In this review, we discuss bioengineering advances in 1) non-invasive detection of biologic and physiologic factors of the healing wound, 2) visualizing and modeling the ECM, and 3) computational tools that efficiently evaluate the complex data acquired from wounds in basic science, preclinical, translational, and clinical studies, allowing us to prognosticate healing outcomes and intervene effectively. We focus on bioelectronics and biologic interfaces of the sensors and actuators for real-time biosensing and actuation of the tissues. We also discuss high-resolution, advanced imaging techniques, which go beyond traditional confocal and fluorescence microscopy to visualize microscopic details of the composition of the wound matrix, the linearity of collagen, and live tracking of components within the wound microenvironment. Finally, we discuss computational modeling of the wound matrix, including partial differential equation-based models as well as machine learning models, which can serve as powerful tools to guide physicians' decision-making.
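As an illustration of the PDE-based wound-matrix modeling the review surveys, the sketch below integrates a one-dimensional Fisher-KPP reaction-diffusion equation, a classic starting point for wound-closure simulations in which cells both migrate into and proliferate within the denuded region. The parameters and geometry are generic placeholders, not values taken from any study in the review.

```python
import numpy as np

# 1-D Fisher-KPP model: du/dt = D * d2u/dx2 + r * u * (1 - u), where u is a
# normalized cell density. Diffusion models migration; the logistic term
# models proliferation into the cell-free "wound" region.
D, r = 0.01, 1.0            # generic diffusivity and proliferation rate
N, L = 100, 1.0             # grid points and domain length
dx, dt, steps = L / (N - 1), 1e-4, 50_000

u = np.ones(N)
u[N // 3 : 2 * N // 3] = 0.0   # the wound: a cell-free central region

for _ in range(steps):
    lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx**2
    lap[0] = lap[-1] = 0.0     # crude no-flux boundaries
    u = u + dt * (D * lap + r * u * (1.0 - u))

print(f"residual wound area: {np.sum(1.0 - u) * dx:.3f}")
```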
Affiliation(s)
- Walker D. Short
- Laboratory for Regenerative Tissue Repair, Division of Pediatric Surgery, Department of Surgery, Texas Children’s Hospital and Baylor College of Medicine, Houston, TX, United States
- Oluyinka O. Olutoye
- Laboratory for Regenerative Tissue Repair, Division of Pediatric Surgery, Department of Surgery, Texas Children’s Hospital and Baylor College of Medicine, Houston, TX, United States
- Benjamin W. Padon
- Laboratory for Regenerative Tissue Repair, Division of Pediatric Surgery, Department of Surgery, Texas Children’s Hospital and Baylor College of Medicine, Houston, TX, United States
- Umang M. Parikh
- Laboratory for Regenerative Tissue Repair, Division of Pediatric Surgery, Department of Surgery, Texas Children’s Hospital and Baylor College of Medicine, Houston, TX, United States
- Daniel Colchado
- Laboratory for Regenerative Tissue Repair, Division of Pediatric Surgery, Department of Surgery, Texas Children’s Hospital and Baylor College of Medicine, Houston, TX, United States
- Hima Vangapandu
- Laboratory for Regenerative Tissue Repair, Division of Pediatric Surgery, Department of Surgery, Texas Children’s Hospital and Baylor College of Medicine, Houston, TX, United States
- Shayan Shams
- Department of Applied Data Science, San Jose State University, San Jose, CA, United States
- School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, United States
- Taiyun Chi
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, United States
- Jangwook P. Jung
- Department of Biological Engineering, Louisiana State University, Baton Rouge, LA, United States
- Swathi Balaji
- Laboratory for Regenerative Tissue Repair, Division of Pediatric Surgery, Department of Surgery, Texas Children’s Hospital and Baylor College of Medicine, Houston, TX, United States
- Correspondence: Swathi Balaji
6. Mobile robot application with hierarchical start position DQN. Comput Intell Neurosci 2022;2022:4115767. PMID: 36105641; PMCID: PMC9467786; DOI: 10.1155/2022/4115767.
Abstract
Advances in deep learning have significantly affected reinforcement learning, leading to the emergence of deep RL (DRL). DRL does not need a data set and has the potential to exceed the performance of human experts, resulting in significant developments in the field of artificial intelligence. However, because a DRL agent must interact with the environment extensively while it is trained, it is difficult to train directly in a real environment owing to the long training time, high cost, and risk of material damage. Therefore, most or all of the training of DRL agents for real-world applications is conducted in virtual environments. This study focuses on the difficulty a mobile robot faces in reaching its target by planning a path in a real-world environment. The Minimalistic Gridworld virtual environment was used to train the DRL agent, and to our knowledge, this is the first real-world implementation for this environment. A DRL algorithm with higher performance than the classical deep Q-network (DQN) algorithm was created with the expanded environment. A mobile robot was designed for use in a real-world application. To match the virtual environment with the real environment, algorithms were created to detect the positions of the mobile robot and the target, as well as the rotation of the mobile robot. As a result, a DRL-based mobile robot was developed that uses only the top view of the environment and can reach its target regardless of its initial position and rotation.
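For reference, the core DQN update the study builds on can be sketched in a few lines of PyTorch. The network size, replay settings, and dummy transitions below are generic placeholders; the paper's hierarchical start-position variant modifies how training episodes are initialized (as the title suggests), not the update itself.

```python
import random
from collections import deque
import torch
import torch.nn as nn

gamma = 0.99
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)

def dqn_update(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(list(replay), batch_size)))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # bootstrapped target from the frozen target network
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Fill the buffer with dummy (state, action, reward, next_state, done)
# transitions in place of real environment steps, then run one update.
for _ in range(64):
    replay.append((torch.randn(4), torch.tensor(0), torch.tensor(1.0),
                   torch.randn(4), torch.tensor(0.0)))
dqn_update()
```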
7. Yuan Y, Hua L, Cheng Y, Li J, Sang X, Zhang L, Wei W. A novel model-based reinforcement learning algorithm for solving the problem of unbalanced reward. J Intell Fuzzy Syst 2022. DOI: 10.3233/jifs-210956.
Abstract
Reinforcement learning algorithms driven by reward signals can be used to solve sequential learning problems. In practice, however, they still suffer from the problem of reward imbalance, which limits their use in many contexts. To solve this unbalanced-reward problem, we propose a novel model-based reinforcement learning algorithm called expected n-step value iteration (EnVI). Unlike traditional model-based reinforcement learning algorithms, the proposed method uses a new return function that changes the discounting of future rewards while reducing the influence of the current reward. We evaluated the performance of the proposed algorithm on a Treasure-Hunting game and a Hill-Walking game. The results demonstrate that the proposed algorithm reduces the negative impact of unbalanced rewards and greatly improves the performance of traditional reinforcement learning algorithms.
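The abstract does not give EnVI's exact return function, but its description, conventional discounting of future rewards combined with a reduced weight on the current reward, can be sketched as follows. The weighting scheme (a factor w0 < 1 on the immediate reward) is an illustrative assumption, not the paper's definition.

```python
def reweighted_n_step_return(rewards, bootstrap_value, gamma=0.9, w0=0.5):
    """rewards: the next n rewards r_t..r_{t+n-1}; bootstrap_value: V(s_{t+n})."""
    g = w0 * rewards[0]                   # reduced influence of the current reward
    for k in range(1, len(rewards)):
        g += gamma ** k * rewards[k]      # conventional discounting afterwards
    return g + gamma ** len(rewards) * bootstrap_value

# Example: a three-step lookahead from a model rollout.
print(reweighted_n_step_return([1.0, 0.0, 2.0], bootstrap_value=0.5))
```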
Affiliation(s)
- Yinlong Yuan
- College of Electrical Engineering, Nantong University, Nantong, China
- Liang Hua
- College of Electrical Engineering, Nantong University, Nantong, China
- Yun Cheng
- College of Electrical Engineering, Nantong University, Nantong, China
- Junhong Li
- College of Electrical Engineering, Nantong University, Nantong, China
- Xiaohu Sang
- College of Electrical Engineering, Nantong University, Nantong, China
- Lei Zhang
- College of Electrical Engineering, Nantong University, Nantong, China
- Wu Wei
- College of Automation Science and Engineering, South China University of Technology, Guangzhou, China
8. Application of an adapted FMEA framework for robot-inclusivity of built environments. Sci Rep 2022;12:3408. PMID: 35233018; PMCID: PMC8888750; DOI: 10.1038/s41598-022-06902-4.
Abstract
Mobile robots are being deployed in the built environment at increasing rates. However, a lack of consideration for robot-inclusive planning has led to physical spaces that can pose hazards to robots and contribute to an overall productivity decline for mobile service robots. This research proposes an adapted Failure Mode and Effects Analysis (FMEA) as a structured tool to evaluate a building's level of robot-inclusivity and safety for service robot deployments. This Robot-Inclusive FMEA (RIFMEA) framework is used to identify failures in the built environment that compromise the workflow of service robots, assess their effects and causes, and provide recommended actions to alleviate these problems. The method was supported by a case study of deploying telepresence robots on a university campus. The study concluded that common failures were related to poor furniture design, a lack of clearance and hazard indicators, and sub-optimal interior planning.
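For readers unfamiliar with FMEA, the scoring the RIFMEA framework adapts can be sketched briefly: each failure mode is rated for severity, occurrence, and detection, and the product of the three, the risk priority number (RPN), ranks which failures to address first. The example failure modes and ratings below are illustrative, not the paper's actual case-study values.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    severity: int     # impact on the robot's workflow if the failure occurs (1-10)
    occurrence: int   # how often the failure is expected (1-10)
    detection: int    # how hard the failure is to catch before deployment (1-10)

    @property
    def rpn(self) -> int:
        # Risk priority number: the standard FMEA ranking score.
        return self.severity * self.occurrence * self.detection

# Hypothetical built-environment failure modes for a service robot.
modes = [
    FailureMode("chair legs below the lidar scan plane", 7, 6, 5),
    FailureMode("glass wall invisible to the depth camera", 8, 4, 7),
    FailureMode("threshold strip exceeds wheel clearance", 6, 3, 3),
]

# Rank failure modes so the highest-risk items are addressed first.
for m in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"RPN {m.rpn:3d}  {m.description}")
```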