1
Mushtaq A, Haq IU, Sarwar MA, Khan A, Khalil W, Mughal MA. Multi-Agent Reinforcement Learning for Traffic Flow Management of Autonomous Vehicles. Sensors (Basel). 2023;23:2373. doi:10.3390/s23052373. PMID: 36904577; PMCID: PMC10007156.
Abstract
Intelligent traffic management systems have become one of the main applications of Intelligent Transportation Systems (ITS). There is growing interest in Reinforcement Learning (RL)-based control methods for ITS applications such as autonomous driving and traffic management. Deep learning helps approximate highly complex nonlinear functions from complicated data sets and tackle difficult control problems. In this paper, we propose an approach based on Multi-Agent Reinforcement Learning (MARL) and smart routing to improve the flow of autonomous vehicles on road networks. We evaluate Multi-Agent Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C), two recently proposed MARL techniques, combined with smart routing for traffic signal optimization to determine their potential. We investigate the framework offered by non-Markov decision processes, enabling a more in-depth understanding of the algorithms, and conduct a critical analysis of the method's robustness and effectiveness. Its efficacy and reliability are demonstrated through simulations in SUMO, a software tool for modeling traffic, on a road network containing seven intersections. Our findings show that MA2C, when trained on pseudo-random vehicle flows, is a viable methodology that outperforms competing techniques.
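The advantage actor-critic scheme that MA2C and IA2C build on can be sketched in a few lines. The fragment below is a generic, single-agent tabular illustration on a toy chain MDP (the environment and all names are hypothetical), not the paper's multi-agent, SUMO-based implementation:

```python
import numpy as np

# Tabular advantage actor-critic sketch on a toy 5-state chain MDP.
# Action 1 moves right, action 0 moves left; reward is earned at the
# rightmost state. MA2C/IA2C layer per-intersection agents (and, for
# MA2C, neighbor communication) on top of this basic update.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.99
theta = np.zeros((n_states, n_actions))  # actor: policy logits
V = np.zeros(n_states)                   # critic: state values

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_states - 1)

alpha_pi, alpha_v = 0.1, 0.2
for _ in range(2000):
    s = 0
    for _ in range(20):
        pi = softmax(theta[s])
        a = rng.choice(n_actions, p=pi)
        s2, r = step(s, a)
        adv = r + gamma * V[s2] - V[s]   # advantage estimate = TD error
        V[s] += alpha_v * adv            # critic update
        grad = -pi
        grad[a] += 1.0                   # d log pi(a|s) / d theta[s]
        theta[s] += alpha_pi * adv * grad  # advantage-weighted actor update
        s = s2
```

IA2C trains one such learner per agent independently, while MA2C additionally shares neighborhood information; both keep the advantage-weighted policy gradient at their core.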
Affiliation(s)
- Anum Mushtaq: Pakistan Institute of Engineering and Applied Sciences, Islamabad 44000, Pakistan
- Irfan Ul Haq: Pakistan Institute of Engineering and Applied Sciences, Islamabad 44000, Pakistan
- Asifullah Khan: Pakistan Institute of Engineering and Applied Sciences, Islamabad 44000, Pakistan; PIEAS Artificial Intelligence Center (PAIC), Islamabad 44000, Pakistan
- Wajeeha Khalil: Department of CS and IT, University of Engineering and Technology, Peshawar 25000, Pakistan
- Muhammad Abid Mughal: Pakistan Institute of Engineering and Applied Sciences, Islamabad 44000, Pakistan
2
Reinforcement Learning. Mach Learn. 2021. doi:10.1007/978-981-15-1967-3_16.
3
Abstract
Trial-to-trial variability in the execution of movements and motor skills is ubiquitous and widely considered to be the unwanted consequence of a noisy nervous system. However, recent studies have suggested that motor variability may also be a feature of how sensorimotor systems operate and learn. This view, rooted in reinforcement learning theory, equates motor variability with purposeful exploration of motor space that, when coupled with reinforcement, can drive motor learning. Here we review studies that explore the relationship between motor variability and motor learning in both humans and animal models. We discuss neural circuit mechanisms that underlie the generation and regulation of motor variability and consider the implications that this work has for our understanding of motor learning.
Affiliation(s)
- Ashesh K Dhawale: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138; Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138
- Maurice A Smith: Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138; John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138
- Bence P Ölveczky: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138; Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138
4
Kraemer L, Banerjee B. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing. 2016. doi:10.1016/j.neucom.2016.01.031.
6
Mobbs D, Hagan CC, Dalgleish T, Silston B, Prévost C. The ecology of human fear: survival optimization and the nervous system. Front Neurosci. 2015;9:55. doi:10.3389/fnins.2015.00055. PMID: 25852451; PMCID: PMC4364301.
Abstract
We propose a Survival Optimization System (SOS) to account for the strategies that humans and other animals use to defend against recurring and novel threats. The SOS attempts to merge ecological models that define a repertoire of contextually relevant threat-induced survival behaviors with contemporary approaches to human affective science. We first propose that the goal of the nervous system is to reduce surprise and optimize actions by (i) predicting the sensory landscape by simulating possible encounters with threat and selecting the appropriate pre-encounter action, and (ii) prevention strategies in which the organism manufactures safe environments. When a potential threat is encountered, the (iii) threat orienting system is engaged to determine whether the organism ignores the stimulus or switches into a process of (iv) threat assessment, where the organism monitors the stimulus, weighs the threat value, predicts the actions of the threat, searches for safety, and guides behavioral actions crucial to directed escape. When under imminent attack, (v) defensive systems evoke fast reflexive indirect escape behaviors (i.e., fight or flight). This cascade of responses to threats of increasing magnitude is underwritten by an interconnected neural architecture that extends from cortical and hippocampal circuits to attention, action, and threat systems, including the amygdala, striatum, and hard-wired defensive systems in the midbrain. The SOS also includes a modulatory feature consisting of cognitive appraisal systems that flexibly guide perception, risk, and action. Moreover, personal and vicarious threat encounters fine-tune avoidance behaviors via model-based learning, with higher organisms bridging data to reduce face-to-face encounters with predators. Our model attempts to unify the divergent field of human affective science, proposing a highly integrated nervous system that has evolved to increase the organism's chances of survival.
Affiliation(s)
- Dean Mobbs: Department of Psychology, Columbia University, New York, NY, USA
- Cindy C Hagan: Department of Psychology, Columbia University, New York, NY, USA
- Tim Dalgleish: Medical Research Council, Cognition and Brain Sciences Unit, Cambridge, UK
- Brian Silston: Department of Psychology, Columbia University, New York, NY, USA
7
Combining Learning Algorithms: An Approach to Markov Decision Processes. Enterp Inf Syst. 2013. doi:10.1007/978-3-642-40654-6_11.
8
Learning domain structure through probabilistic policy reuse in reinforcement learning. Prog Artif Intell. 2012. doi:10.1007/s13748-012-0026-6.
9
Takahashi S, Takahashi Y, Maeda Y, Nakamura T. Kicking Motion Imitation of Inverted-Pendulum Mobile Robot and Development of Body Mapping from Human Demonstrator. J Adv Comput Intell Intell Inform. 2011. doi:10.20965/jaciii.2011.p1030.
Abstract
This paper proposes a new method for learning the dynamic motion of an inverted-pendulum mobile robot from observation of a human player's demonstration. First, an inverted-pendulum mobile robot with upper and lower body links observes the human demonstration with a camera and extracts the human region in the images. Second, the robot maps the region to its own two links and estimates link posture trajectories. The robot then starts learning to kick based on the trajectory parameters for imitation. Through this process, the robot can learn the dynamic kicking shown by a human. The mapping parameter plays an important role in successful imitation. A reasonable and feasible procedure of learning from observation for an inverted-pendulum robot is proposed, its learning performance is investigated, and the development of body mapping is then proposed and examined.
10
Takano T, Takase H, Kawanaka H, Tsuruoka S. Merging with Extraction Method for Transfer Learning in Actor-Critic. J Adv Comput Intell Intell Inform. 2011. doi:10.20965/jaciii.2011.p0814.
Abstract
This paper aims to accelerate the learning process of the actor-critic method, one of the major reinforcement learning algorithms, through transfer learning. Transfer learning accelerates learning on the target task by reusing knowledge of source policies from each source task. In general, it consists of a selection phase and a training phase: agents select source policies that are similar to the target one without trial and error, and train on the target task by referring to the selected policies. In this paper, we discuss the training phase; the rest of the training algorithm is based on our previous method. We propose an effective transfer method that consists of an extraction method and a merging method. Agents extract action preferences that are related to reliable states, and state values that lead to preferred states. Extracted parameters are merged into the current parameters by taking a weighted average. We apply the proposed algorithm to simple maze tasks and show the effectiveness of the proposed method: it reduces episodes by 16% and failures by 55% compared with learning without transfer.
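The merging step described above, weighted averaging of extracted parameters into the current ones, can be sketched as follows. The weight `w` and the reliability mask are assumptions for illustration; the paper's actual extraction criteria are not reproduced here:

```python
import numpy as np

# Illustrative sketch of merging extracted source-policy parameters into
# the current ones by weighted average, restricted to states judged
# reliable. Function name and arguments are hypothetical.
def merge_parameters(current, extracted, reliable_mask, w=0.5):
    """Blend extracted source parameters into the current parameters.

    current, extracted : arrays of action preferences or state values
    reliable_mask      : boolean array marking the states the transfer touches
    w                  : weight given to the extracted knowledge
    """
    merged = current.copy()
    merged[reliable_mask] = (1 - w) * current[reliable_mask] \
                          + w * extracted[reliable_mask]
    return merged

current = np.zeros(5)                          # untrained target values
extracted = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # knowledge from a source task
mask = np.array([True, True, False, False, True])
merged = merge_parameters(current, extracted, mask)
# Entries outside the mask are left untouched.
```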
12
Tamura Y, Takahashi Y, Asada M. Observed Body Clustering for Imitation Based on Value System. J Adv Comput Intell Intell Inform. 2010. doi:10.20965/jaciii.2010.p0802.
Abstract
In order to develop skills, actions, and behavior in a human symbiotic environment, a robot must learn from observing the behavior of predecessors or humans. Recently, robotic imitation methods based on many approaches have been proposed. We have proposed reinforcement learning based approaches to imitation and investigated them under the assumption that an observer recognizes the body parts of the performer and maps them to its own. However, this assumption is not always applicable because of physical differences between the performer and the observer. In order to learn various behaviors from observation, the robot has to cluster the observed body area of the performer on the camera image, map the clustered parts to its own body parts based on a criterion that is reasonable for itself, and feed the data back into the imitation. This paper shows that the robot can cluster the body area on the camera image into its own body parts based on state value estimation in a reinforcement learning framework, while it imitates the observed behavior based on the same state value estimation. Clustering parameters are updated based on the temporal difference error, analogously to how the parameters of the state value function of the behavior are updated. The validity of the proposed method is investigated by applying it to the imitation of a dynamic throwing motion by an inverted pendulum robot and a human.
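The temporal-difference error that drives both the value estimation and, by analogy, the clustering updates is delta = r + gamma·V(s') − V(s). Below is a minimal TD(0) sketch on a hypothetical two-state loop, not the authors' robot setup:

```python
import numpy as np

# TD(0) value estimation on a toy 2-state loop. The abstract's clustering
# parameters are updated from this same temporal-difference error, so the
# core quantity is delta = r + gamma * V[s'] - V[s].
gamma, alpha = 0.9, 0.1
V = np.zeros(2)
transitions = [(0, 1, 0.0), (1, 0, 1.0)]  # (s, s', reward), visited in a loop

for _ in range(500):
    for s, s2, r in transitions:
        delta = r + gamma * V[s2] - V[s]  # TD error
        V[s] += alpha * delta             # value update; clustering
                                          # parameters follow the same error

# Fixed point satisfies V[0] = gamma * V[1] and V[1] = 1 + gamma * V[0].
```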
14
Kartoun U, Stern H, Edan Y. A Human-Robot Collaborative Reinforcement Learning Algorithm. J Intell Robot Syst. 2010. doi:10.1007/s10846-010-9422-y.
15
Kormushev PS, Nomoto K, Dong F, Hirota K. Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning. J Adv Comput Intell Intell Inform. 2009. doi:10.20965/jaciii.2009.p0600.
Abstract
A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation provides Time Hopping with abilities similar to those that eligibility traces provide for conventional Reinforcement Learning: it propagates values from one state to all of its temporal predecessors using a state transitions graph. Experiments on a simulated biped crawling robot confirm that Eligibility Propagation accelerates the learning process more than 3 times.
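One plausible reading of the mechanism can be sketched as follows: after a state's value changes, the change is pushed backward through a recorded state-transitions graph to all temporal predecessors, discounted at each hop. The names and the decay scheme below are assumptions for illustration, not the paper's algorithm:

```python
from collections import defaultdict, deque

# Sketch of propagating a value change to temporal predecessors via a
# state-transitions graph, breadth-first, with per-hop discounting.
gamma = 0.9
predecessors = defaultdict(list)  # state -> states observed to precede it

def record_transition(s, s2):
    predecessors[s2].append(s)

def propagate(values, state, delta, decay=gamma, min_delta=1e-4):
    """Push a value change `delta` at `state` back to its predecessors."""
    queue = deque([(state, delta)])
    seen = {state}
    while queue:
        s2, d = queue.popleft()
        for s in predecessors[s2]:
            if s not in seen and abs(decay * d) > min_delta:
                seen.add(s)
                values[s] += decay * d       # discounted share of the change
                queue.append((s, decay * d))

# Chain 0 -> 1 -> 2; a reward update at state 2 reaches 1 and then 0.
values = {0: 0.0, 1: 0.0, 2: 0.0}
record_transition(0, 1)
record_transition(1, 2)
values[2] += 1.0
propagate(values, 2, 1.0)  # values[1] becomes ~0.9, values[0] ~0.81
```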
17
Busoniu L, Babuska R, De Schutter B. A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Trans Syst Man Cybern C Appl Rev. 2008. doi:10.1109/tsmcc.2007.913919.