1
Dong S, Li C, Yang S, An B, Li W, Gao Y. Egoism, utilitarianism and egalitarianism in multi-agent reinforcement learning. Neural Netw 2024;178:106544. PMID: 39053197. DOI: 10.1016/j.neunet.2024.106544.
Abstract
In multi-agent partially observable sequential decision problems with general-sum rewards, it is necessary to account simultaneously for egoism (individual rewards), utilitarianism (social welfare), and egalitarianism (fairness). However, balancing these criteria poses a challenge for current multi-agent reinforcement learning methods. Specifically, fully decentralized methods, lacking global information about all agents' rewards, observations, and actions, fail to learn a balanced policy, while agents in centralized-training (with decentralized execution) methods are reluctant to share private information for fear of exploitation by others. To address these issues, this paper proposes a Decentralized and Federated (D&F) paradigm in which decentralized agents train egoistic policies using only local information to pursue self-interest, while the federation controller primarily considers utilitarianism and egalitarianism. The parameters of the decentralized and federated policies are mutually optimized under discrepancy constraints, akin to a server-client pattern, which ensures a balance between egoism, utilitarianism, and egalitarianism. Furthermore, theoretical analysis shows that the federated model, as well as the discrepancy between decentralized egoistic policies and federated utilitarian policies, achieves an O(1/T) convergence rate. Extensive experiments show that the D&F approach outperforms multiple baselines in terms of both utilitarianism and egalitarianism.
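The server-client pattern described in this abstract can be illustrated with a proximal discrepancy penalty on toy linear policy parameters. This is a minimal sketch of the general idea with invented gradients and weights, not the authors' D&F algorithm:

```python
import numpy as np

def local_step(theta_i, grad_ego, theta_fed, lr=0.1, mu=0.5):
    """Decentralized update: follow the egoistic gradient while a proximal
    discrepancy term pulls the local policy toward the federated one."""
    return theta_i - lr * (grad_ego + mu * (theta_i - theta_fed))

def federated_step(thetas):
    """Federation controller: aggregate client policies (utilitarian average)."""
    return np.mean(thetas, axis=0)

# Two toy agents with opposing egoistic gradients.
theta_fed = np.zeros(2)
thetas = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
grads = [np.array([-0.2, 0.0]), np.array([0.2, 0.0])]
for _ in range(50):
    thetas = [local_step(t, g, theta_fed) for t, g in zip(thetas, grads)]
    theta_fed = federated_step(thetas)
```

The proximal weight `mu` bounds how far each egoistic policy may drift from the federated consensus: here the two agents settle near ±0.4 while the federated policy stays at the utilitarian midpoint.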
Affiliation(s)
- Shaokang Dong, State Key Laboratory for Novel Software Technology, Nanjing University, China
- Chao Li, State Key Laboratory for Novel Software Technology, Nanjing University, China
- Shangdong Yang, School of Computer Science, Nanjing University of Posts and Telecommunications, China
- Bo An, School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Wenbin Li, State Key Laboratory for Novel Software Technology, Nanjing University, China; Shenzhen Research Institute of Nanjing University, China
- Yang Gao, State Key Laboratory for Novel Software Technology, Nanjing University, China
2
Croll HC, Ikuma K, Ong SK, Sarkar S. Unified control of diverse actions in a wastewater treatment activated sludge system using reinforcement learning for multi-objective optimization. Water Res 2024;263:122179. PMID: 39096812. DOI: 10.1016/j.watres.2024.122179.
Abstract
The operation of modern wastewater treatment facilities is a balancing act in which a multitude of variables are controlled to achieve a wide range of objectives, many of which conflict. This is especially true within secondary activated sludge systems, where significant research and industry effort has been devoted to advancing control optimization strategies, both domain-driven and data-driven. Among data-driven control strategies, reinforcement learning (RL) stands out for its ability to achieve better-than-human performance in complex environments. While RL has been applied to activated sludge process optimization in the existing literature, these applications are typically limited in scope, never controlling more than three actions. Expanding the scope of RL control has the potential to increase optimization gains while concurrently reducing the number of control systems that must be tuned and maintained by operations staff. This study examined several facets of implementing multi-action, multi-objective RL agents, namely how many actions a single agent could successfully control and how much environment data was necessary to train such agents. We observed improved control optimization with increasing action scope, though control of waste activated sludge remains a challenge. Furthermore, agents were able to maintain a high level of performance under decreased observation scope, up to a point. When compared to baseline control of the Benchmark Simulation Model No. 1 (BSM1), an RL agent controlling seven individual actions improved the average BSM1 performance metric by 8.3%, equivalent to an annual cost savings of $40,200 after accounting for the cost of additional sensors.
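The multi-objective scalarization at the heart of such an agent's reward can be sketched with a toy function; the weights and magnitudes below are hypothetical illustrations, not BSM1's actual performance metric:

```python
# Hypothetical BSM1-style scalarization: effluent quality and energy use
# are folded into one reward so a single RL agent can trade them off
# across all of its control actions. Weights are illustrative only.
def reward(effluent_quality_index, aeration_kwh, pumping_kwh,
           w_eqi=1.0, w_energy=0.1):
    energy = aeration_kwh + pumping_kwh
    return -(w_eqi * effluent_quality_index + w_energy * energy)

baseline = reward(6000.0, 3500.0, 250.0)
tuned = reward(5800.0, 3300.0, 240.0)   # better effluent and less energy
```

A single scalar reward is what lets one agent, rather than several independently tuned controllers, coordinate many actions against conflicting objectives.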
Affiliation(s)
- Henry C Croll, Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, IA 50011, USA
- Kaoru Ikuma, Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, IA 50011, USA
- Say Kee Ong, Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, IA 50011, USA
- Soumik Sarkar, Department of Mechanical Engineering, Iowa State University, Ames, IA 50011, USA
3
Cao Y, Xu B, Li B, Fu H. Advanced Design of Soft Robots with Artificial Intelligence. Nano-Micro Lett 2024;16:214. PMID: 38869734. DOI: 10.1007/s40820-024-01423-3.
Affiliation(s)
- Ying Cao, Nanotechnology Center, School of Fashion and Textiles, The Hong Kong Polytechnic University, Hong Kong 999077, People's Republic of China
- Bingang Xu, Nanotechnology Center, School of Fashion and Textiles, The Hong Kong Polytechnic University, Hong Kong 999077, People's Republic of China
- Bin Li, Bioinspired Engineering and Biomechanics Center, Xi'an Jiaotong University, Xi'an 710049, People's Republic of China
- Hong Fu, Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong 999077, People's Republic of China
4
Zhou W, Wu L, Gao Y, Chen X. A Dynamic Window Method Based on Reinforcement Learning for SSVEP Recognition. IEEE Trans Neural Syst Rehabil Eng 2024;32:2114-2123. PMID: 38829754. DOI: 10.1109/tnsre.2024.3408273.
Abstract
Steady-state visual evoked potential (SSVEP) is one of the most widely used brain-computer interface (BCI) paradigms. Conventional methods analyze SSVEPs at a fixed window length. Compared with these methods, dynamic window methods can achieve a higher information transfer rate (ITR) by selecting an appropriate window length. These methods evaluate the credibility of the current result by linear discriminant analysis (LDA) or Bayesian estimation and extend the window length until a credible result is obtained. However, the hypotheses introduced by LDA and Bayesian estimation may not align with real-world SSVEP recordings, which can lead to an inappropriate window length. To address this issue, we propose a novel dynamic window method based on reinforcement learning (RL). The proposed method optimizes the decision of whether to extend the window length based on the impact of that decision on the ITR, without additional hypotheses; the decision model automatically learns a strategy that maximizes the ITR through trial and error. In addition, whereas traditional methods manually extract features, the proposed method uses neural networks to automatically extract features for the dynamic selection of window length. The proposed method can therefore more accurately decide whether to extend the window and select an appropriate length. To verify performance, we compared the method with other dynamic window methods on two public SSVEP datasets. The experimental results demonstrate that the proposed method achieves the highest performance.
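The accuracy-versus-time trade-off the RL agent optimizes can be made concrete with the standard Wolpaw ITR formula; the 40-target count and the accuracy figures below are illustrative values, not results from this paper:

```python
import math

def itr_bits_per_min(n_targets, accuracy, window_s):
    """Wolpaw information transfer rate for an n_targets-class speller."""
    p, n = accuracy, n_targets
    if p <= 1.0 / n:
        return 0.0
    bits = math.log2(n)
    if p < 1.0:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / window_s

# A longer window usually raises accuracy but costs time; a dynamic
# window method decides per trial whether the gain is worth the delay.
short = itr_bits_per_min(40, 0.80, 1.0)
extended = itr_bits_per_min(40, 0.95, 2.0)
```

In this example doubling the window for a 15-point accuracy gain actually lowers the ITR, which is exactly the kind of unfavorable extension the learned stopping policy must avoid.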
5
Huang J, Guo X. A Novel Method of UAV-Assisted Trajectory Localization for Forestry Environments. Sensors (Basel) 2024;24:3398. PMID: 38894189. PMCID: PMC11174491. DOI: 10.3390/s24113398.
Abstract
Global positioning systems often fall short in dense forest environments, driving demand for innovative localization methods. Existing methods suffer from two limitations: (1) traditional localization frameworks require several fixed anchors to estimate target locations, which is difficult to satisfy in complex and uncertain forestry environments; and (2) the uncertain environment severely degrades the quality of signal measurements and thus localization accuracy. To address these limitations, this paper proposes a new UAV-assisted trajectory localization method for forestry environments. Based on a multi-agent DRL technique, the topology of the UAVs is optimized in real time for high-accuracy target localization. Then, using RSS measurements from the UAVs to the target, the least squares algorithm estimates the location, which is more flexible and reliable than existing localization systems. Furthermore, a shared replay memory is incorporated into the proposed multi-agent DRL system, which effectively enhances learning performance and efficiency. Simulation results show that the proposed method yields a flexible, high-accuracy localization system, exhibits robustness against high-dimensional heterogeneous data, and is well suited to forestry environments.
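The RSS-to-distance inversion and least-squares step can be sketched in a noiseless toy setting. The log-distance path-loss parameters and UAV positions below are invented for illustration, and the lateration is the standard linearization of the range circles, not necessarily the authors' exact estimator:

```python
import numpy as np

def rss_to_distance(rss, p0=-40.0, n=2.0):
    """Invert the log-distance path-loss model: rss = p0 - 10 n log10(d)."""
    return 10 ** ((p0 - rss) / (10 * n))

def lateration(anchors, dists):
    """Linearized least squares: subtract the last anchor's circle equation
    from the others to obtain a linear system in the target position."""
    A = 2 * (anchors[-1] - anchors[:-1])
    b = (dists[:-1] ** 2 - dists[-1] ** 2
         - np.sum(anchors[:-1] ** 2, axis=1) + np.sum(anchors[-1] ** 2))
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Four simulated UAV anchors and a ground target at (3, 4), no noise.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
target = np.array([3.0, 4.0])
true_d = np.linalg.norm(anchors - target, axis=1)
rss = -40.0 - 20.0 * np.log10(true_d)
est = lateration(anchors, rss_to_distance(rss))
```

Because the UAVs move, the anchor matrix is rebuilt every step, which is what lets the DRL layer reshape the topology for better-conditioned least-squares geometry.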
Affiliation(s)
- Xiansheng Guo, Department of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
6
Amin S, Uddin MI, Alarood AA, Mashwani WK, Alzahrani AO, Alzahrani HA. An adaptable and personalized framework for top-N course recommendations in online learning. Sci Rep 2024;14:10382. PMID: 38710728. DOI: 10.1038/s41598-024-56497-1.
Abstract
In recent years, Massive Open Online Course (MOOC) platforms have proliferated remarkably on a global scale, and learners can now meet their learning demands with their help. However, learners may not absorb course material well when faced with an overload of information, owing to limited expertise and cognitive capacity. Personalized Recommender Systems (RSs), a cutting-edge technology, can help address this issue, greatly improving resource acquisition through personalized availability for people of all ages. Intelligent learning methods such as machine learning and Reinforcement Learning (RL) can be applied to RS challenges. However, machine learning needs supervised data, and classical RL is not suitable for multi-task recommendations on online learning platforms. To address these challenges, the proposed framework integrates Deep Reinforcement Learning (DRL) with a multi-agent approach. This adaptive system personalizes the learning experience by considering key factors such as learner sentiment, learning style, preferences, competency, and adaptive difficulty levels. We formulate the interactive RS problem using a DRL-based actor-critic model named DRR, treating recommendation as a sequential decision-making process. DRR enables the system to provide top-N course recommendations and personalized learning paths, enriching the student's experience. Extensive experiments on a MOOC dataset, the 100K Coursera course reviews, validate the proposed DRR model, demonstrating its superiority over baseline models in major evaluation metrics for long-term recommendations. The outcomes of this research contribute to the field of e-learning technology, guiding the design and implementation of course RSs to facilitate personalized, relevant recommendations for online learners.
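The top-N step of an actor-critic recommender of this kind is often implemented by ranking item embeddings by inner product with the actor's output vector. The 2-d course embeddings and preference vector below are invented for illustration, not the DRR model's learned representations:

```python
import numpy as np

# Hypothetical course embeddings over two latent topics.
courses = np.array([
    [1.0, 0.0],   # course 0: pure "programming"
    [0.0, 1.0],   # course 1: pure "statistics"
    [0.9, 0.1],   # course 2: mostly programming
    [0.5, 0.5],   # course 3: mixed
])
actor_action = np.array([1.0, 0.2])   # actor output: leans toward programming

# Rank all courses by inner-product score and keep the top N.
scores = courses @ actor_action
top_n = np.argsort(scores)[::-1][:2]
```

The critic then scores the state-action pair so the actor's preference vector, and hence the whole ranking, improves over sequential interactions.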
Affiliation(s)
- Samina Amin, Institute of Computing, Kohat University of Science and Technology (KUST), Kohat 26000, Pakistan
- M Irfan Uddin, Institute of Computing, Kohat University of Science and Technology (KUST), Kohat 26000, Pakistan
- Wali Khan Mashwani, Institute of Numerical Sciences, Kohat University of Science and Technology (KUST), Kohat 26000, Pakistan
- Ahmed Omar Alzahrani, Faculty of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
7
Liu P, Guo Y, Liu P, Ding H, Cao J, Zhou J, Feng Z. What can we learn from the AV crashes? An association rule analysis for identifying the contributing risky factors. Accid Anal Prev 2024;199:107492. PMID: 38428241. DOI: 10.1016/j.aap.2024.107492.
Abstract
The objective of this study is to explore the risk factors contributing to Autonomous Vehicle (AV) crashes and their interdependencies. AV crash data between 2015 and 2023 were collected from the autonomous vehicle collision reports published by the California Department of Motor Vehicles (DMV). AV crashes were categorized into four types based on vehicle damage. AV crash features, including crash location and time, driving mode, vehicle movements, crash type and vehicle damage, and traffic conditions, were used as potential risk factors. Association Rule Mining (ARM) methods were utilized to identify sets of contributing risk factors that often occur together in AV crashes. Several association rules suggest that AV crashes result from complex interactions between road factors, vehicle factors, and environmental conditions. No-damage and minor crashes are more likely affected by road features and traffic conditions. In contrast, vehicle movements are more sensitive indicators of severe AV crashes, and improper vehicle operations could increase the probability of severe crashes. In addition, the results suggest that adverse weather conditions could increase crash damage: AV interactions with roadside infrastructure or vulnerable road users on wet road surfaces at night could lead to significant loss of life and property. Furthermore, the safety effects of driving mode on AV crash damage are revealed; in some contexts, the autonomous driving mode can mitigate crash damage compared with the conventional driving mode. The findings can inform policy measures and engineering countermeasures that improve the safety and efficiency of AVs on the road, ultimately improving road transportation's overall safety and reliability.
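The core ARM metrics (support, confidence, lift) are easy to compute on a toy crash table; the five records and factor labels below are hypothetical, not the DMV data:

```python
# Toy crash records: each record is a set of categorical factors.
crashes = [
    {"night", "wet_road", "severe"},
    {"night", "wet_road", "severe"},
    {"day", "dry_road", "minor"},
    {"night", "dry_road", "minor"},
    {"day", "wet_road", "severe"},
]

def support(itemset):
    """Fraction of records containing every item in the set."""
    return sum(itemset <= c for c in crashes) / len(crashes)

def rule_metrics(antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    s_ac = support(antecedent | consequent)
    conf = s_ac / support(antecedent)
    lift = conf / support(consequent)
    return s_ac, conf, lift

s, conf, lift = rule_metrics({"night", "wet_road"}, {"severe"})
```

A lift above 1 means the antecedent factors co-occur with the consequent more often than chance, which is how rules like "wet road at night -> severe crash" are surfaced.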
Affiliation(s)
- Pei Liu, School of Transportation, Southeast University, Nanjing 211189, China
- Yanyong Guo, School of Transportation, Southeast University, Nanjing 211189, China
- Pan Liu, School of Transportation, Southeast University, Nanjing 211189, China
- Hongliang Ding, Institute of Smart City and Intelligent Transportation, Institute of Urban Rail Transportation, Southwest Jiaotong University, Chengdu 611730, China
- Jiandong Cao, China Academy of Transportation Sciences, #1, Building 10, Hepingli East Street, Chaoyang District, Beijing 100029, China
- Jibiao Zhou, Ningbo High-level Highway Construction Management Center, No. 396 Songjiangzhong Road, Ningbo, Zhejiang 315211, China
- Zhongxiang Feng, School of Automobile and Traffic Engineering, Hefei University of Technology, Hefei 230009, Anhui, China
8
Ding S, Du W, Ding L, Zhang J, Guo L, An B. Robust Multi-Agent Communication With Graph Information Bottleneck Optimization. IEEE Trans Pattern Anal Mach Intell 2024;46:3096-3107. PMID: 38019627. DOI: 10.1109/tpami.2023.3337534.
Abstract
Recent research on multi-agent reinforcement learning (MARL) has shown that action coordination of multi-agents can be significantly enhanced by introducing communication learning mechanisms. Meanwhile, graph neural network (GNN) provides a promising paradigm for communication learning of MARL. Under this paradigm, agents and communication channels can be regarded as nodes and edges in the graph, and agents can aggregate information from neighboring agents through GNN. However, this GNN-based communication paradigm is susceptible to adversarial attacks and noise perturbations, and how to achieve robust communication learning under perturbations has been largely neglected. To this end, this paper explores this problem and introduces a robust communication learning mechanism with graph information bottleneck optimization, which can optimally realize the robustness and effectiveness of communication learning. We introduce two information-theoretic regularizers to learn the minimal sufficient message representation for multi-agent communication. The regularizers aim at maximizing the mutual information (MI) between the message representation and action selection while minimizing the MI between the agent feature and message representation. Besides, we present a MARL framework that can integrate the proposed communication mechanism with existing value decomposition methods. Experimental results demonstrate that the proposed method is more robust and efficient than state-of-the-art GNN-based MARL methods.
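In the usual information-bottleneck notation, the two regularizers paraphrased above amount to learning message parameters that keep the message predictive of actions while compressing away agent-specific detail. Writing X for the agent feature, M for the message representation, A for the action selection, and β for a trade-off weight, the stated objective can be sketched as:

```latex
\max_{\phi} \; I(M; A) \;-\; \beta \, I(X; M)
```

Both mutual-information terms are intractable in general, so objectives of this form are typically optimized through variational upper and lower bounds; the exact bounds used here are the authors', not shown in this sketch.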
9
Gabler V, Wollherr D. Decentralized multi-agent reinforcement learning based on best-response policies. Front Robot AI 2024;11:1229026. PMID: 38690119. PMCID: PMC11059992. DOI: 10.3389/frobt.2024.1229026.
Abstract
Introduction: Multi-agent systems are an interdisciplinary research field concerned with multiple decision-making individuals interacting with a usually partially observable environment. Given recent advances in single-agent reinforcement learning (RL), multi-agent RL (MARL) has gained tremendous interest in recent years. Most research studies apply a fully centralized learning scheme to ease the transfer from the single-agent domain to multi-agent systems. Methods: In contrast, we claim that a decentralized learning scheme is preferable for real-world applications, as it allows deploying a learning algorithm on an individual robot rather than on a complete fleet of robots. We therefore outline a novel actor-critic (AC) approach tailored to cooperative MARL problems in sparsely rewarded domains. Our approach decouples the MARL problem into a set of distributed agents that model the other agents as responsive entities. In particular, we propose using two separate critics per agent to distinguish between the joint task reward and agent-based costs, as commonly applied in multi-robot planning. On the one hand, the agent-based critic intends to decrease agent-specific costs. On the other hand, each agent intends to optimize the joint team reward based on the joint task critic. As this critic still depends on the joint action of all agents, we outline two suitable behavior models based on Stackelberg games: a game against nature and a dyadic game against each agent. Following these behavior models, our algorithm allows fully decentralized execution and training. Results and Discussion: We evaluate the presented method with the proposed behavior models in a sparsely rewarded simulated multi-agent environment. Although our approach already outperforms state-of-the-art learners, we conclude by outlining possible extensions of our algorithm for future research to build upon.
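The two-critic idea, a joint task critic and an agent-based cost critic feeding a single policy update, can be sketched in a toy single-state setting. The fixed Q-values and the expected softmax policy-gradient step below are illustrative only, not the authors' Stackelberg formulation:

```python
import numpy as np

# Toy single-state agent with 3 actions and a softmax policy.
theta = np.zeros(3)
task_q = np.array([0.0, 1.0, 0.0])   # joint task critic (team reward)
cost_q = np.array([0.0, 0.0, 1.0])   # agent-based critic (individual cost)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(200):
    pi = softmax(theta)
    adv = task_q - cost_q   # maximize task reward while decreasing own cost
    # exact (expected) policy gradient for a softmax policy
    theta += 0.1 * ((np.diag(pi) - np.outer(pi, pi)) @ adv)

pi = softmax(theta)
```

The policy concentrates on the action that scores well with the task critic without incurring agent-specific cost, which is the balance the paper's per-agent critic pair is designed to strike.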
Affiliation(s)
- Volker Gabler, Chair of Automatic Control Engineering, TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
10
Du C, Lu Y, Meng H, Park J. Evolution of cooperation on reinforcement-learning driven-adaptive networks. Chaos 2024;34:041101. PMID: 38558043. DOI: 10.1063/5.0201968.
Abstract
Complex networks are widespread in real-world environments across diverse domains, and real-world networks tend to form spontaneously through interactions between individual agents. Inspired by this, we design an evolutionary game model in which agents participate in a prisoner's dilemma game (PDG) with their neighboring agents. Agents can autonomously modify their connections with neighbors using reinforcement learning to avoid unfavorable environments. Interestingly, our findings reveal some remarkable results. Reinforcement learning-based adaptive networks improve cooperation compared with existing PDGs played on homogeneous networks. At the same time, the network's topology evolves from a homogeneous to a heterogeneous state. This change occurs as players gain experience from past games and become more astute in deciding whether to keep playing PDGs with their current neighbors or to disconnect from the least profitable neighbors, instead seeking more favorable environments by establishing connections with second-order neighbors offering higher rewards. By calculating the degree distribution and modularity of the adaptive network in the steady state, we confirm that it follows a power law and has a clear community structure, indicating that the adaptive network resembles real-world networks. Our study reports a new phenomenon in evolutionary game theory on networks and proposes a new way to generate scale-free networks: through the evolution of homogeneous networks rather than the typical mechanisms of network growth and preferential attachment. Our results provide new insight into network structure, the emergence of cooperation, and the behavior of actors in nature and society.
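The keep-or-rewire decision can be sketched as a tiny Q-learning bandit for one focal agent; the standard PD payoffs (sucker S = 0, mutual cooperation R = 3) and the bandit framing are illustrative, not the paper's full network model:

```python
import random

random.seed(1)

# A single focal agent repeatedly decides whether to KEEP playing a
# prisoner's dilemma with a defecting neighbour (sucker payoff S = 0)
# or REWIRE to a cooperative second-order neighbour (payoff R = 3).
KEEP, REWIRE = 0, 1
q = [0.0, 0.0]
alpha, eps = 0.1, 0.1   # learning rate and exploration rate

for _ in range(300):
    if random.random() < eps:
        a = random.randrange(2)                      # explore
    else:
        a = max((KEEP, REWIRE), key=lambda i: q[i])  # exploit
    payoff = 0.0 if a == KEEP else 3.0
    q[a] += alpha * (payoff - q[a])
```

Once many agents run this kind of rule in parallel, profitable agents accumulate links while unprofitable ones lose them, which is the mechanism behind the heterogeneous, scale-free steady state described above.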
Affiliation(s)
- Chunpeng Du, School of Mathematics, Kunming University, Kunming 650214, China
- Yikang Lu, School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, China
- Haoran Meng, Technical Center, Shanghai Tobacco Group Co. Ltd., Shanghai 200120, China
- Junpyo Park, Department of Applied Mathematics, College of Applied Sciences, Kyung Hee University, Yongin 17104, Republic of Korea
11
Lussange J, Vrizzi S, Palminteri S, Gutkin B. Mesoscale effects of trader learning behaviors in financial markets: A multi-agent reinforcement learning study. PLoS One 2024;19:e0301141. PMID: 38557590. PMCID: PMC10984546. DOI: 10.1371/journal.pone.0301141.
Abstract
Recent advances in machine learning have yielded novel research perspectives in behavioural economics and financial market microstructure studies. In this paper we study the impact of individual trader learning characteristics on markets, using a stock market simulator designed with a multi-agent architecture. Each agent, representing an autonomous investor, trades stocks through reinforcement learning, using a centralized double-auction limit order book. This approach allows us to study, bottom-up, the impact of individual trader traits on the whole stock market at the mesoscale. We test three trader trait aspects: increased agent learning rates, herding behaviour, and random trading. As hypothesized, we find that larger learning rates significantly increase the number of crashes. We also find that herding behaviour undermines market stability, while random trading tends to preserve it.
Affiliation(s)
- Johann Lussange, Laboratoire des Neurosciences Cognitives, Département des Études Cognitives, INSERM U960, Paris, France
- Stefano Vrizzi, Laboratoire des Neurosciences Cognitives, Département des Études Cognitives, INSERM U960, Paris, France
- Stefano Palminteri, Laboratoire des Neurosciences Cognitives, Département des Études Cognitives, INSERM U960, Paris, France; Center for Cognition and Decision Making, Department of Psychology, NU University Higher School of Economics, Moscow, Russia
- Boris Gutkin, Laboratoire des Neurosciences Cognitives, Département des Études Cognitives, INSERM U960, Paris, France; Center for Cognition and Decision Making, Department of Psychology, NU University Higher School of Economics, Moscow, Russia
12
Negm A, Ma X, Aggidis G. Deep reinforcement learning challenges and opportunities for urban water systems. Water Res 2024;253:121145. PMID: 38330870. DOI: 10.1016/j.watres.2024.121145.
Abstract
The efficient and sustainable supply and transport of water is a key component of any functioning civilisation, making the role of urban water systems (UWSs) inherently crucial to the wellbeing of their customers. However, managing water is not a simple task. Whether through ageing infrastructure, transient flows, air cavities or low pressures, water can be lost to the many issues facing UWSs. The complexity of these networks grows with high urbanisation trends and climate change, leaving water companies and regulatory bodies in need of new solutions. It is therefore no surprise that many researchers are working to innovate within the water industry to ensure that the future of our water is safe. Deep reinforcement learning (DRL) has the potential to tackle previously intractable complexities, as it relies on deep neural networks for function approximation and representation. This technology has conquered many fields due to its impressive results and could effectively revolutionise UWSs. In this article, we explain the background of DRL and the milestones of the field using a novel taxonomy of DRL algorithms. This is followed by a novel review of DRL applications in UWSs, focusing on water distribution networks and stormwater systems. The review concludes with critical insights into how DRL can benefit different aspects of urban water systems.
Affiliation(s)
- Ahmed Negm, Lancaster University Energy Group, School of Engineering, Lancaster LA1 4YW, UK
- Xiandong Ma, Lancaster University Energy Group, School of Engineering, Lancaster LA1 4YW, UK
- George Aggidis, Lancaster University Energy Group, School of Engineering, Lancaster LA1 4YW, UK
13
Pina R, Silva VD, Hook J, Kondoz A. Residual Q-Networks for Value Function Factorizing in Multiagent Reinforcement Learning. IEEE Trans Neural Netw Learn Syst 2024;35:1534-1544. PMID: 35737605. DOI: 10.1109/tnnls.2022.3183865.
Abstract
Multiagent reinforcement learning (MARL) is useful in many problems that require the cooperation and coordination of multiple agents. Learning optimal policies using reinforcement learning in a multiagent setting can be very difficult as the number of agents increases. Recent solutions such as value decomposition networks (VDNs), QMIX, QTRAN, and QPLEX adhere to the centralized training and decentralized execution (CTDE) scheme and perform factorization of the joint action-value function. However, these methods still suffer from increased environmental complexity and at times fail to converge in a stable manner. We propose a novel concept of residual Q-networks (RQNs) for MARL, which learn to transform the individual Q-value trajectories in a way that preserves the individual-global-max (IGM) criterion but is more robust in factorizing action-value functions. The RQN acts as an auxiliary network that accelerates convergence and becomes obsolete as the agents reach the training objectives. The performance of the proposed method is compared against several state-of-the-art techniques, such as QPLEX, QMIX, QTRAN, and VDN, in a range of multiagent cooperative tasks. The results illustrate that the proposed method, in general, converges faster, with increased stability, and shows robust performance in a wider family of environments. The improvements are more prominent in environments with severe punishments for noncooperative behaviors, and especially in the absence of complete state information during training.
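The IGM criterion these factorizations preserve is easy to verify for the simplest additive (VDN-style) mixing, where the greedy joint action is guaranteed to equal the tuple of per-agent greedy actions. The Q-values below are arbitrary illustrative numbers:

```python
import numpy as np

# Two agents, 3 actions each. Additive (VDN-style) factorization:
# Q_tot(a1, a2) = Q_1(a1) + Q_2(a2).
q1 = np.array([0.2, 1.0, -0.5])
q2 = np.array([0.7, -0.1, 0.3])
q_tot = q1[:, None] + q2[None, :]

# IGM: the joint greedy action equals each agent's own greedy action,
# so decentralized execution recovers the centralized argmax.
joint_greedy = np.unravel_index(q_tot.argmax(), q_tot.shape)
individual_greedy = (q1.argmax(), q2.argmax())
```

Richer mixers such as QMIX and QPLEX relax additivity while still enforcing IGM; the RQN proposed here instead transforms the individual Q-value trajectories so that IGM survives harder factorizations.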
14
Jiang Q, Li J, Sun Y, Huang J, Zou R, Ma W, Guo H, Wang Z, Liu Y. Deep-reinforcement-learning-based water diversion strategy. Environ Sci Ecotechnol 2024;17:100298. PMID: 37554624. PMCID: PMC10405199. DOI: 10.1016/j.ese.2023.100298.
Abstract
Water diversion is a common strategy to enhance water quality in eutrophic lakes by increasing available water resources and accelerating nutrient circulation. Its effectiveness depends on changes in the source water and lake conditions. However, the challenge of optimizing water diversion remains because it is difficult to simultaneously improve lake water quality and minimize the amount of diverted water. Here, we propose a new approach called dynamic water diversion optimization (DWDO), which combines a comprehensive water quality model with a deep reinforcement learning algorithm. We applied DWDO to a region of Lake Dianchi, the largest eutrophic freshwater lake in China and validated it. Our results demonstrate that DWDO significantly reduced total nitrogen and total phosphorus concentrations in the lake by 7% and 6%, respectively, compared to previous operations. Additionally, annual water diversion decreased by an impressive 75%. Through interpretable machine learning, we identified the impact of meteorological indicators and the water quality of both the source water and the lake on optimal water diversion. We found that a single input variable could either increase or decrease water diversion, depending on its specific value, while multiple factors collectively influenced real-time adjustment of water diversion. Moreover, using well-designed hyperparameters, DWDO proved robust under different uncertainties in model parameters. The training time of the model is theoretically shorter than traditional simulation-optimization algorithms, highlighting its potential to support more effective decision-making in water quality management.
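The dual goal of improving lake quality while minimizing diverted water can be expressed as a scalar reward; the weights and concentrations below are invented for illustration, not the DWDO model's actual reward, though the relative improvements mirror the 7% TN, 6% TP, and 75% diversion reductions reported above:

```python
# Hypothetical DWDO-style reward: penalize lake nutrient concentrations
# (total nitrogen and phosphorus, mg/L) and the volume of diverted water,
# so the agent improves quality with as little diversion as possible.
def reward(tn_mg_l, tp_mg_l, diversion_m3, w_tn=1.0, w_tp=10.0, w_div=1e-8):
    return -(w_tn * tn_mg_l + w_tp * tp_mg_l + w_div * diversion_m3)

heavy = reward(tn_mg_l=2.0, tp_mg_l=0.10, diversion_m3=4.0e8)
smart = reward(tn_mg_l=1.86, tp_mg_l=0.094, diversion_m3=1.0e8)
```

Because all three quantities improve simultaneously, the "smart" policy scores strictly better under any choice of positive weights, which is what lets the DRL agent pursue both objectives at once.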
Affiliation(s)
- Qingsong Jiang
- State Environmental Protection Key Laboratory of All Materials Flux in River Ecosystems, College of Environmental Sciences and Engineering, Peking University, Beijing, 100871, PR China
- Jincheng Li
- State Environmental Protection Key Laboratory of All Materials Flux in River Ecosystems, College of Environmental Sciences and Engineering, Peking University, Beijing, 100871, PR China
- Yanxin Sun
- State Environmental Protection Key Laboratory of All Materials Flux in River Ecosystems, College of Environmental Sciences and Engineering, Peking University, Beijing, 100871, PR China
- Jilin Huang
- State Environmental Protection Key Laboratory of All Materials Flux in River Ecosystems, College of Environmental Sciences and Engineering, Peking University, Beijing, 100871, PR China
- Rui Zou
- Rays Computational Intelligence Lab, Beijing Inteliway Environmental Ltd., Beijing, 100085, PR China
- Wenjing Ma
- Rays Computational Intelligence Lab, Beijing Inteliway Environmental Ltd., Beijing, 100085, PR China
- Huaicheng Guo
- State Environmental Protection Key Laboratory of All Materials Flux in River Ecosystems, College of Environmental Sciences and Engineering, Peking University, Beijing, 100871, PR China
- Zhiyun Wang
- Yunnan Key Laboratory of Pollution Process and Management of Plateau Lake-Watershed, Yunnan Research Academy of Eco-environmental Sciences, Kunming, 650034, PR China
- Yong Liu
- State Environmental Protection Key Laboratory of All Materials Flux in River Ecosystems, College of Environmental Sciences and Engineering, Peking University, Beijing, 100871, PR China
15
Cai M, Wang Q, Qi Z, Jin D, Wu X, Xu T, Zhang L. Deep Reinforcement Learning Framework-Based Flow Rate Rejection Control of Soft Magnetic Miniature Robots. IEEE Transactions on Cybernetics 2023; 53:7699-7711. [PMID: 36070281 DOI: 10.1109/tcyb.2022.3199213]
Abstract
Soft magnetic miniature robots (SMMRs) have potential biomedical applications due to their flexible size and their mobility in confined environments. However, navigating a robot to a goal site with precise control and high repeatability in unstructured environments, especially under flow, remains a challenge. In this study, drawing inspiration from the control requirements of drug delivery and release at a goal lesion site in the presence of dynamic biofluids, we propose a flow rate rejection control strategy based on a deep reinforcement learning (DRL) framework to actuate an SMMR to achieve goal-reaching and hovering in fluidic tubes. To this end, an SMMR is first fabricated that can be operated by an external magnetic field to realize the desired functionalities. Subsequently, a simulator is constructed based on neural networks to map the relationship between the applied magnetic field and the robot's locomotion states. With minimal prior knowledge of the environment and dynamics, a gated recurrent unit (GRU)-based DRL algorithm is formulated that considers the designed history state-action and estimated flow rates. In addition, a randomization technique is applied during training to distill a general control policy for the physical SMMR. Numerical simulations and experiments demonstrate the robustness and efficacy of the presented control framework. Finally, in-depth analyses and discussions indicate the potential of DRL for soft magnetic robots in biomedical applications.
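The "history state-action" input mentioned above can be pictured as a sliding window that stacks the last few state-action pairs into one feature vector for a recurrent policy. The sketch below is purely illustrative (not the paper's GRU network); the class name, horizon, and dimensions are assumptions:

```python
from collections import deque

class HistoryStateAction:
    """Maintain a sliding window of the last `horizon` state-action pairs and
    flatten them into a single input vector for a recurrent policy network."""

    def __init__(self, horizon, state_dim, action_dim):
        pad = [0.0] * (state_dim + action_dim)   # zero-padding before any data arrives
        self.window = deque([pad] * horizon, maxlen=horizon)

    def push(self, state, action):
        # Appending past maxlen evicts the oldest pair automatically.
        self.window.append(list(state) + list(action))

    def features(self):
        # Flattened window, oldest pair first.
        return [x for pair in self.window for x in pair]

hist = HistoryStateAction(horizon=3, state_dim=2, action_dim=1)
hist.push([0.1, 0.2], [1.0])
feat = hist.features()  # length = 3 * (2 + 1) = 9, newest pair last
```

A GRU would consume the window pair-by-pair instead of flattened; the buffering logic is the same.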
16
Diaz MA, Vos M, Dillen A, Tassignon B, Flynn L, Geeroms J, Meeusen R, Verstraten T, Babic J, Beckerle P, De Pauw K. Human-in-the-Loop Optimization of Wearable Robotic Devices to Improve Human-Robot Interaction: A Systematic Review. IEEE Transactions on Cybernetics 2023; 53:7483-7496. [PMID: 37015459 DOI: 10.1109/tcyb.2022.3224895]
Abstract
This article presents a systematic review of wearable robotic devices that use human-in-the-loop optimization (HILO) strategies to improve human-robot interaction. A total of 46 HILO studies were identified and divided into upper- and lower-limb robotic devices. The main aspects of HILO were identified, reviewed, and classified into four areas: 1) human-machine systems; 2) optimization methods; 3) control strategies; and 4) experimental protocols. A variety of objective functions (physiological, biomechanical, and subjective), optimization strategies, and optimized control-parameter configurations used in different control strategies are presented and analyzed. An overview of experimental protocols is provided, including the metrics, tasks, and conditions tested. Moreover, the relevance given to training or adaptation periods is explored. We outline an HILO framework that encompasses current wearable robots, optimization strategies, objective functions, control strategies, and experimental protocols. We conclude by highlighting current research gaps and defining future directions to improve the development of advanced HILO strategies in upper- and lower-limb wearable robots.
17
Croll HC, Ikuma K, Ong SK, Sarkar S. Systematic Performance Evaluation of Reinforcement Learning Algorithms Applied to Wastewater Treatment Control Optimization. Environmental Science & Technology 2023; 57:18382-18390. [PMID: 37405782 DOI: 10.1021/acs.est.3c00353]
Abstract
Treatment of wastewater using activated sludge relies on several complex, nonlinear processes. While activated sludge systems can provide high levels of treatment, including nutrient removal, operating these systems is often challenging and energy intensive. Significant research investment has been made in recent years into improving control optimization of such systems, through both domain knowledge and, more recently, machine learning. This study leverages a novel interface between a common process modeling software and a Python reinforcement learning environment to evaluate four common reinforcement learning algorithms for their ability to minimize treatment energy use while maintaining effluent compliance within the Benchmark Simulation Model No. 1 (BSM1) simulation. Three of the algorithms tested, deep Q-learning, proximal policy optimization, and synchronous advantage actor-critic, generally performed poorly over the scenarios tested in this study. In contrast, the twin delayed deep deterministic policy gradient (TD3) algorithm consistently produced a high level of control optimization while maintaining the treatment requirements. Under the best selection of state observation features, TD3 control optimization reduced aeration and pumping energy requirements by 14.3% compared to the BSM1 benchmark control, outperforming the advanced domain-based strategy of ammonia-based aeration control, although future work is necessary to improve the robustness of the RL implementation.
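For reference, the core of the TD3 algorithm credited above, a clipped double-Q bootstrap target plus target-policy smoothing, reduces to a few lines. This is a generic sketch of the standard algorithm, not the paper's BSM1 controller; all names and numbers are placeholders:

```python
import random

def td3_target(r, gamma, q1_next, q2_next, done):
    """Clipped double-Q target: y = r + gamma * min(Q1', Q2'),
    taking the minimum of the two target critics to curb overestimation."""
    if done:
        return r
    return r + gamma * min(q1_next, q2_next)

def smoothed_action(mu, sigma=0.2, clip=0.5, lo=-1.0, hi=1.0):
    """Target-policy smoothing: add clipped Gaussian noise to the target action."""
    eps = max(-clip, min(clip, random.gauss(0.0, sigma)))
    return max(lo, min(hi, mu + eps))

# y = 1.0 + 0.99 * min(10.0, 12.0) = 10.9
y = td3_target(r=1.0, gamma=0.99, q1_next=10.0, q2_next=12.0, done=False)
```

The third TD3 ingredient, delayed actor updates, is a scheduling choice (update the policy every d critic steps) rather than a formula.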
Affiliation(s)
- Henry C Croll
- Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, Iowa 50011, United States
- Kaoru Ikuma
- Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, Iowa 50011, United States
- Say Kee Ong
- Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, Iowa 50011, United States
- Soumik Sarkar
- Department of Mechanical Engineering, Iowa State University, Ames, Iowa 50011, United States
18
Guo W, Lv C, Guo M, Zhao Q, Yin X, Zhang L. Innovative applications of artificial intelligence in zoonotic disease management. Science in One Health 2023; 2:100045. [PMID: 39077042 PMCID: PMC11262289 DOI: 10.1016/j.soh.2023.100045]
Abstract
Zoonotic diseases, transmitted between humans and animals, pose a substantial threat to global public health. In recent years, artificial intelligence (AI) has emerged as a transformative tool in the fight against diseases. This comprehensive review discusses the innovative applications of AI in the management of zoonotic diseases, including disease prediction, early diagnosis, drug development, and future prospects. AI-driven predictive models leverage extensive datasets to predict disease outbreaks and transmission patterns, thereby facilitating proactive public health responses. Early diagnosis benefits from AI-powered diagnostic tools that expedite pathogen identification and containment. Furthermore, AI technologies have accelerated drug discovery by identifying potential drug targets and optimizing candidate drugs. This review addresses these advancements, while also examining the promising future of AI in zoonotic disease control. We emphasize the pivotal role of AI in revolutionizing our approach to managing zoonotic diseases and highlight its potential to safeguard the health of both humans and animals on a global scale.
Affiliation(s)
- Wenqiang Guo
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Chenrui Lv
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Meng Guo
- College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Qiwei Zhao
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Xinyi Yin
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Li Zhang
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
19
Liu K, Zhang H, Zhang Y, Sun C. False Data-Injection Attack Detection in Cyber-Physical Systems With Unknown Parameters: A Deep Reinforcement Learning Approach. IEEE Transactions on Cybernetics 2023; 53:7115-7125. [PMID: 37015355 DOI: 10.1109/tcyb.2022.3225236]
Abstract
This article studies the detection of discontinuous false data-injection (FDI) attacks on cyber-physical systems (CPSs). Because the stochastic properties of the process noise and measurement noise are unknown, deep reinforcement learning is applied to the design of an FDI attack detector. First, the discontinuous attack detection problem is modeled as a partially observable Markov decision process (POMDP), and a neural network is used to explore the POMDP. In the network, sliding observation windows composed of offline fragments of historical data are used as the input. An approach to designing the POMDP reward is provided to preserve detection precision even when some state-recognition errors occur. Second, sufficient conditions on attack frequency and duration that guarantee the applicability of the detector and the expected estimation performance are given. Finally, simulation examples illustrate the effectiveness of the attack detector.
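The sliding-window input can be illustrated with a toy detector: chop the measurement history into overlapping windows and flag those whose mean residual from a nominal value is large, as an injected bias would cause. This is a hypothetical threshold detector for illustration only, not the paper's neural POMDP detector; the nominal value and threshold are assumptions:

```python
def sliding_windows(measurements, width):
    """Overlapping fragments of historical sensor data, served as detector input."""
    return [measurements[i:i + width] for i in range(len(measurements) - width + 1)]

def flag_window(window, nominal, threshold):
    """Flag a window whose mean residual from the nominal value is too large,
    which is how a constant injected bias surfaces in the measurements."""
    residual = abs(sum(window) / len(window) - nominal)
    return residual > threshold

data = [0.0, 0.1, -0.1, 0.05, 2.0, 2.1, 1.9, 2.05]  # bias injected halfway through
flags = [flag_window(w, nominal=0.0, threshold=0.5) for w in sliding_windows(data, 4)]
```

A learned detector replaces the fixed threshold with a policy that maps each window to an "attack / no attack" decision.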
20
Wang X, Yang Z, Bai X, Ji M, Li H, Ran D. A Consistent Round-Up Strategy Based on PPO Path Optimization for the Leader-Follower Tracking Problem. Sensors (Basel) 2023; 23:8814. [PMID: 37960514 PMCID: PMC10650083 DOI: 10.3390/s23218814]
Abstract
A single UAV has limited capabilities for complex missions, so suitable solutions are needed to improve the mission success rate as well as the UAVs' survivability. A cooperative multi-UAV formation offers great advantages in this regard; however, for large and complex systems, traditional control methods fail when faced with unstable and changing environments. To address the poor self-adaptability of traditional control methods for a multi-UAV cluster, and their heavy requirements on environmental state information, this paper proposes a consistent round-up strategy based on PPO path optimization to track targets. In this strategy, the leader is trained using PPO for obstacle avoidance and target tracking, while the followers establish a communication network with the leader to obtain environmental information. In this way, the tracking control law can be designed, based on the consistency protocol and the Apollonius circle, to realize round-up of the target together with obstacle avoidance. The experimental results show that the proposed strategy can achieve round-up of the target UAV and guide the pursuing multi-UAV group around obstacles even without initial detection of the target. In multiple simulated scenarios, the success rate of the pursuing multi-UAV cluster in rounding up the target remains above 80%.
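The Apollonius (Apollonian) circle used in such round-up laws is the locus of points that pursuer and evader can reach simultaneously; for pursuer position p, evader position e, and speed ratio k = v_p / v_e (k != 1) it has a closed form. This sketches the standard geometric construction, not the paper's control law:

```python
def apollonius_circle(p, e, k):
    """Locus of points X with |X - p| = k * |X - e|, k != 1.
    p: pursuer position, e: evader position, k: speed ratio v_p / v_e.
    Center c = (p - k^2 e) / (1 - k^2), radius r = k * |p - e| / |1 - k^2|."""
    px, py = p
    ex, ey = e
    k2 = k * k
    denom = 1.0 - k2
    cx = (px - k2 * ex) / denom
    cy = (py - k2 * ey) / denom
    d = ((px - ex) ** 2 + (py - ey) ** 2) ** 0.5
    return (cx, cy), k * d / abs(denom)

# Slow pursuer (half the evader's speed) one unit away from the evader:
center, radius = apollonius_circle(p=(0.0, 0.0), e=(1.0, 0.0), k=0.5)
```

Spacing several pursuers so their Apollonius circles enclose the evader is the usual geometric encirclement condition.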
Affiliation(s)
- Xiao Wang
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Zhaohui Yang
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Xueqian Bai
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Mingjiang Ji
- National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China
- Hao Li
- The Second Academy of CASIC, Beijing 100854, China
- Dechao Ran
- National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China
21
Zhang R, Zong Q, Zhang X, Dou L, Tian B. Game of Drones: Multi-UAV Pursuit-Evasion Game With Online Motion Planning by Deep Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7900-7909. [PMID: 35157597 DOI: 10.1109/tnnls.2022.3146976]
Abstract
As some of the smallest flying objects, unmanned aerial vehicles (UAVs) are often deployed as a "swarm" to execute missions. In this article, we investigate the multiquadcopter-versus-target pursuit-evasion game in an environment with obstacles. For high-quality simulation of the urban environment, we propose the pursuit-evasion scenario (PES) framework to create the environment with a physics engine, which enables quadcopter agents to take actions and interact with the environment. On this basis, we construct a multiagent coronal bidirectionally coordinated with target prediction network (CBC-TP Net) with a vectorized extension of the multiagent deep deterministic policy gradient (MADDPG) formulation to ensure the effectiveness of a damaged "swarm" system in the pursuit-evasion mission. Unlike traditional reinforcement learning, we innovatively design a target prediction network (TP Net) within the common framework to imitate the way humans think: situation prediction always precedes decision-making. Pursuit-evasion experiments verify the state-of-the-art performance of the proposed strategy in both normal and antidamaged situations.
22
Li S, Tang Z, Yang L, Li M, Shang Z. Application of deep reinforcement learning for spike sorting under multi-class imbalance. Comput Biol Med 2023; 164:107253. [PMID: 37536094 DOI: 10.1016/j.compbiomed.2023.107253]
Abstract
Spike sorting is the basis for analyzing spike firing patterns encoded in high-dimensional information spaces. Because high-density microelectrode arrays record multiple neurons simultaneously, the collected data often suffer from two problems: a few overlapping spikes and different neuronal firing rates, both of which belong to the multi-class imbalance problem. Since deep reinforcement learning (DRL) can assign targeted attention to categories through reward functions, we propose ImbSorter to implement spike sorting under multi-class imbalance. We describe spike sorting as a Markov sequential decision and construct a dynamic reward function (DRF) that improves the agent's sensitivity to minor classes based on the inter-class imbalance ratios. The agent is eventually guided by the optimal strategy to classify spikes. We consider the Wave_Clus dataset, which contains overlapping spikes and diverse noise levels, and the macaque dataset, which has a multi-scale imbalance. ImbSorter is compared with classical DRL architectures, traditional machine learning algorithms, and advanced overlapping spike sorting techniques on these two datasets. ImbSorter obtained improved results in Macro_F1 and shows a promising ability to resist overlapping and noise interference, with high stability and strong performance in processing spikes with different degrees of skewed distribution.
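A reward shaped by inter-class imbalance ratios can be sketched as per-class weights inversely proportional to class frequency, so rare classes earn larger rewards (and larger penalties when missed). This is a minimal illustration of the principle, not the paper's DRF; the normalization against the largest class is an assumption:

```python
def imbalance_rewards(class_counts):
    """Per-class reward weights from inter-class imbalance ratios:
    the largest class gets weight 1.0, rarer classes proportionally more."""
    n_max = max(class_counts.values())
    return {c: n_max / n for c, n in class_counts.items()}

def step_reward(pred, label, weights):
    """+w for correctly predicting class `label`, -w for a miss."""
    w = weights[label]
    return w if pred == label else -w

# 100 spikes of class 0 vs 10 of class 1: class 1 is weighted 10x.
weights = imbalance_rewards({0: 100, 1: 10})
```

Under such a scheme the agent cannot maximize return by simply predicting the majority class, which is the failure mode a DRF is designed to avoid.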
Affiliation(s)
- Suchen Li
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, 450001, China; Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou, 450001, China
- Zhuo Tang
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, 450001, China; Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou, 450001, China
- Lifang Yang
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, 450001, China; Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou, 450001, China
- Mengmeng Li
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, 450001, China; Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou, 450001, China
- Zhigang Shang
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, 450001, China; Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou, 450001, China
23
Zhang J, Zhou X, Zhou J, Qiu S, Liang G, Cai S, Bao G. A High-Efficient Reinforcement Learning Approach for Dexterous Manipulation. Biomimetics (Basel) 2023; 8:264. [PMID: 37366859 DOI: 10.3390/biomimetics8020264]
Abstract
Robotic hands have the potential to perform complex tasks in unstructured environments owing to their bionic design, inspired by the most agile biological hand. However, the modeling, planning, and control of dexterous hands remain unresolved, open challenges, resulting in the simple movements and relatively clumsy motions of current robotic end effectors. This paper proposes a dynamic model based on a generative adversarial architecture to learn the state mode of the dexterous hand, reducing the model's prediction error over long horizons. An adaptive trajectory planning kernel is also developed to generate High-Value Area Trajectory (HVAT) data according to the control task and the dynamic model, with adaptive trajectory adjustment achieved by changing the Levenberg-Marquardt (LM) coefficient and the linear searching coefficient. Furthermore, an improved Soft Actor-Critic (SAC) algorithm is designed by combining maximum-entropy value iteration with HVAT value iteration. An experimental platform and a simulation program were built to verify the proposed method on two manipulation tasks. The experimental results indicate that the proposed dexterous-hand reinforcement learning algorithm has better training efficiency and requires fewer training samples to achieve satisfactory learning and control performance.
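The maximum-entropy ingredient of SAC enters through the soft state value, in which the policy's log-probability is subtracted from the Q-value so that exploration is rewarded alongside return. The sketch below is the standard SAC quantity for a discrete policy, not the paper's HVAT-augmented variant; the toy numbers are placeholders:

```python
import math

def soft_state_value(qs, probs, alpha):
    """Soft (entropy-regularized) state value used in SAC-style targets:
    V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)).
    alpha is the temperature trading off return against entropy."""
    return sum(p * (q - alpha * math.log(p)) for q, p in zip(qs, probs) if p > 0)

# Uniform two-action policy with equal Q-values: V = 1.0 + entropy = 1.0 + ln 2
v = soft_state_value(qs=[1.0, 1.0], probs=[0.5, 0.5], alpha=1.0)
```

Setting alpha = 0 recovers the ordinary expected value, i.e. standard actor-critic without the entropy bonus.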
Affiliation(s)
- Jianhua Zhang
- College of Mechanical Engineering, Beijing University of Science and Technology, Beijing 100083, China
- Xuanyi Zhou
- College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jinyu Zhou
- College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Shiming Qiu
- College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guoyuan Liang
- Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Shibo Cai
- College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guanjun Bao
- College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China
24
Han H, Wang J, Kuang L, Han X, Xue H. Improved Robot Path Planning Method Based on Deep Reinforcement Learning. Sensors (Basel) 2023; 23:5622. [PMID: 37420785 DOI: 10.3390/s23125622]
Abstract
With the advancement of robotics, the field of path planning is currently experiencing a period of prosperity. Researchers strive to address this nonlinear problem and have achieved remarkable results through the implementation of the Deep Reinforcement Learning (DRL) algorithm DQN (Deep Q-Network). However, persistent challenges remain, including the curse of dimensionality, the difficulty of model convergence, and sparse rewards. To tackle these problems, this paper proposes an enhanced DDQN (Double DQN) path planning approach, in which the information after dimensionality reduction is fed into a two-branch network that incorporates expert knowledge, and an optimized reward function guides the training process. The data generated during the training phase are first discretized into corresponding low-dimensional spaces. An "expert experience" module is introduced into the epsilon-greedy algorithm to accelerate the model's early-stage training. To tackle navigation and obstacle avoidance separately, a dual-branch network structure is presented. We further optimize the reward function, enabling intelligent agents to receive prompt feedback from the environment after performing each action. Experiments conducted in both virtual and real-world environments demonstrate that the enhanced algorithm accelerates model convergence, improves training stability, and generates a smooth, shorter, and collision-free path.
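The Double DQN update underlying this approach decouples action selection (online network) from action evaluation (target network), which curbs the overestimation bias of vanilla DQN. This is a generic sketch of the standard DDQN target, not the authors' dual-branch model; the Q-values are toy numbers:

```python
def ddqn_target(r, gamma, q_online_next, q_target_next, done):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
    The online net chooses the action; the frozen target net scores it."""
    if done:
        return r
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return r + gamma * q_target_next[a_star]

# Online net prefers action 1, but the target net values it at only 2.0:
y = ddqn_target(r=0.0, gamma=0.9, q_online_next=[1.0, 3.0], q_target_next=[5.0, 2.0], done=False)
```

Vanilla DQN would instead bootstrap from max(q_target_next) = 5.0 here, illustrating the overestimation DDQN avoids.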
Affiliation(s)
- Huiyan Han
- School of Computer Science and Technology, North University of China, Taiyuan 030051, China
- Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China
- Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
- Jiaqi Wang
- School of Computer Science and Technology, North University of China, Taiyuan 030051, China
- Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China
- Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
- Liqun Kuang
- School of Computer Science and Technology, North University of China, Taiyuan 030051, China
- Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China
- Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
- Xie Han
- School of Computer Science and Technology, North University of China, Taiyuan 030051, China
- Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China
- Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
- Hongxin Xue
- School of Computer Science and Technology, North University of China, Taiyuan 030051, China
- Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China
- Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
25
Yadav P, Mishra A, Kim S. A Comprehensive Survey on Multi-Agent Reinforcement Learning for Connected and Automated Vehicles. Sensors (Basel) 2023; 23:4710. [PMID: 37430623 DOI: 10.3390/s23104710]
Abstract
Connected and automated vehicles (CAVs) require multiple tasks in their seamless maneuverings. Some essential tasks that require simultaneous management and actions are motion planning, traffic prediction, traffic intersection management, etc. A few of them are complex in nature. Multi-agent reinforcement learning (MARL) can solve complex problems involving simultaneous controls. Recently, many researchers applied MARL in such applications. However, there is a lack of extensive surveys on the ongoing research to identify the current problems, proposed methods, and future research directions in MARL for CAVs. This paper provides a comprehensive survey on MARL for CAVs. A classification-based paper analysis is performed to identify the current developments and highlight the various existing research directions. Finally, the challenges in current works are discussed, and some potential areas are given for exploration to overcome those challenges. Future readers will benefit from this survey and can apply the ideas and findings in their research to solve complex problems.
Affiliation(s)
- Pamul Yadav
- School of Integrated Technology, Yonsei University, Incheon 21983, Republic of Korea
- Ashutosh Mishra
- School of Integrated Technology, Yonsei University, Incheon 21983, Republic of Korea
- Shiho Kim
- School of Integrated Technology, Yonsei University, Incheon 21983, Republic of Korea
26
Kwa HL, Kit JL, Horsevad N, Philippot J, Savari M, Bouffanais R. Adaptivity: a path towards general swarm intelligence? Front Robot AI 2023; 10:1163185. [PMID: 37228356 PMCID: PMC10203170 DOI: 10.3389/frobt.2023.1163185]
Abstract
The field of multi-robot systems (MRS) has recently been gaining popularity among research groups, practitioners, and a wide range of industries. Compared to single-robot systems, multi-robot systems can perform tasks more efficiently or accomplish objectives that are simply not feasible with a single unit. This makes them ideal candidates for carrying out distributed tasks in large environments, e.g., object retrieval, mapping, or surveillance. However, the traditional approach to multi-robot systems, using global planning and centralized operation, is in general ill-suited to fulfilling tasks in unstructured and dynamic environments. Swarming multi-robot systems have been proposed to deal with such steep challenges, primarily owing to their adaptivity: the system's ability to learn or change its behavior in response to new and/or evolving operating conditions. Given its importance, in this perspective we focus on the critical role of adaptivity in effective multi-robot system swarming and use it as the basis for defining, and potentially quantifying, swarm intelligence. In addition, we highlight the importance of establishing a suite of benchmark tests to measure a swarm's level of adaptivity. We believe that pursuing increased levels of swarm intelligence through a focus on adaptivity will further elevate the field of swarm robotics.
Affiliation(s)
- Hian Lee Kwa
- Thales Research and Technology, Singapore, Singapore
- Jabez Leong Kit
- Engineering Product Design, Singapore University of Technology and Design, Singapore, Singapore
- Nikolaj Horsevad
- Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
- Julien Philippot
- Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
- Mohammad Savari
- Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
27
Liu S, Feng Y, Wu K, Cheng G, Huang J, Liu Z. Graph-Attention-Based Causal Discovery With Trust Region-Navigated Clipping Policy Optimization. IEEE Transactions on Cybernetics 2023; 53:2311-2324. [PMID: 34665751 DOI: 10.1109/tcyb.2021.3116762]
Abstract
In many domains of the empirical sciences, discovering the causal structure within variables remains an indispensable task. Recently, to tackle the unoriented edges or latent-assumption violations suffered by conventional methods, researchers formulated a reinforcement learning (RL) procedure for causal discovery and equipped it with a REINFORCE algorithm to search for the best-rewarded directed acyclic graph. The two keys to the overall performance of the procedure are the robustness of the RL method and the efficient encoding of variables. However, on the one hand, REINFORCE is prone to local convergence and unstable performance during training. Neither trust region policy optimization, being computationally expensive, nor proximal policy optimization (PPO), suffering from aggregate constraint deviation, is a decent alternative for combinatorial optimization problems with considerable individual subactions. We propose a trust region-navigated clipping policy optimization method for causal discovery that guarantees both better search efficiency and steadiness in policy optimization, in comparison with REINFORCE, PPO, and our prioritized sampling-guided REINFORCE implementation. On the other hand, to boost the efficient encoding of variables, we propose a refined graph attention encoder called SDGAT that can grasp more feature information without prior neighborhood information. With these improvements, the proposed method outperforms the former RL method on both synthetic and benchmark datasets in terms of output results and optimization robustness.
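For contrast with the proposed trust-region-navigated clipping, the baseline PPO per-sample surrogate that the abstract criticizes clips the policy ratio into a fixed band. This is a standard textbook sketch, not the paper's method; the ratios and advantages are toy numbers:

```python
def ppo_clip_surrogate(ratio, advantage, eps=0.2):
    """PPO's per-sample clipped objective: min(r * A, clip(r, 1-eps, 1+eps) * A).
    The fixed eps clip range is exactly what a trust-region-navigated
    scheme would adapt per update instead of keeping constant."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

gain = ppo_clip_surrogate(ratio=1.5, advantage=1.0)   # capped at 1.2 * 1.0
loss = ppo_clip_surrogate(ratio=0.5, advantage=-1.0)  # pessimistic branch: -0.8
```

Because the clip acts on each subaction's ratio independently, the summed deviation over many subactions can still be large, which is the "aggregate constraint deviation" the abstract refers to.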
28
Orr J, Dutta A. Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey. SENSORS (BASEL, SWITZERLAND) 2023; 23:3625. [PMID: 37050685 PMCID: PMC10098527 DOI: 10.3390/s23073625] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/23/2023] [Revised: 03/22/2023] [Accepted: 03/28/2023] [Indexed: 06/19/2023]
Abstract
Deep reinforcement learning has produced many success stories in recent years, in fields including mathematics, games, health care, and robotics. In this paper, we are especially interested in multi-agent deep reinforcement learning, where multiple agents present in the environment learn not only from their own experiences but also from each other, and in its applications to multi-robot systems. In many real-world scenarios, one robot might not be enough to complete a given task on its own, so we may need to deploy multiple robots that work together towards the common global objective of finishing the task. Although multi-agent deep reinforcement learning and its applications in multi-robot systems are of tremendous significance from theoretical and applied standpoints, the latest survey in this domain dates to 2004 and covers only traditional learning applications, as deep reinforcement learning had not yet been invented. We classify the reviewed papers in our survey primarily based on their multi-robot applications. Our survey also discusses a few challenges that current research in this domain faces and provides a potential list of future applications involving multi-robot systems that can benefit from advances in multi-agent deep reinforcement learning.
29
Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay. COMPLEX INTELL SYST 2023. [DOI: 10.1007/s40747-023-00985-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/23/2023]
Abstract
Multi-agent multi-target search strategies can be utilized in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. To move beyond fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained by DRL tend to be brittle due to their sensitivity to the training environment, which makes the strategies they learn frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL lead to problems such as difficulty in system convergence and low utilization efficiency of the sampled data. To address the weakened robustness of agents and the sparse rewards in the multi-target search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient based on Parallel Hindsight Experience Replay (PHER-M3DDPG) algorithm, which adopts the framework of centralized training and decentralized execution in continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of agents by introducing adversarial disturbances. In addition, to solve the sparse-rewards problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism that increases the efficiency of data utilization through virtual learning targets and batch processing of the sampled data. Simulation results show that the PHER-M3DDPG algorithm outperforms existing algorithms in terms of convergence speed and task completion time in a multi-target search environment.
30
Guan Y, Ren Y, Sun Q, Li SE, Ma H, Duan J, Dai Y, Cheng B. Integrated Decision and Control: Toward Interpretable and Computationally Efficient Driving Intelligence. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:859-873. [PMID: 35439160 DOI: 10.1109/tcyb.2022.3163816] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Decision and control are core functionalities of high-level automated vehicles. Current mainstream methods, such as functional decomposition and end-to-end reinforcement learning (RL), suffer from high time complexity or poor interpretability and adaptability on real-world autonomous driving tasks. In this article, we present an interpretable and computationally efficient framework called integrated decision and control (IDC) for automated vehicles, which decomposes the driving task into static path planning and dynamic optimal tracking that are structured hierarchically. First, the static path planning generates several candidate paths considering only static traffic elements. Then, the dynamic optimal tracking is designed to track the optimal path while considering the dynamic obstacles. To that end, we formulate a constrained optimal control problem (OCP) for each candidate path, optimize them separately, and follow the one with the best tracking performance. To unload the heavy online computation, we propose a model-based RL algorithm that can serve as an approximate constrained-OCP solver. Specifically, the OCPs for all paths are considered together to construct a single complete RL problem, which is then solved offline in the form of value and policy networks for real-time online path selecting and tracking, respectively. We verify our framework in both simulations and the real world. Results show that, compared with baseline methods, IDC has an order of magnitude higher online computing efficiency, as well as better driving performance, including traffic efficiency and safety. In addition, it yields great interpretability and adaptability across different driving scenarios and tasks.
31
Bai C, Wang L, Wang Y, Wang Z, Zhao R, Bai C, Liu P. Addressing Hindsight Bias in Multigoal Reinforcement Learning. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:392-405. [PMID: 34495860 DOI: 10.1109/tcyb.2021.3107202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Multigoal reinforcement learning (RL) extends typical RL with goal-conditioned value functions and policies. One efficient multigoal RL algorithm is hindsight experience replay (HER). By treating a hindsight goal from failed experiences as the original goal, HER enables the agent to receive rewards frequently. However, a key assumption of HER is that the hindsight goals do not change the likelihood of the sampled transitions and trajectories used in training, which, according to our analysis, is not the case. More specifically, we show that using hindsight goals changes this likelihood and results in a biased learning objective for multigoal RL. We analyze the hindsight bias due to this use of hindsight goals and propose bias-corrected HER (BHER), an efficient algorithm that corrects the hindsight bias in training. We further show that BHER outperforms several state-of-the-art multigoal RL approaches on challenging robotics tasks.
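For context, the relabelling step that HER performs (and whose effect on the sampling distribution this paper analyzes) can be sketched minimally as follows; the dictionary fields and the sparse reward are illustrative assumptions, not the paper's code:

```python
def her_relabel(trajectory, reward_fn):
    """Relabel a failed trajectory, taking the final achieved state as the
    hindsight goal and recomputing each step's reward against that goal."""
    hindsight_goal = trajectory[-1]["achieved"]
    relabelled = []
    for step in trajectory:
        relabelled.append({
            "obs": step["obs"],
            "action": step["action"],
            "goal": hindsight_goal,  # the original goal is replaced
            "reward": reward_fn(step["achieved"], hindsight_goal),
        })
    return relabelled
```

With a sparse goal-reaching reward (0 on success, -1 otherwise), the final step of every relabelled trajectory succeeds by construction, which is precisely why the relabelled data no longer follow the original sampling distribution.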
32
Explaining deep reinforcement learning decisions in complex multiagent settings: towards enabling automation in air traffic flow management. APPL INTELL 2023; 53:4063-4098. [PMID: 35694685 PMCID: PMC9169601 DOI: 10.1007/s10489-022-03605-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/03/2022] [Indexed: 02/04/2023]
Abstract
With the objective of enhancing human performance and maximizing engagement during the performance of tasks, we aim to advance automation for decision making in complex and large-scale multi-agent settings. Towards these goals, this paper presents a deep multi-agent reinforcement learning method for resolving demand-capacity imbalances in real-world Air Traffic Management settings with thousands of agents. The agents comprising the system are able to jointly decide on the measures to be applied to resolve imbalances, while providing explanations for their decisions; this information is rendered and explored via appropriate visual analytics tools. The paper presents how the major challenges of scalability and complexity are addressed, and provides results from evaluation tests that show the ability of the models to provide high-quality solutions and high-fidelity explanations.
33
A fractional filter based on reinforcement learning for effective tracking under impulsive noise. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2022.10.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
34
Huang H, Hu Z, Lu Z, Wen X. Network-Scale Traffic Signal Control via Multiagent Reinforcement Learning With Deep Spatiotemporal Attentive Network. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:262-274. [PMID: 34343099 DOI: 10.1109/tcyb.2021.3087228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
The continuous development of intelligent traffic control systems has a profound influence on urban traffic planning and traffic management. Indeed, as big data and artificial intelligence continue to evolve, the traffic control strategy based on deep reinforcement learning (RL) has been proven to be a promising method to improve the efficiency of intersections and save people's travel time. However, the existing algorithms ignore the temporal and spatial characteristics of intersections. In this article, we propose a multiagent RL based on the deep spatiotemporal attentive neural network (MARL-DSTAN) to determine the traffic signal timing in a large-scale road network. In this model, the state information captures the spatial dependency of the entire road network by leveraging the graph convolutional network (GCN) and integrates the information based on the importance of intersections via the attention mechanism. Meanwhile, to accumulate more valuable samples and enhance the learning efficiency, the recurrent neural network (RNN) is introduced in the exploration stage to constrain the action search space instead of fully random exploration. MARL-DSTAN decomposes the large-scale area into multiple base environments, and the agents in each base environment use the idea of "centralized training and decentralized execution" to learn to accelerate the algorithm convergence. The simulation results show that our algorithm significantly outperforms the fixed timing scheme and several other state-of-the-art baseline RL algorithms.
35
Learning multi-agent coordination through connectivity-driven communication. Mach Learn 2022. [DOI: 10.1007/s10994-022-06286-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
In artificial multi-agent systems, the ability to learn collaborative policies is predicated upon the agents' communication skills: they must be able to encode the information received from the environment and learn how to share it with other agents as required by the task at hand. We present a deep reinforcement learning approach, Connectivity Driven Communication (CDC), that facilitates the emergence of multi-agent collaborative behaviour through experience alone. The agents are modelled as nodes of a weighted graph whose state-dependent edges encode pair-wise messages that can be exchanged. We introduce a graph-dependent attention mechanism that controls how the agents' incoming messages are weighted. This mechanism takes full account of the current state of the system as represented by the graph, and builds upon a diffusion process that captures how information flows on the graph. The graph topology is not assumed to be known a priori, but depends dynamically on the agents' observations and is learnt concurrently with the attention mechanism and policy in an end-to-end fashion. Our empirical results show that CDC is able to learn effective collaborative policies and can outperform competing learning algorithms on cooperative navigation tasks.
36
Twin attentive deep reinforcement learning for multi-agent defensive convoy. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01759-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
37
Li J, Ma Y, Gao R, Cao Z, Lim A, Song W, Zhang J. Deep Reinforcement Learning for Solving the Heterogeneous Capacitated Vehicle Routing Problem. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:13572-13585. [PMID: 34554923 DOI: 10.1109/tcyb.2021.3111082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Existing deep reinforcement learning (DRL)-based methods for solving the capacitated vehicle routing problem (CVRP) intrinsically cope with a homogeneous vehicle fleet, in which the fleet is assumed as repetitions of a single vehicle. Hence, their key to construct a solution solely lies in the selection of the next node (customer) to visit excluding the selection of vehicle. However, vehicles in real-world scenarios are likely to be heterogeneous with different characteristics that affect their capacity (or travel speed), rendering existing DRL methods less effective. In this article, we tackle heterogeneous CVRP (HCVRP), where vehicles are mainly characterized by different capacities. We consider both min-max and min-sum objectives for HCVRP, which aim to minimize the longest or total travel time of the vehicle(s) in the fleet. To solve those problems, we propose a DRL method based on the attention mechanism with a vehicle selection decoder accounting for the heterogeneous fleet constraint and a node selection decoder accounting for the route construction, which learns to construct a solution by automatically selecting both a vehicle and a node for this vehicle at each step. Experimental results based on randomly generated instances show that, with desirable generalization to various problem sizes, our method outperforms the state-of-the-art DRL method and most of the conventional heuristics, and also delivers competitive performance against the state-of-the-art heuristic method, that is, slack induction by string removal. In addition, the results of extended experiments demonstrate that our method is also able to solve CVRPLib instances with satisfactory performance.
38
Ji Z, Chen C, He J, Zhu S, Guan X. Edge Sensing and Control Co-Design for Industrial Cyber-Physical Systems: Observability Guaranteed Method. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:13350-13362. [PMID: 34343098 DOI: 10.1109/tcyb.2021.3079149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
The new generation of industrial cyber-physical systems (ICPSs), supported by edge computing technology, facilitates the deep integration of sensing and control. System observability is the key factor characterizing the internal relationship between the two. In most existing works, observability is taken as an assumption for subsequent sensing and control design. In fact, however, as the network scale gradually expands, this assumption becomes more difficult to satisfy directly in the sensing design. To address this problem, we propose the observability guaranteed method (OGM) for edge sensing and control co-design. Specifically, the nonconvex observability condition is transformed into a convex range of key parameters of the sensing strategy based on graph signal processing (GSP) technology. Then, we establish the relationship between these parameters and control performance. In OGM, beyond the usual design from sensing to control, we reversely adjust the sensing design to meet control demands while satisfying observability. Finally, our algorithm is applied to the hot rolling laminar cooling process in a semiphysical evaluation, and the results verify its effectiveness.
39
Bahrpeyma F, Reichelt D. A review of the applications of multi-agent reinforcement learning in smart factories. Front Robot AI 2022; 9:1027340. [DOI: 10.3389/frobt.2022.1027340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Accepted: 11/08/2022] [Indexed: 12/04/2022] Open
Abstract
The smart factory is at the heart of Industry 4.0 and is the new paradigm for establishing advanced manufacturing systems and realizing modern manufacturing objectives such as mass customization, automation, efficiency, and self-organization all at once. Such manufacturing systems, however, are characterized by dynamic and complex environments where a large number of decisions should be made for smart components such as production machines and the material handling system in a real-time and optimal manner. AI offers key intelligent control approaches in order to realize efficiency, agility, and automation all at once. One of the most challenging problems faced in this regard is uncertainty, meaning that due to the dynamic nature of the smart manufacturing environments, sudden seen or unseen events occur that should be handled in real-time. Due to the complexity and high-dimensionality of smart factories, it is not possible to predict all the possible events or prepare appropriate scenarios to respond. Reinforcement learning is an AI technique that provides the intelligent control processes needed to deal with such uncertainties. Due to the distributed nature of smart factories and the presence of multiple decision-making components, multi-agent reinforcement learning (MARL) should be incorporated instead of single-agent reinforcement learning (SARL), which, due to the complexities involved in the development process, has attracted less attention. In this research, we will review the literature on the applications of MARL to tasks within a smart factory and then demonstrate a mapping connecting smart factory attributes to the equivalent MARL features, based on which we suggest MARL to be one of the most effective approaches for implementing the control mechanism for smart factories.
40
Zhu Y, Pang JH, Gao T, Tian FB. Learning to school in dense configurations with multi-agent deep reinforcement learning. BIOINSPIRATION & BIOMIMETICS 2022; 18:015003. [PMID: 36322983 DOI: 10.1088/1748-3190/ac9fb5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/27/2022] [Accepted: 11/01/2022] [Indexed: 06/16/2023]
Abstract
Fish are observed to school in different configurations. However, how and why fish maintain a stable schooling formation remains unclear. This work presents a numerical study of the dense schooling of two free swimmers using a hybrid method combining multi-agent deep reinforcement learning with the immersed boundary-lattice Boltzmann method. Active control policies are developed by synchronously training the leader to swim at a given speed and orientation and the follower to hold close proximity to the leader. After training, the swimmers can resist the strong hydrodynamic force to remain in stable formations and meanwhile swim along the desired path, using only their tail-beat flapping. The tail movements of the swimmers in the stable formations are irregular and asymmetrical, indicating that the swimmers are carefully adjusting their body kinematics to balance the hydrodynamic force. In addition, a significant decrease in the mean amplitude and the cost of transport is found for the followers, indicating that these swimmers can maintain the swimming speed with less effort. The results also show that the side-by-side formation is hydrodynamically more stable but energetically less efficient than other configurations, while the full-body staggered formation is energetically more efficient as a whole.
Affiliation(s)
- Yi Zhu
- Ocean Intelligence Technology Center, Shenzhen Institute of Guangdong Ocean University, Shenzhen, Guangdong 518055, People's Republic of China
- Jian-Hua Pang
- Ocean Intelligence Technology Center, Shenzhen Institute of Guangdong Ocean University, Shenzhen, Guangdong 518055, People's Republic of China
- College of Ocean Engineering, Guangdong Ocean University, Zhanjiang, Guangdong 524088, People's Republic of China
- Tong Gao
- Department of Mechanical Engineering, Michigan State University, East Lansing, MI 48864, United States of America
- Fang-Bao Tian
- School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2600, Australia
41
Sun Q, Yao Y, Yi P, Hu Y, Yang Z, Yang G, Zhou X. Learning controlled and targeted communication with the centralized critic for the multi-agent system. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04225-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
42
Wong A, Bäck T, Kononova AV, Plaat A. Deep multiagent reinforcement learning: challenges and directions. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10299-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
This paper surveys the field of deep multiagent reinforcement learning (RL). The combination of deep neural networks with RL has gained increased traction in recent years and is slowly shifting the focus from single-agent to multiagent environments. Dealing with multiple agents is inherently more complex as (a) the future rewards depend on multiple players' joint actions and (b) the computational complexity increases. We present the most common multiagent problem representations and their main challenges, and identify five research areas that address one or more of these challenges: centralised training and decentralised execution, opponent modelling, communication, efficient coordination, and reward shaping. We find that many computational studies rely on unrealistic assumptions or are not generalisable to other settings; they struggle to overcome the curse of dimensionality or nonstationarity. Approaches from psychology and sociology capture promising relevant behaviours, such as communication and coordination, to help agents achieve better performance in multiagent settings. We suggest that, for multiagent RL to be successful, future research should address these challenges with an interdisciplinary approach to open up new possibilities in multiagent RL.
43
A review of cooperative multi-agent deep reinforcement learning. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04105-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
44
Bahamid A, Mohd Ibrahim A. A review on crowd analysis of evacuation and abnormality detection based on machine learning systems. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07758-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
45
Cheng Y, Huang L, Wang X. Authentic Boundary Proximal Policy Optimization. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:9428-9438. [PMID: 33705327 DOI: 10.1109/tcyb.2021.3051456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
In recent years, the proximal policy optimization (PPO) algorithm has received considerable attention because of its excellent performance on many challenging tasks. However, there is still much room for theoretical explanation of the mechanism of PPO's clipping operation, which is a key means of improving its performance. In addition, while PPO is inspired by the learning theory of trust region policy optimization (TRPO), the theoretical connection between PPO's clipping operation and TRPO's trust region constraint has not been well studied. In this article, we first analyze the effect of PPO's clipping operation on the objective function of conservative policy iteration, and rigorously establish the theoretical relationship between PPO and TRPO. Then, a novel first-order policy gradient algorithm called authentic boundary PPO (ABPPO) is proposed, based on an authentic boundary setting rule. To better keep the difference between the new and old policies within the clipping range, we borrow the idea of ABPPO and propose two novel improved PPO algorithms: rollback mechanism-based ABPPO (RMABPPO) and penalized point policy difference-based ABPPO (P3DABPPO), based on the ideas of rollback clipping and penalized point policy difference, respectively. Experiments on continuous robotic control tasks implemented in MuJoCo show that our proposed improved PPO algorithms can effectively improve learning stability and accelerate learning speed compared with the original PPO.
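For reference, the clipping operation this paper analyzes is, in standard PPO (not the ABPPO variants proposed here), a per-sample surrogate that limits how far the probability ratio can move the objective; a minimal sketch, with illustrative names:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate per sample:
    L = min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the new/old policy probability ratio and A the advantage."""
    clipped_ratio = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Taking the min makes the objective pessimistic: large ratio moves
    # cannot increase the surrogate beyond the clipped value.
    return np.minimum(ratio * advantage, clipped_ratio * advantage)
```

Note that clipping the objective does not by itself bound the policy difference, which is the gap between PPO's clipping and TRPO's trust region constraint that the article studies.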
46
Xie S, Zhang H, Yu H, Li Y, Zhang Z, Luo X. ET-HF: A novel information sharing model to improve multi-agent cooperation. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
47
Controlling Fleets of Autonomous Mobile Robots with Reinforcement Learning: A Brief Survey. ROBOTICS 2022. [DOI: 10.3390/robotics11050085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Controlling a fleet of autonomous mobile robots (AMRs) is a complex optimization problem. Many approaches have been proposed for solving it, ranging from heuristics, which usually do not find an optimum, to mathematical models, which are limited by their high computational effort. Machine Learning (ML) methods offer another potential route to solving such complex problems. The focus of this brief survey is on Reinforcement Learning (RL) as a particular type of ML. Thanks to its reward-based optimization, RL offers a good basis for the control of AMR fleets. In this survey, different control approaches are investigated and the aspects of AMR fleet control with respect to RL are evaluated. As a result, six fundamental key problems should be put on the current research agenda to enable broader application in industry: (1) overcoming the “sim-to-real gap”, (2) increasing the robustness of algorithms, (3) improving data efficiency, (4) integrating different fields of application, (5) enabling heterogeneous fleets with different types of AMR and (6) handling deadlocks.
48
Artificial Intelligence in Adaptive and Intelligent Educational System: A Review. FUTURE INTERNET 2022. [DOI: 10.3390/fi14090245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
There has been much discussion among academics on how pupils may be taught online while still maintaining a high degree of learning efficiency, partly because of the worldwide COVID-19 pandemic of the previous two years. Students may have trouble focusing due to a lack of teacher–student interaction, yet online learning has some advantages that are unavailable in traditional classrooms. The architecture of online courses for students is integrated into a system called the Adaptive and Intelligent Education System (AIES). In AIESs, reinforcement learning is often used in conjunction with the development of teaching strategies, and such a reinforcement-learning-based system is known as RLATES. As a prerequisite to conducting research in this field, this paper consolidates and analyses existing research, design approaches, and model categories for adaptive and intelligent educational systems, with the hope of serving as a reference that helps scholars in the field access the relevant information quickly and easily.
49
Shi Y, Mu C, Hao Y, Ma S, Xu N, Chong Z. Day‐ahead optimal dispatching of hybrid power system based on deep reinforcement learning. COGNITIVE COMPUTATION AND SYSTEMS 2022. [DOI: 10.1049/ccs2.12068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Affiliation(s)
- Yakun Shi
- School of Electrical and Information Engineering Tianjin University Tianjin China
- Chaoxu Mu
- School of Electrical and Information Engineering Tianjin University Tianjin China
- Yi Hao
- Electric Power Research Institute State Grid Tianjin Electric Power Company Tianjin China
- Shiqian Ma
- Electric Power Research Institute State Grid Tianjin Electric Power Company Tianjin China
- Na Xu
- School of Electrical and Information Engineering Tianjin University Tianjin China
- Zhiqiang Chong
- Electric Power Research Institute State Grid Tianjin Electric Power Company Tianjin China
50
Jing F, Zhang H, Gao M, Xue B, Cao K. RIS-Assisted Multi-Antenna AmBC Signal Detection Using Deep Reinforcement Learning. SENSORS (BASEL, SWITZERLAND) 2022; 22:6137. [PMID: 36015896 PMCID: PMC9414307 DOI: 10.3390/s22166137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/05/2022] [Revised: 08/09/2022] [Accepted: 08/10/2022] [Indexed: 06/15/2023]
Abstract
Signal detection is one of the most critical and challenging issues in ambient backscatter communication (AmBC) systems. In this paper, a multi-antenna AmBC signal detection method is proposed based on reconfigurable intelligent surface (RIS) and deep reinforcement learning. Firstly, an efficient multi-antenna AmBC system is developed based on RIS, which can achieve information transmission and energy collection simultaneously. Secondly, a smart twin delayed deep deterministic (TD3) AmBC signal detection method is presented, based on deep reinforcement learning. Extensive quantitative and qualitative experiments are performed, which show that the proposed method is more compelling than the outstanding comparison methods.
Affiliation(s)
- Feng Jing
- School of Telecommunication Engineering, Xidian University, Xi’an 710126, China
- School of Information and Communication, National University of Defense Technology, Xi’an 430035, China
- Shaanxi Key Laboratory of Intelligence Coordination Networks, Xi’an 710048, China
- Hailin Zhang
- School of Telecommunication Engineering, Xidian University, Xi’an 710126, China
- Shaanxi Key Laboratory of Intelligence Coordination Networks, Xi’an 710048, China
- Mei Gao
- School of Telecommunication Engineering, Xidian University, Xi’an 710126, China
- School of Information and Communication, National University of Defense Technology, Xi’an 430035, China
- Shaanxi Key Laboratory of Intelligence Coordination Networks, Xi’an 710048, China
- Bin Xue
- School of Information and Communication, National University of Defense Technology, Xi’an 430035, China
- Shaanxi Key Laboratory of Intelligence Coordination Networks, Xi’an 710048, China
- Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Kunrui Cao
- School of Information and Communication, National University of Defense Technology, Xi’an 430035, China