1
Wang Z, Chen C, Dong D. Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:9742-9756. [PMID: 35349452] [DOI: 10.1109/tnnls.2022.3160173]
Abstract
Evolution strategies (ESs), as a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available owing to better parallelization. In this article, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust a previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism into ES to facilitate its learning adaptation while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move toward new promising areas of the parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, instance weighted incremental evolution strategies (IW-IES), is verified to achieve significantly improved performance on challenging RL tasks ranging from robot navigation to locomotion. This article thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
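The core idea is a weighted ES update in which each sampled perturbation (an "instance") is weighted by its novelty with respect to the previous optimum and its quality in the changed environment. The sketch below is a minimal illustration of that idea on a toy quadratic objective; the specific weighting rule and all hyperparameters are assumptions for this example, not the authors' exact formulation.

```python
import numpy as np

def fitness(theta, target):
    """Toy stand-in for an RL return: higher is better near `target`."""
    return -np.sum((theta - target) ** 2)

def iw_es_step(theta, theta_old_opt, target_new, rng, sigma=0.1, lr=0.05, pop=64):
    eps = rng.standard_normal((pop, theta.size))        # sampled perturbations
    instances = theta + sigma * eps                     # candidate parameter vectors

    quality = np.array([fitness(x, target_new) for x in instances])  # performance in the new environment
    novelty = np.linalg.norm(instances - theta_old_opt, axis=1)      # distance from the previous optimum

    def norm01(v):                                      # rescale both signals to [0, 1]
        return (v - v.min()) / (np.ptp(v) + 1e-8)

    weights = norm01(quality) * norm01(novelty)         # instances with more "new knowledge" weigh more
    weights /= weights.sum() + 1e-8

    grad = (weights[:, None] * eps).sum(axis=0) / sigma # weighted ES update direction
    return theta + lr * grad

rng = np.random.default_rng(0)
theta = np.zeros(5)        # policy parameters learned in the original environment
old_opt = theta.copy()     # previous optimum
new_target = np.ones(5)    # the environment has changed
for _ in range(200):
    theta = iw_es_step(theta, old_opt, new_target, rng)
print(theta.round(2))
```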
2
Zaniolo M, Giuliani M, Castelletti A. Neuro-Evolutionary Direct Policy Search for Multiobjective Optimal Control. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5926-5938. [PMID: 33882008] [DOI: 10.1109/tnnls.2021.3071960]
Abstract
Direct policy search (DPS) is emerging as one of the most effective and widely applied reinforcement learning (RL) methods for designing optimal control policies for multiobjective Markov decision processes (MOMDPs). Traditionally, DPS defines the control policy within a preselected functional class and searches for its optimal parameterization with respect to a given set of objectives. The functional class should be tailored to the problem at hand, and its selection is crucial, as it determines the search space within which solutions can be found. In MOMDP problems, each objective tradeoff determines a different fitness landscape, requiring a tradeoff-dynamic selection of the functional class. Yet, in state-of-the-art applications, the policy class is generally selected a priori and kept constant across the multidimensional objective space. In this work, we present a novel policy search routine called neuro-evolutionary multiobjective DPS (NEMODPS), which extends the DPS problem formulation to jointly search the policy functional class and its parameterization in a hyperspace containing policy architectures and coefficients. NEMODPS begins with a population of minimally structured approximating networks and progressively builds more sophisticated architectures through topological and parametrical mutation and crossover, and selection of the fittest individuals with respect to multiple objectives. We tested NEMODPS on the problem of designing the control policy of a multipurpose water system. Numerical results show that the tradeoff-dynamic structural and parametrical policy search of NEMODPS is consistent across multiple runs and outperforms the solutions designed via traditional DPS with predefined policy topologies.
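To make the topology-and-weights search concrete, here is a small, mutation-only sketch of the general pattern the abstract describes: a population of minimal networks whose hidden layer can grow through topological mutation, whose weights are perturbed through parametric mutation, and whose survivors are chosen by Pareto dominance on two objectives. The toy objectives (fit error vs. network size), the mutation rates, and the selection scheme are assumptions for illustration, not the NEMODPS operators, and crossover is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_policy(n_hidden, n_in=3, n_out=1):
    return {"W1": 0.1 * rng.standard_normal((n_hidden, n_in)),
            "W2": 0.1 * rng.standard_normal((n_out, n_hidden))}

def act(policy, obs):
    return np.tanh(policy["W2"] @ np.tanh(policy["W1"] @ obs))

def objectives(policy):
    """Two toy, conflicting objectives (both minimised): fit error and network size."""
    obs = np.linspace(-1.0, 1.0, 16).reshape(-1, 1) * np.ones((1, 3))
    preds = np.array([act(policy, o) for o in obs])
    err = float(np.mean((preds - np.sin(obs[:, :1])) ** 2))
    size = (policy["W1"].size + policy["W2"].size) / 100.0
    return err, size

def mutate(policy):
    child = {k: v.copy() for k, v in policy.items()}
    if rng.random() < 0.3:   # topological mutation: add one hidden unit
        child["W1"] = np.vstack([child["W1"], 0.1 * rng.standard_normal((1, child["W1"].shape[1]))])
        child["W2"] = np.hstack([child["W2"], 0.1 * rng.standard_normal((child["W2"].shape[0], 1))])
    for k in child:          # parametric mutation: Gaussian weight noise
        child[k] = child[k] + 0.05 * rng.standard_normal(child[k].shape)
    return child

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

pop = [make_policy(1) for _ in range(20)]   # start from minimally structured networks
for _ in range(30):
    pop = pop + [mutate(p) for p in pop]
    scored = [(objectives(p), p) for p in pop]
    front = [p for s, p in scored if not any(dominates(t, s) for t, _ in scored)]
    rest = [p for s, p in scored if any(dominates(t, s) for t, _ in scored)]
    pop = (front + rest)[:20]               # keep the Pareto front first, fill with the rest

print("hidden-layer sizes surviving selection:", sorted({p["W1"].shape[0] for p in pop}))
```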
3
Kapoor A, Nukala E, Chandra R. Bayesian neuroevolution using distributed swarm optimization and tempered MCMC. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109528]
4
Adaptive evolution strategy with ensemble of mutations for Reinforcement Learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108624]
5
Salih A, Moshaiov A. Evolving topology and weights of specialized and non-specialized neuro-controllers for robot motion in various environments. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07357-4]
7
Radaideh MI, Forget B, Shirvan K. Large-scale design optimisation of boiling water reactor bundles with neuroevolution. Ann Nucl Energy 2021. [DOI: 10.1016/j.anucene.2021.108355]
8
Li S, Li M, Su J, Chen S, Yuan Z, Ye Q. PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning. ACM Trans Intell Syst Technol 2021. [DOI: 10.1145/3452008]
Abstract
Efficient and stable exploration remains a key challenge for deep reinforcement learning (DRL) operating in high-dimensional action and state spaces. Recently, a promising line of work has combined exploration in the action space with exploration in the parameter space to get the best of both approaches. In this article, we propose a new iterative, closed-loop framework that combines an evolutionary algorithm (EA), which explores the parameter space directly in a gradient-free manner, with the actor-critic deep deterministic policy gradient (DDPG) algorithm, which explores the action space in a gradient-based manner, so that the two methods cooperate in a more balanced and efficient way. In our framework, the policies represented by the EA population (the parameter perturbation part) evolve in a guided manner by utilizing the gradient information provided by DDPG, while the policy gradient part (DDPG) is used only as a fine-tuning tool for the best individual in the EA population to improve sample efficiency. In particular, we propose a criterion to determine the number of training steps required by DDPG, ensuring that useful gradient information can be generated from the EA-generated samples and that the DDPG and EA parts work together in a balanced way during each generation. Furthermore, within the DDPG part, the algorithm can flexibly switch between fine-tuning the previous RL actor and fine-tuning a new one generated by the EA, according to the situation, to further improve efficiency. Experiments on a range of challenging continuous control benchmarks demonstrate that our algorithm outperforms related works and offers a satisfactory trade-off between stability and sample efficiency.
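The cooperation loop can be summarized as: evaluate the EA population, fine-tune the best individual with a few gradient steps, then rebuild the population by perturbing the fine-tuned actor. The sketch below shows that loop on a toy reward whose analytic gradient stands in for the DDPG update; the injection scheme and all constants are simplifying assumptions, not the PP-PG implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([1.0, -2.0, 0.5])

def reward(theta):
    return -np.sum((theta - TARGET) ** 2)

def reward_grad(theta):
    # Stand-in for the gradient that the DDPG critic/actor would provide.
    return -2.0 * (theta - TARGET)

pop = [rng.standard_normal(3) for _ in range(10)]     # EA population (parameter perturbations)
for generation in range(50):
    fitnesses = [reward(p) for p in pop]
    elite = pop[int(np.argmax(fitnesses))]

    # Gradient-based fine-tuning of the elite individual (the "policy gradient" part).
    actor = elite.copy()
    for _ in range(5):                                # a few gradient steps per generation
        actor = actor + 0.05 * reward_grad(actor)

    # Guided evolution: inject the fine-tuned actor back into the population and
    # let the rest of the population explore around it in parameter space.
    pop = [actor] + [actor + 0.2 * rng.standard_normal(3) for _ in range(len(pop) - 1)]

print("best reward:", round(reward(pop[0]), 4))
```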
Affiliation(s)
- Shilei Li, Department of Information Security, Naval University of Engineering, Wuhan, China
- Meng Li, Army Academy of Artillery and Air Defense, Hefei, China
- Jiongming Su, College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
- Shaofei Chen, College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
- Zhimin Yuan, Department of Information Security, Naval University of Engineering, Wuhan, China
- Qing Ye, Department of Information Security, Naval University of Engineering, Wuhan, China
9
Cuccu G, Togelius J, Cudré-Mauroux P. Playing Atari with few neurons: Improving the efficacy of reinforcement learning by decoupling feature extraction and decision making. Autonomous Agents and Multi-Agent Systems 2021; 35:17. [PMID: 34720684] [PMCID: PMC8550197] [DOI: 10.1007/s10458-021-09497-8]
Abstract
We propose a new method for learning compact state representations and policies separately but simultaneously for policy approximation in vision-based applications such as Atari games. Approaches based on deep reinforcement learning typically map pixels directly to actions to enable end-to-end training. Internally, however, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it, two objectives which can be addressed independently. Separating the image processing from the action selection allows for a better understanding of each task individually, as well as potentially finding smaller policy representations, which is interesting in its own right. Our approach learns state representations using a compact encoder based on two novel algorithms: (i) Increasing Dictionary Vector Quantization builds a dictionary of state representations that grows in size over time, allowing our method to address new observations as they appear in an open-ended online-learning context; and (ii) Direct Residuals Sparse Coding encodes observations as a function of the dictionary, aiming for the highest information inclusion by disregarding reconstruction error and maximizing code sparsity. As the dictionary size increases, however, the encoder produces increasingly larger inputs for the neural network; this issue is addressed with a new variant of the Exponential Natural Evolution Strategies algorithm which adapts the dimensionality of its probability distribution along the run. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on each game's controls). These are still capable of achieving results that are not much worse than, and occasionally superior to, the state of the art in direct policy search, which uses two orders of magnitude more neurons.
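The decoupling can be illustrated with a growing dictionary and a greedy residual-based binary code feeding a tiny policy. The sketch below is a rough approximation of that pipeline; the thresholds, the greedy rule, and the random "policy" weights (which the paper instead evolves with its NES variant) are assumptions for this example rather than the published IDVQ/DRSC definitions.

```python
import numpy as np

class GrowingEncoder:
    """Online dictionary + greedy binary code (a rough stand-in for IDVQ/DRSC)."""

    def __init__(self, new_atom_threshold=0.5, max_active_atoms=3):
        self.dictionary = []               # flattened observations stored as atoms
        self.tau = new_atom_threshold
        self.k = max_active_atoms

    def encode(self, obs):
        x = obs.ravel().astype(float)
        code = np.zeros(len(self.dictionary))
        residual = x.copy()
        for _ in range(self.k):            # greedily pick atoms that explain the residual
            if not self.dictionary:
                break
            sims = [float(residual @ atom) for atom in self.dictionary]
            best = int(np.argmax(sims))
            if sims[best] <= 0.0:
                break
            code[best] = 1.0               # sparse, binary code entry
            residual = np.clip(residual - self.dictionary[best], 0.0, None)
        if np.linalg.norm(residual) > self.tau * np.linalg.norm(x):
            self.dictionary.append(residual)   # grow the dictionary with what is unexplained
            code = np.append(code, 1.0)
        return code                        # code length grows with the dictionary

enc = GrowingEncoder()
rng = np.random.default_rng(0)
n_actions = 4
for step in range(5):
    frame = rng.random((8, 8))             # stand-in for a downsampled game frame
    code = enc.encode(frame)
    weights = rng.standard_normal((n_actions, code.size))  # tiny policy; would be evolved by NES
    action = int(np.argmax(weights @ code))
    print(f"step {step}: dictionary size = {len(enc.dictionary)}, action = {action}")
```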
Affiliation(s)
- Giuseppe Cuccu, eXascale Infolab, Department of Computer Science, University of Fribourg, Fribourg, Switzerland
- Julian Togelius, Game Innovation Lab, Tandon School of Engineering, New York University, New York, NY, USA
- Philippe Cudré-Mauroux, eXascale Infolab, Department of Computer Science, University of Fribourg, Fribourg, Switzerland
10
Radaideh MI, Shirvan K. Rule-based reinforcement learning methodology to inform evolutionary algorithms for constrained optimization of engineering applications. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106836]
11
Zhang W, Zhou Q. Software test data generation technology based on polymorphic particle swarm evolutionary algorithm. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-189811]
Abstract
Combinatorial testing is a specification-based software testing method that aims to select a small number of valid test cases from the large combinatorial space of the software under test, generating a test suite with high coverage and strong fault-detecting ability. However, combinatorial test case generation is an NP-hard problem that cannot, in general, be solved exactly in polynomial time, so meta-heuristic search algorithms are needed. Compared with other meta-heuristic search algorithms, the particle swarm algorithm is more competitive in terms of covering-array size and execution time. In this paper, we systematically review and summarize the existing research on generating combinatorial test suites with the particle swarm algorithm, and propose a combinatorial test case generation method that can handle arbitrary coverage strengths by combining an improved one-test-at-a-time strategy with an adaptive particle swarm algorithm, addressing both the variable-strength combinatorial testing problem and the parameter selection problem of the particle swarm algorithm. To address the parameter configuration problem of the particle swarm algorithm, the four parameters of inertia weight, learning factor, population size, and iteration number are set appropriately, making the particle swarm algorithm better suited to covering-array generation.
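A minimal sketch of the one-test-at-a-time pattern with a standard PSO update follows: each particle encodes one candidate test, its fitness is the number of still-uncovered 2-way value pairs it would cover, and the best particle is appended to the suite until every pair is covered. The model of the system under test and the parameter values (inertia weight, learning factors, swarm size, iteration count) are illustrative assumptions, not the tuned configuration from the paper.

```python
import itertools
import numpy as np

LEVELS = [3, 3, 2, 2]          # number of values for each of four input parameters
rng = np.random.default_rng(0)

def all_pairs():
    """Every 2-way combination (param i = value vi, param j = value vj) to be covered."""
    return {(i, vi, j, vj)
            for i, j in itertools.combinations(range(len(LEVELS)), 2)
            for vi in range(LEVELS[i]) for vj in range(LEVELS[j])}

def pairs_of(test):
    return {(i, test[i], j, test[j]) for i, j in itertools.combinations(range(len(test)), 2)}

def decode(position):
    return tuple(int(np.clip(np.rint(p), 0, l - 1)) for p, l in zip(position, LEVELS))

def pso_one_test(uncovered, swarm=20, iters=40, w=0.7, c1=1.5, c2=1.5):
    """Search for one test case that covers as many still-uncovered pairs as possible."""
    dim = len(LEVELS)
    pos = rng.random((swarm, dim)) * (np.array(LEVELS) - 1)
    vel = np.zeros((swarm, dim))
    pbest = pos.copy()
    pbest_fit = np.array([len(pairs_of(decode(p)) & uncovered) for p in pos])
    gbest = pbest[int(np.argmax(pbest_fit))].copy()
    for _ in range(iters):
        r1, r2 = rng.random((swarm, dim)), rng.random((swarm, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)  # PSO velocity update
        pos = pos + vel
        fit = np.array([len(pairs_of(decode(p)) & uncovered) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[int(np.argmax(pbest_fit))].copy()
    return decode(gbest)

uncovered, suite = all_pairs(), []
while uncovered:                            # one-test-at-a-time: add tests until all pairs are covered
    test = pso_one_test(uncovered)
    if not pairs_of(test) & uncovered:      # safety net: build a covering test directly if PSO stalls
        i, vi, j, vj = next(iter(uncovered))
        test = tuple(vi if k == i else vj if k == j else 0 for k in range(len(LEVELS)))
    suite.append(test)
    uncovered -= pairs_of(test)
print(f"{len(suite)} tests cover all 2-way combinations")
```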
Affiliation(s)
- Wenning Zhang, State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, Henan, China; Software College, Zhongyuan University of Technology, Zhengzhou, Henan, China
- Qinglei Zhou, School of Information Engineering, Zhengzhou University, Zhengzhou, Henan, China
12
Santos I, Castro L, Rodriguez-Fernandez N, Torrente-Patiño Á, Carballal A. Artificial Neural Networks and Deep Learning in the Visual Arts: a review. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05565-4]
13
Abstract
In general, games pose interesting and complex problems for the implementation of intelligent agents and are a popular domain in the study of artificial intelligence. In fact, games have been at the center of some of the most well-known achievements in artificial intelligence. From classical board games such as chess, checkers, backgammon and Go, to video games such as Dota 2 and StarCraft II, artificial intelligence research has devised computer programs that can play at the level of a human master and even at a human world champion level. Planning and learning, two well-known and successful paradigms of artificial intelligence, have greatly contributed to these achievements. Although representing distinct approaches, planning and learning try to solve similar problems and share some similarities. They can even complement each other. This has led to research on methodologies to combine the strengths of both approaches to derive better solutions. This paper presents a survey of the multiple methodologies that have been proposed to integrate planning and learning in the context of games. In order to provide a richer contextualization, the paper also presents learning and planning techniques commonly used in games, both in terms of their theoretical foundations and applications.
14
Chen D, Wang Y, Gao W. Combining a gradient-based method and an evolution strategy for multi-objective reinforcement learning. Appl Intell 2020. [DOI: 10.1007/s10489-020-01702-7]
16
From Chess and Atari to StarCraft and Beyond: How Game AI is Driving the World of AI. Künstliche Intelligenz 2020. [DOI: 10.1007/s13218-020-00647-w]
17
Reinforcement Learning and Neuroevolution in Flappy Bird Game. Pattern Recognition and Image Analysis 2019. [DOI: 10.1007/978-3-030-31332-6_20]
18
Yang X, Deng S, Ji M, Zhao J, Zheng W. Neural Network Evolving Algorithm Based on the Triplet Codon Encoding Method. Genes (Basel) 2018; 9:626. [PMID: 30551648] [PMCID: PMC6315701] [DOI: 10.3390/genes9120626]
Abstract
Artificial intelligence research has been receiving more and more attention in recent years. Neuroevolution (NE) is an important branch of AI that harnesses the power of evolutionary algorithms to generate artificial neural networks (ANNs). How to exploit the evolution of both network topology and weights in applications of ANNs is the main problem in the field of NE. In this paper, a novel DNA encoding method based on the triplet codon is proposed. Additionally, an NE algorithm based on this encoding method, the Triplet Codon Encoding Neural Network Evolving Algorithm (TCENNE), is presented to verify the rationality and validity of the coding design. The results show that TCENNE is very effective and more robust than existing NE algorithms, owing to the coding design. It is also shown that TCENNE can realize the co-evolution of network topology and weights and outperform other neuroevolution systems on challenging reinforcement learning tasks.
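As an illustration of a codon-style genome, the sketch below reads a string over four bases three codons at a time and decodes each group into a connection gene (source node, target node, weight), with a simple point-mutation operator. The codon-to-value mapping and the gene layout are assumptions made for this example; the actual TCENNE encoding is defined in the paper.

```python
import numpy as np

BASES = "ATGC"

def codon_value(codon):
    """Map a three-letter codon to an integer in [0, 63] via base-4 digits."""
    return sum(BASES.index(b) * 4 ** i for i, b in enumerate(codon))

def decode(genome, n_nodes=6):
    """Read three codons (nine bases) at a time as one connection gene (src, dst, weight)."""
    usable = len(genome) - len(genome) % 3
    codons = [genome[i:i + 3] for i in range(0, usable, 3)]
    connections = []
    for k in range(0, len(codons) - 2, 3):
        src = codon_value(codons[k]) % n_nodes
        dst = codon_value(codons[k + 1]) % n_nodes
        weight = codon_value(codons[k + 2]) / 63.0 * 2.0 - 1.0   # scale to [-1, 1]
        connections.append((src, dst, round(weight, 3)))
    return connections

def point_mutate(genome, rng, rate=0.05):
    """Flip individual bases; a single base change can alter topology or a weight."""
    return "".join(str(rng.choice(list(BASES))) if rng.random() < rate else b for b in genome)

rng = np.random.default_rng(0)
genome = "".join(str(rng.choice(list(BASES))) for _ in range(9 * 4))   # four connection genes
print("parent genes:", decode(genome))
print("mutant genes:", decode(point_mutate(genome, rng)))
```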
Affiliation(s)
- Xu Yang, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Songgaojun Deng, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Mengyao Ji, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Jinfeng Zhao, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Wenhao Zheng, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China