1
|
Li C, Wei X, Wang J, Wang S, Zhang S. A review of reinforcement learning based hyper-heuristics. PeerJ Comput Sci 2024; 10:e2141. [PMID: 38983203 PMCID: PMC11232579 DOI: 10.7717/peerj-cs.2141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2024] [Accepted: 05/29/2024] [Indexed: 07/11/2024]
Abstract
The reinforcement learning based hyper-heuristics (RL-HH) is a popular trend in the field of optimization. RL-HH combines the global search ability of hyper-heuristics (HH) with the learning ability of reinforcement learning (RL). This synergy allows the agent to dynamically adjust its own strategy, leading to a gradual optimization of the solution. Existing researches have shown the effectiveness of RL-HH in solving complex real-world problems. However, a comprehensive introduction and summary of the RL-HH field is still blank. This research reviews currently existing RL-HHs and presents a general framework for RL-HHs. This article categorizes the type of algorithms into two categories: value-based reinforcement learning hyper-heuristics and policy-based reinforcement learning hyper-heuristics. Typical algorithms in each category are summarized and described in detail. Finally, the shortcomings in existing researches on RL-HH and future research directions are discussed.
Collapse
Affiliation(s)
- Cuixia Li
- School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou, Henan, China
| | - Xiang Wei
- School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou, Henan, China
| | - Jing Wang
- School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou, Henan, China
| | - Shuozhe Wang
- School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou, Henan, China
| | - Shuyan Zhang
- School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou, Henan, China
| |
Collapse
|
2
|
Becerra-Rozas M, Crawford B, Soto R, Talbi EG, Gómez-Pulido JM. Challenging the Limits of Binarization: A New Scheme Selection Policy Using Reinforcement Learning Techniques for Binary Combinatorial Problem Solving. Biomimetics (Basel) 2024; 9:89. [PMID: 38392135 PMCID: PMC10887011 DOI: 10.3390/biomimetics9020089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Revised: 01/13/2024] [Accepted: 01/30/2024] [Indexed: 02/24/2024] Open
Abstract
In this study, we introduce an innovative policy in the field of reinforcement learning, specifically designed as an action selection mechanism, and applied herein as a selector for binarization schemes. These schemes enable continuous metaheuristics to be applied to binary problems, thereby paving new paths in combinatorial optimization. To evaluate its efficacy, we implemented this policy within our BSS framework, which integrates a variety of reinforcement learning and metaheuristic techniques. Upon resolving 45 instances of the Set Covering Problem, our results demonstrate that reinforcement learning can play a crucial role in enhancing the binarization techniques employed. This policy not only significantly outperformed traditional methods in terms of precision and efficiency, but also proved to be extensible and adaptable to other techniques and similar problems. The approach proposed in this article is capable of significantly surpassing traditional methods in precision and efficiency, which could have important implications for a wide range of real-world applications. This study underscores the philosophy behind our approach: utilizing reinforcement learning not as an end in itself, but as a powerful tool for solving binary combinatorial problems, emphasizing its practical applicability and potential to transform the way we address complex challenges across various fields.
Collapse
Affiliation(s)
- Marcelo Becerra-Rozas
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
| | - Broderick Crawford
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
| | - Ricardo Soto
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
| | - El-Ghazali Talbi
- CNRS UMR 9189, Centre de Recherche en Informatique Signal et Automatique de Lille (CRIStAL), University of Lille, F-59000 Lille, France
| | - Jose M Gómez-Pulido
- Health Computing and Intelligent Systems Research Group (HCIS), Department of Computer Science, University of Alcalá, 28805 Alcalá de Henares, Spain
| |
Collapse
|
3
|
Zhao F, Di S, Wang L. A Hyperheuristic With Q-Learning for the Multiobjective Energy-Efficient Distributed Blocking Flow Shop Scheduling Problem. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:3337-3350. [PMID: 35994539 DOI: 10.1109/tcyb.2022.3192112] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Carbon peaking and carbon neutrality, which are the significant national strategy for sustainable development, have attracted considerable attention from production enterprises. In this study, the energy consumption is considered in the distributed blocking flow shop scheduling problem (DBFSP). A hyperheuristic with Q -learning (HHQL) is presented to address the energy-efficient DBFSP (EEDBFSP). Q -learning is employed to select an appropriate low-level heuristic (LLH) from a predesigned LLH set according to historical information fed back by LLH. An initialization method, which considers both total tardiness (TTD) and total energy consumption (TEC), is proposed to construct the initial population. The ε -greedy strategy is introduced to utilize the learned knowledge while retaining a certain degree of exploration in the process of selecting LLH. The acceleration operation of the job on the critical path is designed to optimize TTD. The deceleration operation of the job on the noncritical path is designed to optimize TEC. The statistical and computational experimentation in an extensive benchmark testified that the HHQL outperforms the other comparison algorithm regarding efficiency and significance in solving EEDBFSP.
Collapse
|
4
|
Gümüş DB, Özcan E, Atkin J, Drake JH. An investigation of F-Race training strategies for cross domain optimisation with memetic algorithms. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|
5
|
A hyper-heuristic guided by a probabilistic graphical model for single-objective real-parameter optimization. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01623-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/15/2022]
|
6
|
Tong Z, Chen H, Liu B, Cai J, Cai S. A novel intelligent hyper-heuristic algorithm for solving optimization problems. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-211250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
In recent years, solving combinatorial optimization problems involves more complications, high dimensions, and multi-objective considerations. Combining the advantages of other evolutionary algorithms to enhance the performance of a unique evolutionary algorithm and form a new hybrid heuristic algorithm has become a way to strengthen the performance of the algorithm effectively. However, the intelligent hybrid heuristic algorithm destroys the integrity, universality, and robustness of the original algorithm to a certain extent and increases its time complexity. This paper implements a new idea “ML to choose heuristics” (a heuristic algorithm combined with machine learning technology) which uses the Q-learning method to learn different strategies in genetic algorithm. Moreover, a selection-based hyper-heuristic algorithm is obtained that can guide the algorithm to make decisions at different time nodes to select appropriate strategies. The algorithm is the hybrid strategy using Q-learning on StudGA (HSQ-StudGA). The experimental results show that among the 14 standard test functions, the evolutionary algorithm guided by Q-learning can effectively improve the quality of arithmetic solution. Under the premise of not changing the evolutionary structure of the algorithm, the hyper-heuristic algorithm represents a new method to solve combinatorial optimization problems.
Collapse
Affiliation(s)
- Zhao Tong
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Hongjian Chen
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Bilan Liu
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Jinhui Cai
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Shuo Cai
- The School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
| |
Collapse
|
7
|
A Novel Learning-Based Binarization Scheme Selector for Swarm Algorithms Solving Combinatorial Problems. MATHEMATICS 2021. [DOI: 10.3390/math9222887] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Currently, industry is undergoing an exponential increase in binary-based combinatorial problems. In this regard, metaheuristics have been a common trend in the field in order to design approaches to successfully solve them. Thus, a well-known strategy includes the employment of continuous swarm-based algorithms transformed to perform in binary environments. In this work, we propose a hybrid approach that contains discrete smartly adapted population-based strategies to efficiently tackle binary-based problems. The proposed approach employs a reinforcement learning technique, known as SARSA (State–Action–Reward–State–Action), in order to utilize knowledge based on the run time. In order to test the viability and competitiveness of our proposal, we compare discrete state-of-the-art algorithms smartly assisted by SARSA. Finally, we illustrate interesting results where the proposed hybrid outperforms other approaches, thus, providing a novel option to tackle these types of problems in industry.
Collapse
|
8
|
A Hyper Heuristic Algorithm Based Genetic Programming for Steel Production Scheduling of Cyber-Physical System-ORIENTED. MATHEMATICS 2021. [DOI: 10.3390/math9182256] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Intelligent manufacturing is the trend of the steel industry. A cyber-physical system oriented steel production scheduling system framework is proposed. To make up for the difficulty of dynamic scheduling of steel production in a complex environment and provide an idea for developing steel production to intelligent manufacturing. The dynamic steel production scheduling model characteristics are studied, and an ontology-based steel cyber-physical system production scheduling knowledge model and its ontology attribute knowledge representation method are proposed. For the dynamic scheduling, the heuristic scheduling rules were established. With the method, a hyper-heuristic algorithm based on genetic programming is presented. The learning-based high-level selection strategy method was adopted to manage the low-level heuristic. An automatic scheduling rule generation framework based on genetic programming is designed to manage and generate excellent heuristic rules and solve scheduling problems based on different production disturbances. Finally, the performance of the algorithm is verified by a simulation case.
Collapse
|
9
|
Abstract
One of the central issues that must be resolved for a metaheuristic optimization process to work well is the dilemma of the balance between exploration and exploitation. The metaheuristics (MH) that achieved this balance can be called balanced MH, where a Q-Learning (QL) integration framework was proposed for the selection of metaheuristic operators conducive to this balance, particularly the selection of binarization schemes when a continuous metaheuristic solves binary combinatorial problems. In this work the use of this framework is extended to other recent metaheuristics, demonstrating that the integration of QL in the selection of operators improves the exploration-exploitation balance. Specifically, the Whale Optimization Algorithm and the Sine-Cosine Algorithm are tested by solving the Set Covering Problem, showing statistical improvements in this balance and in the quality of the solutions.
Collapse
|
10
|
|
11
|
Figueroa Mora KM, Anzurez Marín J, Cerda J, Carrasco-Ochoa JA, Martínez-Trinidad JF, Olvera-López JA. A Preliminary Study on Score-Based Hyper-heuristics for Solving the Bin Packing Problem. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7297569 DOI: 10.1007/978-3-030-49076-8_30] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
The bin packing problem is a widespread combinatorial problem. It aims at packing a set of items by using as few bins as possible. Among the many available solving methods, approximation ones such as heuristics have become popular due to their reduced cost and generally acceptable solutions. A further step in this regard is given by hyper-heuristics, which literature usually defines as “high-level heuristics to choose heuristics”. Hyper-heuristics choose one suitable heuristic from a set of available ones, to solve a particular portion of an instance. As the search progresses, heuristics can be exchanged, adapting the solution process to the current problem state under exploration. In this work, we describe how to generate and use hyper-heuristics that keep a record of the scores achieved by individual heuristics on previously solved bin packing problem instances in the form of rules. Then, hyper-heuristics manage those scores to estimate the performance of such heuristics on unseen instances. In this way, the previous actions of the hyper-heuristics determine which heuristic to use on future unseen cases. The experiments conducted under different scenarios yield promising results where some of the hyper-heuristics produced outperform isolated heuristics.
Collapse
Affiliation(s)
| | - Juan Anzurez Marín
- Facultad de Ingeniería Eléctrica, Universidad Michoacana de San Nicolás de Hidalgo, Morelia, Mexico
| | - Jaime Cerda
- Facultad de Ingeniería Eléctrica, Universidad Michoacana de San Nicolás de Hidalgo, Morelia, Mexico
| | - Jesús Ariel Carrasco-Ochoa
- Computer Science, Instituto Nacional de Astrofísica, Óptica y Electrónica, Sta. Maria Tonantzintla, Mexico
| | | | | |
Collapse
|
12
|
Trindade ÁR, Campelo F. Tuning metaheuristics by sequential optimisation of regression models. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105829] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
13
|
Chaurasia SN, Kim JH. An evolutionary algorithm based hyper-heuristic framework for the set packing problem. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.07.073] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
14
|
A review on the self and dual interactions between machine learning and optimisation. PROGRESS IN ARTIFICIAL INTELLIGENCE 2019. [DOI: 10.1007/s13748-019-00185-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|