1
Liu X, Zheng L, Zhang W, Zhou J, Cao S, Yu S. An Evolutive Frequent Pattern Tree-based Incremental Knowledge Discovery Algorithm. ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS 2022. [DOI: 10.1145/3495213] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/19/2022]
Abstract
To understand the current situation in specific scenarios, valuable knowledge should be mined from both historical data and newly emerging data. However, most existing algorithms treat the historical data and the emerging data as a whole and periodically re-analyze all of it, which results in heavy computation overhead. It is also challenging to discover new knowledge accurately and in time, because the emerging data are usually small compared to the historical data. To address these challenges, we propose a novel knowledge discovery algorithm based on double evolving frequent pattern trees that can trace the dynamically evolving data with an incremental sliding window. One tree records frequent patterns from the historical data, and the other records incremental frequent items. The structures of the two frequent pattern trees and their relationships are updated periodically according to the emerging data and the sliding window. New frequent patterns are mined from the incremental data, and new knowledge can be obtained from pattern changes. Evaluations show that this algorithm can discover new knowledge from evolving data with good performance and high accuracy.
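The incremental idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: plain counters stand in for the two frequent pattern trees, and the class name `SlidingWindowPatterns`, the batch-based window, and the `max_len` cap on pattern size are all assumptions made for illustration. The point is only that each new batch updates the aggregated counts instead of triggering a re-scan of all historical data, and that knowledge changes show up as patterns gained or lost.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowPatterns:
    """Toy stand-in for the double-tree design (plain counters, not FP-trees)."""

    def __init__(self, window_size, min_support):
        self.window = deque()           # per-batch pattern counts, oldest first
        self.window_size = window_size  # number of batches kept in the window
        self.min_support = min_support  # absolute support threshold
        self.counts = Counter()         # aggregated counts over the window

    @staticmethod
    def _count_batch(batch, max_len=2):
        # Count every itemset of size 1..max_len in each transaction.
        counts = Counter()
        for transaction in batch:
            items = sorted(set(transaction))
            for k in range(1, max_len + 1):
                counts.update(combinations(items, k))
        return counts

    def frequent(self):
        return {p for p, c in self.counts.items() if c >= self.min_support}

    def add_batch(self, batch):
        """Fold in a new batch, evict the oldest one if needed,
        and return (patterns that became frequent, patterns that stopped being frequent)."""
        before = self.frequent()
        new_counts = self._count_batch(batch)
        self.window.append(new_counts)
        self.counts.update(new_counts)
        if len(self.window) > self.window_size:
            self.counts.subtract(self.window.popleft())
        after = self.frequent()
        return after - before, before - after
```

Calling `add_batch` repeatedly with small batches of transactions returns only the pattern changes, which is the part of the abstract this sketch is meant to make concrete.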
Affiliation(s)
- Xin Liu
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Liang Zheng
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Weishan Zhang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Jiehan Zhou
  - Information Technology and Electrical Engineering, University of Oulu, Finland
- Shuai Cao
  - Sangfor Technologies Inc., Shenzhen, China
- Shaowen Yu
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
2
Lin Q, Gan W, Wu Y, Chen J, Chen CM. Smart System: Joint Utility and Frequency for Pattern Classification. ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS 2022. [DOI: 10.1145/3531480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Abstract
Nowadays, the environments of smart systems for Industry 4.0 and Internet of Things (IoT) are experiencing fast industrial upgrading. Big data technologies such as design making, event detection, and classification are developed to help manufacturing organizations to achieve smart systems. By applying data analysis, the potential values of rich data can be maximized and thus help manufacturing organizations to finish another round of upgrading. In this paper, we propose two new algorithms with respect to big data analysis, namely UFC_gen and UFC_fast. Both algorithms are designed to collect three types of patterns to help people determine the market positions for different product combinations. We compare these algorithms on various types of datasets, both real and synthetic. The experimental results show that both algorithms can successfully achieve pattern classification by utilizing three different types of interesting patterns from all candidate patterns based on user-specified thresholds of utility and frequency. Furthermore, the list-based UFC_fast algorithm outperforms the level-wise-based UFC_gen algorithm in terms of both execution time and memory consumption.
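As a rough illustration of classifying patterns by joint utility and frequency thresholds, the sketch below enumerates small itemsets by brute force and labels each one. It is neither UFC_gen nor UFC_fast. The abstract does not name the three pattern types, so the three labels used here are assumptions, as are the function name `classify_patterns`, the dictionary-based transaction format, and the `max_len` cap.

```python
from itertools import combinations


def classify_patterns(transactions, min_support, min_utility, max_len=2):
    """Label itemsets by support and utility thresholds.

    transactions: list of dicts mapping item -> utility (e.g. profit)
    contributed by that item in that transaction (assumed format).
    """
    items = sorted({item for tx in transactions for item in tx})
    labels = {}
    for k in range(1, max_len + 1):
        for pattern in combinations(items, k):
            # Transactions that contain every item of the pattern.
            covering = [tx for tx in transactions if all(i in tx for i in pattern)]
            support = len(covering)
            utility = sum(tx[i] for tx in covering for i in pattern)
            frequent = support >= min_support
            profitable = utility >= min_utility
            # Assumed category names; patterns below both thresholds are dropped.
            if frequent and profitable:
                labels[pattern] = "high-frequency, high-utility"
            elif frequent:
                labels[pattern] = "high-frequency, low-utility"
            elif profitable:
                labels[pattern] = "low-frequency, high-utility"
    return labels
```

Brute-force enumeration is exponential in the number of items; the list-based and level-wise strategies mentioned in the abstract exist precisely to avoid that cost.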
Affiliation(s)
- Qi Lin
  - Jinan University of Birmingham Joint Institute, China
3
Nawaz MS, Fournier-Viger P, Yun U, Wu Y, Song W. Mining High Utility Itemsets with Hill Climbing and Simulated Annealing. ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS 2022. [DOI: 10.1145/3462636] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/20/2022]
Abstract
High utility itemset mining (HUIM) is the task of finding all sets of items, purchased together, that generate a high profit in a transaction database. In the past, several algorithms have been developed to mine high utility itemsets (HUIs). However, most of them cannot properly handle the exponential search space while finding HUIs when the size of the database and the total number of items increase. Recently, evolutionary and heuristic algorithms were designed to mine HUIs, which provided considerable performance improvement. However, they can still have a long runtime and some may miss many HUIs. To address this problem, this article proposes two algorithms for HUIM based on Hill Climbing (HUIM-HC) and Simulated Annealing (HUIM-SA). Both algorithms transform the input database into a bitmap for efficient utility computation and for search space pruning. To improve population diversity, HUIs discovered by evolution are used as target values for the next population instead of keeping the current optimal values in the next population. Through experiments on real-life datasets, it was found that the proposed algorithms are faster than state-of-the-art heuristic and evolutionary HUIM algorithms, that HUIM-SA discovers similar HUIs, and that HUIM-SA evolves linearly with the number of iterations.
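The two ingredients named in this abstract, a bitmap view of the database and a local search over itemsets, can be sketched as follows. This is a simplified illustration and not HUIM-HC itself: it runs a single first-improvement hill climb from one start point rather than evolving a population, and the helper names `build_bitmaps`, `utility`, and `hill_climb` as well as the dictionary-based transaction format are assumptions.

```python
def build_bitmaps(transactions, items):
    """One integer per item; bit t is set when transaction t contains the item."""
    return {
        i: sum(1 << t for t, tx in enumerate(transactions) if i in tx)
        for i in items
    }


def utility(itemset, transactions, bitmaps):
    """Total utility of an itemset over the transactions that contain all of it."""
    if not itemset:
        return 0
    cover = (1 << len(transactions)) - 1  # start with every transaction
    for i in itemset:
        cover &= bitmaps[i]               # bitwise AND narrows the cover
    total = 0
    for t, tx in enumerate(transactions):
        if (cover >> t) & 1:
            total += sum(tx[i] for i in itemset)
    return total


def hill_climb(start, transactions, items, bitmaps):
    """Toggle one item at a time, moving to the first neighbor with higher utility."""
    current = frozenset(start)
    best = utility(current, transactions, bitmaps)
    improved = True
    while improved:
        improved = False
        for i in items:
            neighbor = current ^ {i}      # add the item if absent, drop it if present
            u = utility(neighbor, transactions, bitmaps)
            if u > best:
                current, best, improved = neighbor, u, True
                break
    return current, best
```

A climb like this stops at a local optimum, which is one reason heuristic miners of this kind can miss some HUIs; simulated annealing relaxes the acceptance rule to escape such points.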
Affiliation(s)
- M. Saqib Nawaz
  - Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Youxi Wu
  - Hebei University of Technology, Tianjin, China
- Wei Song
  - North China University of Technology, Beijing, China