1
|
Huang H, Chen S, Chen J. IPHM: Incremental periodic high-utility mining algorithm in dynamic and evolving data environments. Heliyon 2024; 10:e37761. [PMID: 39328518 PMCID: PMC11425093 DOI: 10.1016/j.heliyon.2024.e37761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 09/08/2024] [Accepted: 09/09/2024] [Indexed: 09/28/2024]
Abstract
Periodic high-utility itemset (PHUI) mining can extend beyond the conventional approach of high-utility itemset mining by uncovering recurring customer purchase behaviors common in real-life scenarios (e.g., buying apples and oranges every three days or weekly). Such behaviors, particularly in market basket databases, signify stable patterns that ensure long-term profitability. Existing PHUI mining algorithms assume a static database and incur significant costs when handling incremental databases, as each batch of new transactions necessitates reprocessing the entire dataset. To overcome this challenge, we introduce the Incremental Periodic High-Utility Itemset Miner (IPHM), a method for efficiently extracting periodic high-utility itemsets in incremental database environments. We propose an innovative incremental utility-list structure tailored for incremental database scenarios. Effective pruning strategies are employed to expedite the construction and update of incremental utility-lists and to discard unpromising candidates. As demonstrated by the experimental results, the algorithm is efficacious and efficient, highlighting its practical applicability in dynamic data environments. The algorithm shows a remarkable ability to quickly adapt to database changes, making it highly suitable for applications in market basket analysis where frequent updates are common.
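The idea of a periodic high-utility itemset can be illustrated with a minimal Python sketch. It is not the IPHM algorithm: it recomputes everything from scratch, which is exactly the cost an incremental method aims to avoid, and it does not model the incremental utility-list. The period definition used here (gaps between consecutive transaction ids containing the itemset, with the database boundaries included) is a common convention and an assumption, as are the toy data, the function names, and the `max_period` and `min_util` thresholds.

```python
# Hypothetical toy data: unit profit per item and a 7-transaction purchase log.
PROFIT = {"apple": 3, "orange": 2, "milk": 1}
TRANSACTIONS = [
    {"apple": 1, "orange": 2},  # tid 1
    {"milk": 1},                # tid 2
    {"milk": 2},                # tid 3
    {"apple": 2, "orange": 1},  # tid 4
    {"milk": 1},                # tid 5
    {"apple": 1},               # tid 6
    {"apple": 1, "orange": 1},  # tid 7
]


def contains(transaction, itemset):
    """True if every item of the itemset was bought in the transaction."""
    return all(item in transaction for item in itemset)


def utility(itemset, transactions, profit):
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    return sum(
        t[i] * profit[i]
        for t in transactions
        if contains(t, itemset)
        for i in itemset
    )


def periods(itemset, transactions):
    """Gaps between consecutive occurrences (1-based tids), including both
    database boundaries (assumed convention)."""
    tids = [tid for tid, t in enumerate(transactions, start=1) if contains(t, itemset)]
    points = [0] + tids + [len(transactions)]
    return [b - a for a, b in zip(points, points[1:])]


def is_periodic_high_utility(itemset, transactions, profit, max_period, min_util):
    """Assumed test: no gap exceeds max_period and utility reaches min_util."""
    return (
        max(periods(itemset, transactions)) <= max_period
        and utility(itemset, transactions, profit) >= min_util
    )
```

In this toy log, apples and oranges are bought together every third transaction, so the pair passes with `max_period=3` and `min_util=15`, while milk recurs just as regularly but earns too little to qualify.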
Affiliation(s)
- Huiwu Huang
  - Guangdong University of Technology, School of Computer Science and Technology, Guangzhou, 510006, China
- Shixi Chen
  - Guangdong University of Technology, School of Computer Science and Technology, Guangzhou, 510006, China
- Jiahui Chen
  - Guangdong University of Technology, School of Computer Science and Technology, Guangzhou, 510006, China
|
2
|
Luna JM, Kiran RU, Fournier-Viger P, Ventura S. Efficient mining of top-k high utility itemsets through genetic algorithms. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2022.12.092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
|
3
|
HDSHUI-miner: a novel algorithm for discovering spatial high-utility itemsets in high-dimensional spatiotemporal databases. APPL INTELL 2023. [DOI: 10.1007/s10489-022-04436-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2023]
|
4
|
Li M, Han M, Chen Z, Wu H, Zhang X. FCHM-stream: fast closed high utility itemsets mining over data streams. Knowl Inf Syst 2023. [DOI: 10.1007/s10115-023-01831-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
|
5
|
Liu X, Chen G, Wen S, Huang J. Effective algorithms for mining frequent-utility itemsets. J EXP THEOR ARTIF IN 2022. [DOI: 10.1080/0952813x.2022.2153281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Affiliation(s)
- Xuan Liu
  - School of Computer and Data Engineering, NingboTech University, Ningbo, China
  - School of Computer Science, Zhejiang University Ningbo Research Institute, Ningbo, China
- Genlang Chen
  - School of Computer and Data Engineering, NingboTech University, Ningbo, China
  - School of Computer Science, Zhejiang University Ningbo Research Institute, Ningbo, China
- Shiting Wen
  - School of Computer and Data Engineering, NingboTech University, Ningbo, China
  - School of Computer Science, Zhejiang University Ningbo Research Institute, Ningbo, China
- Jingfang Huang
  - School of Computer and Data Engineering, NingboTech University, Ningbo, China
|
6
|
Ignoring Internal Utilities in High-Utility Itemset Mining. Symmetry (Basel) 2022. [DOI: 10.3390/sym14112339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
High-utility itemset mining discovers a set of items that are sold together and have utility values higher than a given minimum utility threshold. The utilities of these itemsets are calculated by considering their internal and external utility values, which correspond, respectively, to the quantity sold of each item in each transaction and profit units. Therefore, internal and external utilities have symmetric effects on deciding whether an itemset is high-utility. The symmetric contributions of both utilities cause two major related challenges. First, itemsets with low external utility values can easily exceed the minimum utility threshold if they are sold extensively. In this case, such itemsets can be found more efficiently using frequent itemset mining. Second, a large number of high-utility itemsets are generated, which can result in interesting or important high-utility itemsets that are overlooked. This study presents an asymmetric approach in which the internal utility values are ignored when finding high-utility itemsets with high external utility values. The experimental results of two real datasets reveal that the external utility values have fundamental effects on the high-utility itemsets. The results of this study also show that this effect tends to increase for high values of the minimum utility threshold. Moreover, the proposed approach reduces the execution time.
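The utility calculation described in this abstract can be sketched in Python as follows. The sketch is a brute-force illustration of the standard definition (internal utility times external utility, summed over the transactions that contain the whole itemset), not the asymmetric method the paper proposes. The toy data, the function names, and the use of `>=` for the threshold comparison are assumptions.

```python
from itertools import combinations

# Hypothetical toy data: external utility (unit profit) per item.
PROFIT = {"a": 5, "b": 2, "c": 1}

# Each transaction maps item -> internal utility (quantity sold).
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 6},
    {"a": 1, "b": 1, "c": 1},
]


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * profit for the itemset's items over every
    transaction that contains the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_util):
    """Enumerate all itemsets (exponential; for illustration only) and
    keep those whose utility reaches min_util."""
    items = sorted(profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions, profit)
            if u >= min_util:
                result[itemset] = u
    return result
```

With `min_util=15`, the itemsets `("a",)`, `("a", "b")`, and `("a", "c")` qualify. Item `c` contributes mainly through quantity rather than profit, which is the kind of symmetric effect the abstract describes.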
|
7
|
Han M, Cheng H, Zhang N, Li X, Wang L. An efficient algorithm for mining closed high utility itemsets over data streams with one dataset scan. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01763-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
8
|
Han M, Gao Z, Li A, Liu S, Mu D. An overview of high utility itemsets mining methods based on intelligent optimization algorithms. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01741-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
9
|
Iqbal M, Setiawan MN, Irawan MI, Khalif KMNK, Muhammad N, Aziz BM. Cardiovascular disease detection from high utility rare rule mining. Artif Intell Med 2022; 131:102347. [DOI: 10.1016/j.artmed.2022.102347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Revised: 06/21/2022] [Accepted: 06/30/2022] [Indexed: 11/16/2022]
|
10
|
Mining fuzzy high average-utility itemsets using fuzzy utility lists and efficient pruning approach. Soft comput 2022. [DOI: 10.1007/s00500-022-07123-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
11
|
Brahmavar AB, Sheeranalli Venkatarama H, Maiya G. PUC: parallel mining of high-utility itemsets with load balancing on spark. JOURNAL OF INTELLIGENT SYSTEMS 2022. [DOI: 10.1515/jisys-2022-0044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Distributed programming paradigms such as MapReduce and Spark have alleviated sequential bottleneck while mining of massive transaction databases. Of significant importance is mining High Utility Itemset (HUI) that incorporates the revenue of the items purchased in a transaction. Although a few algorithms to mine HUIs in the distributed environment exist, workload skew and data transfer overhead due to shuffling operations remain major issues. In the current study, Parallel Utility Computation (PUC) algorithm has been proposed with novel grouping and load balancing strategies for an efficient mining of HUIs in a distributed environment. To group the items, Transaction Weighted Utility (TWU) values as a degree of transaction similarity is employed. Subsequently, these groups are assigned to the nodes across the cluster by taking into account the mining load due to the items in the group. Experimental evaluation on real and synthetic datasets demonstrate that PUC with TWU grouping in conjunction with load balancing converges mining faster. Due to reduced data transfer, and load balancing-based assignment strategy, PUC outperforms different grouping strategies and random assignment of groups across the cluster. Also, PUC is shown to be faster than PHUI-Growth algorithm with a promising speedup.
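The Transaction Weighted Utility (TWU) measure that this abstract uses for grouping can be sketched in Python as follows. The sketch follows the standard definition (the TWU of an item is the sum of the utilities of all transactions containing it) and runs on a single machine. It does not reproduce PUC's grouping or load-balancing strategies, and the toy data and function names are assumptions.

```python
# Hypothetical toy data: unit profit per item, and item -> quantity per transaction.
PROFIT = {"a": 5, "b": 2, "c": 1}
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 6},
    {"a": 1, "b": 1, "c": 1},
]


def transaction_utility(transaction, profit):
    """Total utility of one transaction: sum of quantity * unit profit."""
    return sum(qty * profit[item] for item, qty in transaction.items())


def twu(transactions, profit):
    """Transaction Weighted Utility per item: the sum of the utilities of
    every transaction in which the item appears."""
    result = {}
    for t in transactions:
        tu = transaction_utility(t, profit)
        for item in t:
            result[item] = result.get(item, 0) + tu
    return result
```

Because the TWU of an item is an upper bound on the utility of any itemset containing it, items whose TWU falls below the minimum utility threshold can be discarded before mining.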
Affiliation(s)
- Anup Bhat Brahmavar
  - Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Harish Sheeranalli Venkatarama
  - Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Geetha Maiya
  - Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
|
12
|
Performance comparison of inertia weight and acceleration coefficients of BPSO in the context of high-utility itemset mining. EVOLUTIONARY INTELLIGENCE 2022. [DOI: 10.1007/s12065-022-00707-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
13
|
|
14
|
Tung N, Nguyen LT, Nguyen TD, Fournier-Viger P, Nguyen NT, Vo B. Efficient mining of cross-level high-utility itemsets in taxonomy quantitative databases. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.12.017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
|
15
|
Lin JCW, Djenouri Y, Srivastava G, Fournier-Viger P. Efficient evolutionary computation model of closed high-utility itemset mining. APPL INTELL 2022. [DOI: 10.1007/s10489-021-03134-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/02/2022]
|
16
|
Hui2Vec: Learning Transaction Embedding Through High Utility Itemsets. BIG DATA ANALYTICS 2022. [DOI: 10.1007/978-3-031-24094-2_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023]
|
17
|
Hidouri A, Jabbour S, Raddaoui B, Ben Yaghlane B. Mining Closed High Utility Itemsets based on Propositional Satisfiability. DATA KNOWL ENG 2021. [DOI: 10.1016/j.datak.2021.101927] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
|
18
|
|
19
|
Tung NT, Nguyen LTT, Nguyen TDD, Vo B. An efficient method for mining multi-level high utility Itemsets. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02681-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
|
20
|
|
21
|
Djenouri Y, Lin JCW, Nørvåg K, Ramampiaro H, Yu PS. Exploring Decomposition for Solving Pattern Mining Problems. ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS 2021. [DOI: 10.1145/3439771] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 10/22/2022]
Abstract
This article introduces a highly efficient pattern mining technique called Clustering-based Pattern Mining (CBPM). This technique discovers relevant patterns by studying the correlation between transactions in the transaction database based on clustering techniques. The set of transactions is first clustered, such that highly correlated transactions are grouped together. Next, we derive the relevant patterns by applying a pattern mining algorithm to each cluster. We present two different pattern mining algorithms, one applying an approximation-based strategy and another based on an exact strategy. The approximation-based strategy takes into account only the clusters, whereas the exact strategy takes into account both clusters and shared items between clusters. To boost the performance of the CBPM, a GPU-based implementation is investigated. To evaluate the CBPM framework, we perform extensive experiments on several pattern mining problems. The results from the experimental evaluation show that the CBPM provides a reduction in both the runtime and memory usage. Also, CBPM based on the approximate strategy provides good accuracy, demonstrating its effectiveness and feasibility. Our GPU implementation achieves significant speedup of up to 552× on a single GPU using big transaction databases.
Affiliation(s)
- Youcef Djenouri
  - Dept. of Mathematics and Cybernetics, SINTEF Digital, Oslo, Norway
- Philip S. Yu
  - Dept. of Computer Science, University of Illinois, Chicago, IL, United States
|
22
|
Dynamic maintenance model for high average-utility pattern mining with deletion operation. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02539-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
The high average-utility itemset mining (HAUIM) was established to provide a fair measure instead of genetic high-utility itemset mining (HUIM) for revealing the satisfied and interesting patterns. In practical applications, the database is dynamically changed when insertion/deletion operations are performed on databases. Several works were designed to handle the insertion process but fewer studies focused on processing the deletion process for knowledge maintenance. In this paper, we then develop a PRE-HAUI-DEL algorithm that utilizes the pre-large concept on HAUIM for handling transaction deletion in the dynamic databases. The pre-large concept is served as the buffer on HAUIM that reduces the number of database scans while the database is updated particularly in transaction deletion. Two upper-bound values are also established here to reduce the unpromising candidates early which can speed up the computational cost. From the experimental results, the designed PRE-HAUI-DEL algorithm is well performed compared to the Apriori-like model in terms of runtime, memory, and scalability in dynamic databases.
|
23
|
|
24
|
Sun R, Han M, Zhang C, Shen M, Du S. Mining of top-k high utility itemsets with negative utility. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-201357] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, the setting of the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although the top-k HUIM method is very common, this method can only mine itemsets with positive items, and the problem of missing itemsets occurs when mining itemsets with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). It proposes a strategy for automatically increasing the minimum utility threshold. In order to solve the problem of multiple scans of the database, it uses transaction merging and dataset projection technology. It uses a redefined sub-tree utility value and a redefined local utility value to prune the search space. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
Affiliation(s)
- Rui Sun
  - Department of Computer Science and Engineering, North Minzu University, NingXia, China
- Meng Han
  - Department of Computer Science and Engineering, North Minzu University, NingXia, China
- Chunyan Zhang
  - Department of Computer Science and Engineering, North Minzu University, NingXia, China
- Mingyao Shen
  - Department of Computer Science and Engineering, North Minzu University, NingXia, China
- Shiyu Du
  - Department of Computer Science and Engineering, North Minzu University, NingXia, China
|
25
|
Nouioua M, Fournier-Viger P, Wu CW, Lin JCW, Gan W. FHUQI-Miner: Fast high utility quantitative itemset mining. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02204-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
|
26
|
|
27
|
Gan W, Lin JCW, Chao HC, Fournier-Viger P, Wang X, Yu PS. Utility-Driven Mining of Trend Information for Intelligent System. ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS 2020. [DOI: 10.1145/3391251] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/24/2022]
Abstract
Useful knowledge, embedded in a database, is likely to change over time. Identifying the recent changes in temporal data can provide valuable up-to-date information to decision makers. Nevertheless, techniques for mining high-utility patterns (HUPs) seldom consider recency as a criterion to discover patterns. Thus, the traditional utility mining framework is inadequate for obtaining up-to-date insights about real-world data. In this article, we address this issue by introducing a novel framework, named utility-driven mining of recent/trend high-utility patterns (RUPs), in temporal databases for intelligent systems, based on user-specified minimum recency and minimum utility thresholds. The utility-driven RUP algorithm is based on novel global and conditional downward closure properties, and a recency-utility tree. Moreover, it adopts a vertical compact recency-utility list structure to store the information required by the mining process. The developed RUP algorithm recursively discovers recent high-utility patterns. It is also fast and consumes a small amount of memory due to its pattern discovery approach that does not generate candidates. Two improved versions of the algorithm with additional pruning strategies are also designed to speed up the discovery of patterns by reducing the search space. Results of a substantial experimental evaluation show that the proposed algorithm can efficiently identify all recent HUPs in large-scale databases, and that the improved algorithm performs best.
Affiliation(s)
- Wensheng Gan
  - Jinan University, Guangzhou, Guangdong Province, China
- Jerry Chun-Wei Lin
  - Western Norway University of Applied Sciences, Inndalsveien, Bergen, Norway
- Xuan Wang
  - Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Philip S. Yu
  - University of Illinois at Chicago, Chicago, IL, United States
|
28
|
Sohrabi MK. An efficient projection-based method for high utility itemset mining using a novel pruning approach on the utility matrix. Knowl Inf Syst 2020. [DOI: 10.1007/s10115-020-01485-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
|
29
|
Huynh Trieu V, Le Quoc H, Truong Ngoc C. An efficient algorithm for hiding sensitive-high utility itemsets. INTELL DATA ANAL 2020. [DOI: 10.3233/ida-194697] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Affiliation(s)
- Vy Huynh Trieu
  - Information Technology Faculty, Pham Van Dong University, Quang Ngai, Vietnam
- Hai Le Quoc
  - Information Technology Faculty, Quang Tri Teacher Training College, Quang Tri, Vietnam
- Chau Truong Ngoc
  - Information Technology Faculty, Da Nang University, Da Nang, Vietnam
|
30
|
Lin JCW, Pirouz M, Djenouri Y, Cheng CF, Ahmed U. Incrementally updating the high average-utility patterns with pre-large concept. APPL INTELL 2020. [DOI: 10.1007/s10489-020-01743-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/25/2022]
Abstract
High-utility itemset mining (HUIM) is considered as an emerging approach to detect the high-utility patterns from databases. Most existing algorithms of HUIM only consider the itemset utility regardless of the length. This limitation raises the utility as a result of a growing itemset size. High average-utility itemset mining (HAUIM) considers the size of the itemset, thus providing a more balanced scale to measure the average-utility for decision-making. Several algorithms were presented to efficiently mine the set of high average-utility itemsets (HAUIs) but most of them focus on handling static databases. In the past, a fast-updated (FUP)-based algorithm was developed to efficiently handle the incremental problem but it still has to re-scan the database when the itemset in the original database is small but there is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI for transaction insertion in dynamic databases is developed, which relies on the average-utility-list (AUL) structures. Moreover, we apply the pre-large concept on HAUIM. A pre-large concept is used to speed up the mining performance, which can ensure that if the total utility in the newly inserted transaction is within the safety bound, the small itemsets in the original database could not be the large ones after the database is updated. This, in turn, reduces the recurring database scans and obtains the correct HAUIs. Experiments demonstrate that the PRE-HAUIMI outperforms the state-of-the-art batch mode HAUI-Miner, and the state-of-the-art incremental IHAUPM and FUP-based algorithms in terms of runtime, memory, number of assessed patterns and scalability.
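The average-utility measure motivating this abstract can be sketched in Python as follows. It divides an itemset's utility by its length so that longer itemsets do not qualify merely by accumulating items. The sketch illustrates the measure only. It does not implement PRE-HAUIMI, the average-utility-list structure, or the pre-large safety bound, and the toy data and function names are assumptions.

```python
# Hypothetical toy data: unit profit per item, and item -> quantity per transaction.
PROFIT = {"a": 5, "b": 2, "c": 1}
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 6},
    {"a": 1, "b": 1, "c": 1},
]


def itemset_utility(itemset, transactions, profit):
    """Sum of quantity * profit over all transactions containing the itemset."""
    return sum(
        t[item] * profit[item]
        for t in transactions
        if all(i in t for i in itemset)
        for item in itemset
    )


def average_utility(itemset, transactions, profit):
    """Utility divided by the number of items in the itemset."""
    return itemset_utility(itemset, transactions, profit) / len(itemset)
```

Here `("a", "c")` has a higher plain utility (22) than `("a",)` (20), yet its average utility (11.0) is lower than that of `("a",)` (20.0), which shows how the measure offsets the utility growth that comes with itemset size.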
|
31
|
Wu JMT, Teng Q, Lin JCW, Yun U, Chen HC. Updating high average-utility itemsets with pre-large concept. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2020. [DOI: 10.3233/jifs-179670] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/15/2022]
Affiliation(s)
- Jimmy Ming-Tai Wu
  - College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
- Qian Teng
  - College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
- Jerry Chun-Wei Lin
  - Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen, Norway
- Unil Yun
  - Department of Computer Engineering, Sejong University, Seoul, Korea
- Hsing-Chung Chen
  - Department of Computer Science and Information Engineering, Asia University, Taichung, Taiwan
|
32
|
Abstract
This paper explores five pattern mining problems and proposes a new distributed framework called DT-DPM: Decomposition Transaction for Distributed Pattern Mining. DT-DPM addresses the limitations of the existing pattern mining problems by reducing the enumeration search space. Thus, it derives the relevant patterns by studying the different correlation among the transactions. It first decomposes the set of transactions into several clusters of different sizes, and then explores heterogeneous architectures, including MapReduce, single CPU, and multi CPU, based on the densities of each subset of transactions. To evaluate the DT-DPM framework, extensive experiments were carried out by solving five pattern mining problems (FIM: Frequent Itemset Mining, WIM: Weighted Itemset Mining, UIM: Uncertain Itemset Mining, HUIM: High Utility Itemset Mining, and SPM: Sequential Pattern Mining). Experimental results reveal that by using DT-DPM, the scalability of the pattern mining algorithms was improved on large databases. Results also reveal that DT-DPM outperforms the baseline parallel pattern mining algorithms on big databases.
|
33
|
Gan W, Lin JCW, Fournier-Viger P, Chao HC, Yu PS. HUOPM: High-Utility Occupancy Pattern Mining. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:1195-1208. [PMID: 30794524 DOI: 10.1109/tcyb.2019.2896267] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 06/09/2023]
Abstract
Mining useful patterns from varied types of databases is an important research topic, which has many real-life applications. Most studies have considered the frequency as sole interestingness measure to identify high-quality patterns. However, each object is different in nature. The relative importance of objects is not equal, in terms of criteria, such as the utility, risk, or interest. Besides, another limitation of frequent patterns is that they generally have a low occupancy, that is, they often represent small sets of items in transactions containing many items and, thus, may not be truly representative of these transactions. To extract high-quality patterns in real-life applications, this paper extends the occupancy measure to also assess the utility of patterns in transaction databases. We propose an efficient algorithm named high-utility occupancy pattern mining (HUOPM). It considers user preferences in terms of frequency, utility, and occupancy. A novel frequency-utility tree and two compact data structures, called the utility-occupancy list and frequency-utility table, are designed to provide global and partial downward closure properties for pruning the search space. The proposed method can efficiently discover the complete set of high-quality patterns without candidate generation. Extensive experiments have been conducted on several datasets to evaluate the effectiveness and efficiency of the proposed algorithm. Results show that the derived patterns are intelligible, reasonable, and acceptable, and that HUOPM with its pruning strategies outperforms the state-of-the-art algorithm, in terms of runtime and search space, respectively.
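The notion of utility occupancy mentioned in this abstract can be sketched in Python as follows. The sketch assumes that the occupancy of an itemset in one transaction is the share of that transaction's utility contributed by the itemset, and that the overall occupancy is the mean of those shares over the supporting transactions. This definition, the toy data, and the function names are assumptions for illustration. The sketch does not implement the frequency-utility tree, the utility-occupancy list, or the pruning strategies of HUOPM.

```python
# Hypothetical toy data: unit profit per item, and item -> quantity per transaction.
PROFIT = {"a": 5, "b": 2, "c": 1}
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 6},
    {"a": 1, "b": 1, "c": 1},
]


def transaction_utility(transaction, profit):
    """Total utility of one transaction."""
    return sum(qty * profit[item] for item, qty in transaction.items())


def support_and_occupancy(itemset, transactions, profit):
    """Return (support, utility occupancy) under the assumed definition:
    the mean share of each supporting transaction's utility that the
    itemset accounts for."""
    shares = []
    for t in transactions:
        if all(item in t for item in itemset):
            in_itemset = sum(t[item] * profit[item] for item in itemset)
            shares.append(in_itemset / transaction_utility(t, profit))
    if not shares:
        return 0, 0.0
    return len(shares), sum(shares) / len(shares)
```

In this toy database `("a", "b")` appears in two transactions and accounts for most of their utility (occupancy 0.9375), so it represents those transactions well, whereas `("c",)` has the same support but a much lower occupancy.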
|
34
|
|
35
|
Efficient Algorithm for Mining Non-Redundant High-Utility Association Rules. SENSORS 2020; 20:s20041078. [PMID: 32079200 PMCID: PMC7070778 DOI: 10.3390/s20041078] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/07/2020] [Revised: 02/06/2020] [Accepted: 02/11/2020] [Indexed: 11/17/2022]
Abstract
In business, managers may use the association information among products to define promotion and competitive strategies. The mining of high-utility association rules (HARs) from high-utility itemsets enables users to select their own weights for rules, based either on the utility or confidence values. This approach also provides more information, which can help managers to make better decisions. Some efficient methods for mining HARs have been developed in recent years. However, in some decision-support systems, users only need to mine a smallest set of HARs for efficient use. Therefore, this paper proposes a method for the efficient mining of non-redundant high-utility association rules (NR-HARs). We first build a semi-lattice of mined high-utility itemsets, and then identify closed and generator itemsets within this. Following this, an efficient algorithm is developed for generating rules from the built lattice. This new approach was verified on different types of datasets to demonstrate that it has a faster runtime and does not require more memory than existing methods. The proposed algorithm can be integrated with a variety of applications and would combine well with external systems, such as the Internet of Things (IoT) and distributed computer systems. Many companies have been applying IoT and such computing systems into their business activities, monitoring data or decision-making. The data can be sent into the system continuously through the IoT or any other information system. Selecting an appropriate and fast approach helps management to visualize customer needs as well as make more timely decisions on business strategy.
|
36
|
|
37
|
|
38
|
Nguyen LT, Vu VV, Lam MT, Duong TT, Manh LT, Nguyen TT, Vo B, Fujita H. An efficient method for mining high utility closed itemsets. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.05.006] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 10/26/2022]
|
39
|
Nguyen LT, Nguyen P, Nguyen TD, Vo B, Fournier-Viger P, Tseng VS. Mining high-utility itemsets in dynamic profit databases. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.03.022] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 10/27/2022]
|
40
|
Singh K, Kumar A, Singh SS, Shakya HK, Biswas B. EHNL: An efficient algorithm for mining high utility itemsets with negative utility value and length constraints. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.01.056] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 10/27/2022]
|
41
|
|
42
|
Transforming Sensing Data into Smart Data for Smart Sustainable Cities. BIG DATA ANALYTICS 2019. [DOI: 10.1007/978-3-030-37188-3_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
43
|
Singh K, Singh SS, Kumar A, Biswas B. High utility itemsets mining with negative utility value: A survey. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2018. [DOI: 10.3233/jifs-18965] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Affiliation(s)
- Kuldeep Singh
  - Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India
- Shashank Sheshar Singh
  - Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India
- Ajay Kumar
  - Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India
- Bhaskar Biswas
  - Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India
|
44
|
Singh K, Singh SS, Kumar A, Biswas B. TKEH: an efficient algorithm for mining top-k high utility itemsets. APPL INTELL 2018. [DOI: 10.1007/s10489-018-1316-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 10/28/2022]
|
45
|
Abstract
In large organizations, it is often required to collect data from the different geographic branches spread over different locations. Extensive amounts of data may be gathered at the centralized location in order to generate interesting patterns via mono-mining the amassed database. However, it is feasible to mine the useful patterns at the data source itself and forward only these patterns to the centralized company, rather than the entire original database. These patterns also exist in huge numbers, and different sources calculate different utility values for each pattern. This paper proposes a weighted model for aggregating the high-utility patterns from different data sources. The procedure of pattern selection was also proposed to efficiently extract high-utility patterns in our weighted model by discarding low-utility patterns. Meanwhile, the synthesizing model yielded high-utility patterns, unlike association rule mining, in which frequent itemsets are generated by considering each item with equal utility, which is not true in real life applications such as sales transactions. Extensive experiments performed on the datasets with varied characteristics show that the proposed algorithm will be effective for mining very sparse and sparse databases with a huge number of transactions. Our proposed model also outperforms various state-of-the-art distributed models of mining in terms of running time.
|
46
|
Liu J, Zhang X, Fung BC, Li J, Iqbal F. Opportunistic mining of top-n high utility patterns. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.02.035] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 10/18/2022]
|
47
|
|
48
|
Lee J, Yun U, Lee G. Analyzing of incremental high utility pattern mining based on tree structures. HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES 2017. [DOI: 10.1186/s13673-017-0112-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Since the concept of high utility pattern mining was proposed to solve the drawbacks of traditional frequent pattern mining approach that cannot handle various features of real-world applications, many different techniques and algorithms for high utility pattern mining have been developed. Moreover, several advanced methods for incremental data processing have been proposed in recent years as the sizes of recent databases obtained in the real world become larger. In this paper, we introduce the basic concept of incremental high utility pattern mining and analyze various relevant methods. In addition, we also conduct performance evaluation for the methods with famous benchmark datasets in order to determine their detailed characteristics. The evaluation shows that the less candidate patterns make algorithms faster.
|