A Spark-based high utility itemset mining with multiple external utilities
- Source :
- Cluster Computing. 25:889-909
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- High utility itemset (HUI) mining is a data mining technique for discovering profitable patterns. The utility of an item is computed from two measures: quantity and per-unit profit. All existing HUI mining algorithms assume a single external utility (per-unit profit) value for the entire database. In many applications, however, the per-unit profit of an item fluctuates over time. This research introduces three novel strategies for incorporating the external utilities of items as input to the HUI mining algorithm. Traditional HUI mining algorithms were developed for standalone systems and are ill-suited to big data processing because of the limited computing resources (CPU, memory) of such systems. Big data is processed efficiently on distributed frameworks such as Apache Hadoop and Spark. This paper introduces a distributed HUI mining algorithm named the Spark-based top-k high utility itemset (k-SHUI) miner. We also propose a fair load distribution strategy that divides the search space equally among the cluster nodes. k-SHUI produces the top-k HUIs without requiring a minimum utility threshold. We conducted extensive experiments on six real-life datasets to compare the proposed algorithm's performance with that of existing algorithms. The experimental results demonstrate that the proposed algorithm outperforms the existing algorithms.
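To make the abstract's terms concrete, the following is a minimal single-machine sketch of how an itemset's utility can be computed when per-unit profit varies by time period, and of what "top-k without a minimum utility threshold" means. It is not the k-SHUI algorithm and not one of the paper's three strategies: the toy transactions, the per-period profit table, and the function names `itemset_utility` and `top_k_itemsets` are all assumptions made for illustration, and the search is a brute-force enumeration rather than a distributed Spark job.

```python
from itertools import combinations
import heapq

# Hypothetical toy data: each transaction is (period, {item: quantity}).
transactions = [
    (1, {"a": 2, "b": 1}),
    (1, {"a": 1, "c": 3}),
    (2, {"a": 1, "b": 2, "c": 1}),
]

# Hypothetical per-unit profit (external utility) of each item in each period.
profit = {
    1: {"a": 5, "b": 2, "c": 1},
    2: {"a": 4, "b": 3, "c": 1},
}


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * per-unit profit over transactions containing the whole itemset."""
    total = 0
    for period, quantities in transactions:
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * profit[period][item] for item in itemset)
    return total


def top_k_itemsets(transactions, profit, k):
    """Brute-force top-k: score every itemset and keep the k with highest utility."""
    items = sorted({item for _, quantities in transactions for item in quantities})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility > 0:
                scored.append((utility, itemset))
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    for utility, itemset in top_k_itemsets(transactions, profit, k=3):
        print(itemset, utility)
```

On this toy input the three highest-utility itemsets are ('a', 'b') with utility 22, ('a',) with 19, and ('a', 'c') with 13. The enumeration is exponential in the number of items, which is the kind of cost that motivates pruning and distributing the search space in practice.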
- Subjects :
- Big data processing
Profit (accounting)
Computer Networks and Communications
Computer science
Big data
Load distribution
Space (commercial competition)
Data mining algorithm
Spark (mathematics)
Data mining
Computer communication networks
Software
Details
- ISSN :
- 1573-7543 and 1386-7857
- Volume :
- 25
- Database :
- OpenAIRE
- Journal :
- Cluster Computing
- Accession number :
- edsair.doi...........ff6f80e7cb54da945940a40630244e5a