A Spark-based high utility itemset mining with multiple external utilities

Authors :
Dharavath Ramesh
Krishan Kumar Sethi
Munesh Chandra Trivedi
Source :
Cluster Computing. 25:889-909
Publication Year :
2021
Publisher :
Springer Science and Business Media LLC, 2021.

Abstract

High utility itemset (HUI) mining is a data mining technique for discovering profitable patterns. The utility of an item is computed from two measures, quantity and per-unit profit. Existing HUI mining algorithms assume a single external utility (per-unit profit) value for each item across the entire database. However, in many applications the per-unit profit of an item fluctuates over time. This research introduces three novel strategies for incorporating multiple external utilities of items as input to the HUI mining algorithm. Traditional HUI mining algorithms were developed for standalone systems and are poorly suited to big data processing because of limited computing resources (CPU, memory). Big data are processed efficiently on distributed frameworks such as Apache Hadoop and Spark. This paper introduces a distributed HUI mining algorithm named the Spark-based top-k high utility itemset (k-SHUI) miner. We also propose a fair load distribution strategy that divides the search space equally among the cluster nodes. k-SHUI produces the top-k HUIs without requiring a minimum utility threshold. We conducted extensive experiments on six real-life datasets to compare the proposed algorithm's performance with that of existing algorithms. The experimental results demonstrate that the proposed algorithm outperforms the existing algorithms.
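
The utility definition in the abstract can be illustrated with a small single-machine sketch. This is not the k-SHUI algorithm and does not reproduce any of the paper's three strategies or its load distribution scheme. It is a brute-force illustration that assumes one simple way of handling multiple external utilities: each transaction carries a period label, and an item's per-unit profit is looked up for that period. The data, the period labels, and the names `itemset_utility` and `top_k_itemsets` are all hypothetical.

```python
from itertools import combinations
import heapq

# Hypothetical toy database: each transaction records a period and the
# purchased quantity of each item.
transactions = [
    {"period": "Q1", "items": {"a": 2, "b": 1}},
    {"period": "Q1", "items": {"a": 1, "c": 3}},
    {"period": "Q2", "items": {"a": 1, "b": 2, "c": 1}},
]

# Hypothetical per-unit profits (external utilities), one table per period.
profit = {
    "Q1": {"a": 5, "b": 2, "c": 1},
    "Q2": {"a": 4, "b": 3, "c": 2},
}


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * per-unit profit over transactions containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t["items"] for item in itemset):
            period_profit = profit[t["period"]]
            total += sum(t["items"][item] * period_profit[item] for item in itemset)
    return total


def top_k_itemsets(transactions, profit, k):
    """Enumerate every itemset (exponential, toy data only) and keep the k best."""
    items = sorted({item for t in transactions for item in t["items"]})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility > 0:
                scored.append((utility, itemset))
    # No minimum utility threshold is needed: only the k highest are returned.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    for utility, itemset in top_k_itemsets(transactions, profit, 3):
        print(itemset, utility)
```

On this toy data the itemset {a, b} has utility 22 (12 from the first transaction at Q1 prices plus 10 from the third at Q2 prices), which makes it the top-ranked itemset. Exhaustive enumeration is exponential in the number of items, which is the cost that pruning and distributed execution are meant to avoid.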

Details

ISSN :
1573-7543 and 1386-7857
Volume :
25
Database :
OpenAIRE
Journal :
Cluster Computing
Accession number :
edsair.doi...........ff6f80e7cb54da945940a40630244e5a