A pure array structure and parallel strategy for high-utility sequential pattern mining.

Authors :: Le, Bac
Huynh, Ut
Dinh, Duy-Tai
Source :: Expert Systems with Applications. Aug 2018, Vol. 104, pp. 107-120. 14 p.
Publication Year :: 2018
Abstract: High-utility sequential pattern mining (HUSPM) is the task of discovering all sequential patterns in a sequence database whose utility values are equal to or greater than a given minimum utility threshold. HUSPM has become increasingly important in many real-world data mining applications, such as market basket data analysis, weblog mining, and bio-medical gene data analysis, which consider co-occurrence values, quantity, utility (e.g., profit or cost), and time. Current approaches in the literature for HUSPM use a utility matrix to store the sequence database in memory. Unfortunately, the utility matrix consumes a large amount of main memory. To address this issue, we introduce a pure array structure that reduces memory consumption compared to the utility matrix. In addition, HUSPM faces the challenge of applying the downward closure property (DCP) to prune the search space. Recent HUSPM algorithms use an upper bound on utility values to serve as the DCP. However, this bound is usually higher than the actual utility of patterns, so these algorithms may generate many candidate patterns. The large search space leads to poor performance due to excessive runtime and memory usage. One reason is that the number of projected database scans required to calculate actual utilities is proportional to the number of candidate patterns. In this paper, we present a novel pruning strategy that efficiently prunes non-HUSPs and significantly reduces the search space compared to the state-of-the-art HUS-Span algorithm. Moreover, we propose a parallel strategy to speed up the mining process. We then propose two algorithms: the pure Array structure for High-utility Sequential pattern mining (AHUS) and AHUS parallel mining (AHUS-P). The AHUS-P algorithm uses shared memory to parallelize the mining process, identifying HUSPs concurrently by exploiting multi-core processor architectures. The experimental results show that AHUS and AHUS-P can efficiently and effectively discover all HUSPs. Both proposed algorithms outperform the state-of-the-art HUS-Span algorithm in terms of runtime, memory usage, and scalability on all experimental datasets. [ABSTRACT FROM AUTHOR]
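
Note: the task definition in the abstract (keep every pattern whose utility reaches a minimum threshold) can be illustrated with a minimal brute-force sketch. This is not the AHUS or AHUS-P algorithm and does not use the pure array structure or the pruning strategy from the paper. It assumes the definition commonly used in the HUSPM literature, where the utility of a pattern in one sequence is the maximum over all of its occurrences and the utility in the database is the sum over sequences. The data layout, the function names, and the toy values below are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical layout: a q-sequence is a list of itemsets, each mapping item -> quantity.
QItemset = Dict[str, int]
QSequence = List[QItemset]
Pattern = List[FrozenSet[str]]


def pattern_utility_in_sequence(
    pattern: Pattern, qseq: QSequence, profit: Dict[str, int]
) -> Optional[int]:
    """Return the maximum utility over all occurrences, or None if there is none."""

    def best(p_idx: int, s_idx: int) -> Optional[int]:
        if p_idx == len(pattern):
            return 0  # every pattern itemset has been matched
        result: Optional[int] = None
        for pos in range(s_idx, len(qseq)):
            itemset = qseq[pos]
            if all(item in itemset for item in pattern[p_idx]):
                rest = best(p_idx + 1, pos + 1)
                if rest is None:
                    continue  # the remaining itemsets cannot be matched after pos
                here = sum(itemset[i] * profit[i] for i in pattern[p_idx])
                if result is None or here + rest > result:
                    result = here + rest
        return result

    return best(0, 0)


def pattern_utility(
    pattern: Pattern, database: List[QSequence], profit: Dict[str, int]
) -> int:
    """Sum the per-sequence utilities over sequences that contain the pattern."""
    utilities = (pattern_utility_in_sequence(pattern, s, profit) for s in database)
    return sum(u for u in utilities if u is not None)


def is_high_utility(
    pattern: Pattern, database: List[QSequence], profit: Dict[str, int], min_util: int
) -> bool:
    """Threshold test from the abstract: utility >= minimum utility threshold."""
    return pattern_utility(pattern, database, profit) >= min_util


# Toy data (assumed for illustration only).
profit = {"a": 2, "b": 5, "c": 1}
database = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"a": 2}, {"b": 1, "c": 2}],
]
a_then_c = [frozenset({"a"}), frozenset({"c"})]
print(pattern_utility(a_then_c, database, profit))
print(is_high_utility(a_then_c, database, profit, min_util=10))
```

The exhaustive search over occurrences is exponential in the worst case, which is the cost that upper-bound pruning and compact storage structures aim to avoid.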

Subjects :: *DATA mining
*MATHEMATICAL sequences
*DATA analysis
*MATHEMATICAL bounds
*COMPUTER algorithms
*PATTERN recognition systems

Details

Language :: English
ISSN :: 09574174
Volume :: 104
Database :: Academic Search Index
Journal :: Expert Systems with Applications
Publication Type :: Academic Journal
Accession number :: 128984262
Full Text :: https://doi.org/10.1016/j.eswa.2018.03.019
