1. Towards efficiently mining closed high utility itemsets from incremental databases
- Author
-
Kjetil Nørvåg, Quang-Huy Duong, Thu-Lan Dam, and Heri Ramampiaro
- Subjects
Structure (mathematical logic) ,Information Systems and Management ,Closed set ,Database ,Computer science ,Hash function ,02 engineering and technology ,Construct (python library) ,computer.software_genre ,Management Information Systems ,Set (abstract data type) ,Artificial Intelligence ,020204 information systems ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Batch processing ,020201 artificial intelligence & image processing ,Pruning (decision trees) ,computer ,Software - Abstract
The set of closed high-utility itemsets (CHUIs) concisely represents the exact utility of all itemsets. Yet, it can be several orders of magnitude smaller than the set of all high-utility itemsets. Existing CHUI mining algorithms assume that databases are static, making them very expensive in the case of incremental data, since the whole dataset has to be processed for each batch of new transactions. To address this challenge, this paper presents the first approach, called IncCHUI, that mines CHUIs efficiently from incremental databases. In order to achieve this, we propose an incremental utility-list structure, which is built and updated with only one database scan. Further, we apply effective pruning strategies to fast construct incremental utility-lists and eliminate candidates that are not updated. Finally, we suggest an efficient hash-based approach to update or insert new closed sets that are found. Our extensive experimental evaluation on both real-life and synthetic databases shows the efficiency, as well as the feasibility of our approach. It significantly outperforms previously proposed methods that are mainly run in batch mode in terms of speed, and it is scalable with respect to the number of transactions.
- Published
- 2019
- Full Text
- View/download PDF