1. An Improved FP-Growth Algorithm Based on Database Partitioning for Big Data.
- Author
- 张乐, 魏昕怡, 徐苏, and 林两位
- Subjects
- *DATABASES, *BIG data, *PROBLEM solving, *ALGORITHMS, *TREES
- Abstract
This paper presents an improved FP-Growth frequent itemset mining algorithm for big data, built on the Hadoop framework and the MapReduce programming model. First, a projection database is extracted from the transaction database for each frequent 1-item, and the projection databases are distributed across several nodes. Each node then divides its projection database into several smaller sub-databases and uses the improved algorithm to mine partial frequent itemsets in parallel. Finally, the complete set of frequent itemsets is obtained by merging all the partial frequent itemsets. Unlike the traditional FP-Growth algorithm and some other improved algorithms, this algorithm does not need to build one huge FP-tree for the whole transaction database, so it avoids the failures those algorithms suffer when a single machine lacks the memory to store such a tree. In addition, because the sub-databases are close in size, the load distributed to each node is more balanced, which makes the algorithm more efficient. [ABSTRACT FROM AUTHOR]
- Published
- 2022
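
The project-mine-merge flow described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation. It assumes that the projection database of an item keeps, for every transaction containing that item, only the frequent items ranked ahead of it in descending-support order, and it mines each projection by recursive projection in place of the paper's improved algorithm, whose details the abstract does not give. No Hadoop or MapReduce code is involved, and the function names (`frequent_items`, `project`, `local_mine`, `mine_partitioned`) are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence

Transaction = Sequence[str]
Itemsets = Dict[FrozenSet[str], int]


def frequent_items(db: List[Transaction], min_support: int) -> Dict[str, int]:
    """Count each item once per transaction and keep the frequent ones."""
    counts = Counter(item for t in db for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def rank_items(freq: Dict[str, int]) -> Dict[str, int]:
    """Rank items by descending support, breaking ties by name."""
    order = sorted(freq, key=lambda i: (-freq[i], i))
    return {item: pos for pos, item in enumerate(order)}


def project(db: List[Transaction], item: str, rank: Dict[str, int]) -> List[List[str]]:
    """Assumed projection: for each transaction containing `item`,
    keep only the frequent items ranked ahead of `item`."""
    projected = []
    for t in db:
        if item in t:
            prefix = [x for x in set(t) if x in rank and rank[x] < rank[item]]
            if prefix:
                projected.append(prefix)
    return projected


def local_mine(db: List[Transaction], min_support: int, suffix: FrozenSet[str]) -> Itemsets:
    """Mine one projection by recursive projection (a stand-in for the
    paper's improved algorithm, which the abstract does not specify)."""
    result: Itemsets = {}
    freq = frequent_items(db, min_support)
    rank = rank_items(freq)
    for item in rank:
        itemset = suffix | {item}
        result[itemset] = freq[item]
        result.update(local_mine(project(db, item, rank), min_support, itemset))
    return result


def mine_partitioned(db: List[Transaction], min_support: int) -> Itemsets:
    """Build one projection per frequent item, mine each independently,
    then merge the partial results."""
    freq = frequent_items(db, min_support)
    rank = rank_items(freq)
    partial_results = []
    for item in rank:
        # In a distributed setting, each projection would be a separate task.
        suffix = frozenset([item])
        partial = {suffix: freq[item]}
        partial.update(local_mine(project(db, item, rank), min_support, suffix))
        partial_results.append(partial)
    merged: Itemsets = {}
    for partial in partial_results:
        merged.update(partial)  # partial results never share an itemset
    return merged


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
    for itemset, support in sorted(mine_partitioned(transactions, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

Because each projection is mined without reference to the others, no step needs a structure covering the whole transaction database, which is the property the abstract emphasizes. The balancing of sub-database sizes mentioned in the abstract is not modeled here.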