1. Research on a Data Reuse Strategy Based on a Parallel Processing Mechanism.
- Author
- 魏玲 and 郭新朋
- Abstract
To address frequent data redundancy and inefficient data reuse, this paper combines a column-storage mechanism with parallel processing to optimize the data reuse strategy. It builds a MapReduce-based parallel processing model for data reuse and combines the improved pattern matching algorithm CSM with a data screening algorithm to propose a parallel data reuse algorithm. The algorithm uses pattern matching to determine the correspondence between attribute columns and a data detection method to verify whether those attribute columns can feasibly be reused, thereby filtering the data columns and realizing the parallel data reuse strategy. In a big data setting, experiments were run on data tables from the large-scale data warehouse data sets SSB and TPCH. Storage usage and processing time decreased by 17% and 35%, respectively, which indicates that the parallel data reuse strategy outperforms the general strategy in both data storage and data processing time. [ABSTRACT FROM AUTHOR]
- Published
- 2017