HPPQ: A Parallel Package Queries Processing Approach for Large-Scale Data

Authors :: Meihui Shi
Derong Shen
Tiezheng Nie
Yue Kou
Ge Yu
Source :: Big Data Mining and Analytics, Vol 1, Iss 2, Pp 146-159 (2018)
Publication Year :: 2018
Publisher :: Tsinghua University Press, 2018.
Abstract: Many scholars have worked on effective techniques for package queries, and many strong approaches have been proposed. Unfortunately, most existing methods target small volumes of data, and as data volume grows rapidly, traditional package query methods struggle to meet the increasing requirements. To solve this problem, this paper proposes a novel optimization method for package queries (HPPQ). First, the data is preprocessed into regions: preprocessing segments the dataset into multiple subsets, and the centroid of each subset is used for package queries, which effectively reduces the volume of candidate results. Furthermore, an efficient heuristic algorithm (namely IPOL-HS) is proposed based on the preprocessing results; it improves both the quality of the candidate results in the iterative stage and the convergence rate of the heuristic algorithm. Finally, a strategy called HPR is proposed, which relies on a greedy algorithm and parallel processing to accelerate queries. The experimental results show that our method can significantly reduce time consumption compared with existing methods.
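
The preprocessing idea in the abstract (segment the dataset into subsets, then let each subset's centroid stand in for its members) can be illustrated with a minimal Python sketch. The abstract does not say how the regions are formed, so the uniform grid used here is an assumption, and the names `partition_into_regions`, `centroid`, `region_centroids`, and `cell_size` are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def partition_into_regions(rows, cell_size):
    """Group numeric tuples by grid cell (assumed scheme, not the paper's)."""
    regions = defaultdict(list)
    for row in rows:
        # The cell key is the integer grid coordinate of each attribute.
        key = tuple(int(v // cell_size) for v in row)
        regions[key].append(row)
    return dict(regions)


def centroid(rows):
    """Return the attribute-wise mean of a non-empty list of tuples."""
    n = len(rows)
    dims = len(rows[0])
    return tuple(sum(r[d] for r in rows) / n for d in range(dims))


def region_centroids(rows, cell_size):
    """Map each region key to the centroid that represents its members."""
    regions = partition_into_regions(rows, cell_size)
    return {key: centroid(members) for key, members in regions.items()}


# Five tuples collapse to three representatives, shrinking the candidate set.
rows = [(1.0, 2.0), (2.0, 3.0), (11.0, 12.0), (13.0, 14.0), (25.0, 1.0)]
centroids = region_centroids(rows, cell_size=10.0)
```

A package query would then search over the centroids rather than over every tuple, which is where the reduction in candidate volume comes from. How the paper maps chosen centroids back to concrete tuples is not stated in the abstract and is left out of this sketch.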