1. The Hilbert PDC-tree
- Author
-
Andrew Rau-Chaplin, Neil Burke, David Robillard, and Frank Dehne
- Subjects
020203 distributed computing ,Computer science ,Search engine indexing ,Structure (category theory) ,0102 computer and information sciences ,02 engineering and technology ,Data structure ,computer.software_genre ,01 natural sciences ,Tree (data structure) ,010201 computation theory & mathematics ,Component (UML) ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,Data ingestion ,Data mining ,Minimum bounding rectangle ,Algorithm ,computer - Abstract
Fast aggregation of data with many dimensions is a key component of many applications. The R-tree is the traditional data structure for indexing multi-dimensional data, but even the best R-tree variants suffer from performance degradation as the number of dimensions increases. The DC-tree addressed this issue by replacing Minimum Bounding Rectangle (MBR) keys with Minimum Describing Subsets (MDSs), which are less susceptible to overlap. This technique dramatically improves query performance with many dimensions, but at the cost of reduced insertion performance. Like most R-tree variants, this insertion overhead comes from expensive geometric comparisons while selecting the best child for insertion, or splitting over-full nodes. DC-trees, including the parallel PDC-tree, suffer even more from this overhead since MDSs are typically much more expensive to compare and manipulate than MBRs. This paper introduces the Hilbert PDC-tree, a parallel index structure for many-dimensional data that supports high-velocity data ingestion. This is achieved by avoiding geometric comparisons during insertion by instead inserting records based on the Hilbert index of their keys. This approach is similar to that of the Hilbert R-tree, but with special considerations for efficiently supporting many hierarchical dimensions. Additionally, a new node splitting algorithm significantly reduces overlap and improves query performance. Experiments show that the Hilbert PDC-tree scales well to a high number of dimensions, while supporting a much higher rate of ingestion and better query performance than the PDC-tree.
- Published
- 2016
- Full Text
- View/download PDF