Back to Search Start Over

Approximation and streaming algorithms for histogram construction problems

Authors :
Guha, Sudipto
Koudas, Nick
Shim, Kyuseok
Source :
ACM Transactions on Database Systems. March, 2006, Vol. 31 Issue 1, p396, 43 p.
Publication Year :
2006

Abstract

Histograms and related synopsis structures are popular techniques for approximating data distributions. These have been successful in query optimization and a variety of applications, including approximate querying, similarity searching, and data mining, to name a few. Histograms were a few of the earliest synopsis structures proposed and continue to be used widely. The histogram construction problem is to construct the best histogram restricted to a space bound that reflects the data distribution most accurately under a given error measure. The histograms are used as quick and easy estimates. Thus, a slight loss of accuracy, compared to the optimal histogram under the given error measure, can be offset by fast histogram construction algorithms. A natural question arises in this context: Can we find a fast near optimal approximation algorithm for the histogram construction problem? In this article, we give the first linear time (1 + [member of])-factor approximation algorithms (for any [member of] > 0) for a large number of histogram construction problems including the use of piecewise small degree polynomials to approximate data, workloads, etc. Several of our algorithms extend to data streams. Using synthetic and real-life data sets, we demonstrate that in many scenarios the approximate histograms are almost identical to optimal histograms in quality and are significantly faster to construct. Categories and Subject Descriptors: F.2 [Theory of Computation]: Analysis of Algorithms; G.1.2 [Numerical Analysis I: Approximation; H.2 [Information Systems I: Database Management General Terms: Algorithms, Performance, Theory Additional Key Words and Phrases: Data Streams, histograms, approximation algorithm

Details

Language :
English
ISSN :
03625915
Volume :
31
Issue :
1
Database :
Gale General OneFile
Journal :
ACM Transactions on Database Systems
Publication Type :
Academic Journal
Accession number :
edsgcl.146838498