Optimized Stratified Sampling for Approximate Query Processing.

Authors :
Chaudhuri, Surajit
Das, Gautam
Narasayya, Vivek
Source :
ACM Transactions on Database Systems. Jun 2007, Vol. 32 Issue 2, p1-50. 50p. 1 Illustration, 3 Diagrams, 3 Charts, 20 Graphs.
Publication Year :
2007

Abstract

The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we treat the problem as an optimization problem where, given a workload of queries, we select a stratified random sample of the original data such that the error in answering the workload queries using the sample is minimized. A key novelty of our approach is that we can tailor the choice of samples to be robust, even for workloads that are "similar" but not necessarily identical to the given workload. Finally, our techniques recognize the importance of taking into account the variance in the data distribution in a principled manner. We show how our solution can be implemented on a database system, and present results of extensive experiments on Microsoft SQL Server that demonstrate the superior quality of our method compared to previous work. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0362-5915
Volume :
32
Issue :
2
Database :
Academic Search Index
Journal :
ACM Transactions on Database Systems
Publication Type :
Academic Journal
Accession number :
25690148
Full Text :
https://doi.org/10.1145/1242524.1242526