Numerically stable, single-pass, parallel statistics algorithms

Authors :
Janine C. Bennett
Philippe Pierre Pebay
David Thompson
Diana C. Roe
Ray Grout
Source :
CLUSTER
Publication Year :
2009
Publisher :
IEEE, 2009.

Abstract

Statistical analysis is used throughout scientific computing to analyze and infer meaning from data. A key challenge for any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. In this paper we derive a series of formulas that allow single-pass, yet numerically robust, pairwise parallel and incremental updates of arbitrary-order centered statistical moments and co-moments. Using these formulas, we have built an open source parallel statistics framework that performs principal component analysis (PCA) and computes descriptive, correlative, and multi-correlative statistics. A scalability study demonstrates numerically stable, near-optimal scalability on up to 128 processes, and we present results in which the framework processes large-scale turbulent combustion simulation data with 1500 processes.
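
To illustrate what "incremental" and "pairwise parallel" updates mean, here is a minimal Python sketch restricted to the second-order case (mean and variance). It uses the widely known Welford and Chan-style update rules and is not taken from the paper, whose formulas cover arbitrary-order moments and co-moments. The names `Moments`, `update`, `combine`, and `summarize` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Running summary of a data stream: count, mean, and M2.

    M2 is the sum of squared deviations from the current mean.
    (Illustrative type, not part of any published framework.)
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        # Incremental (one value at a time) update, Welford style.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Unbiased sample variance; defined as 0.0 for fewer than 2 values.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def combine(a: Moments, b: Moments) -> Moments:
    # Pairwise update: merge two partial summaries without revisiting data.
    n = a.n + b.n
    if n == 0:
        return Moments()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def summarize(values) -> Moments:
    # Single pass over one partition of the data.
    acc = Moments()
    for v in values:
        acc.update(v)
    return acc


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    # Pretend each slice lives on a different process.
    merged = combine(summarize(data[:3]), summarize(data[3:]))
    print(merged.n, merged.mean, merged.variance())
```

In this sketch each process would summarize its own partition in one pass, and the partial summaries would then be merged pairwise, for example in a reduction tree. Working with deviations from the running mean avoids the cancellation that affects the naive sum-of-squares formula.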

Details

Database :
OpenAIRE
Journal :
2009 IEEE International Conference on Cluster Computing and Workshops
Accession number :
edsair.doi...........089c2b1f3dc0fccafb3cfb3033ad8671
Full Text :
https://doi.org/10.1109/clustr.2009.5289161