Algorithms for Efficient Reproducible Floating Point Summation.

Authors :: AHRENS, WILLOW
DEMMEL, JAMES
NGUYEN, HONG DIEP
Source :: ACM Transactions on Mathematical Software. Jul 2020, Vol. 46, Issue 3, pp. 1-49 (49 pages).
Publication Year :: 2020
Abstract: We define “reproducibility” as getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should not affect the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociative floating point addition, makes reproducibility challenging even for summation, or operations like the BLAS. We describe a “reproducible accumulator” data structure (the “binned number”) and associated algorithms to reproducibly sum binary floating point numbers, independent of summation order. We use a subset of the IEEE Floating Point Standard 754-2008 and bitwise operations on the standard representations in memory. Our approach requires only one read-only pass over the data, and one reduction in parallel, using a 6-word reproducible accumulator (more words can be used for higher accuracy), enabling standard tiling optimization techniques. Summing n words with a 6-word reproducible accumulator requires approximately 9n floating point operations (arithmetic, comparison, and absolute value) and approximately 3n bitwise operations. The final error bound with a 6-word reproducible accumulator and our default settings can be up to 2^29 times smaller than the error bound for conventional (recursive) summation on ill-conditioned double-precision inputs. [ABSTRACT FROM AUTHOR]
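
Illustrative note (not part of the author abstract): the Python sketch below shows why nonassociative addition makes the result depend on summation order, and how rounding every input to a common grid before adding can remove that dependence. It is a deliberately simplified, single-bin, two-pass pre-rounding scheme and is not the paper's binned-number algorithm, which needs only one pass and uses several bins for accuracy. The function names naive_sum and prerounded_sum are hypothetical, and the sketch assumes finite IEEE double inputs, round-to-nearest arithmetic, and no overflow.

```python
import math
import random


def naive_sum(xs):
    """Conventional (recursive) summation: the result depends on the order."""
    total = 0.0
    for x in xs:
        total += x
    return total


def prerounded_sum(xs):
    """Order-independent sum via single-bin pre-rounding (illustrative only).

    Assumes finite doubles and no overflow. Every input is rounded to a
    multiple of one common unit u, chosen coarse enough that all partial
    sums of the rounded values are exactly representable. Exact additions
    are associative, so any summation order gives the same bits.
    """
    xs = list(xs)
    n = len(xs)
    if n == 0:
        return 0.0
    m = max(abs(x) for x in xs)           # first pass: largest magnitude
    if m == 0.0:
        return 0.0
    _, exp = math.frexp(m)                # guarantees m < 2**exp
    # Leave headroom for n additions: ceil(log2(n)) + 2 extra bits.
    e = exp + 2 + (n - 1).bit_length()
    big = math.ldexp(1.5, e)              # 1.5 * 2**e, so ulp(big) = 2**(e - 52)
    total = 0.0
    for x in xs:                          # second pass
        # (big + x) - big rounds x to the nearest multiple of ulp(big).
        total += (big + x) - big
    return total


if __name__ == "__main__":
    # Floating point addition is not associative.
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

    rng = random.Random(0)
    data = [rng.uniform(-1e6, 1e6) for _ in range(1000)]
    shuffled = data[:]
    rng.shuffle(shuffled)
    print(naive_sum(data), naive_sum(shuffled))
    print(prerounded_sum(data), prerounded_sum(shuffled))
```

The trade-off in this sketch is accuracy: each input loses the bits below the common unit, so the error can reach about n/2 units, which may be worse than conventional summation. Keeping several bins at different magnitudes, as the abstract's 6-word accumulator suggests, is what addresses that loss in the paper.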