A distributed multiple sample testing for massive data.

Authors :
Xiaoyue, Xie
Shi, Jian
Song, Kai
Source :
Journal of Applied Statistics; Mar2023, Vol. 50 Issue 3, p555-573, 19p, 9 Charts, 3 Graphs
Publication Year :
2023

Abstract

When the data are stored in a distributed manner, direct application of traditional hypothesis testing procedures is often prohibitive due to communication costs and privacy concerns. This paper mainly develops and investigates a distributed two-node Kolmogorov–Smirnov hypothesis testing scheme, implemented by the divide-and-conquer strategy. In addition, this paper also provides a distributed fraud detection and a distribution-based classification for multi-node machines based on the proposed hypothesis testing scheme. The distributed fraud detection is to detect which node stores fraud data in multi-node machines and the distribution-based classification is to determine whether the multi-node distributions differ and classify different distributions. These methods can improve the accuracy of statistical inference in a distributed storage architecture. Furthermore, this paper verifies the feasibility of the proposed methods by simulation and real example studies. [ABSTRACT FROM AUTHOR]
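
The abstract does not spell out how the two-node test is computed. As an illustration only, the sketch below shows one way a two-sample Kolmogorov–Smirnov comparison could run without moving raw data: each node reports cumulative counts on a shared grid, and a coordinator compares the resulting empirical distribution functions. The shared grid, the function names (node_summary, ks_from_summaries, asymptotic_p_value) and the use of the classical asymptotic p-value are all assumptions made for this sketch; it is not the authors' scheme.

```python
import bisect
import math


def node_summary(sample, edges):
    """Runs on one node: returns the sample size and, for each shared
    grid edge, the number of local values <= that edge.
    Only these counts leave the node, never the raw values."""
    ordered = sorted(sample)
    cum_counts = [bisect.bisect_right(ordered, e) for e in edges]
    return len(ordered), cum_counts


def ks_from_summaries(summary_a, summary_b):
    """Runs on the coordinator: largest gap between the two empirical
    CDFs, evaluated at the shared grid edges only. This is a lower
    bound on the exact KS statistic, so the result is approximate."""
    n, cum_a = summary_a
    m, cum_b = summary_b
    return max(abs(a / n - b / m) for a, b in zip(cum_a, cum_b))


def asymptotic_p_value(d, n, m, terms=100):
    """Classical large-sample p-value for the two-sample KS statistic,
    using the alternating Kolmogorov series."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is ~1 anyway.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    # Hypothetical data: node B holds a shifted copy of node A's values.
    node_a = [i / 1000 for i in range(1000)]
    node_b = [x + 0.5 for x in node_a]
    edges = [i * 0.05 for i in range(31)]  # shared grid on [0, 1.5]

    summary_a = node_summary(node_a, edges)
    summary_b = node_summary(node_b, edges)
    d = ks_from_summaries(summary_a, summary_b)
    p = asymptotic_p_value(d, summary_a[0], summary_b[0])
    print(f"approximate KS statistic: {d:.3f}, p-value: {p:.3g}")
```

A finer grid brings the approximate statistic closer to the exact one at the price of more communication; that trade-off is a property of this sketch and is not a claim about the paper's procedure.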

Details

Language :
English
ISSN :
02664763
Volume :
50
Issue :
3
Database :
Complementary Index
Journal :
Journal of Applied Statistics
Publication Type :
Academic Journal
Accession number :
161832219
Full Text :
https://doi.org/10.1080/02664763.2021.1911967