A distributed multiple sample testing for massive data.
- Source :
- Journal of Applied Statistics; Mar 2023, Vol. 50 Issue 3, p555-573, 19p, 9 Charts, 3 Graphs
- Publication Year :
- 2023
Abstract
- When the data are stored in a distributed manner, direct application of traditional hypothesis testing procedures is often prohibitive due to communication costs and privacy concerns. This paper mainly develops and investigates a distributed two-node Kolmogorov–Smirnov hypothesis testing scheme, implemented by the divide-and-conquer strategy. In addition, this paper also provides a distributed fraud detection and a distribution-based classification for multi-node machines based on the proposed hypothesis testing scheme. The distributed fraud detection is to detect which node stores fraud data in multi-node machines and the distribution-based classification is to determine whether the multi-node distributions differ and classify different distributions. These methods can improve the accuracy of statistical inference in a distributed storage architecture. Furthermore, this paper verifies the feasibility of the proposed methods by simulation and real example studies. [ABSTRACT FROM AUTHOR]
- Subjects :
- FRAUD investigation
STATISTICAL accuracy
FRAUD
- Language :
- English
- ISSN :
- 0266-4763
- Volume :
- 50
- Issue :
- 3
- Database :
- Complementary Index
- Journal :
- Journal of Applied Statistics
- Publication Type :
- Academic Journal
- Accession number :
- 161832219
- Full Text :
- https://doi.org/10.1080/02664763.2021.1911967