Uncertainty Quantification of Data Shapley via Statistical Inference

Authors :
Wu, Mengmeng
Liu, Zhihong
Li, Xiang
Jia, Ruoxi
Chang, Xiangyu
Publication Year :
2024

Abstract

As data plays an increasingly pivotal role in decision-making, the emergence of data markets underscores the growing importance of data valuation. Within machine learning, Data Shapley stands out as a widely adopted method for data valuation. However, Data Shapley assumes a fixed dataset, whereas in real-world applications data constantly evolves and expands. This paper establishes the relationship between Data Shapley and infinite-order U-statistics and addresses this limitation by quantifying, from the perspective of U-statistics, the uncertainty of Data Shapley under changes in the data distribution. We perform statistical inference on data values to obtain confidence intervals for the estimates. We construct two algorithms to estimate this uncertainty and recommend the settings in which each is appropriate. We also conduct a series of experiments on various datasets to verify asymptotic normality and propose a practical trading scenario enabled by this method.
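The idea of attaching a confidence interval to a Data Shapley estimate can be illustrated with a minimal sketch. It uses the generic Monte Carlo estimator: draw a subset size uniformly, draw a subset of that size from the other points, and record the point's marginal contribution. The sample mean of these contributions then gets a normal-approximation (CLT) interval. This is an assumption-laden illustration, not the authors' U-statistic estimators, which the abstract does not specify. It captures only Monte Carlo sampling uncertainty over a fixed pool, not the distribution-level uncertainty the paper targets. The function name `shapley_ci` and the toy utility are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, Sequence


def shapley_ci(
    point: float,
    pool: Sequence[float],
    utility: Callable[[list], float],
    n_samples: int = 2000,
    z: float = 1.96,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Hypothetical helper: Monte Carlo Shapley estimate for `point` with a
    normal-approximation confidence interval (z = 1.96 gives roughly 95%).

    `pool` holds the *other* data points. Choosing the subset size k uniformly
    from {0, ..., len(pool)} and then a uniform k-subset reproduces the
    Shapley weighting over coalitions.
    """
    rng = random.Random(seed)
    contributions = []
    for _ in range(n_samples):
        k = rng.randint(0, len(pool))          # inclusive on both ends
        subset = rng.sample(list(pool), k)     # uniform subset of size k
        contributions.append(utility(subset + [point]) - utility(subset))
    mean = statistics.fmean(contributions)
    # Standard error of the sample mean of i.i.d. marginal contributions.
    se = statistics.stdev(contributions) / math.sqrt(n_samples)
    return mean, mean - z * se, mean + z * se


if __name__ == "__main__":
    # Hypothetical toy utility: number of distinct values in the subset.
    def coverage(subset: list) -> float:
        return float(len(set(subset)))

    est, lo, hi = shapley_ci(3.0, [1.0, 2.0, 3.0, 3.0], coverage)
    print(f"estimate={est:.3f}, CI=({lo:.3f}, {hi:.3f})")
```

In the toy run, the point `3.0` contributes only when no duplicate `3.0` is already in the subset, so its value lies between 0 and 1. The interval narrows at rate `1/sqrt(n_samples)`, which is the asymptotic-normality behavior such intervals rely on.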

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2407.19373
Document Type :
Working Paper