A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms
- Source :
- The Journal of Privacy and Confidentiality, Vol 8, Iss 1 (2018)
- Publication Year :
- 2018
- Publisher :
- Labor Dynamics Institute, 2018.
Abstract
- Differential privacy has emerged as a popular model to provably limit the privacy risks associated with a given data release. However, releasing high-dimensional synthetic data under differential privacy remains a challenging problem. In this paper, we study the problem of releasing synthetic data in the form of a high-dimensional histogram under the constraint of differential privacy. We develop an $(\epsilon, \delta)$-differentially private categorical data synthesizer called the Stability Based Hashed Gibbs Sampler (SBHG). SBHG combines a stability-based sparse histogram estimation algorithm with Gibbs sampling and feature selection to approximate the empirical joint distribution of a discrete dataset. SBHG offers a competitive alternative to state-of-the-art synthetic data generators while preserving the sparsity structure of the original dataset, which leads to improved statistical utility, as illustrated on simulated data. Finally, to study the utility of the synthetic datasets generated by SBHG, we perform logistic regression on them and compare the resulting classification accuracy with that obtained using the original dataset.
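
The stability-based sparse histogram step mentioned in the abstract can be illustrated with a minimal sketch. It follows one common textbook form of the technique: add Laplace noise only to cells that are non-empty in the data, then release a cell only if its noisy count exceeds a threshold that depends on $\epsilon$ and $\delta$. The noise scale $2/\epsilon$, the threshold $1 + 2\ln(2/\delta)/\epsilon$, and the names `laplace` and `stability_histogram` are assumptions chosen for illustration; the abstract does not state the paper's exact constants, and the hashing, Gibbs sampling, and feature selection parts of SBHG are not shown.

```python
import math
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def stability_histogram(records, epsilon, delta, rng=None):
    """Sketch of an (epsilon, delta)-DP sparse histogram (assumed constants).

    records: iterable of hashable cells, e.g. tuples of categorical values.
    Returns a dict mapping released cells to noisy counts.
    """
    rng = rng or random.Random()
    counts = Counter(records)  # only non-empty cells are ever visited
    scale = 2.0 / epsilon      # assumed Laplace scale
    threshold = 1.0 + 2.0 * math.log(2.0 / delta) / epsilon  # assumed cutoff
    released = {}
    for cell, count in counts.items():
        noisy = count + laplace(scale, rng)
        if noisy > threshold:  # suppress small, unstable cells
            released[cell] = noisy
    return released


if __name__ == "__main__":
    data = [("a", 0, 1)] * 10000 + [("b", 1, 0), ("c", 1, 1)]
    hist = stability_histogram(data, epsilon=1.0, delta=1e-6,
                               rng=random.Random(0))
    print(sorted(hist))
```

Because empty cells are never touched, the output stays as sparse as the input, which is the property the abstract highlights; the threshold is what keeps rarely occupied cells from revealing the presence of a single record.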
Details
- Language :
- English
- ISSN :
- 25758527
- Volume :
- 8
- Issue :
- 1
- Database :
- Directory of Open Access Journals
- Journal :
- The Journal of Privacy and Confidentiality
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.8b1c31f5e1994d14a0119e8634fd7acb
- Document Type :
- article
- Full Text :
- https://doi.org/10.29012/jpc.657