
A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms

Authors :
Bai Li
Vishesh Karwa
Aleksandra Slavković
Rebecca Carter Steorts
Source :
The Journal of Privacy and Confidentiality, Vol 8, Iss 1 (2018)
Publication Year :
2018
Publisher :
Labor Dynamics Institute, 2018.

Abstract

Differential privacy has emerged as a popular model to provably limit privacy risks associated with a given data release. However, releasing high-dimensional synthetic data under differential privacy remains a challenging problem. In this paper, we study the problem of releasing synthetic data in the form of a high-dimensional histogram under the constraint of differential privacy. We develop an $(\epsilon, \delta)$-differentially private categorical data synthesizer called Stability Based Hashed Gibbs Sampler (SBHG). SBHG works by combining a stability-based sparse histogram estimation algorithm with Gibbs sampling and feature selection to approximate the empirical joint distribution of a discrete dataset. SBHG offers a competitive alternative to state-of-the-art synthetic data generators while preserving the sparsity structure of the original dataset, which leads to improved statistical utility, as illustrated on simulated data. Finally, to study the utility of the synthetic datasets generated by SBHG, we also perform logistic regression on the synthetic datasets and compare the resulting classification accuracy with that obtained from the original dataset.
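
The stability-based sparse histogram step named in the abstract can be illustrated with the generic textbook construction: add Laplace noise only to cells that actually occur in the data, then suppress any cell whose noisy count falls below a threshold. The sketch below is not the authors' SBHG (it has no hashing, Gibbs sampling, or feature selection); the function name `stability_histogram`, its parameters, the noise scale 2/epsilon, and the threshold 1 + 2 ln(2/delta)/epsilon are assumptions taken from the standard stability-based histogram, not from this record.

```python
import math
import random
from collections import Counter


def stability_histogram(records, epsilon, delta, rng=None):
    """Generic stability-based histogram (illustrative, not the paper's SBHG).

    records: iterable of hashable cells, e.g. tuples of category labels.
    Returns a dict mapping released cells to noisy counts.
    """
    rng = rng or random.Random()
    scale = 2.0 / epsilon  # assumed Laplace scale for the noisy counts
    # Assumed suppression threshold of the standard construction.
    threshold = 1.0 + 2.0 * math.log(2.0 / delta) / epsilon

    released = {}
    # Only cells that occur in the data are visited, so empty cells
    # stay empty and the sparsity pattern is never filled in.
    for cell, count in Counter(records).items():
        # The difference of two exponentials with mean `scale`
        # is a Laplace variate with that scale.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy = count + noise
        if noisy >= threshold:
            released[cell] = noisy
    return released


if __name__ == "__main__":
    data = [("a", 0)] * 1000 + [("b", 1)]
    print(stability_histogram(data, epsilon=1.0, delta=1e-6, rng=random.Random(0)))
```

Because the loop never touches unobserved cells, the cost scales with the number of distinct records rather than with the full size of the high-dimensional domain, which is the property that makes this family of methods attractive for sparse histograms.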

Details

Language :
English
ISSN :
2575-8527
Volume :
8
Issue :
1
Database :
Directory of Open Access Journals
Journal :
The Journal of Privacy and Confidentiality
Publication Type :
Academic Journal
Accession number :
edsdoj.8b1c31f5e1994d14a0119e8634fd7acb
Document Type :
article
Full Text :
https://doi.org/10.29012/jpc.657