Back to Search Start Over

A Histogram Publishing Method under Differential Privacy That Involves Balancing Small-Bin Availability First

Authors :
Jianzhang Chen
Shuo Zhou
Jie Qiu
Yixin Xu
Bozhe Zeng
Wanchuan Fang
Xiangying Chen
Yipeng Huang
Zhengquan Xu
Youqin Chen
Source :
Algorithms, Vol 17, Iss 7, p 293 (2024)
Publication Year :
2024
Publisher :
MDPI AG, 2024.

Abstract

Differential privacy, a cornerstone of privacy-preserving techniques, plays an indispensable role in ensuring the secure handling and sharing of sensitive data analysis across domains such as in census, healthcare, and social networks. Histograms, serving as a visually compelling tool for presenting analytical outcomes, are widely employed in these sectors. Currently, numerous algorithms for publishing histograms under differential privacy have been developed, striving to balance privacy protection with the provision of useful data. Nonetheless, the pivotal challenge concerning the effective enhancement of precision for small bins (those intervals that are narrowly defined or contain a relatively small number of data points) within histograms has yet to receive adequate attention and in-depth investigation from experts. In standard DP histogram publishing, adding noise without regard for bin size can result in small data bins being disproportionately influenced by noise, potentially severely impairing the overall accuracy of the histogram. In response to this challenge, this paper introduces the SReB_GCA sanitization algorithm designed to enhance the accuracy of small bins in DP histograms. The SReB_GCA approach involves sorting the bins from smallest to largest and applying a greedy grouping strategy, with a predefined lower bound on the mean relative error required for a bin to be included in a group. Our theoretical analysis reveals that sorting bins in ascending order prior to grouping effectively prioritizes the accuracy of smaller bins. SReB_GCA ensures strict ϵ-DP compliance and strikes a careful balance between reconstruction error and noise error, thereby not only initially improving the accuracy of small bins but also approximately optimizing the mean relative error of the entire histogram. To validate the efficiency of our proposed SReB_GCA method, we conducted extensive experiments using four diverse datasets, including two real-life datasets and two synthetic ones. The experimental results, quantified by the Kullback–Leibler Divergence (KLD), show that the SReB_GCA algorithm achieves substantial performance enhancement compared to the baseline method (DP_BASE) and several other established approaches for differential privacy histogram publication.

Details

Language :
English
ISSN :
19994893
Volume :
17
Issue :
7
Database :
Directory of Open Access Journals
Journal :
Algorithms
Publication Type :
Academic Journal
Accession number :
edsdoj.590265189b6f4a2e85088c60a04d1a5d
Document Type :
article
Full Text :
https://doi.org/10.3390/a17070293