
Mining frequent items from high-dimensional set-valued data under local differential privacy protection.

Authors :
Wu, Haonan
Ran, Ruisheng
Peng, Shunshun
Yang, Mengmeng
Guo, Taolin
Source :
Expert Systems with Applications. Dec2023, Vol. 234, pN.PAG-N.PAG. 1p.
Publication Year :
2023

Abstract

Mining frequent items from the high-dimensional historical data (set-valued data) of a massive number of users extracts the most common items and plays a vital role in data mining. However, frequent item mining requires collecting a large amount of user data, which risks leaking user privacy. Local differential privacy is a mainstream privacy-preserving technique that has been widely used in various scenarios because it provides strict privacy protection. Mining frequent items from high-dimensional set-valued data under local differential privacy has recently attracted much attention from researchers. Existing works usually use random sampling to reduce the communication cost of publishing perturbed data, but they can hardly guarantee mining accuracy. This is because they sample very few items from each user's high-dimensional set-valued data (e.g., LDPMiner samples one item), which makes it difficult to narrow the scope of frequent item mining. Our approach is motivated by the observation that the frequent items mined from the data of a portion of the users (e.g., half of the users) are similar to those mined from the data of all users. Therefore, we randomly divide users into two groups and mine a set of candidate frequent items from the first group. Then, in the second group, we focus on the candidate set and mine frequent items from it. In addition, we observe that the larger the sample drawn from user data, the better the mining accuracy but the higher the communication cost. Therefore, we randomly group the contents and sample randomly from each group, improving mining accuracy by publishing more data than existing works. On this basis, we adaptively perturb the sampled group data according to the communication cost, trading off communication cost against mining accuracy. Finally, we analyze the privacy and utility of our method theoretically. Experiments against state-of-the-art methods such as FIML, SVIM, and LDPMiner show that our proposed method improves accuracy by about 15% and utility by about 10% when mining frequent items from high-dimensional set-valued data.

• Protection of user privacy in frequent item mining by local differential privacy.
• Grouping of users to achieve different mining tasks.
• Adaptive perturbation of user data according to the content dimension.
• Theoretical proof that the mechanism satisfies local differential privacy.

[ABSTRACT FROM AUTHOR]
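
The two-group strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' mechanism: the abstract does not specify the perturbation primitive, so the sketch substitutes generalized randomized response (k-ary randomized response), and each user reports a single sampled item rather than using the paper's grouped sampling and adaptive perturbation. All function names, the integer item encoding, the candidate-set size of `2 * top_k`, and the dummy report for users holding no candidate are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def grr_perturb(index, k, epsilon, rng):
    """Generalized randomized response over k symbols (assumed primitive).

    Keeps the true index with probability p = e^eps / (e^eps + k - 1),
    otherwise reports one of the other k - 1 indices uniformly at random.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return index
    other = rng.randrange(k - 1)
    return other if other < index else other + 1


def grr_estimate(reports, k, epsilon):
    """Unbiased frequency estimates from perturbed reports."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    counts = Counter(reports)
    return [(counts.get(i, 0) / n - q) / (p - q) for i in range(k)]


def mine_frequent_items(user_sets, domain_size, top_k, epsilon, seed=0):
    """Two-phase mining sketch; items are integers in range(domain_size).

    Assumes every user holds at least one item.
    """
    rng = random.Random(seed)
    users = list(user_sets)
    rng.shuffle(users)
    half = len(users) // 2
    group_a, group_b = users[:half], users[half:]

    # Phase 1: the first group reports over the full domain; the items with
    # the highest estimates form the candidate set (size 2 * top_k is an
    # arbitrary illustrative choice).
    reports_a = [
        grr_perturb(rng.choice(sorted(s)), domain_size, epsilon, rng)
        for s in group_a
    ]
    est_a = grr_estimate(reports_a, domain_size, epsilon)
    order = sorted(range(domain_size), key=lambda i: est_a[i], reverse=True)
    candidates = order[: 2 * top_k]

    # Phase 2: the second group reports only over the candidates, plus one
    # dummy symbol for users who hold none of them.
    cand_index = {item: j for j, item in enumerate(candidates)}
    dummy = len(candidates)
    reports_b = []
    for s in group_b:
        held = sorted(cand_index[i] for i in s if i in cand_index)
        value = rng.choice(held) if held else dummy
        reports_b.append(grr_perturb(value, dummy + 1, epsilon, rng))
    est_b = grr_estimate(reports_b, dummy + 1, epsilon)
    ranked = sorted(range(dummy), key=lambda j: est_b[j], reverse=True)
    return [candidates[j] for j in ranked[:top_k]]
```

Because the second group reports over a much smaller domain (the candidates plus one dummy symbol), the probability of reporting the true value is higher for the same privacy budget, which is the intuition behind narrowing the mining scope. Each user sends exactly one perturbed report, so each report satisfies epsilon-local differential privacy under the substituted primitive.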

Details

Language :
English
ISSN :
09574174
Volume :
234
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
172777057
Full Text :
https://doi.org/10.1016/j.eswa.2023.121105