Estimation of cost of k–anonymity in the number of dummy records.

Authors :: Ito, Satoshi; Kikuchi, Hiroaki
Source :: Journal of Ambient Intelligence & Humanized Computing; Dec2023, Vol. 14 Issue 12, p15885-15894, 10p
Publication Year :: 2023
Abstract: De-identification processes personally identifying information so that individuals cannot be identified from the original transaction data. k-anonymization, which processes data so that at least k users share the same record, is a representative de-identification method. One way to achieve k-anonymity is to add dummy records to the data to protect users who have unique histories. For this method, the cost of k-anonymization is the difference in the number of records between the original data and the processed data, and it can be calculated only after choosing the parameter k and processing the data. However, processing big data for many values of k is very costly, so we want to calculate the cost before processing and find the optimal value of k. In this paper, we propose a new model of transaction data that gives a probability distribution and an expected value for the values in the data, under the assumption that all values occur independently with uniform probability. Applying our data model makes it possible to evaluate the cost of k-anonymized data before any processing.
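
The cost definition in the abstract can be illustrated with a minimal sketch. This is not the authors' model, which the abstract does not spell out. It assumes a simple padding rule: every distinct record that appears fewer than k times is padded with dummy copies until it appears exactly k times. The function names `dummy_cost` and `expected_dummy_cost`, and the parameters `n` (number of records) and `m` (number of possible record values), are hypothetical. The second function relies on the abstract's stated assumption that values occur independently with uniform probability, so the count of each value is treated as Binomial(n, 1/m).

```python
from collections import Counter
from math import comb


def dummy_cost(records, k):
    """Count the dummy records needed so every distinct record occurs >= k times.

    Assumed padding rule (not taken from the paper): a record seen c < k
    times receives k - c dummy copies.
    """
    counts = Counter(records)
    return sum(k - c for c in counts.values() if c < k)


def expected_dummy_cost(n, m, k):
    """Expected dummy count before processing, under the uniform assumption.

    Each of the m possible values has a Binomial(n, 1/m) count. A value that
    never occurs (count 0) needs no protection, so only counts 1..k-1
    contribute to the cost.
    """
    p = 1.0 / m
    per_value = sum(
        (k - c) * comb(n, c) * p**c * (1 - p) ** (n - c)
        for c in range(1, min(k, n + 1))
    )
    # Linearity of expectation: sum the per-value cost over all m values.
    return m * per_value


if __name__ == "__main__":
    data = ["a", "a", "b", "c", "c", "c"]
    print(dummy_cost(data, k=3))              # cost measured after processing
    print(expected_dummy_cost(n=6, m=3, k=3)) # estimate made before processing
```

Under these assumptions, the first function measures the cost only once the data and k are fixed, while the second gives an estimate from n, m, and k alone, which mirrors the abstract's aim of comparing values of k without processing the data for each one.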