1. Comprehensive Review of Privacy, Utility, and Fairness Offered by Synthetic Data
- Author
- A. Kiran, P. Rubini, and S. Saravana Kumar
- Subjects
Artificial intelligence, machine learning, synthetic data, statistical disclosure control, differential privacy, privacy enhancing technology, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Automation is the core transformation strategy that every industry wants on its roadmap today. Artificial Intelligence (AI) and Machine Learning (ML) are the key components of automation, and they are increasingly used both for data analysis and for building predictive models from data. Growing concerns about privacy, data confidentiality, and disclosure risk have made it harder to access appropriate and meaningful data. Research has produced several privacy-preserving and disclosure-limiting techniques, and synthetic data is one of them. Early research has shown that synthetic data can be an effective substitute for real data when training AI and ML models. However, a comprehensive evaluation is needed before a data user can be confident that it is indeed a good substitute for real data. In this paper, we look at three main parameters that together should provide a holistic assessment of the quality of synthetic data: first, how well it preserves privacy and controls disclosure; second, how good its utility is; and third, whether it gives fair, unbiased results when used in machine learning. We review the existing literature to understand the various disclosure-limiting methods, synthetic data generators, and the associated validation methodologies and evaluation techniques. We examine how the privacy, utility, and fairness of synthetic data interact with each other and identify areas for future work.
- Published
- 2025