Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

Authors :
Yang, Zeyu
Guo, Peikun
Zanna, Khadija
Sano, Akane
Publication Year :
2024

Abstract

Diffusion models have emerged as a robust framework for various generative tasks, such as image and audio synthesis, and have also demonstrated a remarkable ability to generate mixed-type tabular data comprising both continuous and discrete variables. However, current approaches to training diffusion models on mixed-type tabular data tend to inherit the imbalanced distributions of features present in the training dataset, which can result in biased sampling. In this research, we introduce a fair diffusion model designed to generate balanced data on sensitive attributes. We present empirical evidence demonstrating that our method effectively mitigates the class imbalance in training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that our approach outperforms existing methods for synthesizing tabular data in terms of performance and fairness.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2404.08254
Document Type :
Working Paper