A Flexible Semi-Synthetic Data Generator for Risky Drinking Behavior
- Author
Chi Hao Liow, Youngwoo Choi, Jiwon Yeom, Young Yim Doh, and Seungbum Hong
- Abstract
Machine intelligence has garnered immense attention owing to its ability to discover hidden patterns in abstract, high-dimensional datasets. However, its success is often limited by a fundamental bottleneck: data scarcity. In this work, we offer a universal data augmentation solution to resolve this impasse. We first uncover the hidden knowledge within the existing scarce dataset using a machine learning (ML) technique and then synthetically augment the dataset according to its feature importance. In principle, the scarce and augmented datasets should share a common statistical property. Using this property, we study a scarce dataset representing the binge-drinking behavior of university students and show that our method augments a limited dataset with high fidelity. This work challenges the status quo of data scarcity with rule-less ML, removing an ostensible barrier to applying data-driven techniques in data-scarce clinical research.
- Published
- 2022
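The abstract describes a two-step procedure: first learn which features matter, then augment the data according to that importance, and finally check that the scarce and augmented sets share a statistical property. The sketch below is one hypothetical reading of that procedure, not the authors' method. The abstract does not name the ML model or the augmentation rule, so the sketch assumes two things. First, absolute Pearson correlation with the target stands in for ML-derived feature importance. Second, new rows are made by jittering real rows with Gaussian noise that shrinks as a feature's importance grows. The fidelity check compares per-feature means. The function names (`feature_importance`, `augment`, `max_mean_shift`) and the toy data are illustrative only.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def feature_importance(X, y):
    """Hypothetical stand-in for ML-derived importance: |Pearson r| of each
    feature with the target, normalised to sum to 1."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    total = sum(scores)
    if total == 0:
        return [1.0 / n_features] * n_features  # no signal: treat features equally
    return [s / total for s in scores]


def augment(X, y, n_new, noise=0.3, seed=0):
    """Create n_new synthetic rows by jittering randomly chosen real rows.

    Assumed rule: noise on feature j has std noise * std_j * (1 - w_j / max_w),
    so the most important feature is copied unchanged and the least
    important ones vary the most. Labels are copied from the source row.
    """
    rng = random.Random(seed)
    imp = feature_importance(X, y)
    max_imp = max(imp)
    stds = [statistics.pstdev([row[j] for row in X]) for j in range(len(X[0]))]
    X_new, y_new = [], []
    for _ in range(n_new):
        i = rng.randrange(len(X))
        X_new.append([
            v + rng.gauss(0.0, noise * s * (1.0 - w / max_imp))
            for v, w, s in zip(X[i], imp, stds)
        ])
        y_new.append(y[i])
    return X_new, y_new


def max_mean_shift(X_ref, X_new):
    """Largest per-feature change in mean, in units of the reference std."""
    shifts = []
    for j in range(len(X_ref[0])):
        ref = [row[j] for row in X_ref]
        new = [row[j] for row in X_new]
        scale = statistics.pstdev(ref) or 1.0
        shifts.append(abs(statistics.fmean(new) - statistics.fmean(ref)) / scale)
    return max(shifts)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data only: feature 0 determines the binary label, feature 1 is noise.
    X = [[rng.random(), rng.random()] for _ in range(40)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    X_aug, y_aug = augment(X, y, n_new=400)
    print("importance:", feature_importance(X, y))
    print("max mean shift (in stds):", max_mean_shift(X, X_aug))
```

The noise rule leaves the features that best predict the label nearly untouched, so the feature–label relationship the model relies on is preserved. The less informative features are varied to diversify the synthetic samples. A real pipeline would more likely replace the correlation stand-in with importances from a trained model and use a richer fidelity test than comparing means.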