1. GANerAid: Realistic synthetic patient data for clinical trials
- Author
Lucas Krenmayr, Roland Frank, Christina Drobig, Michael Braungart, Jan Seidel, Daniel Schaudt, Reinhold von Schwerin, and Kathrin Stucke-Straub
- Subjects
Synthetic patients, Machine learning, Generative Adversarial Network, Tabular data, Random noise, Long short-term memory, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Human data must be considered one of the most valuable resources of our time, in both research and business contexts. However, particularly in fields that rely heavily on clinical information, such as medicine or pharmacy, collecting patient data is not only expensive and time-consuming; data protection laws and regulations also strictly limit how the data may be used, rendering reuse or sharing very difficult, if not impossible. One promising solution to these problems is artificially created data points with the same statistical properties as the investigated patient population. In this paper, we propose the GANerAid architecture, which uses a Generative Adversarial Network (GAN) approach to create such synthetic patients from random noise. Unlike other methods, GANerAid is based on long short-term memory (LSTM) layers and is thus able to preserve underlying data properties, such as correlations and variable distributions, leading to more satisfying results, even in small samples, with acceptable training speed. GANerAid is published as an open-source library and released as a ready-to-use package for Python 3.
- Published
2022