
Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence

Authors :
Jan-Niklas Eckardt
Waldemar Hahn
Christoph Röllig
Sebastian Stasik
Uwe Platzbecker
Carsten Müller-Tidow
Hubert Serve
Claudia D. Baldus
Christoph Schliemann
Kerstin Schäfer-Eckart
Maher Hanoun
Martin Kaufmann
Andreas Burchert
Christian Thiede
Johannes Schetelig
Martin Sedlmayr
Martin Bornhäuser
Markus Wolfien
Jan Moritz Middeke
Source :
npj Digital Medicine, Vol 7, Iss 1, Pp 1-11 (2024)
Publication Year :
2024
Publisher :
Nature Portfolio, 2024.

Abstract

Clinical research relies on high-quality patient data. However, obtaining large data sets is costly, and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation holds the promise of effectively bypassing these barriers, allowing for simplified data accessibility and the prospect of synthetic control cohorts. We employed two different methodologies of generative artificial intelligence, CTAB-GAN+ and normalizing flows (NFlow), to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, who were treated within four multicenter clinical trials. Both generative models accurately captured distributions of demographic, laboratory, molecular and cytogenetic variables, as well as patient outcomes, yielding high performance scores regarding fidelity and usability of both synthetic cohorts (n = 1606 each). Survival analysis demonstrated close resemblance of survival curves between original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis, enabling explorative analysis in our synthetic data. Additionally, training sample privacy is safeguarded, mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof-of-concept for synthetic data generation in multimodal clinical data for rare diseases, but also full public access to synthetic data sets to foster further research.
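The abstract does not spell out how the Hamming distances were computed, so the following is only a minimal sketch of one common reading: for each synthetic record, count the variables that differ from every original record and keep the smallest count. The function names and the toy records are hypothetical and invented for illustration; they are not taken from the study's data or code.

```python
from typing import Hashable, Sequence

Record = Sequence[Hashable]


def hamming_distance(a: Record, b: Record) -> int:
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of variables")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_cohort(record: Record, cohort: Sequence[Record]) -> int:
    """Smallest Hamming distance from one record to any record in a cohort."""
    return min(hamming_distance(record, other) for other in cohort)


# Toy categorical records (made up for this sketch, not real patient data).
original = [
    ("female", "NPM1+", "FLT3-ITD-", "normal karyotype"),
    ("male", "NPM1-", "FLT3-ITD+", "complex karyotype"),
]
synthetic = [
    ("female", "NPM1+", "FLT3-ITD+", "normal karyotype"),
    ("male", "NPM1-", "FLT3-ITD+", "complex karyotype"),
]

# One value per synthetic record: distance to its closest original record.
distances = [min_distance_to_cohort(s, original) for s in synthetic]
print(distances)
```

Under this reading, a minimum distance of 0 marks a synthetic record that exactly reproduces an original one, while larger values indicate that no training sample was copied verbatim. Continuous variables would need to be binned or handled with a different metric before such a comparison.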

Details

Language :
English
ISSN :
2398-6352
Volume :
7
Issue :
1
Database :
Directory of Open Access Journals
Journal :
npj Digital Medicine
Publication Type :
Academic Journal
Accession number :
edsdoj.41c3e7d761b34403bf733f68f1cca5e0
Document Type :
article
Full Text :
https://doi.org/10.1038/s41746-024-01076-x