
Abstract 1641: Generation and evaluation of medical synthetic data

Authors :
Madhumita Myneni
Priyadip Ray
André R. Gonçalves
Jennifer L. Stevens
Braden Soper
Ana Paula Sales
Linda Coyle
Source :
Cancer Research. 79:1641-1641
Publication Year :
2019
Publisher :
American Association for Cancer Research (AACR), 2019.

Abstract

While machine learning (ML) has shown some promise in medical research, its actual impact has been limited relative to other application domains. One reason for this disparity is the lack of high-quality, patient-level data available to the broader ML research community. Such datasets are often not made available because of patient privacy protections. To overcome these obstacles, high-quality synthetic datasets could be leveraged to accelerate methodological developments in the application of ML to biomedical research. Clinical data in the form of electronic health records present a rich data source for synthetic data generation. Such data can be high-dimensional and predominantly categorical, which poses multiple challenges from a modeling perspective. In this paper, we evaluate four classes of synthetic data generation techniques, as well as several metrics for evaluating the quality of the synthetic data. While the results and discussions are broadly applicable to medical data, for demonstration purposes we generate synthetic datasets from the publicly available Surveillance, Epidemiology, and End Results (SEER) program. Specifically, our cohort consists of breast cancer cases diagnosed in 2010, which includes over 26,000 individual cases. Finally, we discuss the trade-offs of the different methods and metrics, providing guidance on considerations for the generation and usage of synthetic medical data.

Citation Format: Andre R. Goncalves, Priyadip Ray, Braden Soper, Madhumita Myneni, Jennifer L. Stevens, Linda M. Coyle, Ana Paula Sales. Generation and evaluation of medical synthetic data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1641.

Details

ISSN :
1538-7445 and 0008-5472
Volume :
79
Database :
OpenAIRE
Journal :
Cancer Research
Accession number :
edsair.doi...........57bf6ba1efff1d6be2881f32a525bd34