Abstract 1641: Generation and evaluation of medical synthetic data
- Source :
- Cancer Research. 79:1641-1641
- Publication Year :
- 2019
- Publisher :
- American Association for Cancer Research (AACR), 2019.
Abstract
- While machine learning (ML) has shown some promise in medical research, its actual impact has been limited relative to other application domains. One reason for this disparity is the lack of high-quality, patient-level data available to the broader ML research community. Such datasets are often not made available because of patient privacy protections. To overcome these obstacles, high-quality synthetic datasets could be leveraged to accelerate methodological developments in the application of ML to biomedical research. Clinical data in the form of electronic health records present a rich source for synthetic data generation. Such data can be high dimensional and predominantly categorical, which poses multiple modeling challenges. In this paper, we evaluate four classes of synthetic data generation techniques, as well as several metrics for evaluating the quality of the synthetic data. While the results and discussions are broadly applicable to medical data, for demonstration purposes we generate synthetic datasets from the publicly available Surveillance, Epidemiology, and End Results (SEER) program. Specifically, our cohort consists of breast cancer cases diagnosed in 2010, which includes over 26,000 individual cases. Finally, we discuss the trade-offs of the different methods and metrics, providing guidance on considerations for the generation and usage of synthetic medical data.
- Citation Format: Andre R. Goncalves, Priyadip Ray, Braden Soper, Madhumita Myneni, Jennifer L. Stevens, Linda M. Coyle, Ana Paula Sales. Generation and evaluation of medical synthetic data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1641.
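As a rough illustration of what generating and then scoring synthetic categorical data can look like, the sketch below fits per-column category frequencies, samples each column independently, and compares real and synthetic marginals with total variation distance. This is an assumed baseline and an assumed metric chosen for simplicity; the abstract does not name its four classes of generators or its metrics, and nothing here is claimed to be the authors' method. The function names, the column names (`stage`, `laterality`), and the toy records are hypothetical and are not SEER data.

```python
import random
from collections import Counter


def fit_marginals(records):
    """Count category frequencies per column from a list of dict records."""
    columns = records[0].keys()
    return {col: Counter(r[col] for r in records) for col in columns}


def sample_synthetic(marginals, n, seed=0):
    """Draw n synthetic records, sampling every column independently.

    Independence is a deliberate simplification: it preserves each
    column's distribution but ignores correlations between columns.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            categories = list(counts.keys())
            weights = list(counts.values())
            row[col] = rng.choices(categories, weights=weights, k=1)[0]
        synthetic.append(row)
    return synthetic


def total_variation(real, synthetic, col):
    """Total variation distance between the two marginals of one column.

    0 means identical category frequencies, 1 means no overlap.
    """
    p = Counter(r[col] for r in real)
    q = Counter(r[col] for r in synthetic)
    n_p, n_q = len(real), len(synthetic)
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in set(p) | set(q))


# Hypothetical toy records, made up for illustration only.
real = [
    {"stage": "I", "laterality": "left"},
    {"stage": "I", "laterality": "right"},
    {"stage": "II", "laterality": "left"},
    {"stage": "III", "laterality": "right"},
]

synthetic = sample_synthetic(fit_marginals(real), n=1000)
for column in ("stage", "laterality"):
    print(column, total_variation(real, synthetic, column))
```

A marginal-only score like this cannot detect a generator that breaks relationships between variables, which is one reason a study might compare several metrics rather than rely on a single one.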
Details
- ISSN :
- 1538-7445 and 0008-5472
- Volume :
- 79
- Database :
- OpenAIRE
- Journal :
- Cancer Research
- Accession number :
- edsair.doi...........57bf6ba1efff1d6be2881f32a525bd34