Abstract:
Real data subject to privacy and confidentiality concerns are often unavailable or too expensive to obtain in terms of both time and money. In this situation, synthetic data is a good alternative. The objective of this research is to generate realistic synthetic data that people can use freely. We propose a synthetic data generation model based on boundary-seeking generative adversarial networks (BGANs), designated medical BGAN (medBGAN), and compare its performance with that of an existing method, medical GAN (medGAN). We carry out the investigation on several datasets in two different domains: electronic health records (EHRs) in the medical domain and a crime dataset from the City of Los Angeles Police Department. First, we train the models and generate synthetic data with the trained models. We then analyze and compare the models' performance by applying two statistical methods (dimension-wise average and the Kolmogorov-Smirnov test) and two machine learning tasks (association rule mining and prediction). The comprehensive analysis in this study shows that the proposed model is more efficient than medGAN at generating realistic synthetic data.
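The two statistical checks named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors apply them, so everything here is an assumption made for illustration: the toy binary matrices `real` and `synthetic`, the helper names `dimension_wise_average` and `ks_statistic`, and the choice to run the Kolmogorov-Smirnov statistic on a single feature column.

```python
from bisect import bisect_right

def dimension_wise_average(rows):
    """Mean of each column (feature) over all records."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical toy data: rows are records, columns are binary features.
real = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
synthetic = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 0, 0]]

# Per-feature means; close values suggest similar marginal frequencies.
real_avg = dimension_wise_average(real)
synthetic_avg = dimension_wise_average(synthetic)

# KS statistic for the third feature; 0 means identical empirical CDFs.
ks_feature_3 = ks_statistic([r[2] for r in real], [s[2] for s in synthetic])
```

In this sketch, per-feature means that lie close together and a KS statistic near zero would both indicate that the synthetic data resembles the real data on that feature.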
Published in: 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)
Date of Conference: 03-05 June 2019
Date Added to IEEE Xplore: 08 August 2019