1 Introduction
Modeling complex distributions, such as the distribution of natural images, has long been a difficult problem and has recently been tackled with deep networks, including variational auto-encoders [1], normalizing flows [2], [3], and auto-regressive models [4]. Unlike these models, which focus on maximizing the likelihood of training samples, the generative adversarial network (GAN) [5] models the distribution of interest indirectly by training a generator. GAN formulates the training of the generator as an adversarial game by introducing an auxiliary network known as the discriminator. The discriminator is trained to distinguish generated samples from real samples, while the generator is trained to fool the discriminator so that the two can no longer be told apart. In theory, the objective of GAN is to minimize the difference between the distribution P_\theta of generated samples and the distribution P_r of real samples by adjusting the parameters \theta of the generator g_\theta. The difference between P_\theta and P_r may be measured by the Jensen-Shannon divergence, as in the original GAN (referred to as JSGAN for clarity) [5], by a general f-divergence, as in f-GAN [6], or by the Wasserstein-1 distance, as in the Wasserstein GAN (WGAN) [7].
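For concreteness, the adversarial game described above can be written as a minimax problem; the discriminator D_w with parameters w and the noise prior P_z are notation introduced here for illustration and do not appear in the cited formulations verbatim:
\begin{equation*}
\min_{\theta} \max_{w} \; \mathbb{E}_{x \sim P_r}\!\left[\log D_w(x)\right] + \mathbb{E}_{z \sim P_z}\!\left[\log\!\left(1 - D_w(g_\theta(z))\right)\right],
\end{equation*}
where g_\theta(z) is a generated sample. For the optimal discriminator, this objective equals 2\,\mathrm{JSD}(P_r \,\|\, P_\theta) - \log 4, which is why the original GAN [5] is referred to as JSGAN above. WGAN [7] instead minimizes the Wasserstein-1 distance, which by Kantorovich-Rubinstein duality can be written as
\begin{equation*}
W_1(P_r, P_\theta) = \sup_{\|f\|_{L} \le 1} \; \mathbb{E}_{x \sim P_r}\!\left[f(x)\right] - \mathbb{E}_{\tilde{x} \sim P_\theta}\!\left[f(\tilde{x})\right],
\end{equation*}
where the supremum is taken over all 1-Lipschitz functions f.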