Introduction
Generative adversarial networks (GANs) [2], [4], [5], [9], [12] are well known for being effective at synthesizing samples for various applications such as image generation [5], [12], [26], [28], [29], industrial design [20], speech generation [11], and natural language processing [2], [9]. The objective of GANs is to train a generator model G that captures the real data distribution well enough that a discriminator model D cannot distinguish its samples from real data.
In practice, however, GANs are well known both for being unstable to train and for the problem of mode collapse [1], [19]. Mode collapse is a situation where the generator produces samples from only a few modes of the data distribution, so the generated outputs lack diversity.
Recently, several studies have focused on improving the stability of GANs, mostly by adopting a heuristic approach [11], [12], which is extremely sensitive to the training data and hard to apply to new domains [19]. Unlike previous work, this study tries to approach the unstable training problem of GANs by investigating an adaptive hyper-parameter learning method. Specifically, we propose to dynamically adjust the number of training steps of the discriminator relative to the generator during training.
This study is an extension of our previous work [25] and makes the following further improvements.
In this extended work, the proposed A_k-GAN model dynamically adjusts the ratio (denoted as k) of the training steps of D and G every c iterations, instead of every iteration as in our previous work; c is a newly added hyperparameter that improves the generalization of the adaptive model. We introduce variants in which the adjustment of k either depends on more sophisticated criteria, such as loss values, or is applied in a progressive or an immediate manner to control the degree of stabilization of the GAN training. We also investigate blur strategies, a set of filters applied to the input of D, to examine whether blurring affects the training balance of D and G. We conducted extensive experiments to compare the A_k-GAN models with several state-of-the-art models, using IS (Inception Score) [13], [16] and RIS (relative inverse Inception Score) [24]. The evaluation results demonstrate the effectiveness of the proposed training procedure in terms of convergence rate and image quality.
Background
A. Preliminary
To avoid confusion, the parameters used throughout this paper are defined as follows.
Definition 1:
The conventional GAN [4] alternates between k steps of optimizing the discriminator D and one step of optimizing the generator G.
Table 1 shows the notations and their descriptions.
B. Generative Adversarial Networks
GANs estimate generative models by means of an adversarial process in which a generator G and a discriminator D play the following two-player minimax game: \begin{align*} \min \limits _{G} \max \limits _{D} V(D,G) = \mathbb {E}_{x \sim p_{data}(x)}[\log {D(x)}] + \mathbb {E}_{z \sim p_{z}(z)}[\log ({1-D(G(z))})] \tag{1}\end{align*}
In practice, we train D and G by minimizing the following losses over a minibatch of m samples, where \theta_{D} and \theta_{G} denote the parameters of the discriminator and the generator, respectively, and P(\cdot;\theta_{D}) is the probability output by the discriminator: \begin{align*} L_{D}=&-\frac {1}{m}\sum _{i=1}^{m}\left[\log {P(x^{(i)};\theta _{D})} + \log \left(1-P(G(z^{(i)};\theta _{G});\theta _{D})\right)\right] \tag{2}\\ L_{G}=&-\frac {1}{m} \sum _{i=1}^{m} \log \left(P(G(z^{(i)};\theta _{G});\theta _{D})\right)\tag{3}\end{align*}
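To make Eqs. (2) and (3) concrete, here is a minimal NumPy sketch, assuming the discriminator outputs probabilities in (0, 1) for a minibatch of real and generated samples; the function name and the small epsilon for numerical stability are our own additions, not part of the paper.

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Minibatch estimates of Eqs. (2) and (3).

    d_real: D(x) for m real samples, values in (0, 1).
    d_fake: D(G(z)) for m generated samples, values in (0, 1).
    """
    # Eq. (2): discriminator loss over real and generated samples
    loss_d = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # Eq. (3): non-saturating generator loss
    loss_g = -np.mean(np.log(d_fake + eps))
    return loss_d, loss_g
```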
C. Challenges and Limitations of GANs
Recently, GANs have gained tremendous attention because of their strength in synthesizing compelling images, particularly when trained on image collections comprising relatively narrow domains, such as MNIST handwritten digits, as shown in Fig. 1(a). However, for diverse image collections, GANs generally yield less impressive results. For example, samples from models trained on the Anime dataset contain few recognizable objects, as shown in Fig. 1(b).
Fig. 1. Effects of the original GAN on data collections with different levels of diversity.
It is a considerable challenge to balance the convergence of the generator and the discriminator.
We estimated and tracked the training curves on MNIST and on the diverse dataset, which are shown in Fig. 2. Specifically, Fig. 2 plots the probabilities assigned to real and synthesized data (p_real and p_generated, respectively).
Fig. 2. Learning curves for the conventional GAN model with multi-layer perceptrons (MLPs). The learning curves are measured by the probabilities of real and synthesized data (p_real and p_generated, respectively).
D. Related Work
To overcome the problem of mode collapse, several researchers have proposed to penalize the appearance of near-duplicate images during each training batch using heuristic-based approaches [13], [21]. Salimans et al. [13] sought to address this problem by training the discriminator in a semi-supervised fashion, granting the discriminator’s internal representations knowledge of the class structure of the training data. Although this method increases synthesizing quality, it is less appealing as a tool for unsupervised learning. Energy-based GAN (EBGAN) [21] proposes modeling the discriminator as an energy function that assigns low energies to regions near the data manifold and higher energies elsewhere.
E-GAN [26] combines the GAN training with an evolutionary algorithm, which evolves a set of generators to adapt to the discriminator and preserves the best generator for further training. PG-GAN [27] achieves high-resolution images by training the GAN model in a progressively growing way to increase the resolution. SAGAN [28] imports the self-attention block into a deeper GAN architecture to capture the global structure of images. Based on SAGAN, BigGAN [29] further scales up the model with larger batch sizes and more parameters to improve the fidelity of generated images.
Consistent with our work, some authors have tried to use an adaptive approach [15], [18] to improve GANs. AdaGAN [18] addresses the problem of mode collapse by incrementally adding new generators, which are trained using reweighted data at every step. Thus, the data generated by the mixture model can progressively cover all the modes of the dataset. However, the generator of the mixture model is a mixture of single generator networks, and thus the latent representation becomes more complex with training. Of particular interest to us is the work of ABC-GAN [15]. Its adaptive idea is also achieved by adjusting the balance between the generator and the discriminator during training.
An Adaptive Training Procedure for GANs
To address the problem that the training of G and D easily becomes unbalanced on diverse datasets, we propose an adaptive training procedure that dynamically adjusts the ratio k of their training steps.
A. Basic Idea
Our main goal is to achieve a convincing performance on distinct datasets by matching the well-trained learning curves. We first describe the manner in which to obtain well-trained learning curves, and then present an algorithm that constrains the difference between the current values of control variables (e.g., the posterior probabilities p_real and p_generated, or the loss values) and the corresponding values on the fitted criterion curves.
Because the original GAN objective function is defined as a two-player game, determining the convergence of D and G directly from their loss values is difficult. Instead, we require the relative deviation between the current value V_{c} of the control variable and the matching value V_{m} on the criterion curve to stay within a threshold: \begin{equation*} \left |{\frac {V_{c}-V_{m}}{V_{m}} }\right | < \alpha ~(\text{or}~\beta)\tag{4}\end{equation*}
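As a small illustration, Eq. (4) amounts to the following check, where the function name and argument names are ours and the threshold is either alpha or beta depending on the control variable:

```python
def within_tolerance(v_current, v_criterion, threshold):
    """Eq. (4): the relative deviation of the control variable from the
    fitted criterion curve must stay below the threshold (alpha or beta)."""
    return abs((v_current - v_criterion) / v_criterion) < threshold
```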
We next provide an intuitive explanation of our adaptive methods before proceeding to the implementation details, as illustrated in Fig. 4. First, the criterion learning curves are obtained from a dataset on which the GAN trains stably (e.g., MNIST); then, while training on the target dataset, the ratio k is adjusted every c iterations so that the current learning curves track the criterion curves. A simplified sketch of this loop is given below.
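The following Python sketch is a rough illustration only, not the exact procedure of the paper: the discriminator-to-generator step ratio k is re-evaluated every c iterations from the deviation between the tracked control value and the fitted criterion value (Eq. (4)). The callables, thresholds, and bounds are assumptions.

```python
def adaptive_training(d_step, g_step, control_value, criterion_value,
                      num_iters, k_init=1, c=100, alpha=0.1, k_max=5):
    """Sketch: adjust k (number of D updates per G update) every c iterations
    by comparing the tracked control variable with the criterion curve."""
    k = k_init
    for it in range(num_iters):
        for _ in range(k):
            d_step()                      # one discriminator update
        g_step()                          # one generator update
        if it % c == 0:
            v_c = control_value(it)       # e.g., current p_generated or L_D
            v_m = criterion_value(it)     # value on the fitted criterion curve
            if abs((v_c - v_m) / v_m) >= alpha:
                # immediate variant: step k up or down by one within bounds
                k = min(k + 1, k_max) if v_c > v_m else max(k - 1, 1)
    return k
```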
B. Adaptive Control Using Fitted Curves
Note that the criterion learning curves shown in Fig. 5 fluctuate wildly during training. Instead of matching the raw curves directly, we fit regression curves to serve as guides. In this approach, we employ two exponential functions: Eq. (5) is used to fit the decaying curves and Eq. (6) the rising ones: \begin{align*} y=&a e^{-bx}+d \tag{5}\\ y=&-a e^{-bx}+d \tag{6}\end{align*}
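The fitting itself can be done with a standard nonlinear least-squares routine; the sketch below uses scipy.optimize.curve_fit on synthetic data, and the data and initial parameter guesses are arbitrary illustrative values rather than the paper's.

```python
import numpy as np
from scipy.optimize import curve_fit

def decaying_exp(x, a, b, d):
    """Eq. (5): y = a * exp(-b * x) + d."""
    return a * np.exp(-b * x) + d

def rising_exp(x, a, b, d):
    """Eq. (6): y = -a * exp(-b * x) + d."""
    return -a * np.exp(-b * x) + d

# Fit a noisy criterion curve (training steps vs. tracked probability/loss).
steps = np.arange(1, 501, dtype=float)
noisy_curve = 0.8 * np.exp(-0.01 * steps) + 0.2 + 0.02 * np.random.randn(500)
params, _ = curve_fit(decaying_exp, steps, noisy_curve, p0=(1.0, 0.01, 0.1))
fitted = decaying_exp(steps, *params)   # smooth criterion values V_m
```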
Experiments
A. Datasets
We performed our experiments on two datasets: CelebA [8] and the Anime face dataset. Details of each dataset are given as follows. Samples from the two datasets are shown in Fig. 6.
1) CelebA
The large-scale CelebFaces Attributes (CelebA) dataset contains 202,599 facial images of 10,177 celebrities. The images in this dataset are RGB color images containing diverse pose variations and cluttered backgrounds. In our experiments, each image was resized to
2) Anime Faces
We derived anime faces from an image board site for anime wallpapers and cropped the images to contain only faces. These images were sized to
B. Implementation
Our model was coded in Python and all experiments were conducted in the TensorFlow framework. We tested the model with two architectures: GANs built with multi-layer perceptrons (MLPs) and GANs based on DCGAN [12].
1) MLP-Based GAN
The first GAN architecture we experimented with consisted of two MLP networks. The parameters of the networks were updated to minimize the loss functions, with Adam [6] used as the optimizer. A 100-dimensional vector sampled from a Gaussian distribution is transformed to the output image size by a series of fully connected layers with non-linear activation functions (ReLU and tanh). The architectures of the two networks are summarized in the corresponding tables.
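A minimal tf.keras sketch of such an MLP generator is shown below; the hidden-layer widths and the 64x64x3 output shape are illustrative assumptions rather than the exact configuration used in the paper.

```python
import numpy as np
import tensorflow as tf

def build_mlp_generator(noise_dim=100, img_shape=(64, 64, 3)):
    """Illustrative MLP generator: ReLU hidden layers, tanh output,
    reshaped to an image. Layer sizes are assumptions."""
    out_units = int(np.prod(img_shape))
    return tf.keras.Sequential([
        tf.keras.Input(shape=(noise_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(out_units, activation="tanh"),
        tf.keras.layers.Reshape(img_shape),
    ])
```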
2) DCGAN-Based GAN
We also performed our adaptive methods using the architecture of DCGAN [12]. The generator networks (Table 5) have five layers and include one linear connection layer and four fractional-strided convolutional layers with a filter size of
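The following tf.keras sketch illustrates a generator of this shape, i.e., one fully connected layer followed by four transposed (fractional-strided) convolutions; the 5x5 kernels, channel widths, and 64x64 output resolution are assumptions for illustration, not the paper's exact settings.

```python
import tensorflow as tf

def build_dcgan_generator(noise_dim=100):
    """DCGAN-style generator sketch: one linear layer, then four
    fractional-strided convolutions upsampling 4x4 -> 64x64."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(noise_dim,)),
        tf.keras.layers.Dense(4 * 4 * 512),
        tf.keras.layers.Reshape((4, 4, 512)),
        tf.keras.layers.Conv2DTranspose(256, 5, strides=2, padding="same",
                                        activation="relu"),
        tf.keras.layers.Conv2DTranspose(128, 5, strides=2, padding="same",
                                        activation="relu"),
        tf.keras.layers.Conv2DTranspose(64, 5, strides=2, padding="same",
                                        activation="relu"),
        tf.keras.layers.Conv2DTranspose(3, 5, strides=2, padding="same",
                                        activation="tanh"),
    ])
```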
Results
A. Adaptive Control on Diverse Datasets Using MLP and DCGAN
In Fig. 7, we show the results of the proposed adaptive control applied to the MLP-based and DCGAN-based models on the diverse datasets.
B. Performance Measure
The IS (Inception Score) [13], [16] measures both the quality and the diversity of generated images and is computed as \begin{equation*} IS(G)=\exp \left({\frac {1}{N} \sum _{i=1}^{N} D_{KL}\left(p(y\mid x^{(i)})\parallel p(y)\right)}\right) \tag{7}\end{equation*}
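Given the class-probability outputs p(y|x) of a pretrained classifier for a set of generated images, Eq. (7) can be computed as in the following sketch (the epsilon guard and function name are our own additions).

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """Eq. (7) from a matrix of class probabilities p(y|x), one row per
    generated image (e.g., softmax outputs of an Inception network)."""
    p_y = p_yx.mean(axis=0, keepdims=True)             # marginal p(y)
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(np.mean(kl)))                  # exp of the average KL
```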
Table 7 shows the IS results of the evaluated models.
Table 8 presents the corresponding IS results.
To more accurately evaluate the performance of GAN models, we use the RIS (relative inverse Inception Score) [24], defined as \begin{align*} RIS=&1-\frac {IS_{g_{avg}}}{IS_{r_{avg}}} \tag{8}\\ IS_{g_{avg}}=&\frac {1}{n} \sum _{i=1}^{n} IS_{g_{i}} \tag{9}\\ IS_{r_{avg}}=&\frac {1}{n} \sum _{i=1}^{n} IS_{r_{i}} \tag{10}\end{align*} where IS_{g_{i}} and IS_{r_{i}} are the Inception Scores of the generated and real images in the i-th evaluation, and n is the number of evaluations.
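A direct NumPy transcription of Eqs. (8)-(10), assuming the per-evaluation Inception Scores of generated and real images are already available; the function name is ours.

```python
import numpy as np

def relative_inverse_is(is_generated, is_real):
    """Eqs. (8)-(10): RIS from per-evaluation Inception Scores of generated
    and real images; a value closer to 0 means generated images approach
    the real ones in IS terms."""
    is_g_avg = np.mean(is_generated)    # Eq. (9)
    is_r_avg = np.mean(is_real)         # Eq. (10)
    return 1.0 - is_g_avg / is_r_avg    # Eq. (8)
```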
As shown in Fig. 9, we compare the RIS values of the evaluated models.
Range of the RIS values.
Conclusion and Future Work
In this paper, a family of adaptive control-based GAN models, called A_k-GAN, was proposed. The models stabilize GAN training by dynamically adjusting the ratio of the training steps of D and G according to fitted criterion learning curves.
For future work, we plan to investigate more effective criteria to direct the training, which may encourage the convergence of GANs. We also plan to experiment with other objective functions, such as
ACKNOWLEDGMENT
(Xiaohan Ma and Rize Jin contributed equally to this work.)
Appendix
Adaptive Method With Momentum
We next introduce the adaptive methods with momentum, which adjust k based on either P_{g} (or L_{D}) or P_{r} (or L_{G}).
1) Adaptive Control Based on P_{g} or L_{D}
Because the criterion learning curves for
Algorithm 1 Adaptive Control Over k Values Based on P_{g} or L_{D} (With Momentum)
[Algorithm 1 listing: nested conditionals that progressively adjust k with momentum according to P_{g} (or L_{D}) and return the updated value.]
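As an illustration of what a progressive (momentum-based) update of k might look like in code, the sketch below accumulates a momentum term from the deviation of the current P_{g} (or L_{D}) value from the fitted criterion curve. The thresholds alpha and beta, the gain, the bounds on k, and the mapping from the sign of the deviation to the direction of the adjustment are all illustrative assumptions, not the exact rules of Algorithm 1.

```python
def adjust_k_with_momentum(k, momentum, v_current, v_criterion,
                           alpha=0.05, beta=0.2, gain=0.5, k_min=1, k_max=5):
    """Progressive (momentum-based) update of k driven by the relative
    deviation from the fitted criterion curve. All constants are
    illustrative; a positive deviation is assumed to call for more D steps."""
    rel = (v_current - v_criterion) / v_criterion
    if abs(rel) < alpha:
        momentum *= gain                            # on track: decay momentum
    elif abs(rel) < beta:
        momentum += gain if rel > 0 else -gain      # small deviation: nudge k
    else:
        momentum += 1.0 if rel > 0 else -1.0        # large deviation: push harder
    k = int(round(min(max(k + momentum, k_min), k_max)))
    return k, momentum
```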
2) Adaptive Control Based on P_{r} or L_{G}
Different from the aforementioned method, Algorithm 2 is based on P_{r} (or L_{G}).
Algorithm 2 Adaptive Control Over k Values Based on P_{r} or L_{G} (With Momentum)
[Algorithm 2 listing: nested conditionals that progressively adjust k with momentum according to P_{r} (or L_{G}).]
Adaptive Method Without Momentum
In practical implementations, the progressive adjustment methods improve the stability of the models; however, the training is slow. Therefore, we explore a variant that removes the momentum. This means that the value of k is adjusted immediately at each adjustment point instead of being changed progressively.
When
Adaptive Control Over k Values Based on P_{g} or L_{D} (Without Momentum)
[Algorithm listing: nested conditionals that immediately adjust k according to P_{g} (or L_{D}) and return the updated value.]
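For comparison, here is a sketch of the momentum-free (immediate) variant under the same assumptions about thresholds, step sizes, and bounds; the actual branch conditions of the original listing are not claimed here.

```python
def adjust_k_immediate(k, v_current, v_criterion,
                       alpha=0.05, beta=0.2, k_min=1, k_max=5):
    """Immediate (momentum-free) update of k: the deviation from the
    criterion curve directly sets the new value. Constants are illustrative."""
    rel = (v_current - v_criterion) / v_criterion
    if abs(rel) < alpha:
        return k                                   # on track: keep k unchanged
    step = 1 if abs(rel) < beta else 2             # larger deviation, larger step
    k = k + step if rel > 0 else k - step
    return min(max(k, k_min), k_max)
```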
Adaptive Control Over k Values Based on P_{r} or L_{G} (Without Momentum)
[Algorithm listing: nested conditionals that immediately adjust k according to P_{r} (or L_{G}).]