Introduction
Remote sensing technology for Earth observation is widely used in land census, environmental protection, and disaster monitoring. The classification of remote sensing images is the basis of, and an important technical support for, remote sensing applications. However, with the continuous improvement in the spatial resolution of remote sensing images, the phenomena of “different objects with the same spectrum” and “the same object with different spectra” have become more serious, and the details of object features are more complicated than before. These conditions raise the difficulty of classifying remote sensing images with high spatial resolution. Meanwhile, the effect of supervised classification is closely related to the quality and quantity of samples, but the production of samples is time consuming and laborious. This situation poses a challenge to the classification of remote sensing images with high spatial resolution under limited samples.
Generative adversarial networks (GANs) [1] have attracted widespread attention from researchers in recent years. A GAN is a deep learning model that generates the desired output through a game between two neural networks. It is inspired by a two-player, zero-sum game, in which the sum of the players' payoffs is zero or a constant: one party gains what the other loses. The two parties in a GAN are the generation model (G) and the discriminant model (D). Random noise z, for example drawn from a Gaussian distribution, is input to the generator G, which outputs “fake” data of the same size as the training image. The discriminant model D, a binary classifier, estimates the similarity between the “fake” data generated by G and the real data and feeds this signal back to G. The generation and discriminant models compete with each other during training; hence, the “fake” data produced by the generator eventually suffice to deceive the discriminator into outputting a high probability. GANs show strong capabilities in the field of image generation. On the basis of the GAN model, scholars have proposed various structures for different image applications to improve the quality of generated images. The main variants include: improving the stability of GAN training and the quality of generated results through the fusion of multiple network structures [2]; modifying the loss function to alleviate unstable GAN training and poor diversity of generated images [3], [4]; adding an autoencoder as a classifier and matching the loss distribution of the autoencoder on the basis of the Wasserstein distance, so that the generator and discriminator are balanced to improve the quality and diversity of generated images [5]–[7]; and building a perceptual loss function combining adversarial loss and content loss to generate realistic images at four-times magnification for superresolution [8], [9]. All of the aforementioned variants generate images by unsupervised learning, so the category, target color, and texture of the generated images are random. In contrast, conditional GAN (CGAN) can generate images of a designated category by adding condition variables to the generator and discriminator [10]. Many scholars have used these networks to achieve improved applications in different fields [11]–[17]. Nevertheless, most of these applications are based on nonpixel-level image generation. Isola et al. proposed an end-to-end CGAN based on label conditions, named pix2pix, which can generate multiclass pixel-level images [18]. The images generated using pix2pix, however, are inadequate in texture features and insufficiently accurate in boundary features. Texture, spectral, and edge information is important in the application of remote sensing images with high spatial resolution; pix2pix therefore has defects in generating such images and cannot meet the needs of the corresponding applications. Centimeter-level spatial resolution remote sensing images have extremely complex spatial, textural, and spectral characteristics. For this type of data, classification robustness is mainly improved by learning from a large number of training samples, so the number of training samples is an important factor restricting the accuracy of the classification of remote sensing images with high spatial resolution.
The collection and production of samples require excessive manpower and material resources.
Therefore, in view of the abovementioned two problems, we propose a CGAN based on boundary and edge features (ECGAN) for generating remote sensing images with high spatial resolution in this article. The generated images are used for data augmentation to solve the problem of insufficient training samples. The key contributions of this article are as follows.
An end-to-end ECGAN, which adds interclass boundary and intraclass edge feature factors into condition variables to improve the accuracy of the texture and edge information of generated images, is designed. The boundary feature focuses on improving the boundary accuracy of the generated images and reducing the “false” edges that do not exist. The edge feature focuses on improving the accuracy of the internal texture of the generated images.
An objective function combining cross-entropy loss and multilevel feature loss with L1 loss is designed to train the discriminator and generator. The multilevel feature loss with L1 distance minimizes the differences between the features of real and generated images at different spatial scales, thereby improving the detail features of the generated images.
A new augmentation method for remote sensing images based on ECGAN is proposed. We classify high spatial resolution remote sensing images with a semantic segmentation method under different data augmentation methods. Compared with traditional augmentation methods, the images generated by ECGAN have more diverse spatial and spectral information. The proposed method can improve the training stability and classification accuracy of the semantic segmentation network.
The rest of this article is organized into four sections. Section II introduces the proposed methods and network architecture. Results and discussion are presented in Section III, which includes the data set we used to validate the network architecture and model parameters. We draw the conclusion in Section IV.
Methodology
A. Condition Variables With Boundary and Edge Features
In this article, boundary feature refers to the rough features between the categories (heterogeneous) and edge feature refers to the detail features within the categories (homogeneous). The texture and spatial information of remote sensing images with high spatial resolution, especially aerial remote sensing images, are extremely complex, causing the generation of such images to be challenging. Therefore, in this article, boundary feature factors are added to enhance the distinction among different objects, and edge feature factors are added to enhance the details and texture inside the objects.
A boundary feature map is obtained using the difference between bounded and unbounded label data, and an edge feature map is obtained using an edge detection algorithm (see Fig. 1). Three typical edge detection operators are the Laplace [19], Sobel [20], and Canny [21] operators. Among them, the Laplace operator is sensitive to noise and is thus rarely used to detect edges; the Sobel operator works well for images with gradual gray-level changes and considerable noise, but its edge localization is inadequately accurate; the Canny operator is insusceptible to noise and can detect real weak edges, using two different thresholds to identify strong and weak edges. Therefore, in this article, we extract edge features by using the Canny detection operator.
(a) High-resolution remote sensing image. (b) Boundary feature map, which is obtained from the difference between bounded and unbounded label data. (c) Edge feature map, which is obtained by the Canny operator.
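Since the exact construction of the boundary feature map from bounded and unbounded label data is implementation specific, the following Python sketch realizes the described difference as a morphological gradient of the label map (dilation minus erosion), which marks interclass boundaries; the function name, kernel size, and use of OpenCV are illustrative assumptions rather than the article's implementation.

```python
import cv2
import numpy as np

def extract_boundary_feature(label_map, kernel_size=3):
    """Approximate the boundary feature map as the difference between a
    dilated ("bounded") and an eroded ("unbounded") version of the label map.

    label_map is assumed to be a single-channel uint8 array of class indices.
    """
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    dilated = cv2.dilate(label_map, kernel)
    eroded = cv2.erode(label_map, kernel)
    # Pixels where dilation and erosion disagree lie on interclass boundaries.
    boundary = (dilated != eroded).astype(np.uint8) * 255
    return boundary
```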
The Canny edge detection algorithm is divided into the following five steps (a code sketch follows the list).
1) A Gaussian filter is used to smooth the image and filter out the noise. The generating equation for a Gaussian filter kernel of size 3 × 3 is given by the following formula (σ is the standard deviation of the Gaussian function)
\begin{equation*}
{h_{ij}} = \frac{1}{{2\pi {\sigma ^2}}}\exp \left[ { - \frac{{{{(i - 2)}^2} + {{(j - 2)}^2}}}{{2{\sigma ^2}}}} \right];\quad 1 \leq i,j \leq 3
\end{equation*}
\begin{equation*}
H = \left[ {\begin{array}{rcl} {{h_{11}}}&{{h_{12}}}&{{h_{13}}}\\ {{h_{21}}}&{{h_{22}}}&{{h_{23}}}\\ {{h_{31}}}&{{h_{32}}}&{{h_{33}}} \end{array}} \right].\tag{1}
\end{equation*}
After Gaussian filtering, the brightness value of the center pixel e is
\begin{equation*}
e = H*A = \left[ {\begin{array}{rcl} {{h_{11}}}&{{h_{12}}}&{{h_{13}}}\\ {{h_{21}}}&{{h_{22}}}&{{h_{23}}}\\ {{h_{31}}}&{{h_{32}}}&{{h_{33}}} \end{array}} \right]*\left[ {\begin{array}{rcl} a&b&c\\ d&e&f\\ g&h&i \end{array}} \right]\tag{2}
\end{equation*}
where * denotes the convolution operation.
2) The gradient intensity and direction of each pixel in the image are calculated. In this article, they are calculated with the Sobel operator [i.e., (3)]; the calculation process is shown in (4):
\begin{align*}
{S_x} &= \left[ {\begin{array}{rcl} { - 1}&0&1\\ { - 2}&0&2\\ { - 1}&0&1 \end{array}} \right],\quad {S_y} = \left[ {\begin{array}{rcl} 1&2&1\\ 0&0&0\\ { - 1}&{ - 2}&{ - 1} \end{array}} \right]\tag{3}\\
{G_x} &= {S_x}*A\\
{G_y} &= {S_y}*A\\
G &= \sqrt {G_x^2 + G_y^2} \\
\theta &= \arctan \left( {{G_y}/{G_x}} \right)\tag{4}
\end{align*}
3) Nonmaximum suppression is applied to eliminate the spurious responses caused by edge detection. The gradient intensity of the current pixel is compared with those of the two pixels along the positive and negative gradient directions. If the gradient intensity of the current pixel is the largest of the three, the pixel is retained as an edge point; otherwise, the pixel is suppressed.
4) Dual-threshold (high and low threshold) detection is applied to determine real and potential edges.
5) Edge detection is completed by suppressing the potential edges that are not real, i.e., weak edges not connected to strong edges.
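As a minimal illustration of the five steps above, the following Python sketch uses OpenCV, whose cv2.Canny routine performs steps 2)–5) internally while the Gaussian smoothing of step 1) is applied separately; the kernel size and hysteresis thresholds here are illustrative assumptions rather than the article's settings.

```python
import cv2

def extract_edge_feature(image_path, low_thresh=50, high_thresh=150):
    """Return a binary edge feature map for one image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Step 1: Gaussian smoothing with a 3x3 kernel (sigma chosen by OpenCV).
    blurred = cv2.GaussianBlur(gray, (3, 3), 0)
    # Steps 2-5: gradient computation, nonmaximum suppression, and
    # dual-threshold hysteresis are performed inside cv2.Canny.
    edges = cv2.Canny(blurred, low_thresh, high_thresh)
    return edges
```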
B. Loss Function Considering Multiscale and Multilevel Features
The objective of a conditional GAN can be expressed as
\begin{align*}
{L_{cGAN}}(G,D) = & {E_{x\sim{p_{data}}(x)}}[\log D(x,y)] \\
&+ {E_{z\sim{p_z}(z)}}[\log (1 - D(G(y,z)))]\tag{5}
\end{align*}
Research has shown that combining the CGAN loss function with an L2 distance term can enhance the effect of image generation [22]. However, the distribution of the data we use is often multimodal, whereas the L2 loss effectively fits the data with a unimodal Gaussian, which makes the generated image smooth and indistinct. In response to this problem, the L1 distance loss [i.e., (6)] is used to replace the L2 distance loss in this article to reduce the edge blurring of the generated image
\begin{equation*}
{L_{L1}}(G) = {E_{x,y,z}}\left[ {{{\left\| {x - G(y,z)} \right\|}_1}} \right].\tag{6}
\end{equation*}
The L1 distance loss is calculated between the real and fake features of each layer in the discriminator, and the losses of all layers are combined to obtain the final multilevel feature loss.
We suppose that the edge information contained in features at different scales and levels is useful. Therefore, a new objective function is proposed by combining the L1 distance with a multiscale and multilevel feature loss. As shown in Fig. 2, the objective function fully considers the features of different levels, thereby minimizing the difference between the features of real and generated images and enabling the generator to produce an image that is highly similar to the real image. The multiscale and multilevel objective function is shown as follows:
\begin{equation*}
{L_{f\_m}}(G,D) = \frac{1}{n}\sum\nolimits_{i = 1}^n {{E_{x,y,z}}\left[ {{{\left\| {D{f_{(i)}}(x) - D{f_{(i)}}(G(y,z))} \right\|}_1}} \right]} \tag{7}
\end{equation*}
We combine the multiscale and multilevel loss function with the CGAN loss function, and the final ECGAN loss function in this article is
\begin{equation*}
{G^ * } = \arg \mathop {\min }\limits_G \mathop {\max }\limits_D {L_{cGAN}}(G,D) + \lambda {L_{f\_m}}(G,D).\tag{8}
\end{equation*}
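The following PyTorch sketch shows one way to compute the losses in (5)–(8), assuming a discriminator that returns its per-layer feature maps alongside the patch-wise output (a matching discriminator sketch appears in Section II-C); the helper name and the value of λ are illustrative assumptions, not the article's implementation.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
l1 = nn.L1Loss()

def ecgan_losses(D, real_img, fake_img, cond, lam=10.0):
    """Compute the discriminator and generator losses for one batch.

    D(img, cond) is assumed to return (patch_probs, [feat_1, ..., feat_n]),
    i.e., patch-wise real/fake probabilities plus per-layer feature maps.
    """
    # Adversarial (cross-entropy) terms of (5).
    real_prob, real_feats = D(real_img, cond)
    fake_prob, _ = D(fake_img.detach(), cond)
    d_loss = bce(real_prob, torch.ones_like(real_prob)) + \
             bce(fake_prob, torch.zeros_like(fake_prob))

    # Multiscale, multilevel feature loss of (7): average per-layer L1
    # distance between the features of the real and generated images.
    fake_prob_g, fake_feats_g = D(fake_img, cond)
    f_m = torch.stack([l1(ff, fr.detach())
                       for fr, ff in zip(real_feats, fake_feats_g)]).mean()

    # Generator objective of (8): fool the discriminator plus lambda * L_f_m.
    g_loss = bce(fake_prob_g, torch.ones_like(fake_prob_g)) + lam * f_m
    return d_loss, g_loss
```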
C. Network Architecture
The ECGAN model constructed in this article consists of two parts: generator and discriminator networks.
1) Generator Network
In the field of image generation, the input structure of the generator network is required to be roughly aligned with the output structure, and many previous solutions [23]–[26] have used encoder–decoder networks to construct the generator. The U-Net network, with its symmetric structure of skip connections [27], [28], has become the most commonly used generator network. The network combines low-level features with high-level features. As shown in Fig. 3, skip connections are added between layers i and n−i (n is the total number of layers), and each skip connection connects only the channels between the two layers.
This article uses a generator network structure based on U-Net, as shown in Fig. 4, which is composed of seven downsampling convolutional layers and seven upsampling convolutional layers. Downsampling is realized using max pooling layers, and upsampling is realized using upsampling layers with skip connections. In the last layer, the upsampling convolutional layer uses the tanh [29] activation function to output the image, and the other convolutional layers use the leaky ReLU [30] activation function.
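A condensed PyTorch sketch of such a skip-connected generator follows, reduced to three levels instead of the article's seven for brevity; the class name and channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MiniUNetGenerator(nn.Module):
    """Three-level U-Net-style generator (the article uses seven levels)."""

    def __init__(self, in_ch=3, out_ch=3, base=64):
        super().__init__()
        act = nn.LeakyReLU(0.2, inplace=True)
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), act)
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, padding=1), act)
        self.bott = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, padding=1), act)
        self.pool = nn.MaxPool2d(2)                  # downsampling via max pooling
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        # Decoder convs take upsampled features concatenated with skips.
        self.dec2 = nn.Sequential(
            nn.Conv2d(base * 4 + base * 2, base * 2, 3, padding=1), act)
        self.dec1 = nn.Conv2d(base * 2 + base, out_ch, 3, padding=1)

    def forward(self, x):                            # x: even H and W assumed
        e1 = self.enc1(x)                            # level-1 features (skip)
        e2 = self.enc2(self.pool(e1))                # level-2 features (skip)
        b = self.bott(self.pool(e2))                 # bottleneck
        d2 = self.dec2(torch.cat([self.up(b), e2], dim=1))    # skip connection
        out = self.dec1(torch.cat([self.up(d2), e1], dim=1))  # skip connection
        return torch.tanh(out)                       # tanh output, as above
```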
2) Discriminator Network
We focus on the generation of remote sensing images with high spatial resolution, which have complex texture and spatial details. To highlight the high-frequency information of the generated image, this article draws on the idea of PatchGAN [18]: the image in the discriminator is divided into multiple patches, the authenticity of each patch is determined, and the authenticity responses of all patches are averaged as the final output of the discriminator D. The process of the patch-based discriminator is shown in Fig. 5(a). In this way, the local high-frequency information of the image can be considered; this high-frequency information corresponds to the texture features of the remote sensing image. The discriminator (D) network is composed of five convolutional layers. As shown in Fig. 5(b), the first four convolutional layers are each followed by batch normalization [31] and a leaky ReLU function. The last convolutional layer is connected to a sigmoid function to generate the distribution of “0” and “1.”
(a) Process of the patch-based discriminator. The input, which includes the fake image, real image, and condition variables, is divided into multiple N × N patches (N = 70 in this article), and each patch is input into the discriminator to distinguish the difference. Finally, these per-patch responses are averaged to judge the authenticity of the generated image. (b) Network of the discriminator.
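A PyTorch sketch of a five-layer patch-based discriminator in this spirit follows; it also returns the per-layer features assumed by the loss sketch in Section II-B. Channel counts, kernel sizes, and the condition channel count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Five-layer patch-based discriminator: the first four convolutions are
    followed by batch normalization and leaky ReLU, the last by a sigmoid."""

    def __init__(self, img_ch=3, cond_ch=5, base=64):
        super().__init__()
        def block(ci, co):
            return nn.Sequential(
                nn.Conv2d(ci, co, 4, stride=2, padding=1),
                nn.BatchNorm2d(co),
                nn.LeakyReLU(0.2, inplace=True))
        self.blocks = nn.ModuleList([
            block(img_ch + cond_ch, base),
            block(base, base * 2),
            block(base * 2, base * 4),
            block(base * 4, base * 8)])
        self.final = nn.Sequential(
            nn.Conv2d(base * 8, 1, 4, padding=1), nn.Sigmoid())

    def forward(self, img, cond):
        x = torch.cat([img, cond], dim=1)   # concatenate image and conditions
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)                 # per-layer features for loss (7)
        return self.final(x), feats         # one probability per patch
```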
3) Flowchart
The flowchart of the training and testing processes of the proposed method for generating a remote sensing image is illustrated in Fig. 6. Above the broken line in Fig. 6 is the training process of ECGAN. First, random noise and the condition variables (label, boundary features, and edge features) are input to the generator to obtain a fake image with the same size as the real image. Then, the real image is concatenated with the fake image and the condition variables to form the input of the discriminator, which identifies the input as true or false. The adversarial loss considering multiscale and multilevel features is acquired by calculating the difference between the real and fake images in each layer of the discriminator. Meanwhile, the multiscale adversarial and cross-entropy losses are back-propagated using gradient descent, and the network parameters are continually updated until the training process is completed. Below the broken line in Fig. 6 is the testing process of ECGAN. In comparison with the training network, the testing network is relatively simple because the testing process neither updates the network parameters nor requires the discriminator to perform true-or-false discrimination. Therefore, the testing process only needs to input the label, boundary feature map, and edge feature map of the test image into the trained generator to generate the image we need.
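Putting the sketches above together, one training iteration corresponding to the upper half of Fig. 6 might look as follows; the optimizer settings, the data loader, and the form of the noise channel are illustrative assumptions.

```python
import torch

# Hypothetical instances of the generator/discriminator sketched earlier;
# cond is assumed to stack label, boundary, and edge maps into 5 channels.
G = MiniUNetGenerator(in_ch=6, out_ch=3)      # 5 condition + 1 noise channel
D = PatchDiscriminator(img_ch=3, cond_ch=5)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

for real_img, cond in loader:                 # loader is assumed to exist
    noise = torch.randn_like(cond[:, :1])     # one random-noise channel
    fake_img = G(torch.cat([cond, noise], dim=1))

    # Discriminator step: distinguish real from fake (cross-entropy loss).
    d_loss, _ = ecgan_losses(D, real_img, fake_img, cond)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: adversarial loss plus multilevel feature loss.
    _, g_loss = ecgan_losses(D, real_img, fake_img, cond)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```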
Flowchart of the generation of a high-resolution remote sensing image based on ECGAN.
Results and Discussion
A. Experimental Data
Experiments are performed on the Potsdam and Vaihingen 2-D dataset images of ISPRS to assess the performance of the proposed ECGAN method. The Potsdam and Vaihingen datasets, displayed in Figs. 7 and 8, respectively, were both acquired in Germany. While Potsdam shows a typical historic city with large building blocks, narrow streets, and a dense settlement structure, Vaihingen is a relatively small village with many detached buildings and small multistory buildings. Both datasets contain the same six most common land cover categories, labeled in different colors: impervious surfaces (white); buildings (blue); low vegetation (cyan); trees (green); cars (yellow); and backgrounds (red). Potsdam has a spatial resolution of 0.05 m and contains four multispectral bands: red (R), green (G), blue (B), and near-infrared (NIR). Vaihingen has a spatial resolution of 0.09 m and contains three multispectral bands: near-infrared (NIR), red (R), and green (G).
Experiment data 1. (Left) Potsdam dataset containing 38 patches (of the same size, 6000 × 6000). (Right) Data provided by each patch. (a) True image. (b) DSM. (c) Ground truth.
Experiment data 2. (Left) Vaihingen dataset containing 33 patches (of different sizes). (Right) Data provided by each patch. (a) True image. (b) DSM. (c) Ground truth.
B. Evaluation Method and Indicator
1) Evaluation Method
For a long time, evaluating the quality of images produced by generative models has been challenging. Since the proposal of GAN, many methods have been proposed to evaluate the quality of generated images. These methods fall mainly into two categories. One evaluates image quality through statistics of the information entropy and feature divergence distance between generated and real images, such as the Fréchet inception distance [32], kernel MMD [33], and Wasserstein distance [34]. This type of method, based on statistical feature extraction, can evaluate image features according to their presence or absence but not the relative spatial position of the features. The other evaluates the generated data through the classification scores of specific functions, such as the inception score [35]. This type of method [36]–[39] evaluates on the basis of a specific pretrained model, without considering the effect of real data and lacking an authenticity evaluation of the generated images.
In this article, the DeepLab v3 network architecture [40], [41] is used to evaluate the scores of generated images, and we propose two evaluation methods: GAN-test and GAN-train. The GAN-test method trains the DeepLab v3 network on real images and performs classification tests on the images generated using ECGAN. The GAN-train method uses the images generated by ECGAN as training samples for the DeepLab v3 network and performs classification tests on the real images.
2) Evaluation Indicator
In order to quantitatively compare and estimate the capabilities of the proposed models, the overall accuracy (OA), Kappa coefficient [42], and mean intersection over union (MIoU) are used as performance measurements.
The OA refers to the ratio of the number of correctly classified samples to the total number of samples. The Kappa coefficient is an index measuring the degree of agreement between two images; the closer the coefficient is to 1, the better the classification effect. The MIoU is a typical measure for semantic segmentation, evaluated by calculating the ratio of the intersection to the union of the ground truth and the predicted values.
They can be calculated by (9)–(11), respectively:
\begin{align*}
{\rm{OA}} &= \frac{{\sum\nolimits_{i = 0}^n {{x_{ii}}} }}{N} \times 100\% \tag{9}\\
K &= \frac{{N\sum\nolimits_{i = 0}^n {{x_{ii}}} - \sum\nolimits_{i = 0}^n {({x_{i + }}{x_{ + i}})} }}{{{N^2} - \sum\nolimits_{i = 0}^n {({x_{i + }}{x_{ + i}})} }}\tag{10}\\
{\rm{MIoU}} &= \frac{1}{{n + 1}}\sum\nolimits_{i = 0}^n {\frac{{{x_{ii}}}}{{\sum\nolimits_{j = 0}^n {{x_{ij}}} + \sum\nolimits_{j = 0}^n {{x_{ji}}} - {x_{ii}}}}}.\tag{11}
\end{align*}
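As a concrete companion to (9)–(11), the following Python sketch computes the three indicators from a confusion matrix; the function name and the matrix convention are illustrative assumptions.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Compute OA, Kappa, and MIoU from an (n+1)x(n+1) confusion matrix cm,
    where cm[i, j] counts pixels of true class i predicted as class j."""
    N = cm.sum()
    diag = np.diag(cm)
    oa = diag.sum() / N                                   # eq. (9)
    row, col = cm.sum(axis=1), cm.sum(axis=0)             # x_{i+}, x_{+i}
    kappa = (N * diag.sum() - (row * col).sum()) / \
            (N ** 2 - (row * col).sum())                  # eq. (10)
    iou = diag / (row + col - diag)                       # per-class IoU
    miou = iou.mean()                                     # eq. (11)
    return oa, kappa, miou
```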
C. Behavior and Analysis of Condition Variables Based on Edge Feature
1) Visual Analysis
Considering the limitation of GPU memory, for the Potsdam dataset, each patch was cut into 36 images of size 1024 × 1024, of which 400 images are used for training and another 400 for testing in our experiment. For the Vaihingen dataset, each patch was cut into 30 images of size 512 × 512, of which 450 images are used for training and another 450 for testing. In this article, the Adam optimizer with an adaptive learning rate is used to replace the traditional stochastic gradient descent optimization algorithm, and the total number of iterations is set to 500.
We compare the generated results under three different condition variables: using label data only; using label data and boundary features; and using label data, boundary features, and edge features. Figs. 9 and 10 show the generation results of the three condition variables on the Potsdam and Vaihingen images, respectively. As depicted in Figs. 9 and 10, the images generated using only label data as condition variables have a large number of “false” edges, resulting in fuzzy boundaries and insufficient texture; the images generated by adding a boundary feature map eliminate the “false” edge information and enhance the boundaries, but the detailed texture information is still blurred; and the images generated by adding boundary and edge feature maps simultaneously to the condition variables closely restore the true images. In the last case, the texture information and details of the images are rich and realistic, and the edges of the features are clearly visible. For example, small targets attached to a roof, a zebra crossing on the road, and vegetation texture details are clear and realistic, almost perfectly reproducing the real image features.
Generation Results of Potsdam images with three different condition variables. (a) Label data. (b) Real remote sensing image. (c) Image generated using only the label condition variable. (d) Image generated using label and boundary feature condition variables. (e) Image generated using label, boundary feature, and edge feature condition variables.
Generation Results of Vaihingen images with three different condition variables. (a) Label data. (b) Real remote sensing image. (c) Image generated using only the label condition variable. (d) Image generated using label and boundary feature condition variables. (e) Image generated using label, boundary feature, and edge feature condition variables.
2) Quality Evaluation
We apply the two evaluation methods of “GAN-test” and “GAN-train” to analyze the classification accuracy of the real images and the images generated using the different condition variables in Section III-C1 and then to conduct quality evaluation.
In the GAN-test experiment, 5000 real images are used for training, and 100 images are selected separately from the data generated under each of the three condition variables and from the real data as testing data. In the GAN-train experiment, 5000 images are selected from the data generated under each of the three condition variables and from the real data, respectively, for separate training, and 100 real images are used for testing.
Figs. 11 and 12 qualitatively illustrate the GAN-test and GAN-train classification results of the images generated under the three condition variables on the Potsdam dataset. As depicted in Fig. 11, the GAN-test classification results of the images generated using only label condition variables are poor: the misclassification phenomenon is serious, and the number of noise categories is excessive. After the boundary feature condition variable is added, the result is significantly improved, but many misclassifications still remain. The classification results of the images generated by adding boundary and edge feature condition variables simultaneously are closest to the ground truth map; the misclassification phenomenon is reduced, and the noise information after classification is significantly reduced. As depicted in Fig. 12, with the addition of boundary and edge feature condition variables, the GAN-train classification effect gradually improves, and the misclassification phenomenon gradually decreases. For instance, the category of trees is difficult to recover in the classification results when only label condition variables are used, and most trees are mistakenly classified as buildings and low vegetation. After boundary features are added, the classification effect of trees is greatly improved, and the misclassification of buildings and low vegetation is significantly reduced; however, the classification of impervious surfaces and cars remains poor. After edge features are added, except for the background noise category, all categories present enhanced classification results, which not only reduce the misclassification of buildings, low vegetation, and trees but also accurately identify impervious surfaces and cars. For the Vaihingen dataset, the proposed method with boundary and edge feature condition variables also achieves the best GAN-test and GAN-train results, as shown in Figs. 13 and 14.
GAN-test classification results of Potsdam images with different condition variables. (a) Remote sensing image with high spatial resolution. (b) Ground truth. (c) GAN-test results of the generated images with label condition variables. (d) GAN-test results of the generated images with label and boundary feature condition variables. (e) GAN-test results of the generated images with label, boundary feature, and edge feature condition variables.
GAN-train classification results of Potsdam images with different condition variables. (a) Remote sensing image with high spatial resolution. (b) Ground truth. (c) GAN-train results of the generated images with label condition variables. (d) GAN-train results of the generated images with label and boundary feature condition variables. (e) GAN-train results of the generated images with label, boundary feature, and edge feature condition variables.
GAN-test classification results of Vaihingen images with different condition variables. (a) Remote sensing image with high spatial resolution. (b) Ground truth. (c) GAN-test results of the generated images with label condition variables. (d) GAN-test results of the generated images with label and boundary feature condition variables. (e) GAN-test results of the generated images with label, boundary feature, and edge feature condition variables.
GAN-train classification results of Vaihingen images with different condition variables. (a) Remote sensing image with high spatial resolution. (b) Ground truth. (c) GAN-train results of the generated images with label condition variables. (d) GAN-train results of the generated images with label and boundary feature condition variables. (e) GAN-train results of the generated images with label, boundary feature, and edge feature condition variables.
Tables I–IV list the GAN-test and GAN-train scores (OA, Kappa, and MIoU) of the generated images for the Potsdam and Vaihingen datasets, where the best scores are marked in bold. For the Potsdam data, as given in Table I, the GAN-test OA, Kappa, and MIoU of the images generated by adding boundary and edge feature condition variables simultaneously are 73.4%, 0.655, and 0.453, respectively, the highest of the three condition variables and 16.2%, 0.192, and 0.132 higher than those of the images generated using only the label condition variable. These scores are the closest to those of the real images. In particular, the classification accuracy of impervious surfaces, buildings, and trees is greatly improved compared with that of the images generated using the two other condition variables. As given in Table II, the GAN-train OA, Kappa, and MIoU of the images generated by adding boundary and edge feature condition variables simultaneously are 75.0%, 0.672, and 0.537, respectively, also the best of the three condition variables. In particular, the MIoU even exceeds the classification result of the real images by 0.016. For the Vaihingen data, the proposed method with the edge condition also achieves the highest scores. Table III shows that its GAN-test scores are 6.5%, 0.092, and 0.064 higher than those of the label-only condition. As given in Table IV, the GAN-train OA, Kappa, and MIoU of the proposed method with the edge condition are 90.4%, 0.867, and 0.641, respectively, which not only are the best of the three condition variables but also exceed the scores of the real images (83.8%, 0.786, and 0.586). In particular, the accuracy of the impervious surface and tree categories of the proposed method is greatly improved, 3.2% and 18.4% higher than that of the real images, respectively.
D. Behavior and Analysis of Different Loss Functions
1) Visual Analysis
The images generated using two loss functions are evaluated through GAN-test by using the same training samples of the Potsdam dataset. The loss functions are the CGAN loss function with L1 distance (L1+CGAN) and the CGAN loss function that considers multiscale and multilevel features (f_m+CGAN). The generated images are shown in Fig. 15. They restore the real images well in terms of brightness, color, and texture characteristics; in particular, the texture of trees is clear and intricate.
Generated results of different objective functions. (a) Ground truth. (b) True remote sensing image. (c) Generated images with the objective function combining L1 and CGAN (L1+CGAN). (d) Generated images with the CGAN objective function considering multiscale and multilevel features (f_m+CGAN). The red oval highlights that (d) has richer texture detail features than (c), especially for vegetation, building surfaces, and impervious surfaces.
2) Quality Evaluation
Table V gives the OA, Kappa, and MIoU scores of the images generated using the two loss functions. The images generated using the CGAN loss function that considers multiscale and multilevel features show a significant improvement over the images generated using the combination of L1 and CGAN (except for the slightly lower accuracy of the car category). Moreover, the problem that the generative model is insensitive to background noise information is alleviated, and a good classification capability is shown for the background noise category, which is improved by 23.2%. In terms of overall scoring, the OA increases by 1.6%, the Kappa coefficient by 0.025, and the MIoU by 0.063.
E. Analysis of Sample Augmentation Experiment
The traditional sample augmentation methods mainly include image rotation (multiangle), image scaling, and randomly adding noise. These methods mainly expand the data by changing the spatial scale and imaging angle of the original image, which improves the classification accuracy to a certain extent. In this article, the traditional method of image augmentation is divided into three steps (a code sketch follows the list):
1) the images are randomly rotated by one of three angles (90°, 180°, 270°);
2) random Gaussian noise is added to the images; and
3) random Gaussian noise is added to the image first, and then the image is randomly rotated by one of the three angles.
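A minimal Python sketch of these three steps follows; the noise standard deviation and the random-choice logic are illustrative assumptions.

```python
import numpy as np

def add_gaussian_noise(image, rng, sigma=10.0):
    """Add zero-mean Gaussian noise and clip to the valid pixel range."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def traditional_augment(image, rng=None):
    """Apply one of the three traditional augmentation steps at random."""
    if rng is None:
        rng = np.random.default_rng()
    step = rng.integers(3)
    k = int(rng.integers(1, 4))              # 1, 2, or 3 quarter turns
    if step == 0:                            # step 1): rotation only
        return np.rot90(image, k=k).copy()
    if step == 1:                            # step 2): noise only
        return add_gaussian_noise(image, rng)
    # step 3): noise first, then rotation
    return np.rot90(add_gaussian_noise(image, rng), k=k).copy()
```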
We combine ECGAN image generation with the traditional augmentation method to obtain a new sample augmentation method. Based on the DeepLab v3 segmentation network, the proposed method is analyzed and compared with the traditional augmentation methods. Two experiments are performed separately on the two datasets in this section. For the Potsdam dataset, first, 4800 images of size 1024 × 1024 augmented using the traditional methods are selected from 15 patch images as training samples, and the other 23 patch images are used for testing; second, from the same 15 patch images, 2400 images of size 1024 × 1024 augmented using the traditional methods and 2400 images of size 1024 × 1024 generated using ECGAN are selected as training samples, and the other 23 patch images are used for testing. Taking a patch image of size 6000 × 6000 as an example, as shown in Fig. 16, the proposed augmentation method achieves an excellent segmentation result. In the classification results of the proposed method, the background noise category (red) is obviously reduced, and the impervious surface class (white) and building class (blue) are also distinguished more effectively. For the Vaihingen dataset, first, 8000 images of size 512 × 512 augmented using the traditional methods are selected from 18 patch images as training samples, and the other 15 patch images are used for testing; second, from the same 18 patch images, 4000 images of size 512 × 512 augmented using the traditional methods and 4000 images of size 512 × 512 generated using ECGAN are selected as training samples, and the other 15 patch images are used for testing. Taking a patch image of size 2659 × 2575 as an example, as shown in Fig. 17, the proposed augmentation method also achieves excellent segmentation results, obtaining the best segmentation for the white category (impervious surface) and red category (background and noise).
Classification results of Potsdam image with different means of augmentation. (a) Truth remote sensing image. (b) Ground truth. (c) Classification results using traditional sample augmentation methods (Tra_ext). (d) Classification results of the combination of traditional sample and ECGAN ((Tra+ECGAN)_ext) augmentation methods.
Classification results of Vaihingen image with different means of augmentation. (a) Truth remote sensing image. (b) Ground truth. (c) Classification results using traditional sample augmentation methods (Tra_ext). (d) Classification results of the combination of traditional sample and ECGAN ((Tra+ECGAN)_ext) augmentation methods.
Tables VI and VII list the per-category accuracy, OA, Kappa, and MIoU of the two augmentation methods on the Potsdam and Vaihingen datasets, respectively. For the Potsdam dataset, the OA, Kappa, and MIoU of the proposed augmentation method are 4.1%, 0.054, and 0.058 higher than those of the traditional augmentation methods. In particular, the accuracy of the “impervious surface” and “low vegetation” categories of the proposed method is significantly improved, by 8.4% and 7.7%, respectively. For the Vaihingen dataset, the scores of the proposed augmentation method also improve significantly, with the OA, Kappa, and MIoU being 3.4%, 0.044, and 0.045 higher than those of the traditional methods. The accuracy of the “impervious surface,” “tree,” and “car” categories is significantly improved.
Conclusion
This article presents an end-to-end ECGAN for generating remote sensing images with high spatial resolution. In the proposed model, interclass boundary and intraclass edge feature factors are added to condition variables to highlight the texture features of the generated image. In addition, a loss function combining multilevel feature and cross-entropy loss is designed to minimize the difference between the features of real and generated images. The effectiveness of the proposed method for generating images is verified via experiments by using Potsdam and Vaihingen 2-D dataset images of ISPRS. The conclusions are as follows.
1) The discriminator of ECGAN proposed in this article can fully consider multiscale and multilevel hierarchical features.
2) ECGAN can generate remote sensing images with high spatial resolution that are highly similar to real images, which have clear space and texture, close colors, and accurate boundaries.
3) ECGAN is an effective method for the sample augmentation of remote sensing image data with high spatial resolution, which maintains the accuracy of supervised classification under the condition of limited samples and alleviates the unsatisfactory classification effect that arises when supervised classification samples are insufficient.