Spectral–Spatial Attention Feature Extraction for Hyperspectral Image Classification Based on Generative Adversarial Network

Abstract:

Recent research shows that deep learning frameworks based on the generative adversarial network (GAN) can improve the accuracy of hyperspectral image (HSI) classification with limited labeled samples. However, several studies point out that existing GAN-based methods are heavily affected by the complexity of HSIs and by inefficient descriptions of them. The discriminator in a GAN always attempts to interpret the high-dimensional nonlinear spectral knowledge of HSIs, which results in the Hughes phenomenon. Another critical issue is sample generation. The generator is only used as a regularizer for the discriminator, which seriously restricts classification performance. In this article, we propose SSAT-GAN, a semisupervised spectral–spatial attention feature extraction approach based on the GAN that feeds raw data into a deep learning framework in an end-to-end fashion. First, unlabeled data are added to the discriminator to alleviate the shortage of training samples, and adversarial training supplies a reconstructed real HSI data distribution. Second, to enhance the description of HSIs, we build spectral–spatial attention modules (SSAT) and extend them to the discriminator and the generator to extract discriminative characteristics from abundant spatial contexts and spectral signatures. The SSAT modules learn a three-dimensional filter bank with spectral–spatial attention weights to obtain meaningful feature maps that improve the discrimination of the feature representation. To address the mode collapse of GANs, the mean minimization loss is employed for unsupervised learning. Experimental results on three real datasets indicate that SSAT-GAN has certain advantages over state-of-the-art methods.

Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing ( Volume: 14)

Page(s): 10017 - 10032

Date of Publication: 28 September 2021

DOI: 10.1109/JSTARS.2021.3115971

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

Hyperspectral imagery (HSI) captures hundreds of narrow, contiguous spectral bands from the surface, which provide abundant characteristics for identifying ground materials [1]. With the rapid development of high-resolution imaging technology, HSI has become an ideal tool for surveying the surface, with a broad range of applications including mineral identification [2], monitoring of plant diseases [3], anomaly detection [4], and land-cover mapping [5]. HSI classification plays a substantial role in these fields: it analyzes the discriminative characteristics of HSI and assigns each pixel to its corresponding land-cover category [6]. Two major characteristics of HSI should therefore be considered. First, the high-dimensional nonlinear spectral signature, which originates from redundant spectral bands, enables the accurate distinction of homologous surface categories. Second, the high spatial correlation, which derives from homogeneous regions, provides auxiliary spatial context for accurate pixelwise classification [7].

Since spectral information natively reflects the characteristics of different materials, one set of traditional methods produces classification maps in a pixelwise way, in two steps: 1) feature engineering, such as principal component analysis (PCA) [8] and band selection [9]; and 2) classifier development, including the support vector machine (SVM) [10] and random forest [11]. This kind of approach is constrained by the high-dimensional nonlinear characteristics of HSIs, which leads to unsatisfactory results. To further improve the representation of HSIs, another set of approaches exploits spectral–spatial expression by introducing spatial contexts in the feature engineering step. For instance, Kang et al. [12] proposed a feature fusion framework that combines edge-preserving filtering (EPF) with an SVM. Jiang et al. [13] regarded the superpixel as a carrier to extract potential features. However, the models mentioned above consist of shallow structures, which cannot provide an efficient description.

With the advancement of artificial intelligence, approaches based on convolutional neural networks (CNNs) have attracted increasing attention because their objective functions aim directly at classification, rather than relying on two independent steps, and they obtain remarkable results [14], [15]. In 2016, Zhao et al. [16] adopted a CNN to learn local spatial contexts for HSI classification. Chen et al. [17] designed a 3-D CNN to extract features from neighboring spectral cubes taken from the HSIs themselves instead of from dimensionality-reduced data. Nonetheless, a deeper network may lead to the Hughes phenomenon when the spectral–spatial distribution is complex and training samples are scarce.

Meanwhile, with the development of deep learning, a series of deep-learning-derived methods have been applied to HSI classification and proven successful. Many classification frameworks obtain superior results by constructing highly efficient spectral–spatial feature extraction. For instance, Zhong et al. [18] built a spectral–spatial residual network (SSRN) to reduce the complexity of the network design and achieved advanced performance. In [19], a dense convolutional block was employed for accurate identification. A 3D-Conv-Capsule model [20] was presented for HSI classification, which considers pixel position attributes to enhance spatial awareness. In addition, Sellami et al. [21] constructed a spectral–spatial graph to fully exploit the inherent spatial distribution.

Another line of approaches accomplishes spectral–spatial classification by exploiting attention mechanisms, which perform classification after aggregating features from homogeneous regions. Xu et al. [22] designed a control gate attention mechanism for the quick acquisition of key features. In [23], a spectral–spatial classification framework was proposed that combines a CNN with a self-attention module to enhance the correlation of features. In [24], a multiattention fusion network (MAFN) was designed to mine significant features for classification. Yu et al. [25] presented a dense CNN framework with a feedback attention mechanism to further improve computational efficiency. However, the attention weight embedding was placed after the spectral–spatial representation, which let interference pixels and redundant spectral bands influence the result. He et al. [26] designed HSI-BERT to capture global dependence among pixels within the receptive field. However, this transformer-based method needs multiple nonlocal areas to capture global long-term dependence.

In contrast to classical optical image classification tasks in computer vision, which involve hundreds of categories, the land cover classification of HSI has far fewer target classes. Therefore, the premise that deep learning requires a large amount of training data might not apply to HSIs, which lack labeled samples. Several works focus on semisupervised learning that uses both labeled and unlabeled HSI samples for training. For instance, Fang et al. [27] presented a resampling strategy for training a CNN sufficiently. In [28], the uncertainty of unlabeled HSI samples is considered for classification. Although these studies have obtained significant results, their gains may stem from regions with highly correlated spatial context rather than from the deep learning methods themselves.

Recently, generative adversarial networks (GANs) have been applied to HSI classification to alleviate the issue of limited labeled samples. Specifically, GAN-based classifiers start from the semisupervised HS-GAN proposed by Zhan et al. [29], which used 1-D spectral vectors as the input. To exploit the benefit of spatial information, a neighborhood majority voting strategy [30] was later applied to the prediction. He et al. [31] built a 3-D bilateral filtering-based GAN framework to improve spatial awareness. A 3D-GAN was proposed for HSI classification that keeps only the first three principal components of the raw data as input. In [33], a semisupervised GAN with a conditional random field (GAN-CRF) was designed that regards the softmax prediction as conditional probabilities of HSI to refine classification maps. To enhance meaningful semantic contexts, an adaptive DropBlock-enhanced GAN (AD-GAN) [34] was established to stabilize the training state of the model.

Although these GAN-based methods have outperformed contemporaneous benchmarks, two drawbacks in HSI classification remain to be solved.

The first challenge is the mode collapse of GANs. The generator G deceives the discriminator D by generating data from the limited labeled data distribution [35]. The restricted, narrow, and redundant spectral signatures limit the representation ability of the GAN and lead to poor data generation. In Wang's work [34], an adaptive DropBlock is employed as a regularization method to alleviate mode collapse. However, supervised GANs generate a data distribution similar to that of the labeled training samples and thus find it difficult to learn the complete real HSI distribution. In addition, the unlabeled data of HSI remain an unexploited gold mine for efficient data utilization. Recently, in response to this characteristic, Liang et al. [36] implemented the mean minimization loss, which places a constraint on the unlabeled data of HSI, and achieved superior results. A likely reason is that this loss minimizes the values and variances of the high-dimensional feature maps from D. As a result, the GAN model is barely affected by complex parameter calculation, which helps keep the training state stable.

Another critical issue is the complexity and inefficient description of spectral–spatial characteristics. Classification performance tends to deteriorate when the extraction of spectral–spatial characteristics is affected by interference pixels. Therefore, it is hard to guarantee that the GAN always works toward the authentic HSI distribution, particularly for high-dimensional spectral signatures or texture-dependent contexts. In Feng's work [37], a joint spatial–spectral hard attention mechanism was employed in G to help D discard misleading and confounding information for HSI classification. However, it focused only on a specific area of the input patches in one batch, which requires a more complex training procedure. In a separate line of work, an attention-aware block [38] was designed in ResNets to enhance the representation of HSI data, and it was shown to learn more valuable and valid representations. However, when dealing with objects that have variable spectra or irregular areas, this attentive architecture is inefficient. We argue that if the homogeneous spectrum and adaptive receptive fields are taken into account, the complexity issue of HSI data can be alleviated.

To tackle the above-mentioned challenges of GAN-based methods, we propose a spectral–spatial attention feature extraction approach based on GANs (SSAT-GAN) for HSI classification. The proposal aims to build a significant representation of spectral–spatial characteristics and to enhance the robustness and stability of GANs through semisupervised learning. On the one hand, SSAT-GAN takes unlabeled data into account to alleviate the scarcity of labeled samples, which enables the generator G to implicitly reconstruct real HSI cubes. Meanwhile, we adopt the mean minimization loss as an unsupervised constraint term in the discriminator D to avoid overfitting. On the other hand, the complicated spectral–spatial characteristics of local adjacent pixels cause redundancy and inefficiency, which degrade classification in more complex regions. Inspired by the fact that attention weights can enhance the effective representation of the salient neighborhood of an object, we design separate spectral and spatial attention modules (SSAT) to capture discriminative representations. Both intraspectrum and contextual relations of HSIs participate in the attention calculation through feedback, and the weighted feature maps are used to enhance intraclass consistency. We then extend the SSAT to consecutive feature spread and feature generation blocks, which are stacked to build D and G, respectively. Unlike traditional semisupervised GANs, which require a deeper convolutional architecture for feature representation, our proposal is feature-efficient because both D and G share parameter weights with the corresponding attention modules, which further improves the feature description. As a result, the well-trained D can achieve satisfactory classification accuracy.

The main contributions of this article are listed as follows.

1) We design a novel semisupervised GAN-based HSI classification framework that uses a small amount of labeled data together with unlabeled data for training. The mean minimization loss is employed for unsupervised learning, which boosts the backpropagation of the gradient and stabilizes the training of the GAN.
2) To alleviate the inefficient description, we integrate spectral–spatial attributes into the SSAT to obtain a discriminative representation of the HSI data.
3) The alternately optimized architecture design makes SSAT-GAN a framework that generalizes well on three real HSI datasets and achieves satisfactory classification accuracy compared with state-of-the-art methods.

The rest of this article is organized as follows. Section II reviews the basic concepts of GANs. The scheme of the proposed SSAT-GAN and its components are introduced in Section III. Experimental results and analysis are presented in Section IV. The superiority of SSAT-GAN is discussed in Section V. Finally, the conclusion is drawn in Section VI.

SECTION II.

Related Work

A. Generative Adversarial Network

GAN is an unsupervised deep learning model proposed by Goodfellow et al. [39], which provides a reasonable scheme to implicitly estimate the real data distribution. A GAN incorporates a generator G and a discriminator D in a unified network, where G generates samples to fool D into accepting them as real, and D judges the genuineness of the samples. These opposing objectives drive G and D toward a Nash equilibrium in a zero-sum game, which is expressed as the minimax optimization problem $\begin{equation*} \begin{split} \underset{G}{\text{min}}\,\underset{D}{\text{max}}\,\text{Loss} = & \text{E}_{{\bf z}\sim p_{z}}\left[ \text{log}\left(1-D\left(G\left({\bf z} \right) \right) \right) \right] \\ & +\text{E}_{{\bf x}\sim p_{\text{data}}}\left[ \text{log}D\left({\bf x} \right) \right] \end{split} \tag{1} \end{equation*}$ where ${\bf z}\sim p_{z}$ and ${\bf x}\sim p_{\text{data}}$ denote the random noise vectors and the input images following the real data distribution, respectively, and $\text{E}(\cdot)$ is the expectation. $D({\bf x})$ is the sigmoid output of D for a real input ${\bf x}$, and $G({\bf z})$ is the synthetic data produced by G from random noise. $D(G({\bf z}))$ is the probability, as estimated by D, that an input derived from $G({\bf z})$ is real.

In the optimization process of a GAN, G and D are optimized alternately. Given the output $G({\bf z})$ of a fixed G, the model optimizes D by maximizing $\text{E}_{{\bf x}\sim p_{\text{data}}}[ \text{log}D({\bf x}) ]+\text{E}_{{\bf z}\sim p_{z}}[ \text{log}(1-D(G({\bf z}))) ]$. When D arrives at a stationary score, G is optimized by minimizing $\text{E}_{{\bf z}\sim p_{z}}[ \text{log}(1-D(G({\bf z}))) ]$. As D and G approach the Nash equilibrium during adversarial training, the GAN learns a probability estimate of the real data and produces promising results.
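To make the two objectives concrete, the following minimal Python sketch estimates the value in (1) from batches of discriminator outputs. It is an illustration rather than the authors' implementation: the function names `gan_value` and `generator_objective`, the small constant `eps` used to avoid `log(0)`, and the toy probabilities are all assumptions introduced here.

```python
import math

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of Eq. (1) from sigmoid outputs of D.

    d_real: D(x) for a batch of real samples.
    d_fake: D(G(z)) for a batch of generated samples.
    eps:    small constant guarding against log(0) (assumption of this sketch).
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_objective(d_fake, eps=1e-12):
    """The term of Eq. (1) that G minimizes: E_z[log(1 - D(G(z)))]."""
    return sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)

# D maximizes the value: confident, correct outputs push it toward 0.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
# A discriminator that cannot separate real from fake outputs 0.5 everywhere.
undecided = gan_value([0.5, 0.5], [0.5, 0.5])
print(confident, undecided)
```

When D outputs 0.5 for every sample, both expectations equal log 0.5, so the estimate is -log 4, the value Goodfellow et al. derive at equilibrium. The generator's term decreases as D(G(z)) grows, which is why minimizing it pushes G toward samples that D accepts as real.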

SECTION III.

SSAT-GAN Framework

The SSAT-GAN flowchart is shown in Fig. 1. Suppose the raw HSI dataset ${\bf X}$ contains $m$ pixels $\lbrace {\bf x}_{1},{\bf x}_{2},{\bf x}_{3},\ldots,{\bf x}_{m} \rbrace$ with ${\bf x}_{i}\in \mathbb{R}^{1\times 1\times b}$, where $b$ is the number of spectral bands. The neighboring cubes centered at the labeled pixels form the labeled dataset ${\bf X}^{1}=\lbrace {\bf x}_{i}^{1} \rbrace \in \mathbb{R}^{w\times w\times b\times m_{l}}$, and the unlabeled cubes form ${\bf X}^{2}=\lbrace {\bf x}_{i}^{2} \rbrace \in \mathbb{R}^{w\times w\times b\times m_{u}}$, where $w$, $m_{l}$, and $m_{u}$ are the spatial size of the HSI cubes and the numbers of labeled and unlabeled HSI samples, respectively. We send these two datasets to the discriminator to learn the real distribution of HSI. The generator synthesizes HSI cubes ${\bf Z}=\lbrace {\bf z}_{1},{\bf z}_{2},{\bf z}_{3},\ldots,{\bf z}_{m} \rbrace$, matching the size of ${\bf X}^{2}$. In addition, the labeled ${\bf X}^{1}$ has a corresponding annotation ${\bf Y}^{1}=\lbrace {\bf y}_{i}^{1} \rbrace \in \mathbb{R}^{(1+n_{y})\times m_{l}}$, where $n_{y}$ is the number of land cover categories and ${\bf y}_{i}^{1}[0]$, the first item of ${\bf y}_{i}^{1}$, indicates the authenticity of the corresponding HSI cube. The classification prediction for the HSI is carried out with the well-trained discriminator.
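The data layout described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' preprocessing code: the helper names `extract_cube` and `make_annotation`, the reflect padding at image borders, an odd window size `w`, the convention that an authenticity entry of 1 means "real", and the toy scene are all hypothetical choices made here.

```python
import numpy as np

def extract_cube(image, row, col, w):
    """Return the w x w x b neighborhood centered at pixel (row, col).

    Assumes w is odd; border pixels use reflect padding (an assumption).
    """
    r = w // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    # In padded coordinates the center sits at (row + r, col + r).
    return padded[row:row + w, col:col + w, :]

def make_annotation(class_index, n_y, is_real=True):
    """Build a (1 + n_y)-vector: entry 0 flags authenticity, the rest is one-hot."""
    y = np.zeros(1 + n_y, dtype=np.float32)
    y[0] = 1.0 if is_real else 0.0  # assumed convention: 1 = real cube
    y[1 + class_index] = 1.0
    return y

# Toy scene: 6 x 5 pixels, b = 4 bands, n_y = 3 classes, w = 3.
image = np.arange(6 * 5 * 4, dtype=np.float32).reshape(6, 5, 4)
labeled_pixels = [(0, 0, 2), (3, 2, 0)]  # (row, col, class index)

# Stack along the last axis to match the w x w x b x m_l layout in the text.
X1 = np.stack([extract_cube(image, r, c, 3) for r, c, _ in labeled_pixels], axis=-1)
Y1 = np.stack([make_annotation(k, 3) for _, _, k in labeled_pixels], axis=-1)
print(X1.shape, Y1.shape)
```

The unlabeled set would be built the same way from pixels without annotations, giving an array whose last axis has length m_u.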

Fig. 1. Flowchart of the SSAT-GAN framework for HSI classification. First, the unlabeled group ${\bf X}^{2}$ is established to initialize the parameters of the discriminator, and the generator transforms the noise vectors ${\bf z}$ into a set of fake HSI cubes ${\bf Z}$, implicitly learning the real HSI distribution. Then, the discriminator attempts to identify the authenticity of the input HSI cubes, which derive from ${\bf X}^{2}$ or ${\bf Z}$. Finally, the categorical information $\hat{{\bf Y}}$ is predicted by the discriminator, which is fed the labeled ${\bf X}^{1}$ during training. The corresponding annotation ${\bf Y}^{1}$ is used for evaluation and to compute the supervised part of the GAN loss.
