
Spectral–Spatial Joint Feature Extraction for Hyperspectral Image Based on High-Reliable Neighborhood Structure



Abstract:

In recent years, semisupervised spectral–spatial feature extraction (FE) methods for hyperspectral image (HSI) classification have shown promising performance by combining spectral–spatial information and label information. A problem that has not been addressed satisfactorily is how to effectively and collaboratively use the abundant information contained in hyperspectral data to achieve better HSI classification performance. In this article, a novel FE method based on joint spectral–spatial information is proposed for HSI classification, which consists of the following steps. First, an effective re-expression of the original data is constructed by incorporating texture features extracted by extended multiattribute profiles with the original HSI. Thus, every pixel can be described by diverse and complementary information in the spectral–spatial domain. Then, an improved neighborhood preserving embedding (NPE) is proposed to establish a relatively accurate reconstruction model and mine a highly reliable neighborhood structure from a global perspective via a new distance metric, which incorporates spectral bands, texture features, and geographical information simultaneously. Finally, low-dimensional and highly discriminative features for HSI classification are obtained by combining the scatter matrices of local Fisher discriminant analysis based on the labeled samples and the improved NPE based on the whole data. Experimental results on three real-world HSI datasets show that the proposed method can effectively utilize both the label information and the spectral–spatial information and, hence, achieve much better classification performance compared to conventional FE methods and some state-of-the-art spectral–spatial classification methods.
Topic: Hyperspectral imaging and data exploitation
Page(s): 9609 - 9623
Date of Publication: 14 September 2021


SECTION I.

Introduction

Hyperspectral images (HSIs) are widely applied in various fields, such as crop monitoring, urban planning, mineral exploration, and resource management [1], [2], due to their abundant spectral information. These applications usually require the analysis and identification of each pixel in the same scene of an HSI. However, HSI classification easily suffers from the Hughes phenomenon as the dimensionality increases [3]. Feature extraction (FE) has been demonstrated to be an effective processing step to mitigate the Hughes effect and the curse of dimensionality [4]. In general, FE methods can be divided into three categories: supervised, unsupervised, and semisupervised ones [5]. By using class information from labeled samples, supervised FE methods, such as matrix discriminant analysis [6], nonparametric weighted FE [7], linear discriminant analysis (LDA) [8], and local Fisher discriminant analysis (LFDA) [9], can extract a low-dimensional representation from the data and assign every sample in the HSI a specified semantic category according to its content. However, it is labor-intensive and time-consuming to acquire sufficient labeled samples for satisfactory classification accuracy. In contrast, unsupervised FE methods such as principal component analysis (PCA) [10], minimum noise fraction [11], locality preserving projection [12], and neighborhood preserving embedding (NPE) [13] can exploit the geometric properties of the data without requiring any prior knowledge of the ground reference. However, the lack of class information provided by labeled samples results in poor classification performance. Therefore, semisupervised FE methods, which utilize both the semantic information of limited labeled samples and the structural information of the available unlabeled samples, are more suitable for HSI FE and classification.

As a classical semisupervised FE method, semisupervised local discriminant analysis (SELD) combines supervised and unsupervised FE by using LDA to maximize class discrimination and NPE to preserve the local structure information [14]. On this basis, an improved method, namely weighted semisupervised local discriminant analysis (weighted SLDA), has been proposed by replacing LDA with LFDA and introducing a weight coefficient that balances the contributions of labeled and unlabeled samples [15]. Additionally, some sparse representation (SR) methods have been employed to extract features from HSIs. Huang and Yang [16] proposed sparse discriminant embedding, which combines SR and the intermanifold structure of data to improve data separability. Luo et al. [17] proposed a semisupervised method called semisupervised sparse manifold discriminative analysis, which uses manifold-based SR and graph embedding. As shown in Fig. 1, the phenomenon of "same class with different spectra, and different classes with similar spectra" is common in HSIs. FE and classification methods based merely on a spectral similarity metric may therefore lead to under-classification or over-classification. Hence, the aforesaid methods, which rely only on spectral information, are not effective for classifying HSIs [18].

Fig. 1. Spectral curves of different pixels in the same scene of HSI.

Considering that it is unreliable to classify HSIs merely by spectral information, spectral–spatial methods have gained considerable attention in the field of HSI FE and classification over the years. In addition to spectral information, HSIs contain abundant spatial information, including geometric structures [19], shape [20], and texture features [21], and pixels close to each other in space are likely to have the same label. For this reason, quite a few semisupervised FE methods have been developed to extract spectral–spatial features for improving the representation and classification performance of HSIs [18], [22]. Imani and Ghassemian [23] stacked texture features, extracted by extended multiattribute profiles (EMAP) from the components obtained by feature spatial discriminant analysis, with the original spectral bands for HSI classification. Kang et al. [24] proposed a novel spectral–spatial classification framework based on edge-preserving filtering (EPF). A novel method called invariant attribute profiles (IAPs), which locally extracts invariant features from HSIs in both the spatial and frequency domains to enhance the spatially semantic information for HSI processing and analysis, is proposed in [25]. Gao et al. [26] stacked spectral features with spatial features extracted by local binary patterns (LBP); the resulting high-dimensional vectors are then fed into random multigraphs (RMGs) for classification. In addition, spectral–spatial FE methods based on SR or collaborative representation (CR) have also been proposed for HSI classification tasks. Considering that regions of different scales incorporate complementary yet correlated information for classification, Fang et al. [27] presented a multiscale adaptive SR model that exploits spatial information at multiple scales via an adaptive sparse strategy. Liu et al. [28] employed a neighboring filtering kernel in spectral–spatial kernel SR for enhanced classification of HSIs. Jiang et al. [29] proposed spatial-aware CR, which directly incorporates spatial information by adding a spatial regularization term to the representation objective function for HSI classification, and further developed a joint spatial-aware CR (JSaCR) model that takes into consideration the contextual information of the center pixel. A novel CR-based spatial–spectral approach named probabilistic-kernel collaborative representation classification (PKCRC), which can cover different analysis scenarios by means of a fully adaptive processing chain, is proposed in [30] for HSI classification. In addition, Zhou et al. [31] proposed a spatial peak-aware collaborative representation (SPaCR) method for HSI classification, which incorporates spectral–spatial information among superpixel clusters into regularization terms to construct a new CR-based closed-form solution.

Besides, many manifold-learning-based FE methods have been proposed to fuse spectral and spatial information for boosting HSI classification performance [32]. Wei et al. [33] proposed a spatial coherence NPE method, which considers the spatial context of pixels by adopting the difference between the surrounding patches of pixels and then maps the data into the low-dimensional space through an optimized locally linear embedding. Feng et al. [34] defined discriminative spectral–spatial margins to reveal the local information of hyperspectral pixels and explored the global structure of both labeled and unlabeled data via low-rank representation. Zhou et al. [35] proposed a spatial and spectral regularized local discriminant embedding method, which describes the local similarity information by integrating a spectral-domain regularized local preserving scatter matrix and a spatial-domain local pixel neighborhood preserving scatter matrix. Cao and Wang [36] combined supervised loss with graph context and learned the underlying manifold representation and a semisupervised classifier simultaneously for semisupervised HSI classification. Huang et al. [37] first employed a weighted mean filter to preprocess the image; a spatial–spectral combined distance was then used to fuse the spatial and spectral information to select the neighbors of each pixel, and finally, manifold reconstruction was performed and the low-dimensional discriminative features were extracted for classification. Zhang et al. [38] first obtained features by applying PCA to the normalized input HSI and utilized guided filtering to extract the spatial features of each band separately; the extracted spatial features are then superimposed, and the low-dimensional embedding is completed through LFDA for spectral–spatial joint classification of HSI. Indeed, the above FE methods based on manifold learning introduce spatial information as a complement to spectral information to facilitate the classification task. However, they usually use the spatial neighborhood relationship of the HSI data only within a specific area, i.e., a certain spatial window, while ignoring the influence of spatial information on the construction of the adjacency graph and the spatial and spectral coherency of the whole data.

In recent years, owing to the powerful feature representation ability of deep learning (DL) models, various DL models have been proposed to extract spectral–spatial features for HSI classification [39], [40]. Among them, convolutional neural networks (CNNs) are the most popular DL model for hyperspectral FE. He et al. [41] proposed a multiscale 3-D deep convolutional neural network that jointly learns 2-D multiscale spatial features and 1-D spectral features from HSI data in an end-to-end manner and achieves better results on large-scale datasets. In addition to 1-D and 2-D CNN models, the 3-D CNN model, which can extract high-level spectral–spatial features in a natural way, has also been designed and employed for HSI spectral–spatial classification [42]. Shi and Pun [43] combined the features extracted by a spectral–spatial CNN model with a multiscale hierarchical recurrent neural network that captures the spatial relations of local spectral–spatial features at different scales. Aptoula et al. [44] took attribute profiles (APs) as the input of a 2-D CNN model, taking advantage of the spatial information and spectral properties that APs can capture in an image at various scales. Lee and Kwon [45] obtained initial spatial and spectral feature maps by applying variable-size convolutional filters and then fed the joint feature map through fully convolutional layers that eventually predict the corresponding label of each pixel vector.

Although a vast number of works pay attention to DL models for the representation and analysis of HSIs, some limitations should be noticed. In general, most DL models have many parameters to fine-tune, which requires adequate training samples and considerable computation time for the complicated structural adjustment and network training needed to reach satisfactory classification accuracy [46]. Moreover, most of these DL models try to extract features from a local neighborhood, whereas the conventional and state-of-the-art semisupervised spectral–spatial FE methods usually dig out the structural information only from the unlabeled samples. These methods deliberately separate the labeled samples from the unlabeled ones, failing to consider the integrity and continuity of the HSI data as a whole. Moreover, although both the common FE methods and the improved FE methods [47], [48] based on manifold learning aim at seeking effective and robust neighbors to obtain an accurate feature representation, they lack comprehensive studies on evaluating the reliability of the selected neighbors when constructing the adjacency graph. Therefore, a more effective and robust FE method that jointly exploits the spectral–spatial information inherent in HSIs is highly desired.

To deal with the aforementioned problems, in this article, we propose a novel semisupervised FE method for HSI classification based on the joint use of spectral–spatial information. The proposed method makes use not only of the class information of the labeled samples but also of the spectral–spatial information of the whole data from a global perspective. First, we build a spectral–spatial representation for each sample in the original data. Considering that the texture of samples with the same label presents good consistency in space [18], an effective re-expression of the original data is constructed by incorporating texture features extracted by EMAP with the original HSI. Thus, the hyperspectral data can be described more comprehensively. Next, the improved NPE is proposed and applied on the whole reconstructed data to explore the geometric structure from a global perspective. Compared to the conventional similarity measure, which is based merely on spectral information, the improved NPE incorporates spectral information, texture features, and geographical information into a holistic distance measure, on the assumption that adjacent samples in HSIs most probably belong to the same class. By means of this novel similarity measure, the improved NPE can select, for each sample, nearest neighbors (NNs) with the same label as far as possible. Hence, the reconstruction model is more accurate and the reconstruction error decreases accordingly. Finally, the scatter matrices of LFDA, which digs out discriminative information from the labeled samples, and of the improved NPE, which preserves the neighboring data structure inherent in the whole data, are incorporated to realize the joint spectral–spatial FE of HSIs.

In summary, the proposed method aims at fully utilizing the label information from labeled samples and meanwhile exploring spectral–spatial information from the whole hyperspectral data. The major contributions of the proposed method can be enumerated as follows.

  1. A spectral–spatial re-expression of the hyperspectral data is established by combining the spectral information with texture features that preserve the spatial pixel consistency, so that each pixel in the same scene of HSI can be described by abundant and multiview information, including spectral bands, spatial features, and its own geographical information.

  2. To reduce the reconstruction error caused by the manifold reconstruction, a new distance metric based on joint spectral–spatial information for measuring sample similarity is proposed in the improved NPE, which is an effective way to choose highly reliable nearby neighbors by comprehensively using multiview information. Moreover, as far as we know, this is the first time the concept of neighbor reliability has been introduced to assess the reliability of the neighbors selected for hyperspectral data in manifold learning.

  3. Last but not least, we combine the supervised LFDA and the unsupervised improved NPE to compensate for each other's weaknesses. It should be noted that the improved NPE is applied on the whole data instead of only the unlabeled samples to find the true and most relevant neighborhood information, which helps to mine the neighborhood structure from a global perspective. Thus, the designed semisupervised FE method obtains significantly discriminative spectral–spatial features.

The rest of this article is organized as follows. Details of the proposed method are described in Section II. The experimental results and analysis are elaborated in Section III. Finally, Section IV concludes this article.

SECTION II.

Methodology

The framework of the proposed semisupervised spectral–spatial FE is shown in Fig. 2. The proposed method is composed of two main parts, i.e., the spectral–spatial re-expression of the HSI data and the novel semisupervised FE. The former corresponds to re-expressing the original data and representing every sample by a combined form of spectral–spatial information. The latter plays a key role in eliminating redundant information and in the joint spectral–spatial FE. The main principles of each part of the proposed method are elaborated in the following sections.

Fig. 2. Framework of the proposed method for spectral–spatial joint FE based on high-reliable neighborhood structure.

For clarity, the original HSI data are given in vector form as
\begin{equation*} {\boldsymbol{X}} = [{{\boldsymbol{X}}^L},{{\boldsymbol{X}}^U}] = [{\boldsymbol{x}}_1^L,{\boldsymbol{x}}_2^L,\ldots,{\boldsymbol{x}}_n^L,{\boldsymbol{x}}_1^U,{\boldsymbol{x}}_2^U,\ldots,{\boldsymbol{x}}_m^U] \in {\rm R}^{d_{\rm Spe} \times z} \end{equation*}
where $d_{\rm Spe}$ denotes the number of spectral bands and $z = n + m$ is the total number of available samples. ${\boldsymbol{X}}^L$ and ${\boldsymbol{X}}^U$ are the training set and the testing set, respectively, and $n$ and $m$ represent the numbers of labeled and unlabeled samples. ${\boldsymbol{x}}_j^U$ denotes the $j$th unlabeled sample, and ${\boldsymbol{x}}_i^L$ denotes the $i$th labeled sample vector with associated class label $y_i \in \{1, 2, \ldots, C\}$, where $C$ denotes the number of classes.

A. Re-Expression of Original Data

As we know, spectral-domain similarity is insufficient to reveal the inner relationships between different samples in HSIs; two samples with a small spectral distance may belong to different land-cover classes. Hence, it is necessary to utilize the spatial information inherent in the HSIs. To make full use of the spectral and spatial information provided by the hyperspectral data, inspired by spectral–spatial feature fusion through feature stacking for HSI classification [23], we seek an effective way to characterize each sample with spectral–spatial information. Given that the texture of samples with the same label presents good consistency in space [18] and that EMAP focuses on extracting the spatial characteristics of different structures in HSIs at various scales, which is especially suitable for the greatly varying sizes of different land-cover classes [23], EMAP is adopted to extract texture features, and the corresponding features are superimposed onto the original HSIs to represent each sample in the spectral–spatial domain [49]. It is worth noting that we stack the texture features with the original spectral bands instead of with extracted spectral features to reconstruct the hyperspectral data; thus, spectral information loss is reduced as much as possible.

On this basis, PCA is first applied on X to obtain the first two principal components (PCs), as suggested in [21]–[23], [49], in order to reduce the subsequent computational complexity. APs are obtained through successive attribute thickening and thinning operations on each PC with a set of given criteria. Then, a set of extended attribute profiles (EAPs) is composed by concatenating APs computed with four different attributes on each PC: a, area of the regions; d, length of the diagonal of the box bounding the region; s, standard deviation; and i, moment of inertia [49]. Subsequently, the EMAP is computed by concatenating the different types of EAPs:
\begin{equation*} {\rm EMAP} = \left\{ {\rm EAP}_a, {\rm EAP}_d, {\rm EAP}_s, {\rm EAP}_i \right\}. \tag{1} \end{equation*}

Define the texture features extracted from the original dataset by EMAP as ${\boldsymbol{X}}^E \in {\rm R}^{d_{\rm Spa} \times z}$. The texture features are then superimposed on the original data X to construct the spectral–spatial re-expression dataset ${\boldsymbol{E}} = [{\boldsymbol{X}};{\boldsymbol{X}}^E] \in {\rm R}^{d \times z}$, where $d = d_{\rm Spa} + d_{\rm Spe}$ represents the dimension of the spectral–spatial information. Besides, each reconstructed sample e carries another implicit piece of information, i.e., its own geographical position in the image.
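
A minimal sketch of this re-expression step is given below. The EMAP computation itself (the attribute-profile filtering of [49]) is abstracted behind a hypothetical `compute_emap` callable, since only the PCA preprocessing and the feature stacking follow directly from the text.

```python
import numpy as np
from sklearn.decomposition import PCA

def re_express(X, compute_emap):
    """Build the spectral-spatial re-expression E = [X; X^E].

    X            : array of shape (d_spe, z) holding the original spectral
                   vectors as columns.
    compute_emap : hypothetical callable mapping a (z, 2) array of principal
                   components to a (d_spa, z) EMAP feature array; it stands
                   in for the attribute-profile filtering of [49].
    """
    # First two principal components, as suggested in [21]-[23], [49].
    pcs = PCA(n_components=2).fit_transform(X.T)   # shape (z, 2)
    X_emap = compute_emap(pcs)                     # shape (d_spa, z)
    # Stack texture features onto the original spectral bands.
    return np.vstack([X, X_emap])                  # (d, z), d = d_spe + d_spa
```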

B. Spectral–Spatial FE

LFDA can explore class information, but its performance is limited to the labeled samples, whereas the conventional unsupervised FE methods usually lose the discriminative information of the labeled samples and focus on exploring the neighborhood structure of the unlabeled samples while ignoring the integrity and continuity of the HSI data as a whole. Hence, we propose the improved NPE, which adopts spectral information, texture features, and spatial interpixel correlations (the geospatial distance between different samples) to measure sample similarity and is applied on the whole data to discover the local structure information from a global perspective. Then, we combine the scatter matrices of LFDA and the improved NPE to construct a semisupervised spectral–spatial FE method that can discriminate the different land-cover classes in the HSIs. For better illustration, we describe and formulate the aforementioned FE methods one by one.

1) Local Fisher Discriminant Analysis

As a supervised FE method, LFDA can handle multimodal distributions and accurately capture the statistics in a reduced-dimensional space [49]. It aims at realizing between-class separation in the projected space while simultaneously preserving the within-class local structure. The optimal projection is obtained by maximizing the interclass distance while minimizing the intraclass distance. Following the above notation, the reconstructed data can be expressed as E with z samples and d variables, where d denotes the information dimension of each sample in the spectral–spatial domain. Let Sb and Sw denote the local interclass and intraclass scatter matrices, respectively:
\begin{align*} {{\boldsymbol{S}}^b} &= \frac{1}{2}\sum_{i = 1}^n \sum_{j = 1}^n W_{i,j}^b \left( {\boldsymbol{e}}_i^L - {\boldsymbol{e}}_j^L \right)\left( {\boldsymbol{e}}_i^L - {\boldsymbol{e}}_j^L \right)^{\rm T} \\ {{\boldsymbol{S}}^w} &= \frac{1}{2}\sum_{i = 1}^n \sum_{j = 1}^n W_{i,j}^w \left( {\boldsymbol{e}}_i^L - {\boldsymbol{e}}_j^L \right)\left( {\boldsymbol{e}}_i^L - {\boldsymbol{e}}_j^L \right)^{\rm T} \tag{2} \end{align*}
where $W_{i,j}^w$ and $W_{i,j}^b$ are weight matrices defined as
\begin{equation*} W_{i,j}^w = \begin{cases} A_{i,j}/n_c, & \text{if } y_i = y_j = c \\ 0, & \text{if } y_i \ne y_j \end{cases} \qquad W_{i,j}^b = \begin{cases} A_{i,j}\left( 1/n - 1/n_c \right), & \text{if } y_i = y_j = c \\ 1/n, & \text{if } y_i \ne y_j \end{cases} \tag{3} \end{equation*}
where $n_c$ is the number of labeled samples in class $c$ $\big(\sum_{c = 1}^C n_c = n\big)$, and $A_{i,j}$ is the affinity between the labeled samples ${\boldsymbol{e}}_i^L$ and ${\boldsymbol{e}}_j^L$, denoted as
\begin{equation*} A_{i,j} = \exp \left( - \frac{\| {\boldsymbol{e}}_i^L - {\boldsymbol{e}}_j^L \|^2}{\gamma_i \gamma_j} \right) \tag{4} \end{equation*}
where $\gamma_i = \| {\boldsymbol{e}}_i^L - {\boldsymbol{e}}_i^{L(l)} \|$ denotes the local scaling of ${\boldsymbol{e}}_i^L$, ${\boldsymbol{e}}_i^{L(l)}$ is the $l$th NN of ${\boldsymbol{e}}_i^L$ [50], and the corresponding definition can be found in [9]. The closer two samples are, the stronger the influence of $A_{i,j}$ on Sb and Sw. The generalized eigenvalue problem of LFDA can be expressed as
\begin{equation*} {{\boldsymbol{S}}^b}\varphi = \lambda {{\boldsymbol{S}}^w}\varphi . \tag{5} \end{equation*}

According to Lu et al. [15], the local mixture scatter matrix S is introduced and can be described as
\begin{align*} {\boldsymbol{S}} &= {{\boldsymbol{S}}^b} + {{\boldsymbol{S}}^w} \\ &= \frac{1}{2}\sum_{i = 1}^n \sum_{j = 1}^n W_{i,j}\left( {\boldsymbol{e}}_i^L - {\boldsymbol{e}}_j^L \right)\left( {\boldsymbol{e}}_i^L - {\boldsymbol{e}}_j^L \right)^{\rm T} \\ &= {{\boldsymbol{E}}^L}\left( {\boldsymbol{D}} - {\boldsymbol{W}} \right)\left( {{\boldsymbol{E}}^L} \right)^{\rm T} \tag{6} \end{align*}
where D is the $n$-dimensional diagonal matrix satisfying
\begin{equation*} D_{i,i} = \sum_{j = 1}^n W_{i,j} \quad\hbox{and}\quad W_{i,j} = \begin{cases} 1/n, & \text{if } y_i \ne y_j \\ A_{i,j}/n, & \text{if } y_i = y_j \end{cases}. \tag{7} \end{equation*}

In addition, Sw can be re-expressed as
\begin{align*} {{\boldsymbol{S}}^w} &= \sum_{i = 1}^n \sum_{j = 1}^n W_{i,j}^w {\boldsymbol{e}}_i^L \left( {\boldsymbol{e}}_i^L \right)^{\rm T} - \sum_{i = 1}^n \sum_{j = 1}^n W_{i,j}^w {\boldsymbol{e}}_i^L \left( {\boldsymbol{e}}_j^L \right)^{\rm T} \\ &= {{\boldsymbol{E}}^L}\left( {{\boldsymbol{D}}^w} - {{\boldsymbol{W}}^w} \right)\left( {{\boldsymbol{E}}^L} \right)^{\rm T} \tag{8} \end{align*}
where Dw is the $n$-dimensional diagonal matrix with $D_{i,i}^w = \sum_{j = 1}^n W_{i,j}^w$. Hence, the optimal projection of LFDA can be obtained by solving the following generalized eigenvalue problem:
\begin{equation*} {{\boldsymbol{E}}^L}{{\boldsymbol{P}}^b}\left( {{\boldsymbol{E}}^L} \right)^{\rm T}\varphi = \lambda {{\boldsymbol{E}}^L}{{\boldsymbol{P}}^w}\left( {{\boldsymbol{E}}^L} \right)^{\rm T}\varphi \tag{9} \end{equation*}
where ${{\boldsymbol{P}}^b} = ( {\boldsymbol{D}} - {\boldsymbol{W}} ) - ( {{\boldsymbol{D}}^w} - {{\boldsymbol{W}}^w} )$ and ${{\boldsymbol{P}}^w} = ( {{\boldsymbol{D}}^w} - {{\boldsymbol{W}}^w} )$.
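
The scatter construction of (2)-(4) and the eigenproblem (5) can be sketched as follows, assuming the labeled samples are stored column-wise; the ridge term added to Sw is an illustrative numerical safeguard, not part of the paper's formulation.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lfda_scatters(EL, y, l=7):
    """Local scatter matrices S^b and S^w of (2)-(4).

    EL : (d, n) labeled samples e_i^L as columns.
    y  : (n,) integer class labels.
    l  : neighbor index used for the local scaling gamma_i of [9].
    """
    n = EL.shape[1]
    dist = cdist(EL.T, EL.T)                       # pairwise Euclidean distances
    gamma = np.sort(dist, axis=1)[:, l]            # gamma_i: distance to l-th NN
    A = np.exp(-dist**2 / np.outer(gamma, gamma))  # affinity matrix, eq. (4)

    same = y[:, None] == y[None, :]
    n_c = np.bincount(y)[y]                        # class size n_c of each sample
    Ww = np.where(same, A / n_c[None, :], 0.0)                   # eq. (3)
    Wb = np.where(same, A * (1.0/n - 1.0/n_c[None, :]), 1.0/n)   # eq. (3)

    def scatter(W):   # (1/2) sum_ij W_ij (e_i - e_j)(e_i - e_j)^T
        D = np.diag(W.sum(axis=1))
        return EL @ (D - W) @ EL.T                 # Laplacian form, as in (6)/(8)
    return scatter(Wb), scatter(Ww)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    EL = rng.normal(size=(20, 60))                 # toy data: d = 20, n = 60
    y = rng.integers(0, 3, size=60)
    Sb, Sw = lfda_scatters(EL, y)
    evals, evecs = eigh(Sb, Sw + 1e-6 * np.eye(20))  # eq. (5), ridge for stability
```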

2) Improved NPE

As a classical manifold learning method, the basic NPE aims at maintaining the local neighborhood structure. The NPE algorithm contains three steps, i.e., constructing an adjacency graph, computing the weights, and calculating the projections. In the first step, the sample similarity is learned directly from the Euclidean distance between different samples. However, two samples with a small spectral distance may have a large spatial distance or may belong to different classes, and spectral dissimilarity exists within the same class. This indicates that a distance metric based merely on spectral information is not sufficient for similarity measurement in HSIs. Considering that samples with the same label are likely to be spatially close together [51] and that the texture of samples with the same label presents good consistency in space, we improve the basic NPE by adding spatial interpixel correlations and texture features to the similarity measure between different samples.

Most manifold-learning-based FE methods focus on using the spectral-domain Euclidean distance to compute the similarity measure and merely explore the neighborhood structure of the unlabeled samples, which ignores the integrity and consistency of the HSIs. In fact, the best neighbors of an unlabeled sample may exist in the training set; hence, we pay more attention to fully exploiting the correlations between each sample and neighbors selected from all samples. The improved NPE, which can be seen as an extended version of NPE, is applied on the whole data E; it preserves the local structure of the HSIs from a global perspective in the low-dimensional space, and the best nearby neighbors, i.e., those with high spectral–spatial similarity, are preferentially selected from all samples for each sample.

For the reconstructed HSI dataset E with z samples and d dimensions of spectral–spatial information, each sample can be further re-expressed as [e, r], where e is the spectral–spatial information vector in the reconstructed data E and r = [x, y]T is the position vector of the corresponding sample. The mean Euclidean distance of the whole dataset E in the spectral–spatial domain can be computed as
\begin{equation*} \mu = \frac{1}{z(z - 1)/2}\sum_{i = 1}^{z} \sum_{j = i + 1}^{z} \| {\boldsymbol{e}}_i - {\boldsymbol{e}}_j \|_2 . \tag{10} \end{equation*}

The similarity between samples ${\boldsymbol{e}}_i$ and ${\boldsymbol{e}}_j$, which serves as the metric for neighbor selection (a larger $p_{ij}$ indicates a closer neighbor), is defined as
\begin{align*} p_{ij} &= \exp ( - s_{ij}^2/\mu ) \cdot \exp ( - d_{ij}^2 ) \tag{11}\\ s_{ij} &= \left\| {\boldsymbol{e}}_i - {\boldsymbol{e}}_j \right\|_2 \tag{12}\\ d_{ij} &= \left\| {\boldsymbol{r}}_i - {\boldsymbol{r}}_j \right\|_2 = \sqrt{ ( x_i - x_j )^2 + ( y_i - y_j )^2 } \tag{13} \end{align*}
where $d_{ij}$ and $s_{ij}$ are the geospatial distance and the spectral–spatial-domain Euclidean distance between ${\boldsymbol{e}}_i$ and ${\boldsymbol{e}}_j$, respectively. It can be noticed that the improved NPE incorporates spectral information, texture features, and geographical information into a holistic measure.
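
A direct sketch of (10)-(13) is shown below, assuming the full pairwise matrices fit in memory (a chunked or k-NN variant would be needed for large scenes); `coords` holds the pixel positions r_i.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def joint_similarity(E, coords):
    """Joint spectral-spatial measure p_ij of (10)-(13).

    E      : (d, z) re-expressed samples (spectral bands + EMAP features).
    coords : (z, 2) pixel positions r_i = [x_i, y_i].
    """
    s = cdist(E.T, E.T)               # eq. (12): spectral-spatial distances
    mu = pdist(E.T).mean()            # eq. (10): mean pairwise distance
    d = cdist(coords, coords)         # eq. (13): geospatial distances
    return np.exp(-s**2 / mu) * np.exp(-d**2)   # eq. (11)
```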

It is understandable that if a sample and its neighbors belong to the same class, the reconstruction model will be more accurate. Given the lack of comprehensive studies on evaluating the reliability of the neighbors selected for constructing the adjacency graph in manifold learning, we define an evaluation formula to assess the reliability of the neighbors, which are selected based on a combination of spectral information, texture features, and geospatial distance between different samples in the same scene of HSI. In fact, the distances between a sample and its neighbors differ, which induces an order among the neighbors: the selected neighbors include the first NN, the second NN, the third NN, and so on. Hence, let ${\boldsymbol{N}} \in {\rm R}^{b \times z}$ denote the NN matrix for the dataset E, where b is the number of NNs and $n_{i,j}$ denotes the $j$th NN of the sample ${\boldsymbol{e}}_i$, $j \in \{1,2,\ldots,b\}$. Let ${\boldsymbol{V}} \in {\rm R}^{b \times z}$ denote the corresponding real label matrix of N, where $v_{i,j} \in \{1,2,\ldots,C\}$. For the sample ${\boldsymbol{e}}_i$, the neighbor reliability of its $j$th neighbor is defined as
\begin{equation*} H_{i,j} = \begin{cases} 0, & \text{if } l( {\boldsymbol{e}}_i ) \ne v_{i,j} \\ 1, & \text{if } l( {\boldsymbol{e}}_i ) = v_{i,j} \end{cases} \tag{14} \end{equation*}
where $l( {\boldsymbol{e}}_i ) \in \{1,2,\ldots,C\}$ denotes the class label of ${\boldsymbol{e}}_i$. If the sample and its $j$th neighbor belong to the same class, the $j$th neighbor is assigned high neighbor reliability. In fact, we try to find, for each sample, NNs that share the class of the reconstructed sample as far as possible.
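
The indicator of (14) can be evaluated in a vectorized way, assuming the neighbor indices have already been sorted by the similarity (11); the matrix layout follows the N ∈ R^{b×z} convention above.

```python
import numpy as np

def reliability_flags(N, labels):
    """Neighbor-reliability indicator H_ij of (14).

    N      : (b, z) integer array; N[j, i] is the index of the (j+1)-th NN
             of sample e_i under the similarity measure (11).
    labels : (z,) ground-truth class labels l(e_i).
    """
    V = labels[N]                                 # (b, z) labels of the NNs
    return (V == labels[None, :]).astype(float)   # 1 where the classes agree
```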

When we treat the HSI data as an integral and continuous whole, the neighbors of each sample can be classified into three categories: close neighbors with high spectral–spatial similarity, close neighbors with low spectral–spatial similarity, and far neighbors with high spectral–spatial similarity. According to (11), the neighbors of the first category, which have the largest probability of sharing the label of the corresponding sample, tend to be selected preferentially over the other categories. In a word, the neighbors of the first category are more important than the others. When the number of NNs selected for building the reconstruction model increases, neighbors of the other categories with low reliability may also be selected, which increases the reconstruction error. As illustrated in Fig. 3, the improved NPE projects each sample closer to its first category of NNs, which are highly likely to share its class, in the low-dimensional space, while projecting neighboring samples that are highly likely to have different labels (i.e., the second and third categories of NNs) farther away. Thus, each sample can be reconstructed by the NNs with the highest neighbor reliability.

Fig. 3. Schematic of the improved NPE.

In contrast to NPE, the improved NPE assumes that each sample can be reconstructed from the first category of NNs. That is, the improved NPE can reduce the reconstruction error by selecting a small number of neighbors with high reliability to establish a more accurate reconstruction model. As a result, the improved NPE, applied on the whole data, can not only increase the reliability of the selected neighbors but also preserve the consistency of the samples of the same class. According to the NN rule, a total of k NNs are finally confirmed to reconstruct each sample. The weight matrix Q is calculated as
\begin{equation*} \arg \min_{\boldsymbol{Q}} \sum_i \Big\| {\boldsymbol{e}}_i - \sum_j Q_{ij}{\boldsymbol{e}}_j \Big\|^2 \quad {\rm s.t.}\quad \sum_j Q_{ij} = 1,\; j = 1,2,\ldots,k . \tag{15} \end{equation*}

The neighbor relationship is preserved in the projected low-dimensional space, and the optimal transformation matrix T is found by minimizing
\begin{equation*} \sum_i \Big\| {{\boldsymbol{y}}_i} - \sum_j Q_{ij}{{\boldsymbol{y}}_j} \Big\|^2 \tag{16} \end{equation*}
where ${{\boldsymbol{y}}_i} = {{\boldsymbol{T}}^{\rm T}}{{\boldsymbol{e}}_i}$. By imposing the constraint
\begin{equation*} \sum_i {{\boldsymbol{y}}_i}\left( {{\boldsymbol{y}}_i} \right)^{\rm T} = {\boldsymbol{I}} \tag{17} \end{equation*}
the transformation matrix T can be computed by solving
\begin{equation*} {\boldsymbol{T}} = \arg \max_{\boldsymbol{T}} \frac{ \left| {{\boldsymbol{T}}^{\rm T}}{\boldsymbol{E}}{{\boldsymbol{E}}^{\rm T}}{\boldsymbol{T}} \right| }{ \left| {{\boldsymbol{T}}^{\rm T}}{\boldsymbol{E}}{\boldsymbol{M}}{{\boldsymbol{E}}^{\rm T}}{\boldsymbol{T}} \right| } \tag{18} \end{equation*}
where ${\boldsymbol{M}} = ( {\boldsymbol{I}} - {\boldsymbol{Q}} )^{\rm T}( {\boldsymbol{I}} - {\boldsymbol{Q}} )$. By introducing the Lagrangian multiplier, the optimal transformation matrix T is obtained by finding the generalized eigenvectors of the eigenvalue problem
\begin{equation*} {\boldsymbol{E}}{{\boldsymbol{E}}^{\rm T}}\varphi = \lambda {\boldsymbol{E}}{\boldsymbol{M}}{{\boldsymbol{E}}^{\rm T}}\varphi \tag{19} \end{equation*}
where φ denotes the generalized eigenvector. Compared with the conventional NPE, the proposed method establishes a more accurate similarity measure between different samples and explores the local structure of the whole data effectively.
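
Putting (15)-(19) together, a sketch of the improved NPE is given below, assuming z is small enough for dense matrices; the Gram regularization and the ridge on the right-hand side are illustrative numerical safeguards rather than part of the formulation.

```python
import numpy as np
from scipy.linalg import eigh, solve

def improved_npe(E, P, k=2, reg=1e-3):
    """Reconstruction weights (15) and eigenproblem (19) of the improved NPE.

    E : (d, z) re-expressed data; P : (z, z) joint similarity matrix of (11),
    used instead of the plain Euclidean metric to pick the k neighbors.
    """
    d, z = E.shape
    Q = np.zeros((z, z))
    for i in range(z):
        p = P[i].copy()
        p[i] = -np.inf                      # never pick the sample itself
        nbrs = np.argsort(p)[-k:]           # indices of the k most similar samples
        G = E[:, nbrs] - E[:, [i]]          # centered neighbor matrix, (d, k)
        C = G.T @ G + reg * np.trace(G.T @ G) * np.eye(k)  # regularized Gram
        w = solve(C, np.ones(k))
        Q[i, nbrs] = w / w.sum()            # sum-to-one constraint of (15)
    M = (np.eye(z) - Q).T @ (np.eye(z) - Q)
    # Generalized eigenproblem (19); the ridge keeps the right side invertible.
    evals, evecs = eigh(E @ E.T, E @ M @ E.T + 1e-6 * np.eye(d))
    return Q, M, evecs[:, ::-1]             # columns sorted by decreasing lambda
```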

3) Proposed Semisupervised FE Method

As a supervised FE method, LFDA can dig out the semantic information of the labeled samples. However, its efficiency depends on the available labeled samples and declines dramatically when the labeled samples are very few, whereas the improved NPE aims at maintaining the local neighborhood structure and loses the discriminative information of the labeled samples. As shown in Fig. 2, we therefore combine the LFDA and improved NPE methods to compensate for each other's weaknesses. Algorithm 1 summarizes the key procedures of the proposed semisupervised FE method.

Here, we formulate a semisupervised FE method composed of LFDA and the improved NPE as follows. We add the labeled samples EL to the reconstructed data E, which can be seen as a special form of sample augmentation. The expanded dataset O, combining the labeled samples EL and the whole data E, can be expressed as ${\boldsymbol{O}} = [ {{\boldsymbol{E}}^L},{\boldsymbol{E}} ]$. It is notable that the abundant information contained in the expanded dataset O, including the known label information and the spectral and spatial information, is the same as that in the reconstructed dataset E. Then, by merging the generalized eigenvalue problems (9) and (19), the proposed semisupervised FE method is given as
\begin{equation*} {\boldsymbol{O}}\begin{bmatrix} {\boldsymbol{P}}_{n \times n}^b & {{\boldsymbol{0}}_{n \times z}} \\ {{\boldsymbol{0}}_{z \times n}} & {{\boldsymbol{I}}_{z \times z}} \end{bmatrix}{{\boldsymbol{O}}^{\rm T}}\varphi = \lambda {\boldsymbol{O}}\begin{bmatrix} {\boldsymbol{P}}_{n \times n}^w & {{\boldsymbol{0}}_{n \times z}} \\ {{\boldsymbol{0}}_{z \times n}} & {{\boldsymbol{M}}_{z \times z}} \end{bmatrix}{{\boldsymbol{O}}^{\rm T}}\varphi . \tag{20} \end{equation*}

Thus, the eigenvector matrix JO can be calculated by solving the eigenproblem (20). Finally, the joint spectral–spatial features are obtained by a simple matrix multiplication
\begin{equation*} {\boldsymbol{F}} = {{\boldsymbol{J}}^{\rm T}}{\boldsymbol{E}} \tag{21} \end{equation*}
where the optimal discriminative projection matrix ${\boldsymbol{J}} = [ {{\boldsymbol{j}}_1},{{\boldsymbol{j}}_2},\ldots,{{\boldsymbol{j}}_u} ]$ $( u \ll d )$ consists of the eigenvectors associated with the largest u eigenvalues, obtained by rearranging the columns of JO in descending order of the associated eigenvalues.
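
A sketch of the combined eigenproblem (20) and projection (21) follows, reusing the P^b, P^w, and M matrices of the previous subsections; again, the small ridge is only a numerical safeguard.

```python
import numpy as np
from scipy.linalg import eigh, block_diag

def semisupervised_fe(E, EL, Pb, Pw, M, u=30):
    """Joint eigenproblem (20) and projection (21).

    E  : (d, z) whole re-expressed data; EL : (d, n) labeled subset.
    Pb, Pw : (n, n) LFDA graph matrices of (9); M : (z, z) matrix of (19).
    """
    d, z = E.shape
    O = np.hstack([EL, E])                       # expanded set O = [E^L, E]
    left = O @ block_diag(Pb, np.eye(z)) @ O.T   # left side of (20)
    right = O @ block_diag(Pw, M) @ O.T          # right side of (20)
    evals, evecs = eigh(left, right + 1e-6 * np.eye(d))
    J = evecs[:, ::-1][:, :u]                    # eigenvectors of the u largest lambdas
    return J.T @ E                               # eq. (21): F = J^T E
```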

SECTION III.

Experimental Results and Analysis

A. Dataset and Experimental Setting

In the experiments, to validate the performance of the proposed method, several representative HSI datasets acquired by different remote sensing sensors are employed; they are briefly described in the following. The false-color image and the ground-truth map of each dataset are illustrated in Figs. 4–6, respectively.

  1. Indian Pines (IP): This hyperspectral dataset was acquired by the airborne visible/infrared imaging spectrometer (AVIRIS) sensor over north-western Indiana, USA, and covers the spectral range of 400 to 2500 nm with 220 spectral channels at a spatial resolution of 20 m. It contains 145×145 pixels and 16 land-cover classes of interest. As a usual step, 20 water-absorption and noisy bands are removed, and 200 bands are retained.

  2. University of Pavia (PU): This dataset was gathered over the University of Pavia, Italy, by the reflective optics system imaging spectrometer. The scene has 610×340 pixels with a spatial resolution of 1.3 m. It is composed of 103 spectral bands across the spectral range from 430 to 860 nm, with nine classes in the ground-truth image.

  3. University of Houston 2013 (HU): This image was captured over the University of Houston campus and the neighboring urban area by the compact airborne spectrographic imager sensor. It comprises 349×1905 pixels with a spatial resolution of 2.5 m and 144 spectral bands ranging from 380 to 1050 nm. The corresponding ground-truth image contains 15 classes.

Algorithm 1: Procedures of the Proposed Method.

Input: Sample set ${\boldsymbol{X}} = [{{\boldsymbol{X}}^L},{{\boldsymbol{X}}^U}] \in {\rm R}^{d_{\rm Spe} \times z}$ and the labeled samples ${\boldsymbol{X}}^L$.

1: Extract texture features ${\boldsymbol{X}}^E \in {\rm R}^{d_{\rm Spa} \times z}$ by EMAP from the original data and generate the re-expression ${\boldsymbol{E}}$ of the original HSI data by combining ${\boldsymbol{X}}$ and ${\boldsymbol{X}}^E$.

2: Construct the improved NPE by redefining the similarity measure between different samples via (11).

3: Formulate the proposed semisupervised FE method, composed of LFDA (on EL) and the improved NPE (on E), on the expanded dataset O via (20).

4: Calculate the eigenvector matrix by solving (20) and obtain the transformation matrix J by rearranging the columns of JO in descending order of the eigenvalues.

5: Compute the final spectral–spatial features by (21).

Output: The low-dimensional spectral–spatial features F.

Fig. 4. Visualization of the IP scene. (a) False-color image (bands 50, 27, and 17 for RGB). (b) Ground truth. (c) Color code of IP.

Fig. 5. Visualization of the PU scene. (a) False-color image (bands 64, 43, and 22 for RGB). (b) Ground truth. (c) Color code of PU.

Fig. 6. Visualization of the HU scene. (a) False-color image (bands 64, 43, and 22 for RGB). (b) Ground truth. (c) Color code of HU.

Given that the numbers of samples for the different classes of IP are considerably unbalanced, we randomly choose different proportions of labeled samples per class as the training set from the set {3%, 5%, 10%, 15%} for IP and from the set {0.5%, 1%, 3%, 5%} for PU, whereas for HU, we randomly choose different numbers of training samples per class from the set {15, 30, 60, 90}. The remaining samples are used for testing. Moreover, all experiments are run five times to eliminate stochastic errors. Referring to the related papers on EMAP extraction [23], [49], the parameter settings a = {10, 30, 50, 70, 90}, d = {10, 25, 40}, s = {0.05, 0.15, 0.25, 0.35}, and i = {0.2, 0.3, 0.4} were used for extracting texture features. The dimension of the EMAP obtained on the first two PCs of each dataset is 62. In our experiments, we first project the HSI data into the low-dimensional feature space and then use the relatively simple nearest-neighbor (NN) classifier to classify the testing samples in that common feature space. To quantitatively evaluate the performance of the proposed method and the comparison methods, three common performance indexes are employed: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient [52]. Among these metrics, OA records the percentage of correct predictions over the total number of test samples, AA is the mean of the per-class percentages of correct predictions, and Kappa measures the overall classwise agreement between the actual class labels (ground truth) and the estimated class labels of the testing samples.
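
For reference, the three indexes can be computed from a confusion matrix as sketched below; the scikit-learn calls are standard, and the per-class accuracies are taken along the rows (true labels).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def evaluate(y_true, y_pred):
    """OA, AA, and Kappa as used in the experiments."""
    cm = confusion_matrix(y_true, y_pred)           # rows: true, cols: predicted
    oa = np.trace(cm) / cm.sum()                    # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))      # mean of per-class accuracies
    kappa = cohen_kappa_score(y_true, y_pred)       # chance-corrected agreement
    return oa, aa, kappa
```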

B. Experimental Results and Analysis

In the first experiment, we investigate the advantages of the proposed method by comparing it with representative FE methods, i.e., LFDA [9], NPE [13], SELD [14], and weighted SLDA [15], and with the other semisupervised FE methods, which are detailed in Table I. Note that case 5 represents the proposed method, which uses a combination of multiview information (i.e., spectral information, texture features, and geographical information) to measure the similarity between different samples from a global perspective. For a fair comparison, the number of features for each method is set to 30 and the number of NNs for each sample is fixed at 2. In addition, in order to assess the FE capability of the different methods, we use the relatively simple NN classifier to obtain the classification results. Table II presents the quantitative OA results of each method on all the HSIs using different numbers of training samples; the best results are marked in bold typeface.

TABLE I Methods Under Different Conditions

TABLE II OA (%) of Different FE Methods on Three Datasets

From the results, it can be observed that our proposed method performs best among the compared methods and produces clearly superior classification performance compared to the spectral-based methods, such as LFDA, SELD, weighted SLDA, and case 1. Moreover, compared to cases 2, 3, and 4, which are all based on partial spectral–spatial information, the average OA improvements achieved by the proposed method are {29.46%, 13.88%, 0.14%}, {4.18%, 11.81%, 0.1%}, and {1.78%, 1.43%, 0.54%} for IP, PU, and HU, respectively. In the proposed method, the neighbors of each sample are selected from the whole data based on a combination of spectral information, texture features, and spatial interpixel correlations. Thus, a sample and its NNs, chosen from a global perspective, have the highest probability of belonging to the same class. Hence, the proposed method can reduce the reconstruction error and achieve good classification performance even with a small number of training samples.

In order to further illustrate the effectiveness of the proposed method, we investigate the impact of two important parameters on the performance of the proposed FE method, i.e., the number of extracted features and the number of NNs for each sample.

1) Impact of the Number of Spectral–Spatial Features

The results showing the impact of the number of extracted features on the proposed method, varied from 2 to 100 with different numbers of training samples and two NNs for each sample, are plotted in Fig. 7. As can be seen from the results on each dataset, the proposed method performs well even when very few initial labels are available. When we use the fewest training samples and fix the number of features at 30, the OA on IP, PU, and HU is 6.69%, 1.99%, and 10.77% lower, respectively, than that under the largest number of training samples of each dataset. It is worth noting that once the number of spectral–spatial features increases to 10, the classification accuracy curves for all three datasets grow more slowly and tend toward convergence. Furthermore, when the numbers of spectral–spatial features and NNs are fixed for each dataset, the classification accuracies improve with an increasing number of training samples. The reason is that a larger number of labeled samples contains more discriminative information, which enhances the classification of the unlabeled samples.

Fig. 7. Impact of the number of extracted spectral–spatial features on the proposed method with different numbers of training samples for (a) IP, (b) PU, and (c) HU.

2) Impact of the Number of NNs

In this experiment, we evaluate the impact of the number of NNs on the proposed method with the number of extracted features set to 30 for each dataset. The OA results obtained by case 1, case 4, and the proposed method when using different numbers of NNs k (k is chosen from the set {2, 3, 5, 7}) are given in Table III, and a graphic representation of the results for each dataset is shown in Fig. 8. It can be observed that, overall, the classification accuracies decrease as the number of NNs increases, indicating that the more NNs selected, the more likely it is that neighbors with low reliability are used to construct the reconstruction model. Moreover, the HU dataset, which contains a relatively large area of cloud shadow, does not strictly follow this variation of classification accuracy with the number of NNs. The main reason is that the pronounced phenomenon of "same class with different spectra, and different classes with similar spectra" in HU leads to unstable reliability of the NNs in the NN matrix of each sample and an inaccurate reconstruction model.

TABLE III OA (%) of Each FE Method With Different Numbers of NNs on Three Datasets

Fig. 8. Impact of the number of NNs on the proposed method with different numbers of training samples for (a) IP, (b) PU, and (c) HU.

As described in Section II, we use the neighbor reliability defined above to assess the different neighbors chosen from the whole data to reconstruct each sample. The conventional FE methods based on manifold learning do not take the reliability of each chosen neighbor into account; however, neighbors that share the label of the sample to be reconstructed are more suitable for reconstructing it than those with different labels. Therefore, if a sample and one of its neighbors belong to the same class, we tag that neighbor with high reliability. Following the NN order sorted by sample similarity, we calculate the reliability of the neighbors at each position of the neighbor order. As mentioned above, the reconstructed data can be expressed as ${\boldsymbol{E}} \in {\rm R}^{d \times z}$ with z samples and d variables, where d denotes the information dimension of each sample in the spectral–spatial domain. According to the ground-truth map, $l( {\boldsymbol{e}}_i ) \in \{1,2,\ldots,C\}$ denotes the class label of ${\boldsymbol{e}}_i$ and C is the number of land-cover classes.

Suppose ${\boldsymbol{N}} \in {\rm R}^{b \times z}$ denotes the neighbor order matrix for the dataset ${\boldsymbol{E}}$, where b is the number of neighbors selected for each sample and z is the number of all samples, and ${\boldsymbol{V}} \in {\rm R}^{b \times z}$ denotes the corresponding label matrix with $v_{i,j} \in \{1,2,\ldots,C\}$. For the sample ${\boldsymbol{e}}_i$, we check whether the sample and its $j$th neighbor in the neighbor order belong to the same class according to the real label information, and the neighbor reliability of the $j$th neighbor is computed according to (14). Then, the ratio of the number of $j$th neighbors over the whole data that satisfy this condition to the number of all samples (i.e., the number of $j$th neighbors) represents the reliability of the $j$th neighbors in the neighbor order, defined as
\begin{equation*} S_j = \sum_{i = 1}^z H_{i,j}/z. \tag{22} \end{equation*}
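
Reusing the `reliability_flags` sketch from Section II, the per-order reliability of (22) is simply the row mean of the indicator matrix:

```python
# Per-order reliability S_j of (22): the fraction of samples whose j-th
# neighbor shares their class; H is the indicator matrix of (14).
S = reliability_flags(N, labels).mean(axis=1)     # S[j] = sum_i H_ij / z
```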

For a more comprehensive comparison, we compare the per-sample neighbor reliability of the proposed method with that of case 1, which is based on the conventional NPE and merely relies on spectral information to measure sample similarity. Furthermore, the neighbor reliability for each unlabeled sample of the proposed method is also compared with that of case 4, which applies the improved NPE on the unlabeled samples only to mine local neighborhood information. The reliability of the first seven neighbors in the NN matrix for all the HSIs under case 1 and the proposed method is presented in Table IV. Note that the neighbor reliability for each sample of the proposed method and of case 1 is not affected by the number of training samples. From the results, it is easy to see that the reliability of the neighbors in the proposed method is significantly higher than that in case 1.

TABLE IV Reliability of the First Seven Neighbors in the Neighbor Order for Each Sample

As mentioned above, when treating the HSI data as an integral and continuous whole, the neighbors can be classified into three categories: close neighbors with high spectral–spatial similarity, close neighbors with low spectral–spatial similarity, and far neighbors with high spectral–spatial similarity. It is obvious that the neighbors of the first category have higher neighbor reliability than the others and, according to the similarity measure (11), tend to be chosen preferentially. As shown in Fig. 8, the classification accuracies tend to degrade overall as the number of NNs increases, which is also consistent with the neighbor reliability results in Table IV. When the number of NNs increases, neighbors with low reliability may also be chosen for the manifold reconstruction, which increases the reconstruction error.

The reliability of the first seven neighbors of each unlabeled sample in the neighbor order for all the HSIs under case 4 and the proposed method is presented in Table V. On the whole, the reliability of the neighbors of each unlabeled sample in the proposed method is higher than that in case 4, which applies the improved NPE on the unlabeled samples only. It can also be seen that as the number of training samples increases, the neighbor reliability at the same position of the neighbor order tends to decrease. This is mainly because the first-category neighbors with high reliability for an unlabeled sample may be located in the training set, and the proposed method, which applies the improved NPE on the whole data, can reduce the reconstruction error by increasing the reliability of the selected neighbors and preserving the consistency of the samples of the same class. All in all, both the theoretical analysis and the experimental results demonstrate that the proposed method can find, for each sample, the best spatially nearby neighbors from the whole data that exhibit high similarity in the spectral–spatial domain.

TABLE V Reliability of the First Seven Neighbors in the Neighbor Order for Each Unlabeled Sample in All the Datasets

C. Comparison With State-of-the-Art Classification Methods

To further illustrate the effectiveness of our method, the classification results obtained by several state-of-the-art spectral–spatial classification methods, namely EPF [24], IAPs [25], RMGs [26], JSaCR [29], PKCRC [30], SPaCR [31], multiscale CNN [41], 3-D CNN [42], and 3-D FCN [45], are also reported in this experiment. Besides, the NN-based classification results on the raw HSI data and on the EMAP described in Section II have also been taken into account. Considering both computational efficiency and classification performance, the parameters of each comparison method are set as follows. For a fair comparison, in EPF the HSI is classified using a pixelwise classifier, i.e., the NN classifier instead of the support vector machine classifier of [24], before edge-preserving filtering. RMG is a graph-based ensemble learning method whose core parameters, namely the number of graphs, the number of spectral bands, and the patch size of the LBP feature, are set to 20, 2, and 11, respectively. For the JSaCR method, the three core regularization parameters λ, γ, and c are set to 8, 1, and 1e+03, respectively. In PKCRC, the two important parameters σ and λ are fixed to 1e-3 and 300, respectively. The crucial parameters of SPaCR, namely the average superpixel width Sw, the tradeoff coefficient ωs between the spatial and spectral distances, and the regularization parameters λ and β, are set to 12, 0.9, 0.1, and 0.7, respectively. As for the DL methods, the input patch size and the number of epochs are set to 7×7 and 200, respectively; the other settings follow the default parameters and configurations given in the related papers. The numbers of extracted spectral–spatial features and NNs for the proposed method are fixed to 30 and 2, respectively. The OA, Kappa coefficient, and class-specific accuracy (CA) of each method under the largest number of training samples for all the HSIs are presented in Tables VI–VIII. Furthermore, the corresponding classification maps obtained by the different methods for each HSI dataset are shown in Figs. 9–11 for visual comparison.

TABLE VI Classification Results (in Percentage) Obtained by Different Methods for IP, Using 15% Per Class as Training Set

TABLE VII Classification Results (in Percentage) Obtained by Different Methods for PU, Using 5% Per Class as Training Set

TABLE VIII Classification Results (in Percentage) Obtained by Different Methods for HU, Using 90 Samples Per Class as Training Set
Fig. 9. Classification maps obtained by different methods on IP with 15% randomly selected training samples per class. (a) Raw. (b) EMAP. (c) IAPs. (d) EPF. (e) RMGs. (f) JSaCR. (g) PKCRC. (h) SPaCR. (i) 3-D CNN. (j) Multiscale CNN. (k) 3-D FCN. (l) Proposed.

Fig. 10. Classification maps obtained by different methods on PU with 5% randomly selected training samples per class. (a) Raw. (b) EMAP. (c) IAPs. (d) EPF. (e) RMGs. (f) JSaCR. (g) PKCRC. (h) SPaCR. (i) 3-D CNN. (j) Multiscale CNN. (k) 3-D FCN. (l) Proposed.

Fig. 11. Classification maps obtained by different methods on HU with 90 randomly selected training samples per class. (a) Raw. (b) EMAP. (c) IAPs. (d) EPF. (e) RMGs. (f) JSaCR. (g) PKCRC. (h) SPaCR. (i) 3-D CNN. (j) Multiscale CNN. (k) 3-D FCN. (l) Proposed.

From the results, the proposed method achieves better classification results than the other spectral–spatial HSI classification methods on all the HSIs, which supports its effectiveness. Moreover, the proposed method can properly classify the different classes in each dataset even under high intraclass variability and interclass similarity. The main reason is that the proposed method exploits a new spectral–spatial combined geospatial distance to choose reliable and effective neighbors, which are used to construct a spatial–spectral adjacency graph for discovering the intrinsic structure of the HSI data. Hence, the proposed FE method greatly increases the separability of samples (pixels) belonging to different classes and the aggregation of samples belonging to the same class in the low-dimensional feature space. It is also worth noting that the proposed method achieves satisfactory performance with a limited number of features and NNs, which implies that the discriminative spectral–spatial features obtained by the proposed method not only reduce the computational burden but also save storage space for further study.

SECTION IV.

Conclusion

In this article, we proposed a novel spectral–spatial joint FE method based on high-reliable neighborhood structure for the analysis and classification of HSIs. The main steps of the proposed method include the re-expression of the hyperspectral data in the spectral–spatial domain, the improved NPE for exploring the highly reliable neighborhood structure, and the combination of LFDA (on the labeled samples) and the improved NPE (on the whole data) for extracting significantly discriminative spectral–spatial features for HSI classification. By using the complementary and diverse information provided by the HSIs, the proposed method is demonstrated to offer satisfactory FE capability and robustness when tested on three different datasets. Furthermore, the proposed method also achieves competitive performance compared with several state-of-the-art spectral–spatial HSI classification methods.
