Introduction
Hyperspectral images (HSIs) contain very rich spatial–spectral information, thus providing a substantial opportunity to explore and understand earth surface characteristics. Among existing HSI processing technologies, classification is by far the most well-established field and has been widely applied to various scenarios in Earth science, such as monitoring coastal wetlands [1], precision agriculture [2], water quality analysis [3], and other fields [4], [5].
Most HSI classification methods stem from computer vision [6] and, in the early days, were mainly based on traditional machine learning techniques [7]. For example, the
Deep learning (DL) methods, which automatically learn discriminative features, have recently made significant breakthroughs in various fields, such as face recognition [16], fault detection [17], and posture estimation [18]. This has stimulated interest among researchers in the community of HSI classification. Representative DL methods include convolutional neural networks (CNNs) [19], recurrent neural networks (RNNs) [20], [21], graph convolutional networks (GCNs) [22], [23], and capsule networks (CapsNets) [24], among which CNNs have been widely applied to HSI classification. Hu et al. [25] and Chen et al. [26] first employed CNNs for HSI classification in the spectral, spatial, and spatial–spectral domains. In [27], different portions of HSIs were fed into the diverse region-based 2-D-CNN to identify the multiscale contextual interactional features. A 3-D-CNN that implements a border mirroring strategy to learn joint spatial–spectral information was proposed in [28]. Li et al. [29] proposed a two-stream model to extract the spectral, local spatial, and global spatial features simultaneously. Roy et al. [30] constructed a hybrid spectral CNN (HybridSN), which combines the advantages of both 3-D-CNN and 2-D-CNN, strengthening spatial feature learning. Meanwhile, some residual structures and attention blocks were integrated into the standard CNNs to improve feature extraction and classification performance. In [31], a spatial–spectral residual network (SSRN) made use of residual connections to alleviate the decline in classification accuracy caused by deeper networks. The work in [32] constructed a pyramidal residual network by stacking the bottleneck residual units with the gradually increased feature channels. Li et al. 
[33] proposed a two-stream spectral feature fusion network, where two branches generate local and interlocal spectral correlation features that are adaptively integrated via dual-channel attention and decision fusion to achieve better classification results. In [34], a double-branch dual-attention (DBDA) mechanism network was presented, which adds attention blocks after the spectral and spatial branches (separately) to adaptively optimize the feature maps. By using both residual and attention blocks, a residual spatial–spectral attention network (RSSAN) was further proposed in [35]. In addition, other DL networks containing convolution operations have also been utilized for HSI classification. Hu et al. [36] developed a spatial–spectral ConvLSTM 2-D neural network (SSCL2DNN), which can model the long-range dependency of the spectral dimension for feature extraction. By introducing convolutional capsule layers and the maximum correntropy criterion, a 3-D CapsNet [37] alleviated the influence of noise and outliers on HSI classification, yielding more robust performance. Although these DL-based models have achieved satisfactory performance in HSI classification, they contain a large number of trainable parameters, which incurs high storage costs and makes their training more prone to overfitting.
Recently, some efforts have been made to construct efficient structures for network compression to solve the aforementioned problems. A fast and compact 3-D-CNN with few parameters was developed in [38]. Some efficient convolution operations have been explored to reduce the number of network parameters. In [39], multikernel depthwise convolution and group convolution were utilized for lightweight feature fusion. A lightweight model was designed by replacing the standard convolution with a depthwise convolution and a pointwise convolution in an effort to reduce the complexity of the whole model [40]. Based on this work, LiteSCANet [41] improved the computational efficiency by using a residual double-branch structure for HSI classification. Cui et al. [42] also considered depthwise separable convolution and decreased the number of channels in the feature maps to reduce model complexity. Meanwhile, designing different operators to replace the convolution kernels is another popular approach. For instance, a random patches network (RPNet) [43] directly utilized random patches from HSIs as convolution kernels to extract hierarchical features. The ESSINet [44] featured a new lightweight involution kernel on channel interactions and incorporated spatial information via a dual-pooling layer. Although the convolutional layers have been compressed by these compact structures, the fully connected (FC) layers still have a large number of parameters [45].
Differing from the above studies, tensor decomposition offers an alternative route to network compression. By exploiting low-rankness, the classical Tucker and CANDECOMP/PARAFAC (CP) decompositions have been successfully applied to tensor completion [46], [47] and HSI denoising [48], and they have also been used to approximate convolution kernel tensors for network compression [49], [50]. However, these works follow the mapping approach, which is better suited to model acceleration and yields poor compression performance. Hence, Novikov et al. [51] adopted the tensorized approach and reformulated the weight matrices of FC layers in a tensor train (TT) [52] format, which removes a large number of parameters from FC layers while maintaining their expressive ability. Inspired by Novikov et al. [51], Garipov et al. [53] reshaped 2-D convolution kernels into high-order tensors and factorized them by TT decomposition (TTD), and Wang et al. [54] then extended this work to the 3-D form, achieving a better compression effect. TTD has since been introduced to HSI classification for network compression. Hu et al. [55] proposed a spatial–spectral TT-ConvLSTM 2-D neural network (SSTTCL2DNN), which compresses only the convolution kernels of ConvLSTM and leaves the FC layers untouched. This model illustrates the compression effectiveness of tensor decomposition but sacrifices classification accuracy because of the limited expressive ability of TTD. Meanwhile, the tensor ring decomposition (TRD) [56] was introduced as a generalization of TTD for compressing the convolutional and FC layers; it relaxes the rank conditions and has an enhanced representation ability [57]. Nevertheless, these two decompositions characterize only limited correlations between tensor modes and are sensitive to the permutation of tensor modes. Unlike the aforementioned tensor decompositions, the FC tensor network decomposition (FCTND) [58] has an FC structure and overcomes these limitations of TTD and TRD. 
In fact, various tensor decompositions were compared in [59] and [60], and we find that FCTND can better recover the original data from a small amount of data in tensor completion and exhibits stronger representation ability.
In this article, we make the first attempt to use FCTND for the purpose of network compression. Specifically, we newly construct the FCTND-based FC and convolutional units, with which a new Hybrid FC Tensorized Compression Network (HybridFCTCN) is proposed to fully capture the joint spatial–spectral information (with lower storage requirement) for improving the HSI classification performance. The main contributions of this work are listed as follows.
1) By exploiting the low-rank FCTND, three novel units, i.e., FCTN-FC, FCTNConv2D, and FCTNConv3D, are designed to compress the weight tensor of the standard FC layer and the kernel tensor of the convolutional layer. Their FC structure makes better use of the intrinsic correlation between any two factors to enhance the feature extraction and classification abilities.
2) HybridFCTCN adopts a hybrid structure in which spatial–spectral feature learning with FCTNConv3D is followed by spatial feature enhancement with FCTNConv2D. As such, the proposed model not only learns highly discriminative features but also yields a more lightweight network, which makes it less prone to overfitting under small sample sizes and allows it to obtain state-of-the-art classification performance.
3) Based on the existence of FCTND with equal rank, the rank of FCTND-based units is defined, and rank determination in the different layers of HybridFCTCN is then discussed, thus facilitating its practical application.
The remainder of this article is organized as follows. Section II introduces some background information about tensors and presents three tensor decompositions. Section III describes three novel compression units (FCTN-FC, FCTNConv2D, and FCTNConv3D), the architecture of the proposed model, and the determination of the ranks in HybridFCTCN. Section IV reports the obtained experimental results. Section V concludes this article with some remarks.
Tensor Preliminaries and Tensor Decompositions
In this section, some preliminaries, including notations and tensor operations, as well as the related tensor decompositions, are concisely presented to keep this article self-contained.
A. Preliminaries
1) Notations:
Throughout this article, scalars, vectors, matrices, and tensors are denoted by italic letters
2) Tensor Operations:
Two basic tensor operations, including tensor contraction and tensor convolution, are illustrated in Fig. 2, with different line shapes to distinguish different operations of tensor. Specifically, the tensor contraction operation is the process of removing the matching dimensions between two tensors, represented by the solid line in Fig. 2(a). For example, by performing the tensor contraction operation between a three-order tensor
Tensor operations. (a) Tensor contraction operation. (b) Tensor convolution operation.
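As a small illustration of the tensor contraction operation, the following NumPy sketch contracts a three-order tensor with a four-order tensor over one matching mode. The shapes and variable names are chosen here purely for illustration and are not taken from this article.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))      # three-order tensor
B = rng.standard_normal((5, 6, 7, 2))   # four-order tensor; its mode 0 matches mode 2 of A

# Contract the shared mode of size 5; all remaining modes are kept.
C = np.einsum("abk,kcde->abcde", A, B)

# The same contraction expressed with tensordot.
C_td = np.tensordot(A, B, axes=([2], [0]))
```

The result is a five-order tensor whose modes are the uncontracted modes of the two operands.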
B. Tensor Decompositions
1) Tensor Train Decomposition [52]:
As depicted in Fig. 3, TTD factorizes a large-sized tensor into a set of sequentially connected small-sized tensors, where the side factors are matrices, and the others are three-order tensors. The mathematical expression of TTD can be written as follows:\begin{equation*} \mathcal {X}_{l_{1},\ldots,l_{d}}\stackrel {\text {TTD}}{=} \sum _{r_{1}=1}^{R_{1}},\ldots, \sum _{r_{d-1}=1}^{R_{d-1}}\left \{{\mathcal {G}_{r_{0},l_{1},r_{1}}^{(1) },\ldots,\mathcal {G}_{r_{d-1},l_{d},r_{d}}^{(d)}}\right \} \tag{1}\end{equation*}
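To make (1) concrete, a minimal NumPy sketch is given below that rebuilds a full tensor from TT factors. It assumes boundary ranks of one (so the side factors act as matrices); the helper name `tt_to_full`, the mode sizes, and the ranks are hypothetical.

```python
import numpy as np

def tt_to_full(cores):
    """Contract TT cores of shape (R_{k-1}, L_k, R_k), with R_0 = R_d = 1, into the full tensor."""
    full = cores[0]                                  # shape (1, L_1, R_1)
    for core in cores[1:]:
        # Sum over the rank index shared by two consecutive cores.
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]                           # drop the two boundary ranks of size 1

rng = np.random.default_rng(0)
dims, ranks = (4, 5, 6), (1, 2, 3, 1)                # illustrative sizes only
cores = [rng.standard_normal((ranks[k], dims[k], ranks[k + 1])) for k in range(3)]

X = tt_to_full(cores)
X_ref = np.einsum("aib,bjc,ckd->ijk", *cores)        # direct evaluation of (1)
n_tt = sum(core.size for core in cores)              # parameters stored by the TT format
```

In this toy setting, the three cores store 56 values, whereas the full tensor has 120 entries.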
2) Tensor Ring Decomposition [56]:
TRD links the side factors of TTD to construct a ring-like form, whose sketch map is shown in Fig. 4. It can be regarded as a linear combination of TTD, having the properties of good representation and cyclic invariance. Given a \begin{align*} \mathcal {X}_{l_{1},\ldots,l_{d}}\stackrel {\text {TRD}}{=} \sum _{r_{0}=r_{d}=1}^{R}\sum _{r_{1}=1}^{R_{1}},\ldots, \sum _{r_{d-1}=1}^{R_{d-1}} \left \{{\mathcal {G}_{r_{0},l_{1},r_{1}}^{(1) },\ldots,\mathcal {G}_{r_{d-1},l_{d},r_{d}}^{(d)}}\right \} \tag{2}\end{align*}
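The ring structure in (2) and its cyclic invariance can be checked numerically with the following sketch. The helper name `tr_to_full` and all sizes are assumptions made for illustration.

```python
import numpy as np

def tr_to_full(cores):
    """Contract TR cores of shape (R_{k-1}, L_k, R_k), with R_0 = R_d, into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    # full has shape (R_0, L_1, ..., L_d, R_d); summing over r_0 = r_d closes the ring.
    return np.trace(full, axis1=0, axis2=full.ndim - 1)

rng = np.random.default_rng(0)
dims, ranks = (4, 5, 6), (3, 2, 4, 3)                # ranks[0] == ranks[-1]
cores = [rng.standard_normal((ranks[k], dims[k], ranks[k + 1])) for k in range(3)]

X = tr_to_full(cores)
X_ref = np.einsum("aib,bjc,cka->ijk", *cores)        # direct evaluation of (2)

# Cyclic invariance: circularly shifting the cores only permutes the modes of X.
X_shift = tr_to_full(cores[1:] + cores[:1])
```

Here `X_shift` equals `X` with its modes circularly permuted, which is the cyclic invariance mentioned above.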
3) Fully Connected Tensor Network Decomposition [58]:
As shown in Fig. 5, a large \begin{align*}&\hspace {-0.5pc}\mathcal {X}_{l_{1},\ldots,l_{d}}\stackrel {\text {FCTND}}{=}\sum _ {r_{1,2}=1}^{R_{1,2}},\ldots,\sum _{r_{1,d}=1}^{R_{1,d}},\ldots, \sum _{r_{d-1,d}=1}^{R_{d-1,d}} \\&\hspace {-3pc} \,\left \{{\mathcal {G}_{l_{1},r_{1,2},\ldots,r_{1,d}}^{(1) },\ldots,\mathcal {G}_{r_{1,d},\ldots,r_{d-1,d},l_{d}}^{(d)} }\right \} \tag{3}\end{align*}
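For a four-order tensor, (3) involves four factors, each of which shares one rank index with every other factor. The sketch below evaluates this contraction with NumPy; the mode sizes and the common rank `R` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dims, R = (3, 4, 5, 6), 2                       # illustrative mode sizes and a shared rank

# Each factor carries one physical mode l_k and one rank index per other factor.
G1 = rng.standard_normal((dims[0], R, R, R))    # (l1,  r12, r13, r14)
G2 = rng.standard_normal((R, dims[1], R, R))    # (r12, l2,  r23, r24)
G3 = rng.standard_normal((R, R, dims[2], R))    # (r13, r23, l3,  r34)
G4 = rng.standard_normal((R, R, R, dims[3]))    # (r14, r24, r34, l4)

# a=r12, b=r13, c=r14, d=r23, e=r24, f=r34 are summed out as in (3).
X = np.einsum("iabc,ajde,bdkf,cefl->ijkl", G1, G2, G3, G4)

n_factors = G1.size + G2.size + G3.size + G4.size
n_full = X.size
```

With these sizes, the factors hold 144 values in place of the 360 entries of the full tensor.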
HybridFCTCN
This section first illustrates the compression of the FC and convolutional layers based on FCTND. Then, with three newly designed units, the whole framework of the proposed HybridFCTCN model (as shown in Fig. 6) for HSI classification is described in detail. Finally, the determination of the ranks in HybridFCTCN is discussed.
Graphical illustration of the proposed HybridFCTCN model. HybridFCTCN is composed of data preprocessing, spatial–spectral feature learning, spatial feature enhancement, and HSI classification. Data preprocessing includes PCA and normalization. Spatial–spectral feature learning contains three 3-D units in which the features with the spatial–spectral structure (red block) are propagated. Spatial feature enhancement has an FCTNConv2D unit where the features are propagated in the spatial structure (yellow block). For HSI classification, a GAP layer, two FCTN-FC units, and a softmax layer are utilized to predict the sample attribute.
A. Design of FCTN-FC
In an FC layer, the input feature map \begin{equation*} \mathbf {y}=\mathbf {W}\mathbf {x}. \tag{4}\end{equation*}
\begin{equation*} \mathcal {Y}_{o_{1},\ldots,o_{n}}=\sum _{i_{1},\ldots,i_{m}} \mathcal {W}_{i_{1},\ldots,i_{m},o_{1},\ldots,o_{n}} \mathcal {X}_{i_{1},\ldots,i_{m}} \tag{5}\end{equation*}
\begin{align*} \text {FCTND}(\mathcal {W})=&\sum _{r_{1,2}=1}^{R_{1,2}},\ldots, \sum _{r_{1,m+n}=1}^{R_{1,m+n}},\ldots, \sum _{r_{m,m+1}=1}^{R_{m,m+1}},\ldots, \sum _{r_{m+n-1,m+n}=1}^{R_{m+n-1,m+n}} \\&\hspace {-.2pc}\left \{{\mathcal {G}_{i_{1},r_{1,2},\ldots,r_{1,m},\ldots,r_{1,m+n}}^{(1) },\ldots, }\right. \\&\hspace {-2pc}\qquad \mathcal {G}_{r_{1,m},\ldots,r_{m-1,m},i_{m},r_{m,m+1},\ldots,r_{m,m+n}}^{(m)} \\&\hspace {-2pc}\qquad \,\mathcal {G}_{r_{1,m+1},\ldots,r_{m,m+1}, o_{1},r_{m+1,m+2},\ldots,r_{m+1,m+n}}^{(m+1)},\ldots, \\&\hspace {-1.5pc}\qquad \left.{\mathcal {G}_{r_{1,m+n},\ldots,r_{m+n-1,m+n}, o_{n}}^{(m+n)}}\right \}. \tag{6}\end{align*}
\begin{equation*} \mathcal {Y}_{o_{1},\ldots,o_{n}}=\sum _{i_{1},\ldots,i_{m}}\text {FCTND} (\mathcal {W})\mathcal {X}_{i_{1},\ldots,i_{m}}. \tag{7}\end{equation*}
Compression process of the FC and convolutional layers by FCTND. (a) FCTN-FC. (b) FCTNConv2D.
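The relation between (4)–(7) can be sketched for the case m = n = 2 as follows. This is only an illustrative NumPy example under assumed mode sizes and an assumed common rank; it omits the bias and is not the implementation used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, O1, O2, R = 4, 6, 3, 5, 2            # hypothetical mode sizes and shared rank

G1 = rng.standard_normal((I1, R, R, R))      # (i1,  r12, r13, r14)
G2 = rng.standard_normal((R, I2, R, R))      # (r12, i2,  r23, r24)
G3 = rng.standard_normal((R, R, O1, R))      # (r13, r23, o1,  r34)
G4 = rng.standard_normal((R, R, R, O2))      # (r14, r24, r34, o2)

x = rng.standard_normal(I1 * I2)             # input vector of the FC layer
X = x.reshape(I1, I2)                        # tensorized input

# Reference: rebuild the weight tensor as in (6), then apply (5).
W = np.einsum("iabc,ajde,bdpf,cefq->ijpq", G1, G2, G3, G4)
Y_ref = np.einsum("ijpq,ij->pq", W, X)

# Tensorized evaluation as in (7): contract the input with the factors directly.
Y = np.einsum("ij,iabc,ajde,bdpf,cefq->pq", X, G1, G2, G3, G4, optimize=True)

# Matricized form, corresponding to y = W x in (4).
y_mat = W.reshape(I1 * I2, O1 * O2).T @ x
```

All three evaluations give the same output up to floating-point error, while only the four factors need to be stored and trained.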
Compared to the original FC layers, the number of parameters in the FCTN-FC units is reduced, with the compression ratio calculated as \begin{equation*} C_{\mathrm{ FC}}=\frac {IO}{R^{m+n-1}\left ({\sum _{i=1}^{m}I_{i} +\sum _{j=1}^{n}O_{j}}\right)} \tag{8}\end{equation*}
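The ratio in (8) can be evaluated with a few lines of Python. The sketch assumes that I and O are the products of the decomposed input and output mode sizes and that all ranks equal R; the function name and the example sizes are hypothetical.

```python
from math import prod

def fc_compression_ratio(in_dims, out_dims, rank):
    """Evaluate (8), assuming I = prod(in_dims), O = prod(out_dims), and equal ranks."""
    m, n = len(in_dims), len(out_dims)
    full_params = prod(in_dims) * prod(out_dims)                     # I * O
    fctn_params = rank ** (m + n - 1) * (sum(in_dims) + sum(out_dims))
    return full_params / fctn_params

# Illustrative setting: a 256 x 128 FC layer with three input and three output modes.
ratio = fc_compression_ratio((4, 8, 8), (4, 4, 8), rank=2)
```

For this illustrative layer, 32768 weights are replaced by 1152 factor entries, i.e., a ratio of about 28.4.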
B. Design of FCTNConv
To reduce the trainable parameters in a 2-D convolutional layer, the four-order convolution kernel \begin{equation*} \mathcal {K}_{l_{1},l_{2},i,o}=\sum _{r_{0,1}=1}^{R_{0,1}} \sum _{r_{0,2}=1}^{R_{0,2}}\sum _{r_{1,2}=1}^{R_{1,2}} \mathcal {S}_{r_{0,1},l_{1},l_{2},r_{0,2}}\mathcal {P}_{r_{0,1},i,r_{1,2}}\mathcal {Q}_{r_{0,2},o,r_{1,2}} \qquad \tag{9}\end{equation*}
After substituting (9) into the original 2-D convolution kernel, the approximate evaluation of the standard convolutional layer can be expressed by the following three consecutive steps:\begin{align*} \mathcal {U}_{h,w,r_{0,1},r_{1,2}}&\,=\, \sum _{i=1}^{I}\mathcal {X}_{h,w,i}\mathcal {P}_{r_{0,1},i,r_{1,2}} \tag{10}\\ \mathcal {V}_{h',w',r_{1,2},r_{0,2}}&\,=\, \sum _{l_{1},l_{2}=1}^{L}\sum _{r_{0,1}=1}^{R_{0,1}} \mathcal {U}_{h,w,r_{0,1},r_{1,2}}\mathcal {S}_{r_{0,1}, l_{1},l_{2},r_{0,2}} \tag{11}\\ \mathcal {Y}_{h',w',o}&\,=\,\sum _{r_{0,2}=1}^{R_{0,2}} \sum _{r_{1,2}=1}^{R_{1,2}}\mathcal {V}_{h',w',r_{1,2},r_{0,2}} \mathcal {Q}_{r_{0,2},o,r_{1,2}} \tag{12}\end{align*}
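The equivalence between the three steps (10)–(12) and a convolution with the kernel in (9) can be verified numerically. The sketch below assumes a stride of one, no padding, and the cross-correlation convention common in DL frameworks; all sizes are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
H, W, I, O, L = 8, 8, 6, 5, 3                 # illustrative input, channel, and kernel sizes
R01, R02, R12 = 2, 3, 4                       # illustrative ranks

X = rng.standard_normal((H, W, I))
S = rng.standard_normal((R01, L, L, R02))     # spatial factor
P = rng.standard_normal((R01, I, R12))        # input-channel factor
Q = rng.standard_normal((R02, O, R12))        # output-channel factor

# Step (10): contract the input channels with P.
U = np.einsum("hwi,aic->hwac", X, P)
# Step (11): spatial convolution with S, summing over r01 and the kernel window.
U_win = sliding_window_view(U, (L, L), axis=(0, 1))   # (H', W', R01, R12, L, L)
V = np.einsum("hwacxy,axyb->hwcb", U_win, S)
# Step (12): contract the remaining ranks with Q to obtain the output channels.
Y = np.einsum("hwcb,boc->hwo", V, Q)

# Reference: assemble the full kernel via (9) and convolve directly.
K = np.einsum("axyb,aic,boc->xyio", S, P, Q)
X_win = sliding_window_view(X, (L, L), axis=(0, 1))   # (H', W', I, L, L)
Y_ref = np.einsum("hwixy,xyio->hwo", X_win, K)
```

Both routes yield the same output, but the three-step route never forms the full kernel explicitly.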
In practice, the convolution kernel \begin{align*} \text {FCTND}(\mathcal {K})=&\sum _{r_{0,1}=1}^{R_{0,1}},\ldots, \sum _{r_{m+n-1,m+n}=1}^{R_{m+n-1,m+n}} \\&\,\left \{{\mathcal {S}_{l_{1},l_{2},r_{0,1},\ldots, r_{0,m+n}}}\right.\tag{13}\\&\hspace {-.9pt}\mathcal {G}_{r_{0,1},i_{1},r_{1,2},\ldots, r_{1,m+n}}^{(1) },\ldots, \\&\hspace {-2pc}\qquad \left.{\mathcal {G}_{r_{0,m+n},\ldots,r_{m+n-1,m+n}, o_{n}}^{(m+n)}}\right \}. \tag{14}\end{align*}
We name this unit FCTNConv2D; its sketch map is shown in Fig. 7(b) (the solid line denotes the tensor contraction operation, while the dashed line denotes the tensor convolution operation), and its computation scheme is described in Algorithm 1. To be specific, the channel factors \begin{equation*} C_{\mathrm{ conv}}=\frac {L^{2}{IO}}{R^{m+n}\left ({L^{2}+ \sum _{i=1}^{m}I_{i}+\sum _{j=1}^{n}O_{j}}\right)}. \tag{15}\end{equation*}
Algorithm 1 FCTNConv2D
Input: Input tensor
factors
Output: Output tensor
for
end for
Increase the feature channels via (10);
Extract the features via (11);
for
end for
Reduce the feature channels via (12);
return
With the purpose of further extracting the spatial–spectral features for HSI classification, the FCTNConv2D unit is extended to a 3-D form called the FCTNConv3D unit. The difference between these two units lies in the data dimensionality. In other words, the uncompressed convolution kernel \begin{align*} \mathcal {Y}_{h',w',d',o_{1},\ldots,o_{n}}=\sum _{l_{1},l_{2}=1}^{L} \sum _{t=1}^{T}\sum _{i_{1},\ldots,i_{m}}\text {FCTND} (\mathcal {K}_{\mathrm{ 3D}})\mathcal {X}_{h,w,d,i_{1},\ldots,i_{m}}. \tag{16}\end{align*}
C. Framework of the Proposed HybridFCTCN
An HSI comprises hundreds of contiguous narrowband images, which can be naturally represented by a three-order tensor. To completely and efficiently explore both spatial and spectral information, a lightweight tensorized neural network model, i.e., HybridFCTCN, is proposed for HSI classification. The overall network architecture of HybridFCTCN is shown in Fig. 6, which mainly includes four modules: data preprocessing, spatial–spectral feature learning, spatial feature enhancement, and classification. In particular, the underlying hybrid structure can integrate the complementary information provided by FCTNConv3D and FCTNConv2D for highly discriminative feature extraction.
In the data preprocessing, principal component analysis (PCA) is utilized for spectral redundancy reduction. For each pixel, its neighboring patch is extracted as the spatial–spectral information and used as the input of the proposed model. In order to facilitate understanding, we take the Indian Pines dataset as an example. The input sample of HybridFCTCN has a size of
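The preprocessing described above (PCA along the spectral dimension followed by neighborhood extraction) may be sketched as follows. The function names, the number of retained components, the patch size, and the mirrored border handling are assumptions made for illustration and are not specified by the text above.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project the spectral dimension of an (H, W, B) cube onto its leading principal components."""
    H, W, B = cube.shape
    flat = cube.reshape(-1, B).astype(np.float64)     # astype returns a copy
    flat -= flat.mean(axis=0)                         # center each band
    # The right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:n_components].T).reshape(H, W, n_components)

def extract_patch(cube, row, col, size):
    """Return the size x size neighborhood centered at (row, col); size is assumed odd."""
    half = size // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    return padded[row:row + size, col:col + size, :]

rng = np.random.default_rng(0)
cube = rng.standard_normal((20, 20, 30))              # synthetic stand-in for an HSI
reduced = pca_reduce(cube, n_components=5)
patch = extract_patch(reduced, row=0, col=0, size=7)
```

Each extracted patch keeps the labeled pixel at its center, so the patch label is the label of that pixel.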
The spatial–spectral feature learning module consists of a standard convolutional layer and two FCTNConv3D units. Initially, the above input sample is fed into a standard 3-D convolutional layer with a size of
Considering that spatial information is important for HSI classification, an FCTNConv2D unit is placed after the spatial–spectral feature learning module to construct a hybrid network structure for extracting more discriminative spatial features. After merging the spectral and channel dimensions of the output of the second FCTNConv3D unit, the feature is fed into the FCTNConv2D unit, where the kernel size is
Finally, the output of the former module passes sequentially to a global average pooling (GAP) layer, two FCTN-FC units, and a softmax layer for HSI classification. Particularly, the decomposed output dimensions of these two FCTN-FC units are, respectively,
In HybridFCTCN, the newly designed units take the place of the original FC and convolutional layers, which greatly reduces the number of model parameters. Note that batch normalization (BN) and the ReLU activation function are added after each convolutional unit. The detailed parameter settings of HybridFCTCN are reported in Table I.
D. Determination of the Ranks in HybridFCTCN
Rank determination is vitally important in tensor decompositions and has a considerable effect on the classification performance and complexity of the proposed model. Before elaborating on the rank determination in HybridFCTCN, Theorem 1 needs to be introduced.
Theorem 1:
Let
Proof:
See the Appendix.
The newly designed FCTN-FC and FCTNConv units are composed of the FCTND factors of the original weight tensors. On this basis, a novel unit rank is defined to solve the problem of rank determination.
Definition 1:
Supposing that there are a series of decomposed factors
In HybridFCTCN, there are two FCTN-FC and three FCTNConv units, whose ranks can be determined by using the defined unit rank. For the FCTN-FC units, the dimension of each mode in the first unit is
Based on the above analysis, the bounds of ranks in HybridFCTCN for the Indian Pines dataset are found, and by the same method, the ones for the University of Pavia and Houston datasets can also be obtained. The experiments to determine the specific rank in both FCTN-FC and FCTNConv units for three different datasets are conducted in Section IV-B.
Experimental Results
To verify the performance of the proposed HybridFCTCN model, SVM [61], 3D-CNN [6], SSRN [31], HybridSN [30], MANet [38], Hybrid3D-2DCNN [62], SSCL2DNN [36], and SSTTCL2DNN [55] are selected as comparative models. The evaluation indexes adopted are the overall accuracy (OA), average accuracy (AA), and Kappa coefficient (
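For reference, the three evaluation indexes can be computed from a confusion matrix using their standard definitions, as in the following sketch. The function name and the toy labels are illustrative, and every class is assumed to appear at least once in the ground truth.

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Return (OA, AA, Kappa) computed from integer class labels in [0, n_classes)."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)                 # rows: true class, columns: prediction
    total = cm.sum()
    oa = np.trace(cm) / total                          # fraction of correctly labeled samples
    aa = (np.diag(cm) / cm.sum(axis=1)).mean()         # mean of the per-class accuracies
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 0, 2])
oa, aa, kappa = classification_scores(y_true, y_pred, n_classes=3)
```

For these toy labels, OA is 0.8, AA is about 0.833, and Kappa is about 0.697.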
A. Dataset Description
Three widely used HSI datasets, i.e., Indian Pines, University of Pavia, and Houston, are adopted in experiments.
1) Indian Pines:
The Indian Pines dataset was collected by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor in northwest Indiana, USA. This dataset has a spatial size of
2) University of Pavia:
The University of Pavia dataset was captured in 2001 by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor in northern Italy. The image of this scene has a spatial size of
3) Houston:
The Houston dataset was acquired by the Compact Airborne Spectrographic Imager (CASI) sensor over the University of Houston and its neighboring area. It is published by the IEEE Geoscience and Remote Sensing Society (GRSS) in the 2013 Data Fusion Contest (DFC). It comprises
B. Experimental Settings
In the following experiments, the HSI datasets are divided into training and testing sets: 10% of the samples in the Indian Pines dataset and 1% of the samples in the University of Pavia dataset are randomly selected as the training sets, and the rest are used for testing. For the Houston dataset, the given training and testing samples in the 2013 DFC are used. Tables II–IV list the detailed train-test splits for each dataset.
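A random split of this kind can be sketched as follows. The sketch assumes that the given fraction is drawn within each class and that at least one sample per class is kept for training; the rounding rule, the function name, and the toy class sizes are assumptions rather than details taken from Tables II–IV.

```python
import numpy as np

def stratified_split(labels, fraction, seed=0):
    """Randomly assign `fraction` of the labeled samples of each class to the training set."""
    rng = np.random.default_rng(seed)
    train_parts = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n_train = max(1, int(round(fraction * idx.size)))   # keep at least one sample
        train_parts.append(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.sort(np.concatenate(train_parts))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

labels = np.repeat([1, 2, 3], [100, 50, 20])                # toy class sizes
train_idx, test_idx = stratified_split(labels, fraction=0.10)
```

Fixing the seed makes the split reproducible across the compared models.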
The parameters of the 3D-CNN, SSRN, HybridSN, MANet, Hybrid3D-2DCNN, SSCL2DNN, and SSTTCL2DNN models are consistent with the original settings to reproduce their experimental results. For a fair comparison, the input data of HybridFCTCN are reshaped into a vector as the input of SVM for HSI spatial–spectral classification.
For the proposed HybridFCTCN model, there are some hyperparameters that need to be determined, i.e., the size of the convolution kernels, the ranks in the FCTNConv and FCTN-FC units, the optimal number (
In the following experiments, the learning rate of HybridFCTCN is set to 0.001 for all 500 training epochs with the batch size set to 64. Adam is the selected optimizer, and CrossEntropy is the loss function.
C. Classification Performance
According to the above experimental settings, Tables VIII–X present a quantitative assessment of HybridFCTCN and other comparative models for the Indian Pines, University of Pavia, and Houston datasets, respectively. It can be observed that the proposed model achieves the best performance in terms of OA, AA, and
The experimental results also show that the spatial–spectral SVM is outperformed by all DL models because it is unable to make use of the spatial information. 3D-CNN can utilize the spatial–spectral information to reach good classification accuracy in the three HSI datasets, but it requires millions of parameters. In some classes containing very few training samples (e.g., the Grass-pasture-mowed and Oats classes in the Indian Pines dataset), the residual blocks in SSRN fail to learn discriminative features, although SSRN obtains good overall performance and achieves the best results for certain classes. SSTTCL2DNN introduces TTD into the convolutional layers of SSCL2DNN to compress the model, at the cost of classification performance. Compared with MANet, HybridSN and Hybrid3D-2DCNN adopt a 2-D convolutional layer to strengthen the spatial feature learning, contributing to better performance but at the expense of a large number of parameters. The proposed model achieves excellent performance with fewer parameters for the three considered HSI scenarios. There are two reasons for this. On the one hand, the FC structures in the FCTN-FC and FCTNConv units boost the information flow across the feature channels, strengthening their representation ability. On the other hand, since the novel units allow more feature channels, HybridFCTCN extracts richer spatial and spectral features that contain high-level semantic information for better HSI classification. With these efficient feature extraction designs, the proposed compression model produces satisfactory results with a very small number of parameters.
Compared with 3D-CNN, which achieves good performance under different scenarios, the proposed model obtains improvements of 1.17%, 0.84%, and 2.54% in terms of OA for the Indian Pines, University of Pavia, and Houston datasets, respectively. As far as the HybridSN model is concerned, the gains in OA obtained by our model are 0.57%, 4.17%, and 11.13% for the three considered HSI datasets. The relatively wide margin between HybridFCTCN and HybridSN in the University of Pavia and Houston datasets may be caused by the presence of more heterogeneous classes in those scenes.
Some of the classification maps achieved by HybridFCTCN and other comparative models for the three considered datasets (together with the ground-truth maps of the original scenes) are displayed in Figs. 8–10. As can be seen in these figures, HybridFCTCN provides the smoothest classification maps, which are also the most similar to the ground-truth maps for the Indian Pines, University of Pavia, and Houston datasets. Other models can roughly distinguish different classes in the three datasets, but there are still some misclassifications, e.g., the Corn-notill class in the Indian Pines dataset, the Bare Soil class in the University of Pavia dataset, and the Highway class in the Houston dataset.
Classification maps for the Indian Pines dataset. (a) Ground-truth map. (b) SVM. (c) 3D-CNN. (d) SSRN. (e) HybridSN. (f) MANet. (g) Hybrid3D-2DCNN. (h) SSCL2DNN. (i) SSTTCL2DNN. (j) HybridFCTCN.
Classification maps for the University of Pavia dataset. (a) Ground-truth map. (b) SVM. (c) 3D-CNN. (d) SSRN. (e) HybridSN. (f) MANet. (g) Hybrid3D-2DCNN. (h) SSCL2DNN. (i) SSTTCL2DNN. (j) HybridFCTCN.
Classification maps for the Houston dataset. (a) Ground-truth map. (b) SVM. (c) 3D-CNN. (d) SSRN. (e) HybridSN. (f) MANet. (g) Hybrid3D-2DCNN. (h) SSCL2DNN. (i) SSTTCL2DNN. (j) HybridFCTCN.
D. Comparison and Analysis Using Small Training Samples
In practice, labeling HSIs is time-consuming and challenging, which restricts the amount of labeled data available for classification. Hence, it is necessary to experimentally evaluate the robustness and generalization of all the considered models under small training samples.
Tables XI–XIII show the classification performance achieved by different models with very small training sets in the Indian Pines, University of Pavia, and Houston datasets, respectively (here, only ten samples from each labeled class are randomly selected). The experimental results illustrate that the proposed HybridFCTCN model still outperforms all other comparative models. Specifically, our model is 3.91%, 4.00%, and 1.51% more accurate than 3D-CNN in terms of OA for the three datasets, respectively. Compared to HybridSN, HybridFCTCN yields 13.47%, 16.31%, and 18.97% improvements in the Indian Pines, University of Pavia, and Houston datasets, respectively. This deficiency of HybridSN may be related to overfitting caused by too many parameters and too few training samples. The superiority of HybridFCTCN in scenarios dominated by small training samples lies in its strong representation ability with very few parameters. On the one hand, low-rank regularization is applied to the model weights by FCTND, which reduces the noise in the data and extracts more expressive features. On the other hand, since the proposed model has few parameters, its training is less prone to overfitting, which helps maintain strong performance.
To further validate the generalization of HybridFCTCN, 20, 30, and 40 labeled samples are, in turn, randomly selected from each class as training sets for the Indian Pines, University of Pavia, and Houston datasets. It should be noted that, due to the limited labeled samples in the Grass-pasture-mowed class and the Oats class in the Indian Pines dataset, the number of training samples for these two classes is uniformly set to 10 in the following experiments. Fig. 11 depicts the OA curves achieved by all models using different numbers of samples in the three datasets. It can be seen that the OA values of all models improve as the number of training samples increases. The proposed HybridFCTCN model always maintains the highest accuracy among all compared models, which verifies its generalization ability in scenarios dominated by small training samples.
OA (%) obtained by the considered methods using different numbers of training samples for three HSI datasets. (a) Indian Pines. (b) University of Pavia. (c) Houston.
E. Analysis of the Number of Parameters
In addition to the performance analysis, the numbers of parameters in HybridFCTCN and the other comparative models are further investigated. Table XIV compares the distributions of model parameters for the Indian Pines dataset, from which it is clear that HybridFCTCN requires significantly fewer parameters in both the convolutional and FC layers while obtaining higher classification accuracy, as analyzed before.
Because of the large-sized input data, the 3D-CNN model has the largest number of parameters in the convolutional layers and HybridSN in the FC layers. Although the scale of parameters in the convolutional layers is reduced by the decreased number of channels, there are still large-sized FC layers in the MANet model. Hybrid3D-2DCNN has millions of parameters, which is mainly caused by a deeper network structure and more feature channels. Also, SSTTCL2DNN only compresses the convolution kernels of SSCL2DNN by TTD, without dealing with the FC layers. In SSRN, a GAP layer is utilized to replace the FC layer, resulting in a relatively small number of parameters.
In HybridFCTCN, both the convolutional and FC layers are compressed by tensor decomposition. The number of parameters in the convolutional layers is reduced via the novel units (i.e., FCTNConv2D and FCTNConv3D) from a few hundred thousand to several thousand, and the FCTN-FC units, combined with a GAP layer, obtain a similar effect in the FC layers. It should be noted that there is only one FC layer in SSRN, while we use two in the HybridFCTCN model to improve accuracy, which slightly increases the number of parameters in the FC layers. In general, the proposed HybridFCTCN model has a remarkably small number of parameters while achieving outstanding performance in terms of HSI classification.
F. Comparison of Different Tensor Decompositions
HybridFCTCN is further compared with the models based on TTD [64] and TRD [57]. For a fair comparison, only the FCTN-FC, FCTNConv2D, and FCTNConv3D units are replaced with the corresponding TTD-based or TRD-based units. Also, the ranks of the comparative tensor decompositions are set according to the number of parameters in HybridFCTCN, i.e., 6 or 7 in TTD-based and 5 or 6 in TRD-based models.
The experimental results of the models based on different tensor decompositions are listed in Table XV. It can be seen that, with a similar number of parameters, the proposed model has the best OA values and only moderately increases the testing time. To be specific, compared to the TTD-based models, HybridFCTCN improves by 2.28% over TTD-6 and 1.69% over TTD-7 in the Indian Pines dataset. Meanwhile, compared to the TRD-based models, our model is 0.98% more accurate than TRD-5 and 0.69% more accurate than TRD-6. In terms of computational complexity, the testing times of the TRD-based models are about twice those of the TTD-based ones, and the testing time increases further for HybridFCTCN.
The reason for the experimental results is that the FC structure in FCTND-based units is able to adequately characterize the correlation between any two factors and has transposition invariance for better accuracy but also leads to higher computational complexity. For other comparative models, the chain- and ring-like structures in the tensorized units, which only connect adjacent factors, sacrifice the classification performance. Overall, compared to other tensor decomposition models, HybridFCTCN achieves satisfactory results with acceptable testing time.
Conclusion
In this article, a new compression model called HybridFCTCN is proposed for HSI classification, achieving outstanding performance with significantly fewer parameters than other comparative models. Owing to the FC structures, three novel units with fewer parameters, i.e., FCTN-FC, FCTNConv2D, and FCTNConv3D, effectively achieve the information flow across the channels, which improves their feature extraction and classification abilities. By making use of hybrid network architecture, the proposed model combines complementary spatial–spectral and spatial information with a lower storage requirement for better HSI classification. Moreover, the rank determination in HybridFCTCN has been discussed, which facilitates the practical application of the proposed model. A series of experiments on three widely used HSI datasets demonstrate the superiority of our model, especially in scenarios dominated by small training samples.
ACKNOWLEDGMENT
The authors would like to thank the Editor-in-Chief, the Associate Editor, and anonymous reviewers for their valuable comments and suggestions. They also sincerely thank Dr. Yu-Bang Zheng from the University of Electronic Science and Technology of China for discussing rank determination on the fully connected tensor network decomposition.
Appendix
[Proof of Theorem 1] Without loss of generality, we assume that the \begin{align*} \mathcal {G}^{(1) }_{l_{1},r_{1,2},\ldots,r_{1,d}}=&\begin{cases} \mathcal {X}_{l_{1},\ldots,l_{d}},&\text {if } r_{1,m}\leq L_{m}\ (m=2,\ldots,d)\\[3pt] 0,&\text {otherwise} \end{cases}\\[3pt] \mathcal {G}^{(k)}_{r_{1,k},\ldots,l_{k},\ldots,r_{k,d}}=&\begin{cases} 1,&\text {if } r_{1,k}=l_{k}\\[3pt] & r_{m,k}=1\ (m=2,\ldots,k-1)\\[3pt] & r_{k,n}=1\ (n=k+1,\ldots,d)\\[3pt] 0,&\text {otherwise} \end{cases}\end{align*}
\begin{align*}&\hspace {1pc}\mathcal {X}_{l_{1},\ldots,l_{d}}=\sum _{r_{1,2}=1}^{R},\ldots, \sum _{r_{1,d}=1}^{R},\ldots,\sum _{r_{d-1,d}=1}^{R} \\&\hspace {-5pc}\qquad \quad \,\left \{{\mathcal {G}_{l_{1},r_{1,2},\ldots,r_{1,d}}^{(1) },\ldots,\mathcal {G}_{r_{1,d},\ldots,r_{d-1,d},l_{d}}^{(d)} }\right \}.\end{align*}
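The construction used in the proof can be checked numerically for a three-order tensor. The sketch below follows one reading of the factors defined above, in which the first factor stores the entries of the tensor indexed by the rank indices r_{1,m} and the remaining factors act as 0/1 selectors; the mode sizes and the choice R = max(L_2, L_3) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L1, L2, L3 = 3, 4, 5
R = max(L2, L3)                       # equal rank on every edge (assumed choice)
X = rng.standard_normal((L1, L2, L3))

# G1 holds X in its leading block (r12 <= L2, r13 <= L3) and zeros elsewhere.
G1 = np.zeros((L1, R, R))             # (l1, r12, r13)
G1[:, :L2, :L3] = X

# G2 and G3 equal 1 only if r_{1,k} = l_k and r_{2,3} takes its first value.
G2 = np.zeros((R, L2, R))             # (r12, l2, r23)
G2[np.arange(L2), np.arange(L2), 0] = 1.0
G3 = np.zeros((R, R, L3))             # (r13, r23, l3)
G3[np.arange(L3), 0, np.arange(L3)] = 1.0

# Contract the three factors over r12 (a), r13 (b), and r23 (c).
X_rec = np.einsum("iab,ajc,bck->ijk", G1, G2, G3)
```

In this example, the contraction of the three factors recovers the original tensor, in line with the final identity of the proof.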