I. Introduction
Hyperspectral imaging technology provides rich information for Earth observation by sampling hundreds of spectral bands, which makes accurate classification of land covers possible [1]. However, increasing spectral resolution produces a large number of spectral bands. This not only increases computational complexity but also degrades classification accuracy, particularly when only a few labeled samples are available [2], [3]. It is therefore advantageous to reduce the dimensionality of hyperspectral vectors for classification without discarding significant information [4]–[6], a process known as dimensionality reduction (DR). Over the past several decades, extensive work has been done on DR of high-dimensional data, including: 1) distance-based methods [7]; 2) separability-based methods [8]; and 3) margin-based methods [9]. A typical distance-based method is principal component analysis (PCA) [7]. PCA does not utilize labeled data and thus often yields lower classification accuracy than supervised DR approaches. Linear discriminant analysis (LDA) [8] is a separability-based method, but its performance degrades markedly when only a small number of labeled samples is available. Margin-based methods define margins to characterize the discriminant ability of features; hence, their feature extraction process is classification oriented.
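To illustrate why PCA is unsupervised, the following is a minimal NumPy sketch of PCA-based DR. It assumes that the hyperspectral pixels have been flattened into an (n_pixels, n_bands) matrix. The function name `pca_reduce` and the random data are hypothetical and serve only as an illustration. No class labels enter the computation; the projection depends only on the band covariance.

```python
import numpy as np


def pca_reduce(pixels: np.ndarray, n_components: int) -> np.ndarray:
    """Project pixel spectra onto their top principal components.

    pixels: array of shape (n_pixels, n_bands); no labels are used.
    Returns an array of shape (n_pixels, n_components).
    """
    # Center each spectral band at zero mean
    centered = pixels - pixels.mean(axis=0)
    # Band-by-band covariance matrix, shape (n_bands, n_bands)
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues (largest variance)
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = eigvecs[:, order]
    return centered @ basis


# Hypothetical data: 500 pixels, each with 200 spectral bands
rng = np.random.default_rng(0)
pixels = rng.normal(size=(500, 200))
reduced = pca_reduce(pixels, n_components=10)
print(reduced.shape)  # (500, 10)
```

A supervised method such as LDA would additionally require a label vector to compute between-class and within-class scatter. This requirement is the reason its estimates become unreliable when few labeled samples exist.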