I. Introduction
The rich spectral information available in remotely sensed hyperspectral images makes it possible to distinguish between spectrally similar materials [1]. However, supervised classification of hyperspectral images is very challenging because of the generally unfavorable ratio between the (large) number of spectral bands and the (limited) number of training samples available a priori, which results in the Hughes phenomenon [2]. As shown in [3], when the number of features considered for classification exceeds a certain threshold, the classification accuracy starts to decrease. Methods originally developed for the classification of lower dimensional data sets (such as multispectral images) therefore provide poor results when applied to hyperspectral images, especially in the case of small training sets [4]. At the same time, the collection of reliable training samples is very expensive in terms of time and cost, and extensive ground truth information is rarely available [5]. To address this issue, a dimensionality reduction step is often performed prior to classification, in order to project the information in the original space (which in the case of hyperspectral data is almost empty [4]) onto a subspace in which the classes can be separated, discarding information that is useless for classification purposes. Several feature extraction techniques have been proposed to reduce the dimensionality of the data prior to classification, thus mitigating the Hughes phenomenon. These methods can be unsupervised (if no a priori information is available) or supervised (if available training samples are used to project the data onto a classification-optimized subspace [6], [7]). Classic unsupervised techniques include principal component analysis (PCA) [8], the minimum noise fraction (MNF) [9], and independent component analysis (ICA) [10].
Supervised approaches comprise discriminant analysis for feature extraction (DAFE), decision boundary feature extraction (DBFE), and non-parametric weighted feature extraction (NWFE), among many others [4], [11].
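As a purely illustrative aside, the idea of unsupervised dimensionality reduction prior to classification can be sketched with a minimal PCA on synthetic data. The sketch below is not taken from the cited works: the function names (center, covariance, first_component), the synthetic six-band spectra, and the use of power iteration to obtain the leading principal component are all assumptions made for the sake of a dependency-free example. In practice a numerical library would normally be used, and more than one component would be retained.

```python
import random


def center(pixels):
    """Subtract the per-band mean from every pixel spectrum."""
    n = len(pixels)
    means = [sum(band) / n for band in zip(*pixels)]
    return [[x - m for x, m in zip(p, means)] for p in pixels]


def covariance(centered):
    """Sample covariance matrix between bands (bands x bands)."""
    n = len(centered)
    columns = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]


def first_component(cov, iterations=500):
    """Leading eigenvector of cov, estimated by power iteration (unit length)."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Synthetic stand-in for hyperspectral pixels (an assumption, not real data):
# 200 spectra with 6 bands that vary along one hidden direction, plus noise.
random.seed(0)
direction = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
pixels = []
for _ in range(200):
    t = random.gauss(0.0, 1.0)
    pixels.append([t * d + random.gauss(0.0, 0.05) for d in direction])

centered = center(pixels)
pc1 = first_component(covariance(centered))

# Each 6-band spectrum is reduced to a single feature: its score on pc1.
scores = [sum(x * w for x, w in zip(p, pc1)) for p in centered]
```

Here each pixel is described by one feature instead of six, and a classifier would then be trained on the scores rather than on the original bands. Because PCA ignores class labels, the retained direction maximizes variance and is not guaranteed to be the most discriminative one, which is the motivation for the supervised alternatives listed above.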