I. Introduction
The purpose of feature selection is to choose relevant and informative features so that the selected subset improves generalization while avoiding overfitting. Depending on how label information is used, feature selection algorithms can be classified as unsupervised [1]–[4], semisupervised [5], [6], or supervised [7]–[11]. Orthogonally, feature selection methods can also be categorized as filter methods [12], wrapper methods [13], or embedded methods [14]; these categories differ in how the learning algorithm is involved in evaluating the features. Filter methods compute a score for each feature, which keeps their computational cost low; they are also classifier-independent, since features are chosen according to intrinsic properties of the input data (a minimal sketch of this strategy follows below). Wrapper methods evaluate candidate subsets by the performance of a specific classifier, and can therefore achieve better classification performance on the selected subsets, at a higher computational cost. Embedded methods build feature selection into the training optimization problem itself, so that strong classification performance can be obtained at a reasonable computational cost.
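To make the filter strategy concrete, the following sketch ranks features by the classical Fisher score (ratio of between-class to within-class variance) and keeps the top k. This is not the method proposed in this paper; it is a generic, self-contained illustration of classifier-independent, per-feature scoring, with synthetic data used purely for demonstration.

```python
import numpy as np

def fisher_score(X, y):
    """Filter-style relevance score per feature: between-class variance
    divided by within-class variance (higher means more discriminative).
    X: (n_samples, n_features) data matrix; y: (n_samples,) integer labels."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    return between / (within + 1e-12)  # guard against zero within-class variance

# Synthetic demonstration: feature 3 is made informative by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
X[:, 3] += 2.0 * y

# Rank all features by score and keep the top k -- no classifier is
# consulted, which is exactly what makes this a filter method.
k = 5
selected = np.argsort(fisher_score(X, y))[::-1][:k]
print(selected)
```

Note that the score is computed once per feature from the data alone; a wrapper method would instead retrain a classifier for each candidate subset, which is what makes wrappers more accurate on the chosen subset but far more expensive.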