I. Introduction
Data in real-world applications are often large in volume and high in dimensionality. Such data frequently contain irrelevant, noisy, and redundant features, which pose significant challenges for data storage and computation and aggravate the curse of dimensionality. To address these problems, feature selection techniques have been developed over the last decade to retain only a small number of relevant and informative features. With the selected features, the subsequent learning process can be accelerated, and generalization ability may even improve.