I. Introduction
In many classification tasks, datasets such as gene microarray data [1] and text data [2] are described by hundreds or even thousands of features. Fortunately, as many researchers have pointed out [3], [4], a large number of these features are either irrelevant or redundant with respect to the target concept. How to find or rank all potentially relevant features has become a challenging task in machine learning and pattern recognition. Feature selection has been used very effectively to discover previously unknown and potentially useful features in data classification [5]–[7]. It has been shown that feature selection can overcome the curse of dimensionality, reduce computation time, improve predictive performance, and provide a better understanding of the data.