I. Introduction
Classification is a widely studied data mining task in both academia and industry. Its goal is to assign class labels to unseen instances based on the information carried by their features [1]. With the rapid development of emerging techniques, high-dimensional data have become common in many real-world applications, such as job shop scheduling [2] and text classification [3]. In such data, many features are irrelevant or redundant and therefore not useful for class prediction. They not only increase the computational complexity of a learning algorithm but also degrade classification performance due to “the curse of dimensionality” [4]. Feature selection (FS) is an effective data preprocessing technique that addresses this challenge by choosing a subset of relevant features from the original feature set [5].