I. Introduction
In pattern classification, a pattern consists of a set of features (attributes), and the task is to assign unseen patterns to predefined categories using a classifier trained on a set of known patterns [17]. In practice, only some of these features are relevant to classification; others are redundant and carry no information about the class of a pattern. In such circumstances, a classifier trained on the original patterns performs poorly [17]. A large number of features also increases the training and prediction time of the classifier [17]. Feature selection removes the redundant features from the original dataset while keeping all the important ones [17]. Classifiers with a simpler structure and higher accuracy than those trained on the original dataset can thus be obtained. The two main feature selection approaches are the filter and the wrapper approaches [17]. The filter approach selects features based only on properties of the data, as a pre-processing step before classifier training. The wrapper approach selects features based on the performance of a classifier trained on the reduced dataset that contains only the candidate features.
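
To make the distinction between the two approaches concrete, the following is a minimal sketch rather than a method taken from [17]. It assumes, purely for illustration, that the filter scores each feature by its absolute Pearson correlation with the class label, and that the wrapper performs greedy forward selection using the leave-one-out accuracy of a 1-nearest-neighbour classifier. The function names (`filter_select`, `wrapper_select`) and the toy dataset are hypothetical.

```python
from math import sqrt

def column(X, j):
    return [row[j] for row in X]

def abs_pearson(a, b):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / sqrt(va * vb)

def filter_select(X, y, k):
    """Filter (assumed criterion): keep the k features with the highest
    data-only score. No classifier is trained at any point."""
    scores = [abs_pearson(column(X, j), y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        others = [t for t in range(len(X)) if t != i]
        nearest = min(
            others,
            key=lambda t: sum((X[i][j] - X[t][j]) ** 2 for j in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)

def wrapper_select(X, y):
    """Wrapper (assumed search): greedy forward selection that adds a feature
    only if it strictly improves the classifier's accuracy."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        trial = {j: loo_1nn_accuracy(X, y, selected + [j]) for j in sorted(remaining)}
        j_best = max(trial, key=trial.get)
        if trial[j_best] <= best:
            break
        selected.append(j_best)
        best = trial[j_best]
        remaining.remove(j_best)
    return sorted(selected)

# Hypothetical toy data: feature 0 separates the classes, feature 1 is
# distributed identically in both classes, feature 2 is constant.
f0 = [0.1, 0.2, 0.0, 0.3, 0.9, 1.0, 0.8, 1.1]
f1 = [5, 1, 4, 2, 5, 1, 4, 2]
f2 = [3, 3, 3, 3, 3, 3, 3, 3]
X = [list(row) for row in zip(f0, f1, f2)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

filter_choice = filter_select(X, y, k=1)
wrapper_choice = wrapper_select(X, y)
```

On this toy data both procedures keep only feature 0, but they reach that result differently: the filter never trains a classifier, whereas the wrapper evaluates a classifier once for every candidate subset it considers, which is typically far more expensive on realistic datasets.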