1 Introduction
Learning for feature selection and classification has become a focus of research in pattern recognition. Many algorithms have been developed for learning an effective and efficient classifier. For example, Linear Discriminant Analysis (LDA) is powerful tool for data reduction and feature extraction. However, LDA assumes that the two classes are both Gaussian distributed, and maximizes the distance between two class over the variation within each class. It does not work well for non-Gaussian problems.