1 INTRODUCTION
Feature extraction can serve as a preprocessing step for applications including visualization, classification, detection, and verification. Here, feature extraction is investigated as it applies to classification. Classification consists of associating each incoming exemplar, described by a set of features, with one of a finite set of class labels. It is a supervised process, which implies that a set of exemplars whose true class labels are known is available. The designer of a classification system does not usually know a priori which features will yield acceptable classification performance. In theory, however, classification performance is a nondecreasing function of the number of features, because adding a feature cannot increase the Bayes error. Hence, the designer might choose to use all available features. However, using a large number of features can waste both computational and memory resources. In addition, because a classifier must be trained with a finite amount of data, using a large number of features can actually degrade classification performance [1].

The number of input features can be reduced by linear or nonlinear transformations. The use of a linear transformation to reduce the number of features is known as (linear) feature extraction or subspace projection. A constrained linear transformation can also be used. Selecting a subset of the input features is a common way of constraining the linear transformation: it corresponds to a transformation matrix whose columns are distinct unit vectors, so that each output feature is a copy of one input feature.
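To make the distinction between linear feature extraction and feature selection concrete, the following is a minimal sketch, assuming an exemplar is a plain list of n feature values and a linear feature extractor computes y = W^T x with an n-by-m matrix W (m < n). The helper names `project` and `selection_matrix` and the weights in `W_general` are hypothetical and chosen only for illustration; they are not taken from this paper. The sketch shows that feature selection is the special case in which each column of W is a unit vector.

```python
# Hypothetical illustration: feature selection as a constrained linear transformation.
# A general linear feature extractor maps x (length n) to y = W^T x (length m < n),
# where W is an arbitrary n-by-m matrix. Feature selection constrains each column
# of W to be a distinct unit vector, so each output simply copies one input.

from typing import List, Sequence

Matrix = List[List[float]]


def project(x: Sequence[float], W: Matrix) -> List[float]:
    """Compute y = W^T x, where W is stored row-wise as an n-by-m matrix."""
    n = len(x)
    if len(W) != n:
        raise ValueError("W must have one row per input feature")
    m = len(W[0])
    return [sum(W[i][j] * x[i] for i in range(n)) for j in range(m)]


def selection_matrix(n: int, subset: Sequence[int]) -> Matrix:
    """Build the n-by-m matrix whose j-th column is the unit vector e_{subset[j]}."""
    W = [[0.0] * len(subset) for _ in range(n)]
    for j, i in enumerate(subset):
        W[i][j] = 1.0
    return W


x = [4.0, -1.0, 2.5, 7.0]  # one exemplar with n = 4 features

# Unconstrained linear extraction: each output mixes all inputs (arbitrary weights).
W_general = [[0.5, 0.1], [0.2, -0.3], [0.0, 0.8], [0.3, 0.4]]
print(project(x, W_general))

# Constrained case: selecting features 0 and 3 is a projection with a 0/1 matrix.
W_select = selection_matrix(len(x), [0, 3])
print(project(x, W_select))  # [4.0, 7.0]
```

Both cases share the same projection routine; only the constraint on W differs. This is why feature selection can be treated as a restricted form of linear feature extraction.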