
Feature extraction using information-theoretic learning


Abstract:

A classification system typically consists of both a feature extractor (preprocessor) and a classifier. These two components can be trained either independently or simultaneously. The former option has an implementation advantage, since the extractor need only be trained once for use with any classifier, whereas the latter has the advantage that it can be used to minimize classification error directly. Certain criteria, such as minimum classification error, are better suited to simultaneous training, whereas other criteria, such as mutual information, are amenable to training the feature extractor either independently or simultaneously. Herein, an information-theoretic criterion is introduced and evaluated for training the extractor independently of the classifier. The proposed method uses nonparametric estimation of Rényi's entropy to train the extractor by maximizing an approximation of the mutual information between the class labels and the output of the feature extractor. The evaluations show that the proposed method, even though it uses independent training, performs at least as well as three feature extraction methods that train the extractor and classifier simultaneously.

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence ( Volume: 28, Issue: 9, September 2006)

Page(s): 1385 - 1392

Date of Publication: 24 July 2006

PubMed ID: 16929726

DOI: 10.1109/TPAMI.2006.186

1 INTRODUCTION

Feature extraction can be used as a preprocessor for applications including visualization, classification, detection, and verification. Herein, feature extraction is investigated as it applies to classification. Classification consists of associating each incoming exemplar, described by a set of features, with one of a set of class labels. It is a supervised process, which implies that a set of exemplars is available for which the true class labels are known. The designer of a classification system does not usually know a priori which features will yield acceptable classification performance and, in theory, classification performance is a nondecreasing function of the number of features. Hence, the designer might choose to use all available features. However, using a large number of features can waste both computational and memory resources. In addition, because of the practical problems of training a classifier with a finite amount of data, using a large number of features can actually degrade classification performance [1]. The number of input features can be reduced by linear or nonlinear transformations. The use of a linear transformation to reduce the number of features is known as (linear) feature extraction or subspace projection. A constrained linear transformation can also be used; selecting a subset of the input features is a common way of constraining the linear transformation.
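The relationship between linear feature extraction and feature selection can be illustrated with a small sketch. It is not taken from the paper: the function names `project` and `selection_matrix`, the exemplar `x`, and the matrix entries are all hypothetical and chosen only for illustration. The sketch shows that selecting a subset of features is a linear transformation whose rows are constrained to contain a single 1.

```python
def project(W, x):
    """Apply a linear transformation y = W x, with W given as a list of rows."""
    if any(len(row) != len(x) for row in W):
        raise ValueError("each row of W must have one entry per input feature")
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def selection_matrix(indices, n_features):
    """Build the constrained W that keeps only the features at `indices`.

    Each row holds a single 1, so W x simply copies the chosen features.
    """
    return [[1.0 if j == i else 0.0 for j in range(n_features)] for i in indices]


# Hypothetical exemplar with four input features.
x = [2.0, -1.0, 0.5, 3.0]

# Unconstrained linear extraction: each output may mix several inputs.
W_extract = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, -1.0],
]
y_extract = project(W_extract, x)

# Constrained case: feature selection keeps inputs 0 and 3 unchanged.
W_select = selection_matrix([0, 3], len(x))
y_select = project(W_select, x)
```

In both cases four input features are reduced to two. The difference lies only in which matrices are allowed: a method that trains the extractor is free to choose any entries of the matrix, whereas feature selection restricts the search to rows of the identity matrix.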
