I. INTRODUCTION
In rough set theory (RST) [1], the set of all objects sharing the same properties is called a concept. Knowledge of the concepts in a decision system is in general incomplete, so each concept is approximated by its lower and upper approximations; in this context, the concepts are termed rough sets. Objects belonging to different concepts can be discerned by their attribute values. Subsets of attributes that discern the same number of objects as the full attribute set are termed reducts [1]. Rough set feature selection (RSFS) aims to find minimal-size reducts for classifier training. RSFS operates on discrete attributes only, because the indiscernibility relation is defined over discrete values; for datasets with real-valued attributes, discretization must therefore be performed before RSFS to transform real attribute values into discrete intervals.

This work is motivated by the results of our previous work [9], which evaluated the effect of RSFS on the performance of decision trees. There, the RMEP discretization method was integrated with a genetic-algorithm-based RSFS approach and a decision tree classifier, and nine datasets from the UCI repository [7] were used to evaluate the approach. For the high-dimensional datasets, RMEP generated discrete datasets with empty cores; for the low-dimensional datasets, it generated discrete datasets with nonempty cores. The results suggested that discretization affects the performance of RSFS as follows: the core size of a discretized dataset is determined by the discretization process; when the ratio of core size to data dimensionality is close to 0, RSFS performs erratically and does not consistently improve the performance of decision trees, whereas when this ratio exceeds 0.1, RSFS tends to improve the performance of decision trees [9].

Current discretization methods, however, perform discretization without considering the core size of the discretized dataset to be produced. This paper therefore proposes core-generating approximate minimum entropy discretization (C-GAME), which selects cuts of minimum entropy value that are capable of generating discrete datasets with nonempty cores, and proposes a modelling approach for C-GAME based on constraint satisfaction [10].

The paper is organized as follows: Section II presents the basic concepts of rough set theory, discretization problems and constraint satisfaction optimization problems (CSOPs); Section III defines C-GAME and models it as a CSOP; Section IV investigates the performance of C-GAME on two datasets by integrating it with RSFS and decision trees; Section V gives conclusions and further work.
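To make the core-size criterion above concrete, the short Python sketch below computes the core of a small, hypothetical discretized decision table (four objects, condition attributes a1-a3, a binary decision) and compares the ratio of core size to dimensionality against the 0.1 threshold reported in [9]. It uses a simple pairwise-discernibility formulation of the core (an attribute is in the core if removing it reduces the number of discernible object pairs with different decisions); it is only a minimal illustration of the definitions, not the genetic-algorithm-based RSFS procedure used in the experiments.

from itertools import combinations

def discerned_pairs(table, decisions, attrs):
    # Count object pairs with different decision values that are told
    # apart by at least one attribute in `attrs`.
    count = 0
    for i, j in combinations(range(len(table)), 2):
        if decisions[i] != decisions[j] and any(
                table[i][a] != table[j][a] for a in attrs):
            count += 1
    return count

def core(table, decisions, attrs):
    # Core = indispensable attributes: removing any of them lowers the
    # discernibility count achieved by the full attribute set.
    full = discerned_pairs(table, decisions, attrs)
    return {a for a in attrs
            if discerned_pairs(table, decisions, attrs - {a}) < full}

# Hypothetical discretized decision table.
table = [{'a1': 0, 'a2': 1, 'a3': 0},
         {'a1': 0, 'a2': 0, 'a3': 0},
         {'a1': 1, 'a2': 1, 'a3': 0},
         {'a1': 1, 'a2': 0, 'a3': 1}]
decisions = [0, 1, 0, 1]
attrs = {'a1', 'a2', 'a3'}

c = core(table, decisions, attrs)
ratio = len(c) / len(attrs)  # ratio of core size to data dimensionality
print(c, ratio)              # {'a2'} 0.33... -> above the 0.1 threshold of [9]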