Cluster validity analysis using subsampling

Abstract:

Cluster validity analysis investigates whether generated clusters are true clusters or merely due to chance. This is usually assessed through subsampling-based stability analysis. A related problem is estimating the true number of clusters in a given dataset. A number of methods described in the literature address both problems. In this paper, we propose three methods for estimating confidence in the validity of a clustering result. The first method validates a clustering result by employing supervised classifiers: the dataset is divided into training and test sets, and the accuracy of the classifier is evaluated on the test set. This method computes confidence in the generalization capability of the clustering. The second method is based on the fact that if a clustering is valid, then each of its subsets should be valid as well. The third method is similar to the second but takes the dual approach, i.e., each cluster is expected to be stable and compact. In each case, confidence is estimated by repeating the process a number of times on subsamples. Experimental results illustrate the effectiveness of the proposed methods.
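The first method can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or classifier is used, so everything concrete here is an assumption made for illustration: k-means as the clusterer, a 1-nearest-neighbour classifier, a 30% test split, and the hypothetical helper names `kmeans`, `nn_predict`, `classifier_confidence`, and `two_blobs`. Cluster labels are treated as class labels, and the mean test accuracy over repeated random splits stands in for the confidence estimate.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means (assumed clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def nn_predict(train_x, train_y, x):
    """1-nearest-neighbour prediction (assumed classifier)."""
    nearest = min(range(len(train_x)), key=lambda j: sqdist(train_x[j], x))
    return train_y[nearest]

def classifier_confidence(points, k, rounds=20, test_frac=0.3, seed=0):
    """Mean test accuracy of a classifier trained to reproduce the cluster labels."""
    rng = random.Random(seed)
    labels = kmeans(points, k, rng)
    idx = list(range(len(points)))
    n_test = max(1, int(test_frac * len(points)))
    accuracies = []
    for _ in range(rounds):
        # Fresh random train/test split in every round.
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        train_x = [points[i] for i in train]
        train_y = [labels[i] for i in train]
        hits = sum(nn_predict(train_x, train_y, points[i]) == labels[i] for i in test)
        accuracies.append(hits / n_test)
    return sum(accuracies) / len(accuracies)

def two_blobs(n_per_blob=40, seed=1):
    """Hypothetical toy data: two well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(n_per_blob)]
    b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(n_per_blob)]
    return a + b

if __name__ == "__main__":
    print(classifier_confidence(two_blobs(), k=2))
```

A value close to 1 suggests that the partition generalizes to held-out points under these assumptions; how the paper turns classifier accuracy into a confidence figure is not stated in the abstract.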

Published in: SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme - System Security and Assurance (Cat. No.03CH37483)

Date of Conference: 08-08 October 2003

Date Added to IEEE Xplore: 17 November 2003

Print ISBN: 0-7803-7952-7

Print ISSN: 1062-922X

DOI: 10.1109/ICSMC.2003.1244614

Conference Location: Washington, DC, USA

Contents

1 Introduction

The term “clustering” (unsupervised classification) refers to methods that group objects according to some measure of similarity between them. Clustering algorithms fall into four classes: partitional, hierarchical, density-based, and grid-based [8]. Each class has subclasses and corresponding approaches, e.g., conceptual clustering, fuzzy clustering, and self-organizing maps. The clustering task can be divided into five steps, of which the last two are optional [9]: 1) pattern representation; 2) definition of a pattern proximity measure; 3) clustering; 4) data abstraction; and 5) cluster validity analysis.
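The five steps can be sketched end to end on toy data. Every concrete choice below is an assumption made for illustration and is not taken from the paper or from [9]: min-max scaling as the representation, Euclidean distance as the proximity measure, a simple threshold-based leader clustering, centroids as the abstraction, and a compactness-to-separation ratio as the validity score. All function names are hypothetical.

```python
import math

def normalize(rows):
    """Step 1, pattern representation: rescale every feature to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi))
            for r in rows]

def distance(a, b):
    """Step 2, proximity measure: Euclidean distance (assumed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leader_cluster(patterns, threshold):
    """Step 3, clustering: join the first leader within `threshold`, else start a new cluster."""
    leaders, labels = [], []
    for p in patterns:
        for i, q in enumerate(leaders):
            if distance(p, q) <= threshold:
                labels.append(i)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels

def centroids(patterns, labels):
    """Step 4, data abstraction: summarize each cluster by its centroid."""
    groups = {}
    for p, l in zip(patterns, labels):
        groups.setdefault(l, []).append(p)
    return {l: tuple(sum(x) / len(g) for x in zip(*g)) for l, g in groups.items()}

def validity(patterns, labels, cents):
    """Step 5, validity: mean distance to own centroid divided by the smallest
    centroid-to-centroid distance (lower is better; needs at least two clusters)."""
    within = sum(distance(p, cents[l]) for p, l in zip(patterns, labels)) / len(patterns)
    cs = list(cents.values())
    between = min(distance(a, b) for i, a in enumerate(cs) for b in cs[i + 1:])
    return within / between

if __name__ == "__main__":
    rows = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    patterns = normalize(rows)
    labels = leader_cluster(patterns, threshold=0.3)
    print(labels, validity(patterns, labels, centroids(patterns, labels)))
```

Steps 4 and 5 are written as separate functions so that either can be dropped, matching the statement that the last two steps are optional.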
