A Fast Density and Grid Based Clustering Method for Data With Arbitrary Shapes and Noise


Abstract:

This paper presents a density- and grid-based (DGB) clustering method for categorizing data with arbitrary shapes and noise. Because most conventional clustering approaches work only with round-shaped clusters, other methods must be explored to classify clusters with arbitrary shapes. Clustering by fast search and find of density peaks, density-based spatial clustering of applications with noise, and many other methods are reported to be capable of this task, but they are limited by the time needed to compute the mutual distances between points or patterns. By avoiding the calculation of mutual distances, this paper presents an alternative method that clusters data with any shape and noise faster and more efficiently. The method was successfully verified on industrial data (e.g., DNA microarray data) and on several benchmark datasets with different kinds of noise. The results show that the proposed DGB clustering method is faster and more efficient than the conventional methods at clustering datasets with any shape.
Published in: IEEE Transactions on Industrial Informatics (Volume: 13, Issue: 4, August 2017)
Page(s): 1620-1628
Date of Publication: 15 November 2016

I. Introduction

An essential step in preprocessing industrial data is to find its clustering structure. Many industrial applications of various clustering methods can be found in [1] and [2]. Different clustering approaches come with different definitions of a cluster. The expectation–maximization (EM) algorithm [3] assigns each pattern to the cluster with maximum likelihood. EM clustering assumes that a cluster is a collection of patterns that most likely share the same distribution, and the algorithm fulfills this task by optimizing the distribution functions of the clusters. Applications using EM are reported in [4] and [5]. The widely used K-means method [6] finds clusters by iteratively computing the distances from patterns to the gravity centers of the clusters until convergence. It assumes that patterns belonging to the same cluster are located around that cluster's gravity center. Various applications based on the K-means method can be seen in [7] and [8]. Another approach is hierarchical clustering [9], which preserves the property that patterns separated by a small distance are more related than those separated by a large distance.
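
The K-means iteration described above can be sketched in a few lines of Python. This is a minimal illustration only, not code from the paper: the function name `kmeans`, the random choice of initial centers, the iteration cap `max_iter`, and the toy data are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means on a list of equal-length tuples.

    Returns (centers, labels). The initial centers are k distinct
    points drawn at random, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each pattern goes to its nearest gravity center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no pattern changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its patterns.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of three points.
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

Because every pattern is assigned to its nearest center, a sketch like this tends to produce compact, roughly round clusters, which matches the assumption stated above that patterns lie around the gravity center of their cluster.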

References
1. H. Gao, C. Ding, C. Song and J. Mei, "Automated inspection of E-shaped magnetic core elements using K-tSL-center clustering and active shape models", IEEE Trans. Ind. Informat., vol. 9, no. 3, pp. 1782-1789, Aug. 2013.
2. D. Wijayasekara, O. Linda, M. Manic and C. Rieger, "Mining building energy management system data using fuzzy anomaly detection and linguistic descriptions", IEEE Trans. Ind. Informat., vol. 10, no. 3, pp. 1829-1840, Aug. 2014.
3. A. P. Dempster, N. M. Laird and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm", J. Roy. Statist. Soc. Ser. B, vol. 39, no. 1, pp. 1-38, 1977.
4. J. Zhu, Z. Ge and Z. Song, "HMM-driven robust probabilistic principal component analyzer for dynamic process fault classification", IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3814-3821, Jun. 2015.
5. K. Zhang, R. Gonzalez, B. Huang and G. Ji, "Expectation-maximization approach to fault diagnosis with missing data", IEEE Trans. Ind. Electron., vol. 62, no. 2, pp. 1231-1240, Feb. 2015.
6. J. B. MacQueen, "Some methods for classification and analysis of multivariate observations", Proc. 5th Berkeley Symp. Math. Statist. Probab., pp. 281-297, 1967.
7. C.-C. Lin, D.-J. Deng, J.-R. Kang, S.-C. Chang and C.-H. Chueh, "Forecasting rare faults of critical components in LED epitaxy plants using a hybrid grey forecasting and harmony search approach", IEEE Trans. Ind. Informat., Dec. 2015.
8. W. Bi, M. Cai, M. Liu and G. Li, "A big data clustering algorithm for mitigating the risk of customer churn", IEEE Trans. Ind. Informat., vol. 12, no. 3, pp. 1270-1281, Jun. 2016.
9. S. C. Johnson, "Hierarchical clustering schemes", Psychometrika, vol. 32, no. 3, pp. 241-254, Sep. 1967.
10. M. Ester, H. Kriegel, J. Sander and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise", Proc. 2nd Int. Conf. Knowl. Discovery Data Mining, pp. 226-231, 1996.
11. D. C. Hernandez, V.-D. Hoang, A. Filonenko and K.-H. Jo, "Vision-based heading angle estimation for an autonomous mobile robots navigation", 2014 IEEE 23rd Int. Symp. Ind. Electron., pp. 1967-1972, Jun. 2014.
12. A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks", Science, vol. 344, no. 6191, pp. 1492-1496, 2014.
13. N. Beckmann, H. P. Kriegel, R. Schneider and B. Seeger, "The R*-tree: An efficient and robust access method for points and rectangles", Proc. 1990 ACM SIGMOD Int. Conf. Manage. Data, pp. 322-331, May 1990.
14. W. Zhang and J. Li, "Extended fast search clustering algorithm: Widely density clusters no density peaks", Comput. Sci. Inf. Technol., 2015.
15. E. Schikuta, "Grid clustering: An efficient hierarchical clustering method for very large data sets", Proc. 13th Int. Conf. Pattern Recog., no. 2, pp. 101-105, 1996.
16. W. Wang, J. Yang and R. Muntz, "STING: A statistical information grid approach to spatial data mining", Proc. 23rd Very Large Data Bases Conf., pp. 186-195, 1997.
17. R. Agrawal, J. Gehrke, D. Gunopulos and P. Raghavan, "Automatic subspace clustering of high dimensional data for data mining applications", Proc. ACM SIGMOD Int. Conf. Manage. Data, pp. 94-105, 1998.
18. H. Li, C. Wu, X. Jing and L. Wu, "Fuzzy tracking control for nonlinear networked systems", IEEE Trans. Cybern., Sep. 2016.
19. Q. Zhou, L. Wang, C. Wu, H. Li and H. Du, "Adaptive fuzzy control for nonstrict-feedback systems with input saturation and output constraint", IEEE Trans. Syst. Man Cybern. Syst., May 2016.
20. L. Fu and E. Medico, "FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data", BMC Bioinformat., vol. 8, no. 3, 2007.
21. G. Karypis, E.-H. Han and V. Kumar, "CHAMELEON: A hierarchical clustering algorithm using dynamic modeling", Computer, vol. 32, no. 8, pp. 68-75, Aug. 1999.
22. R. Ibrahim, N. Ahmed, N. A. Yousri and M. A. Ismail, "Incremental mitosis: Discovering clusters of arbitrary shapes and densities in dynamic data", 11th Int. Conf. Mach. Learn. Appl., vol. 1, pp. 102-107, 2012.