Dynamic density-based clustering algorithm over uncertain data streams | IEEE Conference Publication | IEEE Xplore

Dynamic density-based clustering algorithm over uncertain data streams


Abstract:

In recent years, uncertain data streams, which arise in many real-world applications, have attracted increasing attention from researchers. As one aspect of uncertainty, existence uncertainty can significantly affect both the clustering process and its results. Recently reported clustering algorithms for uncertain data streams are all based on the K-Means algorithm and share its inherent shortcomings. This paper proposes DCUStream, a density-based clustering algorithm over uncertain data streams. It can find arbitrarily shaped clusters at lower time cost in high-dimensional data streams. In addition, a dynamic density threshold is designed to accommodate the way the density of grids changes over time in a data stream. Experimental results show that DCUStream obtains more accurate clustering results and performs clustering more efficiently on evolving uncertain data streams.
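The abstract does not give the definitions used by DCUStream, so the following is only a minimal sketch of the general idea of grid density with a dynamic threshold, not the authors' algorithm. Every name and formula in it is an assumption made for illustration: each point adds its existence probability to its grid cell, cell density decays exponentially with time, and the threshold is a multiple of the current mean density of the non-empty cells.

```python
import math
from collections import defaultdict


class GridDensitySketch:
    """Hypothetical illustration only; not the DCUStream definitions."""

    def __init__(self, cell_width=1.0, decay=0.998, factor=1.0):
        self.cell_width = cell_width       # assumed: uniform grid cell size
        self.decay = decay                 # assumed: per-time-unit decay in (0, 1)
        self.factor = factor               # assumed: multiplier on mean density
        self.density = defaultdict(float)  # cell -> density at its last update
        self.last_update = {}              # cell -> time of its last update

    def _cell(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _decayed(self, cell, now):
        # Density of a known cell after decaying from its last update to `now`.
        elapsed = now - self.last_update[cell]
        return self.density[cell] * self.decay ** elapsed

    def insert(self, point, existence_prob, now):
        # An uncertain point contributes its existence probability, not 1.
        cell = self._cell(point)
        if cell in self.last_update:
            self.density[cell] = self._decayed(cell, now)
        self.density[cell] += existence_prob
        self.last_update[cell] = now

    def threshold(self, now):
        # Dynamic threshold: factor times the mean decayed density.
        values = [self._decayed(c, now) for c in self.last_update]
        return self.factor * sum(values) / len(values) if values else 0.0

    def dense_cells(self, now):
        # Cells whose decayed density currently meets the threshold.
        t = self.threshold(now)
        return {c for c in self.last_update if self._decayed(c, now) >= t}


g = GridDensitySketch(cell_width=1.0, decay=0.5)
g.insert((0.2, 0.3), 0.9, now=0)
g.insert((0.7, 0.1), 0.8, now=0)
g.insert((5.5, 5.5), 0.2, now=0)
print(g.dense_cells(now=0))
```

Because the threshold is recomputed from the decayed densities, it falls as old data fades rather than staying fixed. A clustering step would then group adjacent dense cells, which is what allows grid-based methods to recover arbitrarily shaped clusters.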
Date of Conference: 29-31 May 2012
Date Added to IEEE Xplore: 09 July 2012
Conference Location: Chongqing, China

I. Introduction

With the development of data acquisition and data processing technology, data uncertainty has come to be recognized as widespread. In the field of data processing, attention is increasingly shifting from certain data to uncertain data [1], [2], and uncertain data analysis and mining has become a new research hotspot. Because of physical devices, the external environment, and other subjective or objective factors, the perception layer of the Internet of Things, which mainly consists of radio frequency identification (RFID) and wireless sensor networks (WSN), produces uncertain data streams. The uncertainty has several sources. (1) Internal factors: constrained by size, energy, cost, and other limits, sensor nodes can hardly guarantee the accuracy of the data they collect. At the same time, RFID readers in practice often produce misreads, extra reads, and missed reads, with error rates as high as 30–40% [3]. (2) External factors: a complex and changeable working environment prevents physical devices from working stably and lowers the precision of the acquired data. For example, data transmission in a wireless sensor network can be affected by bandwidth, delay, energy, external magnetic fields, and other interference. (3) Preprocessing factors: cluster nodes in a wireless sensor network are designed with limited preprocessing functions, such as data integration, heterogeneous data fusion, and interpolation, which introduce new sources of uncertainty. (4) Privacy protection factors: some applications deliberately process data through a series of methods for privacy purposes, so the accurate details of the original data cannot be obtained. The uncertainty caused by these factors seriously affects data processing and analysis, and can even make the results unacceptable.

