Efficient Distributed Database Clustering Algorithm for Big Data Processing | IEEE Conference Publication | IEEE Xplore

Abstract:

When clustering efficient distributed databases, conventional algorithms incur a long running time and low clustering accuracy. To solve these problems, an efficient distributed database clustering algorithm for big data processing is designed. The eigenvalues of the database are calculated, and efficient distributed databases with similar characteristics are linked. A cross-correlation matrix is used to ensure the consistency of cluster labels. To improve the performance of the K-means algorithm, the database to be clustered is taken as input, k cluster centers are output, and the clustering groups are divided. The database is then mapped to the cluster centers, and the low-dimensional big data is clustered. Experimental results show that the proposed algorithm reduces the running time and mean square error of data clustering and improves the efficiency and accuracy of clustering.
Date of Conference: 29-30 May 2021
Date Added to IEEE Xplore: 12 July 2021
Conference Location: Kunming, China

I. Introduction

Preprocessing is an important part of data mining. Its main function is to sort out big data and lay the foundation for data analysis. Reference [1] measures the degree of similarity between data points with a similarity measure, uses a criterion function to evaluate the quality of the clustering results, and applies the K-means clustering algorithm so that the distance from each data point to the center of its cluster is as small as possible and the distance between different clusters is as large as possible. However, this algorithm converges slowly, takes a long time to cluster, and has low accuracy [1]. Reference [2] uses an unsupervised learning method to measure the similarity of data without category labels and divides big data into clusters to achieve data grouping. However, the data grouping distance of this method is not standardized, and database clustering with it takes a long time and has low accuracy [2]. To solve these problems, and building on the above theory, this paper designs an efficient distributed database clustering algorithm for big data processing that reveals the differences between big data, discovers the internal relationships of big data, and provides a reliable basis for deeper data analysis.
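The baseline K-means procedure discussed above can be sketched in a few lines. The code below is a minimal, textbook (Lloyd-style) K-means in plain Python, not the improved algorithm proposed in this paper. It uses squared Euclidean distance as the similarity measure and the within-cluster sum of squared errors as the criterion function; the function names, the `seed` parameter, and the random choice of initial centers are assumptions made for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Textbook K-means: returns (centers, labels, sse).

    Assumption: initial centers are k distinct input points chosen at random.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are stable too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean_point(members)
    # Criterion function: within-cluster sum of squared errors.
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels, sse = kmeans(data, k=2)
    print(centers, labels, sse)
```

Each iteration compares every point with every center, so the running time grows with the number of points, the number of clusters, and the number of iterations needed to converge, and the result depends on the initial centers. This is consistent with the slow convergence and long clustering time noted for the conventional approach.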
