Efficient Distributed Database Clustering Algorithm for Big Data Processing

Abstract:

When clustering efficient distributed databases, conventional algorithms incur a high time cost and low clustering accuracy. To solve these problems, an efficient distributed database clustering algorithm for big data processing is designed. The eigenvalues of the database are calculated, and efficient distributed databases with similar characteristics are linked. A cross-correlation matrix is used to ensure the consistency of cluster labels. To improve the performance of the K-means algorithm, the database to be clustered is taken as input, $k$ clustering centers are output, and the clustering groups are divided. The database is mapped to the clustering centers, and the low-dimensional big data is clustered. Experimental results show that the proposed algorithm reduces the running time and mean square error of data clustering, and improves the efficiency and accuracy of clustering.

Published in: 2021 6th International Conference on Smart Grid and Electrical Automation (ICSGEA)

Date of Conference: 29-30 May 2021

Date Added to IEEE Xplore: 12 July 2021

DOI: 10.1109/ICSGEA53208.2021.00118

Conference Location: Kunming, China

I. Introduction

Preprocessing is an important part of data mining. Its main function is to sort out big data and lay the foundation for data analysis. Literature [1] measures the degree of similarity between data with a similarity measure, uses a criterion function to evaluate the quality of clustering results, and uses the K-means clustering algorithm to make the distance from each data point to the center of its cluster as small as possible and the distance between different clusters as large as possible. However, this algorithm converges slowly, takes a long time to cluster, and has low accuracy [1]. Reference [2] uses an unsupervised learning method to measure the similarity of data without category labels and divides big data into clusters to group the data. However, the grouping distance in this method is not standardized, the time cost of database clustering is high, and the accuracy is low [2]. To solve these problems, and building on the above theory, this paper designs an efficient distributed database clustering algorithm for big data processing that reveals the differences between big data, discovers the internal relationships of big data, and provides a reliable basis for deeper data analysis.
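For orientation, the K-means procedure referred to above can be sketched as follows. This is a minimal illustration of standard Lloyd-style K-means only, not the algorithm proposed in this paper: it omits the eigenvalue, cross-correlation, and distributed steps. The function names `squared_distance`, `assign`, and `kmeans`, and the parameters `iterations` and `seed`, are assumptions introduced for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, iterations=100, seed=0):
    """Standard Lloyd-style K-means (illustrative, not the paper's method).

    points: list of equal-length tuples of numbers.
    Returns (centers, labels, mse), where mse is the mean squared
    distance from each point to its assigned center.
    """
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct points drawn at random.
    centers = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # Keep the old center if a cluster lost all its points.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # centers stopped moving: converged
        centers = new_centers
    labels = assign(points, centers)
    mse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)
    return centers, labels, mse
```

Calling `kmeans(points, k)` on a list of numeric tuples returns the `k` centers, one label per point, and the mean square error, which is the quantity the K-means criterion tries to make small.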
