Efficient Distributed Database Clustering Algorithm for Big Data Processing | IEEE Conference Publication | IEEE Xplore

Abstract:

When clustering distributed databases, conventional algorithms suffer from long running times and low clustering accuracy. To solve these problems, an efficient distributed database clustering algorithm for big data processing is designed. The eigenvalues of the database are calculated, and data in the distributed database that share similar characteristics are linked. A cross-correlation matrix is used to ensure the consistency of cluster labels. To improve the performance of the K-means algorithm, the database to be clustered is taken as input, k clustering centers are output, and the data are divided into clustering groups. The database is then mapped to the clustering centers, and the low-dimensional big data are clustered. Experimental results show that the proposed algorithm can reduce the running time and mean square error of data clustering and improve the efficiency and accuracy of clustering.
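The abstract does not say how the cross-correlation matrix enforces label consistency. One plausible reading, assumed here, is the usual problem in distributed clustering: each node numbers its clusters arbitrarily, so the same group may be label 0 on one node and label 2 on another. The sketch below treats the cross-correlation matrix as a k x k count of how often label i from one run co-occurs with label j from another run on the same records, and then renumbers the second run with the permutation that maximizes agreement. The function names `cross_correlation_matrix` and `align_labels` are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def cross_correlation_matrix(labels_a, labels_b, k):
    """k x k matrix: entry [i][j] counts records labelled i in run A and j in run B.

    Assumption: the paper's cross-correlation matrix is modelled here as
    a simple co-occurrence (contingency) count.
    """
    matrix = [[0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        matrix[a][b] += 1
    return matrix


def align_labels(labels_a, labels_b, k):
    """Renumber run B's labels so they agree with run A as often as possible.

    Tries all k! permutations, so this is only practical for small k.
    """
    matrix = cross_correlation_matrix(labels_a, labels_b, k)
    best_perm, best_score = None, -1
    for perm in permutations(range(k)):
        # perm[j] is the run-A label that run-B label j is mapped to.
        score = sum(matrix[perm[j]][j] for j in range(k))
        if score > best_score:
            best_perm, best_score = perm, score
    return [best_perm[b] for b in labels_b]
```

For example, `align_labels([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1], 3)` returns `[0, 0, 1, 1, 2, 2]`, because the second run found the same grouping under different numbering. The exhaustive search over k! permutations is only practical for small k; the Hungarian algorithm solves the same assignment problem in polynomial time.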
Date of Conference: 29-30 May 2021
Date Added to IEEE Xplore: 12 July 2021
Conference Location: Kunming, China

I. Introduction

Preprocessing is an important part of data mining. Its main function is to organize big data and lay the foundation for data analysis. Reference [1] measures the similarity between data items with a similarity measure, evaluates the quality of clustering results with a criterion function, and uses the K-means clustering algorithm to make the distance from each data point to the center of its cluster as small as possible and the distance between different clusters as large as possible. However, this algorithm converges slowly, takes a long time to cluster, and has low accuracy [1]. Reference [2] uses an unsupervised learning method to measure the similarity of data without category labels and divides big data into clusters to group the data. However, the grouping distance of this method is not standardized, and database clustering with it is slow and inaccurate [2]. To address these problems, and building on the above work, this paper designs an efficient distributed database clustering algorithm for big data processing that reveals the differences within big data, discovers its internal relationships, and provides a reliable basis for deeper data analysis.
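The K-means criterion described for [1] can be illustrated with Lloyd's algorithm, the standard K-means procedure, sketched here in plain Python. It takes the data to be clustered and k initial centers, outputs the k final centers and a cluster label for each point, and reports the mean square error as the average squared distance from each point to its assigned center. Two assumptions apply: the initial centers are passed in explicitly for reproducibility (the paper does not state its initialization), and this MSE definition is assumed rather than taken from the paper. Minimizing the within-cluster sum of squares also maximizes the between-cluster sum of squares, because when each center is the mean of its cluster the two add up to the fixed total sum of squares of the data.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Assignment step: label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's K-means. Returns (centers, labels).

    Assumption: initial centers are supplied by the caller.
    """
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, assign(points, centers)


def mean_squared_error(points, centers, labels):
    """Average squared distance from each point to its assigned center (assumed MSE)."""
    total = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return total / len(points)
```

With points `[[0, 0], [0, 1], [10, 10], [10, 11]]` and initial centers `[[0, 0], [10, 10]]`, the loop converges to centers `[[0.0, 0.5], [10.0, 10.5]]`, labels `[0, 0, 1, 1]`, and an MSE of 0.25.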

References

1. X. Cheng, Y. Kawano and J. M. A. Scherpen, "Model Reduction of Multiagent Systems Using Dissimilarity-Based Clustering," IEEE Transactions on Automatic Control, vol. 64, no. 4, pp. 1663-1670, 2019.
2. S. J. Nanda, I. Gulati, R. Chauhan et al., "A K-Means-Galactic Swarm Optimization-Based Clustering Algorithm with Otsu's Entropy for Brain Tumor Detection," Applied Artificial Intelligence, vol. 33, no. 4, pp. 152-170, 2019.
3. A. Mansouri and M. S. Bouhlel, "Trust in Ad Hoc Networks: A New Model Based on Clustering Algorithm," International Journal of Network Security, vol. 21, no. 3, pp. 483-493, 2019.
4. J. Ren and Y. Yang, "Multitask Possibilistic and Fuzzy Co-Clustering Algorithm for Clustering Data with Multisource Features," Neural Computing and Applications, vol. 32, no. 9, pp. 4785-4804, 2020.
5. H. Zou, "Clustering Algorithm and Its Application in Data Mining," Wireless Personal Communications, vol. 110, no. 1, pp. 21-30, 2020.
6. R. L. Boroschek and J. A. Bilbao, "Interpretation of Stabilization Diagrams Using Density-Based Clustering Algorithm," Engineering Structures, vol. 178, no. 7, pp. 245-257, 2019.