
Decentralized Probabilistic Text Clustering

Abstract:

Text clustering is an established technique for improving quality in information retrieval, in both centralized and distributed environments. However, traditional text clustering algorithms fail to scale to highly distributed environments, such as peer-to-peer networks. Our algorithm for peer-to-peer clustering achieves high scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare each of its documents with only a few selected clusters, without significant loss of clustering quality. The algorithm offers probabilistic guarantees for the correctness of each document assignment to a cluster. Extensive experimental evaluation with up to 1 million peers and 1 million documents demonstrates the scalability and effectiveness of the algorithm.

Published in: IEEE Transactions on Knowledge and Data Engineering ( Volume: 24, Issue: 10, October 2012)

Page(s): 1848 - 1861

Date of Publication: 09 June 2011

DOI: 10.1109/TKDE.2011.120

1 Introduction

Text clustering is widely employed for automatically structuring large document collections and enabling cluster-based information browsing, which alleviates the problem of information overload. It is especially useful in highly distributed environments such as distributed digital libraries [1] and peer-to-peer (P2P) information management systems [2], since these environments operate on large-scale document collections scattered over the network. Existing P2P systems also employ text clustering to enhance information retrieval efficiency and effectiveness [3], [4], [5]. Hence, a distributed clustering algorithm that scales to large networks and large text collections is required.
