1 Introduction
Text clustering is widely used to automatically structure large document collections and to enable cluster-based information browsing, which alleviates the problem of information overload. It is especially useful in highly distributed environments such as distributed digital libraries [1] and peer-to-peer (P2P) information management systems [2], because these environments operate on large-scale document collections scattered across the network. Existing P2P systems also employ text clustering to improve the efficiency and effectiveness of information retrieval [3], [4], [5]. Hence, a distributed clustering algorithm that scales to both large networks and large text collections is required.