I. Introduction
Clustering is a form of unsupervised learning. Document clustering groups similar documents into subsets, or clusters. The goal is to create clusters that are internally coherent but clearly different from one another: documents within a cluster should be as similar as possible, and documents in different clusters should be as dissimilar as possible. The underlying cluster hypothesis states that documents in the same cluster behave similarly with respect to relevance to information needs [1]. That is, if one document in a cluster is relevant to a search request, then other documents in the same cluster are likely to be relevant as well. The core problem in document clustering is how to measure similarity between documents. Compared with common data clustering tasks, document clustering presents unique challenges because the number of features in a document data set is large and not fixed.
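To make the notion of document similarity concrete, the following is a minimal sketch of one common choice, cosine similarity over bag-of-words term counts. The text above does not commit to a particular measure, so the choice of cosine similarity, the whitespace tokenization, the function names `term_vector` and `cosine_similarity`, and the three sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (illustrative tokenization:
    lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, not taken from any data set in the paper.
docs = [
    "clustering groups similar documents",
    "document clustering groups documents by similarity",
    "the stock market fell sharply today",
]
vectors = [term_vector(d) for d in docs]

sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_related > sim_unrelated)
```

Each document is stored as a sparse mapping from term to count rather than as a fixed-length array, so the feature set can grow as new terms appear. This reflects the large and unfixed number of features noted above.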