I. Introduction
Clustering is a data mining task that aims to group similar objects based on a dissimilarity measure. One of its many uses is textual data analysis through document clustering, which plays an important role in document retrieval, web search and spam filtering [9]. Many methods for clustering documents have been proposed, most of which represent the corpus as a term-frequency inverse-document-frequency (TF-IDF) matrix and then apply a chosen clustering method to it [3]. However, the TF-IDF model has several drawbacks: it considers neither the semantic similarities between words nor the word order, and it produces a high-dimensional representation that often must be reduced using Principal Component Analysis or similar techniques [9]. The Paragraph Vector or Document to Vector (Doc2Vec) model [9] overcomes these disadvantages by representing words and documents as dense n-dimensional vectors learnt using the word context.
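The first two drawbacks can be illustrated with a minimal sketch, assuming a three-sentence toy corpus, whitespace tokenization and a smoothed IDF weighting. The corpus and the function names `tfidf_vectors` and `cosine` are hypothetical and chosen only for illustration; the weighting is one common variant and not necessarily the one used in [3].

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF rows) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts; missing terms count as 0
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: documents 0 and 1 differ only in word order,
# document 2 paraphrases document 0 using different words.
docs = ["dog bites man", "man bites dog", "canine attacks person"]
vocab, rows = tfidf_vectors(docs)

same_words = cosine(rows[0], rows[1])   # word order is invisible to TF-IDF
paraphrase = cosine(rows[0], rows[2])   # no shared terms, so no similarity
```

In this sketch the two sentences with opposite meanings receive identical vectors, while the paraphrase shares no terms with the first sentence and therefore has zero similarity to it. Each document also occupies one dimension per vocabulary term, which is why the representation grows with the corpus vocabulary.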