I. Introduction
Nowadays, owing to the advent of technology, a high volume of text data is being generated through web pages, social media, and other sources, making it very time- and energy-consuming for humans to categorize [1]. Text data is unstructured and can be analyzed using text mining methods [2]. The text mining framework, as described in [3], consists of three sequential steps: text preprocessing, text representation, and knowledge discovery. According to [3], classification is a knowledge discovery technique, while feature extraction methods belong to text representation. Text classification, and more specifically document classification, is a supervised task in which a classifier is trained on pre-categorized documents; the trained classifier is then expected to assign an unseen document to one of the existing categories. Many classification algorithms exist for text data, including Support Vector Machines (SVM), Naïve Bayes [4], Logistic Regression, K-Nearest Neighbors (KNN) [5], and neural network models [6], [7]. Furthermore, ensemble classifiers, which combine several base classifiers, are another technique for text classification; Gradient Boosting is a popular ensemble classifier for this task [8]. Feature extraction is a crucial step that should be carried out before classification. It can enhance the prediction accuracy of the classifier by finding more discriminant representations or via dimensionality reduction techniques [9]. Although much research has been done on English document classification, the developed methods do not necessarily perform well for Persian document classification [10]. In what follows, we review previous work on feature extraction methods and Persian text classification. TF-IDF is a commonly used feature extraction method; it measures the importance of a word within a document relative to the whole dataset or corpus [11].
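To make the TF-IDF weighting concrete, the following is a minimal, self-contained sketch of the computation. The toy corpus and variable names are ours, not from the cited works; in practice one would use a library implementation such as scikit-learn's TfidfVectorizer, which also applies smoothing and normalization.

```python
import math

# Toy corpus standing in for a real dataset (illustrative only).
docs = [
    "text mining extracts knowledge from text",
    "classification assigns a document to a category",
    "feature extraction improves classification accuracy",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

def tf(term, doc):
    # Term frequency: raw count normalized by document length.
    return doc.count(term) / len(doc)

def idf(term):
    # Inverse document frequency: terms appearing in fewer
    # documents receive a higher weight.
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(n_docs / df)

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

# "text" occurs twice in the first document and nowhere else,
# so it gets a high weight there.
print(round(tf_idf("text", tokenized[0]), 4))  # → 0.3662
```

A word that appears in every document gets an IDF of log(1) = 0 and is thus weighted out, which is why frequent function words contribute little even before explicit stop word removal.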
In [1], Farhoodi and Yari applied the TF-IDF technique to the Hamshahri dataset to determine whether SVM or KNN is more efficient for Persian document classification. They reported that the KNN algorithm outperforms the SVM classifier, and that its efficiency can be further improved by increasing the number of selected features and using the cosine similarity measure. In [10], researchers classified the Hamshahri news dataset using TF-IDF as the feature extraction method, employing entropy instead of stop word lists to remove stop words in the preprocessing step; they applied KNN and Naïve Bayes classifiers to examine the accuracy of their method. In [12], TF and TF-IDF are used for feature extraction when classifying articles from the IRNA news website. The authors applied Gaussian Naïve Bayes, Multinomial Naïve Bayes, Bernoulli Naïve Bayes, and SVM classifiers to identify the most accurate algorithm for Persian text classification, and reported that Multinomial Naïve Bayes is the most accurate, reaching a micro F1-score of 0.838530 using TF-IDF with stop word removal. In [13], researchers proposed an ensemble classifier consisting of SVM, KNN, and MLP classifiers to categorize two datasets, Reuters and Hamshahri, again using TF-IDF for feature extraction. The ensemble yielded better accuracy and efficiency on both datasets than SVM, KNN, and MLP individually. Jafari, Ezadi, Hossennejad, and Noohi [14] also worked on four news categories of the Hamshahri dataset, using a representative vector to explore its impact on the accuracy of Persian document classification. They found that high precision and recall can be achieved by removing more extraneous words and inserting a few words into the representative vector.
In [15], topic models are utilized for Persian text classification, and it is concluded that topic models can improve accuracy with respect to bag-of-words-based representations such as TF-IDF. Although the TF-IDF method is very common, it does not capture semantic relations between words. As a result, another feature extraction method was proposed to address this problem: the Word2Vec model, introduced by Mikolov et al. (2013) [16]. Researchers in [17] introduced a novel combined method that benefits from Word2Vec and Latent Dirichlet Allocation (LDA) to extract document features. This model considers both the relations between words and documents and the relations between topics and documents. They evaluated the model on the 20 Newsgroups dataset [18] with an SVM classifier and concluded that it outperforms the TF-IDF, Word2Vec, and LDA methods.
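The combined representation idea can be sketched as follows: a document vector is formed by concatenating an averaged word-embedding vector (Word2Vec-style, capturing word-document relations) with a topic-proportion vector (LDA-style, capturing topic-document relations). The tiny embeddings and topic mixture below are made-up values for illustration; in a real pipeline they would come from trained Word2Vec and LDA models, and the exact combination scheme in [17] may differ.

```python
# Hypothetical 3-dimensional word embeddings (illustrative values only).
embeddings = {
    "text":    [0.2, 0.1, 0.4],
    "mining":  [0.3, 0.0, 0.5],
    "methods": [0.1, 0.2, 0.3],
}

doc = ["text", "mining", "methods"]
dim = 3

# Word2Vec-style document vector: average of the word vectors.
avg = [sum(embeddings[w][i] for w in doc) / len(doc) for i in range(dim)]

# LDA-style topic proportions for the same document (hypothetical).
topics = [0.7, 0.2, 0.1]

# Combined feature vector that would be fed to a classifier such as SVM.
features = avg + topics
print([round(x, 2) for x in features])  # → [0.2, 0.1, 0.4, 0.7, 0.2, 0.1]
```

Concatenation is the simplest way to merge the two views; the embedding part supplies word-level semantics that TF-IDF lacks, while the topic part supplies corpus-level thematic structure.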