1. Introduction
Information Retrieval (IR) systems handle information gathered from a large number of electronic documents. The information in a document is carried by the semantics of its words. Hence, an IR system actually deals with those words, which represent the semantics that are the building blocks of the intended information. Index term selection is the task of finding, manually or automatically, the words or collocations that represent the potential information in a particular document. Those terms are then used to represent the document in an IR system for further processing. The potential information in a document is mostly represented by word forms of the noun part-of-speech (PoS) category [3]. Thus, an index term is commonly a noun or a noun collocation. Consequently, PoS tagging is a preliminary step for indexing collocations for linguistically motivated IR purposes [1], [9]. Vocabulary-based PoS tagging for agglutinative languages such as Turkish is problematic because of the rich set of part-of-speech categories. The vocabulary that would have to be stored for PoS tagging can theoretically be an infinite set of word forms [24]. On the other hand, even for analytic languages like English, the same problem exists in a different form, so the closed-vocabulary assumption fails in principle.
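The idea that index terms are commonly nouns or noun collocations can be illustrated with a minimal sketch. It assumes the tokens have already been PoS-tagged; the function name `select_index_terms`, the coarse tag labels, and the hand-tagged sentence are hypothetical and serve only as illustration. They are not the tagger or the selection method developed in this paper.

```python
from typing import List, Tuple

# Hypothetical coarse tag set: "NOUN" marks any noun word form.
TaggedToken = Tuple[str, str]


def select_index_terms(tagged: List[TaggedToken]) -> List[str]:
    """Return maximal runs of consecutive nouns as candidate index terms."""
    terms: List[str] = []
    run: List[str] = []
    for word, tag in tagged:
        if tag == "NOUN":
            # Extend the current noun run (a single noun or a collocation).
            run.append(word.lower())
        elif run:
            # A non-noun token closes the current run.
            terms.append(" ".join(run))
            run = []
    if run:
        # Flush a run that reaches the end of the sentence.
        terms.append(" ".join(run))
    return terms


# Hand-tagged example sentence (tags are illustrative, not tagger output).
sentence = [
    ("Index", "NOUN"), ("term", "NOUN"), ("selection", "NOUN"),
    ("finds", "VERB"), ("the", "DET"), ("representative", "ADJ"),
    ("words", "NOUN"), ("of", "ADP"), ("a", "DET"), ("document", "NOUN"),
]
index_terms = select_index_terms(sentence)
print(index_terms)
```

In this sketch a run of adjacent nouns is kept together as one collocation, while isolated nouns become single-word terms. The quality of the result depends entirely on the PoS tags supplied, which is why tagging is treated as the preliminary step.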