I. Introduction
The Web is a vast collection of completely uncontrolled, heterogeneous documents. Existing search engines help users locate the relevant information they need on the Web. Over the last decade, the WWW has become a major source of information. Although English is still the dominant language on the Web, information in other languages is steadily gaining prominence, and the use of languages other than English has been growing exponentially. Data provided in [6] indicate that in June 2007 English-language users accounted for only 31.2% of Internet users, compared with 45% in June 2001 [2]. This decline implies that a large amount of non-English content exists on the Web, which in turn shows the need for more attention to non-English documents.