I. Introduction
In an information retrieval system, text classification organizes the retrieved text collection according to predefined requirements. It groups texts whose content is similar or related so that the system can later be queried efficiently and accurately [1]. Active information discovery addresses the user's goals by automatically classifying source texts into user-related and user-unrelated categories. Traditional automatic text classification methods include Naive Bayesian classification, Support Vector Machines, the maximum entropy algorithm, expectation maximization, neural networks, and rule learning algorithms [2]–[4]. Traditional text classifiers are designed for complete, unstructured texts. For partial, semi-structured texts such as Web pages, algorithms that perform well on traditional texts do not yield significant results. The semi-structured nature of Web texts is a difficult problem for automatic text classifiers and also an active research topic [4]. Compared with theme-based information collection, Web page classification can, to some extent, provide a different kind of classification driven by specific classification information [5], [7]. The page classification algorithm studied in Section IV of this paper is based on an improved Naive Bayesian classifier: Web pages are downloaded and then classified according to training algorithms, and only the demanded categories are searched. The automatically classified information resources are established to provide users with a classification directory.
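To make the classify-then-filter idea concrete, the following is a minimal sketch of a standard multinomial Naive Bayes text classifier with add-one smoothing. It is not the improved classifier of this paper, only the textbook baseline. The class name `NaiveBayesTextClassifier`, the helper `select_demanded`, the category labels, and the toy training texts are all assumptions introduced for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Textbook multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration only; not the improved classifier of the paper.
    """

    def fit(self, documents, labels):
        self.class_doc_counts = Counter(labels)   # documents per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return log P(c) + sum of log P(w | c) for every category c."""
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def select_demanded(classifier, pages, demanded_categories):
    """Keep only the pages whose predicted category is demanded."""
    return [p for p in pages if classifier.predict(p) in demanded_categories]


# Toy training data (assumed): page text already extracted from downloaded pages.
train_docs = [
    "football match score goal",
    "team wins league football",
    "stock market shares fall",
    "bank raises interest rates market",
]
train_labels = ["related", "related", "unrelated", "unrelated"]

classifier = NaiveBayesTextClassifier().fit(train_docs, train_labels)
pages = ["football league goal", "market interest shares"]
kept = select_demanded(classifier, pages, {"related"})
```

Scores are accumulated in log space so that long pages do not cause numerical underflow, and the add-one smoothing keeps a word that is absent from one category from driving that category's probability to zero.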