I. Introduction
The World Wide Web has become the primary tool and channel for distributing, exchanging and accessing information. It can be viewed as a gigantic distributed database, growing at an incredible speed, that comprises millions of interconnected hosts, some of which publish information via web servers or peer-to-peer systems. People, in turn, can access the information they are interested in through search engines that provide keyword matching and thematic filters. Despite this success, the web increasingly resembles a ‘black hole’ from which information is difficult to retrieve: the user has to browse the retrieved documents (relevant or not) one by one, looking for the desired data, because the tools currently available are not suited to knowledge extraction. With such a powerful, dynamic and large-scale database at everybody's fingertips, tools that can scan the web automatically and intelligently are needed. One would expect to extract knowledge, that is, aggregate information, from the web, rather than raw documents requiring human interpretation.