I. Introduction
The World Wide Web has become the primary tool and channel for distributing, exchanging and accessing information. It can be viewed as a gigantic distributed database, growing at an incredible speed, that comprises millions of interconnected hosts, some of which publish information via web servers or peer-to-peer systems. People, in turn, can access the information they are interested in through search engines that provide keyword matching and thematic filters. Despite this success, the web increasingly resembles a ‘black hole’ from which information is difficult to retrieve: the user has to browse the retrieved documents (relevant or not) one by one, looking for the desired data, because the tools currently available are not suited to knowledge extraction. With such a powerful, dynamic and large-scale database at everybody's fingertips, tools that can scan the web automatically and intelligently are needed. One would expect to extract knowledge, that is, aggregate information, from the web, rather than raw documents requiring human interpretation.