Learning topic knowledge to improve Chinese word sense disambiguation


Abstract:

This paper addresses the problem of incorporating topic knowledge to improve Chinese word sense disambiguation. The key question is how to learn topic knowledge as features for the classifiers that disambiguate word senses. This paper presents two solutions. In the first, a Chinese domain knowledge dictionary named NEUKD is used to generate a domain feature set; because the coverage of NEUKD is limited, a constrained clustering algorithm is adopted to expand the dictionary. The second builds a topic feature set by applying the Latent Dirichlet Allocation (LDA) algorithm to a large-scale unlabeled corpus. Experiments on the SENSEVAL-3 Chinese dataset demonstrate that integrating topic knowledge improves the performance of Chinese word sense disambiguation.
Date of Conference: 18-19 October 2010
Date Added to IEEE Xplore: 13 December 2010
Conference Location: Beijing, China

I. Introduction

The goal of word sense disambiguation (WSD) is to assign the appropriate sense to an ambiguous word within a given context. A variety of supervised WSD techniques have demonstrated reasonable performance, including exemplar-based learning [1], decision lists [2], the maximum entropy model [3], and the Naive Bayes model [4] [5]. In these supervised approaches, the sense ambiguity of a word is resolved with the help of the contexts in which it occurs. Two types of features, local collocation features (LCF) and topical contextual features (TCF), are commonly used in WSD studies to represent these contexts [4] [6]. Examples include local words or part-of-speech (POS) tags with position information, bi-gram templates, collocations, and syntactic features. LCF and TCF generally take only morphological or syntactic information into account.
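To make the two feature types concrete, the following is a minimal sketch of how they might be extracted for one occurrence of an ambiguous word. It is not the paper's implementation: the function names, the feature-string templates, the window size, and the example sentence and POS tags are all illustrative assumptions. Local collocation features are modeled here as position-tagged words and POS tags near the target, and topical contextual features as an unordered bag of the remaining words in the sentence.

```python
from typing import List, Set


def local_collocation_features(
    tokens: List[str], pos_tags: List[str], target: int, window: int = 2
) -> List[str]:
    """Position-tagged words and POS tags within `window` of the target.

    The template strings (w[+1]=..., p[-1]=...) are hypothetical; the
    source does not specify an exact feature format.
    """
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the ambiguous word itself
        i = target + offset
        if 0 <= i < len(tokens):
            feats.append(f"w[{offset:+d}]={tokens[i]}")
            feats.append(f"p[{offset:+d}]={pos_tags[i]}")
    return feats


def topical_context_features(tokens: List[str], target: int) -> Set[str]:
    """Unordered bag of context words, ignoring position (assumed form)."""
    return {f"bow={tok}" for i, tok in enumerate(tokens) if i != target}


if __name__ == "__main__":
    # Illustrative segmented sentence; index 2 is the ambiguous word.
    tokens = ["我", "喜欢", "打", "篮球"]
    pos_tags = ["r", "v", "v", "n"]  # assumed tags, for illustration only
    print(local_collocation_features(tokens, pos_tags, target=2, window=1))
    print(sorted(topical_context_features(tokens, target=2)))
```

Both functions return plain strings, so their output can be fed to any of the classifiers mentioned above as sparse indicator features. Neither captures topic-level semantics, which is the gap the topic knowledge described in the abstract is meant to fill.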

References

1. H. T. Ng and H. B. Lee, "Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-based Approach", Proc. of ACL, pp. 40-47, 1996.
2. D. Yarowsky, "Hierarchical Decision Lists for Word Sense Disambiguation", Computers and the Humanities, vol. 34, no. 1, pp. 179-186, 2000.
3. H. T. Dang, C. Y. Chia, M. Palmer and F. D. Chiou, "Simple Features for Chinese Word Sense Disambiguation", Proc. of Coling, 2002.
4. W-Y. Li, Q. Liu and W-J. Li, "Integrating Collocation Features in Chinese Word Sense Disambiguation", Proc. of SIGHAN4, 2005.
5. Y-T. Zhang, L. Gong and Y-C. Wang, "Chinese Word Sense Disambiguation using Hownet", Proc. of ICNC, LNCS 3610, pp. 925-932, 2005.
6. Z. Y. Niu, D. H. Ji and C. L. Tan, "Optimizing Features Set for Chinese Word Sense Disambiguation", Proc. of SENSEVAL-3, 2004.
7. J-B. Zhu and T-S. Yao, "FIFA-based Text Classification", Journal of Chinese Information Processing, vol. 16, no. 3, 2002.
8. D. M. Blei, A. Y. Ng and M. I. Jordan, "Latent Dirichlet Allocation", Journal of Machine Learning Research, 2003.
9. A. L. Berger, V. J. Della Pietra and S. A. Della Pietra, "A maximum entropy approach to natural language processing", Computational Linguistics, vol. 22, no. 1, pp. 39-71, 1996.
10. A. McCallum and K. Nigam, "A comparison of event models for naive bayes text classification", AAAI-98 Workshop on Learning for Text Categorization, 1998.
11. Y. K. Lee and H. T. Ng, "An empirical evaluation of knowledge sources and learning algorithm for word sense disambiguation", Proceedings of the ACL-02 conference on Empirical methods in natural language processing, pp. 41-48, 2002.
12. S. Kullback, "Information theory and statistics", New York: John Wiley and Sons, 1959.
13. J. F. Cai, W. S. Lee and Y. W. Teh, "Improving word sense disambiguation using topic features", Proc. of EMNLP2007, pp. 1015-1023, 2007.
14. J-B. Zhu and W-L. Chen, "Some Studies on Chinese Domain Knowledge Dictionary and Its Application to Text Classification", Proc. of SIGHAN4, 2005.
15. T. L. Griffiths and M. Steyvers, "Finding scientific topics", Proc Natl Acad Sci USA, vol. 101, no. 1, pp. 5228-5235, 2004.
