Pattern-Based Topic Models for Information Filtering


Abstract:

Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models that represent multiple topics in a collection of documents, and it has been widely used in fields such as machine learning and information retrieval. However, its effectiveness in information filtering is largely unknown. Patterns are generally considered more representative than single terms for describing documents. In this paper, a novel information filtering model, the Pattern-based Topic Model (PBTM), is proposed to represent text documents using not only topic distributions at a general level but also semantic pattern representations at a detailed, specific level, both of which contribute to accurate document representation and document relevance ranking. Extensive experiments are conducted to evaluate the effectiveness of PBTM using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model achieves outstanding performance.
Date of Conference: 07-10 December 2013
Date Added to IEEE Xplore: 06 March 2014
Conference Location: Dallas, TX, USA

I. Introduction

Information filtering (IF) systems remove redundant or unwanted information from an information or document stream, based on document representations that capture users' interests. Traditional IF models were developed using a term-based approach, whose advantages are efficient computation and mature theories for term weighting, such as Rocchio and BM25 [1], [2]. However, term-based document representation suffers from the problems of polysemy and synonymy. To overcome these limitations, pattern-mining-based techniques have been used for information filtering and have achieved some improvements in effectiveness [3], [4], since patterns carry more semantic meaning than terms. In addition, data mining has developed techniques (i.e., maximal patterns, closed patterns and master patterns) for removing redundant and noisy patterns [5]-[8]. One promising technique is the Pattern Taxonomy Model (PTM) [9], which discovers closed sequential patterns for text classification. It improves effectiveness to some extent but still faces one challenging issue: the low frequency of patterns appearing in documents. To solve this problem, Wu et al. [10], [11] proposed a pattern deploying approach that weights terms by calculating their appearance in discovered patterns.
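
To make the notions of closed patterns and pattern-based term weighting more concrete, the following is a minimal Python sketch under stated assumptions. It treats each paragraph as an unordered set of terms (termsets rather than the sequential patterns used by PTM), mines frequent termsets by brute force, keeps only the closed ones, and then weights each term by the total support of the closed patterns that contain it. The function names, the toy paragraphs and the weighting rule are all hypothetical and chosen for illustration; they are not the formulas of [9], [10] or [11].

```python
from collections import Counter
from itertools import combinations


def frequent_termsets(paragraphs, min_support):
    """Count every termset in every paragraph and keep those that occur in
    at least min_support paragraphs. Brute force: only for short inputs."""
    counts = Counter()
    for paragraph in paragraphs:
        terms = sorted(set(paragraph))
        for size in range(1, len(terms) + 1):
            for combo in combinations(terms, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def closed_patterns(frequent):
    """Keep a pattern only if no proper superset has the same support."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(p < q and c == frequent[q] for q in frequent)
    }


def term_weights(patterns):
    """Assumed weighting rule: a term's weight is the summed support of
    the patterns that contain it."""
    weights = Counter()
    for pattern, support in patterns.items():
        for term in pattern:
            weights[term] += support
    return dict(weights)


# Toy paragraphs (hypothetical data, not from the paper's experiments).
paragraphs = [
    ["pattern", "mining", "topic"],
    ["pattern", "mining"],
    ["pattern", "mining", "filter"],
    ["topic", "filter"],
]

closed = closed_patterns(frequent_termsets(paragraphs, min_support=2))
weights = term_weights(closed)
```

In this toy input, "pattern" and "mining" always occur together, so the single-term patterns are absorbed by the closed pattern {pattern, mining}. This is the sense in which closed patterns remove redundancy, while the term weights still reflect how often each term appears in the retained patterns.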

