
Pattern-Based Topic Models for Information Filtering

Abstract:

Topic models such as Latent Dirichlet Allocation (LDA) were proposed to generate statistical models that represent multiple topics in a collection of documents, and they have been widely used in fields such as machine learning and information retrieval. However, little is known about their effectiveness in information filtering. Patterns are generally considered more representative than single terms for representing documents. In this paper, a novel information filtering model, the Pattern-based Topic Model (PBTM), is proposed. PBTM represents text documents not only by topic distributions at a general level but also by semantic pattern representations at a detailed, specific level; both contribute to accurate document representation and document relevance ranking. Extensive experiments are conducted to evaluate the effectiveness of PBTM on the TREC data collection Reuters Corpus Volume 1 (RCV1). The results show that the proposed model achieves outstanding performance.

Published in: 2013 IEEE 13th International Conference on Data Mining Workshops

Date of Conference: 07-10 December 2013

Date Added to IEEE Xplore: 06 March 2014

DOI: 10.1109/ICDMW.2013.30

Conference Location: Dallas, TX, USA

I. Introduction

Information filtering (IF) systems remove redundant or unwanted information from an information or document stream, based on document representations that capture users' interests. Traditional IF models were developed using a term-based approach, whose advantages are efficient computation and mature theories for term weighting, such as Rocchio and BM25 [1], [2]. However, term-based document representations suffer from the problems of polysemy and synonymy. To overcome these limitations, pattern mining based techniques have been applied to information filtering and have achieved some improvements in effectiveness [3], [4], since patterns carry more semantic meaning than single terms. Data mining has also developed techniques for removing redundant and noisy patterns, e.g., maximal patterns, closed patterns and master patterns [5]–[8]. One promising technique is the Pattern Taxonomy Model (PTM) [9], which discovers closed sequential patterns for text classification. PTM improves effectiveness to a certain extent, but it still faces a challenging issue: the discovered patterns appear in documents with low frequency. To solve this problem, Wu et al. [10], [11] proposed a pattern deploying approach that weights terms by calculating their appearances in the discovered patterns.
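The interplay of closed patterns and deploying can be illustrated with a small toy sketch. It is not the PTM or PBTM implementation. It makes several assumptions: it mines closed termsets (unordered, a simplification of PTM's closed sequential patterns) from the paragraphs of a single relevant document; it takes relative support to be the fraction of paragraphs containing a pattern; and it assumes a deploying rule in which each closed pattern's support is split evenly among its terms and summed per term. The function names, example paragraphs and the `min_sup` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy sketch: closed termsets (a simplification of PTM's closed
# *sequential* patterns) plus an ASSUMED even-split deploying rule.


def frequent_termsets(paragraphs, min_sup):
    """Return {termset: relative support} for termsets with support >= min_sup.

    Support = fraction of paragraphs that contain every term of the set.
    Candidates are all non-empty subsets of each paragraph (exponential; toy use only).
    """
    sets = [frozenset(p) for p in paragraphs]
    candidates = set()
    for s in sets:
        items = sorted(s)
        for k in range(1, len(items) + 1):
            candidates.update(frozenset(c) for c in combinations(items, k))
    result = {}
    for cand in candidates:
        sup = sum(1 for s in sets if cand <= s) / len(sets)
        if sup >= min_sup:
            result[cand] = sup
    return result


def closed_only(patterns):
    """Keep a pattern only if no proper superset has the same support."""
    return {
        p: sup
        for p, sup in patterns.items()
        if not any(p < q and patterns[q] == sup for q in patterns)
    }


def deploy(closed):
    """Assumed rule: split each pattern's support evenly over its terms, sum per term."""
    weights = defaultdict(float)
    for pattern, sup in closed.items():
        for term in pattern:
            weights[term] += sup / len(pattern)
    return dict(weights)


paragraphs = [
    ["carbon", "emission", "tax"],
    ["carbon", "emission"],
    ["carbon", "trade"],
    ["tax", "policy"],
]
closed = closed_only(frequent_termsets(paragraphs, min_sup=0.5))
weights = deploy(closed)
print(weights)
```

With `min_sup=0.5`, `{emission}` is dropped as non-closed because `{carbon, emission}` has the same support (0.5), and the deployed weights are carbon 1.0, emission 0.25 and tax 0.5. Although `emission` does not survive as a closed pattern on its own, it still receives weight through the longer pattern, which is the intuition behind weighting terms by their appearances in discovered patterns rather than matching low-frequency patterns directly.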
