
Incorporating local word relationships into probabilistic topic models


Abstract:

Probabilistic topic models have been very popular in automatic text analysis since their introduction. As dimensionality reduction methods, they are similar to term clustering methods. These models work based on word co-occurrence, but they are not very flexible about the context in which co-occurrence is defined. Probabilistic topic models do not let us take local or spatial information into account, and therefore their performance is poor on short documents or in applications that depend on local data. Despite their generally better performance compared to term clustering methods, probabilistic topic models do not benefit from one of the key features of term clustering methods: flexibility in defining the context in which co-occurrence is calculated. In this paper we introduce a perspective on probabilistic topic models which can lead to more flexible models, together with a model that, under this perspective, has the mentioned flexibility.
Date of Conference: 26-28 May 2015
Date Added to IEEE Xplore: 05 October 2015
Conference Location: Urmia, Iran

I. Introduction

Nowadays we are faced with a vast amount of digitized information online. As this amount continues to grow, it becomes increasingly difficult to find what we are looking for; the search would be far easier if we could explore the information based on its thematic structure instead of its raw content. Probabilistic topic modeling introduces methods which can extract the thematic structure of documents. The basic idea of these methods is that a document is a mixture of latent topics and each topic is a distribution over words. Suppose we have $D$ documents, where document $d$ consists of $N_d$ words, and that there are $K$ topics and $V$ unique words. The topic assigned to the $n$-th word of document $d$, $w_{d,n}$, is denoted by $z_{d,n}$. Based on this view we can approach the problem of extracting the topics of a corpus as follows. Each topic is a distribution over words, and the words are exchangeable, i.e. each document is a bag of words; documents are also exchangeable. Each word in each document is drawn from the distribution of its assigned topic, and for each document there is a distribution over topics which shows how the topics have been mixed to produce that document. The model therefore has two sets of parameters: the distributions of words in topics, $\varphi_k$, and the distributions of topics in documents, $\theta_d$.
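To make this generative view concrete, the following is a minimal sketch of the bag-of-words process described above, in the spirit of LDA. The symmetric Dirichlet priors, the Poisson document lengths, and all numeric values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, D = 5, 1000, 100           # topics, vocabulary size, documents (assumed values)
alpha, beta = 0.1, 0.01          # assumed symmetric Dirichlet hyperparameters

# Each topic k is a distribution phi_k over the V words.
phi = rng.dirichlet(np.full(V, beta), size=K)   # shape (K, V)

corpus = []
for d in range(D):
    # Each document d has its own mixture theta_d over the K topics.
    theta = rng.dirichlet(np.full(K, alpha))
    n_words = rng.poisson(80)                   # document length (illustrative choice)
    # Draw a topic z for every word position, then a word from that topic.
    z = rng.choice(K, size=n_words, p=theta)
    words = [rng.choice(V, p=phi[k]) for k in z]
    corpus.append(words)                        # word order is irrelevant: a bag of words
```

Note that nothing in this process depends on where a word occurs within its document, which is exactly the inflexibility with respect to local context that the paper addresses.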

