I. Introduction
The broad goal of unsupervised learning is to extract salient features and discover structural information from collected data. Extracting reliable features and their latent structure is challenging when the documents are abundant and heterogeneous, since such collections tend to be redundant, noisy, ambiguous, mismatched and ill-posed. We aim to construct a flexible latent variable model that accommodates these heterogeneous conditions and annotates the observed documents for prediction of future documents. In the past decade, unsupervised learning via probabilistic topic models [1], [2] has been successfully developed for document categorization [3], collaborative filtering, topic discovery [2], audio classification [4], speech recognition [5], document summarization [6] and other natural language systems [7]. The latent features, or semantic topics, are learned from bags of words that are semantically similar across different documents. Conventionally, the parametric topic model based on latent Dirichlet allocation (LDA) [3] is constructed for modeling a set of documents. Each observed word is drawn from a multinomial distribution whose parameters are selected by a per-word topic label; the topic label is in turn drawn from the document's multinomial topic proportions, which follow a Dirichlet prior. The graphical representation of LDA is depicted in Fig. 1(a). LDA is known as a finite-dimensional mixture representation for documents, which assumes that
1) the number of topics is fixed;
2) topics within and across documents are independent;
3) all topics are used to represent a target document, even though some of them are irrelevant to it.

Fig. 1. Graphical representation for (a) LDA, (b) hDP and (c) the hDP topic model; plates over words, documents and topics are shown in blue, yellow and green, respectively.
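To make the generative view concrete, the following minimal sketch samples one document under these LDA assumptions. It is an illustrative reconstruction rather than code from the paper; the names K, V, alpha, beta and n_words, and the NumPy setting, are our own choices.

    import numpy as np

    rng = np.random.default_rng(0)

    K, V = 4, 100                 # fixed number of topics and vocabulary size
    alpha = np.full(K, 0.5)       # Dirichlet prior over per-document topic proportions
    beta = rng.dirichlet(np.ones(V), size=K)   # K topic-word multinomial parameters

    def generate_document(n_words):
        """Sample a document: theta ~ Dirichlet(alpha); for each word,
        a topic label z ~ Multinomial(theta) selects the row beta[z],
        from which the word w ~ Multinomial(beta[z]) is drawn."""
        theta = rng.dirichlet(alpha)                    # topic proportions of this document
        topics = rng.choice(K, size=n_words, p=theta)   # per-word topic labels z
        words = np.array([rng.choice(V, p=beta[z]) for z in topics])
        return words, topics

    words, topics = generate_document(50)

Note that K is fixed before any document is observed, which is precisely the restriction in assumption 1) that nonparametric constructions such as the hDP in Fig. 1(b) and (c) are designed to relax.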