1. INTRODUCTION
Comprehending the intent and the domain (a group of mutually related intents) of a user's language utterance is a key task in everyday devices such as mobile phones and smart speakers. Various techniques have been proposed in the literature for this task [1]–[11]. Most are supervised or semi-supervised, i.e., they rely on sufficient labeled data, can only handle a fixed set of intents and domains seen during model training, and generalize poorly to intents or domains unseen during model development. Zero-shot techniques [6], [12] recognize new intents for which no labeled training data is available; however, they require additional, often unavailable, information such as the number of new intent types and some prior knowledge about the new intents to be discovered. Efforts have been made to break this closed-world assumption in the NLU literature [9], [10], [13], [14] via open-world learning or open classification, which identifies instances whose labels were unseen during training. However, the task of discovering the actual latent categories within the instances identified as carrying unseen labels remains relatively underexplored [10], [15].

In this work, we bridge the gap between two challenging yet realistic tasks: (i) discriminating utterances belonging to new intents or domains from utterances belonging to already familiar ones, and (ii) organizing the newly discovered intents and domains into a taxonomy. Although we address the problem of novel user intent and domain discovery, our technique generalizes readily to any open classification setting. We propose a novel three-stage framework called ADVIN (Automated Discovery of noVel domaIns and iNtents). It automatically discovers user intents and domains in massive unlabeled text corpora, without any prior knowledge of the intents or domains the text may comprise.
Our method first leverages the pre-trained multi-layer transformer network BERT [16] to determine whether an utterance is likely to contain a novel intent. ADVIN then applies unsupervised knowledge transfer to discover the latent intent categories within the utterances identified in the first stage. Finally, ADVIN hierarchically links semantically related groups of discovered intents to form new domains. Our main contributions are thus that ADVIN (i) is completely unsupervised with respect to the number and names of the novel intents and domains it detects; (ii) jointly detects novel domains and novel intents to form an intent-domain taxonomy; and (iii) defines a new loss function to learn pairwise distances between utterances.
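The three stages above can be illustrated with a minimal sketch. Note that everything below is an assumption for exposition only: the toy vectors stand in for BERT [16] utterance embeddings, and the distance thresholds and agglomerative-clustering choices are placeholders, not ADVIN's actual components.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy stand-ins for utterance embeddings; in ADVIN these would come from BERT.
known_centroids = np.eye(8)[:3]          # centroids of three familiar intents
rng = np.random.default_rng(0)

def is_novel(vec, centroids, thresh=2.0):
    # Stage 1 (sketch): flag an utterance whose embedding is far from every
    # known-intent centroid as likely carrying a novel intent.
    return np.linalg.norm(centroids - vec, axis=1).min() > thresh

# Two tight groups of utterances far from all known centroids (truly novel)
group_a = rng.normal(scale=0.3, size=(10, 8)) + 6 * np.eye(8)[5]
group_b = rng.normal(scale=0.3, size=(10, 8)) - 6 * np.eye(8)[6]
utterances = np.vstack([group_a, group_b])

novel = utterances[[is_novel(u, known_centroids) for u in utterances]]

# Stage 2 (sketch): discover latent intent categories among the novel
# utterances without fixing the number of clusters in advance.
intents = AgglomerativeClustering(
    n_clusters=None, distance_threshold=4.0, linkage="average").fit(novel)

# Stage 3 (sketch): hierarchically link semantically close intent clusters
# into domains by clustering their centroids.
intent_centroids = np.stack([novel[intents.labels_ == k].mean(axis=0)
                             for k in range(intents.n_clusters_)])
domains = AgglomerativeClustering(
    n_clusters=None, distance_threshold=10.0,
    linkage="average").fit(intent_centroids)
```

Here the number of discovered intents (`intents.n_clusters_`) and domains (`domains.n_clusters_`) emerges from the data rather than being specified in advance, mirroring contribution (i).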
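The exact form of the pairwise-distance loss in contribution (iii) is introduced later in the paper; for intuition, a component of this kind might resemble a standard contrastive margin loss over pairs of utterance embeddings, sketched below (the function name, margin, and formulation are illustrative, not ADVIN's definition):

```python
import numpy as np

def pairwise_margin_loss(z1, z2, same_intent, margin=1.0):
    """Contrastive loss on a pair of utterance embeddings: pulls same-intent
    pairs together and pushes different-intent pairs at least `margin` apart.
    Illustrative sketch only, not the loss defined by ADVIN."""
    d = np.linalg.norm(z1 - z2)
    if same_intent:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

Pair labels for such a loss could be derived from utterances of already familiar intents, so the learned distance metric transfers, without supervision, to utterances of novel intents.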