1. INTRODUCTION
Comprehending the intent and the domain (a group of mutually related intents) of a user's language utterance is a key task in everyday devices such as mobile phones and smart speakers. Various techniques have been proposed in the literature for this task [1]–[11]. Most are supervised or semi-supervised, i.e., they rely on sufficient labeled data, can only handle a fixed set of intents and domains seen during model training, and generalize poorly to intents or domains unseen during model development. Zero-shot techniques [6], [12] recognize new intents for which no labeled training data is available; however, they require additional, often unavailable, information such as the number of new intent types and some prior knowledge about the new intents to be discovered. Efforts have been made to break this closed-world assumption in the NLU literature [9], [10], [13], [14] via open-world learning or open classification, which identifies instances whose labels were unseen during training. However, the task of discovering the actual latent categories within the instances identified as carrying unseen labels remains relatively underexplored [10], [15].

In this work, we bridge the gap between two challenging yet realistic tasks: (i) discriminating utterances belonging to new intents or domains from utterances belonging to already familiar ones, and (ii) organizing the newly discovered intents and domains into a taxonomy. Although we address the problem of novel user intent and domain discovery, our technique generalizes readily to any open classification setting. We propose a novel three-stage framework called ADVIN (Automated Discovery of noVel domaIns and iNtents). It automatically discovers user intents and domains in massive unlabeled text corpora, without any prior knowledge of the intents or domains the text may comprise.
Our method first leverages the pre-trained multi-layer transformer network BERT [16] to determine whether an utterance is likely to contain a novel intent. ADVIN then applies unsupervised knowledge transfer to discover the latent intent categories within the utterances identified in the first stage. Finally, ADVIN hierarchically links semantically related groups of discovered intents to form new domains. Our main contributions are thus that ADVIN (i) is completely unsupervised with respect to the number and names of the novel intents and domains it detects; (ii) jointly detects novel domains and novel intents to form an intent-domain taxonomy; and (iii) defines a new loss function to learn pairwise distances between utterances.
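The three stages above can be illustrated with a minimal sketch. Note that everything below is an assumption for exposition only: the toy vectors stand in for BERT [16] utterance embeddings, and the distance thresholds and agglomerative-clustering choices are placeholders, not ADVIN's actual components.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy stand-ins for utterance embeddings; in ADVIN these would come from BERT.
known_centroids = np.eye(8)[:3]          # centroids of three familiar intents
rng = np.random.default_rng(0)

def is_novel(vec, centroids, thresh=2.0):
    # Stage 1 (sketch): flag an utterance whose embedding is far from every
    # known-intent centroid as likely carrying a novel intent.
    return np.linalg.norm(centroids - vec, axis=1).min() > thresh

# Two tight groups of utterances far from all known centroids (truly novel)
group_a = rng.normal(scale=0.3, size=(10, 8)) + 6 * np.eye(8)[5]
group_b = rng.normal(scale=0.3, size=(10, 8)) - 6 * np.eye(8)[6]
utterances = np.vstack([group_a, group_b])

novel = utterances[[is_novel(u, known_centroids) for u in utterances]]

# Stage 2 (sketch): discover latent intent categories among the novel
# utterances without fixing the number of clusters in advance.
intents = AgglomerativeClustering(
    n_clusters=None, distance_threshold=4.0, linkage="average").fit(novel)

# Stage 3 (sketch): hierarchically link semantically close intent clusters
# into domains by clustering their centroids.
intent_centroids = np.stack([novel[intents.labels_ == k].mean(axis=0)
                             for k in range(intents.n_clusters_)])
domains = AgglomerativeClustering(
    n_clusters=None, distance_threshold=10.0,
    linkage="average").fit(intent_centroids)
```

Here the number of discovered intents (`intents.n_clusters_`) and domains (`domains.n_clusters_`) emerges from the data rather than being specified in advance, mirroring contribution (i).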
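The exact form of the pairwise-distance loss in contribution (iii) is introduced later in the paper; for intuition, a component of this kind might resemble a standard contrastive margin loss over pairs of utterance embeddings, sketched below (the function name, margin, and formulation are illustrative, not ADVIN's definition):

```python
import numpy as np

def pairwise_margin_loss(z1, z2, same_intent, margin=1.0):
    """Contrastive loss on a pair of utterance embeddings: pulls same-intent
    pairs together and pushes different-intent pairs at least `margin` apart.
    Illustrative sketch only, not the loss defined by ADVIN."""
    d = np.linalg.norm(z1 - z2)
    if same_intent:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

Pair labels for such a loss could be derived from utterances of already familiar intents, so the learned distance metric transfers, without supervision, to utterances of novel intents.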