I. Introduction
Extreme Multi-label text Classification (XMC) is an essential task in natural language processing (NLP): given a text, it aims to recall the multiple relevant labels and discriminate them from an enormous number of irrelevant ones. This kind of problem is prevalent in many real-world applications, such as news annotation [25] and related-query recommendation in search engines [1].