I. Introduction
With the rapid development of deep learning and natural language processing, and given the considerable commercial value of real-world dialogue systems, conversational agents such as Microsoft XiaoIce, Apple Siri, and Baidu Xiaodu have gradually become part of daily life. These systems serve as virtual assistants that help users accomplish specific goals, such as weather inquiries, navigation, and restaurant reservations. Dialogue State Tracking (DST), which tracks the user's evolving goals and the system's actions over the course of a conversation, is a crucial component of modular task-oriented dialogue systems [1]–[3]. Accurate dialogue state tracking is essential for generating correct dialogue actions and appropriate natural language responses, because it directly affects downstream database queries and dialogue policy selection. In multidomain settings, the dialogue state is generally represented as a set of (domain-slot, value) pairs, such as <restaurant-area, east>. These slot-value pairs are extracted from the dialogue context at each turn.
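To make the (domain-slot, value) representation concrete, the following minimal Python sketch stores a dialogue state as a mapping from domain-slot names to values and accumulates it turn by turn. It is an illustration only: the `update_state` helper, the example turns, and the rule that newer values override older ones are assumptions made for this sketch, not the method of any particular DST model, and the extraction of slot-value pairs from text is not shown.

```python
from typing import Dict

# A dialogue state maps "domain-slot" names to values (illustrative representation).
DialogueState = Dict[str, str]


def update_state(state: DialogueState, turn_pairs: DialogueState) -> DialogueState:
    """Return a new state in which pairs from the current turn override older values."""
    new_state = dict(state)        # copy so the previous turn's state is left unchanged
    new_state.update(turn_pairs)   # assumed update rule: the latest value wins
    return new_state


# Hypothetical slot-value pairs, assumed to be already extracted from each turn.
turns = [
    {"restaurant-area": "east"},
    {"restaurant-food": "italian"},
    {"restaurant-area": "centre", "taxi-destination": "train station"},
]

state: DialogueState = {}
for turn_pairs in turns:
    state = update_state(state, turn_pairs)

print(sorted(state.items()))
# [('restaurant-area', 'centre'), ('restaurant-food', 'italian'), ('taxi-destination', 'train station')]
```

In this sketch the state after the last turn contains slots from two domains (restaurant and taxi), which is what makes the tracking problem multidomain.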