I. Introduction
Machine translation (MT) quality has improved significantly with the development of neural machine translation (NMT) [1], [2], which typically relies on large amounts of high-quality parallel sentences. However, in some low-resource domains, parallel sentences are too scarce to train a strong NMT model. Moreover, exhaustively enumerating all potential domains and training a separate NMT model for each is expensive and, in practice, infeasible. To address this problem, multi-domain machine translation (MDMT) [3], [4] has been proposed, which builds a single model on mixed-domain training corpora and switches translation behavior across domains. MDMT offers multiple advantages: 1) when faced with inputs that may come from multiple domains, MDMT is effective and cheap to deploy [5]; 2) MDMT allows related domains to share information and boost each other's performance, echoing findings in multilingual translation [6]; 3) MDMT tends to generalize well and can benefit low-resource domains.