I. Introduction
Neural machine translation (NMT) systems [1] are traditionally trained on textual instances. These instances usually pass through pre-defined heuristic data-processing pipelines that normalize them into a canonical form for MT training, as described in previous technical reports of the WMT shared tasks [2], [3], [4]. However, with the rapid development of 5G technology and the maturity of the mobile Internet industry, the data sources of machine translation systems have become more diverse: outputs of optical character recognition (OCR) or automatic speech recognition (ASR) can serve as input texts for NMT, introducing a wide variety of noise. Moreover, users may type queries quickly on mobile phones, producing various kinds of misspellings. In short, noisy inputs are common in real-world machine translation scenarios and cannot be ignored. Unfortunately, current research [5] has empirically shown that neural machine translation systems are vulnerable to both synthetic and natural noisy inputs. Therefore, improving robustness to such noise is crucial for deployed NMT systems.