1. Introduction
In recent years, with the growth of GPU computing power and the development of sequence-to-sequence (seq2seq) models such as the Transformer [1] and ConvS2S [2], seq2seq-based grammatical error correction (GEC) has attracted increasing attention. In the field of Automatic Speech Recognition (ASR), GEC can be used to correct errors in ASR transcriptions. GEC for ASR can be modeled as a machine translation task in which a seq2seq model takes decoder hypotheses as inputs and ground-truth transcriptions as target outputs, learning to translate erroneous hypotheses into error-free transcriptions. Many works have shown that GEC can considerably reduce the word error rate (WER) of ASR output [3]–[5].
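To make this source-target framing concrete, the snippet below is a minimal sketch (in PyTorch, and not the specific system described later): hypothetical ASR hypotheses are paired with their reference transcriptions, and a small Transformer encoder-decoder runs one teacher-forced training step on them. The example data, vocabulary handling, and model sizes are illustrative assumptions only.

```python
import torch
import torch.nn as nn

# Hypothetical (hypothesis, reference) pairs: ASR output paired with the
# ground-truth transcription, i.e., the source/target framing described above.
pairs = [
    ("i red a book yesterday", "i read a book yesterday"),
    ("she go to school", "she goes to school"),
]

# Toy word-level vocabulary built from both sides of the pairs.
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"
vocab = {tok: i for i, tok in enumerate(
    [PAD, BOS, EOS] + sorted({w for s, t in pairs for w in (s + " " + t).split()})
)}

def encode(sentence, max_len=8):
    ids = [vocab[BOS]] + [vocab[w] for w in sentence.split()] + [vocab[EOS]]
    ids += [vocab[PAD]] * (max_len - len(ids))
    return torch.tensor(ids)

src = torch.stack([encode(s) for s, _ in pairs])   # decoder hypotheses (source)
tgt = torch.stack([encode(t) for _, t in pairs])   # error-free transcriptions (target)

# A small encoder-decoder Transformer standing in for the seq2seq GEC model.
emb = nn.Embedding(len(vocab), 32)
model = nn.Transformer(d_model=32, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, dim_feedforward=64, batch_first=True)
out_proj = nn.Linear(32, len(vocab))

# One teacher-forced step: feed the shifted reference, predict the next token.
tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]
tgt_mask = model.generate_square_subsequent_mask(tgt_in.size(1))
logits = out_proj(model(emb(src), emb(tgt_in), tgt_mask=tgt_mask))
loss = nn.CrossEntropyLoss(ignore_index=vocab[PAD])(
    logits.reshape(-1, len(vocab)), tgt_out.reshape(-1))
loss.backward()
print(f"toy GEC training loss: {loss.item():.3f}")
```

In practice such a model would be trained on large sets of (hypothesis, reference) pairs and decoded with beam search, but the sketch shows the essential translation-style setup.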