1. Introduction
Since the breakthrough of deep learning technology [1]–[6], speech recognition accuracy has improved dramatically. Recently, speech recognition systems have begun to be deployed not only in smartphones and Personal Computers (PCs) but also in standalone devices operating in far-field environments. Examples include voice assistant systems such as Amazon Alexa and Google Home [7], [8]. In far-field speech recognition, the impact of noise and reverberation is much larger than in near-field cases. Traditional approaches to far-field speech recognition include noise-robust feature extraction algorithms [9], [10], onset enhancement algorithms [11], [12], and multi-microphone approaches [13]–[17].