1. Introduction
Deep learning techniques [1] are now integral to automatic speech recognition (ASR) systems [2], where they have been used for feature representation [3], acoustic modeling [1], and language modeling [4]. Results from deep neural networks (DNNs) have been consistently encouraging, and current research focuses both on improving the state of the art and on deepening the scientific understanding of deep learning's strengths and weaknesses. DNNs have been observed to work reliably under matched conditions, but their performance degrades under mismatched conditions [28]. Speech-signal degradations such as reverberation, noise, and channel mismatch can significantly reduce DNN recognition accuracy, revealing the vulnerability of DNNs [4], [5] to unseen conditions.