1. Introduction
One of the challenges in the speech recognition community is building an ASR system with limited in-domain data, so data-hungry speech recognition training algorithms have to be modified to handle such limits. This also applies to neural networks (NNs), which are part of essentially any state-of-the-art ASR system today. They serve either as feature extractors or as acoustic models estimating probabilities of sub-phoneme classes. NNs have to be trained on a large amount of in-domain data in order to perform well; today's deep neural networks are largely under-trained from the perspective of research done on tasks with rich resources [1]. The need for more training data can be alleviated by layer-wise training [2] or unsupervised pre-training [3]. Other techniques, such as dropout [4] and maxout [5], effectively reduce the number of parameters in the neural network, as sketched below.
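To make the last point concrete, the following is a minimal NumPy sketch of the two cited techniques, not the implementation used in this work: dropout [4] samples a thinned sub-network at each training step, and a maxout unit [5] takes the maximum over k affine pieces, so one unit stands in for several conventional units. All shapes and variable names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Zero each activation with probability p at training time [4].

    Each forward pass samples a thinned sub-network; parameters are
    shared across these sub-networks, which regularizes the model.
    """
    if not train:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)  # inverted dropout: rescale kept units

def maxout(x, W, b):
    """Maxout unit [5]: activation is the max over k affine pieces.

    W has shape (k, d_in, d_out); each output is the maximum of k
    linear projections, replacing a fixed pointwise nonlinearity.
    """
    z = np.einsum('i,kij->kj', x, W) + b  # (k, d_out) affine pieces
    return z.max(axis=0)

# Toy forward pass through one hidden layer (hypothetical sizes).
x = rng.standard_normal(16)                # input features
W = rng.standard_normal((3, 16, 8)) * 0.1  # k=3 pieces, 16 -> 8 units
b = rng.standard_normal((3, 8)) * 0.1
h = dropout(maxout(x, W, b), p=0.5, train=True)
```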