1. Introduction
Deep learning approaches, especially recurrent neural networks and their variants, have been among the most active areas of language modeling research in recent years. Long Short-Term Memory (LSTM) based recurrent language models (LMs) have achieved significant perplexity reductions on well-established benchmarks such as the Penn Treebank [1] and the more recent One Billion Word corpus [2]. These results validate the potential of deep learning and recurrent models as key to further progress in language modeling. Since LMs are a core component of natural language processing (NLP) tasks such as automatic speech recognition (ASR) and machine translation (MT), improved language modeling techniques have sometimes translated into improvements in overall system performance for these tasks [3]–[5].