1. INTRODUCTION
Current speech recognizers combine several probabilistic models, such as acoustic models, lexicons, and language models, and the efficient integration of these models is therefore essential for optimizing speech recognition performance. Conventionally, these models are trained separately and integrated simply by taking the product of their probabilities. This basic scheme is consistent with the maximum likelihood (ML) training of hidden Markov model (HMM)-based acoustic models and N-gram language models, because these models are designed so that the joint objective function factorizes into a likelihood term and a prior probability term, computed by the acoustic and language models, respectively.
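This factorization can be made explicit; the following is a minimal sketch using standard notation ($X$ for the acoustic observation sequence, $W$ for the word sequence, $\Theta_{\mathrm{AM}}$ and $\Theta_{\mathrm{LM}}$ for the acoustic and language model parameters; these symbols are ours, not taken from this paper):

% Bayes decision rule underlying the product-of-probabilities integration;
% the normalizer p(X) is constant in W and drops out of the arg max.
\begin{align}
  \hat{W} = \operatorname*{arg\,max}_{W} \; P(W \mid X)
          = \operatorname*{arg\,max}_{W} \; p(X \mid W;\, \Theta_{\mathrm{AM}}) \, P(W;\, \Theta_{\mathrm{LM}})
\end{align}

Because $\log p(X \mid W;\, \Theta_{\mathrm{AM}}) + \log P(W;\, \Theta_{\mathrm{LM}})$ separates into a term depending only on $\Theta_{\mathrm{AM}}$ and a term depending only on $\Theta_{\mathrm{LM}}$, each model can be trained by ML independently of the other.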