1. INTRODUCTION
Recently, the main target of LVCSR (Large-Vocabulary Continuous Speech Recognition) systems has been shifting to spontaneous speech such as conversations, lectures, and meetings. One of the most fundamental problems in training an acoustic model for this kind of speech is that the available training data are insufficient to cover the wide variation in acoustic features. Preparing a large spontaneous speech corpus is very difficult and costly because, unlike read speech recorded from prepared materials, it requires manual transcription of utterances that contain many disfluencies.