
Multi-style training of HMMs with stereo data for reverberation-robust speech recognition


Abstract:

A novel training algorithm using data pairs of clean and reverberant feature vectors for estimating robust Hidden Markov Models (HMMs), previously introduced for matched training, is employed in this paper for multi-style training. The multi-style HMMs are derived from well-trained clean-speech HMMs by aligning the clean data to the clean-speech HMM and using the resulting state-frame alignment to estimate the Gaussian mixture densities from the reverberant data of several different rooms. The temporal alignment is thus fixed for all reverberation conditions contained in the multi-style training set, which reduces the model mismatch between the different rooms and makes this training approach particularly suitable for multi-style training. Multi-style HMMs trained by the proposed approach and adapted to the current room condition using maximum likelihood linear regression significantly outperform the corresponding adapted multi-style HMMs trained by the conventional Baum-Welch algorithm. In strongly reverberant rooms, the proposed adapted multi-style HMMs even outperform Baum-Welch HMMs trained on matched data.
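The re-estimation step described in the abstract can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it uses a single diagonal Gaussian per state (the paper estimates Gaussian mixture densities), a plain left-to-right topology, frame-synchronous stereo pairs of clean and reverberant features, and hypothetical helper names (`viterbi_align`, `multistyle_reestimate`, `stereo_data`); MLLR adaptation is omitted. The clean-speech HMM parameters (`means`, `vars_`, `log_trans`) are assumed to be given by a well-trained clean model.

```python
# Hypothetical sketch: fixed clean-data alignment driving reverberant-data
# re-estimation. Single diagonal Gaussians stand in for the paper's GMMs.
import numpy as np

def log_gauss(x, mean, var):
    """Log-likelihood of frames x (T x D) under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def viterbi_align(clean_feats, means, vars_, log_trans):
    """Force-align clean frames to the clean-speech HMM states (Viterbi)."""
    T, S = clean_feats.shape[0], means.shape[0]
    log_obs = np.stack([log_gauss(clean_feats, means[s], vars_[s])
                        for s in range(S)], axis=1)          # (T, S)
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0, 0] = log_obs[0, 0]                              # start in first state
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] + log_trans[:, s]
            back[t, s] = np.argmax(scores)
            delta[t, s] = scores[back[t, s]] + log_obs[t, s]
    path = np.zeros(T, dtype=int)
    path[-1] = S - 1                                         # end in last state
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

def multistyle_reestimate(stereo_data, means, vars_, log_trans, floor=1e-3):
    """Estimate multi-style Gaussians from reverberant frames, using the
    state-frame alignment obtained from the parallel clean frames."""
    S, D = means.shape
    counts, sums, sq_sums = np.zeros(S), np.zeros((S, D)), np.zeros((S, D))
    for clean, reverb in stereo_data:        # pairs from several rooms
        path = viterbi_align(clean, means, vars_, log_trans)
        for s, x in zip(path, reverb):       # reverberant frame, clean alignment
            counts[s] += 1
            sums[s] += x
            sq_sums[s] += x ** 2
    n = np.maximum(counts[:, None], 1)
    new_means = sums / n
    new_vars = np.maximum(sq_sums / n - new_means ** 2, floor)
    return new_means, new_vars
```

The point of the sketch is that the alignment is computed once, from the clean channel against the clean-speech HMM, and then reused to accumulate statistics from the reverberant channels of all rooms, so the temporal alignment stays identical across the reverberation conditions in the multi-style training set.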
Date of Conference: 30 May 2011 - 01 June 2011
Date Added to IEEE Xplore: 07 July 2011
Conference Location: Edinburgh, UK

1. INTRODUCTION

In distant-talking Automatic Speech Recognition (ASR) applications, reverberation leads to a mismatch between the acoustic models of the ASR system, typically trained on clean speech, and the utterances to be recognized. This mismatch causes a significant reduction in recognition accuracy compared to clean-speech recordings, as shown in [2].
