A study on bias-based speech signal conditioning techniques for improving the robustness of automatic speech recognition


Abstract:

Automatic speech recognition (ASR) performs poorly when the training conditions differ greatly from the testing conditions. Additive background noise and channel distortion are mostly responsible for these mismatches, which introduce highly non-linear terms in the acoustic model of speech in both the log-spectral and cepstral domains. To avoid model complexity, current ASR relies on a simple linear approximation of these non-linear functions in the cepstral domain. This linear modeling approach transforms the channel distortion into an additive bias term in the cepstral domain under the assumption of high SNR, which rarely holds in practical situations. Several algorithms have been developed to estimate this bias term and compensate for it, either in the feature space or in the model domain, to improve the robustness of ASR. In this paper, we explore these bias estimation techniques for both stationary and non-stationary acoustic environments to assess their applicability to self-adaptable ASR.
Date of Conference: 03-06 May 2009
Date Added to IEEE Xplore: 19 June 2009
ISBN Information:
Print ISSN: 0840-7789
Conference Location: St. John's, NL, Canada

1. INTRODUCTION

The benefits of ASR disappear quickly when the training and testing conditions are greatly mismatched in unknown environments. These mismatches have three sources: (I) inter- and intra-speaker variability, (II) additive background noise, and (III) microphone and transmission channel interference [9]. Insufficient data for training the model parameters may also contribute to the acoustic mismatch. These variabilities can degrade ASR performance severely enough to make it unacceptable for real-world applications. In robust ASR, the goal is to reduce the effects of such extraneous conditions so that recognition performance approaches that obtained in matched testing environments.
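
The additive-bias view of channel distortion mentioned in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper. It assumes synthetic power spectra, a time-invariant channel, and cepstral mean subtraction as one simple, well-known feature-space bias removal technique. All function and variable names are hypothetical. The last step adds noise to show that the bias stops being constant once the high-SNR assumption fails.

```python
import numpy as np

def dct_matrix(n_bins, n_ceps):
    """DCT-II basis mapping n_bins log-spectral values to n_ceps coefficients."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bins)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_bins)

def cepstra(power_spec, n_ceps=13):
    """Cepstral features: a linear transform (DCT) of the log power spectrum."""
    basis = dct_matrix(power_spec.shape[1], n_ceps)
    return np.log(power_spec) @ basis.T

def cepstral_mean_subtraction(feats):
    """Remove a constant per-utterance bias by subtracting the mean over frames."""
    return feats - feats.mean(axis=0, keepdims=True)

# Synthetic data (assumption): random positive "speech" power spectra and a
# fixed channel power response shared by every frame.
rng = np.random.default_rng(0)
n_frames, n_bins = 200, 40
clean_power = rng.uniform(0.1, 10.0, size=(n_frames, n_bins))
channel_power = rng.uniform(0.2, 2.0, size=(1, n_bins))

# Noise-free (high-SNR) case: the channel multiplies the spectrum, so it adds
# a constant vector in the log-spectral and cepstral domains.
distorted_power = clean_power * channel_power
c_clean = cepstra(clean_power)
c_dist = cepstra(distorted_power)

bias = c_dist - c_clean
bias_is_constant = np.allclose(bias, bias[0])
cms_matches = np.allclose(
    cepstral_mean_subtraction(c_clean),
    cepstral_mean_subtraction(c_dist),
)

# With additive noise, log(S*H + N) - log(S) depends on S, so the offset
# varies from frame to frame and is no longer a single bias vector.
noise_power = 1.0
noisy_power = clean_power * channel_power + noise_power
bias_noisy = cepstra(noisy_power) - c_clean
noisy_bias_is_constant = np.allclose(bias_noisy, bias_noisy[0])

print("bias constant without noise:", bias_is_constant)
print("mean subtraction removes channel:", cms_matches)
print("bias constant with noise:", noisy_bias_is_constant)
```

In the noise-free case, mean subtraction yields identical features for the clean and distorted signals, which is the sense in which the channel acts as a removable bias. Mean subtraction also removes the mean of the clean speech itself, a known side effect of this simple scheme.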
