Automatic transcription of conversational telephone speech

Abstract:

This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, language and pronunciation modeling is presented. These include the use of conversation side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10×RT conversational speech transcription system.

Published in: IEEE Transactions on Speech and Audio Processing ( Volume: 13, Issue: 6, November 2005)

Page(s): 1173 - 1185

Date of Publication: 30 November 2005

DOI: 10.1109/TSA.2005.852999

I. Introduction

The transcription of conversational telephone speech is one of the most challenging tasks for speech recognition technology. State-of-the-art systems still yield high word error rates typically within a range of 20%–30%. Work on this task has been aided by extensive data collection, namely the Switchboard-1 corpus [10]. Originally designed as a resource to train and evaluate speaker identification systems, the corpus now serves as the primary source of data for work on automatic transcription of conversational telephone speech in English.
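Word error rate, the metric quoted above, is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that definition only, not the scoring procedure used in the evaluation: the function name `word_error_rate` is hypothetical, words are split on whitespace, and no text normalization is applied, whereas official scoring tools typically normalize transcripts first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors while the denominator is the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.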
