Design of fast LVCSR systems


Abstract:

The paper describes the development of fast (less than 10 times real-time) large vocabulary continuous speech recognition (LVCSR) systems based on technology developed for unlimited runtime systems assembled for participation in recent DARPA/NIST LVCSR evaluations. A general system structure for 10 times real-time systems is proposed and two specific systems that have been built for broadcast news (BN) and conversational telephone speech (CTS) recognition are described. The systems were evaluated in the DARPA/NIST April 2003 rich transcription evaluation. Results are reported and contrasted with unlimited runtime systems and previous fast systems.
Date of Conference: 30 November 2003 - 04 December 2003
Date Added to IEEE Xplore: 02 August 2004
Print ISBN: 0-7803-7980-2
Conference Location: St Thomas, VI, USA

1. Introduction

For more than a decade, a major focus of LVCSR research has been the yearly U.S. Government-sponsored evaluations conducted by NIST. While these evaluations have helped the research community to measure progress in the state of the art in LVCSR accurately and have led to impressive improvements in accuracy [17], they have also encouraged research sites to pursue “accuracy at any price”. This led to typical systems running about 300 times slower than real time (with some taking up to 2000xRT). As LVCSR technology has matured, there is now renewed interest in building faster systems while retaining the accuracy gains achieved. This trend is also reflected in the recently initiated DARPA EARS programme, which aims at fast transcription of both Broadcast News and Conversational Telephone Speech data.
