Abstract:
We describe extensions and improvements to IBM's system for automatic transcription of broadcast news. The speech recognizer uses a total of 160 hours of acoustic training data, 80 hours more than the system described in Chen et al. (1998). In addition to the improvements obtained in 1997, we made a number of changes and algorithmic enhancements. Among these were changing the acoustic vocabulary, reducing the number of phonemes, inserting short pauses, using mixture models with non-Gaussian components, pronunciation networks, factor analysis (FACILT), and the Bayesian information criterion (BIC) applied to choosing the number of components in a Gaussian mixture model. The models were combined in a single system using NIST's script voting machine known as ROVER (Fiscus 1997).
Published in: 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)
Date of Conference: 15-19 March 1999
Date Added to IEEE Xplore: 06 August 2002
Print ISBN: 0-7803-5041-3
Print ISSN: 1520-6149