Unsupervised Training for Mandarin Broadcast News and Conversation Transcription

Abstract:

A significant cost in obtaining acoustic training data is the generation of accurate transcriptions. For some sources, closed-caption data is available, which allows the use of lightly-supervised training techniques. However, for some sources and languages, closed captions are not available. In these cases unsupervised training techniques must be used. This paper examines the use of unsupervised techniques for discriminative training. In unsupervised training, automatic transcriptions from a recognition system are used for training. As these transcriptions may contain errors, data selection may be useful. Two forms of selection are described: one to remove non-target language shows, the other to remove segments with low confidence. Experiments were carried out on a Mandarin transcription task. Two types of test data were considered, broadcast news (BN) and broadcast conversations (BC). Results show that the gains from unsupervised discriminative training are highly dependent on the accuracy of the automatic transcriptions.

Published in: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07

Date of Conference: 15-20 April 2007

Date Added to IEEE Xplore: 04 June 2007

DOI: 10.1109/ICASSP.2007.366922

Conference Location: Honolulu, HI, USA

1. INTRODUCTION

For some tasks, such as Broadcast News (BN) transcription, audio data can be easily collected from radio and television shows. Thus, it is possible to collect thousands of hours of audio data. However, to build Speech-to-Text (STT) systems, transcriptions are required in addition to the audio data. Generating accurate manual transcriptions for this data is very expensive. For some sources, closed captions and television transcripts may be available. These approximate transcriptions have been successfully used in lightly-supervised training techniques [1], [2]. However, for some tasks and languages, approximate transcriptions are not available. To make use of this audio data, unsupervised training techniques may be used.
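
The two forms of data selection named in the abstract (dropping non-target language shows and dropping low-confidence segments) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Segment` fields, the `select_segments` function, the set of rejected shows, and the 0.7 threshold are all hypothetical, chosen only to show the shape of such a filter over automatically transcribed audio.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # All fields are illustrative assumptions, not taken from the paper.
    show_id: str       # identifier of the broadcast show the segment came from
    hypothesis: str    # automatic transcription produced by a recogniser
    confidence: float  # segment-level confidence score in [0, 1]


def select_segments(segments, rejected_shows=frozenset(), min_confidence=0.7):
    """Keep segments from accepted shows whose confidence meets the threshold.

    rejected_shows: shows judged to be non-target language (assumed to be
        decided elsewhere, e.g. by a language identification step).
    min_confidence: hypothetical cut-off; a real value would need tuning.
    """
    kept = []
    for seg in segments:
        if seg.show_id in rejected_shows:
            continue  # show-level selection: drop the whole show
        if seg.confidence < min_confidence:
            continue  # segment-level selection: drop unreliable hypotheses
        kept.append(seg)
    return kept
```

In a pipeline of this kind, the surviving segments and their automatic transcriptions would then serve as the training set; how the paper computes confidences or detects non-target language shows is not stated in this excerpt.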
