1. INTRODUCTION
For some tasks, such as Broadcast News (BN) transcription, audio data can be easily collected from radio and television shows, making it possible to gather thousands of hours of audio. However, building Speech-to-Text (STT) systems requires transcriptions in addition to the audio data, and generating accurate manual transcriptions for this quantity of data is very expensive. For some sources, closed captions and television transcripts may be available. These approximate transcriptions have been successfully used in lightly-supervised training techniques [1], [2]. However, for some tasks and languages, approximate transcriptions are not available. In such cases, unsupervised training techniques may be used to exploit the audio data.