ASR Error Correction and Domain Adaptation Using Machine Translation


Abstract:

Off-the-shelf pre-trained Automatic Speech Recognition (ASR) systems are an increasingly viable service for companies of any size building speech-based products. While these ASR systems are trained on large amounts of data, domain mismatch remains an issue for parties that want to use the service as-is, leading to suboptimal results for their task. We propose a simple technique to perform domain adaptation for ASR error correction via machine translation. A machine translation model is a strong candidate to learn a mapping from out-of-domain ASR errors to in-domain terms in the corresponding reference files. We use two off-the-shelf ASR systems in this work: Google ASR (commercial) and the ASPIRE model (open-source). With our proposed method we observe a 7% absolute improvement in word error rate and a 4-point absolute improvement in BLEU score on Google ASR output. We also evaluate ASR error correction via the downstream task of Speaker Diarization, which captures the speaker-style, syntactic, structural, and semantic improvements we obtain via ASR correction.
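The word error rate gains reported above can be made concrete with the standard WER definition: the word-level edit distance between hypothesis and reference, normalized by the reference length. Below is a minimal sketch of that computation; the function name, tokenization by whitespace, and example sentences are illustrative assumptions, not details from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat mat")` counts two deletions against six reference words, giving roughly 0.333.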
Date of Conference: 04-08 May 2020
Date Added to IEEE Xplore: 09 April 2020
Conference Location: Barcelona, Spain

1. INTRODUCTION

Cloud-based ASR systems are easily available to companies building speech-based products. These products cover a wide range of use cases such as speech transcription, language understanding, spoken language translation, information extraction, and summarization. Most of these use cases involve transcribing speech and then performing various downstream language-processing tasks. In these scenarios, domain mismatch arises in two places: first in speech-to-text, where the pre-trained ASR is trained on different domains of data, and again when optimizing downstream NLP tasks with transcriptions from a pre-existing ASR trained on another domain. The mismatch also stems from being unable to train a competitive in-house ASR on in-domain data alone, which has little chance of outperforming ASRs pre-trained on much larger, even if out-of-domain, data. Towards solving this problem, we propose to carry out ASR error correction via domain adaptation on two pre-existing ASRs: the ASPIRE model [1], an open-source model trained on conversational, broadcast, and read speech, and the Google Speech API, which is trained on large quantities of English speech.
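The correction approach described here is trained like an MT system on a parallel corpus whose source side is errorful ASR output and whose target side is the in-domain reference transcript. A minimal sketch of assembling such pairs by matching utterance IDs follows; the function name, dict-based input format, and lowercasing step are assumptions for illustration, not the paper's exact pipeline.

```python
def build_parallel_corpus(hypotheses: dict, references: dict):
    """Pair ASR hypotheses with reference transcripts by utterance ID.

    The resulting (source, target) pairs can be fed to any seq2seq MT
    toolkit as a "translation" task from errorful ASR text to clean
    in-domain text.
    """
    pairs = []
    # Only utterances present on both sides can form a training pair.
    for utt_id in sorted(hypotheses.keys() & references.keys()):
        src = hypotheses[utt_id].strip().lower()
        tgt = references[utt_id].strip().lower()
        if src and tgt:  # skip empty transcripts
            pairs.append((src, tgt))
    return pairs
```

For example, an ASR hypothesis "the asteroid model" paired with the reference "the aspire model" gives the MT model direct evidence for mapping an out-of-domain error onto the in-domain term.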

