ASR Error Correction and Domain Adaptation Using Machine Translation


Abstract:

Off-the-shelf pre-trained Automatic Speech Recognition (ASR) systems are an increasingly viable service for companies of any size building speech-based products. While these ASR systems are trained on large amounts of data, domain mismatch remains an issue for parties that want to use the service as-is, leading to suboptimal results for their task. We propose a simple technique to perform domain adaptation for ASR error correction via machine translation. A machine translation model is a strong candidate to learn a mapping from out-of-domain ASR errors to in-domain terms in the corresponding reference files. We use two off-the-shelf ASR systems in this work: Google ASR (commercial) and the ASPIRE model (open-source). With our proposed method we observe a 7% absolute improvement in word error rate and a 4-point absolute improvement in BLEU score on Google ASR output. We also evaluate ASR error correction via the downstream task of Speaker Diarization, which captures the speaker-style, syntactic, structural, and semantic improvements we obtain via ASR correction.
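The word error rate gains reported above can be made concrete with the standard WER definition: the word-level edit distance between hypothesis and reference, normalized by the reference length. Below is a minimal sketch of that computation; the function name, tokenization by whitespace, and example sentences are illustrative assumptions, not details from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat mat")` counts two deletions against six reference words, giving roughly 0.333.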
Date of Conference: 04-08 May 2020
Date Added to IEEE Xplore: 09 April 2020
Conference Location: Barcelona, Spain

1. INTRODUCTION

Cloud-based ASR systems are easily available to companies building speech-based products. These products cover a wide range of use cases such as speech transcription, language understanding, spoken language translation, information extraction, and summarization. Most of these use cases involve transcribing speech and then performing various downstream language-processing tasks. In these scenarios, domain mismatch arises in two places: first in speech-to-text, where the pre-trained ASR is trained on different domains of data, and again when optimizing downstream NLP tasks with transcriptions from a pre-existing ASR trained on another domain. The mismatch also stems from being unable to train a competitive in-house ASR on in-domain data alone, which has little chance of outperforming ASRs pre-trained on much larger, even if out-of-domain, data. Towards solving this problem, we propose to carry out ASR error correction via domain adaptation on two pre-existing ASRs: the ASPIRE model [1], an open-source model trained on conversational, broadcast, and read speech, and the Google Speech API, which is trained on large quantities of English speech.
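The correction approach described here is trained like an MT system on a parallel corpus whose source side is errorful ASR output and whose target side is the in-domain reference transcript. A minimal sketch of assembling such pairs by matching utterance IDs follows; the function name, dict-based input format, and lowercasing step are assumptions for illustration, not the paper's exact pipeline.

```python
def build_parallel_corpus(hypotheses: dict, references: dict):
    """Pair ASR hypotheses with reference transcripts by utterance ID.

    The resulting (source, target) pairs can be fed to any seq2seq MT
    toolkit as a "translation" task from errorful ASR text to clean
    in-domain text.
    """
    pairs = []
    # Only utterances present on both sides can form a training pair.
    for utt_id in sorted(hypotheses.keys() & references.keys()):
        src = hypotheses[utt_id].strip().lower()
        tgt = references[utt_id].strip().lower()
        if src and tgt:  # skip empty transcripts
            pairs.append((src, tgt))
    return pairs
```

For example, an ASR hypothesis "the asteroid model" paired with the reference "the aspire model" gives the MT model direct evidence for mapping an out-of-domain error onto the in-domain term.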

