ASR Error Correction and Domain Adaptation Using Machine Translation
IEEE Conference Publication (IEEE Xplore)


Abstract:

Off-the-shelf pre-trained Automatic Speech Recognition (ASR) systems are an increasingly viable service for companies of any size building speech-based products. Although these ASR systems are trained on large amounts of data, domain mismatch remains an issue for many parties that want to use the service as-is, and it leads to suboptimal results on their tasks. We propose a simple technique that performs domain adaptation for ASR error correction via machine translation. A machine translation model is a strong candidate for learning a mapping from out-of-domain ASR errors to the in-domain terms in the corresponding reference files. We use two off-the-shelf ASR systems in this work: Google ASR (commercial) and the ASPIRE model (open-source). With our proposed method, we observe a 7% absolute improvement in word error rate and a 4-point absolute improvement in BLEU score on Google ASR output. We also evaluate ASR error correction on a downstream task, speaker diarization, which captures the improvements in speaker style, syntax, structure, and semantics that we obtain via ASR correction.
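The abstract reports results in word error rate (WER). WER is the word-level edit distance between an ASR hypothesis and its reference transcript, divided by the number of reference words: (S + D + I) / N, where S, D and I are substitutions, deletions and insertions. The Python below is a minimal sketch of that standard computation. The function name and the toy sentences are hypothetical. The sketch assumes plain whitespace tokenization with no further text normalization, which may differ from the scoring setup used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    # Guard against an empty reference to avoid division by zero.
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    ref = "book a flight to boston"  # hypothetical in-domain reference
    print(word_error_rate(ref, "book a fight to austin"))  # 0.4 (2 substitutions / 5 words)
    print(word_error_rate(ref, ref))                       # 0.0
```

An *absolute* improvement is the plain difference between two WERs. For example, a hypothetical drop from 0.40 to 0.20 is a 20-point absolute improvement. A *relative* improvement divides that difference by the baseline, so the same drop is a 50% relative improvement.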
Date of Conference: 04-08 May 2020
Date Added to IEEE Xplore: 09 April 2020
Conference Location: Barcelona, Spain

1. INTRODUCTION

Cloud-based ASR systems are readily available to companies building speech-based products. These products cover a wide range of use cases, such as speech transcription, language understanding, spoken language translation, information extraction, and summarization. Most of these use cases involve transcribing speech and then performing various downstream language-processing tasks. In these scenarios, domain mismatch occurs in two places. The first is speech-to-text, where the pre-trained ASR was trained on data from other domains. The second is when downstream NLP tasks are optimized on transcriptions from a pre-existing ASR trained on another domain. The mismatch also stems from the inability to train a competitive in-house ASR on in-domain data alone: such a system is unlikely to outperform pre-trained ASRs trained on much larger data, even if that data is out-of-domain. To address this problem, we propose to carry out ASR error correction via domain adaptation on two pre-existing ASRs. The first is the ASPIRE model [1], an open-source resource trained on conversational, broadcast, and read speech. The second is the Google Speech API¹, which is trained on large quantities of English speech.
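The proposal above keeps the pre-trained ASR as a black box and corrects its output with a machine translation model. Such a model needs parallel data whose "source" side is the ASR hypothesis and whose "target" side is the in-domain reference transcript. The Python below is a minimal sketch of how such pairs might be prepared; it is not the authors' pipeline. Several pieces are hypothetical:

- the `Utterance` class;
- the lowercase and whitespace normalization;
- the `.src`/`.tgt` file naming, which assumes an MT toolkit that reads line-aligned plain-text files.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Tuple


@dataclass
class Utterance:
    """Hypothetical container for one utterance and its two transcripts."""
    utt_id: str
    asr_hypothesis: str  # output of the off-the-shelf ASR (MT source side)
    reference: str       # human in-domain transcript (MT target side)


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def build_parallel_corpus(utts: Iterable[Utterance]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs, skipping utterances with an empty side."""
    pairs = []
    for u in utts:
        src, tgt = normalize(u.asr_hypothesis), normalize(u.reference)
        if src and tgt:
            # Pairs where src == tgt are kept so the model also learns to copy
            # correct transcriptions unchanged.
            pairs.append((src, tgt))
    return pairs


def write_parallel_files(pairs: Iterable[Tuple[str, str]], prefix: Path) -> None:
    """Write line-aligned files <prefix>.src and <prefix>.tgt (hypothetical names)."""
    with open(Path(f"{prefix}.src"), "w", encoding="utf-8") as fs, \
         open(Path(f"{prefix}.tgt"), "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            fs.write(src + "\n")
            ft.write(tgt + "\n")
```

Line N of the `.src` file is aligned with line N of the `.tgt` file, which is the layout many sequence-to-sequence toolkits expect. At inference time, new ASR output would be normalized the same way before being "translated" into corrected in-domain text.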

Cites in Papers - IEEE (21)

1. Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai, "Crossmodal ASR Error Correction With Discrete Speech Units", 2024 IEEE Spoken Language Technology Workshop (SLT), pp.431-438, 2024.
2. Rakesh Roushan, Harshit Mishra, Lucky Yadav, Sreeja Koppula, Nitya Tiwari, K S Nataraj, "Optimizing Speech Recognition for Medical Transcription: Fine-Tuning Whisper and Developing a Web Application", 2024 IEEE Conference on Engineering Informatics (ICEI), pp.1-6, 2024.
3. Nischay Kondai, Abhishek Devapangu, Sai Kartik Hosur, Pratik Raj, Nitya Tiwari, K S Nataraj, "Enhancing Medical ASR Accuracy through the Integration of T5-Based Small Language Models and Error Correction Mechanisms", 2024 IEEE Conference on Engineering Informatics (ICEI), pp.1-6, 2024.
4. Wei-Chen Hsu, Pei-Xu Lin, Chi-Jou Li, Hao-Yu Tien, Yi-Huang Kang, Pei-Ju Lee, "An Enhanced Model for ASR in the Medical Field", 2024 IEEE International Conference on Information Reuse and Integration for Data Science (IRI), pp.43-48, 2024.
5. Bingshen Mu, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie, "MMGER: Multi-Modal and Multi-Granularity Generative Error Correction With LLM for Joint Accent and Speech Recognition", IEEE Signal Processing Letters, vol.31, pp.1940-1944, 2024.
6. Hao Yang, Min Zhang, Daimeng Wei, Jiaxin Guo, "CSNet: Contrastive Siamese Network for Robust SLU", ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.12797-12801, 2024.
7. Pin-Jui Ku, I-Fan Chen, Chao-Han Huck Yang, Anirudh Raju, Pranav Dheram, Pegah Ghahremani, Brian King, Jing Liu, Roger Ren, Phani Sankar Nidadavolu, "Hot-Fixing Wake Word Recognition for End-to-End ASR Via Neural Model Reprogramming", ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.10816-10820, 2024.
8. John Harvill, Rinat Khaziev, Scarlett Li, Randy Cogill, Lidan Wang, Gopinath Chennupati, Hari Thadakamalla, "Significant ASR Error Detection for Conversational Voice Assistants", ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.11606-11610, 2024.
9. Long Mai, Julie Carson-Berndsen, "Enhancing Conversation Smoothness in Language Learning Chatbots: An Evaluation of GPT4 for ASR Error Correction", ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.11001-11005, 2024.
10. Fang Dong, Yiyang Qian, Tianlei Wang, Peng Liu, Jiuwen Cao, "A Transformer-Based End-to-End Automatic Speech Recognition Algorithm", IEEE Signal Processing Letters, vol.30, pp.1592-1596, 2023.
11. Yutong Shao, Arun Kumar, Ndapa Nakashole, "Database-Aware ASR Error Correction for Speech-to-SQL Parsing", ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1-5, 2023.
12. Takashi Fukuda, Samuel Thomas, "Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data", ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1-5, 2023.
13. Binghuai Lin, Liyuan Wang, "Multi-modal ASR error correction with joint ASR error detection", ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1-5, 2023.
14. Duo-Duo Hang, Hua Zhang, Zi-Jian Cao, Yi Yang, "Error Correction of ASR in Air Traffic Control", 2022 IEEE 8th International Conference on Computer and Communications (ICCC), pp.1603-1607, 2022.
15. Prashant Serai, Vishal Sunder, Eric Fosler-Lussier, "Hallucination of Speech Recognition Errors With Sequence to Sequence Learning", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.30, pp.890-900, 2022.
16. Dhanush Bekal, Ashish Shenoy, Monica Sunkara, Sravan Bodapati, Katrin Kirchhoff, "Remember the Context! ASR Slot Error Correction Through Memorization", 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.236-243, 2021.
17. Linchen Zhu, Wenjie Liu, Linquan Liu, Edward Lin, "Improving ASR Error Correction Using N-Best Hypotheses", 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.83-89, 2021.
18. Yu Jiang, Christian Poellabauer, "A Sequence-to-sequence Based Error Correction Model for Medical Automatic Speech Recognition", 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp.3029-3035, 2021.
19. Sadeen Alharbi, Muna Alrazgan, Alanoud Alrashed, Turkiayh Alnomasi, Raghad Almojel, Rimah Alharbi, Saja Alharbi, Sahar Alturki, Fatimah Alshehri, Maha Almojil, "Automatic Speech Recognition: Systematic Literature Review", IEEE Access, vol.9, pp.131858-131876, 2021.
20. Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler, "Cascaded Models with Cyclic Feedback for Direct Speech Translation", ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.7508-7512, 2021.
21. Ryo Imaizumi, Ryo Masumura, Sayaka Shiota, Hitoshi Kiya, "Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition", 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp.297-301, 2020.

Cites in Papers - Other Publishers (7)

1. Gayani Nanayakkara, Nirmalie Wiratunga, David Corsar, Kyle Martin, Anjana Wijekoon, "Clinical Dialogue Transcription Error Correction with Self-supervision", Artificial Intelligence XL, vol.14381, pp.33, 2023.
2. Genshun Wan, Tingzhi Mao, Jingxuan Zhang, Hang Chen, Jianqing Gao, Zhongfu Ye, "Grammar-Supervised End-to-End Speech Recognition with Part-of-Speech Tagging and Dependency Parsing", Applied Sciences, vol.13, no.7, pp.4243, 2023.
3. Gayani Nanayakkara, Nirmalie Wiratunga, David Corsar, Kyle Martin, Anjana Wijekoon, "Clinical Dialogue Transcription Error Correction Using Seq2Seq Models", Multimodal AI in Healthcare, vol.1060, pp.41, 2023.
4. Xue Yu, "The appeal of green advertisements on consumers' consumption intention based on low-resource machine translation", The Journal of Supercomputing, 2022.
5. Javier Cebrián, Ramón Martínez, Natalia Rodríguez, Luis Fernando D'Haro, "Considerations on creating conversational agents for multiple environments and users", AI Magazine, vol.42, no.2, pp.71, 2021.
6. Kentaro Kamiya, Takuya Kawase, Ryuichiro Higashinaka, Katashi Nagao, "Using Presentation Slides and Adjacent Utterances for Post-editing of Speech Recognition Results for Meeting Recordings", Text, Speech, and Dialogue, vol.12848, pp.331, 2021.
7. Linhan Zhang, Tieran Zheng, Jiabin Xue, "Error Heuristic Based Text-Only Error Correction Method for Automatic Speech Recognition", Neural Information Processing, vol.12532, pp.743, 2020.

References

References are not available for this document.