I. Introduction
Automatic Speech Recognition (ASR) systems can significantly increase the efficiency and convenience of human-computer interaction across a variety of applications and domains. In recent years, medicine and healthcare has been one such domain, receiving increasing attention from the ASR community. For example, in hospitals, manual interactions with computing systems (e.g., recording physician notes, retrieving patient data, and searching for medical information) can be distracting, tedious, and time-consuming. A speech-based system could even be used during surgeries or emergency interventions, where it could quickly alert physicians to recommended steps or help prevent deviations from established treatment protocols and workflows.

However, several challenges must be addressed before ASR can be used reliably in the medical field. First, the system has to work dependably in environments with different levels and types of noise, such as multiple speakers, sounds generated by medical equipment, etc. Second, it is difficult to identify a universal dataset for training and evaluating medical speech recognition systems. Third, medical terminology can be far more complex than everyday language: medical terms tend to be longer than most other dictionary words, are often combined in unusual ways, are harder to pronounce, and frequently share very similar pronunciations across distinct terms (e.g., names of procedures, diseases, and medications). Therefore, a more advanced approach to speech recognition is needed.

Because this problem is relatively new, only a few prior efforts have investigated the design of ASR systems for medical purposes [1]. One approach to designing such a system is to collect medical speech data and build a speech corpus that can be used to train a system from scratch. Edwards et al.
[2] present a speech recognition system trained with 270 hours of medical speech data and 30 million tokens of text from clinical episodes, resulting in a word error rate (WER) below 16% in realistic clinical cases. Chiu et al. [3] trained two models, a Connectionist Temporal Classification (CTC) phoneme-based model and a Listen, Attend and Spell (LAS) grapheme-based model, on 14,000 hours of medical conversations, yielding WERs of 20.1% and 18.3%, respectively. Another option is to use an existing ASR system and adapt it to the medical domain. Liu et al. [4] evaluate two well-known ASR systems, Nuance Dragon and SRI Decipher, on spoken clinical questions, and adapt the SRI system to the medical domain using a language model, achieving a WER of 26.7%. Salloum et al. [5] propose a method called "crowdsourced transcription process" to continuously refine ASR language models. Mani et al. [6] perform medical domain adaptation of Google ASR and ASPIRE via machine translation, which is achieved by learning a mapping from out-of-domain errors to in-domain medical terms, yielding a WER of 7%.
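All of the results above are reported as word error rate, the standard ASR evaluation metric: the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis transcript into the reference, divided by the number of reference words. The cited works do not specify their scoring implementations; the following is a minimal illustrative sketch using a word-level Levenshtein distance.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed as word-level Levenshtein distance between the two transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One misrecognized word out of five reference words gives a WER of 20%,
# comparable in scale to the CTC model's 20.1% reported in [3].
print(wer("the patient was given aspirin",
          "the patient was given asprin"))  # 0.2
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is one reason the metric is usually reported alongside the corpus and test conditions, as the works cited above do.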