
ImportantAug: A Data Augmentation Agent for Speech


Abstract:

We introduce ImportantAug, a technique to augment training data for speech classification and recognition models by adding noise to unimportant regions of the speech and not to important regions. Importance is predicted for each utterance by a data augmentation agent that is trained to maximize the amount of noise it adds while minimizing its impact on recognition performance. The effectiveness of our method is illustrated on version two of the Google Speech Commands (GSC) dataset. On the standard GSC test set, it achieves a 23.3% relative error rate reduction compared to conventional noise augmentation, which applies noise to speech without regard to where it might be most effective. It also provides a 25.4% error rate reduction compared to a baseline without data augmentation. Additionally, ImportantAug outperforms both conventional noise augmentation and the baseline on two test sets with additional noise added.
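The abstract combines two ideas: noise gated by an importance estimate, and an agent trained to trade added noise against recognition performance. The sketch below illustrates both under simple assumptions, together with the relative error-rate-reduction arithmetic used to report the results. It is not the authors' implementation: the importance array is assumed to be given (in ImportantAug it is predicted by the trained agent), and the names `importance_masked_noise`, `agent_objective`, and the weight `lam` are hypothetical. In particular, the linear trade-off in `agent_objective` is only one plausible way to express "maximize noise while minimizing impact on recognition".

```python
import numpy as np

# Hypothetical helpers illustrating the idea in the abstract;
# not the ImportantAug reference implementation.


def relative_error_reduction(baseline_err: float, new_err: float) -> float:
    """Relative error-rate reduction, e.g. 0.25 means 25% relative."""
    if baseline_err <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_err - new_err) / baseline_err


def importance_masked_noise(speech: np.ndarray, noise: np.ndarray,
                            importance: np.ndarray,
                            noise_gain: float = 1.0) -> np.ndarray:
    """Add noise in proportion to (1 - importance).

    `importance` has the same shape as `speech` (a waveform or a
    spectrogram), with values in [0, 1]: 1 keeps that point clean,
    0 lets the full scaled noise through. Assumption: in ImportantAug
    this array would come from the trained agent.
    """
    weight = 1.0 - np.clip(importance, 0.0, 1.0)
    return speech + noise_gain * weight * noise


def agent_objective(recognition_loss: float, importance: np.ndarray,
                    lam: float = 0.1) -> float:
    """Hypothetical agent loss (lower is better).

    Keeps the recognizer's loss on the augmented input low while rewarding
    more added noise, measured as the mean of (1 - importance). The linear
    form and the weight `lam` are assumptions, not taken from the paper.
    """
    noise_amount = float(np.mean(1.0 - np.clip(importance, 0.0, 1.0)))
    return recognition_loss - lam * noise_amount
```

Relative reduction is measured against the reference system's error rate, so a drop from 10% to 7.5% error is a 25% relative reduction even though it is only 2.5 percentage points in absolute terms.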

Published in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 23-27 May 2022

Date Added to IEEE Xplore: 27 April 2022

DOI: 10.1109/ICASSP43922.2022.9747003

Conference Location: Singapore, Singapore

1. INTRODUCTION

Data augmentation techniques improve model performance by adding variation to the training data. They are widely applied to improve automatic speech recognition (ASR) performance [1]–[4]. In [1], the authors used speed perturbation to create new utterances by changing the frequency content and the number of time frames of speech recordings. This additional training data decreased the word error rate (WER) by 3.2% relative on the 960-hour LibriSpeech task. In [2], reverberation was added to the speech to make it more realistic. More recently, a common technique is to remove or mask information in the spectrogram domain. For instance, SpecAugment [5] masks blocks of up to T consecutive time frames and up to F consecutive frequency bins at random positions. At the time, this augmentation not only improved ASR accuracy but also achieved a state-of-the-art WER of 5.8% on the test-other set of the 960-hour LibriSpeech task. The authors of [3] augmented the training data by adding noise to speech, reducing WER by 21.3% relative on their self-constructed 100-sentence evaluation set.
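The augmentations surveyed above can be sketched in a few lines of NumPy. This is a simplified illustration, not the implementations used in [1], [3], or [5], and the function names are hypothetical. Assumptions: speed perturbation is approximated with linear interpolation (production toolkits use band-limited resampling); noise is mixed at a chosen signal-to-noise ratio, a common convention that [3] may not follow exactly; and the masking applies a single time mask and a single frequency mask that set values to zero.

```python
import numpy as np

# Simplified, hypothetical versions of three augmentations discussed above.


def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Play `wave` `factor` times faster by resampling (changes length and pitch).

    Linear interpolation is a simplification; toolkits use band-limited resampling.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = max(1, int(round(len(wave) / factor)))
    old_pos = np.arange(len(wave))
    new_pos = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_pos, old_pos, wave)


def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray,
                     snr_db: float) -> np.ndarray:
    """Scale `noise` (same length as `speech`) to the target SNR in dB, then add it."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


def spec_mask(spec: np.ndarray, max_t: int, max_f: int,
              rng: np.random.Generator) -> np.ndarray:
    """Apply one time mask and one frequency mask, SpecAugment-style.

    `spec` has shape (n_freq, n_frames). A block of t ~ U{0..max_t}
    consecutive frames and one of f ~ U{0..max_f} consecutive bins are zeroed.
    """
    n_freq, n_frames = spec.shape
    if max_t > n_frames or max_f > n_freq:
        raise ValueError("mask width exceeds spectrogram size")
    out = spec.copy()
    t = rng.integers(0, max_t + 1)          # time-mask width
    t0 = rng.integers(0, n_frames - t + 1)  # time-mask start
    out[:, t0:t0 + t] = 0.0
    f = rng.integers(0, max_f + 1)          # frequency-mask width
    f0 = rng.integers(0, n_freq - f + 1)    # frequency-mask start
    out[f0:f0 + f, :] = 0.0
    return out
```

`add_noise_at_snr` spreads noise uniformly over the whole utterance, which corresponds to the conventional noise augmentation that ImportantAug is compared against. Randomness in `spec_mask` comes only from the `rng` argument, so passing a seeded `np.random.default_rng` makes the masks reproducible.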
