Adversarial Synthesis based Data Augmentation for Speech Classification

Abstract:

Speech classification plays a vital role in modern audio processing, with the rise of technologies such as home assistants and speech-based control devices. Deep learning-based algorithms have played a major role in developing such technologies. However, deep learning algorithms are data-hungry and need large labelled datasets for classification, and such datasets are rare in the real world. Openly available datasets can be highly skewed, which makes the classification task hard. This study addresses the problem of data augmentation for speech classification. To create examples for under-represented classes, it proposes conditional data augmentation using generative adversarial networks (GANs). This work adapts and improves the conditional GAN architecture to generate synthetic Mel spectrograms for the minority classes. The proposed GAN-based approach is evaluated on the Speech Commands “Zero” through “Nine” (Sc09) dataset. Results show an approximately 1.7% relative improvement in accuracy over a Convolutional Recurrent Neural Network (CRNN) baseline adapted to classify the speech commands.
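The conditional augmentation step described in the abstract can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: the abstract does not give the generator architecture, the spectrogram size, or how many synthetic samples are added per class. Here we assume, hypothetically, that each minority class is topped up to the size of the largest class, and we condition a toy, untrained generator on the class label by concatenating a one-hot label vector to the noise vector, a common conditioning scheme in conditional GANs. The names `samples_needed`, `ToyConditionalGenerator`, `augment`, `N_MELS`, `N_FRAMES`, and `NOISE_DIM` are all hypothetical.

```python
import math
import random
from collections import Counter

N_MELS, N_FRAMES = 8, 4   # hypothetical (tiny) Mel-spectrogram shape
NOISE_DIM = 16            # hypothetical latent-noise size


def samples_needed(labels):
    """Synthetic examples needed per class to match the largest class (assumed policy)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {c: target - n for c, n in counts.items() if n < target}


class ToyConditionalGenerator:
    """Untrained stand-in for a conditional generator G(z, y): one linear layer + tanh."""

    def __init__(self, num_classes, seed=0):
        self.num_classes = num_classes
        self.rng = random.Random(seed)
        in_dim, out_dim = NOISE_DIM + num_classes, N_MELS * N_FRAMES
        # Small random weights; a real generator would be trained adversarially.
        self.W = [[self.rng.gauss(0.0, 0.02) for _ in range(out_dim)]
                  for _ in range(in_dim)]

    def sample(self, label):
        z = [self.rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
        one_hot = [1.0 if k == label else 0.0 for k in range(self.num_classes)]
        h = z + one_hot  # condition on the class by concatenating a one-hot label
        flat = [math.tanh(sum(h[i] * self.W[i][j] for i in range(len(h))))
                for j in range(N_MELS * N_FRAMES)]
        # Reshape into N_MELS rows (Mel bins) x N_FRAMES columns (time frames).
        return [flat[r * N_FRAMES:(r + 1) * N_FRAMES] for r in range(N_MELS)]


def augment(specs, labels, generator):
    """Append synthetic spectrograms until every class matches the largest one."""
    specs, labels = list(specs), list(labels)
    for cls, n in samples_needed(labels).items():
        for _ in range(n):
            specs.append(generator.sample(cls))
            labels.append(cls)
    return specs, labels


# Toy usage on a hypothetical skewed split (not data from the paper).
labels = [0] * 50 + [1] * 20 + [2] * 5
specs = [[[0.0] * N_FRAMES for _ in range(N_MELS)] for _ in labels]  # placeholder "real" data
X, y = augment(specs, labels, ToyConditionalGenerator(num_classes=3))
print(len(X), sorted(Counter(y).items()))  # 150 [(0, 50), (1, 50), (2, 50)]
```

In a real pipeline the toy generator would be replaced by one trained adversarially against a label-conditioned discriminator, and the balanced set would then be used to train the classifier (a CRNN in this paper).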

Published in: 2022 International Conference on Signal and Information Processing (IConSIP)

Date of Conference: 26-27 August 2022

Date Added to IEEE Xplore: 12 January 2023

DOI: 10.1109/ICoNSIP49665.2022.10007491

Conference Location: Pune, India

I. Introduction

Automatic speech recognition (ASR) is at the heart of the ever-increasing variety of voice assistants, auto-captioning, and voice-search applications [1]. Speech classification is a subset of speech recognition: a family of tasks in which a program must automatically assign a segment of input audio to one of a set of categories, such as voice activity detection (binary or multi-class), speech command recognition (multi-class), and audio sentiment classification, among others. Speech command recognition is the process of classifying an input speech pattern into one of a fixed set of classes. It is a branch of automatic speech recognition known as keyword spotting, in which a model continuously examines incoming speech to detect particular “command” classes. When one of these commands is detected, the system can perform a specified action in response.
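Continuous monitoring of this kind is often implemented as a sliding window over incoming audio frames. The sketch below is a generic illustration, not taken from this paper: `spot_keywords`, the stand-in classifier, and the `window`, `hop`, `threshold`, and `refractory` parameters are all hypothetical. The refractory period suppresses repeated triggers while the same command is still inside successive windows.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A classifier maps a window of frames to (label or None, confidence).
# Hypothetical interface; a trained keyword model would go here.
Classifier = Callable[[Sequence[float]], Tuple[Optional[str], float]]


def spot_keywords(frames: Sequence[float], classify: Classifier, window: int,
                  hop: int, threshold: float, refractory: int) -> List[Tuple[int, str]]:
    """Return (start_frame, label) for each window whose confidence >= threshold."""
    detections: List[Tuple[int, str]] = []
    next_allowed = 0  # earliest start frame at which a new detection may fire
    for start in range(0, len(frames) - window + 1, hop):
        if start < next_allowed:
            continue  # still inside the refractory period of the last detection
        label, confidence = classify(frames[start:start + window])
        if label is not None and confidence >= threshold:
            detections.append((start, label))
            next_allowed = start + refractory
    return detections


# Toy usage: "frames" are per-frame energies, and the stand-in classifier
# reports the hypothetical command "seven" with confidence = mean energy.
def toy_classifier(w: Sequence[float]) -> Tuple[Optional[str], float]:
    return "seven", sum(w) / len(w)


frames = [0.0] * 20 + [1.0] * 20 + [0.0] * 20
print(spot_keywords(frames, toy_classifier, window=10, hop=5,
                    threshold=0.8, refractory=20))  # [(20, 'seven')]
```

Without the refractory period (`refractory=0`), the same utterance would fire at start frames 20, 25, and 30.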