Adversarial Synthesis based Data Augmentation for Speech Classification


Abstract:

Speech classification plays a vital role in modern audio processing, driven by the rise of technologies such as home assistants and speech-based control devices. Deep learning-based algorithms have played a major role in developing these technologies, but they are data-hungry and need large labelled datasets for classification. Such labelled datasets are rare in the real world, and publicly available datasets can be highly skewed across classes, which makes the classification task hard. This study addresses the problem of data augmentation for speech classification. To create examples for underrepresented classes, it proposes conditioned data augmentation using generative adversarial networks (GANs). The work adapts and improves a conditional GAN architecture to generate synthetic Mel spectrograms for the minority classes. The proposed GAN-based approach is evaluated on the Speech Commands “Zero” through “Nine” (Sc09) dataset. Results demonstrate an approximately 1.7% relative improvement in accuracy over a Convolutional Recurrent Neural Network (CRNN) baseline adapted to classify the speech commands.
Date of Conference: 26-27 August 2022
Date Added to IEEE Xplore: 12 January 2023
Conference Location: Pune, India

I. Introduction

Automatic speech recognition (ASR) is at the heart of an ever-increasing variety of voice assistants, auto-captioning, and voice-search applications [1]. Speech classification is a subset of speech recognition that covers tasks in which a program must automatically assign a segment of input audio to a category, such as voice activity detection (binary or multi-class), speech command recognition (multi-class), and audio sentiment classification, among others. Speech command recognition is the process of classifying an input speech pattern into one of a fixed set of classes. It is a branch of automatic speech recognition, also known as keyword spotting, in which a model continuously examines speech patterns to find particular “command” classes. The system can then carry out a specified action in response to each detected command.
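The keyword-spotting loop described above can be illustrated with a minimal Python sketch. It is not the paper's CRNN or its pipeline: the names `spot_keywords` and `toy_classifier`, the single-value "energy" frames, the `"background"` label, and the threshold are all hypothetical choices made for illustration. The sketch only shows the general idea of sliding a fixed-length window over feature frames and reporting the windows that a classifier assigns to a command class with enough confidence.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def spot_keywords(
    frames: Sequence[Frame],
    classify: Callable[[Sequence[Frame]], Tuple[str, float]],
    window: int,
    hop: int,
    threshold: float,
    background: str = "background",
) -> List[Tuple[int, str, float]]:
    """Slide a window over feature frames and collect command detections.

    Returns (start_frame, label, score) for every window whose predicted
    label is not the background class and whose score meets the threshold.
    """
    detections = []
    for start in range(0, len(frames) - window + 1, hop):
        label, score = classify(frames[start:start + window])
        if label != background and score >= threshold:
            detections.append((start, label, score))
    return detections


def toy_classifier(window_frames: Sequence[Frame]) -> Tuple[str, float]:
    """Hypothetical stand-in for a trained model (assumption, not a CRNN).

    Treats the first value of each frame as an energy level and labels
    high-energy windows as the command "zero".
    """
    energy = sum(frame[0] for frame in window_frames) / len(window_frames)
    if energy > 0.5:
        return "zero", energy
    return "background", 1.0 - energy
```

In a real system the frames would be spectrogram columns and `classify` would wrap a trained network; each reported detection could then trigger the action associated with that command.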

