Abstract:
Speech classification plays a vital role in modern audio processing, driven by the rise of technologies such as home assistants and speech-controlled devices. Deep learning algorithms have been central to developing these technologies, but they are data-hungry and need large labelled datasets for classification. Such labelled datasets are rare in practice, and publicly available datasets can be highly imbalanced across classes, which makes the classification task hard. This study addresses the problem of data augmentation for speech classification. To create examples for underrepresented classes, it proposes conditional data augmentation using generative adversarial networks (GANs). The work adapts and improves a conditional GAN architecture to generate synthetic Mel spectrograms for the minority classes. The proposed GAN-based approach is evaluated on the Speech Commands “Zero” through “Nine” (SC09) dataset. Results demonstrate a relative accuracy improvement of approximately 1.7% over a Convolutional Recurrent Neural Network (CRNN) baseline adapted to classify the speech commands.
Date of Conference: 26-27 August 2022
Date Added to IEEE Xplore: 12 January 2023