I. Introduction
Speech enhancement (SE) techniques have been a topic of great interest to researchers over the past decades. Over the years, a variety of traditional statistical model-based SE techniques have been extensively studied, e.g., spectral subtraction methods [1], Wiener filtering [2], and minimum mean-square error (MMSE) estimation of the spectral amplitude [3][4][5]. In recent years, deep learning research has impacted a wide range of speech enhancement work in both traditional and new contexts, e.g., regression models based on deep neural networks (DNN) [6], convolutional neural network (CNN)-based speech enhancement approaches [7], and long short-term memory (LSTM) networks, which exploit their time-series modeling ability to enhance speech [8]. According to these state-of-the-art studies, given enough hidden layers, a DNN can learn a complicated transform function and approximate any mapping from input to output arbitrarily well. Increasing the number of hidden layers increases the capacity of the DNN for function approximation. However, training a DNN requires speech patterns with large variations across different noisy environments; the severe interference effects encountered in real-world conditions slow the learning process and degrade generalization to new inputs at unknown signal-to-noise ratios (SNR) [9][10]. Furthermore, even though context features were employed as input to the network, residual noise appeared in the enhanced output due to the DNN's frame-by-frame conversion of speech. Xu et al. [11] proposed a separable denoising autoencoder (SDAE), which consists of two pre-trained autoencoders that represent speech and noise separately, with magnitudes of Fourier coefficients as input and output. Huang et al. extended the SDAE with a recurrent structure as well as a discriminative term, yielding deep recurrent neural networks (DRNN) [12]. Meanwhile, Liu et al. conducted various tests to determine the SDAE's generalization power [13].
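To make the classical baseline concrete, the spectral subtraction idea cited in [1] can be sketched as follows: estimate an average noise magnitude spectrum from a noise-only segment, subtract it frame by frame from the noisy magnitude spectrum, and resynthesize with the noisy phase. This is a minimal illustration, not the exact algorithm of [1]; the over-subtraction factor `alpha`, the spectral floor `beta`, and the non-overlapping framing are simplifying assumptions made here for brevity.

```python
import numpy as np

def spectral_subtraction(noisy, noise, frame_len=256, alpha=2.0, beta=0.02):
    """Minimal magnitude spectral subtraction sketch.

    noisy: 1-D noisy signal; noise: 1-D noise-only segment used to
    estimate the average noise magnitude spectrum. Returns an enhanced
    signal (non-overlapping frames, noisy phase reused).
    """
    # Average noise magnitude spectrum from the noise-only segment.
    n_noise_frames = len(noise) // frame_len
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise[i * frame_len:(i + 1) * frame_len]))
         for i in range(n_noise_frames)], axis=0)

    out = np.zeros(len(noisy), dtype=float)
    for i in range(len(noisy) // frame_len):
        frame = noisy[i * frame_len:(i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Over-subtract the noise estimate, but floor the result at
        # beta * noise_mag so magnitudes never go negative (negative
        # values are one source of "musical noise" artifacts).
        clean_mag = np.maximum(mag - alpha * noise_mag, beta * noise_mag)
        out[i * frame_len:(i + 1) * frame_len] = np.fft.irfft(
            clean_mag * np.exp(1j * phase), n=frame_len)
    return out
```

Because the subtraction operates only on magnitudes and reuses the noisy phase, residual noise and musical artifacts remain; these weaknesses are part of what motivated the data-driven approaches discussed above.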
When the network is not exposed to certain noise types during training, it performs worse on mixtures containing those noise types. Unseen speakers and mixing weights were also found to yield poor performance. In practice, however, many more sources of variation are encountered, such as the relative contributions of the different sources, the frequency response of the microphones, and the degree of reverberation [14].