
Time-frequency convolutional networks for robust speech recognition



Abstract:

Convolutional deep neural networks (CDNNs) have consistently shown more robustness to noise and background contamination than traditional deep neural networks (DNNs). For speech recognition, CDNNs apply their convolution filters across frequency, which helps to remove cross-spectral distortions and, to some extent, speaker-level variability stemming from vocal tract length differences. Convolution across time has not been considered with much enthusiasm within the speech technology community. This work presents a modified CDNN architecture that we call the time-frequency convolutional network (TFCNN), in which two parallel layers of convolution are performed on the input feature space: convolution across time and frequency, each using a different pooling layer. The feature maps obtained from the convolution layers are then combined and fed to a fully connected DNN. Our experimental analysis on noise-, channel-, and reverberation-corrupted databases shows that TFCNNs demonstrate reduced speech recognition error rates compared to CDNNs whether using baseline mel-filterbank features or noise-robust acoustic features.
Date of Conference: 13-17 December 2015
Date Added to IEEE Xplore: 11 February 2016
Conference Location: Scottsdale, AZ, USA

1. Introduction

Deep learning techniques [1] are now integral to current automatic speech recognition (ASR) systems [2]. Deep learning has been used for feature representation [3], acoustic modeling [1], and language modeling [4]. Although results from deep neural networks (DNNs) have been consistently encouraging, current research focuses both on improving the state of the art and on deepening scientific understanding of deep learning's strengths and weaknesses. While DNNs perform highly reliably under matched conditions, they are susceptible to performance degradation under mismatched conditions [28]. Speech-signal degradations such as reverberation, noise, and channel mismatch can significantly reduce DNN recognition accuracy, revealing the vulnerability of DNNs to unseen conditions [4], [5].
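
To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch of a TFCNN-style network: two parallel convolution branches operate on the same input time-frequency feature map, one convolving across frequency and one across time, each followed by its own pooling layer; the resulting feature maps are flattened, concatenated, and fed to a fully connected DNN. All kernel widths, filter counts, pooling settings, layer sizes, and the output dimension here are illustrative assumptions, not the configuration reported in the paper.

import torch
import torch.nn as nn


class TFCNN(nn.Module):
    """Sketch of a time-frequency convolutional network (illustrative sizes)."""

    def __init__(self, n_frames=11, n_freq_bins=40, n_outputs=2000):
        super().__init__()
        # Branch 1: convolution across frequency (1 x k kernel spans only
        # the frequency axis), pooled along frequency.
        self.freq_conv = nn.Sequential(
            nn.Conv2d(1, 75, kernel_size=(1, 8)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 3)),
        )
        # Branch 2: convolution across time (k x 1 kernel spans only the
        # time axis), pooled along time.
        self.time_conv = nn.Sequential(
            nn.Conv2d(1, 75, kernel_size=(3, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
        )
        # Determine the size of the combined feature maps for this input shape.
        with torch.no_grad():
            dummy = torch.zeros(1, 1, n_frames, n_freq_bins)
            n_feat = (self.freq_conv(dummy).flatten(1).shape[1]
                      + self.time_conv(dummy).flatten(1).shape[1])
        # Combined feature maps feed a fully connected DNN.
        self.dnn = nn.Sequential(
            nn.Linear(n_feat, 1024),
            nn.ReLU(),
            nn.Linear(1024, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_outputs),
        )

    def forward(self, x):
        # x: (batch, 1, time frames, frequency bins), e.g. a mel-filterbank
        # context window.
        f = self.freq_conv(x).flatten(1)
        t = self.time_conv(x).flatten(1)
        return self.dnn(torch.cat([f, t], dim=1))


if __name__ == "__main__":
    net = TFCNN()
    feats = torch.randn(4, 1, 11, 40)   # batch of 4 hypothetical context windows
    print(net(feats).shape)             # torch.Size([4, 2000])

The two branches differ only in which axis the kernel and pooling operate over; concatenating their flattened outputs before the fully connected layers is one straightforward way to realize the "combined and fed to a fully connected DNN" step described in the abstract.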

