Abstract:
Convolutional deep neural networks (CDNNs) have consistently shown more robustness to noise and background contamination than traditional deep neural networks (DNNs). For speech recognition, CDNNs apply their convolution filters across frequency, which helps to remove cross-spectral distortions and, to some extent, speaker-level variability stemming from vocal tract length differences. Convolution across time has not been considered with much enthusiasm within the speech technology community. This work presents a modified CDNN architecture that we call the time-frequency convolutional network (TFCNN), in which two parallel layers of convolution are performed on the input feature space: convolution across time and frequency, each using a different pooling layer. The feature maps obtained from the convolution layers are then combined and fed to a fully connected DNN. Our experimental analysis on noise-, channel-, and reverberation-corrupted databases shows that TFCNNs demonstrate reduced speech recognition error rates compared to CDNNs whether using baseline mel-filterbank features or noise-robust acoustic features.
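The parallel-stream idea described above can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's implementation: the kernel lengths, pooling factors, use of a single filter per stream, and the absence of learned weights and nonlinearities are all simplifying assumptions made here for clarity.

```python
import numpy as np

def conv1d_valid(x, k):
    """Valid 1-D correlation of vector x with kernel k."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def max_pool(x, size):
    """Non-overlapping max pooling along a vector (remainder truncated)."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def tfcnn_features(spec, k_time, k_freq, pool_t=2, pool_f=2):
    """Toy TFCNN front end: two parallel convolutions on the input
    time-frequency representation -- one across time, one across
    frequency -- each with its own max-pooling layer. The two feature
    maps are then combined into a single vector that would feed the
    fully connected DNN. `spec` is (frames x mel bands)."""
    # Stream 1: convolve each frequency band across time, pool in time
    t_maps = np.array([max_pool(conv1d_valid(spec[:, f], k_time), pool_t)
                       for f in range(spec.shape[1])])
    # Stream 2: convolve each frame across frequency, pool in frequency
    f_maps = np.array([max_pool(conv1d_valid(spec[t, :], k_freq), pool_f)
                       for t in range(spec.shape[0])])
    # Combine both feature maps for the fully connected layers
    return np.concatenate([t_maps.ravel(), f_maps.ravel()])

rng = np.random.default_rng(0)
spec = rng.random((20, 40))            # 20 frames x 40 mel filterbank bands
feats = tfcnn_features(spec,
                       k_time=np.ones(5) / 5,   # assumed 5-frame time kernel
                       k_freq=np.ones(9) / 9)   # assumed 9-band freq kernel
```

With these assumed sizes, the time stream yields 40 bands x 8 pooled outputs and the frequency stream 20 frames x 16 pooled outputs, giving a 640-dimensional combined feature vector; a real TFCNN would use many learned filters per stream plus a nonlinearity before the DNN.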
Date of Conference: 13-17 December 2015
Date Added to IEEE Xplore: 11 February 2016