Spectral Distortion Model for Training Phase-Sensitive Deep-Neural Networks for Far-Field Speech Recognition

Notes: This article was originally incorrectly tagged as not presented at the conference. It is now included as part of the conference record.

Abstract:

In this paper, we present an algorithm that introduces phase perturbation into the training database when training phase-sensitive deep neural network models. Traditional features such as log-mel or cepstral features do not carry any phase-relevant information, whereas features such as raw waveforms or complex spectra do. Phase-sensitive features have the advantage of being able to detect differences in time of arrival across different microphone channels or frequency bands. However, compared to magnitude-based features, phase information is more sensitive to various kinds of distortion, such as variations in microphone characteristics and reverberation. For traditional magnitude-based features, it is widely known that adding noise or reverberation to the training data, often called Multistyle-TRaining (MTR), improves robustness. In a similar spirit, we propose an algorithm that introduces spectral distortion to make deep-learning models more robust to phase distortion. We call this approach Spectral-Distortion TRaining (SDTR). In our experiments using a training set of 22 million utterances, with and without MTR, this approach yields relative Word Error Rate (WER) reductions of 3.2% and 8.48%, respectively, on test sets recorded on Google Home.
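The contrast between magnitude-based and phase-sensitive features can be illustrated with a minimal NumPy sketch. It is not the SDTR distortion model, whose details are not given in this abstract. The function name `perturb_phase`, the uniform phase distribution, and the 0.3 rad bound are all assumptions chosen for illustration. The sketch shows that a time delay between two channels leaves the magnitude spectrum unchanged but appears as a linear phase shift, and that a random phase perturbation alters the complex spectrum while the magnitude stays the same.

```python
import numpy as np

def perturb_phase(spectrum, max_phase_rad, rng):
    """Hypothetical helper: rotate each complex bin by a random angle.

    The angle is drawn uniformly from [-max_phase_rad, max_phase_rad].
    This is an illustrative assumption, not the paper's distortion model.
    """
    theta = rng.uniform(-max_phase_rad, max_phase_rad, size=spectrum.shape)
    return spectrum * np.exp(1j * theta)

rng = np.random.default_rng(0)
n = 512      # frame length in samples (assumed)
delay = 3    # inter-channel delay in samples (assumed)

# Two "microphone channels": the second is a circularly delayed copy.
x = rng.standard_normal(n)
x_delayed = np.roll(x, delay)

X0 = np.fft.rfft(x)
X1 = np.fft.rfft(x_delayed)

# Magnitude-based features cannot see the delay.
mag_equal = np.allclose(np.abs(X0), np.abs(X1))

# The phase difference is linear in frequency: -2*pi*k*delay/n (wrapped).
k = np.arange(1, n // 2)  # skip the DC and Nyquist bins
phase_diff = np.angle(X1[k] * np.conj(X0[k]))
expected_rotation = np.exp(-2j * np.pi * k * delay / n)

# A random phase perturbation changes the complex spectrum only.
X_pert = perturb_phase(X0, max_phase_rad=0.3, rng=rng)

print("magnitudes equal across channels:", mag_equal)
print("magnitude preserved by perturbation:",
      np.allclose(np.abs(X_pert), np.abs(X0)))
```

A model fed only `np.abs(X0)` would be unaffected by such a perturbation, which is consistent with the abstract's point that robustness to phase distortion matters specifically for phase-sensitive inputs.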

Published in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 15-20 April 2018

Date Added to IEEE Xplore: 07 October 2018

ISBN Information:

Electronic ISSN: 2379-190X

DOI: 10.1109/ICASSP.2018.8462223

Conference Location: Calgary, AB, Canada

1. Introduction

Following the breakthrough of deep learning technology [1]–[6], speech recognition accuracy has improved dramatically. Recently, speech recognition systems have begun to be deployed not only in smartphones and Personal Computers (PCs) but also in standalone devices in far-field environments. Examples include voice assistant systems such as Amazon Alexa and Google Home [7], [8]. In far-field speech recognition, the impact of noise and reverberation is much larger than in near-field cases. Traditional approaches to far-field speech recognition include noise-robust feature extraction algorithms [9], [10], onset enhancement algorithms [11], [12], and multi-microphone approaches [13]–[17].
