1. Introduction
Speech enhancement aims to recover clean speech from noisy speech [1]. It is an essential branch of speech signal processing and has been widely studied over the past few decades. It is used in hearing aids, voice recorders, and smart speakers, as well as in the front end of tasks such as speech recognition [2] and speaker recognition [3]. In recent years, a large number of deep-learning-based speech enhancement methods have been proposed [4]–[8], showing stronger robustness than traditional signal-processing-based methods.

These methods can generally be divided into time-domain methods and frequency-domain methods. Time-domain methods [9], [10] use a neural network to map the noisy speech waveform directly to the clean speech waveform and usually require no preprocessing. Frequency-domain methods generally use the short-time Fourier transform (STFT) to convert the noisy speech from the time domain to the frequency domain, and then use a neural network to map the magnitude spectrum of the noisy speech to a mask [11] or to the magnitude spectrum of the clean speech [5]. Compared with the raw, unstructured time-domain samples, the magnitude spectrum has a clearer geometric structure, which makes it easier to compute losses and analyze frequency components. As the SNR of the noisy speech decreases, a correct phase becomes increasingly important for speech intelligibility and quality [12]. However, since mapping the phase spectrum is difficult (it has no obvious geometric structure), time-domain methods, which avoid explicit phase estimation, are also widely used.
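To make the frequency-domain pipeline described above concrete, the following minimal sketch applies an STFT, estimates a magnitude mask, and resynthesizes the waveform with the unmodified noisy phase. The estimate_mask function and the parameter values (16 kHz sampling rate, 512-point frames) are illustrative assumptions standing in for a trained network; this is not the method proposed in this paper.

```python
import numpy as np
from scipy.signal import stft, istft

def estimate_mask(noisy_mag):
    # Placeholder for a trained network that maps the noisy magnitude
    # spectrum to a mask in [0, 1] (e.g., an ideal-ratio-mask estimate).
    return np.clip(noisy_mag / (noisy_mag + 1.0), 0.0, 1.0)

def enhance(noisy, fs=16000, n_fft=512, hop=256):
    # STFT: time domain -> complex time-frequency representation.
    _, _, spec = stft(noisy, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    mag, phase = np.abs(spec), np.angle(spec)

    # Apply the estimated mask to the noisy magnitude; the noisy phase is
    # reused, the usual simplification in magnitude-only methods.
    enhanced_mag = estimate_mask(mag) * mag

    # Inverse STFT: back to the time domain with the noisy phase.
    _, enhanced = istft(enhanced_mag * np.exp(1j * phase), fs=fs,
                        nperseg=n_fft, noverlap=n_fft - hop)
    return enhanced
```

Reusing the noisy phase is exactly the simplification that the last paragraph above refers to: it keeps the mapping problem restricted to the well-structured magnitude spectrum, at the cost of phase errors that matter more as the SNR drops.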