
Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network


Abstract:

Voice activity detection (VAD) based on deep learning has achieved remarkable success. However, when traditional features (e.g., raw waveforms and MFCCs) are fed directly to a deep neural network, performance degrades because of noise interference. Here, we propose a robust VAD approach using a masked auditory encoder based convolutional neural network (M-AECNN). First, we analyze the effectiveness of auditory features as a deep learning encoder. These features roughly simulate the transmission of sound to the hair cells of the human inner ear, and are therefore more robust than raw-waveform and frequency-domain features used as encoders. Second, analogous to the human ear's masking effect across speech frequencies, the proposed auditory encoder further improves the robustness of VAD by increasing the gain for cleaner speech frequencies. Extensive experimental results demonstrate that this approach achieves an absolute improvement of about 10.5% in the area under the curve (AUC) on the AURORA-2J dataset compared with a VAD method based on a CNN and MFCCs.
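The abstract's two ideas, an auditory (gammatone-like) filterbank front end and a masking gain that boosts high-SNR bands, can be illustrated with a minimal sketch. This is not the paper's M-AECNN: the Gaussian ERB-scale filterbank, the percentile noise-floor estimate, the Wiener-style mask, and the simple energy-based frame score are all illustrative assumptions standing in for the learned encoder and CNN classifier.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Hz) at center frequency f."""
    return 24.7 * (4.37e-3 * f + 1.0)

def auditory_filterbank(n_bins, sr, n_bands=24, fmin=60.0, fmax=3800.0):
    """Gaussian approximation of a gammatone filterbank on the ERB-rate
    scale. An illustrative stand-in, not the paper's exact encoder."""
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    erb_rate = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    inv_erb_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    cfs = inv_erb_rate(np.linspace(erb_rate(fmin), erb_rate(fmax), n_bands))
    fb = np.exp(-0.5 * ((freqs[None, :] - cfs[:, None]) / erb(cfs)[:, None]) ** 2)
    return fb / fb.sum(axis=1, keepdims=True)  # shape (n_bands, n_bins)

def masked_auditory_features(x, sr, frame_len=400, hop=160, n_bands=24):
    """Frame the signal, project frame power spectra onto the auditory
    filterbank, then apply a per-band masking gain that emphasises bands
    with higher estimated SNR (the 'cleaner speech frequencies')."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(frame_len)[None, :]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (T, F)
    fb = auditory_filterbank(power.shape[1], sr, n_bands)
    bands = power @ fb.T + 1e-12                              # (T, n_bands)
    noise = np.percentile(bands, 10, axis=0, keepdims=True)   # crude noise floor
    snr = np.maximum(bands / noise - 1.0, 0.0)
    gain = snr / (1.0 + snr)                                  # Wiener-style mask
    return np.log(bands * gain + 1e-12)                       # masked log energies

def vad_scores(x, sr):
    """Per-frame speech score in [0, 1]: mean masked log-energy, min-max
    normalised. The paper instead feeds such features to a CNN."""
    feats = masked_auditory_features(x, sr)
    score = feats.mean(axis=1)
    return (score - score.min()) / (score.max() - score.min() + 1e-12)
```

As a sanity check, a tonal segment embedded in stationary noise should receive clearly higher frame scores than the noise-only segment, since the masking gain suppresses bands sitting at the estimated noise floor.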
Date of Conference: 06-11 June 2021
Date Added to IEEE Xplore: 13 May 2021
Conference Location: Toronto, ON, Canada

