Exploiting Periodicity Features for Joint Detection and DOA Estimation of Speech Sources Using Convolutional Neural Networks | IEEE Conference Publication | IEEE Xplore

Exploiting Periodicity Features for Joint Detection and DOA Estimation of Speech Sources Using Convolutional Neural Networks


Abstract:

While many algorithms deal with direction of arrival (DOA) estimation and voice activity detection (VAD) as two separate tasks, only a small number of data-driven methods...Show More

Abstract:

While many algorithms deal with direction of arrival (DOA) estimation and voice activity detection (VAD) as two separate tasks, only a small number of data-driven methods have addressed these two tasks jointly. In this paper, a multi-input single-output convolutional neural network (CNN) is proposed which exploits a novel feature combination for joint DOA estimation and VAD in the context of binaural hearing aids. In addition to the well-known generalized cross correlation with phase transform (GCC-PHAT) feature, the network uses an auditory-inspired feature called periodicity degree (PD), which provides a broadband representation of the periodic structure of the signal. The proposed CNN has been trained in a multi-conditional training scheme across different signal-to-noise ratios. Experimental results for a single-talker scenario in reverberant environments show that by exploiting the PD feature, the proposed CNN is able to distinguish speech from non-speech signal blocks, thereby outperforming the baseline CNN in terms of DOA estimation accuracy. In addition, the results show that the proposed method is able to adapt to different unseen acoustic conditions and background noises.
Date of Conference: 04-08 May 2020
Date Added to IEEE Xplore: 09 April 2020
ISBN Information:

ISSN Information:

Conference Location: Barcelona, Spain

1. INTRODUCTION

In many applications, such as assistive listening devices, hands-free speech communication systems and robot audition, reliable DOA estimates of sound sources are required [1]–[4]. While the human auditory system has the remarkable ability to localize a speech source in noisy and reverberant environments, this remains a challenging task for machine listening systems such as hearing aids (HAs) [5].

References

References is not available for this document.