A Speech Enhancement System for Automotive Speech Recognition with a Hybrid Voice Activity Detection Method

Abstract:

This paper presents a front-end speech enhancement approach to robust speech recognition in automotive environments. It combines hybrid voice activity detection (VAD), relative transfer function (RTF) based generalized sidelobe cancellation, and single-channel post-filtering to enhance the speech signal of interest, thereby improving the robustness of speech recognition. First, we record training data in four typical driving scenarios that cover most of the noise types found in automobiles. The recorded data is then used to train deep neural network (DNN) models for both speech and noise. The trained DNNs are subsequently used to estimate the speech presence probability on a frame-by-frame basis. This speech presence probability is then combined with the output of an energy-based VAD to form a hybrid VAD, which serves as the basis for the remaining components of the speech enhancement system, including RTF estimation, adaptive beamforming, and post-filtering. Experiments are conducted in real automotive environments. The results show that the developed method can significantly improve the performance of both VAD and automatic speech recognition (ASR).
Date of Conference: 17-20 September 2018
Date Added to IEEE Xplore: 04 November 2018
Conference Location: Tokyo, Japan

1. Introduction

Speech interaction based on automatic speech recognition (ASR) has become increasingly popular in automotive systems in recent years because it can improve driving safety by enabling hands-free operation. However, noise in automotive environments can dramatically degrade ASR performance, so speech enhancement is needed in such applications, and it has attracted a significant amount of attention over the past decade [1]–[9]. Many methods have been developed [10]–[15], which have achieved a certain degree of success in either enhancing the quality of hands-free voice communication or improving ASR performance for human-machine interaction. Nevertheless, dealing with noise in automotive environments remains a challenging problem. This paper studies this problem and presents a speech enhancement approach to robust ASR in automotive environments based on two-microphone beamforming and postfiltering techniques.
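
To make the hybrid VAD idea from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a per-frame speech presence probability is already available (here the hypothetical argument speech_prob stands in for the DNN output), that the energy-based VAD compares each frame against a noise floor taken as the minimum frame energy, and that the two decisions are fused with a simple AND rule. The fusion rule, the thresholds, and all function names are illustrative assumptions; the source does not specify them.

```python
import math


def frame_energy_db(frame):
    """Mean power of one frame in dB (a small floor avoids log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def energy_vad(frames, margin_db=6.0):
    """Energy-based VAD: flag frames that exceed the noise floor by margin_db.

    Assumption: the noise floor is the minimum frame energy in the input.
    """
    if not frames:
        return []
    energies = [frame_energy_db(f) for f in frames]
    noise_floor = min(energies)
    return [e > noise_floor + margin_db for e in energies]


def hybrid_vad(frames, speech_prob, prob_threshold=0.5, margin_db=6.0):
    """Fuse the energy-based decision with a speech presence probability.

    Assumption: a frame counts as speech only if both detectors agree
    (AND rule); speech_prob is a stand-in for the DNN-based estimate.
    """
    if len(frames) != len(speech_prob):
        raise ValueError("frames and speech_prob must have the same length")
    energy_flags = energy_vad(frames, margin_db)
    return [e and p > prob_threshold
            for e, p in zip(energy_flags, speech_prob)]


if __name__ == "__main__":
    quiet = [0.01, -0.01, 0.01, -0.01]
    loud = [0.5, -0.5, 0.5, -0.5]
    frames = [quiet, loud, loud, quiet]
    speech_prob = [0.1, 0.9, 0.2, 0.8]  # made-up probabilities
    print(hybrid_vad(frames, speech_prob))
```

In this toy input the third frame is loud but has a low speech probability, as a burst of road or engine noise might, so the AND rule rejects it even though the energy detector alone would accept it.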
