I. Introduction
The problem we consider in this paper involves audio recordings of the dialogue between a voice assistant (such as Alexa on an Echo Dot) and a “user”, collected in controlled lab experiments.
The user's voice is synthesized; no recording of an actual conversation between a customer and a voice assistant is involved.
The first few seconds of each dialogue are recorded. Fig. 1 shows the lab setup, and Fig. 2 shows the waveforms of two example audio recordings. In our use case, such a dialogue usually follows this pattern: wake word (user), question/request (user), acknowledgement of receiving the question (voice assistant), actual answer/content (voice assistant). For example, the first plot in Fig. 2 corresponds to this dialogue:

User: Alexa, add bananas to my shopping list.
Alexa: You already have bananas on your shopping list.
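The four-part turn structure above can be sketched as a simple data model. This is only an illustrative sketch, not code from our system; the names `Turn`, `follows_pattern`, and the placeholder acknowledgement text are assumptions introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str  # "user" or "assistant"
    role: str     # one of the four roles in the usual pattern
    text: str

# The usual dialogue pattern described above, in order.
EXPECTED_ROLES = ["wake_word", "request", "acknowledgement", "answer"]

def follows_pattern(dialogue):
    """Return True if the dialogue's turns match the usual four-part pattern."""
    return [t.role for t in dialogue] == EXPECTED_ROLES

# The example dialogue from Fig. 2, segmented into turns
# (the acknowledgement is often non-verbal, e.g. an earcon or pause).
example = [
    Turn("user", "wake_word", "Alexa,"),
    Turn("user", "request", "add bananas to my shopping list."),
    Turn("assistant", "acknowledgement", "(earcon / brief pause)"),
    Turn("assistant", "answer", "You already have bananas on your shopping list."),
]

print(follows_pattern(example))  # True
```

In practice a recording may truncate or reorder these turns, which is why checking for this pattern is a useful first-pass sanity check on each clip.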