Brouhaha: Multi-Task Training for Voice Activity Detection, Speech-to-Noise Ratio, and C50 Room Acoustics Estimation

Abstract:

Most automatic speech processing systems register degraded performance when applied to noisy or reverberant speech. But how can one tell whether speech is noisy or reverberant? We propose Brouhaha, a neural network jointly trained to extract speech/non-speech segments, speech-to-noise ratios, and C50 room acoustics from single-channel recordings. Brouhaha is trained using a data-driven approach in which noisy and reverberant audio segments are synthesized. We first evaluate its performance and demonstrate that the proposed multi-task regime is beneficial. We then present two scenarios illustrating how Brouhaha can be used on naturally noisy and reverberant data: 1) to investigate the errors made by a speaker diarization model (pyannote.audio); and 2) to assess the reliability of an automatic speech recognition model (Whisper from OpenAI). Both our pipeline and a pretrained model are open source and shared with the speech community.

Published in: 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Date of Conference: 16-20 December 2023

Date Added to IEEE Xplore: 19 January 2024

DOI: 10.1109/ASRU57964.2023.10389718

Conference Location: Taipei, Taiwan

1. Introduction

Robustness to degraded acoustic environments is a critical factor limiting the impact and adoption of speech technologies. Numerous sources of variation in the audio can degrade or mask the signal of interest and reduce the performance of automatic speech processing systems. Whether the task is automatic speech recognition (ASR) [1, 2, 3], speaker identification/diarization [4, 5], or speaker localization [6], most systems lose performance when applied in noisy or reverberant conditions.
