Augmented Transformer for Speech Detection in Adverse Acoustical Conditions


Abstract:

In this work, we present a study of speech signal detection in adverse acoustic conditions. We prepared a dedicated dataset in which fragments of monologues and dialogues were mixed with various background noises at five SNR levels. We then used a vision transformer adapted to audio signals to determine the speech regions in an audio signal. To cope with the adverse acoustic conditions, we added an augmentation module with low-pass and band-pass filters as an extra head in the transformer. As the conducted experiments show, the proposed AugViT architecture improves speech detection accuracy compared with the baseline ViT transformer.
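The abstract outlines two concrete processing steps: mixing speech recordings with background noise at five SNR levels to build the dataset, and applying low-pass or band-pass filters as an augmentation in the transformer. Below is a minimal waveform-level sketch of both steps; the helper names, filter order, cut-off frequencies, and SNR values are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.signal import butter, sosfiltfilt

def mix_at_snr(speech, noise, snr_db):
    """Mix a speech segment with background noise at a target SNR in dB."""
    # Tile or trim the noise so it covers the whole speech segment.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]
    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def filter_augment(signal, fs=16000, kind="bandpass"):
    """Low-pass or band-pass filtering as a waveform-level augmentation.

    The filter order and cut-off frequencies are assumed values; the paper
    does not specify them in the abstract.
    """
    if kind == "lowpass":
        sos = butter(4, 3400, btype="lowpass", fs=fs, output="sos")
    else:
        sos = butter(4, [300, 3400], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Example: build mixtures at five illustrative SNR levels, then filter one.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # stand-in for a 1 s speech clip at 16 kHz
noise = rng.standard_normal(8000)    # stand-in for a background-noise recording
mixtures = {snr: mix_at_snr(speech, noise, snr) for snr in (-5, 0, 5, 10, 15)}
augmented = filter_augment(mixtures[0], kind="lowpass")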
Date of Conference: 20-22 September 2023
Date Added to IEEE Xplore: 10 October 2023
Print on Demand (PoD) ISBN: 979-8-3503-0498-5

Conference Location: Poznan, Poland

I. Introduction

Voice activity detection (VAD) is essential to speech enhancement, coding, and recognition tasks. Its quality directly influences the effectiveness of these systems, since it selects the parts of the input audio signal that contain speech. The task has been applied in many speech-based systems for years, especially in speaker diarisation, speech transmission, voice interaction systems, and automatic speech recognition (ASR). Many VAD systems have been developed because speech detection significantly impacts the quality and efficiency of voice-based tasks. In the initial phase, research focused on finding attributes of the speech signal that unambiguously indicate its presence in the acoustic stream. Later, some systems [1]–[3] introduced a noise reduction stage to reconstruct the speech signal, and research turned to attributes of the speech signal that are more robust to changing acquisition conditions, including the type and intensity of other sound sources. Existing VADs use various methods to determine speech regions in the audio signal: they may rely on statistical techniques or on machine learning with a supervised or unsupervised approach. Currently deployed VAD systems exploit the paradigm of deep neural networks.
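Classical detectors of the attribute-based kind mentioned above often reduce to a frame-level decision on a hand-crafted feature such as short-time energy. As a point of reference, the sketch below shows a minimal energy-threshold VAD; it is a generic baseline, not the transformer-based detector proposed in this paper, and the frame length, hop size, and threshold are assumed values.

import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Frame-level energy-threshold VAD (a classical baseline).

    Marks a frame as speech when its log-energy, relative to the loudest
    frame, exceeds threshold_db. All parameters are illustrative.
    """
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = np.mean(frame ** 2) + 1e-12
    log_energy = 10 * np.log10(energies / energies.max())
    return log_energy > threshold_db  # True where the frame likely holds speech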

References

References are not available for this document.