Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech | IEEE Conference Publication | IEEE Xplore

Abstract:

In this work, we propose a method for recording device classification using the recorded speech signal. With the rapid increase in the number of mobile and professional recording devices, determining the source device has many applications in forensics and in improving various speech-based applications. We propose dual and single attention pooling-based convolutional neural networks (CNNs) for recording device classification using neutral and whispered speech. Experiments using simultaneous direct recordings on five recording devices from 88 speakers, each speaking in both neutral and whispered modes, and simultaneous playback recordings from 21 mobile devices reveal that the proposed dual attention pooling-based CNN performs better than the best baseline scheme. We show that recording device classification performs better with whispered speech recordings than with the corresponding neutral speech. We also demonstrate the importance of voiced/unvoiced speech and of different frequency bands in classifying the recording devices.
Date of Conference: 23-27 May 2022
Date Added to IEEE Xplore: 27 April 2022
Conference Location: Singapore, Singapore

1. INTRODUCTION

One crucial piece of information carried by a recorded speech signal is the identity of the device used to record it [1]. With the rapid increase in the number of mobile and recording devices, identifying the recording device from a speech signal has various applications in digital forensics and speech-enabled systems [2]. For example, establishing the source of audio or multimedia evidence in a court of law increases the authenticity of that evidence [2], [3]. This matters because present-day digital audio technology makes it possible to process, manipulate and edit audio with sophisticated tools and software without leaving any perceptible trace [4]. Furthermore, knowledge of recording device characteristics has the potential to help other critical speech applications, such as speech recognition and speaker verification, by normalizing for recording-device variability.
