Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech

Abstract:

In this work, we propose a method for recording device classification using the recorded speech signal. With the rapid increase in the number of mobile and professional recording devices, determining the source device has many applications in forensics and in improving various speech-based applications. This paper proposes dual and single attention pooling-based convolutional neural networks (CNNs) for recording device classification using neutral and whispered speech. We run experiments on two sets of data: simultaneous direct recordings from five recording devices, with 88 speakers speaking in both neutral and whispered modes, and simultaneous playback recordings from 21 mobile devices. The results show that the proposed dual attention pooling-based CNN performs better than the best baseline scheme. We also show that recording device classification performs better with whispered speech recordings than with the corresponding neutral speech. Finally, we demonstrate the importance of voiced/unvoiced speech and of different frequency bands in classifying the recording devices.
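
The abstract names attention pooling but does not describe its form. As a rough illustration only, the sketch below shows one common way to pool per-frame features with a single attention layer: each frame gets a scalar score, the scores are normalized with a softmax, and the frames are averaged with those weights. The function names, the scoring vector `w`, and the toy features are all assumptions made for this example. It is not the authors' architecture, and the dual variant is not specified in the text available here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, w):
    """Pool T frame-level feature vectors into one vector.

    frames: list of T feature vectors of length D (assumed to be
            CNN outputs, one per time frame).
    w:      hypothetical learned scoring vector of length D.
    Returns the pooled vector and the attention weights.
    """
    # One scalar relevance score per frame (dot product with w).
    scores = [sum(f_d * w_d for f_d, w_d in zip(f, w)) for f in frames]
    # Normalize scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    # Weighted average of the frames, computed per feature dimension.
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy input: 3 frames with 2 features each (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(frames, w=[0.0, 0.0])
```

With an all-zero scoring vector every frame receives the same weight, so the result reduces to plain average pooling. A trained `w` would instead emphasize the frames that are most informative for the classifier.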

Published in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 23-27 May 2022

Date Added to IEEE Xplore: 27 April 2022

DOI: 10.1109/ICASSP43922.2022.9747700

Conference Location: Singapore, Singapore

1. INTRODUCTION

One crucial piece of information carried by a recorded speech signal is information about the device that was used to record it [1]. With the rapid increase in the number of mobile and recording devices, identifying the recording device from a speech signal has applications in digital forensics and in speech-enabled systems [2]. For example, establishing the source of audio or multimedia evidence in a court of law increases the authenticity of the evidence [2], [3]. This matters because present-day digital audio technology makes it possible to process, manipulate, and edit audio with sophisticated tools and software without leaving any perceptible trace [4]. Furthermore, knowledge of recording device characteristics could help other critical speech applications, such as speech recognition and speaker verification, by normalizing the variability across recording devices.
