Zero resource anti-spoofing detection for unit selection based synthetic speech using image spectrogram artifacts | IEEE Conference Publication | IEEE Xplore

Abstract:

Synthesized speech poses a serious threat to speaker verification systems, a threat aggravated by speech synthesis systems becoming more freely available and easily adaptable to a target speaker. This has motivated research into synthetic speech detection as a countermeasure. Although current algorithms are effective at detecting HMM-based speech synthesizers, unit selection based speech synthesizers remain a serious threat because they can generate spoofed speech that easily overcomes existing detectors; current error rates for their detection are much higher than those obtained for other spoofing methods. This paper proposes a detection algorithm to counter unit selection based synthetic speech. It requires no training and exploits the presence of artifacts in the image spectrogram to perform detection. To the best of our knowledge, this is the first attempt to target unit selection based synthetic speech specifically. Experimental results show the effectiveness of the proposed approach.
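This section does not describe the detector itself. As a rough illustration of what a training-free check on a spectrogram "image" can look like, the following is a minimal sketch under stated assumptions, not the authors' algorithm. It computes a log-magnitude spectrogram and scores it with a hypothetical `discontinuity_score`, the largest average dB jump between adjacent frames, on the assumption that joins between concatenated units may leave abrupt spectral changes. The function names, frame length, hop, and 80 dB dynamic range are all illustrative choices.

```python
import numpy as np


def log_spectrogram(x, frame_len=512, hop=128, dynamic_range_db=80.0):
    """Return a log-magnitude STFT 'image' (rows = frequency bins, columns = frames).

    Values below (max - dynamic_range_db) are clipped so that near-silent bins
    do not dominate frame-to-frame differences.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    spec = 20.0 * np.log10(mag.T + 1e-12)      # dB, (freq_bins, n_frames)
    return np.maximum(spec, spec.max() - dynamic_range_db)


def discontinuity_score(spec):
    """Hypothetical artifact score: largest mean absolute dB change between
    adjacent spectrogram columns. Higher means a more abrupt spectral jump."""
    col_change = np.abs(np.diff(spec, axis=1)).mean(axis=0)
    return float(col_change.max())


# Toy signals at 16 kHz: a steady 440 Hz tone, and the same tone abruptly
# spliced onto a 1200 Hz tone halfway through (a crude stand-in for a unit join).
sr = 16000
t = np.arange(sr) / sr
smooth = np.sin(2 * np.pi * 440 * t)
spliced = np.concatenate([np.sin(2 * np.pi * 440 * t[: sr // 2]),
                          np.sin(2 * np.pi * 1200 * t[: sr // 2])])

smooth_score = discontinuity_score(log_spectrogram(smooth))
spliced_score = discontinuity_score(log_spectrogram(spliced))
print(f"smooth: {smooth_score:.3f} dB, spliced: {spliced_score:.3f} dB")
```

Because the score is built from differences of dB values and the floor is set relative to the spectrogram's maximum, rescaling the waveform leaves it essentially unchanged. A practical detector would still need a decision threshold, which this sketch does not attempt to choose.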
Date of Conference: 13-16 December 2016
Date Added to IEEE Xplore: 19 January 2017
Conference Location: Jeju, Korea (South)

I. Introduction

A spoofing attack refers to an attempt to mimic a target speaker in order to fool a speaker verification (SV) system. Different spoofing techniques are available, such as voice mimicry, playback, voice conversion and speech synthesis [1]. With recent advances in voice conversion and speech synthesis technologies, open-source toolkits that facilitate voice spoofing have become more prevalent [2], posing serious threats to automatic SV systems. Furthermore, state-of-the-art HMM-based speech synthesizers now require only a few minutes of a speaker's data to perform model adaptation [3], making spoofing techniques readily accessible. As reported in [1], [2], [4]–[6], synthetic speech greatly compromises the accuracy of SV systems. The false acceptance rate can be as high as 85.5% when a GMM-UBM based SV system is tested on synthetic speech obtained from an HMM-based speech synthesizer trained on the Wall Street Journal corpus [4], and as high as 98.08% [2] for a corpus synthesized with the unit selection based MARY Text-to-Speech Synthesis (MaryTTS) system [7].
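The false acceptance rates quoted above measure the fraction of spoofed trials that an SV system accepts at its decision threshold. A minimal sketch of that computation follows; the scores, the threshold, and the helper name `false_acceptance_rate` are hypothetical, not data or code from [2] or [4].

```python
def false_acceptance_rate(spoof_scores, threshold):
    """Fraction of spoofed trials whose SV score reaches the acceptance threshold."""
    if not spoof_scores:
        raise ValueError("need at least one spoofed trial")
    accepted = sum(1 for s in spoof_scores if s >= threshold)
    return accepted / len(spoof_scores)


# Hypothetical verification scores for eight synthetic-speech trials,
# each scored against the claimed target speaker's model.
scores = [2.1, 0.4, 1.7, -0.3, 3.2, 1.1, 0.9, -1.0]
far = false_acceptance_rate(scores, threshold=0.5)
print(f"FAR = {far:.1%}")  # FAR = 62.5%
```

Whether a score exactly equal to the threshold counts as an acceptance is a convention; the sketch assumes it does.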
