
Study of statistical robust closed set speaker identification with feature and score-based fusion



Abstract:

In this paper, the statistical combination of Power Normalization Cepstral Coefficient (PNCC) and Mel Frequency Cepstral Coefficient (MFCC) features in robust closed set speaker identification is studied. Feature normalization and warping together with late score-based fusion are also exploited to improve performance in the presence of channel and noise effects. In addition, combinations of score and feature-based approaches are considered with early and/or late fusion; these systems use different feature dimensions (16, 32). A 4th order G.712 type IIR filter is employed to represent handset degradation in the channel. Simulation studies based on the TIMIT database confirm the improvement in Speaker Identification Accuracy (SIA) through the combination of PNCC and MFCC features in the presence of handset and Additive White Gaussian Noise (AWGN) effects.
Date of Conference: 26-29 June 2016
Date Added to IEEE Xplore: 25 August 2016
Conference Location: Palma de Mallorca, Spain

1. Introduction

Handset effects and the mismatch between training and testing caused by AWGN are two of the most important challenges in speaker identification. Reverberation is another: to achieve Robust Speaker Identification (RSI), Zhao et al. [1] suggested using binary masking with a deep neural network. Alternatively, denoising algorithms can be used to reduce system complexity. According to [2], RSI in noisy environments is accomplished with Cochlear Filter Cepstral Coefficients (CFCCs). Unlike previous work, greater system enhancement can be obtained by exploiting fusion between various types of features. Several researchers have focused only on mismatched noise conditions [3], [4] in order to improve speaker recognition in both verification and identification tasks. Togneri and Pullella presented an overview paper [5] addressing two major issues: the accuracy and robustness of speaker identification. However, that work used only limited populations, and modern strategies such as fusion are missing. In general, several researchers in the speaker identification field have concentrated on the front-end (feature extraction) in the presence of AWGN, such as [6], [8], to improve system performance. On the other hand, [9] focused on the back-end (classifier) under different types of noise to improve RSI. The main drawback of these papers is the limited number of samples used. Although the authors in [10] employed fusion strategies to improve the robustness of the speaker identification system, noise and channel effects were not investigated extensively. This study provides new investigations by combining two main features: PNCC features, which provide noise robustness in speaker identification [11], [13], and MFCC features, which typically perform better on clean speech.
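The mismatched AWGN test condition described above can be sketched as follows. The paper does not give code, so this is a minimal illustration of corrupting a clean signal at a chosen Signal-to-Noise Ratio; the function name, the example tone, and the fixed seed are assumptions for demonstration only.

```python
import math
import random

def add_awgn(signal, snr_db, seed=0):
    """Corrupt a clean signal with Additive White Gaussian Noise at a
    target SNR in dB, creating a mismatched (noisy) test condition."""
    rng = random.Random(seed)
    power = sum(s * s for s in signal) / len(signal)   # average signal power
    noise_power = power / (10 ** (snr_db / 10))        # from SNR = 10*log10(Ps/Pn)
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

# Example: one second of a 440 Hz tone at 16 kHz, corrupted at 10 dB SNR
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noisy = add_awgn(clean, snr_db=10)
```

In an evaluation like the one in this paper, the models would be trained on clean speech and tested on such degraded versions at several SNR levels.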
In addition, two feature compensation methods, Feature Warping (FW) and Cepstral Mean and Variance Normalization (CMVN), discussed in [5], [14], are applied to reduce noise and handset channel effects and to alleviate linear/non-linear channel distortions. Furthermore, fusion techniques that depend on the feature dimension are exploited: early feature fusion (32-dimensional features), late score fusion (16-dimensional features) and, finally, a combination of feature-based early fusion and score-based late fusion (32-dimensional features). The major contribution of this work is a thorough evaluation of the scheme first proposed in our previous work [15], conducted with more sophisticated fusion schemes in the presence of handset and AWGN effects.
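Two of the building blocks above, CMVN and late score fusion, can be sketched in a few lines. This is an illustrative sketch only: the function names, the equal 0.5 fusion weight, and the toy scores are assumptions, not the configuration used in the paper.

```python
import math

def cmvn(features):
    """Cepstral Mean and Variance Normalization: map each feature
    dimension to zero mean and unit variance over the utterance,
    which removes stationary (e.g. handset channel) offsets."""
    n, dims = len(features), len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n) or 1.0
            for d in range(dims)]
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in features]

def fuse_scores(scores_a, scores_b, w=0.5):
    """Late score-level fusion: per-speaker weighted sum of the scores of
    two subsystems (e.g. MFCC- and PNCC-based); the identified speaker
    is the one with the highest fused score."""
    fused = {spk: w * scores_a[spk] + (1 - w) * scores_b[spk] for spk in scores_a}
    return max(fused, key=fused.get)

# Toy usage: normalise three 2-dimensional frames, then fuse two
# hypothetical per-speaker log-likelihood score sets.
norm = cmvn([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]])
speaker = fuse_scores({"A": -1.0, "B": -3.0}, {"A": -2.0, "B": -2.5})
```

Early (feature-level) fusion would instead concatenate the two 16-dimensional feature vectors frame by frame into a single 32-dimensional vector before modelling.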

References
1. X. Zhao, Y. Wang and D. Wang, "Robust speaker identification in noisy and reverberant conditions", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 4, pp. 836-845, 2014.
2. Q. Li and Y. Huang, "An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1791-1801, 2011.
3. N. Wang, P. Ching, N. Zheng and T. Lee, "Robust speaker recognition using denoised vocal source and vocal tract features", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 196-205, 2011.
4. J. Ming, T. J. Hazen, J. R. Glass, D. Reynolds et al., "Robust speaker recognition in noisy conditions", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 5, pp. 1711-1723, 2007.
5. R. Togneri and D. Pullella, "An overview of speaker identification: Accuracy and robustness issues", IEEE Circuits and Systems Magazine, vol. 11, no. 2, pp. 23-61, 2011.
6. H. Maged, A. Abou El-Farag and S. Mesbah, "Improving speaker identification system using discrete wavelet transform and AWGN", 2014 5th IEEE International Conference on Software Engineering and Service Science (ICSESS), pp. 1171-1176, 2014.
7. X. Zhao and D. Wang, "Analyzing noise robustness of MFCC and GFCC features in speaker identification", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7204-7208, 2013.
8. E. B. Tazi, A. Benabbou and M. Harti, "Efficient text independent speaker identification based on GFCC and CMN methods", 2012 International Conference on Multimedia Computing and Systems (ICMCS), pp. 90-95, 2012.
9. A. Khanteymoori, M. Homayounpour and M. Menhaj, "Speaker identification in noisy environments using dynamic Bayesian networks", 2009 14th International CSI Computer Conference (CSICC), pp. 601-606, 2009.
10. M. McLaren, N. Scheffer, M. Graciarena, L. Ferrer and Y. Lei, "Improving speaker identification robustness to highly channel-degraded speech through multiple system fusion", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6773-6777, 2013.
11. S. Y. Chang, B. T. Meyer and N. Morgan, "Spectro-temporal features for noise-robust speech recognition using power-law nonlinearity and power-bias subtraction", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7063-7067, 2013.
12. F. Kelly and N. Harte, "Auditory features revisited for robust speech recognition", 2010 20th International Conference on Pattern Recognition (ICPR), pp. 4456-4459, 2010.
13. C. Kim and R. M. Stern, "Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction", INTERSPEECH, pp. 28-31, 2009.
14. R. Zheng, S. Zhang and B. Xu, "A comparative study of feature and score normalization for speaker verification", in Advances in Biometrics, Springer, pp. 531-538, 2005.
15. M. T. S. Al-Kaltakchi, W. L. Woo, S. S. Dlay and J. A. Chambers, "Study of fusion strategies and exploiting the combination of MFCC and PNCC features for robust biometric speaker identification", 2016 4th International Conference on Biometrics and Forensics (IWBF), pp. 1-6, March 2016.
16. R. S. S. Kumari, S. S. Nidhyananthan et al., "Fused MEL feature sets based text-independent speaker identification using Gaussian mixture model", Procedia Engineering, vol. 30, pp. 319-326, 2012.
