
Study of statistical robust closed set speaker identification with feature and score-based fusion



Abstract:

In this paper, the statistical combination of Power Normalization Cepstral Coefficient (PNCC) and Mel Frequency Cepstral Coefficient (MFCC) features in robust closed set speaker identification is studied. Feature normalization and warping together with late score-based fusion are also exploited to improve performance in the presence of channel and noise effects. In addition, combinations of score and feature-based approaches are considered with early and/or late fusion; these systems use different feature dimensions (16, 32). A 4th order G.712 type IIR filter is employed to represent handset degradation in the channel. Simulation studies based on the TIMIT database confirm the improvement in Speaker Identification Accuracy (SIA) through the combination of PNCC and MFCC features in the presence of handset and Additive White Gaussian Noise (AWGN) effects.
Date of Conference: 26-29 June 2016
Date Added to IEEE Xplore: 25 August 2016
Conference Location: Palma de Mallorca, Spain

1. Introduction

Handset effects and the mismatch between training and testing caused by AWGN are two of the most important challenges in speaker identification. Reverberation is another: to achieve Robust Speaker Identification (RSI), Zhao et al. [1] suggested using binary masking with a deep neural network. Alternatively, denoising algorithms can be used to reduce system complexity. According to [2], RSI in noisy environments is accomplished with Cochlear Filter Cepstral Coefficients (CFCCs). Unlike previous work, greater system enhancement can be obtained by exploiting fusion between various types of features. Several researchers have focused only on mismatched noise conditions [3], [4] in order to improve speaker recognition in both verification and identification tasks. Togneri and Pullella presented an overview paper [5] addressing two major issues: the accuracy and robustness of speaker identification. However, that work used only limited populations, and modern strategies such as fusion are missing. In general, several researchers in the speaker identification field have concentrated on the front-end (feature extraction) in the presence of AWGN, such as [6], [8], to improve system performance. On the other hand, [9] focused on the back-end (classifier) under different types of noise to improve RSI. The main drawback of these papers is the limited number of samples used. Although the authors in [10] employed fusion strategies to improve the robustness of the speaker identification system, noise and channel effects were not investigated extensively. This study provides new investigations by combining two main features: PNCC features, which provide noise robustness in speaker identification [11], [13], and MFCC features, which typically perform better on clean speech.
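The mismatched AWGN test condition described above can be sketched as follows. The paper does not give code, so this is a minimal illustration of corrupting a clean signal at a chosen Signal-to-Noise Ratio; the function name, the example tone, and the fixed seed are assumptions for demonstration only.

```python
import math
import random

def add_awgn(signal, snr_db, seed=0):
    """Corrupt a clean signal with Additive White Gaussian Noise at a
    target SNR in dB, creating a mismatched (noisy) test condition."""
    rng = random.Random(seed)
    power = sum(s * s for s in signal) / len(signal)   # average signal power
    noise_power = power / (10 ** (snr_db / 10))        # from SNR = 10*log10(Ps/Pn)
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

# Example: one second of a 440 Hz tone at 16 kHz, corrupted at 10 dB SNR
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noisy = add_awgn(clean, snr_db=10)
```

In an evaluation like the one in this paper, the models would be trained on clean speech and tested on such degraded versions at several SNR levels.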
In addition, two feature compensation methods, Feature Warping (FW) and Cepstral Mean and Variance Normalization (CMVN), discussed in [5], [14], are applied to reduce noise and handset channel effects and to alleviate linear/non-linear channel distortions. Furthermore, fusion techniques that depend on the feature dimension are exploited: early feature fusion (32-dimensional features), late score fusion (16-dimensional features) and, finally, a combination of feature-based early fusion and score-based late fusion (32-dimensional features). The major contribution of this work is a thorough evaluation of the scheme first proposed in our previous work [15], conducted with more sophisticated fusion schemes in the presence of handset and AWGN effects.
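Two of the building blocks above, CMVN and late score fusion, can be sketched in a few lines. This is an illustrative sketch only: the function names, the equal 0.5 fusion weight, and the toy scores are assumptions, not the configuration used in the paper.

```python
import math

def cmvn(features):
    """Cepstral Mean and Variance Normalization: map each feature
    dimension to zero mean and unit variance over the utterance,
    which removes stationary (e.g. handset channel) offsets."""
    n, dims = len(features), len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n) or 1.0
            for d in range(dims)]
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in features]

def fuse_scores(scores_a, scores_b, w=0.5):
    """Late score-level fusion: per-speaker weighted sum of the scores of
    two subsystems (e.g. MFCC- and PNCC-based); the identified speaker
    is the one with the highest fused score."""
    fused = {spk: w * scores_a[spk] + (1 - w) * scores_b[spk] for spk in scores_a}
    return max(fused, key=fused.get)

# Toy usage: normalise three 2-dimensional frames, then fuse two
# hypothetical per-speaker log-likelihood score sets.
norm = cmvn([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]])
speaker = fuse_scores({"A": -1.0, "B": -3.0}, {"A": -2.0, "B": -2.5})
```

Early (feature-level) fusion would instead concatenate the two 16-dimensional feature vectors frame by frame into a single 32-dimensional vector before modelling.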

References
1. X. Zhao, Y. Wang and D. Wang, "Robust speaker identification in noisy and reverberant conditions", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 4, pp. 836-845, 2014.
2. Q. Li and Y. Huang, "An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1791-1801, 2011.
3. N. Wang, P. Ching, N. Zheng and T. Lee, "Robust speaker recognition using denoised vocal source and vocal tract features", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 196-205, 2011.
4. J. Ming, T. J. Hazen, J. R. Glass, D. Reynolds et al., "Robust speaker recognition in noisy conditions", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 5, pp. 1711-1723, 2007.
5. R. Togneri and D. Pullella, "An overview of speaker identification: Accuracy and robustness issues", IEEE Circuits and Systems Magazine, vol. 11, no. 2, pp. 23-61, 2011.
6. H. Maged, A. Abou El-Farag and S. Mesbah, "Improving speaker identification system using discrete wavelet transform and AWGN", 2014 5th IEEE International Conference on Software Engineering and Service Science (ICSESS), pp. 1171-1176, 2014.
7. X. Zhao and D. Wang, "Analyzing noise robustness of MFCC and GFCC features in speaker identification", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7204-7208, 2013.
8. E. B. Tazi, A. Benabbou and M. Harti, "Efficient text independent speaker identification based on GFCC and CMN methods", 2012 International Conference on Multimedia Computing and Systems (ICMCS), pp. 90-95, 2012.
9. A. Khanteymoori, M. Homayounpour and M. Menhaj, "Speaker identification in noisy environments using dynamic Bayesian networks", 2009 14th International CSI Computer Conference (CSICC), pp. 601-606, 2009.
10. M. McLaren, N. Scheffer, M. Graciarena, L. Ferrer and Y. Lei, "Improving speaker identification robustness to highly channel-degraded speech through multiple system fusion", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6773-6777, 2013.
11. S. Y. Chang, B. T. Meyer and N. Morgan, "Spectro-temporal features for noise-robust speech recognition using power-law nonlinearity and power-bias subtraction", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7063-7067, 2013.
12. F. Kelly and N. Harte, "Auditory features revisited for robust speech recognition", 2010 20th International Conference on Pattern Recognition (ICPR), pp. 4456-4459, 2010.
13. C. Kim and R. M. Stern, "Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction", INTERSPEECH, pp. 28-31, 2009.
14. R. Zheng, S. Zhang and B. Xu, "A comparative study of feature and score normalization for speaker verification", in Advances in Biometrics, Springer, pp. 531-538, 2005.
15. M. T. S. Al-Kaltakchi, W. L. Woo, S. S. Dlay and J. A. Chambers, "Study of fusion strategies and exploiting the combination of MFCC and PNCC features for robust biometric speaker identification", 2016 4th International Conference on Biometrics and Forensics (IWBF), pp. 1-6, March 2016.
16. R. S. S. Kumari, S. S. Nidhyananthan et al., "Fused MEL feature sets based text-independent speaker identification using Gaussian mixture model", Procedia Engineering, vol. 30, pp. 319-326, 2012.
