Keyword word recognition using a fusion of spectral, cepstral and modulation features

Abstract:

We present the results of applying a combination of features for recognizing word utterances extracted from a continuous stream of speech. Three sets of features, namely spectral energy in Bark bands, mel frequency cepstral coefficients, and parameters from an AM-FM model, were employed for training and testing a set of keywords in the CallHome telephone speech database. A pairwise comparison, by dynamic time warping, between the feature set of an unknown word utterance and that of each reference utterance yielded a false negative score of 4 out of 12 and a false positive score of 5 out of 132 on a subset of speech from the database. Long, multisyllabic words were spotted correctly, while two short words in the word list contributed to the errors.
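The abstract does not say how the three feature sets are combined. One common approach, assumed here purely for illustration, is to z-score each frame-synchronous stream (Bark-band energies, MFCCs, AM-FM parameters) over the utterance and concatenate the streams into one vector per frame. The sketch below shows that assumed fusion step plus the arithmetic behind the reported error counts; `fuse_features`, `zscore_columns` and `error_rates` are hypothetical helper names, not the authors' code.

```python
from statistics import mean, pstdev

def zscore_columns(frames):
    """Normalize each feature dimension to zero mean and unit variance over one utterance."""
    columns = list(zip(*frames))
    # Guard against constant dimensions (pstdev == 0) by leaving their scale at 1.0.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(frame, stats)] for frame in frames]

def fuse_features(*streams):
    """Assumed fusion: z-score each frame-synchronous stream, then concatenate per frame.

    Each stream is a list of per-frame feature vectors (e.g. Bark-band energies,
    MFCCs, AM-FM parameters), all computed on the same frame grid.
    """
    n_frames = len(streams[0])
    if any(len(stream) != n_frames for stream in streams):
        raise ValueError("all feature streams must have the same number of frames")
    normalized = [zscore_columns(stream) for stream in streams]
    return [[x for stream in normalized for x in stream[i]] for i in range(n_frames)]

def error_rates(false_negatives, positive_trials, false_positives, negative_trials):
    """Return (false negative rate, false positive rate) from raw counts."""
    return false_negatives / positive_trials, false_positives / negative_trials

if __name__ == "__main__":
    fnr, fpr = error_rates(4, 12, 5, 132)  # counts reported in the abstract
    print(f"false negative rate: {fnr:.1%}, false positive rate: {fpr:.1%}")
```

With the reported counts, the false negative rate is about 33.3% and the false positive rate about 3.8%. Normalizing each stream before concatenation keeps features with large numeric ranges, such as raw band energies, from dominating the Euclidean frame distances used in the DTW comparison.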

Published in: CONIELECOMP 2012, 22nd International Conference on Electrical Communications and Computers

Date of Conference: 27-29 February 2012

Date Added to IEEE Xplore: 26 April 2012

DOI: 10.1109/CONIELECOMP.2012.6189915

Conference Location: Cholula, Puebla

1. Introduction

Word utterance recognition, as applied to keyword recognition, is concerned with detecting occurrences of a set of selected words in a continuous stream of speech. The process involves locating the selected keywords in speech that also contains extraneous (out-of-vocabulary) speech and noise. Earlier recognition methods typically relied on template matching of keyword features, with time normalization performed by dynamic time warping [1], [2]. The features used to build templates are commonly derived from a spectral or log-spectral representation of each speech frame; typical choices are linear prediction model parameters and mel frequency cepstral coefficients. Because statistical models, such as a Gaussian mixture model representation of each keyword, need a large amount of training data to be estimated reliably, dynamic time warping, despite its high computational cost, is still considered a viable alternative [3].
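Template matching with dynamic time warping can be sketched as follows. This is the textbook DTW recurrence with a Euclidean frame distance and, as time normalization, a simple division by the combined sequence lengths; the paper's exact local distance, path constraints and normalization are not stated here, so these are assumptions. `dtw_distance` and `best_match` are hypothetical names.

```python
import math

def euclidean(a, b):
    """Local distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(query, template, dist=euclidean):
    """Textbook DTW between two sequences of feature vectors.

    D[i][j] holds the cheapest cumulative cost of aligning query[:i] with
    template[:j] using vertical, horizontal and diagonal steps. The final cost
    is divided by len(query) + len(template) as a simple (assumed) normalization.
    """
    n, m = len(query), len(template)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(query[i - 1], template[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)

def best_match(query, templates):
    """Compare the query with every reference template; return (keyword, score) of the closest."""
    scores = ((word, dtw_distance(query, tpl)) for word, tpl in templates.items())
    return min(scores, key=lambda pair: pair[1])

if __name__ == "__main__":
    # Toy one-dimensional "features": the query is a time-stretched copy of "yes".
    templates = {"yes": [[0.0], [1.0], [2.0], [3.0]], "no": [[5.0], [5.0], [5.0]]}
    query = [[0.0], [1.0], [2.0], [2.0], [3.0]]
    print(best_match(query, templates))  # ('yes', 0.0)
```

A keyword spotter built on this would accept the best-matching keyword only if its score falls below a tuned threshold, which trades false negatives against false positives; any threshold value is application-specific and not taken from the paper. The full n-by-m cost table filled for every template is the source of the computational cost mentioned above.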
