1. Introduction
Word utterance recognition, as applied to keyword recognition, is concerned with detecting occurrences of a set of selected words in a continuous stream of speech. The process involves locating the selected keywords in speech that also contains extraneous (out-of-vocabulary) speech and noise. Prior recognition methods typically involved template matching of keyword features, with time normalization by dynamic time warping [1], [2]. The features used to create templates are commonly derived from a spectral or log-spectral representation of each frame of speech, with templates typically formed from the parameters of a linear prediction model and from mel-frequency cepstral coefficients. Because statistical parametric models, such as a Gaussian mixture model representation of a keyword, require a large amount of training data to be modeled effectively, dynamic time warping is still considered a viable alternative despite its large computational requirement [3].
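To make the time-normalization idea concrete, the following is a minimal sketch of the textbook dynamic time warping recursion. It is not the specific formulation of [1], [2]. The function name `dtw_distance`, the Euclidean local distance, and the unconstrained three-way step pattern are all assumptions chosen for illustration, and the one-dimensional "features" stand in for real per-frame spectral or cepstral vectors.

```python
import math


def dtw_distance(template, utterance):
    """Cumulative DTW cost between two sequences of feature vectors.

    Illustrative sketch only: Euclidean local distance and the basic
    (insert / delete / match) step pattern are assumptions.
    """
    n, m = len(template), len(utterance)
    # cost[i][j] holds the minimal cumulative cost of aligning the first
    # i template frames with the first j utterance frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(template[i - 1], utterance[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # template frame advances alone
                cost[i][j - 1],      # utterance frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Hypothetical one-dimensional feature sequences.
template = [(0.0,), (1.0,), (2.0,)]
slow_copy = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,)]
different = [(2.0,), (1.0,), (0.0,)]

print(dtw_distance(template, slow_copy))  # time-stretched copy aligns at zero cost
print(dtw_distance(template, different))  # a different pattern costs more
```

The nested loop visits every pair of frames, so the cost grows with the product of the template and utterance lengths, which illustrates the computational burden noted above.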