I. Introduction
Linear predictive coding (LPC) has been widely used, especially in speech signal processing, since its introduction in the late 1960s [1]. During the last three decades, the number of available techniques has grown rapidly [2]–[5]. However, relatively little has been done to change the basic principle of prediction. In linear prediction, an estimate of a sample value is given as a linear combination of previous sample values. There is an infinite number of alternative ways to form a linear combination of the signal history and use it to predict the next signal value. The most obvious way, a weighted sum of a finite number of previous signal values, is the one typically used in, e.g., coding applications. This choice of how to sample the signal history is not dictated by any mathematical necessity. The underlying mathematical theory is the Wold decomposition principle [6], which states that a regular sequence can be obtained from white noise by filtering with an IIR filter. In traditional LPC, this filter is assumed to be a conventional all-pole filter, and the optimal coefficients, in the least-squares sense, can be obtained from the celebrated Yule–Walker equations, which follow from the more general orthogonality principle of linear prediction [7], [8].
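As a concrete illustration of the conventional approach described above, the following is a minimal pure-Python sketch, not the method of this paper. It assumes the predictor form x̂(n) = a_1 x(n−1) + … + a_p x(n−p), estimates the autocorrelation r(k) from the data (the autocorrelation method), and solves the Yule–Walker equations Σ_j a_j r(|i−j|) = r(i), i = 1, …, p, with the Levinson–Durbin recursion, which is one standard solver among several. The function names `autocorrelation`, `levinson_durbin`, `predict`, and `ar2_signal`, as well as the AR(2) test signal, are hypothetical choices made only for illustration.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate r[k] = sum_n x[n] * x[n - k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz Yule-Walker system sum_j a_j r[|i - j|] = r[i], i = 1..order.

    Returns (a, err), where a[k] multiplies x[n - 1 - k] in the predictor and
    err is the residual prediction-error energy of the order-`order` model.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient of stage i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # Order update: the new last coefficient is k, earlier ones are corrected.
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def predict(x, a):
    """One-step linear prediction x_hat[n] = sum_k a[k] * x[n - 1 - k], for n >= len(a)."""
    p = len(a)
    return [sum(a[k] * x[n - 1 - k] for k in range(p)) for n in range(p, len(x))]


def ar2_signal(n, a1, a2, seed=0, burn_in=500):
    """Hypothetical test signal: white Gaussian noise filtered by an all-pole AR(2) filter."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n + burn_in):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2 + burn_in:]  # drop initial zeros and the transient


if __name__ == "__main__":
    x = ar2_signal(20000, 1.5, -0.7)
    r = autocorrelation(x, 2)
    a, err = levinson_durbin(r, 2)
    print("estimated coefficients:", a)  # should lie near the generating values
    x_hat = predict(x, a)
    residual = [xi - xh for xi, xh in zip(x[2:], x_hat)]
    # Should be close to the driving-noise variance (1.0 here).
    print("residual energy per sample:", sum(e * e for e in residual) / len(residual))
```

Running the script should print coefficients close to the generating values 1.5 and −0.7. The Levinson–Durbin recursion is used here because it exploits the Toeplitz structure of the Yule–Walker system, needing O(p²) operations instead of the O(p³) of general Gaussian elimination.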