I. Introduction
Digital signal processing (DSP) is widely regarded as a key component of the digital revolution currently taking place around the globe. It has been applied successfully to digital modulation and demodulation, speech and image data compression, speech recognition, synthesis and equalization, spectral estimation and analysis, and a wide range of adaptive filtering applications [1], [2]. DSP functionality therefore appears increasingly in electronic systems for wired and wireless communication, interactive multimedia, biomedical instrumentation, military surveillance and target tracking, satellite and aerospace control, remote sensing, and a host of digital consumer products. DSP algorithms are computation-intensive, and most of their applications are hard real-time by nature [3]. Moreover, DSP systems are very often used in small portable devices that depend mostly on limited battery power [4]. Rigid constraints on size and cost usually leave no scope for a cooling arrangement in these systems, while system reliability falls by half for every 10 to 20 degrees Celsius rise in temperature [5]. General-purpose computers usually do not meet the cost, size, speed, and power dissipation requirements for implementing such computation-intensive algorithms in real-time applications. A variety of dedicated and special-purpose devices are thus designed specifically to handle signal processing functions. With the growing popularity of digital technology in recent years, DSP applications have not only become more prevalent in daily use, but their algorithms are also subject to more stringent specifications in order to meet the basic constraints of the application environments.
As a natural consequence, significant research interest has emerged in recent years in developing improved algorithms and architectures for DSP systems with lower power dissipation, higher speed, and lower area complexity. Because these constraints conflict with one another, however, one or more aspects must usually be traded off to meet a more important requirement [6]. Architectural solutions can trade area for time and power, or time for area and cost, but it is difficult to minimize cost, area, delay, and power all together in a given architecture. Several efforts have been made to minimize the arithmetic complexity of the algorithms in order to reduce the overall area-delay-power complexity [7].