I. Introduction
Existing hybrid video coding standards, such as the MPEG series [2]–[4] and the emerging H.264/AVC [5], mainly consist of a closed-loop motion-compensated prediction (MCP) scheme and a transform-based texture coder. "Closed-loop" means that the reconstructed previous frames, rather than the original ones, are used to predict the current frame, which forms a feedback loop between prediction and reconstruction. The closed-loop MCP scheme has been highly optimized for compression efficiency over the last decade, and H.264/AVC is a landmark of this development. However, many present and future video applications increasingly demand spatial, temporal, and signal-to-noise-ratio (SNR) scalability, i.e., the ability to extract multiple adaptations, such as different frame sizes, frame rates, and visual qualities, from a single video bitstream. The closed-loop MCP scheme can hardly provide these scalabilities while maintaining high compression efficiency because of the drift problem, i.e., the mismatch between the reference frames reconstructed at the encoder and those available at the decoder. Avoiding drift degrades the compression efficiency severely, and the loss becomes unacceptable when there are many scalability layers.
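The drift mechanism can be illustrated with a toy sketch (a hypothetical, greatly simplified model, not any standard's actual codec: frames are short lists of numbers, prediction is zero-motion copying of the previous reconstruction, and scaling the received residual stands in for a truncated SNR-scalability layer):

```python
def quantize(x, step):
    # Uniform scalar quantizer: round to the nearest multiple of step.
    return round(x / step) * step

def encode(frames, step):
    """Closed-loop encoder: each frame is predicted from the encoder's
    own reconstruction, so a decoder receiving the full residuals
    rebuilds exactly the same reference frames."""
    recon = [0.0] * len(frames[0])          # reference starts at zero
    residuals = []
    for frame in frames:
        res = [quantize(p - r, step) for p, r in zip(frame, recon)]
        recon = [r + q for r, q in zip(recon, res)]  # the feedback loop
        residuals.append(res)
    return residuals

def decode(residuals, scale=1.0):
    """Decoder; scale < 1 emulates receiving only part of each residual,
    e.g. a truncated SNR-scalability layer."""
    recon = [0.0] * len(residuals[0])
    out = []
    for res in residuals:
        recon = [r + scale * q for r, q in zip(recon, res)]
        out.append(list(recon))
    return out

frames = [[10, 20], [12, 22], [14, 24], [16, 26]]
residuals = encode(frames, step=1)

# A full-rate decoder matches the encoder's reconstruction: no drift.
full = decode(residuals, scale=1.0)

# A decoder with truncated residuals uses references the encoder never
# saw; the mismatch (drift) accumulates frame after frame.
truncated = decode(residuals, scale=0.5)
errors = [f[0] - d[0] for f, d in zip(frames, truncated)]
```

In this sketch the per-frame error of the truncated decoder grows monotonically, showing why a closed-loop scheme must either forgo scalability or sacrifice efficiency to keep encoder and decoder references in sync.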