I. Introduction
The MPEG-4 standard [1] allows arbitrarily shaped video objects to be coded. Within this standard, a scene is viewed as a composition of video objects (VOs) with properties such as shape, motion, and texture. An instance of a video object at a given time is called a video object plane (VOP) [2]. This object-based representation permits new applications such as object-based video editing [3], object interaction [4], and object-based indexing and retrieval [5]. One distinctive feature of this video coding standard is that binary shape information is encoded and decoded to represent the video object. The shape information is referred to as the alpha plane, which is defined as a binary shape map in which pixels belonging to the object are assigned the value 255 and pixels outside the object are assigned the value 0. The context-based arithmetic encoding (CAE) algorithm has been adopted in MPEG-4 for coding the shape information [6]. CAE is block-based and relies on the statistics of the data contained within a block. To reduce the bit rate, size conversion is applied by down-sampling the original shape before coding; this, however, introduces distortion into the reconstructed shape.
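To make the trade-off between size conversion and distortion concrete, the following minimal sketch down-samples a small binary alpha block and counts the pixels that change after it is restored to its original size. It is an illustration under stated assumptions, not the procedure defined by the standard: the majority-vote rule in `downsample`, the pixel replication in `upsample`, the 4x4 block, and all function names are hypothetical choices made for this example.

```python
def downsample(alpha, factor):
    """Shrink a square binary alpha block by `factor` with a per-tile majority vote."""
    n = len(alpha)
    out = []
    for by in range(0, n, factor):
        row = []
        for bx in range(0, n, factor):
            inside = sum(
                1
                for y in range(by, by + factor)
                for x in range(bx, bx + factor)
                if alpha[y][x] == 255
            )
            # Assumed rule: the tile becomes an object pixel (255)
            # when at least half of its pixels belong to the object.
            row.append(255 if 2 * inside >= factor * factor else 0)
        out.append(row)
    return out


def upsample(small, factor):
    """Restore the original size by pixel replication (assumed, not the MPEG-4 filter)."""
    out = []
    for row in small:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def distortion(a, b):
    """Count the pixels that differ between two equally sized alpha blocks."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Hypothetical 4x4 alpha block with a diagonal object boundary.
block = [
    [255, 255, 255, 0],
    [255, 255, 0, 0],
    [255, 0, 0, 0],
    [0, 0, 0, 0],
]
small = downsample(block, 2)      # 2x2 block: a quarter of the pixels to code
restored = upsample(small, 2)     # back to 4x4
errors = distortion(block, restored)
print(small, errors)
```

In this toy case the down-sampled block holds a quarter of the original pixels, and two boundary pixels of the restored shape differ from the original. Errors of this kind concentrate along the object contour, which is where the distortion mentioned above arises.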
(a) Intra template and context construction. (b) Inter template and context construction. The pixel to be coded is marked with ‘?’.