I. Introduction
High Efficiency Video Coding (HEVC) is a video coding standard still under development [1], and the current working draft is the successor to H.264. It significantly outperforms previous standards such as MPEG-2, MPEG-4 Part 2, and H.264 in coding efficiency [2], [3] because it adopts new techniques, including hierarchical quadtree-structured motion compensation, large coding treeblocks, the coding unit (CU), and the prediction unit (PU). A treeblock is an $N \times N$ block of luma samples together with the two corresponding blocks of chroma samples; the concept is broadly analogous to that of the macroblock (MB) in H.264 [4]. The CU is the basic unit of region splitting used for inter/intra prediction and can be recursively subdivided into four equally sized blocks. It is always square, and its size ranges from the treeblock size down to 8×8. The PU is the basic unit that carries the information related to the prediction processes. In general, a PU is not restricted to a square shape, so that the partitioning can match the boundaries of real objects in the image. Intra CUs have two types of PUs, the $2N \times 2N$ partition and the $N \times N$ partition, whereas inter CUs have four types of PUs: $2N \times 2N$, $2N \times N$, $N \times 2N$, and $N \times N$. For inter slices, HEVC encoders therefore enable seven different modes: SKIP, inter $2N \times 2N$, inter $2N \times N$, inter $N \times 2N$, inter $N \times N$, intra $2N \times 2N$, and intra $N \times N$.
Fig. 1 shows the architecture of the tree-structured CUs and the prediction modes at each depth level. In the HEVC test model (HM), an image is first divided into treeblocks, and each treeblock can be further split into CUs. For a CU at depth level $X$, the procedure shown in the right part of Fig. 1 is followed, and the current CU is divided into four sub-CUs at depth level $X+1$. At each level, a CU can be split into two or four PUs, as shown in the left part of Fig. 1.
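The recursive CU structure and the PU shapes described above can be illustrated with a minimal Python sketch. It is not taken from HM: the function names are hypothetical, the 64×64 treeblock size is an assumption (the text only states that CU sizes range from the treeblock size down to 8×8), and the partition labels follow the usual HEVC notation.

```python
# Illustrative sketch only. The 64x64 treeblock size is an assumption;
# the text gives only the minimum CU size of 8x8.
TREEBLOCK_SIZE = 64  # assumed treeblock width/height in luma samples
MIN_CU_SIZE = 8      # smallest CU size stated in the text


def cu_sizes_by_depth(treeblock_size=TREEBLOCK_SIZE, min_cu_size=MIN_CU_SIZE):
    """Map each depth level to its CU size; every split halves the size."""
    sizes = {}
    depth, size = 0, treeblock_size
    while size >= min_cu_size:
        sizes[depth] = size
        depth, size = depth + 1, size // 2
    return sizes


def split_cu(x, y, size):
    """Return the four equally sized sub-CUs (x, y, size) of a square CU."""
    half = size // 2
    return [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]


def pu_partitions(cu_size, part):
    """Return the (width, height) of each PU in a CU of size 2N = cu_size."""
    n = cu_size // 2
    shapes = {
        "2Nx2N": [(cu_size, cu_size)],  # one PU covering the whole CU
        "2NxN": [(cu_size, n)] * 2,     # two PUs stacked vertically
        "Nx2N": [(n, cu_size)] * 2,     # two PUs side by side
        "NxN": [(n, n)] * 4,            # four square PUs
    }
    return shapes[part]


if __name__ == "__main__":
    print(cu_sizes_by_depth())
    print(split_cu(0, 0, 64))
    print(pu_partitions(16, "2NxN"))
```

Under the assumed 64×64 treeblock, this yields four depth levels (0 to 3) with CU sizes 64, 32, 16, and 8.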
Similar to the joint model of H.264, the mode decision process in HM evaluates all possible CU sizes and prediction modes and selects the one with the least rate-distortion (RD) cost, computed with a Lagrange multiplier (more detail can be found in [5]). The RD cost for each CU size or prediction mode is defined as
$$J_{X} = B_{X} + \lambda_{X} \cdot SSE \tag{1}$$
where $B_{X}$ specifies the bit cost to be considered for the CU size decision and mode decision, $SSE$ is the sum of squared errors, which measures the difference between the current CU and the matching blocks, and $\lambda_{X}$ is the Lagrange multiplier. However, this "try all and select the best" method results in high computational complexity and limits the use of HEVC encoders in real-time applications. Therefore, fast algorithms that reduce the complexity of the CU size decision without compromising coding efficiency are highly desirable for real-time implementation of HEVC encoders.
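To make the cost of the "try all and select the best" strategy concrete, the following Python sketch runs an exhaustive search over a single treeblock. It is a simplified illustration, not the HM implementation: `best_cu_cost` and the `evaluate` callback are hypothetical names, the 64×64 treeblock is an assumption, all seven modes are assumed to be tried at every depth, and the bits needed to signal split decisions are ignored. The cost follows (1) as written.

```python
# Illustrative sketch of exhaustive CU size and mode decision.
# All names are hypothetical; this is not the HM encoder.
MODES = ("SKIP", "inter 2Nx2N", "inter 2NxN", "inter Nx2N",
         "inter NxN", "intra 2Nx2N", "intra NxN")


def rd_cost(bits, sse, lam):
    """RD cost as written in Eq. (1): J = B + lambda * SSE."""
    return bits + lam * sse


def best_cu_cost(x, y, size, lam, evaluate, min_size=8):
    """Return (minimum RD cost, decision tree) for the CU at (x, y).

    `evaluate(x, y, size, mode)` is a caller-supplied stand-in for
    prediction and entropy coding; it must return (bits, sse).
    """
    # Try every prediction mode at the current CU size.
    best_mode, best_j = None, float("inf")
    for mode in MODES:
        bits, sse = evaluate(x, y, size, mode)
        j = rd_cost(bits, sse, lam)
        if j < best_j:
            best_mode, best_j = mode, j

    # The smallest CU cannot be split any further.
    if size <= min_size:
        return best_j, (size, best_mode)

    # Recurse into the four sub-CUs and add up their best costs.
    half = size // 2
    split_j, children = 0.0, []
    for dy in (0, half):
        for dx in (0, half):
            j, tree = best_cu_cost(x + dx, y + dy, half, lam, evaluate, min_size)
            split_j += j
            children.append(tree)

    # Keep whichever alternative has the lower RD cost.
    if split_j < best_j:
        return split_j, (size, "split", children)
    return best_j, (size, best_mode)


if __name__ == "__main__":
    counter = {"calls": 0}

    def toy_evaluate(x, y, size, mode):
        # Made-up bits and SSE, used only to count evaluations.
        counter["calls"] += 1
        return float(size), float(size * size)

    cost, tree = best_cu_cost(0, 0, 64, lam=0.5, evaluate=toy_evaluate)
    print(counter["calls"], cost, tree)
```

Under these assumptions, one 64×64 treeblock contains 1 + 4 + 16 + 64 = 85 candidate CUs, so the search performs 85 × 7 = 595 mode evaluations. This is the workload that fast CU size decision algorithms aim to reduce.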
Fig. 1. Illustration of the recursive CU structure and prediction modes at each depth level.