I Introduction
Demand for high-resolution video processing is increasing rapidly as high-definition digital broadcasting services become widely available. Consequently, standards and techniques for video compression and decoding are being actively developed. One of the most popular video codec standards is H.264/AVC. A main advantage of H.264/AVC is that it provides good video quality at substantially lower bit rates than previous standards. However, its computational complexity is high, which makes it very challenging to achieve high performance with a software implementation. Hardware implementations have therefore been commonly used.

Recently, enhancing performance through intelligent parallel processing on multiple cores integrated in a single chip has been widely adopted. Because a multi-core system typically has better power efficiency than a single-core system with comparable processing power, many high-performance embedded systems are expected to adopt multiprocessor system-on-chip platforms soon. Although multi-core systems offer potentially ample computation power, achieving such high performance is not straightforward, because efficient parallel programming for a multi-core system is difficult. Thus, it is crucial to rewrite an existing software implementation into one that is more suitable for parallel processing. It is also very important to understand the target multi-core platform, which means understanding not only the processor itself but also the memory system and the on-chip interconnection system.