I. Introduction
Three-dimensional (3-D) graphics rendering requires many memory operations, including texture mapping, stencil tests, z-tests, color blending, and vertex data fetching. Because of this heavy memory traffic, the performance of a rendering engine depends strongly on the bandwidth of its memory system. Most of these memory operations arise in the 3-D rasterization engine (RE), because it processes data on a per-pixel basis. In conventional high-performance rendering processors, texture mapping is performed before the z-test [1], [2]. This architecture properly supports the semantics of standard APIs such as OpenGL [3], but its major disadvantage is that the memory operations for texture mapping and the z-test are also carried out for hidden pixels, which wastes a great deal of memory bandwidth.
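To make the cost of this ordering concrete, the following minimal sketch counts texture fetches for a stream of fragments under the two orderings. It is an illustration only, not the architecture of [1] or [2]: the function name count_texture_fetches, the fragment tuple layout, and the "smaller depth is closer" convention are all assumptions, and the sketch ignores any per-fragment test that depends on the texture result.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment layout: (x, y, depth). Smaller depth means closer.
Fragment = Tuple[int, int, float]


def count_texture_fetches(fragments: List[Fragment], texture_first: bool) -> int:
    """Count texture fetches needed to process the fragments in order.

    texture_first=True models texture mapping before the z-test;
    texture_first=False models the z-test before texture mapping.
    """
    depth_buffer: Dict[Tuple[int, int], float] = {}
    fetches = 0
    for x, y, z in fragments:
        if texture_first:
            # Texel is fetched before visibility is known.
            fetches += 1
        stored = depth_buffer.get((x, y), float("inf"))
        if z >= stored:
            # Hidden fragment: discarded by the z-test.
            continue
        if not texture_first:
            # Texel is fetched only for fragments that pass the z-test.
            fetches += 1
        depth_buffer[(x, y)] = z
    return fetches


# Three fragments covering the same pixel, submitted front to back.
fragments = [(0, 0, 0.2), (0, 0, 0.5), (0, 0, 0.9)]
texture_first_fetches = count_texture_fetches(fragments, texture_first=True)
z_first_fetches = count_texture_fetches(fragments, texture_first=False)
```

In this example the two fragments behind the first one are hidden, so texturing them before the z-test spends fetches on pixels that are never displayed. The saving depends on submission order: if the same fragments arrive back to front, every fragment passes the z-test and both orderings fetch the same number of texels.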