1 Introduction
To debug or analyze a software program running on a processor-based system-on-chip (SoC), it is often necessary to collect the program's execution trace inside the SoC in real time. The difficulty is that the real-time trace volume grows so quickly that it is impractical either to store the trace on chip or to send it off chip through the limited number of I/O pins. Various techniques have therefore been proposed to reduce the trace volume directly in hardware. A straightforward approach is to employ a dedicated hardware compressor that implements a general-purpose compression algorithm such as the Lempel-Ziv (LZ) algorithm [1]. However, such a compressor has a very high hardware cost.