I. Introduction
Lossless compression of a discrete information source down to its entropy rate is a well-studied topic. A perhaps lesser-known approach to this problem is based on symbolic dynamical systems, where the information-generating mechanism is modeled as a randomly initialized iterative mapping of the unit interval to itself, and the emitted source sequence is a quantized observation of that process. For well-behaved mappings the source sequence constitutes an expansion of the initial point, i.e., it corresponds to a unique such point. Furthermore, the prefixes of this expansion describe the initial point with (exponentially) increasing resolution, and the unit interval can be uniformly partitioned into subintervals so that, with high probability, the subinterval containing the initial point has all of its points admitting the same length-n expansion. This leads to a conceptually simple and optimal compression scheme: a finite source sequence is mapped to a representing subinterval by computing the corresponding reverse trajectory of the dynamical system, and is reconstructed by following the forward trajectory of an arbitrary point in that subinterval. This scheme has a flavor similar to arithmetic coding and (using variable-length coding) essentially coincides with it in some cases; see Example 1. A comprehensive study of the symbolic dynamics framework for information sources can be found in [1]. Some of the ideas can be traced back to Rényi; see [2] and the references therein.
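As a concrete illustration of the scheme described above (a sketch of ours, not taken from [1]), consider the simplest instance: the doubling map T(x) = 2x mod 1 with the uniform partition {[0, 1/2), [1/2, 1)}, under which the emitted symbols are exactly the binary expansion of the initial point. Encoding a length-n sequence amounts to following the reverse trajectory, refining a subinterval one symbol at a time; decoding follows the forward trajectory of any point in that subinterval. The function names below are for illustration only.

```python
def encode(x, n):
    """Emit the first n symbols of the expansion of x in [0, 1)
    under the doubling map T(x) = 2x mod 1 with the uniform
    binary partition {[0, 1/2), [1/2, 1)}."""
    symbols = []
    for _ in range(n):
        s = 1 if x >= 0.5 else 0  # which partition cell x lies in
        symbols.append(s)
        x = 2 * x - s  # apply T(x) = 2x mod 1
    return symbols

def decode(symbols):
    """Reverse trajectory: return the subinterval [lo, hi) of all
    initial points whose expansion begins with the given symbols."""
    lo, hi = 0.0, 1.0
    for s in symbols:  # each symbol halves the candidate interval
        mid = (lo + hi) / 2
        if s == 0:
            hi = mid
        else:
            lo = mid
    return lo, hi

# A length-n sequence pins the initial point down to an interval of
# width 2**-n; following the forward trajectory of any point inside
# it (e.g. the midpoint) reproduces the same sequence.
syms = encode(0.3, 8)
lo, hi = decode(syms)
```

After n = 8 symbols the interval [lo, hi) has width 2⁻⁸ and contains 0.3, and re-encoding its midpoint yields the same eight symbols, which is precisely the reconstruction step of the scheme.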