1 Introduction
Let $\mathcal{X}$ and $\mathcal{Y}$ be finite non-empty sets, and let $(X, Y)$ be a pair of (correlated) random variables taking values in $\mathcal{X} \times \mathcal{Y}$. Consider the following communication problem between two parties, Alice and Bob. Alice is given a random input $X$, sampled according to the distribution $X$. (We use the same symbol to refer to a random variable and its distribution.) Alice needs to transmit a message to Bob so that Bob can generate a value $Y$ that is distributed according to the conditional distribution $Y \mid X$ (i.e., the pair $(X, Y)$ has joint distribution $XY$). How many bits must Alice send Bob in any protocol that accomplishes this? It follows from the data processing inequality in information theory that this minimum, which we shall call $T[X : Y]$, is at least the mutual information between $X$ and $Y$, that is,
$$I[X : Y] \ \triangleq\ H[X] + H[Y] - H[X, Y].$$
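As a sanity check on the formula $I[X:Y] = H[X] + H[Y] - H[X,Y]$, the following short Python computation evaluates it for a small toy joint distribution (the distribution itself is an invented example, not one from the text):

```python
import math

# Toy joint distribution p(x, y) over {0,1} x {0,1}; rows index x, columns index y.
# This particular distribution is an arbitrary illustrative choice.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

def H(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

px = [sum(row) for row in joint]                        # marginal distribution of X
py = [sum(row[j] for row in joint) for j in range(2)]   # marginal distribution of Y
pxy = [p for row in joint for p in row]                 # flattened joint distribution

# I[X:Y] = H[X] + H[Y] - H[X,Y]
mi = H(px) + H(py) - H(pxy)
```

Here both marginals are uniform, so $H[X] = H[Y] = 1$ bit, and the correlation between $X$ and $Y$ makes `mi` strictly positive.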
The total variation distance between two distributions $P$ and $Q$ is defined as $\max_{S} \lvert P(S) - Q(S) \rvert$, where the maximum is over all events $S$; this is also equal to $\frac{1}{2}\lVert P - Q \rVert_1$, where $\lVert \cdot \rVert_1$ is the $\ell_1$-norm. In the approximate version of the problem, we only require that the pair $(X, Y)$ generated by the protocol be close in total variation distance to the joint distribution $XY$.
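A minimal numeric illustration of the two equivalent forms of the total variation distance; the two distributions below are arbitrary toy examples:

```python
# Two distributions on a 3-element set (invented for illustration).
P = [0.5, 0.3, 0.2]
Q = [0.3, 0.3, 0.4]

# Half the l1-norm of the difference.
tv_l1 = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))

# max over events S of |P(S) - Q(S)|; the maximum is attained by
# the event S = {i : P(i) > Q(i)}.
tv_max = sum(p - q for p, q in zip(P, Q) if p > q)
```

Both expressions evaluate to the same value, reflecting the identity $\max_S |P(S) - Q(S)| = \frac{1}{2}\lVert P - Q\rVert_1$.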