I. Introduction
A communication system can be modeled by an abstract channel with a set of inputs at the transmitter and a set of corresponding outputs at the receiver. Often times the transmitted symbols (inputs) are different from the receiving symbols (outputs), i.e., errors occur due to many factors such as the physics of signal propagation through a medium or thermal noise. Thus, the goal of a communication system is to transmit the information reliably at the fastest rate. The fastest achievable rate with the error approaching zero for a given channel is the channel capacity which is the maximum mutual information between the input and output random variables. In the case of discrete memoryless channels (DMC), a channel matrix is used to specify the property of the transmissions. Furthermore, for a given channel matrix, the mutual information is a concave function of the input probability mass function (pmf), thus there are efficient convex algorithms to find the channel capacity [1] [2] [3] . On the other hand, in many real-world scenarios, the channel matrix is not given. Rather, the channel matrix is a result of the engineering design under many considerations such as complexity of circuit implementations, power consumption, encoding/decoding speeds, and so on. In this case, the entries in the channel matrix are also the variables to be optimized. Consequently, the mutual information is no longer a concave function of the input pmf, but is a possibly non-concave/convex function in both input pmf and the entries of the channel matrix. Thus, the problem becomes more challenging.