I. Introduction
In a time nonhomogeneous Markov chain (TNHMC), the states spaces, transition probabilities, and the reward functions may be different at different times. The difficulties encountered in studying the average reward of such Markov chains include: 1) notions crucial to time homogeneous Markov chains (THMC's), such as ergodicity, stationarity, periodicity, and recurrent and transient states, no longer apply; 2) state classification has not been established; and 3) the average reward criterion is under-selective; i.e., it does not depend on the transition probabilities and rewards in any finite period, and thus the problem is not amenable to dynamic programming.