
State Classification of Time-Nonhomogeneous Markov Chains and Average Reward Optimization of Multi-Chains


Abstract:

In a discrete time nonhomogeneous Markov chain (TNHMC), the state spaces, transition probabilities, and reward functions at different times may be different. In this paper, with the confluencity previously introduced, we show that the states of a TNHMC can be classified into the branching states and a number of classes of confluent states (versus the transient and recurrent states in the time homogeneous case). The optimization of average reward in TNHMCs consisting of a single confluent class (uni-chain) has been addressed in a previous paper by the author. In this paper, we show that with confluencity and the state classification and under some bound conditions, we can obtain the necessary and sufficient conditions for optimal policies of the average reward of TNHMCs consisting of multiple confluent classes (multi-chains). Just as in the uni-chain TNHMC case, the sufficient condition does not need to hold on any "zero frequently visited" time sequence. This "under-selectivity" makes the problem not amenable to dynamic programming. A direct comparison based approach is used to prove the results. The results enhance our understanding of state classification and performance optimization with the notion of confluencity.
Published in: IEEE Transactions on Automatic Control ( Volume: 61, Issue: 10, October 2016)
Page(s): 3001 - 3015
Date of Publication: 17 December 2015


I. Introduction

In a time nonhomogeneous Markov chain (TNHMC), the state spaces, transition probabilities, and reward functions may be different at different times. The difficulties encountered in studying the average reward of such Markov chains include: 1) notions crucial to time homogeneous Markov chains (THMCs), such as ergodicity, stationarity, periodicity, and recurrent and transient states, no longer apply; 2) state classification has not been established; and 3) the average reward criterion is under-selective; i.e., it does not depend on the transition probabilities and rewards in any finite period, and thus the problem is not amenable to dynamic programming.
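The under-selectivity in point 3) can be seen numerically: because the average reward divides the accumulated reward by the horizon length, any change confined to a finite prefix of the chain washes out in the limit. The sketch below (illustrative only; the function names, the alternating transition matrices, and the specific rewards are assumptions for the example, not from the paper) simulates a small two-state TNHMC by propagating the state distribution through time-varying transition matrices, and shows that perturbing the rewards over the first 100 steps barely moves the long-horizon average.

```python
import numpy as np

def average_reward(P_of_t, r_of_t, x0, T):
    """Expected average reward over horizon T for a finite-state TNHMC.

    P_of_t(t) -> transition matrix at time t (row-stochastic).
    r_of_t(t) -> reward vector at time t.
    The state distribution is propagated forward and the stage
    rewards are averaged over the horizon.
    """
    n = len(r_of_t(0))
    dist = np.zeros(n)
    dist[x0] = 1.0          # start in state x0 with probability 1
    total = 0.0
    for t in range(T):
        total += dist @ r_of_t(t)   # expected reward at time t
        dist = dist @ P_of_t(t)     # one-step distribution update
    return total / T

# A time-nonhomogeneous chain: the transition matrix alternates with t.
P_even = np.array([[0.9, 0.1], [0.2, 0.8]])
P_odd  = np.array([[0.5, 0.5], [0.5, 0.5]])
P_of_t = lambda t: P_even if t % 2 == 0 else P_odd

# Baseline rewards, and a version perturbed only on the finite
# prefix t < 100. Under-selectivity: the long-run average cannot
# distinguish the two, since the prefix contributes O(1/T).
r_base = lambda t: np.array([1.0, 0.0])
r_pert = lambda t: np.array([5.0, 5.0]) if t < 100 else np.array([1.0, 0.0])

T = 200_000
g_base = average_reward(P_of_t, r_base, 0, T)
g_pert = average_reward(P_of_t, r_pert, 0, T)
# The gap is bounded by (max reward change) * 100 / T, here <= 0.0025.
```

The same insensitivity is what defeats dynamic programming here: backward induction distinguishes policies by their behavior at every stage, while the average reward criterion ignores any "zero frequently visited" set of stages.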
