
State Classification of Time-Nonhomogeneous Markov Chains and Average Reward Optimization of Multi-Chains


Abstract:

In a discrete time nonhomogeneous Markov chain (TNHMC), the state spaces, transition probabilities, and reward functions at different times may be different. In this paper, with the confluencity previously introduced, we show that the states of a TNHMC can be classified into the branching states and a number of classes of confluent states (versus the transient and recurrent states in the time homogeneous case). The optimization of average reward in TNHMCs consisting of a single confluent class (uni-chain) has been addressed in a previous paper by the author. In this paper, we show that with confluencity and the state classification and under some bound conditions, we can obtain the necessary and sufficient conditions for optimal policies of the average reward of TNHMCs consisting of multiple confluent classes (multi-chains). Just as in the uni-chain TNHMC case, the sufficient condition does not need to hold on any "zero frequently visited" time sequence. This "under-selectivity" makes the problem not amenable to dynamic programming. A direct comparison based approach is used to prove the results. The results enhance our understanding of state classification and performance optimization with the notion of confluencity.
Published in: IEEE Transactions on Automatic Control ( Volume: 61, Issue: 10, October 2016)
Page(s): 3001 - 3015
Date of Publication: 17 December 2015


I. Introduction

In a time nonhomogeneous Markov chain (TNHMC), the state spaces, transition probabilities, and reward functions may be different at different times. The difficulties encountered in studying the average reward of such Markov chains include: 1) notions crucial to time homogeneous Markov chains (THMCs), such as ergodicity, stationarity, periodicity, and recurrent and transient states, no longer apply; 2) state classification has not been established; and 3) the average reward criterion is under-selective; i.e., it does not depend on the transition probabilities and rewards in any finite period, and thus the problem is not amenable to dynamic programming.
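The under-selectivity in point 3) can be seen numerically: because the average reward divides the accumulated reward by the horizon length, any change confined to a finite prefix of the chain washes out in the limit. The sketch below (illustrative only; the function names, the alternating transition matrices, and the specific rewards are assumptions for the example, not from the paper) simulates a small two-state TNHMC by propagating the state distribution through time-varying transition matrices, and shows that perturbing the rewards over the first 100 steps barely moves the long-horizon average.

```python
import numpy as np

def average_reward(P_of_t, r_of_t, x0, T):
    """Expected average reward over horizon T for a finite-state TNHMC.

    P_of_t(t) -> transition matrix at time t (row-stochastic).
    r_of_t(t) -> reward vector at time t.
    The state distribution is propagated forward and the stage
    rewards are averaged over the horizon.
    """
    n = len(r_of_t(0))
    dist = np.zeros(n)
    dist[x0] = 1.0          # start in state x0 with probability 1
    total = 0.0
    for t in range(T):
        total += dist @ r_of_t(t)   # expected reward at time t
        dist = dist @ P_of_t(t)     # one-step distribution update
    return total / T

# A time-nonhomogeneous chain: the transition matrix alternates with t.
P_even = np.array([[0.9, 0.1], [0.2, 0.8]])
P_odd  = np.array([[0.5, 0.5], [0.5, 0.5]])
P_of_t = lambda t: P_even if t % 2 == 0 else P_odd

# Baseline rewards, and a version perturbed only on the finite
# prefix t < 100. Under-selectivity: the long-run average cannot
# distinguish the two, since the prefix contributes O(1/T).
r_base = lambda t: np.array([1.0, 0.0])
r_pert = lambda t: np.array([5.0, 5.0]) if t < 100 else np.array([1.0, 0.0])

T = 200_000
g_base = average_reward(P_of_t, r_base, 0, T)
g_pert = average_reward(P_of_t, r_pert, 0, T)
# The gap is bounded by (max reward change) * 100 / T, here <= 0.0025.
```

The same insensitivity is what defeats dynamic programming here: backward induction distinguishes policies by their behavior at every stage, while the average reward criterion ignores any "zero frequently visited" set of stages.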
