I. INTRODUCTION
Partially Observable Markov Decision Processes (POMDPs) have a wide range of applications, including finance, medicine, communications, and others; maintenance is a major area of POMDP application. Formally, a Markov Decision Process (MDP) can be defined as a dynamic decision-making framework that aims to optimally control a Markov stochastic process over a given number of future stages, where a set of available control actions influences the state transitions of the Markov chain at each stage. A POMDP is a generalization of the MDP in which the true state of the system is not known exactly to the controller or decision maker; instead, an output signal can be measured from the system. This signal is assumed to be probabilistically related to the true state of the system. Hence, the system is “partially observed,” and control actions are made based on the “belief state” of the system. The belief state is a state occupancy vector with one element per system state; each element represents the probability that the system is in the corresponding state. A POMDP decision-making framework mainly consists of the following steps:
Take control action
Gain or loss takes place
Observe a signal from the system
Update belief state (state occupancy vector)
Start next stage
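The belief-state update in these steps is a Bayes filter: after taking an action and observing a signal, the prior belief is propagated through the transition probabilities and then reweighted by the observation likelihoods. The following is a minimal sketch of one such update; the two-state "maintenance" model, the transition matrix `T`, and the observation matrix `O` are illustrative assumptions, not taken from the text.

```python
# Sketch of a belief-state update for a POMDP, assuming a toy
# two-state maintenance model (matrices below are illustrative).

def update_belief(belief, T, O, signal):
    """Bayes update: b'(s') is proportional to O[s'][signal] * sum_s T[s][s'] * b[s]."""
    n = len(belief)
    # Prediction step: propagate the belief through the transition matrix.
    predicted = [sum(belief[s] * T[s][s2] for s in range(n)) for s2 in range(n)]
    # Correction step: weight each state by the likelihood of the observed signal.
    unnorm = [O[s2][signal] * predicted[s2] for s2 in range(n)]
    total = sum(unnorm)
    # Normalize so the belief remains a probability (state occupancy) vector.
    return [p / total for p in unnorm]

# Illustrative two-state system: state 0 = "good", state 1 = "degraded".
T = [[0.9, 0.1],   # transition probabilities under some fixed action
     [0.0, 1.0]]   # degradation is absorbing in this toy model
O = [[0.8, 0.2],   # P(signal | true state); signal 0 suggests "good"
     [0.3, 0.7]]

belief = [0.5, 0.5]                         # start with no information
belief = update_belief(belief, T, O, signal=0)
# Observing signal 0 shifts the belief toward the "good" state.
```

In a full POMDP controller, this update would run once per stage, between observing the signal and choosing the next control action.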