I. Introduction
Partially observable Markov decision processes (POMDPs) [1] represent a planning problem in which an agent performs actions and obtains sensor observations with the goal of maximizing the total long-term reward. POMDPs can model noise in both the sensors and the actuators of a robotic agent. Solving a POMDP means computing an action policy that maximizes the total accumulated reward under an arbitrary reward function: the optimal policy specifies the best action for every possible sequence of observations, so that the expected total reward is maximized under the POMDP's model of sensing and actuation uncertainty. POMDPs have been established as a tool for a variety of tasks in robot soccer [2], household robotics [3], coastal survey [4], and even nursing assistance [5].
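To make this concrete, a standard formulation (the notation below is conventional and not taken from this paper) specifies a POMDP as a tuple $(S, A, O, T, Z, R, \gamma)$, where $T(s' \mid s, a)$ captures actuation noise, $Z(o \mid s', a)$ captures sensing noise, $R(s, a)$ is the reward function, and $\gamma \in [0, 1)$ discounts future rewards. Because the state is hidden, the agent acts on a belief $b_t$, a probability distribution over states updated from actions and observations, and the optimal policy maximizes the expected discounted return:
\[
\pi^{*} = \operatorname*{arg\,max}_{\pi} \; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, \pi(b_t))\right].
\]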