I. Introduction
The objective for traditional robotic motion planning is to compute a trajectory that will move the robot between two valid poses while respecting all physical constraints [1] – [3]. In practice, however, robots suffer from unexpected events like noisy actuation of a wheeled base when navigating uneven terrain, imperfect observations due to unfavorable sensing conditions, or an incorrect map if objects have moved around a room. These kinds of disturbances force the robot to deviate from its current course into a state from which there may be no clear path to the goal. Replanning is one option to address these detours, but this can be computationally prohibitive if the deviations are frequent or constant. A more robust strategy to combat motion planning under uncertainty is to model the problem as a stochastic decision process where the solution is not a single trajectory, but rather a control policy over all possible states of the system to maximize an objective function [3], [4].