I. Introduction
Humans internally develop and rely on models of the world around them to make goal-oriented decisions, drawing on internalized strategies shaped by their goals and observations. In collaborative tasks, these strategies must accommodate teammates' behaviors in order to meet shared objectives. Robots that collaborate with humans must emulate this ability to be effective teammates [1], [2]. Achieving this capability is especially difficult due to the inherently multimodal nature of human interactions. Moreover, even when robots are equipped with a prior over human actions, those actions remain extremely noisy and difficult to predict, especially in tasks with complex, continuous action and state spaces. Learning sampling distributions over future motion sequences from successful human demonstrations not only allows the robot to plan over potential outcomes, but also to generate viable actions that result in more fluent interactions (e.g., not pushing the table towards each other, or pulling in opposite directions).