I. Introduction
While robots have reached the hardware capabilities needed to tackle a wide range of household tasks, generating and executing such motions remains an open problem. The efficient collection of diverse robotic data has become a key factor in teaching such motions via imitation learning [1], [2], [3], [4], [5]. Although a wide variety of interfaces, teleoperation methods, and kinesthetic teaching approaches exist for static manipulators, collecting demonstrations for mobile manipulation platforms is still challenging. Their large number of degrees of freedom (DoF) often overwhelms standard input methods such as joysticks and keyboards, or imposes a high cognitive load on operators who must coordinate all the necessary buttons and joysticks. While motion tracking systems [6], [7], [8], [9] and exoskeletons [4], [10], [11] provide more intuitive interfaces, they are confronted with the correspondence problem when the morphologies of robot and human do not match. Furthermore, exoskeletons are highly specialized, expensive equipment, and tracking-based methods restrict the operator to the tracked area, preventing them from moving freely with the mobile robot and forcing them to operate from afar.