I. Introduction
Body language, the use of body pose and especially motion for the purpose of communication, is a fast, intuitive and prevalent modality for negotiating courses of action in human-human interaction, especially in collaborative tasks [11], [3], [5]. Because of its ubiquity in human-human interaction, even untrained users could quickly learn to interact with robots that communicate through body language. To do so, robots need to modulate properties ascribed to motion [3], react visibly, and above all react in a timely manner to signals from their interaction partners. As an example, consider the task of handing over an object. It can usually be achieved sufficiently well in several distinct ways, e.g. with the left or the right hand. To succeed, however, both parties have to agree upon a mutually consistent course of action [11].
Traditionally, robots determine this course of action at specific points in time (discrete decision events). A decision is always made before an action is executed, and it is not reassessed until the action has been completed. This behavior stems from the use of discrete state machines (e.g. hybrid automata, MDPs, grid worlds) for partitioning interactions. Their state graphs provide numerous advantages for learning, reasoning and planning, especially when the number of states is small. But they also impose a coarse discretization of time, which makes them particularly ill-suited to reacting promptly and smoothly to continuous streams of perceptual information.
Conventional state machines are also unable to execute actions speculatively, i.e. they cannot start a reversible action (e.g. reaching out for a handover) in order to propose that course of action to the interaction partner without committing to its completion. With speculative execution, the robot buys itself time to observe corroborating or contradicting body motion and thus to reaffirm or reconsider its preliminary choice.
If the robot guesses correctly the first time, which is likely given cultural norms or frequent repetition, then no extra time is spent on communication, and the interaction is fluent and swift. Timely and proportional feedback via motion also makes robots easier to understand [4]. Further, body language is amenable to imitation, i.e. humans can discover ways to interact with a robot by observing others interact with it. Robots that are aware of body motion may therefore be more intuitive for non-expert users.
One could argue that the coarse discretization of time could be counteracted by a finer-grained discretization of the state space, but this would forfeit all advantages of a small state graph.
Example of facilitating reactive interaction using non-instantaneous state transitions. At first, the robot does not observe any pose that indicates whether the left or the right hand is preferred. To signal its intent to pass the object, it starts to move. It conveys indecision by progressing slowly and mixing both possible reach motions. The human notices the robot’s intent and reaches out, signaling a preferred side. The robot quickly disambiguates its reach motion to acknowledge the human’s proposal.
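The handover example can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the method of this paper: the function `blended_reach_step`, the belief `p_left`, the linear blending of goals, and the speed scaling are all hypothetical choices made here for clarity. The sketch shows one control step in which the end effector moves toward a belief-weighted mix of the two reach goals, slowly while undecided and faster once the partner's preference is clear.

```python
import numpy as np

def blended_reach_step(pos, left_goal, right_goal, p_left,
                       max_speed=0.2, dt=0.05):
    """One control step toward a belief-weighted mix of two reach goals.

    Hypothetical illustration: p_left is an assumed belief in [0, 1]
    that the partner prefers a left-hand handover.
    """
    # Mix both candidate reach targets according to the current belief.
    target = p_left * left_goal + (1.0 - p_left) * right_goal
    # Confidence is 0 when undecided (p_left = 0.5) and 1 when certain.
    confidence = abs(2.0 * p_left - 1.0)
    # Progress slowly while undecided, faster once the belief is decisive.
    speed = max_speed * (0.2 + 0.8 * confidence)
    direction = target - pos
    dist = np.linalg.norm(direction)
    if dist < 1e-9:
        return pos.copy()  # already at the blended target
    # Never overshoot the target within a single step.
    step = min(speed * dt, dist)
    return pos + step * direction / dist
```

Calling the function repeatedly while `p_left` is updated from observed body motion would yield the behavior in the example: a slow, ambiguous reach at first, followed by a quick commitment to one side. Because each step is small and the target is recomputed every step, the motion remains reversible until the goal is reached.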