I. Introduction
Human motion capture data, which precisely record human motion, have been widely used to drive many applications, such as movie production [1], medical rehabilitation [2], [3], and humanoid robots [4], [5]. Yet the high cost of motion capture and the performance limitations of actors hinder their wider application. Motion synthesis technologies [6], [7], [8], [9], which can generate motion data without motion capture, have therefore drawn much research attention. However, starting from the conventional motion representation (global joint positions or rotations), many current motion synthesis methods [6], [7], [10] encapsulate the high degrees of freedom (DOFs) of the motion data to avoid breaking the naturalness of human motion. This encapsulation limits both precise control of the synthesis process and the diversity of the generated motions.