1. Introduction
Human motion synthesis has recently developed rapidly in a multi-modal generative fashion. Various condition inputs, such as music [34, 33, 32], control signals [65, 45, 66], action categories [46, 19], and natural language descriptions [69, 18, 16, 47, 2, 2], provide a more convenient and human-friendly way to animate virtual characters or even control humanoid robots. This will benefit numerous applications in the game industry, film production, VR/AR, and robotic assistance.
Our motion latent-based diffusion (MLD) model achieves high-quality and diverse motion generation given a text prompt. Darker colors indicate later points in time, and each colored word refers to the motion whose trajectory has the same color.