Executing your Commands via Motion Diffusion in Latent Space

Abstract:

We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Since human motions are highly diverse and follow a distribution quite different from that of the conditional modalities, such as textual descriptors in natural language, it is hard to learn a probabilistic mapping from the desired conditional modality to human motion sequences. Besides, raw motion data from a motion capture system may be redundant across the sequence and contain noise; directly modeling the joint distribution over raw motion sequences and conditional modalities would incur heavy computational overhead and might introduce artifacts caused by the captured noise. To learn a better representation of diverse human motion sequences, we first design a powerful Variational AutoEncoder (VAE) and arrive at a representative, low-dimensional latent code for a human motion sequence. Then, instead of using a diffusion model to establish connections between the raw motion sequences and the conditional inputs, we perform the diffusion process in the motion latent space. Our proposed Motion Latent-based Diffusion model (MLD) can produce vivid motion sequences conforming to the given conditional inputs and substantially reduces the computational overhead in both the training and inference stages. Extensive experiments on various human motion generation tasks demonstrate that MLD achieves significant improvements over state-of-the-art methods, while being two orders of magnitude faster than previous diffusion models that operate on raw motion sequences.
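
The core idea in the abstract, running the diffusion process on a compact latent code instead of on raw motion frames, can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only the standard DDPM-style forward noising step applied to a stand-in latent vector. The latent size, the raw sequence shape, the linear noise schedule, and all function names below are assumptions chosen for illustration, and the random vector `z0` merely stands in for the output of a trained VAE encoder.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (illustrative values, not taken from the paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t), one value per diffusion step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(z0, t, abar, rng):
    """Draw z_t ~ N(sqrt(abar_t) * z0, (1 - abar_t) * I) and return (z_t, noise)."""
    scale = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [scale * z + sigma * e for z, e in zip(z0, noise)]
    return zt, noise

# Hypothetical sizes: one latent vector per motion vs. a raw frame-by-feature grid.
LATENT_DIM = 256
RAW_FRAMES, RAW_FEATS = 196, 263

rng = random.Random(0)
z0 = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]  # stand-in for a VAE latent
abar = alpha_bars(linear_betas(1000))

# Noise the latent at an intermediate step; a denoiser would be trained to predict eps.
zt, eps = q_sample(z0, 500, abar, rng)

# Ratio of values a denoiser would process per sample: raw sequence vs. latent.
reduction = (RAW_FRAMES * RAW_FEATS) / LATENT_DIM
```

Under these assumed sizes, each denoising step touches a few hundred values rather than tens of thousands, which gives some intuition for why working in the latent space reduces cost. The speedup reported in the abstract also depends on factors this sketch does not model.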

Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Date of Conference: 17-24 June 2023

Date Added to IEEE Xplore: 22 August 2023

DOI: 10.1109/CVPR52729.2023.01726

Conference Location: Vancouver, BC, Canada

1. Introduction

Human motion synthesis has recently developed rapidly in a multi-modal generative fashion. Various conditional inputs, such as music [34, 33, 32], control signals [65, 45, 66], action categories [46, 19], and natural language descriptions [69, 18, 16, 47, 2, 2], provide a more convenient and human-friendly way to animate virtual characters or even control humanoid robots. This will benefit numerous applications in the game industry, film production, VR/AR, and robotic assistance.

Figure 1. Our motion latent-based diffusion (MLD) model can achieve high-quality and diverse motion generation given a text prompt. Darker colors indicate later points in time, and the colored words refer to the motions with the same-colored trajectory.
