Human Motion Generation: A Survey | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey could provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.
Page(s): 2430 - 2449
Date of Publication: 08 November 2023

PubMed ID: 37938938

I. Introduction

Humans plan and execute body motions based on their intentions and environmental stimuli [1], [2]. As an essential goal of artificial intelligence, generating human-like motion patterns has gained increasing interest from various research communities, including computer vision [3], [4], computer graphics [5], [6], multimedia [7], [8], robotics [9], [10], and human-computer interaction [11], [12]. Human motion generation aims to produce natural, realistic, and diverse human motions for a wide range of applications, including film production, video games, AR/VR, human-robot interaction, and digital humans.

References
[1] B. Hommel, "Toward an action-concept model of stimulus-response compatibility", Adv. Psychol., vol. 118, pp. 281-320, 1997.
[2] S.-J. Blakemore and J. Decety, "From the perception of action to the understanding of intention", Nature Rev. Neurosci., vol. 2, no. 8, pp. 561-567, 2001.
[3] C. Guo et al., "Generating diverse and natural 3D human motions from text", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 5152-5161, 2022.
[4] J. Kim, H. Oh, S. Kim, H. Tong and S. Lee, "A brand new dance partner: Music-conditioned pluralistic dancing controlled by multiple dance genres", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3490-3500, 2022.
[5] S. Alexanderson, R. Nagy, J. Beskow and G. E. Henter, "Listen, denoise, action! Audio-driven motion synthesis with diffusion models", ACM Trans. Graph., vol. 42, no. 4, pp. 1-20, 2023.
[6] T. Ao, Z. Zhang and L. Liu, "GestureDiffuCLIP: Gesture diffusion model with CLIP latents", ACM Trans. Graph., vol. 42, 2023.
[7] C. Guo et al., "Action2Motion: Conditioned generation of 3D human motions", Proc. ACM Int. Conf. Multimedia, pp. 2021-2029, 2020.
[8] J. Gao, J. Pu, H. Zhang, Y. Shan and W.-S. Zheng, "PC-Dance: Posture-controllable music-driven dance synthesis", Proc. ACM Int. Conf. Multimedia, pp. 1261-1269, 2022.
[9] Y. Nishimura, Y. Nakamura and H. Ishiguro, "Long-term motion generation for interactive humanoid robots using GAN with convolutional network", Proc. ACM/IEEE Int. Conf. Hum.-Robot Interact., pp. 375-377, 2020.
[10] G. Gulletta, W. Erlhagen and E. Bicho, "Human-like arm motion generation: A review", Robotics, vol. 9, no. 4, p. 102, 2020.
[11] T. Kucherenko, D. Hasegawa, G. E. Henter, N. Kaneko and H. Kjellström, "Analyzing input and output representations for speech-driven gesture generation", Proc. Int. Conf. Intell. Virtual Agents, pp. 97-104, 2019.
[12] T. Yin, L. Hoyet, M. Christie, M.-P. Cani and J. Pettré, "The one-man-crowd: Single user generation of crowd motions using virtual reality", IEEE Trans. Vis. Comput. Graph., vol. 28, no. 5, pp. 2245-2255, May 2022.
[13] A. Chang et al., "Matterport3D: Learning from RGB-D data in indoor environments", Proc. Int. Conf. 3D Vis., pp. 667-676, 2017.
[14] G. Tevet, S. Raab, B. Gordon, Y. Shafir, D. Cohen-Or and A. H. Bermano, "Human motion diffusion model", Proc. Int. Conf. Learn. Representations, 2023.
[15] J. Tseng, R. Castellon and K. Liu, "EDGE: Editable dance generation from music", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 448-458, 2023.
[16] Z. Wang, Y. Chen, T. Liu, Y. Zhu, W. Liang and S. Huang, "HUMANISE: Language-conditioned human motion generation in 3D scenes", Proc. Adv. Neural Inf. Process. Syst., pp. 14959-14971, 2022.
[17] Y. LeCun, Y. Bengio and G. Hinton, "Deep learning", Nature, vol. 521, pp. 436-444, 2015.
[18] Y. Bengio, R. Ducharme and P. Vincent, "A neural probabilistic language model", Proc. Adv. Neural Inf. Process. Syst., pp. 932-938, 2000.
[19] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes", Proc. Int. Conf. Learn. Representations, 2014.
[20] D. Rezende and S. Mohamed, "Variational inference with normalizing flows", Proc. Int. Conf. Mach. Learn., pp. 1530-1538, 2015.
[21] I. Goodfellow et al., "Generative adversarial nets", Proc. Adv. Neural Inf. Process. Syst., pp. 2672-2680, 2014.
[22] J. Ho, A. Jain and P. Abbeel, "Denoising diffusion probabilistic models", Proc. Adv. Neural Inf. Process. Syst., 2020.
[23] T. Brown et al., "Language models are few-shot learners", Proc. Adv. Neural Inf. Process. Syst., pp. 1877-1901, 2020.
[24] L. Ouyang et al., "Training language models to follow instructions with human feedback", Proc. Adv. Neural Inf. Process. Syst., pp. 27730-27744, 2022.
[25] T. Karras, S. Laine and T. Aila, "A style-based generator architecture for generative adversarial networks", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4396-4405, 2019.
[26] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen and T. Aila, "Analyzing and improving the image quality of StyleGAN", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 8107-8116, 2020.
[27] T. Karras et al., "Alias-free generative adversarial networks", Proc. Adv. Neural Inf. Process. Syst., pp. 852-863, 2021.
[28] J. Ho, T. Salimans, A. Gritsenko, W. Chan, M. Norouzi and D. J. Fleet, "Video diffusion models", 2022.
[29] I. Skorokhodov, S. Tulyakov and M. Elhoseiny, "StyleGAN-V: A continuous video generator with the price, image quality and perks of StyleGAN2", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3626-3636, 2022.
[30] S. Yu et al., "Generating videos with dynamics-aware implicit generative adversarial networks", Proc. Int. Conf. Learn. Representations, 2022.