I. Introduction
Humans plan and execute body motions based on their intentions and environmental stimuli [1], [2]. As an essential goal of artificial intelligence, generating human-like motion patterns has attracted increasing interest from various research communities, including computer vision [3], [4], computer graphics [5], [6], multimedia [7], [8], robotics [9], [10], and human-computer interaction [11], [12]. Human motion generation aims to produce natural, realistic, and diverse human motions that can be used in a wide range of applications, including film production, video games, AR/VR, human-robot interaction, and digital humans.