I. Introduction
Nowadays, the 3D human skeleton is widely used to estimate future poses of the human body from its previous activity. Human motion prediction is becoming increasingly popular because it enables computers to comprehend human activity. Skeleton-based human motion prediction can be used in various computer vision and robotics applications, including human-computer interaction, autonomous driving, and pedestrian tracking [1], [2]. Real-world human motion prediction is challenging because human movement is unpredictable, flexible, and non-linear in nature. Many approaches have been proposed to address this problem, such as traditional state-based and deep-network approaches, which learn motion sequences from grouped posture features [3]. Although these approaches achieve satisfactory results and address some of the inherent challenges, modeling the spatial and temporal relationships among human body joints still leaves room for improvement in this domain. Temporal relationships capture the inter-frame interactions that make up continuous movement, while spatial relationships capture the underlying pose within each frame. Spatial features can therefore be computed from the spatial deviation within each frame of the activity sequence, whereas temporal features describe how the pose changes over time. It thus seems promising to estimate the temporal change at each frame of the activity sequence.
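The section does not define exactly how the spatial deviation or the temporal change of a frame is computed. As a minimal sketch under stated assumptions, the spatial feature of a frame can be taken as each joint's offset from a root joint (assumed here to be joint 0, e.g. the pelvis), and the temporal change as the per-joint displacement between consecutive frames. The frame representation and the names `spatial_features` and `temporal_features` are hypothetical and illustrative only; they are not the method proposed in this work.

```python
from typing import List, Tuple

# Hypothetical representation: a frame is a list of (x, y, z) joint coordinates,
# and an activity sequence is a list of frames with the same joint ordering.
Joint = Tuple[float, float, float]
Frame = List[Joint]


def spatial_features(frame: Frame, root: int = 0) -> Frame:
    """Spatial deviation within one frame: each joint's offset from the root joint.

    Assumption: joint index `root` (default 0) is the skeleton's root, e.g. the pelvis.
    """
    rx, ry, rz = frame[root]
    return [(x - rx, y - ry, z - rz) for x, y, z in frame]


def temporal_features(seq: List[Frame]) -> List[Frame]:
    """Temporal change: per-joint displacement from each frame to the next.

    Returns len(seq) - 1 frames of displacement vectors.
    """
    return [
        [(x1 - x0, y1 - y0, z1 - z0) for (x0, y0, z0), (x1, y1, z1) in zip(prev, curr)]
        for prev, curr in zip(seq, seq[1:])
    ]


# Usage: a toy two-frame sequence with two joints (root and one child joint).
seq = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.5, 0.0, 0.0), (0.5, 1.5, 0.0)],
]
print(spatial_features(seq[1]))  # [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
print(temporal_features(seq))    # [[(0.5, 0.0, 0.0), (0.5, 0.5, 0.0)]]
```

Root-relative offsets make the spatial feature independent of where the body is in the scene, while frame-to-frame differences isolate motion; both are common, simple choices but only stand in for whatever feature definitions the full method uses.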