I. Introduction
Understanding human behavior is indispensable for automating many tasks. Human body motion analysis is a branch of human behavior understanding, and it is fundamental to various applications, such as surveillance [1]–[3], human–computer interaction [4]–[6], health care for the elderly [7], [8], human gait analysis [9], [10], and robotics [11]–[16]. A variety of motion capture systems have been built to capture human motion using different techniques, including wearable devices and sensors, multiple cameras, or even a single camera. Some of these systems combine several devices to improve capture quality and to obtain accurate positions of the human body joints. Recently, rapid developments in machine learning algorithms [17], [18] have greatly improved the quality of motion data. Microsoft Kinect [19] can accurately estimate three-dimensional (3-D) human body joints, forming a skeleton model from a single depth image using the method in [20]. In [21], real-time motion capture from a single RGB image is achieved with a deep convolutional neural network (CNN) that estimates accurate 3-D human body poses.