1. Introduction
3D hand pose estimation is essential in many applications, from action recognition and sign language translation to AR/VR [19], [20]. The field has improved significantly in recent years, but this progress relies heavily on the emergence of hand pose datasets with accurate 3D annotations. Acquiring such labeled datasets is time-consuming and laborious, which poses a practical challenge: deep learning models must learn from limited and noisy data.