I. Introduction
Human action recognition from ground-based and/or airborne video (e.g., from unmanned aerial vehicles) is a challenging problem that has received much attention in the computer vision and robotics literature. Action recognition can enhance human-agent teaming through gesture communication, aid search and rescue efforts, enable learning by social imitation, and increase social awareness (e.g., in autonomous driving). In recent years, deep neural networks (DNNs) have achieved remarkable performance on action classification [2]–[11]. While classification accuracy is critical, data efficiency and robustness to varying viewpoints are equally important.