Multi-Modal Transformer with Skeleton and Text for Action Recognition | IEEE Conference Publication | IEEE Xplore