CapFormer: A Space-Time Video Description Model using Joint-Attention Transformer | IEEE Conference Publication | IEEE Xplore