Human-Centric Spatio-Temporal Video Grounding With Visual Transformers | IEEE Journals & Magazine | IEEE Xplore