Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning | IEEE Conference Publication | IEEE Xplore