Distilling Vision-Language Pre-Training to Collaborate with Weakly-Supervised Temporal Action Localization | IEEE Conference Publication | IEEE Xplore