I. Introduction
Weakly-supervised Temporal Action Localization (WTAL) aims to precisely localize the temporal boundaries of action instances in untrimmed videos and to identify their action categories, using only video-level labels for training. Because obtaining fine-grained temporal annotations is costly and often impractical in real-world scenarios, WTAL has attracted increasing research attention.