I. Introduction
Video analysis and understanding have received extensive attention from academia and industry because of their broad applications in many fields. One important task is action recognition, which classifies the action in a trimmed action segment. However, most real-world videos are long and untrimmed. Therefore, to apply existing action recognition algorithms, one must first know when the action occurs. Recently, researchers have paid more attention to a related task termed temporal action localization, in which the category of each action instance in an untrimmed video is recognized together with its start and end times.