I. Introduction
As one of the core technologies in the field of video under standing, the importance of Temporal Action Detection (TAD) is not only reflected in the accurate recognition and classification of actions in videos, but also in the close integration with other tasks. For example, in the field of target tracking, TAD can be used to identify and locate the target in the video to provide accurate initialization information for target tracking; in the field of behavior recognition, TAD can be used to extract behavioral features in the video to improve the accuracy and robustness of behavior recognition. With the booming development of deep learning technology, research in the field of TAD has made great progress. However, despite the many achievements, TAD still faces challenges in practical applications. For example, issues such as the duration diversity of actions and the ambiguity of boundaries make temporal action detection still a challenging research area.