I. Introduction
In recent years, extensive efforts [1], [2], [3], [4], [5], [6] have been dedicated to the task of multi-label image recognition (MLR), as it benefits applications ranging from video parsing [7], [8], [9] and scene recognition [10], [11], [12] to human activity analysis [13], [14], [15] and facial analysis [16], [17], [18]. Despite impressive progress, existing approaches presume that each training sample carries complete labels, so collecting large-scale multi-label datasets for training MLR models requires significant staffing and resources. Such a collection process is time-consuming, expensive, and impractical, especially when the number of target categories is large. In light of this, the community's attention has shifted towards weakly supervised multi-label image recognition (WSMLR) [19], [20], [21], which aims to reduce the dependence on complete annotations of multi-label datasets. Among the different settings of WSMLR, multi-label recognition with partial labels (MLR-PL) [22], [23], [24], [25] has attracted growing attention since it allows MLR models to be trained with incomplete labels per image.