I. Introduction
Recommender systems play an irreplaceable role in various online platforms [1], [2], [3], where they facilitate information seeking by providing personalized services. The canonical paradigm casts recommendation as a machine learning problem: a model estimates the interaction likelihood of each user-item pair, and recommendations are made from these estimates. The de facto standard is to learn the recommender model from historical interactions, an approach that, however, suffers from severe data sparsity [4]. The ratio of missing data, i.e., user-item pairs lacking an interaction label, can reach 99% in many practical cases such as e-commerce [2] and social media [3], owing to the huge size of the candidate item set, which typically grows over time. Worse still, historical interactions are unevenly distributed over items: long-tail items have more missing data, leading to notorious issues such as popularity bias [5]. Therefore, it is essential to properly account for missing data in recommender training.
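To make the notion of a missing-data ratio concrete, the following minimal sketch computes it for a toy interaction log, both overall and per item. The function names, the (user, item) pair representation, and the toy data are assumptions made for illustration only; they are not taken from any dataset or method discussed in this paper.

```python
from collections import Counter

def missing_ratio(interactions, num_users, num_items):
    """Fraction of user-item pairs that have no observed interaction label."""
    observed = len(set(interactions))  # deduplicate repeated (user, item) pairs
    return 1.0 - observed / (num_users * num_items)

def per_item_missing_ratio(interactions, num_users, num_items):
    """For each item, the share of users with no observed interaction with it."""
    counts = Counter(item for _, item in set(interactions))
    return [1.0 - counts.get(i, 0) / num_users for i in range(num_items)]

# Hypothetical toy log of (user, item) pairs: 4 users, 5 items.
# Item 0 is popular; items 2 to 4 are long-tail (item 4 has no interactions).
toy_log = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 2), (3, 3)]

overall = missing_ratio(toy_log, num_users=4, num_items=5)
per_item = per_item_missing_ratio(toy_log, num_users=4, num_items=5)
```

In this toy log, 8 of the 20 possible pairs are observed, and the per-item ratios rise from the popular item toward the long-tail items, mirroring the uneven distribution described above. Real interaction logs are far sparser than this example.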