I. Introduction
Many machine-learning problems can be formulated in the pairwise learning paradigm [1], where the loss function depends on a hypothesis function $f$ evaluated at a pair of samples $x$ and $x'$. For example, area under the curve (AUC) maximization [2], [3], [4], [5], [6] considers a least-square pairwise loss of the form $(1 - f(x) + f(x'))^2$, where $x$ and $x'$ have different labels. Besides AUC maximization, many other machine-learning problems, such as metric learning [7], [8], [9], ranking [10], [11], and multiple kernel learning [12], also use pairwise loss functions. Pairwise learning [13] has therefore become an important research topic in machine learning.
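To make the pairwise structure concrete, the following minimal sketch averages a least-square pairwise loss over all positive/negative pairs. It assumes the commonly used form $(1 - (f(x) - f(x')))^2$ with $x$ positive and $x'$ negative, and a linear hypothesis $f(x) = w^\top x$; the function names and the toy data are illustrative assumptions, not taken from the cited works.

```python
from itertools import product


def linear_score(w, x):
    """Hypothetical linear hypothesis f(x) = <w, x>."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_least_square_loss(scores_pos, scores_neg):
    """Mean of (1 - (s_pos - s_neg))^2 over every positive/negative score pair.

    Assumed form of the least-square AUC surrogate; each term couples two
    samples with different labels, which is what makes the loss pairwise.
    """
    pairs = list(product(scores_pos, scores_neg))
    if not pairs:
        raise ValueError("need at least one positive and one negative score")
    return sum((1.0 - (sp - sn)) ** 2 for sp, sn in pairs) / len(pairs)


# Toy data (illustrative only).
w = [1.0, -1.0]
positives = [[2.0, 1.0], [1.0, 0.0]]  # scores: 1.0 and 1.0
negatives = [[0.0, 0.0], [0.0, 1.0]]  # scores: 0.0 and -1.0

loss = pairwise_least_square_loss(
    [linear_score(w, x) for x in positives],
    [linear_score(w, x) for x in negatives],
)
print(loss)
```

Note that the number of terms grows as the product of the positive and negative sample counts, so enumerating all pairs as done here is only practical for small data sets.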