I. Introduction
Visual attributes are intrinsic properties of images with human-designed names (e.g., “natural,” “smiling”), and they serve as higher-level semantic cues than low-level visual features in many scenarios. For example, researchers have shown that visual attributes are valuable for facial verification [3], object recognition [4]–[6], image retrieval/search [7]–[14], video retrieval and recommendation [15], [16], generating descriptions of unfamiliar objects [17], and transfer learning [18]–[21]. Many attribute mining and learning methods have also been proposed [22]–[24]. In these methods, the attributes are binary, indicating the presence or absence of a certain property in an image. Compared with binary attributes, relative attributes (RA) offer a much richer way for humans to describe objects semantically in terms of relative visual properties. The relative values of an attribute reflect not only whether the attribute appears in an image, but also how strongly it is expressed. For this reason, RA learning has gained much attention and can be used in many applications, especially social event analysis [25]–[28] and zero-shot learning [2], [29]–[31].