1. Introduction
Fashion products have become one of the most consumed product categories in online shopping. Unlike other types of products, fashion products are usually rich in visual elements at different levels of granularity. For instance, besides its overall visual appearance, a fashion product can be described by a set of attributes, such as “shape”, “color” and “style”, each of which focuses on a different aspect of the visual representation. Each attribute can be further divided into classes. For example, “fit”, “flare” and “pencil” are different classes under the attribute “shape” (Fig. 1). Therefore, modeling fashion representations at different granularities is essential for online shopping and other downstream applications, especially those that require analysis of subtle or fine-grained details, such as attribute-based fashion manipulation [1], [2], [27] and retrieval [6], [14], [19], [23], [24], fashion copyright [6], [19], and fashion compatibility analysis [11], [15], [21], [23].
Fig. 1. Left: existing fine-grained representation learning methods often learn only attribute-specific representations for fashion products, and thus may not be able to distinguish two dresses that differ in their composition of visual elements at the class level. Right: our proposed method jointly learns attribute-specific and class-specific representations, and can therefore discriminate between the two dresses by their class-specific representations.