1. Introduction
The cross-domain visual recognition problem was firstly explicitly proposed in [33], although many previous works [29], [3], [40], [4], [36]–[38], [7] also implicitly tackled part of such a problem. In such a problem, multiple visual recognition problems in different semantic domains are simultaneously solved through a joint formulation instead of being handled independently. This is based on the intuition that the semantics across different domains are associated with the same visual entity and hence there are intrinsic correlations among them to facilitate the joint inference of all of these visual semantics. For example, we can interpret each photo from people, location and event domain, and then employ the estimated cross-domain correlations to improve the recognition accuracy in each domain, e.g., face recognition in people domain.