I. Introduction
With the rapid growth of multiview data on the Internet, such as images, text, and video, there is increasing demand for cross-view methods in a variety of applications [1]–[6]. Among them, cross-view retrieval has attracted great interest from the community. It aims to retrieve content of interest across different views/modalities, for example, retrieving the corresponding text counterpart for a given image query. Owing to the low storage cost and high query speed of hash codes [7], [8], cross-view hashing (CVH) has achieved promising performance and is becoming increasingly popular for large-scale multimedia retrieval. Although CVH has received increasing attention from both academia and industry [9], [10], many challenges remain. In particular, different views may lie in completely disparate spaces with large semantic gaps, which results in inferior retrieval performance.