1. Introduction
Cross-modal retrieval has become increasingly important as multimedia data proliferate on the Internet [1], [2], [3]. The task aims to build an information retrieval system that supports queries across content domains, e.g., retrieving related texts given a query image. Owing to their low memory consumption and fast computation, binary-based methods, which include hashing and quantization, are among the most promising solutions for cross-modal retrieval [4], [5].
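
To make the efficiency argument concrete, the following minimal sketch ranks text items against an image query by Hamming distance over binary codes. It is an illustration under stated assumptions, not the method of any cited work: the 8-bit codes, the item descriptions, and the helper names `hamming` and `retrieve` are all hypothetical, and in practice the codes would be produced by learned, modality-specific encoders.

```python
# Illustrative sketch only: the codes below are hand-picked toy values,
# not the output of any trained hashing or quantization model.

def hamming(a: int, b: int) -> int:
    """Count the differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def retrieve(query_code: int, database: dict, top_k: int = 2) -> list:
    """Return the top_k database keys ranked by Hamming distance to the query."""
    ranked = sorted(database, key=lambda name: hamming(query_code, database[name]))
    return ranked[:top_k]

# Hypothetical text items, each assumed to be already encoded as an 8-bit code.
text_codes = {
    "a dog on the beach": 0b10110010,
    "a city skyline at night": 0b01001101,
    "a puppy playing in sand": 0b10110110,
}

# Hypothetical binary code for a query image showing a dog.
image_query = 0b10110011

print(retrieve(image_query, text_codes))
```

Each comparison reduces to one XOR followed by a bit count, and a 64-bit code occupies 8 bytes, whereas a 64-dimensional single-precision real-valued vector occupies 256 bytes.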