1. Introduction
Scalable video retrieval seeks similar videos from a large database given a query video. Usually, videos are represented by sampled frames and each frame is characterized by a representative feature. The set of frame features are utilized to identify relevant videos or nearest neighbors. In the face of high dimensional features and large scale datasets, hashing methods have attracted a lot of attention in scalable visual retrieval [1–10]. Video hashing methods encode frame features of each video into a compact binary code while enabling the similarity between videos to be preserved in the Hamming space [11–22]. Among them, learning-based video hashing methods which learn data-dependent and task-specific hashing functions have achieved good search accuracy [23–25].