I. Introduction
Recent years have seen explosive growth in the volume of video data. Both the sheer quantity and the complex structure of video content severely limit the effectiveness of existing indexing methods, which poses great challenges for scalable video search. To enable fast search over such enormous databases, hash-based Approximate Nearest Neighbor (ANN) search methods have been studied extensively in several fields, including computer vision [1]–[7], recommendation systems [8], and machine learning [9], [10]. Hashing is an appealing solution to large-scale visual retrieval: by generating compact binary codes for efficient indexing and searching, it achieves strong performance in practice [1], [11].
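To make the idea of binary codes concrete, the following minimal sketch illustrates hash-based ANN search in its simplest generic form. It assumes random-hyperplane (sign-of-projection) hashing and a linear scan by Hamming distance; this is a textbook stand-in, not the method proposed in this paper or in the cited works. All names (`make_hyperplanes`, `hash_code`, `hamming`, `search`), the toy 8-dimensional features, and the 16-bit code length are hypothetical choices made for illustration only.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes (assumed hashing scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vector, hyperplanes):
    """Map a real-valued feature vector to a compact binary code.

    Each bit is the sign of the projection onto one hyperplane;
    the bits are packed into a single Python integer.
    """
    code = 0
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def search(query_code, database_codes, k=1):
    """Return indices of the k database codes closest in Hamming distance."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:k]


# Toy data: 100 random 8-dimensional "video features" (illustrative only).
data_rng = random.Random(1)
database = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]

planes = make_hyperplanes(n_bits=16, dim=8)
codes = [hash_code(v, planes) for v in database]

# Query with an item that is already in the database.
query_code = hash_code(database[42], planes)
top = search(query_code, codes, k=3)
```

In this sketch each item is stored as a 16-bit code rather than eight floating-point values, and candidates are compared with an XOR and a bit count. This is the kind of saving in storage and comparison cost that makes hashing attractive at scale; practical systems typically learn the hash functions from data rather than drawing them at random.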