I. Introduction
Establishing accurate correspondences among sequential images plays a crucial role in many computer vision tasks, including wide-baseline stereo [1], image retrieval [2], large-scale visual localization [3], structure-from-motion [4], and 3-D reconstruction [5]. Such correspondences are generally estimated by matching local features, a process that can be subdivided into keypoint detection and keypoint description. Learning-based descriptors are supervised with contrastive learning, which repels negative pairs (noncorresponding keypoints) while attracting positive pairs (corresponding keypoints).
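The attract/repel objective can be illustrated with a minimal sketch. It assumes a triplet margin formulation, which is one common contrastive loss and not necessarily the one used by the methods cited here. The function names, the margin value, and the toy two-dimensional descriptors are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a descriptor to unit length so distances are comparable."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hypothetical triplet loss (assumed formulation, not from the source).

    The loss is zero once the negative (noncorresponding) descriptor is
    farther from the anchor than the positive (corresponding) descriptor
    by at least `margin`; otherwise it penalizes the shortfall.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, margin + distance(a, p) - distance(a, n))


# Positive aligned with the anchor, negative opposite: no penalty.
well_separated = triplet_margin_loss([1.0, 0.0], [2.0, 0.0], [-1.0, 0.0])

# Roles swapped: the positive is far and the negative is close, so the
# loss is large and its gradient would pull the positive in and push
# the negative away.
badly_separated = triplet_margin_loss([1.0, 0.0], [-1.0, 0.0], [1.0, 0.0])
```

In this sketch, minimizing the loss decreases the anchor-positive distance and increases the anchor-negative distance, which is the attraction and repulsion described in the text.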