I. Introduction
Crowd counting aims to estimate the number of people in a crowd scene and has drawn increasing attention in recent years due to its wide range of real-world applications [1]. Most existing methods generate a density map from the input image and then match it to the corresponding dot annotations. A significant challenge in crowd counting is associating the pixels of the predicted density map with the target dot annotations in a reasonable way.
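To make the relation between dot annotations and a density map concrete, the following is a minimal sketch of one common way to associate the two: each annotated point is spread over nearby pixels with a normalized Gaussian, so that the map sums to the number of annotated people. This is an illustrative baseline under stated assumptions, not necessarily the matching scheme studied in this paper. The function name `dots_to_density`, the bandwidth `sigma`, and the example coordinates are hypothetical.

```python
import numpy as np


def dots_to_density(points, height, width, sigma=2.0):
    """Build a density map by placing a normalized 2D Gaussian at each dot.

    `points` holds (row, col) head annotations; `sigma` is an assumed,
    fixed kernel bandwidth chosen only for illustration.
    """
    rows = np.arange(height)[:, None]   # column vector of row indices
    cols = np.arange(width)[None, :]    # row vector of column indices
    density = np.zeros((height, width), dtype=np.float64)
    for r, c in points:
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Normalize so that each annotated person contributes a total mass of 1.
        density += kernel / kernel.sum()
    return density


# Three hypothetical dot annotations in a 64 x 64 image.
points = [(10, 12), (30, 40), (31, 42)]
density = dots_to_density(points, height=64, width=64)

# The crowd count is recovered by summing the density map over all pixels.
estimated_count = density.sum()
```

With a fixed kernel, the pixels near the two nearby annotations at (30, 40) and (31, 42) receive mass from both dots. This overlap is one reason the pixel-to-annotation association is not trivial.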