I. Introduction
Crowd localization, which involves determining the number of individuals in a given area and their spatial locations, is an essential task in many real-world applications [1]. With the increasing availability of unmanned aerial vehicles (UAVs) equipped with high-resolution cameras and advanced processing capabilities, UAV-based crowd localization has become necessary in surveillance, emergency response, disaster management, and large-scale event monitoring scenarios [2], [3].