I. Introduction
The global population is growing rapidly, by an estimated 72 million people each year, and this trend is projected to continue. Larger populations bring larger gatherings, and stampede incidents have increased significantly as a result. They typically occur when large crowds are not managed effectively, and the consequences can be catastrophic. Inadequate crowd control, spontaneous rushes to safety, and seemingly inexplicable disturbances all contribute to these dangerous situations. Public gatherings such as sporting events, political demonstrations, and music concerts draw large audiences and demand heightened security measures.

CCTV cameras serve a variety of purposes in crowd management, including traffic control, monitoring of public spaces, anomaly detection, and crowd counting. The value of crowd analysis is underscored by catastrophic events such as the crowd crush in Houston, Texas, which claimed eight lives. Crowd turbulence can be exacerbated by public panic, crushing, and a breakdown of authority, so it is critical to recognize and address unmanaged crowds as early as possible. While human surveillance can detect and respond to unusual behaviour, monitoring many signals at once in dense crowds has inherent limits. Crowd analysis tools have been developed to overcome this difficulty.

The academic community has made major contributions to frameworks for automatic crowd counting in video surveillance, a promising domain within modern AI. However, large datasets are required to train deep networks to count people in densely packed images. Traditionally, three approaches to crowd counting have been used: the detection-based method, which applies an object detector in a sliding-window fashion to count individuals in an image; the regression-based method; and the density map-based approach.
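To make the density map idea concrete, the following is a minimal sketch and not the method of any particular framework. It assumes head positions are available as (row, column) annotations inside the image, spreads one unit of mass around each with a Gaussian kernel, and recovers the count by summing the map. The function names, the kernel width `sigma`, and the sample coordinates are all illustrative assumptions.

```python
import math


def gaussian_kernel(radius, sigma):
    """Unnormalized (2*radius+1) x (2*radius+1) Gaussian weights as nested lists."""
    return [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]


def density_map(height, width, head_points, sigma=2.0):
    """Build a density map in which each annotated head contributes a mass of 1.

    head_points: iterable of (row, col) positions, assumed to lie inside the image.
    The kernel is renormalized per head so that clipping at the image border
    does not lose mass.
    """
    radius = int(3 * sigma)
    kernel = gaussian_kernel(radius, sigma)
    density = [[0.0] * width for _ in range(height)]
    for row, col in head_points:
        # Collect the kernel cells that fall inside the image.
        cells = []
        for ky in range(2 * radius + 1):
            for kx in range(2 * radius + 1):
                r = row + ky - radius
                c = col + kx - radius
                if 0 <= r < height and 0 <= c < width:
                    cells.append((r, c, kernel[ky][kx]))
        total = sum(weight for _, _, weight in cells)
        for r, c, weight in cells:
            density[r][c] += weight / total
    return density


def estimate_count(density):
    """The crowd count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


# Hypothetical annotations: three heads in a 32 x 32 image, one at a corner.
heads = [(5, 5), (10, 20), (0, 0)]
dmap = density_map(32, 32, heads)
count = estimate_count(dmap)
```

In a learned density-map counter, a network would be trained to predict a map like `dmap` from the image, and the same summation would give the estimated count. Here the map is built directly from annotations, which is how ground-truth targets are commonly prepared.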