
A Novel Framework for Vehicle Detection and Tracking in Night Ware Surveillance Systems




Abstract:

In the field of traffic surveillance systems, where effective traffic management and safety are the primary concerns, vehicle detection and tracking play an important role. Low brightness, low contrast, and noise are issues with low-light environments that result from poor lighting or insufficient exposure. In this paper, we propose a vehicle detection and tracking model based on aerial images captured during nighttime. Before object detection, we performed defogging and image enhancement using the MIRNet architecture. After pre-processing, YOLOv5 was used to locate each vehicle position in the image. Each detected vehicle was subjected to the Scale-Invariant Feature Transform (SIFT) feature extraction algorithm to assign a unique identifier for tracking multiple vehicles across the image frames. To get the best possible location of vehicles in the succeeding frames, templates were extracted and template matching was performed. The proposed model achieves a precision score of 0.924 for detection and 0.861 for tracking on the Unmanned Aerial Vehicle Benchmark Object Detection and Tracking (UAVDT) dataset, and 0.904 for detection and 0.833 for tracking on the Vision Meets Drone Single Object-Tracking (VisDrone) dataset.
Published in: IEEE Access ( Volume: 12)
Page(s): 88075 - 88085
Date of Publication: 20 June 2024
Electronic ISSN: 2169-3536



SECTION I.

Introduction

Vehicle recognition in aerial images is crucial for both military and civilian applications; military target strikes and traffic control can both benefit from this technology. Researchers have proposed various techniques for object recognition in daytime aerial photos with sufficient lighting, producing remarkable results [1]. However, vehicle detection in low-light conditions remains a challenging and significant issue for surveillance camera applications. In low-illumination conditions, less information is available and it is difficult to extract enough useful features: at night there is background light interference, objects are underexposed, and brightness and contrast are poor, which results in low image quality [2].

Generic object detection methods have poor accuracy and a limited ability to extract the intended objects in such conditions. In low brightness, capturing every detail of a scene is impossible, and much detailed information, such as colour and texture, is lost. This is especially true in distant-view or aerial images, where the objects are frequently far away and small, with low contrast against the background [3]. One solution to this problem is to use specialized and improved hardware, which can be very costly; therefore, researchers focus more on algorithmic solutions.

Recent studies on low lighting concentrate on image enhancement to improve basic visual properties in the pre-processing steps [4], [5], [6]. These methods include global and local enhancement techniques. Global enhancement techniques applied to night-time images may over-expose already bright parts of the image, while local contrast enhancement methods focus on image details but increase noise when the contrast gain is high [4]. Deep learning models, however, can produce better results for enhancing image contrast. Applying a simple histogram equalization method or gamma correction increases the contrast of the road and vehicle headlights, which degrades image quality for vehicle detection. Therefore, we applied the low-light enhancement model MIRNet to the nighttime traffic sequences, as it produces robust results and prevents over-exposure of car and road lights. Our proposed model consists of the following steps: all the extracted nighttime image sequences are first subjected to defogging and then fed into the MIRNet model. The enhanced images are then passed to the YOLOv5 object detection algorithm to locate vehicles in each image frame. For each detection, SIFT features and templates are extracted, based on which an ID is assigned to each vehicle. The extracted templates are used to find possible matches in the succeeding image frames, which are filtered by feature matching to obtain the best possible match. Finally, the trajectories of all the tracked vehicles are drawn by plotting their centroid points.

The main contribution of our work is as follows:

  • An efficient and computationally lightweight vehicle detection and tracking algorithm for night-time aerial image sequences is established.

  • We used the deep learning model YOLOv5 to detect objects in night-time aerial images containing dense scenes to enhance the detection rate.

  • A simple and efficient multi-vehicle tracking approach is proposed that combines a template-matching model with SIFT features, which are robust to noise, illumination changes, and viewing angle, for identifier allocation.

The night sequences from the publicly available datasets UAVDT [5] and VisDrone [6] were used to evaluate our vehicle recognition and tracking system. Our proposed model produces efficient results on both datasets as compared to other techniques.

The remainder of the paper is structured as follows: Section II presents related work. Our proposed architecture is thoroughly explained in Section III. Benchmark datasets and the experimental findings are described in Section IV. Section V presents the conclusion and suggested next steps.

SECTION II.

Literature Review

Numerous researchers have focused on object detection in low-light conditions. In most cases, the images are first subjected to pre-processing to enhance their brightness. Work on traffic monitoring has also been carried out, with vehicle detection and tracking as the core steps [7], [8], [9], [10]. This section is therefore divided into two categories: object detection in low-light conditions, and vehicle detection and tracking methodologies.

A. Object Detection in Low-Light Conditions

One of the most extensively used machine learning approaches for image enhancement is histogram equalization, which is easy to implement and consumes little computational power [11]. However, due to excessive gray merging, gray levels are easily lost in \gamma-correction, which is predicated on the hypothesis that the sensitivity of the human eye to ambient light is exponentially related to the input light intensity: human eyes can easily detect changes at low illumination, but it becomes more difficult to perceive brightness fluctuations as illumination levels increase. Gamma adjustment increases the visibility of the contrast effect of image illumination; however, it can be difficult to automatically determine a suitable gamma value when processing the source image. Reference [12] developed an improved SSD-based low-illumination object detection technique in which a Retinex theory-based image enhancement algorithm improves the original low-illumination image. In another study [13], a real-time object recognition method for nighttime monitoring is presented; the detection algorithm is built on contrast analysis, taking two image frames as input and computing the change in contrast between them to detect object masks. However, this model can only detect moving objects. Tian et al. [14] proposed a statistical modelling technique for images with low and uneven illumination based on image wavelet coefficients. The same authors also proposed an efficient pre-processing technique that uses a dynamic function of the pixel values in spatial neighbourhoods to improve underexposed, low-dynamic-range videos; however, this method has a very high level of computational complexity.
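To make the two classical baselines discussed above concrete, the following is a minimal sketch of global histogram equalization and gamma correction on a grayscale frame. The gamma value of 0.5 is an arbitrary example; choosing it automatically is exactly the difficulty noted in the text.

```python
# Illustrative sketch of two classical enhancement baselines:
# global histogram equalization and gamma correction.
import cv2
import numpy as np

def hist_equalize(gray):
    # Global histogram equalization (8-bit, single channel).
    return cv2.equalizeHist(gray)

def gamma_correct(gray, gamma=0.5):
    # Gamma < 1 brightens dark regions; the value 0.5 is an arbitrary example.
    normalized = gray.astype(np.float32) / 255.0
    return (np.power(normalized, gamma) * 255).astype(np.uint8)
```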

B. Vehicle Detection and Tracking

Several studies focus on vehicle detection and tracking methods [16]. In [18], an effective Gaussian Mixture Model (GMM) based image segmentation technique is applied, which can identify the frontal views of different automobiles. The Canny edge detector and the Hough transform are used to spot lanes and determine the vehicles’ driving area. This work trains a Support Vector Machine (SVM) classifier using Histogram of Oriented Gradients (HOG), colour, and Haar features of automobiles to further increase the efficacy of the proposed technique. Also, an upgraded You Only Look Once version 3 (YOLOv3) algorithm is created to detect vehicles: the dataset is first clustered using a clustering analysis approach, and the network topology is then optimized to increase the number of final output grids and improve the relatively weak vehicle prediction capacity. In another study [20], an intelligent transport system is proposed that uses a Kalman filter and a YOLO detector; the model also generates track IDs and uses the Hungarian algorithm to retrieve them. Similarly, [21] builds a Simple Online and Realtime Tracking (SORT) technique based on the Kalman filter and the Hungarian matching algorithm, using the Faster Region-based Convolutional Neural Network (Faster R-CNN) as the target detector to track multiple targets concurrently. One drawback of the SORT algorithm is that it does not incorporate appearance features. Another study [22] presents a vehicle recognition and monitoring method using a Gaussian Mixture Model (GMM) and a blob extraction method: the background is first estimated and subtracted from the frames to extract the foreground objects, morphological corrections are employed for further noise removal, and tracking of the vehicles is enhanced using the GMM algorithm. Ait Abdelali et al. [23] developed a vision-based traffic monitoring model in which vehicles are detected using the deep learning-based YOLO detector and tracked with a particle filter. Mou et al. [24] proposed a detection method based on segmenting the aerial image into similar regions using a Convolutional Neural Network (CNN); a trained SVM classifier was then used to track and classify vehicles. Training two different classifiers increases the complexity and computational cost of the model, which limits its applicability to large datasets.

In this paper, we aim to introduce a lightweight vehicle detection and tracking approach which requires limited training. Also, we have combined deep learning and machine learning techniques to increase the model efficiency.

SECTION III.

The Proposed Framework

Fig. 1 depicts the general architecture of the proposed model. The model consists of five modules: (i) pre-processing steps to enhance the brightness level of nighttime images; (ii) vehicle detection using the deep learning model YOLOv5; (iii) SIFT feature extraction of each detected vehicle and identifier assignment; (iv) vehicle tracking using the template-matching algorithm; and (v) drawing the trajectory of each tracked vehicle. Each module of the framework is explained in detail in the following subsections.

FIGURE 1. The architecture of the proposed system for low-illumination conditions.

A. Image Pre-Processing

1) Defogging

The input images, extracted from the nighttime traffic videos at a rate of 8 FPS, were first resized to 768\times 768 pixels.

To denoise the images, we applied the defogging method [25], [26], which estimates the fog intensity at each pixel and removes it according to the following model:\begin{equation*} I(x) = U(x)\,t(x) + K\bigl(1 - t(x)\bigr) \tag {1}\end{equation*}
where I(x) is the observed foggy image, U(x) is the fog-free image, x represents the location of the pixel, K is the density of the fog, and t(x) is the transmission map [27]. The visualization of the defogging process is shown in Fig. 2.

FIGURE 2. Defogging process over the UAVDT and VisDrone datasets: (a) original images and (b) defogged images.
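As an illustration of Eq. (1), the sketch below estimates the transmission map t(x) with a dark-channel-style prior and then inverts the model to recover the fog-free image. The 15\times 15 window, the omega = 0.95 factor, and the way K is estimated are illustrative assumptions, not the exact settings of the cited method [25], [26], [27].

```python
# Minimal defogging sketch based on Eq. (1), using a dark-channel-style
# estimate of the transmission map t(x). Window size, omega, and the K
# estimate are illustrative assumptions.
import cv2
import numpy as np

def defog(image_bgr, omega=0.95, t_min=0.1, win=15):
    img = image_bgr.astype(np.float32) / 255.0
    kernel = np.ones((win, win), np.uint8)
    # Dark channel: minimum over colour channels, then a local minimum filter.
    dark = cv2.erode(img.min(axis=2), kernel)
    # Fog density K: approximated by the colour of the brightest dark-channel pixel.
    y, x = np.unravel_index(np.argmax(dark), dark.shape)
    K = img[y, x, :]
    # Transmission map t(x), then inversion of I(x) = U(x)t(x) + K(1 - t(x)).
    t = 1.0 - omega * cv2.erode((img / K).min(axis=2), kernel)
    t = np.clip(t, t_min, 1.0)[..., np.newaxis]
    U = (img - K) / t + K
    return (np.clip(U, 0.0, 1.0) * 255).astype(np.uint8)
```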

2) Low-Light Enhancement Using MIRNet

After denoising the images, the next step is to enhance their brightness level so that objects can be located easily. For this purpose, we used the pre-trained MIRNet model, and all images are passed to this contrast enhancement module. MIRNet is a pre-trained, fully convolutional deep learning architecture that retains spatially precise high-resolution representations throughout the network while receiving significant contextual information from the low-resolution representations [28]. The model consists of a feature extraction module that maintains the original high-resolution features to preserve fine spatial details while computing a complementary collection of features at various spatial scales [29]. The features from the multi-resolution branches are gradually integrated for better representation learning using a recurring information exchange mechanism [30]. MIRNet uses a technique for fusing features from different scales that correctly maintains the original information at each spatial level while dynamically combining varying receptive fields. To simplify the learning process, the recursive residual design gradually decomposes the input image, enabling deep networks to be built [31].
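A minimal sketch of this enhancement step is shown below, assuming a pre-trained MIRNet model has been exported as a Keras model at the hypothetical path "mirnet_saved_model" and that it maps RGB inputs in [0, 1] to enhanced outputs in the same range (consistent with the fully convolutional design described above).

```python
# Sketch of low-light enhancement with a pre-trained MIRNet-style model.
# The model path and the [0, 1] input/output range are assumptions.
import numpy as np
import tensorflow as tf
import cv2

def enhance_low_light(frame_bgr, model):
    """Run MIRNet-style enhancement on a single BGR frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    pred = model.predict(rgb[np.newaxis, ...])[0]          # (H, W, 3) in [0, 1]
    enhanced = np.clip(pred, 0.0, 1.0)
    return cv2.cvtColor((enhanced * 255).astype(np.uint8), cv2.COLOR_RGB2BGR)

model = tf.keras.models.load_model("mirnet_saved_model")   # hypothetical export
frame = cv2.imread("defogged_frame.jpg")
cv2.imwrite("enhanced_frame.jpg", enhance_low_light(frame, model))
```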

The output of the MIRNet image enhancement is shown in Fig. 3. Also, the overall architecture of MIRNet is seen in Fig. 4.

FIGURE 3. Low-light image enhancement using MIRNet over the UAVDT and VisDrone datasets.

FIGURE 4. Architecture of MIRNet for image enhancement, where RRG = Recursive Residual Group, MRB = Multiscale Residual Block, DAU = Dual Attention Unit, and SKFF = Selective Kernel Feature Fusion.

B. YOLOv5-Based Vehicle Detection

Because of their high-performance capabilities, YOLO algorithms are frequently used in object detection systems, especially for vehicle detection tasks. YOLO treats detection as a single regression problem, which makes it fast [32]. During training, YOLO takes the entire image as input, attending to global information for target detection, and returns the position of the object bounding box [33], [34], [35].

YOLOv5 is a single-stage detector that considerably reduces the processing time of deeper networks. As we have already processed the images to make them viable for detection, we used YOLOv5 to keep the system lightweight as well as efficient. It also performs better in small-target detection [35]. YOLOv5 is constructed from four primary parts: input, backbone, neck, and head.

1) Backbone

The backbone selects the important components from the input image for further analysis. YOLOv5 uses spatial pyramid pooling (SPP) and cross-stage partial networks (CSP) as its main building blocks to extract rich, significant information from input images. SPP allows the same object to be detected at multiple sizes and scales, enhancing the model’s generalization.

2) Neck

It consists of the Path Aggregation Network (PANet) and the Feature Pyramid Network (FPN). The primary function of this neck is to generate feature pyramids: the FPN structure propagates strong semantic features from the top down, while the PAN structure adds a bottom-up path that passes strong localization features from lower feature maps to higher feature maps.

3) Head

The output layer consists of three convolution layers to predict the location of the object bounding box and its scores. YOLOv5 uses the Sigmoid Linear Unit (SiLU) activation function in the hidden layers, while the sigmoid activation function is used in the convolution operation of the output layer. These are calculated as follows [36]:\begin{equation*} \mathrm {SiLU}\left ({{ x }}\right)=x\times \sigma (x) \tag {2}\end{equation*}
where \sigma (x) is the logistic sigmoid function:\begin{equation*}\sigma \left ({{ x }}\right)=\frac {1}{1+e^{-x}} \tag {3}\end{equation*}
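As a tiny numerical illustration of Eqs. (2)–(3):

```python
# Numerical illustration of Eqs. (2)-(3): SiLU(x) = x * sigmoid(x).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    return x * sigmoid(x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # approx. [0.119, 0.5, 0.881]
print(silu(x))     # approx. [-0.238, 0.0, 1.762]
```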

The loss function of the overall structure is calculated as follows:\begin{equation*} Loss=\lambda _{1}L_{cls}+\lambda _{2}L_{obj}+\lambda _{3}L_{loc} \tag {4}\end{equation*}
where L_{cls} , L_{obj} , and L_{loc} are the classification loss, objectness loss, and localization loss, respectively, and \lambda _{1} , \lambda _{2} , and \lambda _{3} are their weights. The data were split in a 70:30 ratio for training and testing, respectively. The detailed configuration of the YOLOv5 model is given in Table 1.

TABLE 1. Parameter Configuration for the YOLOv5 Algorithm.

The architecture of the YOLOv5 algorithm is shown in Fig. 5.

FIGURE 5. The architecture of the YOLOv5 model.

The image frames were divided into bursts of five images. Detection was performed on the first image of each burst, while tracking was done on the next four images. The vehicle detection results using the YOLOv5 algorithm are visualized in Fig. 6.
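The burst scheme can be sketched as follows, assuming the publicly available ultralytics/yolov5 torch.hub entry point. The generic 'yolov5s' weights here are only a placeholder for the authors' trained model, and track_in_frame is a hypothetical stand-in for the template-matching tracker of Section III-D.

```python
# Sketch of the five-frame burst scheme: detect on the first frame of each
# burst with YOLOv5, track on the remaining four frames.
import torch

# Public torch.hub entry point; 'yolov5s' COCO weights are a placeholder.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")

def process_burst(frames, track_in_frame):
    """frames: list of 5 enhanced images (numpy arrays)."""
    results = model(frames[0])
    # Each row of xyxy: [x1, y1, x2, y2, confidence, class].
    detections = results.xyxy[0].cpu().numpy()
    tracks = [detections]
    for frame in frames[1:]:
        # Hypothetical tracker hook (template matching + SIFT ID retrieval).
        tracks.append(track_in_frame(frame, detections))
    return tracks
```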

FIGURE 6. Vehicle detection using YOLOv5 over the UAVDT and VisDrone datasets.

C. Identifier Number Assignment

Each image frame contains multiple vehicles that must be tracked in the succeeding frames. Therefore, an identifier is required to locate each vehicle separately, and it should remain the same for a particular vehicle throughout the tracking. For this purpose, every detected vehicle was subjected to SIFT feature extraction [37], [38], based on which a unique identifier number was assigned to each vehicle.

The SIFT features are local, making them robust against occlusion and clutter [35], [36], [37]. The feature extraction algorithm consists of the following steps.

1) Scale Space

This step involves selecting potential areas of the image in which to find features [38], [39]. The input image is convolved with a Gaussian kernel at various scales to produce the function L(x,y,\sigma) , which denotes the scale space [41] of the image:\begin{equation*} L(x,y,\sigma)=G(x,y,\sigma)\ast I\left ({{ x,y }}\right) \tag {5}\end{equation*}
where I is the input image with spatial coordinates x, y, \sigma represents the scale parameter, and G(x,y,\sigma) denotes the Gaussian blur operator, calculated as follows:\begin{equation*} G(x,y,\sigma)=\frac {1}{2\pi \sigma ^{2}}\, e^{-\frac {x^{2}+ y^{2}}{2\sigma ^{2}}} \tag {6}\end{equation*}
The size of the source image determines the number of octaves and scales in the scale space. The Difference of Gaussians is then used to approximate the Laplacian of Gaussian, which is scale invariant.
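A short sketch of Eq. (5) and the Difference-of-Gaussians approximation is given below; the \sigma schedule (base 1.6, factor \sqrt{2}) and the number of levels are illustrative choices, not the exact SIFT settings.

```python
# Illustration of Eq. (5): a scale space built by Gaussian blurring at
# increasing sigma; adjacent levels are subtracted to form Difference-of-
# Gaussian images that approximate the Laplacian of Gaussian.
import cv2
import numpy as np

def dog_pyramid(gray, sigma0=1.6, k=np.sqrt(2), levels=5):
    g = gray.astype(np.float32)  # float to avoid uint8 wrap-around on subtraction
    blurred = [cv2.GaussianBlur(g, (0, 0), sigma0 * (k ** i)) for i in range(levels)]
    return [blurred[i + 1] - blurred[i] for i in range(levels - 1)]
```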

2) Key Point Localization

To refine the keypoint location, a Taylor series expansion of the scale space is used to locate the extrema with greater accuracy; if the intensity at an extremum is less than a certain threshold, it is rejected.

3) Orientation Assignment

Each keypoint is assigned an orientation to make it invariant to rotation. A neighbourhood around the keypoint is chosen depending on its scale, and the gradient magnitude and direction are computed as follows:\begin{align*}\left |{{ I }}\right |& =\sqrt {I_{x}^{2}+ I_{y}^{2}} \tag {7}\\ \Theta & ={\tan }^{-1}\left ({{ \frac {I_{y}}{I_{x}} }}\right) \tag {8}\end{align*}
where I_{x} and I_{y} are the image gradients in the x and y directions. A 360-degree orientation histogram with 36 bins is then produced.

4) Key Point Descriptor

To calculate the local image descriptor, a 16\times 16 window is taken around the keypoint, which is further separated into 16 sub-blocks of 4\times 4 size.

5) Key Point Matching

The matching between two images is obtained by identifying the nearest neighbours of keypoints using the Euclidean distance between descriptors:\begin{equation*}d\left ({{ u,v }}\right)=\sqrt {\sum \nolimits _{i=1}^{n} {(u_{i}{-v}_{i})^{2}}} \tag {9}\end{equation*}
where u and v are the keypoint descriptors.

If the number of matches exceeds the threshold value of 6, the corresponding vehicle’s identifier number is retrieved and assigned to the matched vehicle in the succeeding frame. If no match is found, the vehicle is added as a new entry with a unique number. The identifier assignment to each detected vehicle is shown in Fig. 7.
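A minimal sketch of this identifier assignment step using OpenCV's SIFT is shown below. The match threshold of 6 follows the text; the 0.75 ratio test and the dictionary-based registry are illustrative choices rather than the authors' exact implementation.

```python
# Sketch of SIFT-based identifier assignment: descriptors of each detected
# vehicle are matched against registered vehicles; more than 6 good matches
# means the same vehicle, otherwise a new ID is registered.
import cv2

sift = cv2.SIFT_create()
bf = cv2.BFMatcher(cv2.NORM_L2)
registry = {}          # vehicle_id -> stored SIFT descriptors
next_id = 0
MATCH_THRESH = 6

def count_good_matches(des1, des2, ratio=0.75):
    if des1 is None or des2 is None:
        return 0
    matches = bf.knnMatch(des1, des2, k=2)
    # Lowe-style ratio test to keep only distinctive matches.
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

def assign_id(vehicle_roi_gray):
    """Return the matched vehicle ID, or register the vehicle as a new entry."""
    global next_id
    _, des = sift.detectAndCompute(vehicle_roi_gray, None)
    for vid, stored in registry.items():
        if count_good_matches(des, stored) > MATCH_THRESH:
            return vid
    registry[next_id] = des
    next_id += 1
    return next_id - 1
```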

FIGURE 7. Identifier assignment to each detected vehicle based on SIFT feature extraction over the UAVDT and VisDrone datasets.

D. Template Matching-Based Vehicle Tracking

To lower the computational complexity of the model, we used a template-matching algorithm to avoid unnecessary feature extraction. A template model is generated for every new vehicle registered in the system [43], [44] and is used to locate all possible locations of the vehicle in the following frame. The template-matching algorithm slides the template across the entire image, and a similarity score is calculated between the area covered by the window and the template [45], [46], [47], [48]. The matching is implemented through a 2-dimensional convolution:\begin{equation*} l\left ({{ x,y }}\right)=f(x,y) \circ g(x,y) \tag {10}\end{equation*}
where f(x,y) is the original image frame and g(x,y) is the vehicle template.

The extracted templates contain texture and appearance information that helps to find a match. If more than one possible location is detected in an image, the candidates are subjected to SIFT feature matching to obtain the best match and the associated identifier number [49], [50]. Vehicle templates that are not found in the succeeding frames are retained and matched for the next 5 frames before deletion, to handle occlusion within the tracked images. The tracking results are shown in Fig. 8. The steps involved in template-based matching are given in Algorithm 1.
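The candidate-search part of this stage can be sketched with OpenCV's template matching, which realises the sliding-window similarity of Eq. (10) via normalised cross-correlation. The 0.8 score threshold is an illustrative assumption; candidates above it would then be disambiguated by SIFT matching as described above.

```python
# Sketch of the template-matching stage: slide the vehicle template over the
# frame and keep every location whose similarity score exceeds a threshold.
import cv2
import numpy as np

def find_candidates(frame_gray, template_gray, score_thresh=0.8):
    """Return (x, y, w, h) boxes where the template matches the frame."""
    h, w = template_gray.shape[:2]
    response = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(response >= score_thresh)
    return [(int(x), int(y), w, h) for x, y in zip(xs, ys)]
```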

FIGURE 8. Vehicle tracking using template matching: (a) vehicle model extracted from detection, (b) number of template matches greater than 1, (c) SIFT feature extraction and matching with the template matches, (d) best possible location retained across the image frames.

Algorithm 1 Vehicle Detection and Tracking

Input: vehicle detections V = {v1, v2, v3, ..., vn}, where vn = (x1, y1), (x2, y2); input image I; frames F = {f1, f2, ..., fn}

Output: the tracking results

1: Initialize feature_list = [], thresh = 6, vehicle_model = []
2: for i in range(V)
3:     x, y, w, h ← V[i]
4:     ROI = ExtractRegionOfInterest(I(x, y, x+w, y+h))
5:     f ← SIFT(ROI)
6:     feature_list ← (f, i)
7:     vehicle_model ← ROI
8: while F > 0
9:     for j in range(vehicle_model)
10:        matches = TemplateMatching(vehicle_model[j], F)
11:        if matches > 1 then
12:            fm = FeatureMatching(matches, feature_list)
13:            if fm > thresh then
14:                Retrieve and assign the corresponding ID and discard the other matched templates
15:            else
16:                Retrieve the corresponding ID and assign it to the matched vehicle
               end if
           end if
       end

E. Trajectories Approximation

Each tracked vehicle’s path was recorded and plotted against each video frame to understand traffic flow conditions and routes. To estimate the trajectories [51], the final match obtained from the tracking algorithm for each vehicle was recorded by calculating the rectangular centroid of each vehicle against the frame number, which is taken as a reference for the time stamp. The centroids are calculated as:\begin{equation*} centroid_{vehicle}=\left ({{\frac {x_{1}+x_{2}}{2},\frac {y_{1}+y_{2}}{2}}}\right) \tag {11}\end{equation*}
where (x_{1}, y_{1}) and (x_{2}, y_{2}) are opposite corners of the vehicle’s bounding box.

The points were plotted and joined with time information incorporated as shown in Fig. 9.
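A short sketch of this trajectory step is given below: the centroid of each tracked box (Eq. (11)) is recorded per frame and the points belonging to each vehicle ID are joined into a path. The dictionary layout is an illustrative choice.

```python
# Sketch of trajectory approximation: record the rectangular centroid of each
# tracked box per frame (Eq. (11)) and join the points for each vehicle ID.
import matplotlib.pyplot as plt
from collections import defaultdict

trajectories = defaultdict(list)   # vehicle_id -> [(frame_no, cx, cy), ...]

def record(vehicle_id, frame_no, box):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # Eq. (11)
    trajectories[vehicle_id].append((frame_no, cx, cy))

def plot_trajectories():
    for vid, points in trajectories.items():
        points.sort()                            # order by frame number
        xs = [p[1] for p in points]
        ys = [p[2] for p in points]
        plt.plot(xs, ys, marker="o", label=f"ID {vid}")
    plt.gca().invert_yaxis()                     # image coordinates: y grows downward
    plt.legend()
    plt.show()
```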

FIGURE 9. Vehicle trajectory approximation, estimated by joining the centroids of each vehicle’s locations against the identifier number (ID) and frame number.

SECTION IV.

Experiments and Results

The experiments were conducted using a laptop with an Intel Core i5-8550U 1.80 GHz processor, 6 GB of Random Access Memory (RAM), Windows 10 on the x64 architecture, and Python. To compare CPU and GPU performance, we also ran the experiment on a Tesla K80 GPU, which is available free on Google Colab. The training time on the CPU was 1.3 hrs, whereas it took 0.86 hrs to train on the GPU; however, there was no difference in the precision values. The proposed model produces remarkable results when tested on two benchmark datasets: the UAVDT and VisDrone datasets.

A. Datasets

1) VisDrone Dataset

The Vision Meets Drone Single Object-Tracking (VisDrone) dataset contains 288 clips of videos with a total of 261,908 frames and 10,209 still photos taken by several drones equipped with cameras and covering a variety of places. We used traffic image sequences taken at nighttime to test our model. Some of the sample images from the VisDrone dataset are displayed in Fig. 10.

FIGURE 10. Sample frames from the VisDrone dataset.

2) UAVDT Dataset

The second dataset is the Unmanned Aerial Vehicle Benchmark Object Detection and Tracking (UAVDT) dataset. It consists of traffic sequences recorded from a UAV platform in various urban settings. Each frame is in JPG format with a resolution of 1080\times 540 pixels. Sample images from the UAVDT dataset are shown in Fig. 11.

FIGURE 11. Sample frames from the UAVDT dataset.

B. Evaluation of Detection and Tracking Algorithm

We used three performance metrics to assess our proposed detection and tracking algorithm designed for low-illumination conditions: precision, recall, and F1-score. These metrics are calculated as follows:\begin{align*} Precision& = \frac {\text {True Positives}}{\text {True Positives}+\text {False Positives}} \tag {12}\\ Recall& = \frac {\text {True Positives}}{\text {True Positives}+\text {False Negatives}} \tag {13}\\ F1& = \frac {2\times Precision\times Recall}{Precision + Recall} \tag {14}\end{align*}
Table 2 shows the evaluation of the YOLOv5-based detection algorithm, whereas the tracking algorithm evaluation is given in Table 3.
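Eqs. (12)–(14) can be computed directly from counted detections, as in the sketch below; the counts in the usage example are arbitrary illustrative numbers, not the paper's data.

```python
# Direct implementation of Eqs. (12)-(14) from counted true positives (tp),
# false positives (fp), and false negatives (fn).
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Illustrative counts only: 924 TP, 76 FP, 110 FN -> precision = 0.924.
print(precision_recall_f1(924, 76, 110))
```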

TABLE 2. Precision, Recall, and F1-Score for the Detection Algorithm.
TABLE 3. Precision, Recall, and F1-Score for the Tracking Algorithm.

C. Comparison With Other Methods

We compared our proposed model with other methods in terms of precision score. Our model outperforms the other techniques for both vehicle detection and tracking. Table 4 compares our proposed detection model with other methodologies.

TABLE 4. Comparison of the Detection Algorithm With Other Methods.

Table 5 shows the comparison of our proposed tracking algorithm with other methodologies. It can be seen that our model produces efficient results.

TABLE 5. Comparison of the Tracking Algorithm With Other Methods.

A comparison of the proposed detection and tracking algorithms with state-of-the-art techniques is presented in Tables 6 and 7.

TABLE 6. Comparison of the Detection Algorithm With State-of-the-Art Techniques.
TABLE 7. Comparison of the Tracking Algorithm With State-of-the-Art Techniques.

SECTION V.

Limitations

The proposed method performs well for nighttime surveillance of road traffic. However, the model still has some limitations. The system can detect vehicles under partial occlusion or cluttering, but a separate method is required to handle full occlusion or background cluttering caused by low contrast, as shown in Fig. 12. Moreover, the model does not take pedestrians, bicycles, or motorbikes into account. Furthermore, diverse weather conditions, such as images taken in cloudy, foggy, or rainy weather, require other pre-processing methodologies that are beyond the scope of our model.

FIGURE 12. Vehicles left undetected are marked with red circles.

SECTION VI.

Conclusion and Future Work

In this study, we propose a lightweight and efficient vehicle detection and tracking algorithm specially designed for low-illumination conditions. First, we pre-processed the nighttime traffic scenes to adjust the brightness level of the images. Then, we applied semantic segmentation based on FCM clustering to segment the image into multiple uniform regions to reduce the overall complexity. For detection, we used YOLOv5, which can detect small objects precisely. We assign identifiers based on SIFT features to track multiple vehicles within a single image frame. Template matching was then employed to obtain each vehicle’s possible location, and its corresponding identifier was retrieved by SIFT feature matching. The evaluation experiments on public datasets demonstrate that our proposed framework can efficiently detect and track automobiles and outperforms other methods. In the future, we aim to enhance the vehicle monitoring technique to adapt to more complex traffic scenarios.
