Self-Training-Based Semantic-Balanced Network for Weakly Supervised Object Detection in Remote-Sensing Images | IEEE Journals & Magazine | IEEE Xplore



Abstract:

A weakly supervised object detection (WSOD) task trains a detector with only image-level labels provided. Beyond the training difficulty introduced by these weaker annotations, the inherent complexity of remote-sensing images (RSIs) adds to the challenge. To boost the detector’s localization accuracy, we aim to exploit more of the semantic information contained in images and improve the general robustness of the model. Noticing that previous methods tend to focus on the most discriminative part of an object, we design a self-training-based network that leverages local semantic features. To this end, we develop a semantic-balanced localization module (SBLM) that distinguishes foreground from background, and accurate proposals from incomplete ones, by balancing region-of-interest (ROI) and context information. Moreover, we find that the self-training strategy relies heavily on the quality of the pseudo-ground-truth boxes. Motivated by this possible lack of robustness, we design a comprehensive clustering module (CCM) and a saliency-based proposal filtering (SPF) module that select pseudo-ground-truth boxes more comprehensively under supervision. More specifically, CCM reduces the arbitrariness of pseudo-label assignment by considering multiple categorical vectors simultaneously, while the SPF module applies salient object detection (SOD) to evaluate the quality of the chosen pseudo-ground-truth boxes. The proposed method significantly boosts detection performance. Extensive experiments on the NWPU VHR-10.v2 and DIOR datasets validate that the proposed model favorably outperforms previous state-of-the-art methods, with an mAP of 64.9% and 28.1%, respectively.
Article Sequence Number: 5601612
Date of Publication: 12 December 2023


I. Introduction

Object detection is a significant task in remote-sensing images (RSIs) [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. With the rapid development of sensor technology, more data are available for training detectors, and the object detection task has advanced greatly in recent years. However, as the amount of data increases, so does the demand for manual annotation. The labeling process is laborious because the common fully supervised object detection task requires precise annotations for each instance. To alleviate this problem, a weakly supervised paradigm [13], [14], [15], [16], [17], [18], in which only image-level categorical labels are needed, is extensively leveraged to perform the detection task. This paradigm is called weakly supervised object detection (WSOD).
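The self-training idea underlying this setting can be illustrated with a minimal sketch. The snippet below shows only the generic pseudo-ground-truth selection step common in the WSOD literature, not the authors' SBLM/CCM/SPF pipeline; all names (`select_pseudo_gt`, the score layout) are hypothetical. Given per-proposal class scores from a weakly trained detector and the image-level labels, the top-scoring proposal for each present class is promoted to a pseudo-ground-truth box that supervises the next training round.

```python
def select_pseudo_gt(proposals, scores, image_labels):
    """Pick one pseudo-ground-truth box per image-level class label.

    proposals: list of (x1, y1, x2, y2) candidate boxes.
    scores: scores[i][c] is the detector's confidence that
        proposal i contains an object of class c.
    image_labels: set of class indices known to appear in the image
        (the only supervision available in WSOD).
    Returns a dict mapping class index -> selected box.
    """
    pseudo_gt = {}
    for c in image_labels:
        # Promote the highest-scoring proposal for class c.
        best = max(range(len(proposals)), key=lambda i: scores[i][c])
        pseudo_gt[c] = proposals[best]
    return pseudo_gt


# Toy example: two proposals, two classes present in the image.
boxes = [(0, 0, 10, 10), (5, 5, 30, 30)]
cls_scores = [[0.9, 0.1],   # proposal 0 favors class 0
              [0.2, 0.8]]   # proposal 1 favors class 1
print(select_pseudo_gt(boxes, cls_scores, {0, 1}))
# → {0: (0, 0, 10, 10), 1: (5, 5, 30, 30)}
```

As the paper's abstract notes, this step is only as good as the chosen boxes: selecting a single top-scoring proposal often latches onto the most discriminative object part, which is the failure mode the proposed CCM and SPF modules are designed to mitigate.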

