A Unified Remote Sensing Object Detector Based on Fourier Contour Parametric Learning | IEEE Journals & Magazine | IEEE Xplore

A Unified Remote Sensing Object Detector Based on Fourier Contour Parametric Learning


Abstract:

A unified object detector needs to integrate various abilities to adapt to different remote sensing object detection tasks. However, there is still no feasible way to integrate the multigrained object detection requirements, i.e., horizontal bounding box (HBB), oriented bounding box (OBB), and instance segmentation (InSeg), into a unified detection framework. Consequently, specific parametric learning schemes and corresponding architectures often have to be designed for each task, and they cannot adapt well to the various kinds of object detection tasks. Therefore, in this article, a new benchmark is set up that integrates the multigrained object detection requirements of HBB, OBB, and InSeg into one challenging task: arbitrary-shaped object contour detection. At the same time, a unified object contour detector (UniconDet) is proposed to achieve multigrained object detection in complicated remote sensing scenes. First, Fourier contour parametric modeling (FCPM) is defined to project arbitrary-shaped object contours from the spatial domain into the frequency domain. It thereby unifies the spatial parametric representations of HBB, OBB, and InSeg as frequency coefficient representations, which enables more generic and robust parametric regression. Second, a multiview cross-attention (MVCA) feature extraction scheme is designed at each scale of the regression layer; it helps UniconDet perceive Fourier contour parameters by exploring the coupled relations between different discrete contour sampling periods of each object. Third, a center-contour enhancing regression layer (C2-ERL) is designed to generate regional guidance and cascaded contour propagation, which ensures more accurate center point prediction and Fourier contour parameter regression. Finally, extensive experiments on the HBB, OBB, InSeg, and new multigrained object detection benchmarks indicate that the proposed UniconDet achieves superior performance.
The source cod...
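The abstract defines FCPM only at a high level, so the following is an assumption-laden illustration rather than the authors' formulation. It uses the standard complex Fourier series of a closed contour: an HBB, an OBB, or a mask polygon (the InSeg case) is turned into a polygon, resampled into N points z_n = x_n + i*y_n equally spaced by arc length, and projected onto the coefficients c_k = (1/N) * sum_n z_n * exp(-2*pi*i*k*n/N) for |k| <= K. All helper names, the (cx, cy, w, h, theta) OBB parameterization, and the values of N and K are hypothetical.

```python
import cmath
import math


def hbb_to_polygon(x_min, y_min, x_max, y_max):
    """Corners of a horizontal box, counter-clockwise in a y-up frame (hypothetical helper)."""
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]


def obb_to_polygon(cx, cy, w, h, theta):
    """Corners of an oriented box: center (cx, cy), size (w, h), rotation theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in local]


def resample_polygon(vertices, n_points):
    """Sample n_points equally spaced by arc length along a closed polygon, as complex x + iy."""
    m = len(vertices)
    edges = []
    for i in range(m):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % m]
        edges.append((x0, y0, x1, y1, math.hypot(x1 - x0, y1 - y0)))
    step = sum(e[4] for e in edges) / n_points
    points, edge_idx, edge_start = [], 0, 0.0
    for j in range(n_points):
        s = j * step  # arc length of the j-th sample
        # Advance to the edge that contains arc length s.
        while edge_idx < m - 1 and edge_start + edges[edge_idx][4] < s:
            edge_start += edges[edge_idx][4]
            edge_idx += 1
        x0, y0, x1, y1, length = edges[edge_idx]
        t = (s - edge_start) / length if length > 0 else 0.0
        points.append(complex(x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


def fourier_coefficients(points, k_max):
    """Project a sampled contour into the frequency domain: c_k for k = -k_max..k_max."""
    n = len(points)
    return {
        k: sum(z * cmath.exp(-2j * math.pi * k * idx / n) for idx, z in enumerate(points)) / n
        for k in range(-k_max, k_max + 1)
    }


def reconstruct(coeffs, n_points):
    """Map coefficients back to the spatial domain at n_points equally spaced parameters."""
    return [
        sum(c * cmath.exp(2j * math.pi * k * idx / n_points) for k, c in coeffs.items())
        for idx in range(n_points)
    ]


# Usage: an HBB and an OBB expressed in the same coefficient space.
hbb_pts = resample_polygon(hbb_to_polygon(0.0, 0.0, 4.0, 2.0), 24)
obb_pts = resample_polygon(obb_to_polygon(10.0, 5.0, 4.0, 2.0, math.pi / 6), 24)
hbb_coeffs = fourier_coefficients(hbb_pts, 5)  # 2 * 5 + 1 = 11 complex numbers per object
obb_coeffs = fourier_coefficients(obb_pts, 5)
approx_obb = reconstruct(obb_coeffs, 24)  # coarse outline from the truncated series
```

Because c_0 is the mean of the sampled points, it approximates the contour centroid (the center, for a box). By Parseval's theorem, increasing K never increases the reconstruction error at the sampled points, so K trades the number of regressed parameters against contour fidelity. The coefficients also depend on the sampling start point and direction, which any practical parameterization has to fix by convention.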
Article Sequence Number: 5611225
Date of Publication: 10 February 2025


I. Introduction

Currently, remote sensing object detection comprises three different tasks, namely horizontal bounding box (HBB) detection, oriented bounding box (OBB) detection, and instance segmentation (InSeg), all of which aim to accurately localize objects of different categories in complex remote sensing scenes. Although massive efforts have been made on HBB [1], [2], OBB [3], [4], and InSeg [5], [6] in the remote sensing domain, these methods still cannot satisfy the multigrained detection requirements of different categories of remote sensing objects. As shown in Fig. 1(a), in a very complicated airport (AT) scene, OBB can be adopted to detect rectangular cars. However, since storage tanks (STs) have no apparent orientation, OBB is not suitable for detecting them. Moreover, because of the irregular shape of airplanes (ALs), neither OBB nor HBB is a viable option for AL detection. Therefore, several studies [7], [8], [9], [10], [11], [12], [13] attempt to establish a unified remote sensing object detector that integrates multigrained detection abilities involving HBB, OBB, and InSeg. Xu et al. [7], Shi and Zhang [8], and Liu et al. [9] all adopt a multitask learning strategy that integrates the InSeg ability with HBB or OBB to improve object detection, based on parallel, cascaded, and shared feature extraction architectures. To build multigrained object detection ability, Yang et al. [10] and Qian et al. [11] designed unified detectors that integrate HBB and OBB detection abilities through HBB/OBB conversion, based on a specific weakly self-supervised learning scheme and a horizontal smallest enclosing rectangle constraint, respectively. In addition, building on the Segment Anything Model (SAM), Chen et al. [12] employed various prompting mechanisms (i.e., bounding boxes and queries) to integrate HBB and InSeg detection abilities into a generic framework called RSPrompter. Zhang et al. [13] treated HBB and OBB detection as universal language modeling and designed visual-text alignment representation learning to integrate HBB and OBB detection abilities into a multimodal large language model called EarthGPT. Nevertheless, whether based on multitask learning, detection granularity conversion, prompt engineering, or universal language modeling, these methods all depend on constructing multigrained labeled datasets, designing valid strategies for integrating the underlying models, and undergoing expensive training procedures. Even after tedious dataset preparation, specific model design, and lengthy model training, these studies [7], [8], [9], [10], [11], [12], [13] still cannot fully integrate the multigrained object detection requirements of HBB, OBB, and InSeg. Each unifies only two of the three granularities, such as HBB and OBB or HBB and InSeg, which hinders precise and adaptive detection of multicategory objects in complicated remote sensing scenes. Consequently, integrating multigrained object detection abilities (i.e., HBB, OBB, and InSeg) into a unified detection framework remains a challenge that requires further study, both to cater to the different detection granularities of multicategory objects and to advance remote sensing object detection techniques.
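The HBB/OBB conversion mentioned above is asymmetric, which a short geometric sketch makes concrete. The code computes the horizontal smallest enclosing rectangle of an OBB in closed form; it is not the method of [10] or [11], whose details are not given here, and the function names and the (cx, cy, w, h, theta) parameterization are assumptions.

```python
import math


def obb_corners(cx, cy, w, h, theta):
    """Four corners of an oriented box: center (cx, cy), size (w, h), rotation theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in local]


def obb_to_hbb(cx, cy, w, h, theta):
    """Horizontal smallest enclosing rectangle of an OBB as (x_min, y_min, x_max, y_max).

    The extreme corner along x lies |cos(theta)| * w / 2 + |sin(theta)| * h / 2 from the
    center, and symmetrically along y, so no corner enumeration is needed.
    """
    half_w = 0.5 * (w * abs(math.cos(theta)) + h * abs(math.sin(theta)))
    half_h = 0.5 * (w * abs(math.sin(theta)) + h * abs(math.cos(theta)))
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Usage: two OBBs mirrored in angle share one HBB.
hbb_pos = obb_to_hbb(50.0, 40.0, 30.0, 10.0, math.radians(20))
hbb_neg = obb_to_hbb(50.0, 40.0, 30.0, 10.0, math.radians(-20))
```

The mapping from OBB to HBB is many-to-one: a box rotated by theta and by -theta yields the same enclosing rectangle, so an OBB cannot in general be recovered from its HBB alone.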

Fig. 1. Illustration of the multigrained object detection requirement in a complex remote sensing scene. (a) Complex AT scene including multiple categories, e.g., AL, V, and ST. (b) Individual detectors designed for different categories. (c) Proposed UniconDet applied to detect all kinds of objects.

