YOLOH: You Only Look One Hourglass for Real-Time Object Detection

Abstract:

Multi-scale detection based on Feature Pyramid Networks (FPN) has been a popular approach in object detection to improve accuracy. However, using multi-layer features in the decoder of FPN methods entails performing many convolution operations on high-resolution feature maps, which consumes significant computational resources. In this paper, we propose a novel perspective for FPN in which we directly use fused single-layer features for regression and classification. Our proposed model, You Only Look One Hourglass (YOLOH), fuses multiple feature maps into one feature map in the encoder. We then use dense connections and dilated residual blocks to expand the receptive field of the fused feature map. This output not only contains information from all the feature maps, but also has a multi-scale receptive field for detection. The experimental results on the COCO dataset demonstrate that YOLOH achieves higher accuracy and better run-time performance than established detector baselines. For instance, it achieves an average precision (AP) of 50.2 on a standard $3\times$ training schedule and achieves 40.3 AP at a speed of 32 FPS on the ResNet-50 model. We anticipate that YOLOH can serve as a reference for researchers to design real-time detection in future studies. Our code is available at https://github.com/wsb853529465/YOLOH-main.
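
To make the receptive-field claim concrete, the following is a minimal sketch of how stacking dilated convolutions widens the receptive field on a single feature map. It relies only on the standard rule that a stride-1 convolution with kernel size k and dilation d adds (k - 1) * d to the receptive field. The function name `receptive_fields` and the dilation rates `[2, 4, 6, 8]` are illustrative assumptions, not values taken from the paper.

```python
def receptive_fields(dilations, kernel_size=3, initial_rf=1):
    """Receptive field (in cells of the fused map) after each stacked
    stride-1 dilated convolution.

    Each layer with kernel size k and dilation d widens the receptive
    field by (k - 1) * d.
    """
    fields = []
    rf = initial_rf
    for d in dilations:
        rf += (kernel_size - 1) * d
        fields.append(rf)
    return fields


# Hypothetical dilation rates, chosen only for illustration.
dilations = [2, 4, 6, 8]
print(receptive_fields(dilations))  # [5, 13, 25, 41]
```

Under these assumptions, residual and dense connections let the final output combine the intermediate results, so it mixes paths with receptive fields of 1, 5, 13, 25, and 41 cells. That is one way to read "multi-scale receptive field" on a single-layer feature map.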

Published in: IEEE Transactions on Image Processing ( Volume: 33)

Page(s): 2104 - 2115

Date of Publication: 12 March 2024

PubMed ID: 38470577

DOI: 10.1109/TIP.2024.3374225

I. Introduction

Object detection, which aims to locate and classify objects in an image, is a core task in computer vision. Following the success of Convolutional Neural Networks (CNNs) [1], many object detection models [2], [3] have achieved great progress on benchmark datasets [4], [5]. However, these models rely on the last layer of the backbone, whose low-resolution feature map contains limited semantic features.
