DEFCON: Deformable Convolutions Leveraging Interval Search and GPU Texture Hardware


Abstract:

Deformable convolutions can improve detection accuracy in Convolutional Neural Networks (CNNs) by augmenting kernels with learnable offsets that enable flexible spatial sampling. However, the irregular memory access patterns and additional pixel-lookup overhead introduced by deformable layers pose inherent challenges when executed on high-throughput devices such as GPUs. To address these challenges, we introduce DEFCON, a systematic approach to optimizing deformable convolutions. DEFCON is designed to provide: (1) better placement of operators in the neural architecture using interval search, (2) reduced computational demands by leveraging lightweight operators, and (3) optimized inference by using GPU texture hardware. By performing an interval search, we reduce the number of deformable layers in our architecture. By leveraging the GPU’s texture hardware, we are able to use lightweight operators that improve the execution performance of layers without sacrificing prediction accuracy. Combining these approaches, DEFCON increases inference performance by 2.8× over the YOLACT++ implementation when run on an NVIDIA Jetson AGX Xavier GPU. Our work enables faster and more accurate predictions when performing deformable convolutions.
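
The abstract names an interval search over operator placement but this excerpt does not detail the procedure, so the following host-side sketch is a hypothetical reading, not DEFCON's algorithm: it greedily shrinks a contiguous interval [lo, hi) of layers that remain deformable, keeping validation accuracy within a tolerance of the all-deformable baseline. The function evaluate, which would retrain or score the network for a candidate interval, is a placeholder supplied by the caller.

// Hypothetical sketch only: the interval-search procedure is not given in
// this excerpt. Greedily shrink the contiguous interval [lo, hi) of layers
// that stay deformable while accuracy holds within `tolerance`.
#include <utility>

std::pair<int, int> interval_search(int num_layers, float tolerance,
                                    float (*evaluate)(int lo, int hi))
{
    const float baseline = evaluate(0, num_layers);  // every layer deformable
    int lo = 0, hi = num_layers;

    // Shrink from the front while accuracy holds.
    while (lo < hi && evaluate(lo + 1, hi) >= baseline - tolerance)
        ++lo;
    // Then shrink from the back while accuracy holds.
    while (lo < hi && evaluate(lo, hi - 1) >= baseline - tolerance)
        --hi;

    return {lo, hi};  // smallest interval found by this greedy shrink
}

The third ingredient, offloading the fractional pixel lookups of deformable convolutions onto the GPU's texture units, can be illustrated with a minimal CUDA sketch. This is not the authors' kernel: the single-channel 3×3 case, the offset layout, and the function names are assumptions made for illustration. The texture-object setup uses the standard CUDA runtime API; with the linear filter mode enabled, each fractional tap becomes a single tex2D fetch and the bilinear interpolation is performed by the texture hardware rather than in CUDA-core arithmetic, which is plausibly the lightweight-operator effect the abstract describes.

#include <cuda_runtime.h>

// Bind an H x W float feature map (already copied into a cudaArray via
// cudaMallocArray / cudaMemcpy2DToArray) to a texture object with hardware
// bilinear filtering. These are standard CUDA runtime API calls.
cudaTextureObject_t make_bilinear_texture(cudaArray_t feature_map)
{
    cudaResourceDesc rd = {};
    rd.resType = cudaResourceTypeArray;
    rd.res.array.array = feature_map;

    cudaTextureDesc td = {};
    td.filterMode       = cudaFilterModeLinear;   // bilinear interp in hardware
    td.addressMode[0]   = cudaAddressModeBorder;  // zeros outside the image
    td.addressMode[1]   = cudaAddressModeBorder;
    td.readMode         = cudaReadModeElementType;
    td.normalizedCoords = 0;                      // address in pixel units

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &rd, &td, nullptr);
    return tex;
}

// One output pixel of a single-channel 3x3 deformable convolution.
// `offsets` holds 18 learned per-pixel displacements (dy, dx per tap),
// laid out here as [2*9][H][W]; the layout is an assumption of this sketch.
__global__ void deform_conv3x3_tex(cudaTextureObject_t input,
                                   const float* __restrict__ offsets,
                                   const float* __restrict__ weights, // 9 taps
                                   float* __restrict__ output,
                                   int H, int W)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    float acc = 0.0f;
    for (int k = 0; k < 9; ++k) {
        int ky = k / 3 - 1, kx = k % 3 - 1;  // regular 3x3 grid position
        float dy = offsets[((2 * k)     * H + y) * W + x];
        float dx = offsets[((2 * k + 1) * H + y) * W + x];
        // The fractional lookup: tex2D with cudaFilterModeLinear does the
        // bilinear interpolation in the texture units; +0.5f addresses
        // texel centers.
        float v = tex2D<float>(input, x + kx + dx + 0.5f, y + ky + dy + 0.5f);
        acc += weights[k] * v;
    }
    output[y * W + x] = acc;
}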
Date of Conference: 27-31 May 2024
Date Added to IEEE Xplore: 08 July 2024
Conference Location: San Francisco, CA, USA

I. Introduction

During the past decade, research has significantly advanced the state-of-the-art in object detection and image segmentation [1]–[9]. Convolutional Neural Networks (CNNs) have paved the way for groundbreaking approaches to object detection. Earlier CNNs were unable to effectively accommodate geometric or spatial variations in object scale, pose, viewpoint, and partial deformation [10]. Two approaches were therefore followed: i) data augmentation, which includes spatial variations in the training dataset [11], [12], and ii) handcrafted feature layers, such as pooling [13], [14]. However, such highly specialized approaches could not generalize to new datasets or handle complicated deformations that require a different receptive field.
