Bounding Box Regression With Uncertainty for Accurate Object Detection | IEEE Conference Publication | IEEE Xplore

Bounding Box Regression With Uncertainty for Accurate Object Detection


Abstract:

Large-scale object detection datasets (e.g., MS-COCO) try to define the ground truth bounding boxes as clearly as possible. However, we observe that ambiguities are still introduced when labeling the bounding boxes. In this paper, we propose a novel bounding box regression loss for learning bounding box transformation and localization variance together. Our loss greatly improves the localization accuracy of various architectures with nearly no additional computation. The learned localization variance allows us to merge neighboring bounding boxes during non-maximum suppression (NMS), which further improves the localization performance. On MS-COCO, we boost the Average Precision (AP) of VGG-16 Faster R-CNN from 23.6% to 29.1%. More importantly, for ResNet-50-FPN Mask R-CNN, our method improves the AP and AP90 by 1.8% and 6.2% respectively, which significantly outperforms previous state-of-the-art bounding box refinement methods. Our code and models are available at github.com/yihui-he/KL-Loss.
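
The abstract does not spell out the loss or the merging rule, so the following is only a minimal sketch under stated assumptions. It assumes each box coordinate is predicted as a Gaussian with a mean and a log-variance, that the loss is the corresponding negative log-likelihood up to a constant, and that neighboring boxes are merged by plain inverse-variance averaging. The function names are hypothetical and are not taken from the released code.

```python
import math


def kl_style_regression_loss(pred, target, log_var):
    """Loss for one box coordinate, assuming a Gaussian prediction.

    Assumed form (not stated in the abstract):
        0.5 * exp(-a) * (target - pred)^2 + 0.5 * a,  with a = log(sigma^2).
    A large predicted variance down-weights the squared error but is
    penalized by the 0.5 * a term.
    """
    err = target - pred
    return 0.5 * math.exp(-log_var) * err * err + 0.5 * log_var


def variance_weighted_merge(coords, variances):
    """Merge one coordinate of neighboring boxes.

    Simplifying assumption: weights are 1 / variance only; any
    overlap-based weighting between boxes is left out.
    """
    weights = [1.0 / v for v in variances]
    total = sum(w * c for w, c in zip(weights, coords))
    return total / sum(weights)


if __name__ == "__main__":
    # Same localization error, two different predicted variances.
    print(kl_style_regression_loss(0.0, 2.0, 0.0))
    print(kl_style_regression_loss(0.0, 2.0, math.log(4.0)))
    # The low-variance box dominates the merged coordinate.
    print(variance_weighted_merge([10.0, 20.0], [1.0, 4.0]))
```

Under these assumptions, a coordinate with a large error is cheaper when the network also reports a high variance, and confident boxes pull the merged coordinate toward themselves.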
Date of Conference: 15-20 June 2019
Date Added to IEEE Xplore: 09 January 2020
Conference Location: Long Beach, CA, USA

1. Introduction

Large-scale object detection datasets like ImageNet [6], MS-COCO [35] and CrowdHuman [47] try to define the ground truth bounding boxes as clearly as possible.

References
1.
Navaneeth Bodla, Bharat Singh, Rama Chellappa and Larry S Davis, "Soft-nms - improving object detection with one line of code", Computer Vision (ICCV) 2017 IEEE International Conference on, pp. 5562-5570, 2017.
2.
Zhaowei Cai and Nuno Vasconcelos, "Cascade r-cnn: Delving into high quality object detection", arXiv preprint arXiv:1712.00726, 2017.
3.
Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, et al., "cudnn: Efficient primitives for deep learning", arXiv preprint arXiv:1410.0759, 2014.
4.
Jifeng Dai, Yi Li, Kaiming He and Jian Sun, "R-fcn: Object detection via region-based fully convolutional networks", Advances in neural information processing systems, pp. 379-387, 2016.
5.
Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, et al., "Deformable convolutional networks", CoRR abs/1703.06211, vol. 1, no. 2, pp. 3, 2017.
6.
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li and Li Fei-Fei, "Imagenet: A large-scale hierarchical image database", Computer Vision and Pattern Recognition 2009. CVPR 2009. IEEE Conference on, pp. 248-255, 2009.
7.
Nemanja Djuric, Vladan Radosavljevic, Henggang Cui, Thi Nguyen, Fang-Chieh Chou, Tsung-Han Lin, et al., "Motion prediction of traffic actors for autonomous driving using deep convolutional networks", arXiv preprint arXiv:1808.05819, 2018.
8.
M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results", [online] Available: pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
9.
Di Feng, Lars Rosenbaum and Klaus Dietmayer, "Towards safe autonomous driving: Capture uncertainty in the deep neural network for lidar 3d vehicle detection", 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 3266-3273, 2018.
10.
Di Feng, Lars Rosenbaum, Fabian Timm and Klaus Dietmayer, "Leveraging heteroscedastic aleatoric uncertainties for robust real-time lidar 3d object detection", arXiv preprint arXiv:1809.05590, 2018.
11.
Spyros Gidaris and Nikos Komodakis, "Object detection via a multi-region and semantic segmentation-aware cnn model", Proceedings of the IEEE International Conference on Computer Vision, pp. 1134-1142, 2015.
12.
Ross Girshick, "Fast r-cnn", Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448, 2015.
13.
Ross Girshick, Jeff Donahue, Trevor Darrell and Jitendra Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580-587, 2014.
14.
Ross Girshick, Ilija Radosavovic, Georgia Gkioxari, Piotr Dollar and Kaiming He, "Detectron", 2018, [online] Available: github.com/facebookresearch/detectron.
15.
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, et al., "Accurate large minibatch sgd: training imagenet in 1 hour", arXiv preprint arXiv:1706.02677, 2017.
16.
Marcus Gualtieri and Robert Platt, "Learning 6-dof grasping and pick-place using attention focus", arXiv preprint arXiv:1806.06134, 2018.
17.
Kaiming He, Georgia Gkioxari, Piotr Dollár and Ross Girshick, "Mask r-cnn", Computer Vision (ICCV) 2017 IEEE International Conference on, pp. 2980-2988, 2017.
18.
Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Deep residual learning for image recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
19.
Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li and Song Han, "Amc: Automl for model compression and acceleration on mobile devices", Proceedings of the European Conference on Computer Vision (ECCV), pp. 784-800, 2018.
20.
Yihui He, Xianggen Liu, Huasong Zhong and Yuchun Ma, "Addressnet: Shift-based primitives for efficient convolutional neural networks", 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1213-1222, 2019.
21.
Yihui He, Xiaobo Ma, Xiapu Luo, Jianfeng Li, Mengchen Zhao, Bo An, et al., "Vehicle traffic driven camera placement for better metropolis security surveillance", arXiv preprint arXiv:1705.08508, 2017.
22.
Yihui He, Xiangyu Zhang, Marios Savvides and Kris Kitani, "Softer-nms: Rethinking bounding box regression for accurate object detection", arXiv preprint arXiv:1809.08545, 2018.
23.
Yihui He, Xiangyu Zhang and Jian Sun, "Channel pruning for accelerating very deep neural networks", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1389-1397, 2017.
24.
Jan Hendrik Hosang, Rodrigo Benenson and Bernt Schiele, "Learning non-maximum suppression", CVPR, 2017.
25.
Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai and Yichen Wei, "Relation networks for object detection", arXiv preprint arXiv:1711.11575, vol. 8, 2017.
26.
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, et al., "Caffe: Convolutional architecture for fast feature embedding", Proceedings of the 22nd ACM international conference on Multimedia, pp. 675-678, 2014.
27.
Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao and Yuning Jiang, "Acquisition of localization confidence for accurate object detection", Proceedings of the European Conference on Computer Vision (ECCV), pp. 784-799, 2018.
28.
Alex Kendall and Yarin Gal, "What uncertainties do we need in bayesian deep learning for computer vision?", Advances in neural information processing systems, pp. 5574-5584, 2017.
29.
Alex Kendall, Yarin Gal and Roberto Cipolla, "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics", arXiv preprint arXiv:1705.07115, vol. 3, 2017.
30.
Hei Law and Jia Deng, "Cornernet: Detecting objects as paired keypoints", Proceedings of the European Conference on Computer Vision (ECCV), pp. 734-750, 2018.
