USD: Uncertainty-Based One-Phase Learning to Enhance Pseudo-Label Reliability for Semi-Supervised Object Detection


Abstract:

With the ease of accessing large unlabeled datasets, studies on semi-supervised learning for object detection (SSOD) have become increasingly popular. Among these SSOD studies, the pseudo-labeling method significantly depends on the accuracy of the pseudo-labels; thus, inaccurate annotations must be filtered to prevent performance degradation. This study classifies annotation errors that occur in pseudo-labeling methods as false negative (FN) and false positive (FP), and solutions to address each type of error are proposed using uncertainty information obtained through Gaussian modeling. Network performance is improved by preventing the background learning of the FN objects based on the uncertainty of the network output. In addition, based on the uncertainty of the annotations, low-reliability annotations are filtered out, and the learning reflectivity of FP objects is determined. Considering the network performance improvement and training complexity, the proposed method employs one-phase learning, including a single pseudo-label update, to achieve maximum performance with the minimum learning process. Moreover, an algorithm is proposed for an optimal update point search to increase the expected performance improvement. Experiments on the Pascal VOC, COCO, and Cityscapes datasets show that the SSD network improves accuracy by 3.3%, 4.7%, and 4.1%, respectively, with negligible computational complexity compared to the baseline.
Published in: IEEE Transactions on Multimedia (Volume: 26)
Page(s): 6336 - 6347
Date of Publication: 01 January 2024
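
The uncertainty-based filtering outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each box coordinate is modeled as a Gaussian with a predicted standard deviation in [0, 1], that the four standard deviations are averaged into one uncertainty value, and that a kept pseudo-label is weighted by one minus that uncertainty. The names `GaussianBox`, `localization_uncertainty`, and `filter_pseudo_labels`, and both thresholds, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GaussianBox:
    """One detection with Gaussian-modeled box coordinates (hypothetical container)."""
    mean: Tuple[float, float, float, float]   # predicted (x, y, w, h)
    sigma: Tuple[float, float, float, float]  # per-coordinate std. dev., assumed in [0, 1]
    score: float                              # class confidence in [0, 1]
    label: int                                # predicted class index


def localization_uncertainty(box: GaussianBox) -> float:
    """Average the four per-coordinate uncertainties into one scalar (an assumption)."""
    return sum(box.sigma) / len(box.sigma)


def filter_pseudo_labels(
    boxes: List[GaussianBox],
    score_thr: float = 0.5,  # illustrative threshold, not taken from the paper
    unc_thr: float = 0.3,    # illustrative threshold, not taken from the paper
) -> List[Tuple[GaussianBox, float]]:
    """Drop low-confidence or high-uncertainty detections.

    Returns (box, loss_weight) pairs, where the weight shrinks as the
    uncertainty grows so that doubtful pseudo-labels contribute less.
    """
    kept = []
    for box in boxes:
        unc = localization_uncertainty(box)
        if box.score < score_thr or unc > unc_thr:
            continue  # treated as an unreliable annotation
        kept.append((box, 1.0 - unc))
    return kept


if __name__ == "__main__":
    candidates = [
        GaussianBox((10, 10, 50, 40), (0.1, 0.1, 0.1, 0.1), 0.9, 1),  # reliable
        GaussianBox((20, 30, 60, 60), (0.5, 0.5, 0.5, 0.5), 0.9, 2),  # too uncertain
        GaussianBox((5, 5, 20, 20), (0.05, 0.05, 0.05, 0.05), 0.2, 3),  # low score
    ]
    for box, weight in filter_pseudo_labels(candidates):
        print(box.label, round(weight, 3))
```

In this sketch only the first candidate survives; the other two are rejected for high uncertainty and low confidence, respectively. The weighting rule is one simple choice among many and stands in for the paper's own treatment of FP objects, which the abstract does not specify in detail.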

I. Introduction

Deep learning models are strongly affected by the quality [1] and quantity [2] of the dataset used for training. Supervised learning with labeled datasets is the common approach to training deep learning models. In particular, object detection tasks that require high accuracy need a high-quality labeled dataset, because accuracy decreases when low-quality labels are used, including errors such as localization, classification, and false errors [3]. However, obtaining large high-quality labeled datasets is challenging owing to the high cost of annotation [4], [5]. Moreover, a small training dataset also reduces accuracy because it is not representative of the actual data distribution [2]. Abundant unlabeled data are readily available in the real world, but they cannot be used directly for supervised learning. If the large amount of new data collected by numerous mobile devices in each user's environment could be exploited, a trained model optimized for each user could be obtained. A widely used recent approach to exploiting the unlabeled data generated on user devices is to transmit the data to a data center and process them on a cloud server [6], [7]. However, this approach faces many obstacles, including data privacy concerns [8], transmission issues [9], cloud computing burden, data center maintenance [10], and annotation costs [4]. If optimized training were possible on personal devices (e.g., mobile/edge devices) using large unlabeled datasets, it would provide benefits such as lower annotation cost, lower cloud processing cost, and improved model accuracy for personal applications [11]. Therefore, semi-supervised learning for object detection (SSOD) [12], [13], [14], which trains networks using large and readily available unlabeled datasets, is becoming increasingly popular.

References

[1] Y. Lau, W. Sim, K. Chew, Y. Ng and Z. A. A. Salam, "Understanding how noise affects the accuracy of CNN image classification", J. Appl. Technol. Innov., vol. 5, no. 2, 2021.
[2] C. Luo et al., "How does the data set affect CNN-based image classification performance?", Proc. IEEE 5th Int. Conf. Syst. Inform., pp. 361-366, 2018.
[3] J. Ma, Y. Ushiku and M. Sagara, "The effect of improving annotation quality on object detection datasets: A preliminary study", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 4850-4859, 2022.
[4] O. Alonso, "Challenges with label quality for supervised learning", J. Data Inf. Qual., vol. 6, no. 1, pp. 1-3, 2015.
[5] C. Tang et al., "Adaptive hypergraph embedded semi-supervised multi-label image annotation", IEEE Trans. Multimedia, vol. 21, no. 11, pp. 2837-2849, Nov. 2019.
[6] Y. Jing et al., "CrowdTracker: Optimized urban moving object tracking using mobile crowd sensing", IEEE Internet Things J., vol. 5, no. 5, pp. 3452-3463, Oct. 2018.
[7] H.-J. Hong, C.-L. Fan, Y.-C. Lin and C.-H. Hsu, "Optimizing cloud-based video crowdsensing", IEEE Internet Things J., vol. 3, no. 3, pp. 299-313, Jun. 2016.
[8] B. M. Gaff, H. E. Sussman and J. Geetter, "Privacy and Big Data", Computer, vol. 47, no. 6, pp. 7-9, 2014.
[9] C. Guo et al., "Pingmesh: A large-scale system for data center network latency measurement and analysis", Proc. ACM Conf. Special Int. Group Data Commun., pp. 139-152, 2015.
[10] M. Uddin and A. A. Rahman, "Energy efficiency and low carbon enabler green IT framework for data centers considering green metrics", Renewable Sustain. Energy Rev., vol. 16, no. 6, pp. 4078-4094, 2012.
[11] J. Chen and X. Ran, "Deep learning with edge computing: A review", Proc. IEEE, vol. 107, no. 8, pp. 1655-1674, Aug. 2019.
[12] Z. Wang, Y. Li, Y. Guo, L. Fang and S. Wang, "Data-uncertainty guided multi-phase learning for semi-supervised object detection", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 4568-4577, 2021.
[13] J. Jeong, S. Lee, J. Kim and N. Kwak, "Consistency-based semi-supervised learning for object detection", Proc. Adv. Neural Inf. Process. Syst., pp. 10759-10768, 2019.
[14] J. Jeong, V. Verma, M. Hyun, J. Kannala and N. Kwak, "Interpolation-based semi-supervised learning for object detection", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 11602-11611, 2021.
[15] H. Zhou et al., "Dense teacher: Dense pseudo-labels for semi-supervised object detection", Proc. Eur. Conf. Comput. Vis., pp. 35-50, 2022.
[16] K. Sohn et al., "A simple semi-supervised learning framework for object detection", 2020.
[17] F. Zhang, T. Pan and B. Wang, "Semi-supervised object detection with adaptive class-rebalancing self-training", Proc. AAAI Conf. Artif. Intell., pp. 3252-3261, 2022.
[18] Y.-C. Liu et al., "Unbiased teacher for semi-supervised object detection", Proc. Int. Conf. Learn. Representations, 2020.
[19] Y.-C. Liu, C.-Y. Ma and Z. Kira, "Unbiased teacher V2: Semi-supervised object detection for anchor-free and anchor-based detectors", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 9819-9828, 2022.
[20] J. Choi, D. Chun, H. Kim and H.-J. Lee, "Gaussian YOLOv3: An accurate and fast object detector using localization uncertainty for autonomous driving", Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 502-511, 2019.
[21] W. Liu et al., "SSD: Single shot multibox detector", Proc. Eur. Conf. Comput. Vis., pp. 21-37, 2016.
[22] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement", 2018.
[23] M. Everingham, L. Van Gool, C. K. Williams, J. Winn and A. Zisserman, "The Pascal visual object classes (VOC) challenge", Int. J. Comput. Vis., vol. 88, no. 2, pp. 303-338, 2010.
[24] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks", Proc. Adv. Neural Inf. Process. Syst., pp. 91-99, 2015.
[25] T.-Y. Lin et al., "Feature pyramid networks for object detection", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2117-2125, 2017.
[26] Z. Cai and N. Vasconcelos, "Cascade R-CNN: High quality object detection and instance segmentation", IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 5, pp. 1483-1498, May 2021.
[27] O. Russakovsky, L.-J. Li and L. Fei-Fei, "Best of both worlds: Human-machine collaboration for object annotation", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2121-2131, 2015.
[28] M. Sajjadi, M. Javanmardi and T. Tasdizen, "Regularization with stochastic transformations and perturbations for deep semi-supervised learning", Proc. Adv. Neural Inf. Process. Syst., pp. 1163-1171, 2016.
[29] C. Zhang, J. Cheng and Q. Tian, "Unsupervised and semi-supervised image classification with weak semantic consistency", IEEE Trans. Multimedia, vol. 21, no. 10, pp. 2482-2491, Oct. 2019.
[30] D.-H. Lee et al., "Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks", Proc. Workshop Challenges Representation Learn., 2013.
