
LSAS: Lightweight Sub-attention Strategy for Alleviating Attention Bias Problem


Abstract:

In computer vision, the performance of deep neural networks (DNNs) is highly related to their feature extraction ability, i.e., the ability to recognize and focus on key pixel regions in an image. However, in this paper, we quantitatively and statistically illustrate that DNNs have a serious attention bias problem on many samples from some popular datasets: (1) Position bias: DNNs focus entirely on label-independent regions; (2) Range bias: the regions focused on by DNNs are not completely contained in the ideal region. Moreover, we find that existing self-attention modules can alleviate these biases to a certain extent, but the biases remain non-negligible. To further mitigate them, we propose a lightweight sub-attention strategy (LSAS), which utilizes high-order sub-attention modules to improve the original self-attention modules. The effectiveness of LSAS is demonstrated by extensive experiments on widely used benchmark datasets and popular attention networks. We release our code to help other researchers reproduce the results of LSAS.
Date of Conference: 10-14 July 2023
Date Added to IEEE Xplore: 25 August 2023
Conference Location: Brisbane, Australia

I. Introduction

Deep neural networks (DNNs) have been empirically confirmed, through comprehensive experimental results on various tasks [3]–[5], to possess efficient and reliable feature extraction capabilities, which play a fundamental role in their performance [1], [2]. In computer vision, the feature extraction ability of a DNN is mainly reflected in whether it can recognize and attend to the key pixel regions in an image [6], [7]. As depicted in Fig. 1, a popular interpretability technique, Grad-CAM [8], is adopted to explicitly visualize the regions that DNNs attend to in the form of heat maps. From the results, we find that although the vanilla ResNet [3] achieves good performance, it exhibits non-negligible attention bias problems in key semantic feature extraction: (1) Position bias. In the examples illustrated in Fig. 1(a)(b), ResNet attends only to label-independent background regions rather than to the regions of the bird and the cat. Such position bias makes the features extracted by DNNs sensitive to background information, resulting in erroneous predictions. (2) Range bias. As shown in Fig. 1(c)(d), ResNet fails to cover the full region of the labeled object while attending to extra regions such as the sky and a fence.
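As background on how such heat maps are produced: Grad-CAM weights each feature map of a chosen convolutional layer by the global-average-pooled gradient of the class score with respect to that map, sums the weighted maps, and applies a ReLU. A minimal NumPy sketch of this computation (the function name and arguments are illustrative, not taken from the paper or the Grad-CAM codebase):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Minimal Grad-CAM heat map from one conv layer.

    feature_maps: (K, H, W) activations A^k of the chosen layer
    gradients:    (K, H, W) d(class score)/dA^k
    """
    # Channel weights alpha_k: global-average-pool the gradients
    weights = gradients.mean(axis=(1, 2))                        # (K,)
    # Weighted sum of feature maps over channels, then ReLU
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # Normalize to [0, 1] for visualization
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam                                                   # (H, W)
```

In practice the activations and gradients would be captured from the last convolutional block (e.g. via framework hooks), and the resulting map upsampled to the input resolution before being overlaid on the image.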

References
1. Senwei Liang, Zhongzhan Huang, Mingfu Liang and Haizhao Yang, "Instance enhancement batch normalization: An adaptive regulator of batch noise", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 4819-4827, 2020.
2. Qing Kuang, "Face image feature extraction based on deep learning algorithm", Journal of Physics: Conference Series, vol. 1852, p. 032040, 2021.
3. Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Deep residual learning for image recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
4. Matthew D. Zeiler and Rob Fergus, "Visualizing and understanding convolutional networks", European Conference on Computer Vision, pp. 818-833, 2014.
5. Quinten McNamara, Alejandro De La Vega and Tal Yarkoni, "Developing a comprehensive framework for multimodal feature extraction", Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1567-1574, 2017.
6. Ke Zhu and Jianxin Wu, "Residual attention: A simple but effective method for multi-label recognition", Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 184-193, 2021.
7. Xudong Guo, Xun Guo and Yan Lu, "SSAN: Separable self-attention network for video representation learning", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12618-12627, 2021.
8. Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh and Dhruv Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization", International Conference on Computer Vision, 2017.
9. Zhongzhan Huang, Senwei Liang, Mingfu Liang and Haizhao Yang, "DIANet: Dense-and-implicit attention network", Proceedings of the AAAI Conference on Artificial Intelligence, pp. 4206-4214, 2020.
10. Jie Hu, Li Shen and Gang Sun, "Squeeze-and-excitation networks", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132-7141, 2018.
11. John R. Anderson, Cognitive Psychology and Its Implications, Macmillan, 2005.
12. Sanghyun Woo, Jongchan Park, Joon-Young Lee and In So Kweon, "CBAM: Convolutional block attention module", Proceedings of the European Conference on Computer Vision (ECCV), pp. 3-19, 2018.
13. HyunJae Lee, Hyo-Eun Kim and Hyeonseob Nam, "SRM: A style-based recalibration module for convolutional neural networks", Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1854-1862, 2019.
14. Q. Wang, B. Wu, P. Zhu, P. Li and Q. Hu, "ECA-Net: Efficient channel attention for deep convolutional neural networks", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
15. Jingda Guo, Xu Ma, Andrew Sansom, Mara McGuire, Andrew Kalaani, Qi Chen, et al., "SPANet: Spatial pyramid attention network for enhanced image recognition", 2020 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, 2020.
16. Zhongzhan Huang, Senwei Liang, Mingfu Liang, Wei He, Haizhao Yang and Liang Lin, "The lottery ticket hypothesis for self-attention in convolutional neural network", 2022.
17. Zhongzhan Huang, Senwei Liang, Mingfu Liang, Wei He and Haizhao Yang, "Efficient attention network: Accelerate attention by searching where to plug", 2020.
18. Jie Hu, Li Shen, Samuel Albanie, Gang Sun and Enhua Wu, "Squeeze-and-excitation networks", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 8, pp. 2011-2023, 2019.
19. Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein et al., "ImageNet large scale visual recognition challenge", International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
20. Adam Coates, Andrew Ng and Honglak Lee, "An analysis of single-layer networks in unsupervised feature learning", Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, pp. 215-223, 2011.
21. Alex Krizhevsky, Geoffrey Hinton et al., "Learning multiple layers of features from tiny images", 2009.
22. Zhongzhan Huang, Senwei Liang, Mingfu Liang, Weiling He and Liang Lin, "Layer-wise shared attention network on dynamical system perspective", 2022.
23. Shanshan Zhong, Wushao Wen and Jinghui Qin, "Mix-pooling strategy for attention mechanism", 2022.