
LSAS: Lightweight Sub-attention Strategy for Alleviating Attention Bias Problem


Abstract:

In computer vision, the performance of deep neural networks (DNNs) is highly related to their feature extraction ability, i.e., the ability to recognize and focus on key pixel regions in an image. However, in this paper, we quantitatively and statistically illustrate that DNNs have a serious attention bias problem on many samples from some popular datasets: (1) Position bias: DNNs focus entirely on label-independent regions; (2) Range bias: the regions the DNN focuses on are not completely contained in the ideal region. Moreover, we find that existing self-attention modules can alleviate these biases to a certain extent, but the biases remain non-negligible. To further mitigate them, we propose a lightweight sub-attention strategy (LSAS), which utilizes high-order sub-attention modules to improve the original self-attention modules. The effectiveness of LSAS is demonstrated by extensive experiments on widely-used benchmark datasets and popular attention networks. We release our code to help other researchers reproduce the results of LSAS.
Date of Conference: 10-14 July 2023
Date Added to IEEE Xplore: 25 August 2023
Conference Location: Brisbane, Australia

Funding Agency:

Sun Yat-Sen University, Guangzhou, China
Sun Yat-Sen University, Guangzhou, China
Guangdong University of Technology, Guangzhou, China
Sun Yat-Sen University, Guangzhou, China
Sun Yat-Sen University, Guangzhou, China

I. Introduction

Comprehensive experimental results across various tasks [3]–[5] have empirically confirmed that deep neural networks (DNNs) possess efficient and reliable feature extraction capabilities, which play a fundamental role in their performance [1], [2]. In computer vision specifically, the feature extraction ability of DNNs is mainly reflected in whether they can recognize and attend to the key pixel regions in an image [6], [7]. As depicted in Fig. 1, a popular interpretability technique, Grad-CAM [8], is adopted to explicitly visualize the regions that DNNs attend to in the form of heat maps. From the results, we find that although the vanilla ResNet [3] achieves good performance, it exhibits non-negligible attention bias problems in key semantic feature extraction: (1) Position bias. In the examples illustrated in Fig. 1(a) and (b), ResNet attends only to the label-independent background region rather than the region of the bird or the cat. Such position biases make the features extracted by DNNs sensitive to background information, resulting in erroneous predictions. (2) Range bias. As shown in Fig. 1(c) and (d), ResNet fails to cover the full region of the labeled object while attending to extra regions such as the sky and a fence.
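The core Grad-CAM computation referenced above is simple: channel-wise weights are obtained by global-average-pooling the gradients of the class score with respect to the last convolutional feature maps, and the heat map is the ReLU of the weighted channel sum. Below is a minimal NumPy sketch of that computation on hypothetical pre-extracted arrays (in practice the feature maps and gradients would come from framework hooks on a trained network such as ResNet); the function name and shapes are our own illustrative choices, not the paper's code.

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heat map from one sample's last-conv activations.

    feature_maps: (C, H, W) activations of the final conv layer.
    gradients:    (C, H, W) gradients of the target class score w.r.t. them.
    Returns a (H, W) map normalized to [0, 1].
    """
    # alpha_k: global-average-pool each channel's gradient map.
    weights = gradients.mean(axis=(1, 2))               # shape (C,)
    # Weighted sum of channels, then ReLU to keep positive evidence only.
    cam = np.einsum("c,chw->hw", weights, feature_maps)
    cam = np.maximum(cam, 0.0)
    if cam.max() > 0:                                   # avoid division by zero
        cam = cam / cam.max()
    return cam
```

Upsampling the resulting low-resolution map to the input image size and overlaying it as a heat map yields visualizations like those in Fig. 1.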

