Understanding Self-attention Mechanism via Dynamical System Perspective


Abstract:

The self-attention mechanism (SAM) is widely used across fields of artificial intelligence and has successfully boosted the performance of many models. However, current explanations of this mechanism rest mainly on intuition and experience, and a direct model of how the SAM improves performance is still lacking. To mitigate this issue, in this paper, building on the dynamical system perspective of residual neural networks, we first show that the intrinsic stiffness phenomenon (SP) that arises in high-precision solutions of ordinary differential equations (ODEs) is also widespread in high-performance neural networks (NNs). The ability of an NN to measure SP at the feature level is therefore necessary for high performance and is an important factor in the difficulty of training NNs. Analogous to adaptive step-size methods, which are effective for solving stiff ODEs, we show that the SAM is a stiffness-aware step-size adaptor: it enhances the model's representational ability to measure intrinsic SP by refining the estimation of stiffness information and generating adaptive attention values, which provides a new understanding of why and how the SAM benefits model performance. This perspective also explains the lottery ticket hypothesis in the SAM, yields new quantitative metrics of representational ability, and inspires a new theory-inspired approach, StepNet. Extensive experiments on several popular benchmarks demonstrate that StepNet can extract fine-grained stiffness information and measure SP accurately, leading to significant improvements in various visual tasks.
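To make the ODE reading concrete: in the dynamical-system view, a residual block computes x_{t+1} = x_t + f(x_t), i.e., one explicit-Euler step of x'(t) = f(x(t)) with unit step size, and the abstract casts self-attention as an adaptive, input-dependent step size h(x_t). The following is a minimal, hedged PyTorch sketch of that reading; it is not the authors' StepNet, and the module name EulerStepResidualBlock and the per-channel step_size branch are illustrative assumptions.

# A residual block read as one explicit-Euler step x_{t+1} = x_t + h(x_t) * f(x_t),
# where h(x_t) is a learned, input-dependent "step size" standing in for attention.
import torch
import torch.nn as nn

class EulerStepResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # f(x): the residual branch, i.e., the vector field of the ODE view.
        self.residual_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        # h(x): a per-channel step size in (0, 1), analogous to the adaptive
        # step-size controllers used for stiff ODEs.
        self.step_size = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_x = self.residual_branch(x)   # vector field f(x_t)
        h = self.step_size(x)           # adaptive step size h(x_t)
        return x + h * f_x              # one Euler step with adaptive step size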
Date of Conference: 01-06 October 2023
Date Added to IEEE Xplore: 15 January 2024
Conference Location: Paris, France


1. Introduction

The self-attention mechanism (SAM) [41], [15], [16], [9], [43], [4] is widely used across fields of artificial intelligence and has successfully improved model performance on a number of vision tasks, including image classification [22], [60], [47], object detection [37], [56], [25], instance segmentation [8], [52], and image super-resolution [64], [46], [49]. However, most previous works focus on designing new self-attention methods and explore only intuitively or heuristically how the self-attention mechanism helps performance. For example, many popular channel attention methods [22], [47], [56], [35] treat the attention values as soft weights on the channels, which reassign the importance of feature maps. These soft weights can also be viewed as a gating mechanism [60], [28] that controls the forward transmission of the information flow, and they are commonly applied to neural network pruning and neural architecture search [40], [65]; a minimal sketch of this soft-weight view appears below. Another viewpoint [38] argues that the self-attention mechanism helps regulate noise by enhancing instance-specific information, yielding a better regularization effect. Moreover, the receptive field [62], [63], [67] and long-range dependencies [69], [17], [57] have also been used to understand the role of self-attention. Although these explanations describe the behavior of self-attention mechanisms to some extent, the relationship between the SAM and model performance remains ambiguous.
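As a concrete illustration of the soft-weight/gate reading above, here is a minimal, hedged PyTorch sketch of squeeze-and-excitation-style channel attention (in the spirit of [22]); the module name ChannelGate and the reduction ratio are illustrative assumptions, not a specific published implementation.

import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)      # global context per channel
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # soft weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c))  # per-channel attention values
        return x * w.view(b, c, 1, 1)                # gate / reweight each channel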
