I. Introduction
The capacity of optical transmission systems is bounded by fundamental limits in both the linear and nonlinear regimes [1], [2]. Nonlinear impairment caused by fiber nonlinear effects has become the main limiting factor in optical fiber communication systems [3]. Traditional digital signal processing (DSP) techniques for compensating fiber nonlinearities are computationally expensive and require prior knowledge of a large number of system parameters [4], which makes them impractical in a realistic environment. Because neural networks (NNs) can model the dynamics and dependencies in sequential data, several NN-based algorithms have been proposed to compensate for nonlinear impairments [5]-[8]. Conventional NN-based architectures weight all input data equally, whereas [3] suggested that different symbols should carry different weights. Algorithms based on bi-directional recurrent neural networks (Bi-RNNs) have shown advantages in compensating for nonlinear effects, and two types of Bi-RNN algorithms, Bi-Vanilla-RNN [10] and Bi-LSTM [11], have been widely discussed.
To the best of our knowledge, this is the first time that the attention mechanism has been introduced to compensate for optical fiber nonlinear effects. By using a context vector, the input symbols are associated with different weights. Numerical simulations show that the attention mechanism improves performance, with a gain of 0.6 dB to 1 dB over the digital back propagation (DBP) algorithm at a launch power of 0 dBm.
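The idea of associating input symbols with different weights through a context vector can be illustrated with a minimal sketch. The sketch assumes dot-product scoring followed by a softmax, which is one common form of attention and not necessarily the formulation used in this work. The names `attention_pool`, `hidden`, and `context`, the array sizes, and the random values standing in for Bi-RNN hidden states are all hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()

def attention_pool(hidden, context):
    """Weight per-symbol hidden states with a context vector.

    hidden:  (T, d) array, one d-dimensional state per input symbol
             (assumed here to come from a Bi-RNN; random in this sketch).
    context: (d,) context vector (assumed learnable in a real model).
    Returns the (T,) attention weights and the (d,) weighted sum.
    """
    scores = hidden @ context      # one scalar score per symbol (assumed dot-product scoring)
    weights = softmax(scores)      # non-negative weights that sum to 1
    pooled = weights @ hidden      # weighted combination of the hidden states
    return weights, pooled

# Hypothetical example: a window of 7 symbols with 4-dimensional states.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((7, 4))
context = rng.standard_normal(4)
weights, pooled = attention_pool(hidden, context)
uniform_weights, _ = attention_pool(hidden, np.zeros(4))
```

With a zero context vector every score is equal, so the weights become uniform, which corresponds to the equal weighting of conventional NN-based architectures. A non-zero context vector lets some symbols in the window contribute more than others.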