I. Introduction
The backpropagation (BP) learning algorithm [1] is a supervised learning method extensively applied to the training of multilayer feedforward neural networks. It is the most widely used learning algorithm because of its simplicity and low computational complexity. Despite the general success of BP in training neural networks, it has two major drawbacks that need to be addressed. First, BP may converge to a local minimum of the error function, especially for nonlinearly separable problems such as the XOR problem [2], [3]. Once trapped in a local minimum, BP may fail to find a globally optimal solution. Second, the convergence rate of BP is too slow even when learning succeeds. The main cause of the slow convergence is the derivative of the activation function, which may lead to premature saturation, sometimes referred to as the "flat spot" problem [4]–[7]. If the algorithm is trapped in a flat spot, the learning process and weight adjustment become very slow or are even suppressed. BP usually requires tens to thousands of epochs to escape the flat spot, and this causes the slow convergence of the algorithm.
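As a brief illustration of the flat-spot effect (standard reasoning consistent with, but not quoted from, [4]–[7]; the symbols $\varphi$, $x_j$, $\delta_j$, $e_j$, $y_i$, $w_{ij}$, and $\eta$ are generic notation introduced here for illustration), consider a unit with the logistic activation $\varphi(x) = 1/(1+e^{-x})$. The BP weight update and the local error signal take the form
$$
\Delta w_{ij} = \eta\,\delta_j\,y_i, \qquad
\delta_j = e_j\,\varphi'(x_j), \qquad
\varphi'(x) = \varphi(x)\bigl(1-\varphi(x)\bigr) \le \tfrac{1}{4},
$$
where $e_j$ denotes the output error or the back-propagated error sum at unit $j$. Since $\varphi'(x)\to 0$ as $|x|\to\infty$ (for example, $\varphi'(5)\approx 0.0066$), a unit whose net input is driven far into saturation passes back almost no gradient, so its incoming weights barely change regardless of how large the error $e_j$ is; this is the stalled weight adjustment described above.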