I. Introduction
Interval type-2 fuzzy neural networks (IT2FNNs) combine powerful fuzzy reasoning capability with learning capability and have been increasingly used to identify nonlinear systems [1], [2], [3]. In such applications, an IT2FNN generally needs an optimization technique to update its type-2 fuzzy rules [4], [5]. The most frequently used technique is gradient descent, owing to its ease of implementation [6], [7], [8], [9]. Nevertheless, once the variances of the Gaussian membership functions or the weights of the IT2FNN approach zero, first-order gradient learning typically falls into severe oscillation or stagnates on a long plateau where the parameters change very little. Under such singularities, the learning process is also prone to becoming trapped in local minima [10], [11], [12].
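To make the plateau and oscillation mechanism more concrete, the sketch below uses a single type-1 Gaussian unit y = w exp(-(x - c)^2 / (2σ^2)) instead of a full IT2FNN. The function names, the squared-error loss, the learning rate and the numerical values are illustrative assumptions and are not the learning method studied in this paper. Because ∂y/∂c and ∂y/∂σ are both proportional to y, and therefore to w, they shrink toward zero as w approaches zero, so plain gradient descent barely moves the center. A very small σ instead makes the gradient negligible for most inputs but very large for inputs close to c, which is one source of oscillation.

```python
import math


def gaussian_unit(x, c, sigma, w):
    """Output of one (hypothetical) type-1 Gaussian unit: w * exp(-(x - c)^2 / (2 sigma^2))."""
    return w * math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def grad_center(x, c, sigma, w):
    """dy/dc = y * (x - c) / sigma^2; proportional to w through y."""
    return gaussian_unit(x, c, sigma, w) * (x - c) / sigma ** 2


def grad_sigma(x, c, sigma, w):
    """dy/dsigma = y * (x - c)^2 / sigma^3; also proportional to w through y."""
    return gaussian_unit(x, c, sigma, w) * (x - c) ** 2 / sigma ** 3


def train_center(x, target, c, sigma, w, lr=0.1, steps=100):
    """Plain first-order gradient descent on L = 0.5 * (y - target)^2,
    updating only the center c (assumed setup, for illustration only)."""
    for _ in range(steps):
        y = gaussian_unit(x, c, sigma, w)
        c -= lr * (y - target) * grad_center(x, c, sigma, w)
    return c


if __name__ == "__main__":
    # Near-zero weight: the center hardly moves (plateau).
    print("w = 1.0  ->", train_center(0.5, 1.0, 0.0, 1.0, 1.0))
    print("w = 1e-4 ->", train_center(0.5, 1.0, 0.0, 1.0, 1e-4))
    # Near-zero width: gradient is tiny away from c and large close to c.
    print("sigma = 0.01, x = 0.10 ->", grad_center(0.10, 0.0, 0.01, 1.0))
    print("sigma = 0.01, x = 0.01 ->", grad_center(0.01, 0.0, 0.01, 1.0))
```

Running the script shows the center moving noticeably toward the input when w = 1 but staying almost fixed when w = 1e-4, while the narrow-width case yields gradients that differ by many orders of magnitude between two nearby inputs.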