I. Introduction
In the past decade, fuzzy neural networks (FNNs) have been widely applied across many subject areas and engineering applications, such as pattern recognition, intelligent adaptive control, and regression or density estimation [1]–[6]. An FNN combines the linguistic-information representation of a fuzzy system with the learning ability of a neural network (NN) [7]–[12]. If an FNN is properly constructed, it satisfies the universal approximation theorem (UAT); that is, a properly constructed FNN can approximate any continuous nonlinear function to arbitrary accuracy [13]–[16]. However, the universal approximation theorem does not tell us how to construct and tune such an FNN. In other words, an FNN designed by a human expert for a given application is subject to constraints, such as the maximum number of input–output samples it can approximate or memorize. Following the discussion of the capacity of multilayer NNs [17], the capacity of an FNN is therefore defined as the maximum number of arbitrary distinct input samples that can be mapped to desired output samples with zero error. Exceeding this capacity may cause the training process to diverge; the training samples are assumed to be independent. Over the past decade, the capacity of associative memories and the multilayer perceptron (MLP) has been derived under the assumption of a fully connected NN [18]–[24].
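The notion of capacity as exact memorization can be illustrated with a minimal sketch (not taken from the cited works): a zero-order fuzzy system with one Gaussian rule centered at each training input acts like an interpolation network, so N distinct samples can be mapped to arbitrary targets with zero error by solving a linear system for the consequent weights. All names and parameter values below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: a fuzzy system with one Gaussian rule per
# training point can memorize N distinct samples exactly, because the
# Gaussian interpolation matrix is nonsingular for distinct centers.
rng = np.random.default_rng(0)
N = 8
x = np.sort(rng.uniform(-1.0, 1.0, N))   # N distinct input samples
y = rng.normal(size=N)                    # arbitrary desired outputs
sigma = 0.3                               # assumed membership width

# Firing strength of rule j at input x_i (Gaussian membership).
Phi = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * sigma ** 2))

# Normalize rule activations, as in standard FNN defuzzification
# (row scaling by positive factors preserves nonsingularity).
Phi /= Phi.sum(axis=1, keepdims=True)

# Solve for consequent weights that reproduce the targets exactly.
w = np.linalg.solve(Phi, y)
err = np.max(np.abs(Phi @ w - y))
print(err)  # numerically ~0: all N samples are memorized
```

Adding a ninth arbitrary target while keeping only eight rules would in general make the system overdetermined, which is the sense in which such a network has a finite capacity.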