I. Introduction
Most multilayer perceptron neural network (MLPNN) classifier training methods minimize an objective function by selecting the network architecture and the connection weights. Minimizing the training error as the only objective commonly results in an overfitted MLPNN. Moreover, neural networks that achieve the same training error on a given set of training samples often produce different decisions on the same unseen sample. To overcome such problems, a regularization [1] or penalty term is added to the minimized objective function to enhance the smoothness of the MLPNN outputs and thereby improve generalization. Output fluctuations of an MLPNN can be estimated, for instance, by the squared norm of the weights or by the Vapnik–Chervonenkis (VC) dimension. However, the squared norm of the weights can be misleading when several different weight combinations yield the same squared norm; this will be discussed in Section III-B3. The VC-dimension-based error yields a loose error bound that may not be useful for distinguishing individual MLPNNs that have the same number of connection weights and the same maximum weight magnitude.
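As a minimal illustration of the penalized objective discussed above (the notation here is assumed for exposition and is not taken from [1]), a weight-decay-style regularized training criterion can be written as
\[
E(\mathbf{w}) \;=\; \sum_{n=1}^{N} \bigl( y(\mathbf{x}_n; \mathbf{w}) - t_n \bigr)^2 \;+\; \lambda \lVert \mathbf{w} \rVert^2 ,
\]
where the first term is the training error over $N$ samples $(\mathbf{x}_n, t_n)$, $\lVert \mathbf{w} \rVert^2$ is the squared norm of the connection weights used as a smoothness penalty, and $\lambda$ controls the trade-off between fitting the training data and limiting output fluctuations.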