I. Introduction
Nonlinear system identification (NSI) refers to the problem of building a mathematical relation between the input u and the output y of an unknown dynamical system [1]–[7]. Many approaches to this problem have been proposed in the literature over the last decades. Among these, the Lee-Schetzen method [8], [9], which identifies the Volterra kernels of a nonlinear system driven by random inputs with prescribed statistics, is one of the most popular. To avoid computing multidimensional Volterra kernels, cascaded (block-oriented) identification models have been proposed. In the Hammerstein model [10], a static nonlinear element is followed by a linear dynamic element. In the Wiener model [11], a linear dynamic block is followed by a static nonlinear element.

In the discrete-time domain, one of the most successful approaches to nonlinear system identification is the NARMAX model [12], together with its variants NARX [13] and NARMA [14]. It models the system output as a nonlinear functional expansion of lagged inputs, outputs, and prediction errors. NARMAX models have been shown to be very effective in many real-world applications [15]–[21], because they provide a powerful, efficient, and unified representation of a wide variety of nonlinear systems. However, a major difficulty in NARMAX-based identification is selecting a model that is parsimonious in the number of parameters yet represents the system dynamics adequately. Various function classes have been used to build NARMAX models, including polynomials, multilayer perceptrons, wavelet neural networks, and radial basis functions. Even so, the choice of an adequate model remains a bottleneck.
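The idea of expanding lagged signals into a set of candidate terms and estimating their coefficients can be illustrated with a small sketch. It is not the method of this paper. It assumes a polynomial NARX form (no prediction-error terms, so it is not a full NARMAX model) with degree 2 and a single lag. It also assumes a hypothetical noise-free "unknown" system, y(k) = 0.5 y(k-1) + 0.3 u(k-1) + 0.1 u(k-1)^2. The function names `simulate_system`, `narx_regressors`, and `least_squares` are illustrative only.

```python
import random


def simulate_system(u):
    """Hypothetical 'unknown' system, used only for illustration:
    y(k) = 0.5*y(k-1) + 0.3*u(k-1) + 0.1*u(k-1)**2 (noise-free)."""
    y = [0.0] * len(u)
    for k in range(1, len(u)):
        y[k] = 0.5 * y[k - 1] + 0.3 * u[k - 1] + 0.1 * u[k - 1] ** 2
    return y


def narx_regressors(u, y):
    """Degree-2 polynomial expansion of the lagged terms y(k-1) and u(k-1).
    Returns one candidate-term row per sample and the target y(k)."""
    rows, targets = [], []
    for k in range(1, len(y)):
        y1, u1 = y[k - 1], u[k - 1]
        rows.append([y1, u1, y1 * y1, u1 * u1, y1 * u1])
        targets.append(y[k])
    return rows, targets


def least_squares(rows, targets):
    """Solve the normal equations (X^T X) theta = X^T y using
    Gaussian elimination with partial pivoting."""
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
            b[i] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    theta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (b[i] - s) / a[i][i]
    return theta


random.seed(0)
u = [random.uniform(-1.0, 1.0) for _ in range(500)]  # random excitation input
y = simulate_system(u)
rows, targets = narx_regressors(u, y)
theta = least_squares(rows, targets)
names = ["y(k-1)", "u(k-1)", "y(k-1)^2", "u(k-1)^2", "y(k-1)*u(k-1)"]
for name, value in zip(names, theta):
    print(f"{name:>14s}: {value:+.4f}")
```

Because the data are noise-free and the candidate set contains the true terms, the estimates should match the hypothetical coefficients (0.5, 0.3, 0, 0.1, 0) up to rounding. The near-zero entries are candidate terms that a structure-selection step would discard to obtain a parsimonious model. That selection step is not shown here, and with realistic noise and many candidate terms it becomes the model-selection difficulty noted above.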