I. Introduction
In data-based modeling, two of the main aims are good generalization and model interpretability. Model generalization refers to the model's ability to accurately approximate the system output for unseen data. Fundamental to the evaluation of model generalization capability is the concept of cross-validation [1], and one commonly used version is the so-called leave-one-out (LOO) cross-validation. For linear-in-the-parameters models, the leave-one-out mean square error (LOOMSE) can be calculated, by making use of the Sherman–Morrison–Woodbury theorem [2], without actually splitting the training dataset and estimating the associated models. The orthogonal forward regression (OFR) algorithm efficiently constructs parsimonious models [3] by selecting regressors according to their contribution to maximizing the model error reduction ratio (ERR). By incorporating the analytical expression of the LOO errors into the OFR framework, the LOOMSE was proposed [4] as a model term selection criterion, based on the idea that model generalization can be optimized sequentially within the model construction process. Because the LOOMSE measures the expected performance on new data, it reaches an optimal value at a certain model size, so the OFR-based construction procedure terminates automatically without the use of a separate stopping criterion [4]. Similarly, for the two-class classification problem, sparse kernel classifiers can be constructed by sequentially minimizing the LOO misclassification rate (MR) [5].
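To make the analytical LOO computation concrete, the sketch below illustrates the classical PRESS identity that underlies it: for a least-squares fit of a linear-in-the-parameters model, the i-th LOO residual equals the ordinary residual divided by one minus the i-th leverage, so no model is ever refitted. This is a minimal illustration of the identity only, not the OFR algorithm of [3], [4]; the data, dimensions, and noise level are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-in-the-parameters model y = Phi @ theta + noise.
# N samples, M candidate regressors (illustrative sizes only).
N, M = 50, 5
Phi = rng.standard_normal((N, M))
theta_true = rng.standard_normal(M)
y = Phi @ theta_true + 0.1 * rng.standard_normal(N)

# Ordinary least-squares fit on the full training set.
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
residuals = y - Phi @ theta

# Leverages h_ii = diag(Phi (Phi^T Phi)^{-1} Phi^T). The PRESS /
# Sherman-Morrison-Woodbury identity gives the i-th LOO residual
# as e_i / (1 - h_ii), so the N held-out fits are never performed.
H_diag = np.einsum('ij,ji->i', Phi, np.linalg.solve(Phi.T @ Phi, Phi.T))
loo_residuals = residuals / (1.0 - H_diag)
loomse = np.mean(loo_residuals ** 2)

# Brute-force check: actually refit N times with one sample held out.
brute = []
for i in range(N):
    mask = np.arange(N) != i
    th_i, *_ = np.linalg.lstsq(Phi[mask], y[mask], rcond=None)
    brute.append(y[i] - Phi[i] @ th_i)
assert np.allclose(loo_residuals, brute)

print(f"Analytical LOOMSE = {loomse:.6f}")
```

In an OFR-style construction, this quantity would be re-evaluated as each candidate term is added, and the search would stop once the LOOMSE ceases to decrease, which is what allows the procedure in [4] to terminate without a separate stopping criterion.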