I. Introduction
A QSPR method with automated variable selection based on the k-nearest neighbor principle (kNN-QSPR) has been developed and studied by A. Tropsha's group at the University of North Carolina since 2000 [1]–[4]. The most important part of QSPR model development, however, is model validation. Most QSPR modeling methods implement the leave-one-out (LOO) cross-validation procedure, whose outcome is the cross-validated R2 (q2). This value is used as a criterion of both the robustness and the predictive ability of the model. It is well known, however, that the only way to estimate the true predictive power of a model is to test it on a sufficiently large collection of compounds that were not used to build it. Model validation therefore requires three data sets: training, test, and external validation sets.

Different studies have used different descriptor sets and scaling tools, but we found no papers that aimed to reveal how the quality of the developed models depends on their characteristics, in particular on the initial descriptor set and on the descriptor scaling method. Many scaling methods exist, but there was no tool for automatically selecting the method that improves model quality. With these requirements in mind, we created new software in the R system that strictly follows the kNN-QSPR approach. We used the PaDEL-Descriptor [5] and Dragon [6] software to generate the initial descriptor sets and developed a program for automatic scaling. Earlier, we presented the results of a case study of 90 nitroaromatic compounds tested for in vivo toxicity, using a combination of SiRMS descriptors [7].
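To make the LOO q2 criterion and the idea of automatic scaling selection concrete, the following is a minimal Python sketch. It is not the authors' R software and not the full kNN-QSPR algorithm. It assumes an unweighted kNN prediction (the mean property value of the k nearest compounds in Euclidean descriptor space), a fixed k, no variable selection, and only two illustrative scaling methods. All function names (`knn_predict`, `loo_q2`, `autoscale`, `range_scale`, `select_scaling`) and the `SCALERS` table are hypothetical. q2 is computed as 1 - PRESS/SS_tot, where PRESS sums the squared LOO prediction errors.

```python
import math
from statistics import mean, pstdev


def knn_predict(train_X, train_y, x, k):
    """Unweighted kNN: mean property of the k training compounds closest to x (Euclidean)."""
    neighbours = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    return mean(yi for _, yi in neighbours[:k])


def loo_q2(X, y, k):
    """Leave-one-out q2 = 1 - PRESS / SS_tot; each compound is predicted from all the others."""
    preds = [
        knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        for i in range(len(X))
    ]
    y_bar = mean(y)
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - press / ss_tot


def autoscale(X):
    """Z-score each descriptor column; constant columns become 0."""
    stats = [(mean(c), pstdev(c)) for c in zip(*X)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)] for row in X]


def range_scale(X):
    """Min-max scale each descriptor column to [0, 1]; constant columns become 0."""
    stats = [(min(c), max(c)) for c in zip(*X)]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, stats)] for row in X]


# Hypothetical candidate set; real tools offer many more scaling methods.
SCALERS = {
    "none": lambda X: [list(row) for row in X],
    "autoscale": autoscale,
    "range": range_scale,
}


def select_scaling(X, y, k=2):
    """Return the scaling method with the highest LOO q2, plus all scores.

    Simplification: each scaler is fitted once on the whole set, so the
    held-out compound also influences the scaling statistics.
    """
    scores = {name: loo_q2(scale(X), y, k) for name, scale in SCALERS.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy data: descriptor 1 tracks the property, descriptor 2 is large-valued noise.
    X = [[1, 500], [2, 100], [3, 900], [4, 300], [5, 700], [6, 200]]
    y = [1, 2, 3, 4, 5, 6]
    best, scores = select_scaling(X, y, k=2)
    print(best, {name: round(q2, 3) for name, q2 in scores.items()})
```

On this toy set the unscaled distances are dominated by the noisy second descriptor, so the unscaled LOO q2 is negative, while both scaled variants give q2 = 0.6. Because the scaling method is chosen by maximizing q2 on the same compounds, the selected score is optimistic. This is one reason the text above insists on separate test and external sets for estimating true predictive power.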