I. Introduction
One of the technological challenges faced by data analytics is the enormous amount of data. This challenge is well known, and the term "big data" was coined to bring attention to it and to spur the development of new solutions. However, in many important application areas, an excess of data is not the problem; quite the opposite, there is not enough data available. There are several reasons for this: the data may be inherently scarce (rare diseases, faults in complex systems, rare grammatical structures), difficult to obtain (due to proprietary systems, confidentiality of business contracts, or privacy of records), expensive to acquire (requiring costly equipment or a significant investment of human or material resources), or the distribution of the events of interest may be highly imbalanced (fraud detection, outlier detection, distributions with long tails).

For machine learning approaches, the lack of data causes problems in model selection, reliable performance estimation, development of specialized algorithms, and tuning of learning model parameters. While certain problems caused by scarce data stem from underrepresentation of the underlying phenomenon and cannot be solved, others can be alleviated by generating artificial data similar to the original. For example, similar artificial data sets can be of great help in parameter tuning, development of specialized solutions, simulations, and imbalanced problems, as they prevent overfitting to the original data set yet allow a sound comparison of different approaches.