Effect of Training Data Order for Machine Learning


Abstract:

For many Machine Learning algorithms applied to supervised learning problems, the order of the training data samples can affect the quality of the derived model and the accuracy of its predictions. This paper describes a project to quantify this effect and to statistically characterize the variation that several algorithms exhibit across permutations of a given training data set. It is demonstrated that this variation can be substantial, and that training data ordering should be an important consideration when approaching a classification task.
Date of Conference: 05-07 December 2019
Date Added to IEEE Xplore: 20 April 2020
Conference Location: Las Vegas, NV, USA

1. Background

Supervised Machine Learning for classification is the process wherein an algorithm develops a method of assigning class labels to input data based on example input/output pairs, or “training data”. Many such algorithms exist and have demonstrated success in a variety of contexts. Some of these algorithms are training data order invariant, meaning the same classification model will result from the same training data regardless of the order in which the individual samples are presented to the algorithm. Others, however, can produce different models depending on the order in which the training samples are presented.
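
The distinction between order-invariant and order-dependent algorithms can be illustrated with a minimal sketch. The two learners below (a single-pass online perceptron and a nearest-centroid classifier) and the six-sample toy data set are assumptions chosen for illustration only; they are not necessarily the algorithms or data studied in the paper. Each learner is trained on every permutation of the same data, and the number of distinct resulting models is counted.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy data: ((x1, x2), label) with labels +1 / -1.
DATA = [((2, 1), 1), ((1, 3), 1), ((-1, -2), -1),
        ((-3, 1), -1), ((1, -1), 1), ((-1, 2), -1)]


def train_perceptron(samples):
    """One pass of the online perceptron rule; updates happen per sample."""
    w1, w2, b = 0, 0, 0
    for (x1, x2), y in samples:
        pred = 1 if w1 * x1 + w2 * x2 + b > 0 else -1
        if pred != y:  # update only on a mistake
            w1, w2, b = w1 + y * x1, w2 + y * x2, b + y
    return (w1, w2, b)


def train_centroids(samples):
    """Nearest-centroid model: one exact mean vector per class."""
    sums = {}
    for (x1, x2), y in samples:
        s = sums.setdefault(y, [0, 0, 0])
        s[0] += x1
        s[1] += x2
        s[2] += 1
    # Sort by label so the result does not depend on dict insertion order.
    return tuple(sorted(
        (label, Fraction(s[0], s[2]), Fraction(s[1], s[2]))
        for label, s in sums.items()))


def distinct_models(train, samples):
    """Train on every ordering of `samples`; return the set of models seen."""
    return {train(list(order)) for order in permutations(samples)}


if __name__ == "__main__":
    print("perceptron models:", len(distinct_models(train_perceptron, DATA)))
    print("centroid models:  ", len(distinct_models(train_centroids, DATA)))
```

In this sketch the centroid learner yields exactly one model across all orderings, because it depends only on per-class sums, while the perceptron yields more than one, because each update depends on the weights left by the samples seen before it. Integer features and exact fractions are used so that the comparison is not blurred by floating-point rounding.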
