Stop Chasing Trends: Discovering High Order Models in Evolving Data | IEEE Conference Publication | IEEE Xplore

Abstract:

Many applications are driven by evolving data: patterns in Web traffic, program execution traces, network event logs, etc., are often non-stationary. Building prediction models for evolving data is therefore an important and challenging task. Currently, most approaches work by "chasing trends", that is, they keep learning or updating models from the evolving data and use these impromptu models for online prediction. In many cases, this proves to be both costly and ineffective: much time is wasted on re-learning recurring concepts, yet the classifier may always remain one step behind the current trend. In this paper, we propose to mine high-order models in evolving data. More often than not, a data stream contains a limited number of concepts, or stable distributions, and the stream switches among them constantly. We mine all such concepts offline from a historical stream and build high-quality models for each of them. At run time, combining historical concept change patterns with cues provided by an online training stream, we find the most likely current concept and use its corresponding models to classify data in an unlabeled stream. The primary advantage of the high-order model approach is its high accuracy. Experiments show that on benchmark datasets, the classification error of the high-order model is only a small fraction of that of the current best approaches. Another important benefit is that, unlike state-of-the-art approaches, our approach does not require users to tune any parameters to achieve a satisfactory result on streams with different characteristics.
Date of Conference: 07-12 April 2008
Date Added to IEEE Xplore: 25 April 2008
Conference Location: Cancun, Mexico

I. Introduction

The primary task of data mining is to develop models based on existing data. In classification, the training data is usually fixed (for example, stored in a data warehouse), and the models, once trained on the stored data, can be applied to future data without much change. Thus, the knowledge discovery process can be regarded as consisting of two sequential phases: a training phase, where models are learned from past data, and a testing phase, where models are applied to future data.
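
In contrast to this fixed two-phase setting, the abstract describes a run-time step that picks the most likely current concept from historical concept change patterns and labeled cues. The following is only an illustrative sketch of that idea, not the paper's algorithm: it assumes the change patterns are summarized as a transition matrix, that each concept has one pre-trained model, and that a model's agreement with a labeled example is scored with a fixed, hypothetical `error_rate`. The names `update_belief` and `classify` are likewise invented for illustration.

```python
from typing import Callable, List, Sequence

# A per-concept model maps a feature value to a class label (assumed form).
Model = Callable[[float], int]


def update_belief(
    belief: Sequence[float],
    transition: Sequence[Sequence[float]],
    models: Sequence[Model],
    x: float,
    y: int,
    error_rate: float = 0.1,
) -> List[float]:
    """Return the updated probability of each concept after one labeled example.

    transition[i][j] is the assumed probability of switching from concept i
    to concept j, estimated offline from a historical stream.
    error_rate is a hypothetical chance that the correct concept's model
    still mislabels an example; it must lie strictly between 0 and 1.
    """
    n = len(belief)
    # Step 1: let the belief drift according to the concept change patterns.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Step 2: reward concepts whose model agrees with the observed label.
    weighted = [
        predicted[j] * ((1.0 - error_rate) if models[j](x) == y else error_rate)
        for j in range(n)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


def classify(belief: Sequence[float], models: Sequence[Model], x: float) -> int:
    """Label an unlabeled example with the model of the most likely concept."""
    best = max(range(len(belief)), key=lambda j: belief[j])
    return models[best](x)


if __name__ == "__main__":
    # Two toy concepts with opposite decision rules (illustrative only).
    models = [lambda x: int(x > 0.5), lambda x: int(x <= 0.5)]
    transition = [[0.9, 0.1], [0.1, 0.9]]
    belief = [0.5, 0.5]
    # Labeled cues that happen to follow the second concept.
    for x, y in [(0.2, 1), (0.8, 0), (0.3, 1)]:
        belief = update_belief(belief, transition, models, x, y)
    print(belief, classify(belief, models, 0.9))
```

Because the per-concept models are built once and only the belief vector changes at run time, a recurring concept is recognized again without being re-learned, which is the cost the abstract attributes to "chasing trends".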
