Stop Chasing Trends: Discovering High Order Models in Evolving Data


Abstract:

Many applications are driven by evolving data: patterns in Web traffic, program execution traces, network event logs, and the like are often non-stationary. Building prediction models for such evolving data is therefore an important and challenging task. Currently, most approaches work by "chasing trends": they keep learning or updating models from the evolving data and use these impromptu models for online prediction. In many cases this proves both costly and ineffective: much time is wasted on re-learning recurring concepts, yet the classifier may still remain one step behind the current trend. In this paper, we propose to mine high-order models in evolving data. More often than not, a data stream contains a limited number of concepts, or stable distributions, and the stream switches between them constantly. We mine all such concepts offline from a historical stream and build high-quality models for each of them. At run time, by combining historical concept-change patterns with cues provided by an online training stream, we find the most likely current concept and use its corresponding models to classify data in an unlabeled stream. The primary advantage of the high-order model approach is its high accuracy: experiments show that on benchmark datasets, the classification error of the high-order model is only a small fraction of that of the current best approaches. Another important benefit is that, unlike state-of-the-art approaches, ours does not require users to tune any parameters to achieve satisfactory results on streams of different characteristics.
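
The abstract describes two phases: an offline phase that mines stable concepts from a historical stream and trains models for each of them, and an online phase that combines historical concept-change patterns with labeled cues to select the most likely current concept. The sketch below is only an illustration of that idea under assumed simplifications, not the authors' algorithm: concept segments are taken as already mined, a decision tree stands in for the per-concept models, and simple transition counts play the role of the historical concept-change patterns. All function and variable names are hypothetical.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def build_high_order_models(segments):
    """Offline phase (sketch): segments is a list of (concept_id, X, y)
    tuples mined from the historical stream, in temporal order. Train one
    model per concept and count how often one concept switches to another."""
    models, transitions = {}, Counter()
    prev = None
    for cid, X, y in segments:
        if cid not in models:
            models[cid] = DecisionTreeClassifier().fit(X, y)
        if prev is not None and prev != cid:
            transitions[(prev, cid)] += 1
        prev = cid
    return models, transitions

def pick_concept(models, transitions, current, X_cues, y_cues):
    """Online phase (sketch): score each stored concept by its model's
    accuracy on the labeled cues, weighted by how often it historically
    follows the current concept, and return the best-scoring concept id."""
    best, best_score = None, -np.inf
    for cid, model in models.items():
        fit = model.score(X_cues, y_cues)          # fit to online cues
        prior = 1.0 if current is None else 1e-3 + transitions[(current, cid)]
        if fit * prior > best_score:
            best, best_score = cid, fit * prior
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def make(concept, n=200):
        # Two synthetic concepts: the label depends on a different feature.
        X = rng.normal(size=(n, 2))
        return X, (X[:, concept] > 0).astype(int)

    history = [(c, *make(c)) for c in (0, 1, 0, 1, 0)]
    models, trans = build_high_order_models(history)
    X_cues, y_cues = make(1, n=30)        # online cues drawn from concept 1
    cid = pick_concept(models, trans, None, X_cues, y_cues)
    X_test, y_test = make(1, n=100)
    print("picked concept", cid, "accuracy", models[cid].score(X_test, y_test))
```

In this reading, recurring concepts are never re-learned: once a concept's model is built offline, a switch back to that concept only requires recognizing it from the cues and the transition history.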
Date of Conference: 07-12 April 2008
Date Added to IEEE Xplore: 25 April 2008
Conference Location: Cancun, Mexico

I. Introduction

The primary task of data mining is to develop models based on existing data. In classification, the training data is usually fixed (for example, stored in a data warehouse), and the models, once trained from the stored data, can be applied to future data without much change. Thus, the knowledge discovery process can be regarded as consisting of two sequential phases: a training phase, in which models are learned from past data, and a testing phase, in which the models are applied to future data.
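
As a concrete illustration of this two-phase setup (a hedged sketch with hypothetical synthetic data, not taken from the paper), the snippet below trains a classifier once on stored data and then applies it unchanged to future data. It also shows why the setup breaks down when the underlying concept drifts, which is the situation the rest of the paper addresses.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Training phase: learn a model once from past (warehoused) data.
X_past = rng.normal(size=(500, 3))
y_past = (X_past[:, 0] > 0).astype(int)   # concept: label depends on feature 0
model = DecisionTreeClassifier().fit(X_past, y_past)

# Testing phase: the fixed model is applied to future data without change.
X_future = rng.normal(size=(500, 3))
y_stationary = (X_future[:, 0] > 0).astype(int)   # same concept as training
y_drifted = (X_future[:, 1] > 0).astype(int)      # concept has drifted
print("accuracy on stationary data:", model.score(X_future, y_stationary))
print("accuracy after drift:       ", model.score(X_future, y_drifted))
```

On the stationary future data the once-trained model remains accurate, while after the (simulated) drift its accuracy falls to roughly chance level.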

