1 Introduction
Gene expression time series are extremely useful for investigating a wide range of biological processes, such as cell reproduction, development, and response to external stimuli [1], [2], [3]. Temporal gene expression data can be roughly divided into two categories: (i) short time series, generally containing three to eight time points, and (ii) long time series with more than eight time points [4]. It has been estimated that approximately 80% of gene expression time series data sets are short time series [5]. Most algorithms for analyzing time series are based on traditional clustering methods, such as hierarchical clustering, K-means, Bayesian networks, self-organizing maps, and others [2], [6], [7], [8], [9], [10], [11]. These methods can reveal some biological characteristics but ignore the temporal nature of the data, as they generally do not account for autocorrelation between adjacent time points. Recently, some progress has been reported in clustering gene expression time series, such as the continuous representation of expression profiles through hidden Markov models [5], but these approaches remain domain specific and restricted to long time series data sets. For short time series data, these state-of-the-art algorithms tend to overfit because of the small number of sampling points.