I. Introduction
Efficient interdisciplinary research requires consolidating large amounts of historical data from disparate sources across different subject areas. Such data report events of interest that often occur within various time intervals (e.g., daily, weekly, and monthly). It is common to have multiple concurrent reports on the same event within overlapping time intervals. This overlap may lead to severe redundancy-related issues, such as data conflicts, data duplication, and missing values, that prevent researchers from obtaining correct answers to queries on an integrated historical database. Because the values of the individual time intervals within a reporting period form sequence data subject to this redundancy, we refer to such overlapping sequence data as redundant process data. For example, epidemiological data analysis often relies on reports from different authorities. To determine the correct total number of measles cases in Los Angeles for the year 1900, we have to deal with the issues of this redundant process data. These issues hinder the evaluation of both aggregated results over smaller time spans (e.g., weekly or monthly results) and individual values (e.g., daily results). When estimating aggregated results, considering only non-overlapping reports may result in significant underestimation; at the same time, ignoring the overlaps risks overestimating the total. Recovering values for individual time intervals is an even more challenging estimation task when neither sample sequence values nor additional information (e.g., high, low, or average values) is available from the overlapping and non-overlapping reports. Sequence data analysis includes topics such as estimating the optimal state sequence, pattern recognition, and data clustering.
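The risk of under- and overestimation described above can be illustrated with a minimal sketch. The report tuples, their counts, and the helper names (`overlaps`, `naive_estimates`) are hypothetical and chosen only for illustration; the sketch assumes each report gives an inclusive day range and a non-negative case count, and it is not the estimation method proposed in this paper.

```python
from typing import List, Tuple

# Hypothetical report format: (start_day, end_day inclusive, case count).
Report = Tuple[int, int, int]


def overlaps(a: Report, b: Report) -> bool:
    """Return True when the two inclusive day ranges share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def naive_estimates(reports: List[Report]) -> Tuple[int, int]:
    """Return (low, high) naive totals for a set of reports.

    low  : sum over reports that overlap no other report (overlapping
           reports are discarded, so cases are likely missed).
    high : sum over all reports (cases in overlapping days may be
           counted more than once).
    """
    isolated = [
        r for i, r in enumerate(reports)
        if not any(overlaps(r, s) for j, s in enumerate(reports) if j != i)
    ]
    low = sum(r[2] for r in isolated)
    high = sum(r[2] for r in reports)
    return low, high


# Made-up example: two weekly reports that overlap on days 5-7, plus one
# isolated weekly report.
example_reports = [(1, 7, 10), (5, 11, 8), (15, 21, 4)]
low, high = naive_estimates(example_reports)
print(low, high)
```

Under these assumptions the true total lies somewhere between the two naive figures, and neither figure alone identifies it.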
The hidden Markov model (HMM) has been applied and studied for state sequences in many domains. In an HMM, parameter estimation can be performed by the maximum likelihood method through the expectation-maximization (EM) algorithm [1]. The Viterbi algorithm [2] is a well-known approach that predicts the most probable state sequence. Several studies [3] [4] have investigated rare events using pattern recognizers based on HMMs or support vector machines (SVMs).
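As a reminder of how Viterbi decoding works for a discrete HMM, a minimal sketch follows. The state names, observation symbols, and all probabilities are hypothetical values chosen for illustration, not parameters from this paper; the sketch works in log space and assumes the model parameters are already known (for instance, after EM estimation).

```python
import math
from typing import Dict, List, Sequence


def _log(p: float) -> float:
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most probable hidden state sequence for `obs`."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + _log(trans_p[p][s]))
            best[t][s] = (
                best[t - 1][prev] + _log(trans_p[prev][s]) + _log(emit_p[s][obs[t]])
            )
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model: weekly case counts binned as few/some/many.
states = ["baseline", "outbreak"]
start_p = {"baseline": 0.6, "outbreak": 0.4}
trans_p = {
    "baseline": {"baseline": 0.7, "outbreak": 0.3},
    "outbreak": {"baseline": 0.4, "outbreak": 0.6},
}
emit_p = {
    "baseline": {"few": 0.5, "some": 0.4, "many": 0.1},
    "outbreak": {"few": 0.1, "some": 0.3, "many": 0.6},
}
print(viterbi(["few", "some", "many"], states, start_p, trans_p, emit_p))
# → ['baseline', 'baseline', 'outbreak']
```

Log probabilities are used so that long observation sequences do not underflow; the dynamic program itself is unchanged.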