Historical Redundant Process Data Recovery based on Genetic Algorithm

Abstract:

In many domains, data may suffer from issues such as conflicts, duplicates, and missing values, and it must be cleaned before use. For instance, historical data reports on numerous events for overlapping time intervals may have data conflicts caused by database redundancy. These conflicts can prevent researchers from obtaining correct answers from the data. In this paper, we investigate redundant process data recovery (RPDR) approaches that recover the individual values of time intervals from redundant process data. This study makes three major contributions. First, we explore RPDR approaches from the areas of statistical analysis, evolutionary computation (Genetic Algorithm), and probabilistic value estimation (Bayesian method). Second, we explore the applicability of the proposed RPDR algorithms to the case where additional information is available from the redundant data. Third, we utilize the concept of optimal CD (Conflict Degree) to further reduce data aggregation error in the integrated historical database. In general, it is challenging to estimate an accurate individual value within a given time interval. With the help of optimal CD, our experimental results demonstrate the high efficiency of the proposed approach, which uses the genetic algorithm to minimize the misestimation of the sequence values in individual time spans.

Published in: 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)

Date of Conference: 26-30 June 2023

Date Added to IEEE Xplore: 02 August 2023

ISBN Information:

Print on Demand(PoD) ISSN: 0730-3157

DOI: 10.1109/COMPSAC57700.2023.00043

Conference Location: Torino, Italy

I. Introduction

Efficient interdisciplinary research requires consolidating large amounts of historical data from disparate data sources in different subject areas. This type of data reports events of interest that often occur within various time intervals (e.g., daily, weekly, and monthly). It is common to have multiple concurrent reports on the same event within overlapping time intervals. As a result, this overlap may lead to severe redundancy-related issues, such as data conflicts, data duplication, and missing values, that prevent researchers from obtaining correct answers to queries on an integrated historical database. Meanwhile, the values of the individual time intervals within a reporting period form a type of sequence data that is subject to data redundancy; thus, we refer to this type of overlapping sequence data as redundant process data.

For example, epidemiological data analysis often relies on reports from different authorities. To determine the correct total number of measles cases in Los Angeles for the year 1900, we have to deal with the issues of this redundant process data. Those issues obstruct the evaluation of both aggregated results over smaller time spans (e.g., weekly or monthly results) and individual values (e.g., daily results). When estimating aggregated results, considering only non-overlapping reports may result in significant underestimation, whereas ignoring the overlaps and counting every report risks overestimating that number. Recovering the values of individual time intervals is an even more challenging estimation task when neither sample sequence values nor additional information (e.g., high, low, and average values) is available from the overlapping and non-overlapping reports. The area of sequence data analysis includes topics such as estimating the optimal state sequence, pattern recognition, and data clustering.
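To make the under- and overestimation risk described above concrete, the following minimal sketch aggregates a few overlapping reports in two naive ways. The report format `(first_day, last_day, reported_count)`, the helper names, and the daily case counts are all illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical report format (an assumption, not from the paper):
# (first_day, last_day, reported_count), with both days inclusive.
Report = Tuple[int, int, int]


def make_report(daily: List[int], first: int, last: int) -> Report:
    """Build a report by summing the true daily values for days first..last (1-based)."""
    return (first, last, sum(daily[first - 1:last]))


def overlaps(a: Report, b: Report) -> bool:
    """Two inclusive intervals overlap when each one starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def sum_all(reports: List[Report]) -> int:
    """Ignore overlaps entirely and add up every reported count."""
    return sum(count for _, _, count in reports)


def sum_non_overlapping(reports: List[Report]) -> int:
    """Keep only reports whose interval overlaps no other report."""
    total = 0
    for i, report in enumerate(reports):
        if not any(overlaps(report, other)
                   for j, other in enumerate(reports) if j != i):
            total += report[2]
    return total


# Made-up daily counts for ten days; in practice these are the unknowns.
daily_cases = [2, 1, 3, 0, 4, 2, 1, 5, 0, 2]
reports = [
    make_report(daily_cases, 1, 4),   # days 1-4
    make_report(daily_cases, 3, 7),   # days 3-7, overlaps the first report
    make_report(daily_cases, 8, 10),  # days 8-10, overlaps nothing
]

true_total = sum(daily_cases)
print("true total:           ", true_total)
print("sum of all reports:   ", sum_all(reports))
print("non-overlapping only: ", sum_non_overlapping(reports))
```

In this toy setting, adding every report counts days 3 and 4 twice, while keeping only the non-overlapping report discards days 1 to 7 altogether, so the true total lies between the two naive aggregates.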
The hidden Markov model (HMM) has been applied and studied for state sequences in many domains. For an HMM, parameter estimation can be performed by the maximum likelihood method through the expectation maximization (EM) algorithm [1]. The Viterbi algorithm [2] is a well-known approach that predicts the most probable state sequence. Several studies [3] [4] have investigated rare events using pattern recognizers that are based on HMMs or support vector machines (SVM).
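As background for the Viterbi algorithm mentioned above, here is a minimal sketch of the standard dynamic-programming recursion for a discrete HMM. It is a generic textbook formulation, not the method of this paper; the state names, observation symbols, and all probabilities are made-up values chosen only for illustration.

```python
from typing import Dict, List


def viterbi(obs: List[str],
            states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most probable hidden state sequence for the observations."""
    # best[t][s]: probability of the most probable path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    # back[t][s]: predecessor of s on that most probable path.
    back: List[Dict[str, str]] = [{}]

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev

    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model: hidden disease activity, observed report volume.
states = ["endemic", "outbreak"]
start_p = {"endemic": 0.6, "outbreak": 0.4}
trans_p = {
    "endemic": {"endemic": 0.7, "outbreak": 0.3},
    "outbreak": {"endemic": 0.4, "outbreak": 0.6},
}
emit_p = {
    "endemic": {"few": 0.5, "some": 0.4, "many": 0.1},
    "outbreak": {"few": 0.1, "some": 0.3, "many": 0.6},
}

decoded = viterbi(["few", "some", "many"], states, start_p, trans_p, emit_p)
print(decoded)  # ['endemic', 'endemic', 'outbreak']
```

For long sequences, a practical implementation would usually work with log-probabilities to avoid numerical underflow; the plain products are kept here for readability.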
