
A Novel Data Placement Strategy for Data-Sharing Scientific Workflows in Heterogeneous Edge-Cloud Computing Environments


Abstract:

The deployment of datasets in the heterogeneous edge-cloud computing paradigm has received increasing attention in state-of-the-art research. However, owing to their large sizes and the existence of private scientific datasets, finding a data placement strategy that minimizes data transmission and improves performance remains a persistent problem. In this study, the advantages of both edge and cloud computing are combined to construct a data placement model that works for multiple scientific workflows. The most difficult research challenge is to devise a data placement strategy that accounts for datasets shared both within individual workflows and among multiple workflows across various geographically distributed environments. According to the constructed model, not only the storage capacity of edge micro-datacenters but also the data transfer between multiple clouds across regions must be considered. To address this issue, we analyzed the characteristics of this model and identified the factors that cause transmission delay. We propose a discrete particle swarm optimization algorithm with differential evolution (DE-DPSO) to distribute datasets during workflow execution and, based on it, a new data placement strategy named DE-DPSO-DPS. DE-DPSO-DPS is evaluated through several experiments in simulated heterogeneous edge-cloud computing environments. The results demonstrate that our data placement strategy effectively reduces data transmission time and achieves superior performance compared with traditional strategies for data-sharing scientific workflows.
Date of Conference: 19-23 October 2020
Date Added to IEEE Xplore: 22 December 2020
Conference Location: Beijing, China



I. Introduction

With the exponential increase of global cooperation in scientific research and the rapid development of distributed computing technology, scientific applications have changed significantly. They now involve thousands of interwoven tasks and are generally data- and computing-intensive [1]. To represent these complicated scientific applications, scientific workflows are widely used in several scientific fields [2], such as astronomy, physics, and bioinformatics. Owing to their complex structure and large data-processing tasks, the deployment of scientific workflows imposes rigid requirements on computational and storage resources. In some scientific domains, multiple practical scenarios must be considered when creating a data placement strategy for these workflows. For example, datasets are often shared among multiple tasks within workflows, including workflows in different geo-distributed organizations. Furthermore, several private datasets may only be stored in specific research institutes. Thus, proposing a good data placement strategy that optimizes data transmission time during workflow execution has always been a major challenge.
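As a rough illustration of the DE-DPSO idea described in the abstract, the sketch below encodes each candidate placement as a discrete vector mapping datasets to datacenters and evolves a population with DE-style crossover biased toward the best-known placement. All dataset sizes, cost values, shared pairs, and parameter names here are illustrative assumptions, not the paper's actual model, which additionally enforces edge storage-capacity and private-dataset constraints.

```python
import random

# Toy DE-DPSO sketch for discrete data placement (illustrative only).
# A placement is a vector assigning each dataset to one datacenter index.
# The sizes, costs, and shared pairs below are hypothetical toy values.

DATASET_SIZES = [5, 3, 8, 2, 6]        # dataset sizes (GB, toy values)
NUM_DCS = 3                            # candidate datacenters
COST = [[0, 2, 5],                     # toy inter-datacenter
        [2, 0, 3],                     # transfer-cost matrix
        [5, 3, 0]]
SHARED = [(0, 1), (1, 2), (3, 4)]      # dataset pairs shared by tasks

def fitness(placement):
    """Total size-weighted transfer cost over shared dataset pairs."""
    return sum((DATASET_SIZES[a] + DATASET_SIZES[b]) *
               COST[placement[a]][placement[b]]
               for a, b in SHARED)

def de_dpso(pop_size=20, iters=100, F=0.5, CR=0.7, seed=1):
    """Discrete PSO hybridized with DE-style crossover (a rough sketch).

    F is reused here as the probability of inheriting a component from the
    global best rather than from a random peer -- a discrete stand-in for
    DE's differential weight.
    """
    rng = random.Random(seed)
    n = len(DATASET_SIZES)
    pop = [[rng.randrange(NUM_DCS) for _ in range(n)] for _ in range(pop_size)]
    pbest = [list(x) for x in pop]          # personal bests
    gbest = min(pop, key=fitness)           # global best
    for _ in range(iters):
        for i, x in enumerate(pop):
            peer = pop[rng.randrange(pop_size)]
            # DE-style crossover: with probability CR each dimension is
            # taken from gbest or a random peer, otherwise kept from x.
            trial = [(gbest[d] if rng.random() < F else peer[d])
                     if rng.random() < CR else x[d]
                     for d in range(n)]
            if fitness(trial) <= fitness(x):         # greedy DE selection
                pop[i] = trial
            if fitness(pop[i]) < fitness(pbest[i]):  # update personal best
                pbest[i] = list(pop[i])
        gbest = min(pbest + [gbest], key=fitness)    # update global best
    return gbest, fitness(gbest)

best_placement, best_cost = de_dpso()
print(best_placement, best_cost)
```

In this toy objective, the cost reaches zero whenever every shared pair of datasets is co-located; the paper's real objective instead minimizes transmission time under capacity and privacy constraints, so co-locating everything is generally infeasible there.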

