I. Introduction
With the exponential growth of global cooperation in scientific research and the rapid development of distributed computing technology, scientific applications have changed significantly in recent years. They now involve thousands of interwoven tasks and are generally data- and compute-intensive [1]. To represent these complicated applications, scientific workflows are widely used in many scientific fields [2], such as astronomy, physics, and bioinformatics. Owing to their complex structure and large-scale data tasks, deploying scientific workflows imposes stringent requirements on computational and storage resources.

In some scientific domains, a data placement strategy for these workflows must account for several practical scenarios. For example, datasets are often shared among multiple tasks, both within a single workflow and across workflows hosted by geo-distributed organizations. Furthermore, certain private datasets may only be stored at specific research institutes. Thus, designing a data placement strategy that effectively minimizes data transmission time during workflow execution has long been a major challenge.