Abstract
When renting pay-as-you-go cloud resources (e.g., virtual machines) to do science, the data transfers required during the execution of data-intensive scientific workflows can be remarkably costly, not only in terms of workflow execution time (makespan) but also in terms of money. As such transfers are prone to delays, they may jeopardise the makespan, stretch the resource rental period and, as a result, compromise budgets. In this paper, we explore the possibility of trading some communication for computation during schedule generation: we schedule a workflow by duplicating the computation of tasks on which other, dependent tasks critically depend, thereby lessening the communication between them. We explore this premise by enhancing the Heterogeneous Earliest Finish Time (HEFT) algorithm and its Lookahead variant. The proposed approach is evaluated using simulation and synthetic data from four real-world scientific workflow applications. Our task-duplication-based proposal effectively reduces the size of data transfers, which in turn shortens the rental duration of resources, in addition to minimising network traffic within the cloud.
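To make the two ingredients of the abstract concrete, the following is a minimal sketch of HEFT's upward-rank prioritisation together with a simple duplication check (re-run a parent task locally when its computation cost is below the cost of transferring its output). The DAG, costs, and the `duplicate_worthwhile` criterion are invented for illustration and do not reproduce the paper's actual algorithm or evaluation.

```python
from functools import lru_cache

# Toy workflow DAG: task -> list of (child, data-transfer cost).
# All numbers are made up for illustration.
succ = {
    "A": [("B", 4), ("C", 6)],
    "B": [("D", 5)],
    "C": [("D", 2)],
    "D": [],
}
# Average computation cost of each task over the available processors.
comp = {"A": 5, "B": 3, "C": 4, "D": 2}

@lru_cache(maxsize=None)
def upward_rank(task):
    """HEFT upward rank: rank_u(t) = comp(t) + max_child(c(t, child) + rank_u(child))."""
    tail = max((c + upward_rank(child) for child, c in succ[task]), default=0)
    return comp[task] + tail

# HEFT schedules tasks in decreasing upward-rank order.
order = sorted(comp, key=upward_rank, reverse=True)

def duplicate_worthwhile(parent, transfer_cost):
    """Illustrative duplication test (an assumption, not the paper's rule):
    re-executing the parent on the child's processor beats waiting for the
    transfer when its computation cost is lower than the transfer cost."""
    return comp[parent] < transfer_cost

print(order)                           # scheduling priority order
print(duplicate_worthwhile("B", 5))    # cheap parent, expensive edge
print(duplicate_worthwhile("C", 2))    # expensive parent, cheap edge
```

In a full scheduler, the duplication decision would be folded into the processor-selection step (comparing earliest finish times with and without the duplicated parent), rather than applied as a standalone test as above.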
Original language | Undefined |
---|---|
Title of host publication | 2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC) |
Pages | 83-92 |
Number of pages | 10 |
DOIs | |
Publication status | Published - 7 Jan 2018 |
Keywords
- Task analysis
- Schedules
- Cloud computing
- Data transfer
- Processor scheduling
- Scheduling
- Delays
- task duplication
- workflow scheduling
- DAG scheduling