A Reinforcement Learning Scheduling Strategy for Parallel Cloud-Based Workflows

Authors :: Vítor Silva
Victor Olimpio
André Ferreira do Nascimento
Aline Paes
Daniel de Oliveira
Source :: IPDPS Workshops
Publication Year :: 2019
Publisher :: IEEE, 2019.
Abstract: Scientific experiments can be modeled as workflows. Such workflows are usually computing- and data-intensive, demanding the use of High-Performance Computing environments such as clusters, grids, and clouds. The latter offer the advantage of elasticity, which allows the number of Virtual Machines (VMs) to be increased or decreased on demand. Workflows are typically managed using Scientific Workflow Management Systems (SWfMSs). Many existing SWfMSs offer support for cloud-based execution. Each SWfMS has its own scheduler that follows a well-defined cost function. However, such cost functions must consider the characteristics of a dynamic environment, such as live migrations and/or performance fluctuations, which are far from trivial to model. This paper proposes a novel scheduling strategy, named ReASSIgN, based on Reinforcement Learning (RL). An RL technique assumes that an optimal (or sub-optimal) solution to the scheduling problem exists and aims to learn the best schedule from previous executions, in the absence of a mathematical model of the environment. To this end, an extension of the well-known workflow simulator WorkflowSim is proposed to implement an RL strategy for scheduling workflows. Once the scheduling plan is generated, the workflow is executed in the cloud using the SciCumulus SWfMS. We conducted a thorough evaluation of the proposed scheduling strategy using a real astronomy workflow.
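
Illustrative sketch (not part of the original record): the abstract describes learning a schedule from previous executions without a mathematical model of the environment. The Python fragment below shows that idea with tabular Q-learning, which is an assumption here, since the abstract does not name the RL algorithm used by ReASSIgN. The runtime table, the number of VMs, and all function names are hypothetical, and nothing in it uses the WorkflowSim or SciCumulus APIs.

```python
import random

# Hypothetical hidden environment: mean runtime (seconds) of each task on each VM.
# The learner never reads this table; it only observes noisy runtimes.
MEAN_RUNTIME = [
    [8.0, 3.0, 5.0],  # task 0 on VM 0, 1, 2
    [2.0, 6.0, 4.0],  # task 1
    [7.0, 7.5, 2.5],  # task 2
    [4.0, 1.5, 6.0],  # task 3
]
N_TASKS = len(MEAN_RUNTIME)
N_VMS = len(MEAN_RUNTIME[0])


def observe_runtime(task, vm, rng):
    """Simulate one execution with a +/-10% performance fluctuation."""
    return MEAN_RUNTIME[task][vm] * rng.uniform(0.9, 1.1)


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: state = task index, action = VM index."""
    rng = random.Random(seed)
    q = [[0.0] * N_VMS for _ in range(N_TASKS)]
    for _ in range(episodes):
        # One episode = one simulated execution of the whole task sequence.
        for task in range(N_TASKS):
            if rng.random() < epsilon:
                vm = rng.randrange(N_VMS)  # explore a random VM
            else:
                vm = max(range(N_VMS), key=lambda a: q[task][a])  # exploit
            reward = -observe_runtime(task, vm, rng)  # shorter runtime is better
            future = max(q[task + 1]) if task + 1 < N_TASKS else 0.0
            q[task][vm] += alpha * (reward + gamma * future - q[task][vm])
    return q


def greedy_plan(q):
    """Scheduling plan: the VM with the highest learned value for each task."""
    return [max(range(N_VMS), key=lambda a: q[t][a]) for t in range(N_TASKS)]


q_table = train()
plan = greedy_plan(q_table)
print("Learned task-to-VM plan:", plan)
```

In this toy setting the tasks run one after another, so the learned plan simply converges toward the VM with the lowest observed runtime for each task. A realistic scheduler would also have to account for task dependencies, VM queues, and data transfer costs.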

Subjects :: Schedule
Job shop scheduling
Computer science
Distributed computing
020206 networking & telecommunications
Cloud computing
02 engineering and technology
Scheduling (computing)
Workflow
Elasticity (cloud computing)
Virtual machine
0202 electrical engineering, electronic engineering, information engineering
Reinforcement learning
020201 artificial intelligence & image processing
Workflow management system

Details

Database :: OpenAIRE
Journal :: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Accession number :: edsair.doi...........1ed7a45ac29dbab0001d5114af5f227b
Full Text :: https://doi.org/10.1109/ipdpsw.2019.00134