
Data distribution and scheduling for distributed analytics tasks

Authors :
Stephen Pasteris
Christian Makaya
Shiqiang Wang
Mark Herbster
Kevin S. Chan
Source :
SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI
Publication Year :
2017
Publisher :
IEEE, 2017.

Abstract

We consider a distributed edge computing system where we have a number of interconnected machines with limited communication bandwidth and storage capacity. Analytics tasks run on the machines, where each task runs on a single machine but may require data from multiple other machines. Every task requires a given amount of data to run, and it needs to receive all its data within a specific deadline. In our application scenario, each machine has limited storage, so we usually cannot place all the data for a task on the single machine that executes it. We assume that task execution is sparse in time, so that at most one task is executed in the system at any time. The problem we study in this paper is how to distribute the data across the machines in the system without violating the bandwidth and storage constraints, while ensuring that the data transfer deadlines are met. We prove that the optimal solution to this problem is equivalent to that of a max-flow problem on a specifically constructed graph. We show how to construct this graph so that the problem can be solved using standard max-flow algorithms, and we also provide numerical results and further discussion.
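To illustrate the general idea of reducing data placement to max-flow, here is a minimal sketch. It is not the graph construction from the paper, and every simplification is our assumption. It considers a single task. Each machine is fed from a super-source with capacity equal to its storage. Each directed link carries at most bandwidth × deadline units, which ignores store-and-forward delay. The task's machine drains into a sink. Edmonds-Karp stands in for "standard algorithms for max-flow problems". The names `max_flow` and `placement_feasible` are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest (BFS) paths in the residual graph."""
    # residual[u][v] is the remaining capacity on edge u -> v
    residual = {u: dict(edges) for u, edges in capacity.items()}
    # ensure every reverse edge exists (with zero capacity if not given)
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # BFS for a shortest augmenting path from source to sink
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left: flow is maximum
        # bottleneck = smallest residual capacity along the path
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # push the bottleneck amount and update reverse edges
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def placement_feasible(storage, bandwidth, task_machine, demand, deadline):
    """Hypothetical single-task check: can `demand` units reach `task_machine` by `deadline`?

    Assumptions (ours, not the paper's): storage[m] is machine m's capacity,
    bandwidth[(u, v)] is a directed link rate, and a link can move at most
    rate * deadline units before the deadline.
    """
    SOURCE, SINK = "source", "sink"
    capacity = {SOURCE: {}}
    for machine, space in storage.items():
        capacity[SOURCE][machine] = space  # a machine stores at most its capacity
        capacity.setdefault(machine, {})
    for (u, v), rate in bandwidth.items():
        capacity.setdefault(u, {})
        capacity[u][v] = capacity[u].get(v, 0) + rate * deadline
    # data already on the task's machine needs no transfer
    capacity[task_machine][SINK] = float("inf")
    flow = max_flow(capacity, SOURCE, SINK)
    return flow >= demand, flow


ok, flow = placement_feasible(
    storage={"A": 4, "B": 6, "C": 5},
    bandwidth={("A", "C"): 2, ("B", "A"): 3, ("B", "C"): 1},
    task_machine="C",
    demand=10,
    deadline=2,
)
print(ok, flow)  # True 11
```

In this toy instance, C holds 5 units locally. The link A→C can deliver 4 units and B→C can deliver 2 units, so at most 11 units can arrive in time. In a maximum flow, the flow on each source→machine edge can be read as how much of the task's data to store on that machine. The sketch does not model several tasks sharing the same storage, which the paper's problem must handle.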

Details

Database :
OpenAIRE
Journal :
2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)
Accession number :
edsair.doi...........2ac186c9b421e6dda52dde498d63dbc0