1. A Survey on Job and Task Scheduling in Big Data
- Author
ALEXANDER, Mr. MALCOM MARSHALL
- Subjects
Computer Science: Artificial Intelligence, Computer Science: Dynamical Systems, Artificial Intelligence, Dynamical Systems
- Abstract
Big data deals with datasets that exceed the ability of commonly used software tools to store, share, and process them. Workload classification, in particular tracking how job types and job sizes evolve, is a major issue for the Big Data community. Based on job type, job size, and disk performance, clusters are formed with data nodes, a name node, and a secondary name node. The MapReduce algorithm is applied to classify the workload and to perform job scheduling, and workload is allocated according to the performance of each individual machine. MapReduce processes data in two phases, map and reduce. In the map phase, the input dataset is split into key-value pairs, producing an intermediate output; in the reduce phase, these key-value pairs are shuffled and sorted by key before being reduced. The intermediate files created by map tasks are written to local disk, while the final output files are written to the Hadoop Distributed File System (HDFS). After the MapReduce tasks complete, the assignment of jobs to disks is determined. Johnson's algorithm is used to schedule the jobs and to find an optimal job ordering; it places the jobs into different pools and schedules them. The main goal is to minimize the total computation time across all jobs and to analyze performance in terms of response time on HDFS. The performance of individual jobs is then evaluated with respect to the dataset size and the number of nodes in the Hadoop cluster.
Keywords: Hadoop; MapReduce; Johnson algorithm
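The map, shuffle-and-sort, and reduce steps described in the abstract can be illustrated with a minimal single-process sketch, using word count as the example job. This is not Hadoop's Java API: the function names are hypothetical, and everything is held in memory instead of being spilled to local disk and written to HDFS.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Map: emit an intermediate (word, 1) key-value pair for every word in the input split."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle_and_sort(pairs):
    """Shuffle/sort: order intermediate pairs by key so that equal keys become adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Reduce: sum the values of each group of identical keys."""
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    }


def word_count(lines):
    """Run the three steps in sequence (hypothetical driver, not a Hadoop job)."""
    return reduce_phase(shuffle_and_sort(map_phase(lines)))


if __name__ == "__main__":
    data = ["big data needs big clusters", "hadoop runs mapreduce on big data"]
    print(word_count(data))
```

The sort before `groupby` matters: `groupby` only merges adjacent equal keys, which mirrors why MapReduce sorts intermediate pairs by key before the reduce step.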
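Johnson's rule, the classic form of the Johnson algorithm named in the abstract, gives an optimal (minimum-makespan) order for jobs that each pass through two machines in the same fixed sequence. The sketch below assumes, purely for illustration, that each job has a first-stage and a second-stage time (for example a map stage followed by a reduce stage); the job names and times are hypothetical, and this is not necessarily how the paper maps jobs to pools or disks.

```python
from typing import List, Tuple

# Hypothetical job record: (name, time on stage 1, time on stage 2)
Job = Tuple[str, float, float]


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Johnson's rule for a two-machine flow shop.

    Jobs faster on stage 1 go first, in ascending order of stage-1 time;
    the remaining jobs go last, in descending order of stage-2 time.
    """
    first = sorted((j for j in jobs if j[1] < j[2]), key=lambda j: j[1])
    last = sorted((j for j in jobs if j[1] >= j[2]), key=lambda j: j[2], reverse=True)
    return first + last


def makespan(sequence: List[Job]) -> float:
    """Completion time of the last job when each job runs stage 1, then stage 2."""
    end1 = end2 = 0.0
    for _, t1, t2 in sequence:
        end1 += t1                   # stage 1 processes jobs back to back
        end2 = max(end2, end1) + t2  # stage 2 waits for stage 1 and for its previous job
    return end2


if __name__ == "__main__":
    jobs = [("J1", 3, 6), ("J2", 5, 2), ("J3", 1, 2), ("J4", 6, 6), ("J5", 7, 5)]
    order = johnson_order(jobs)
    print([name for name, _, _ in order], makespan(order), makespan(jobs))
```

With these sample times the rule orders the jobs J3, J1, J4, J5, J2 for a makespan of 24, against 27 for the input order J1 to J5. Jobs with equal stage times can go in either group without affecting optimality; this sketch places them in the second group.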
- Published
2014