Handling Data-skew Effects in Join Operations using MapReduce
- Author
Frédéric Loulergue, Mohamad Al Hajj Hassan, Mostafa Bamha; PaMDA, Laboratoire d'Informatique Fondamentale d'Orléans (LIFO), Ecole Nationale Supérieure d'Ingénieurs de Bourges-Université d'Orléans (UO)
- Subjects
[INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB], Adaptive algorithm, Computer science, Computation, Skew, Join operations, 02 engineering and technology, Parallel computing, Hadoop framework, Data skew, 020204 information systems, Scalability, Still face, 0202 electrical engineering, electronic engineering, information engineering, Programming paradigm, [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], General Earth and Planetary Sciences, 020201 artificial intelligence & image processing, Raw data, MapReduce model, General Environmental Science
- Abstract
For over a decade, MapReduce has been a prominent programming model for handling vast amounts of raw data in large-scale systems. This model ensures scalability, reliability and availability with reasonable query processing times. However, these large-scale systems still face several challenges: data skew, task imbalance, high disk I/O and redistribution costs can have disastrous effects on performance. In this paper, we introduce the MRFA-Join algorithm, a new frequency-adaptive algorithm based on the MapReduce programming model and a randomised key redistribution approach for join processing of large-scale datasets. A cost analysis of this algorithm shows that our approach is insensitive to data skew and ensures perfect balancing properties during all stages of join computation. These performance results have been confirmed by a series of experiments.
- Published
- 2014
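The abstract does not spell out the steps of MRFA-Join, so the following is only a hedged, minimal sketch of the general idea it names (frequency-adaptive, randomised key redistribution for an equi-join), not the authors' algorithm. It simulates a single shuffle in plain Python: key frequencies are counted in relation `r`, the tuples of frequent ("heavy") keys are scattered at random over a group of reducers, the matching tuples of `s` are replicated to every reducer in that group, and all other keys are hash-partitioned. The function names, the `threshold` parameter, the group-sizing rule and the use of `zlib.crc32` as the partitioner are assumptions made for this illustration.

```python
import random
import zlib
from collections import Counter, defaultdict


def partition(key, num_reducers):
    """Deterministic hash partitioner (crc32 instead of Python's salted hash())."""
    return zlib.crc32(repr(key).encode()) % num_reducers


def local_join(r, s):
    """Equi-join on the first field, as one reducer would compute it locally."""
    index = defaultdict(list)
    for k, v in s:
        index[k].append(v)
    return sorted((k, a, b) for k, a in r for b in index[k])


def redistribute(r, s, num_reducers, threshold, seed=0):
    """Hypothetical frequency-adaptive shuffle (illustration only, not MRFA-Join).

    Keys occurring more than `threshold` times in r are treated as heavy: each of
    their r-tuples goes to ONE random reducer of a group, and the matching s-tuples
    are copied to EVERY reducer of that group. Other keys use hash partitioning.
    """
    rng = random.Random(seed)
    freq = Counter(k for k, _ in r)
    groups = {}
    for k, f in freq.items():
        if f > threshold:
            # Assumed sizing rule: ceil(f / threshold) reducers, capped at num_reducers.
            size = min(num_reducers, -(-f // threshold))
            first = partition(k, num_reducers)
            groups[k] = [(first + i) % num_reducers for i in range(size)]
    buckets_r = defaultdict(list)
    buckets_s = defaultdict(list)
    for k, v in r:
        target = rng.choice(groups[k]) if k in groups else partition(k, num_reducers)
        buckets_r[target].append((k, v))
    for k, v in s:
        for target in groups.get(k, [partition(k, num_reducers)]):
            buckets_s[target].append((k, v))
    return buckets_r, buckets_s


def skew_aware_join(r, s, num_reducers=8, threshold=100, seed=0):
    """Run the simulated shuffle, then join locally on each reducer."""
    buckets_r, buckets_s = redistribute(r, s, num_reducers, threshold, seed)
    result = []
    for reducer in range(num_reducers):
        result.extend(local_join(buckets_r.get(reducer, []), buckets_s.get(reducer, [])))
    # Per-reducer input size, to inspect how balanced the shuffle is.
    loads = [len(buckets_r.get(i, [])) + len(buckets_s.get(i, [])) for i in range(num_reducers)]
    return sorted(result), loads


if __name__ == "__main__":
    # Key 0 is heavily skewed in r; plain hash partitioning would send all of it to one reducer.
    r = [(0, i) for i in range(1000)] + [(k, k) for k in range(1, 101)]
    s = [(0, "x"), (0, "y")] + [(k, f"s{k}") for k in range(1, 101)]
    result, loads = skew_aware_join(r, s)
    assert result == local_join(r, s)
    print(len(result), loads)
```

Every matching pair still meets on exactly one reducer: each `r`-tuple is sent to a single reducer, and the matching `s`-tuples are sent to every reducer that could have received it. The cost of spreading a heavy key this way is the replication of its `s`-tuples, which is why this kind of scheme is usually applied only to keys whose frequency exceeds a threshold.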