Bohr: Similarity aware geo-distributed data analytics
- Source :
- Scopus-Elsevier, CoNEXT
Abstract
- We propose Bohr, a similarity-aware geo-distributed data analytics system that minimizes query completion time (QCT). The key idea is to exploit similarity between data in different data centers (DCs) and transfer similar data from the bottleneck DC to other sites with more WAN bandwidth. Although these sites then have more input data to process, the data are more similar and can be aggregated more efficiently by the combiner, which reduces the intermediate data that must be shuffled across the WAN. Our similarity-aware approach thus reduces the shuffle time and, in turn, the QCT. We design Bohr based on OLAP data cubes to perform efficient similarity checking among datasets at different sites. We implement Bohr on Spark and deploy it across ten sites of AWS EC2. Our extensive evaluation using realistic query workloads shows that Bohr improves the QCT by up to 50% and reduces the intermediate data by up to 6x compared to state-of-the-art solutions that also use OLAP cubes.
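The combiner argument in the abstract can be illustrated with a minimal Python sketch. This is not Bohr's implementation: Bohr performs similarity checking with OLAP data cubes, whereas here Jaccard similarity over the grouping keys stands in as an assumed, simplified measure. WAN bandwidth is ignored, and the site data and function names (`combine`, `jaccard`, `shuffle_size_after_move`) are hypothetical. The sketch shows only that moving data to a site holding similar keys leaves fewer combined records to shuffle.

```python
from collections import Counter

def combine(records):
    """Map-side combiner: sum values per key, so one record per distinct key is shuffled."""
    out = Counter()
    for key, value in records:
        out[key] += value
    return dict(out)

def jaccard(a, b):
    """Assumed stand-in similarity: Jaccard index of the two datasets' key sets."""
    ka = {k for k, _ in a}
    kb = {k for k, _ in b}
    if not ka and not kb:
        return 1.0
    return len(ka & kb) / len(ka | kb)

def shuffle_size_after_move(moved, site):
    """Intermediate records the receiving site must shuffle after combining all its input."""
    return len(combine(site + moved))

# Hypothetical data: records held by the bottleneck DC and two candidate destination sites.
bottleneck = [("US", 1), ("EU", 1), ("US", 1)]
similar_site = [("US", 2), ("EU", 3)]
dissimilar_site = [("ASIA", 2), ("AF", 3)]

for name, site in [("similar", similar_site), ("dissimilar", dissimilar_site)]:
    # Higher key similarity means the combiner merges more records before the shuffle.
    print(name, jaccard(bottleneck, site), shuffle_size_after_move(bottleneck, site))
```

Here the similar site combines the moved data into 2 intermediate records, while the dissimilar site must shuffle 4, even though both receive the same input.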
- Subjects :
- Computer science
Online analytical processing
020206 networking & telecommunications
Cloud computing
02 engineering and technology
Bottleneck
Bohr model
Data cube
Similarity (network science)
020204 information systems
Spark (mathematics)
0202 electrical engineering, electronic engineering, information engineering
Data analysis
Data mining
Details
- Database :
- OpenAIRE
- Journal :
- Scopus-Elsevier, CoNEXT
- Accession number :
- edsair.doi.dedup.....b96491f9656cf02a7b668d8bbebf8b7c