A Study of SQL-on-Hadoop Systems

Authors :: Yueguo Chen
Jiaheng Lu
Huijie Zhang
Haoqiong Bian
Zhaoan Dong
Yanjie Gao
Xiongpai Qin
Xiaoyong Du
Jun Chen
Dehai Liu
Source :: Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE@ASPLOS/VLDB), ISBN: 9783319130200
Publication Year :: 2014
Publisher :: Springer International Publishing, 2014.
Abstract: Hadoop is now the de facto standard for storing and processing big data, not only unstructured data but also some structured data. As a result, providing SQL analysis functionality for the big data residing in HDFS is becoming increasingly important. Hive is a pioneering system that supports SQL-like analysis of the data in HDFS. However, the performance of Hive is not satisfactory for many applications. This has led to the rapid emergence of dozens of SQL-on-Hadoop systems that try to support interactive SQL query processing over the data stored in HDFS. This paper first gives a brief technical review of recent efforts on SQL-on-Hadoop systems. We then test and compare the performance of five representative SQL-on-Hadoop systems, using queries selected or derived from the TPC-DS benchmark. Based on the results, we show that such systems could benefit further from applying many of the parallel query processing techniques that have been widely studied in traditional MPP analytical databases.

Subjects :: SQL
Database
Computer science
Data definition language
InformationSystems_DATABASEMANAGEMENT
Data Transformation Services
computer.software_genre
Language Integrated Query
Spatial query
In-Memory Processing
Query by Example
Stored procedure
computer
computer.programming_language

ISBN :: 978-3-319-13020-0
Database :: OpenAIRE
Accession number :: edsair.doi...........004e36be6334873d5a6af4eac1203763
Full Text :: https://doi.org/10.1007/978-3-319-13021-7_12