1. LuBase: A Search-Efficient Hybrid Storage System for Massive Text Data
- Author
Dan Meng, Jingzi Gu, Debin Jia, Bo Li, Weiping Wang, Zhengwei Liu, and Xiaoyan Gu
- Subjects
Database, Computer science, Process (engineering), business.industry, Big data, Fault tolerance, NoSQL, computer.software_genre, Index (publishing), Scalability, Data analysis, Performance improvement, business, computer
- Abstract
Recent years have witnessed great enthusiasm for big data analytics systems. Some of them, offering high scalability and fault tolerance, are used extensively in production. However, such systems are mostly designed for processing immutable data stored in HDFS and are not well suited to real-time text data in NoSQL databases such as HBase. In this paper, we propose LuBase, a search-efficient hybrid storage system for large-scale text data analytics. Beyond being a novel hybrid storage system with a fine-grained index, LuBase also introduces a new query processing flow that fully exploits a pre-built full-text index to accelerate interactive queries while achieving more efficient I/O. We implemented LuBase in a data analytics system based on Impala. Experimental results demonstrate that LuBase benefits substantially from Lucene indexing and brings a significant performance improvement to Impala when querying HBase.
- Published
2015