Resource-Efficient Index Shard Replication in Large Scale Search Engines.

Authors :
Li, Yusen
Tang, Xueyan
Cai, Wentong
Tong, Jiancong
Liu, Xiaoguang
Wang, Gang
Source :
IEEE Transactions on Parallel & Distributed Systems. Dec 2019, Vol. 30, Issue 12, pp. 2820-2835. 16 pp.
Publication Year :
2019

Abstract

With the rapid growth of the Web scale, large scale search engines have to set up a huge number of machines to place the index files of the Web contents. The index files are normally divided into smaller index shards which are often replicated so that queries can be processed in parallel. We observe from real systems that the index shard replication strategy could have a significant impact on the resource usage. In this paper, we investigate the index shard replication problem with the goal of minimizing the resource usage in search engine datacenters. We consider both the offline version and online version of the problem, and formulate the problems as non-linear integer programming problems. We propose several heuristic algorithms to approximate the optimal solution. The proposed algorithms are evaluated by extensive experiments using both synthetic data and real data from commercial search engines. The results demonstrate the effectiveness of the proposed algorithms. Our work also yields many insights about the impact of different input properties on the performance of each algorithm. We believe that this paper will provide valuable guidance to the design of the index shard replication strategy in practice. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1045-9219
Volume :
30
Issue :
12
Database :
Academic Search Index
Journal :
IEEE Transactions on Parallel & Distributed Systems
Publication Type :
Academic Journal
Accession number :
139681956
Full Text :
https://doi.org/10.1109/TPDS.2019.2924423