Resource-Efficient Index Shard Replication in Large Scale Search Engines.
- Source :
- IEEE Transactions on Parallel & Distributed Systems. Dec 2019, Vol. 30 Issue 12, p2820-2835. 16p.
- Publication Year :
- 2019
Abstract
- With the rapid growth of the Web scale, large scale search engines have to set up a huge number of machines to place the index files of the Web contents. The index files are normally divided into smaller index shards which are often replicated so that queries can be processed in parallel. We observe from real systems that the index shard replication strategy could have a significant impact on the resource usage. In this paper, we investigate the index shard replication problem with the goal of minimizing the resource usage in search engine datacenters. We consider both the offline version and online version of the problem, and formulate the problems as non-linear integer programming problems. We propose several heuristic algorithms to approximate the optimal solution. The proposed algorithms are evaluated by extensive experiments using both synthetic data and real data from commercial search engines. The results demonstrate the effectiveness of the proposed algorithms. Our work also yields many insights about the impact of different input properties on the performance of each algorithm. We believe that this paper will provide valuable guidance to the design of the index shard replication strategy in practice. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 1045-9219
- Volume :
- 30
- Issue :
- 12
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Parallel & Distributed Systems
- Publication Type :
- Academic Journal
- Accession number :
- 139681956
- Full Text :
- https://doi.org/10.1109/TPDS.2019.2924423