1. Dynamic single node failure recovery in distributed storage systems
- Author
-
Itani, M., Sharafeddine, S., Elkabani, I., Itani, M., Sharafeddine, S., and Elkabani, I.
- Abstract
With the emergence of many erasure coding techniques that help provide reliability in practical distributed storage systems, we use fractional repetition coding on the given data and optimize the allocation of data blocks on system nodes in a way that minimizes the system repair cost. We selected fractional repetition coding due to its simple repair mechanism that minimizes the repair and disk access bandwidths together with the property of un-coded repair process. To minimize the system repair cost, we formulate our problem using incidence matrices and solve it heuristically using genetic algorithms for all possible cases of single node failures. We then address three practical extensions that respectively account for newly arriving blocks, newly arriving nodes and variable priority files. A re-optimization mechanism for the storage allocation matrix is proposed for the first two extensions that can be easily implemented in real time without the need to redistribute original on-node blocks. The third extension is addressed by implementing variable fractional repetition codes which is shown to achieve significant cost reduction. The contributions of the paper are four fold: (i) generating an optimized block distribution scheme among the nodes of a given data center for fixed and variable size blocks; (ii) optimization of storage allocation under dynamic environments with data block arrivals; (iii) optimization of storage allocation with newly added storage nodes; and (iv) generating an effective block distribution scheme among the nodes by accounting for varying priorities among data blocks. We present a wide range of results for the various proposed algorithms and considered scenarios to quantify the achievable performance gains.
- Published
- 2018