1. Disaster Recovery Layer for Distributed OpenStack Deployments
- Author
Oshrit Feder, Dimosthenis Kyriazis, Luis Tomas, Emmanouel Varvarigos, V. Anagnostopoulos, Theodora Varvarigou, Panagiotis Kokkinos, and Kalman Meth
- Subjects
Computer Networks and Communications, business.industry, Computer science, Testbed, Disaster recovery, 020206 networking & telecommunications, Cloud computing, Context (language use), 02 engineering and technology, computer.software_genre, Electronic mail, Computer Science Applications, Business continuity, Hardware and Architecture, Anycast, Backup, 020204 information systems, Data_FILES, 0202 electrical engineering, electronic engineering, information engineering, Operating system, business, computer, Software, Information Systems, Computer network
- Abstract
We present the Disaster Recovery Layer (DRL), which enables workloads in OpenStack-managed datacenters, namely Virtual Machines (VMs) and Volumes, to be protected and recovered in another datacenter in case of a disaster. This work was carried out in the context of the EU FP7 ORBIT project, which develops technologies for enabling business continuity as a service. The DRL framework is based on a number of autonomous components and extensions of OpenStack modules, and its functionality is available through OpenStack's Horizon UI and command-line interface. The DRL's architecture is also extensible, allowing the easy and dynamic integration of protection, restoration, and orchestration plug-ins that adopt new approaches. A distributed disaster detection mechanism was also developed to identify datacenter disasters and alert the DRL. To evaluate the DRL, a testbed of two datacenters (active and backup) was set up at sites in Umeå and Luleå, 265 km apart and connected through the Swedish national research and education network. In case of a disaster, traffic is redirected between the datacenters using BGP anycast. The experiments show that the DRL can efficiently protect VMs and Volumes with minimal service disruption in case of failures and with low overhead, even when the available bandwidth is limited.
- Published
- 2020