1. Instability in Geo-Distributed Kubernetes Federation: Causes and Mitigation
- Author
- Guillaume Pierre, Mulugeta Ayalew Tamiru, Erik Elmroth, Johan Tordsson
- Affiliations: Elastisys AB; Design and Implementation of Autonomous Distributed Systems (MYRIADS), Inria Rennes – Bretagne Atlantique; Institut National de Recherche en Informatique et en Automatique (Inria); SYSTÈMES LARGE ÉCHELLE (IRISA-D1); Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA); Université de Rennes (UR); Université de Rennes 1 (UR1); Institut National des Sciences Appliquées - Rennes (INSA Rennes); Université de Bretagne Sud (UBS); École normale supérieure - Rennes (ENS Rennes); CentraleSupélec; Centre National de la Recherche Scientifique (CNRS); IMT Atlantique; Institut Mines-Télécom [Paris] (IMT)
- Funding: This work is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 765452. The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the European Union. Neither the European Union institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein. European Project: 765452, H2020, H2020-MSCA-ITN-2017, FogGuru (2017)
- Acknowledgment: Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000.fr)
- Subjects
Computer science, Distributed computing, System stability, 020207 software engineering, Automatic configuration tuning, 02 engineering and technology, Fog computing, Network connectivity, Instability, Self-adaptation, Self-configuration, 020204 information systems, Kubernetes Federation, 0202 electrical engineering, electronic engineering, information engineering, Feedback controller, [INFO.INFO-OS]Computer Science [cs]/Operating Systems [cs.OS], Latency (engineering), [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
- Abstract
Resources in geo-distributed environments are typically located at remote sites with high latency and intermittent network connectivity, so delays and transient network failures are common between the management layer and the remote resources. In this paper, we show that delays and transient network failures, coupled with static configuration (including the default configuration parameter values), can lead to instability of application deployments in Kubernetes Federation, making applications unavailable for long periods of time. Building on the benefits of configuration tuning, we propose a feedback controller that dynamically adjusts the relevant configuration parameter to improve the stability of application deployments without slowing down the detection of hard failures. We show the effectiveness of our approach in a geo-distributed setup across five sites of Grid'5000, raising system stability from 83-92% with no controller to 99.5-100% with the controller.
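The abstract does not name the tuned parameter or state the control law, so the following is only an illustrative sketch of the general idea, not the authors' controller. It assumes a simple proportional rule acting on a hypothetical health-check timeout (`timeout_s`), driven by the observed fraction of successful checks. The target, gain, and bounds are invented values for illustration; the upper bound stands in for the requirement that hard failures must still be detected promptly.

```python
from dataclasses import dataclass


@dataclass
class TimeoutController:
    """Proportional feedback controller for a hypothetical health-check timeout.

    All names and default values are assumptions made for this sketch.
    """
    target_stability: float = 0.99  # desired fraction of successful checks (assumed)
    gain: float = 20.0              # seconds of timeout change per unit of error (assumed)
    min_timeout_s: float = 1.0      # lower bound on the timeout (assumed)
    max_timeout_s: float = 30.0     # upper bound, keeps hard-failure detection bounded (assumed)
    timeout_s: float = 3.0          # current timeout value (assumed starting point)

    def update(self, observed_stability: float) -> float:
        """Adjust the timeout from the latest stability measurement."""
        # Positive error: the system is less stable than desired, so lengthen the timeout.
        # Negative error: stability exceeds the target, so shorten the timeout again.
        error = self.target_stability - observed_stability
        proposed = self.timeout_s + self.gain * error
        # Clamp to the allowed range so the timeout never grows without bound.
        self.timeout_s = min(self.max_timeout_s, max(self.min_timeout_s, proposed))
        return self.timeout_s


if __name__ == "__main__":
    controller = TimeoutController()
    # Hypothetical stability measurements, one per control period.
    for stability in (0.85, 0.90, 0.97, 1.0, 1.0):
        new_timeout = controller.update(stability)
        print(f"stability={stability:.2f} -> timeout={new_timeout:.2f}s")
```

A static configuration corresponds to never calling `update`; the controller instead lengthens the timeout while transient failures dominate and shortens it again once the deployment is stable.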
- Published
- 2020