191 results for "Östberg, Per-Olov"
Search Results
2. Power-Performance Tradeoffs in Data Center Servers: DVFS, CPU pinning, Horizontal, and Vertical Scaling
- Author
-
Krzywda, Jakub, Ali-Eldin, Ahmed, Carlson, Trevor E., Östberg, Per-Olov, and Elmroth, Erik
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, horizontal scaling, and vertical scaling are four techniques that have been proposed as actuators to control the performance and energy consumption of data center servers. This work investigates the utility of these four actuators and quantifies the power-performance tradeoffs associated with them. Using replicas of the German Wikipedia running on our local testbed, we perform a set of experiments to quantify the influence of DVFS, vertical and horizontal scaling, and CPU pinning on end-to-end response time (average and tail), throughput, and power consumption under different workloads. Results of the experiments show that DVFS rarely reduces the power consumption of underloaded servers by more than 5%, but it can be used to limit the maximal power consumption of a saturated server by up to 20% (at a cost of performance degradation). CPU pinning reduces the power consumption of underloaded servers (by up to 7%) at the cost of performance degradation, which can be limited by choosing an appropriate CPU pinning scheme. Horizontal and vertical scaling improve both the average and tail response time, but the improvement is not proportional to the amount of resources added. The load balancing strategy has a large impact on the tail response time of horizontally scaled applications. Comment: 31 pages
- Published
- 2019
- Full Text
- View/download PDF
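The DVFS finding above (capping a saturated server's peak power at the cost of performance) follows from the classic CMOS dynamic-power relation P ≈ C·V²·f: lowering frequency, and with it voltage, cuts the attainable maximum. A toy calculation with invented constants, not the paper's measurements:

```python
# Sketch of why DVFS caps peak power. Constants are illustrative only.

def dynamic_power(capacitance: float, voltage: float, freq_ghz: float) -> float:
    """Classic CMOS dynamic power model, P = C * V^2 * f."""
    return capacitance * voltage ** 2 * freq_ghz

# Hypothetical P-states: (core voltage in V, frequency in GHz)
P_STATES = {"high": (1.20, 3.0), "low": (1.00, 2.0)}
C = 30.0  # effective switched capacitance, made-up units

p_high = dynamic_power(C, *P_STATES["high"])
p_low = dynamic_power(C, *P_STATES["low"])
reduction = 1 - p_low / p_high

print(f"peak power reduced by {reduction:.0%} at the lower P-state")
```

Note this models only the *peak*: on an underloaded server the CPU spends most time idle, which is consistent with the small (&lt;5%) savings the experiments report there.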
3. Lessons learn on responsible AI implementation: the ASSISTANT use case
- Author
-
Vyhmeister, Eduardo, Castane, Gabriel G., Buchholz, Johan, and Östberg, Per-Olov
- Published
- 2022
- Full Text
- View/download PDF
4. Domain Models and Data Modeling as Drivers for Data Management: The ASSISTANT Data Fabric Approach
- Author
-
Östberg, Per-Olov, Vyhmeister, Eduardo, Castañé, Gabriel G., Meyers, Bart, and Van Noten, Johan
- Published
- 2022
- Full Text
- View/download PDF
5. Application Optimisation: Workload Prediction and Autonomous Autoscaling of Distributed Cloud Applications
- Author
-
Östberg, Per-Olov, Le Duc, Thang, Casari, Paolo, García Leiva, Rafael, Fernández Anta, Antonio, Domaschka, Jörg, Domaschka, Jörg, editor, and Ellis, Keith A., editor
- Published
- 2020
- Full Text
- View/download PDF
6. Towards an Architecture for Reliable Capacity Provisioning for Distributed Clouds
- Author
-
Domaschka, Jörg, Griesinger, Frank, Leznik, Mark, Östberg, Per-Olov, Ellis, Keith A., Casari, Paolo, Fowley, Frank, Lynn, Theo, Domaschka, Jörg, editor, and Ellis, Keith A., editor
- Published
- 2020
- Full Text
- View/download PDF
7. ASSISTANT: Learning and Robust Decision Support System for Agile Manufacturing Environments
- Author
-
Beldiceanu, Nicolas, Dolgui, Alexandre, Gonnermann, Clemens, Gonzalez-Castañé, Gabriel, Kousi, Niki, Meyers, Bart, Prud’homme, Julien, Thevenin, Simon, Vyhmeister, Eduardo, and Östberg, Per-Olov
- Published
- 2021
- Full Text
- View/download PDF
8. Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study
- Author
-
Rodrigo, Gonzalo P, Östberg, Per-Olov, Elmroth, Erik, Antypas, Katie, Gerber, Richard, and Ramakrishnan, Lavanya
- Subjects
Distributed Computing and Systems Software, Information and Computing Sciences, Built Environment and Design - Published
- 2016
9. ScSF: A Scheduling Simulation Framework
- Author
-
Rodrigo, Gonzalo P., Elmroth, Erik, Östberg, Per-Olov, Ramakrishnan, Lavanya, Klusáček, Dalibor, editor, Cirne, Walfredo, editor, and Desai, Narayan, editor
- Published
- 2018
- Full Text
- View/download PDF
10. Application Optimisation: Workload Prediction and Autonomous Autoscaling of Distributed Cloud Applications
- Author
-
Östberg, Per-Olov, Le Duc, Thang, Casari, Paolo, García Leiva, Rafael, Fernández Anta, Antonio, and Domaschka, Jörg
- Published
- 2020
- Full Text
- View/download PDF
11. COGNIT: Challenges and Vision for a Serverless and Multi-Provider Cognitive Cloud-Edge Continuum
- Author
-
Townend, Paul, Martí, Alberto P., De La Iglesia, Idoia, Matskanis, Nikolaos, Timoudas, Thomas Ohlson, Hallmann, Torsten, Lalaguna, Antonio, Swat, Kaja, Renzi, Francesco, Bocheński, Dominik, Mancini, Marco, Bhuyan, Monowar, González-Hierro, Marco, Dupont, Sébastien, Kristiansson, Johan, Montero, Rubén S., Elmroth, Erik, Valdés, Iván, Massonet, Philippe, Olsson, Daniel, Llorente, Ignacio M., Östberg, Per-Olov, and Abdou, Michael
- Published
- 2023
- Full Text
- View/download PDF
12. Reducing Complexity in Service Development and Integration
- Author
-
Östberg, Per-Olov, Lockner, Niclas, Helfert, Markus, editor, Desprez, Frédéric, editor, Ferguson, Donald, editor, Leymann, Frank, editor, and Méndez Munoz, Victor, editor
- Published
- 2015
- Full Text
- View/download PDF
13. Priority Operators for Fairshare Scheduling
- Author
-
Rodrigo, Gonzalo P., Östberg, Per-Olov, Elmroth, Erik, Cirne, Walfredo, editor, and Desai, Narayan, editor
- Published
- 2015
- Full Text
- View/download PDF
14. Formal models for the energy-aware cloud-edge computing continuum : analysis and challenges
- Author
-
Patel, Yashwant Singh, Townend, Paul, and Östberg, Per-Olov
- Abstract
Cloud infrastructures are rapidly evolving from centralised systems to geographically distributed federations of edge devices, fog nodes, and clouds. These federations (often referred to as the Cloud-Edge Continuum) are the foundation upon which most modern digital systems depend, and consume enormous amounts of energy. This consumption is becoming a critical issue as society's energy challenges grow, and is a great concern for power grids which must balance the needs of clouds against other users. The Continuum is highly dynamic, mobile, and complex; new methods to improve energy efficiency must be based on formal scientific models that identify and take into account a huge range of heterogeneous components, interactions, stochastic properties, and (potentially contradictory) service-level agreements and stakeholder objectives. Currently, few formal models of federated Cloud-Edge systems exist - and none adequately represent and integrate energy considerations (e.g. multiple providers, renewable energy sources, pricing, and the need to balance consumption over large areas with other non-Cloud consumers). This paper conducts a systematic analysis of current approaches to modelling Cloud, Cloud-Edge, and federated Continuum systems with an emphasis on the integration of energy considerations. We identify key omissions in the literature, and propose an initial high-level architecture and approach to begin addressing these - with the ultimate goal of developing a set of integrated models that include data centres, edge devices, fog nodes, energy providers, software workloads, end users, and stakeholder requirements and objectives. We conclude by highlighting the key research challenges that must be addressed to enable meaningful energy-aware Cloud-Edge Continuum modelling and simulation.
- Published
- 2023
- Full Text
- View/download PDF
15. COGNIT: challenges and vision for a serverless and multi-provider cognitive cloud-edge continuum
- Author
-
Townend, Paul, Martí, Alberto P., De La Iglesia, Idoia, Matskanis, Nikolaos, Ohlson Timoudas, Thomas, Hallmann, Torsten, Lalaguna, Antonio, Swat, Kaja, Renzi, Francesco, Bocheński, Dominik, Mancini, Marco, Bhuyan, Monowar H., González-Hierro, Marco, Dupont, Sébastien, Kristiansson, Johan, Montero, Rubén S., Elmroth, Erik, Valdés, Iván, Massonet, Philippe, Olsson, Daniel, Llorente, Ignacio M., Östberg, Per-Olov, and Abdou, Michael
- Abstract
Use of the serverless paradigm in cloud application development is growing rapidly, primarily driven by its promise to free developers from the responsibility of provisioning, operating, and scaling the underlying infrastructure. However, modern cloud-edge infrastructures are characterized by large numbers of disparate providers, constrained resource devices, platform heterogeneity, infrastructural dynamicity, and the need to orchestrate geographically distributed nodes and devices over public networks. This presents significant management complexity that must be addressed if serverless technologies are to be used in production systems. This position paper introduces COGNIT, a major new European initiative aiming to integrate AI technology into cloud-edge management systems to create a Cognitive Cloud reference framework and associated tools for serverless computing at the edge. COGNIT aims to: 1) support an innovative new serverless paradigm for edge application management and enhanced digital sovereignty for users and developers; 2) enable on-demand deployment of large-scale, highly distributed and self-adaptive serverless environments using existing cloud resources; 3) optimize data placement according to changes in energy efficiency heuristics and application demands and behavior; 4) enable secure and trusted execution of serverless runtimes. We identify and discuss seven research challenges related to the integration of serverless technologies with multi-provider Edge infrastructures and present our vision for how these challenges can be solved. We introduce a high-level view of our reference architecture for serverless cloud-edge continuum systems, and detail four motivating real-world use cases that will be used for validation, drawing from domains within Smart Cities, Agriculture and Environment, Energy, and Cybersecurity.
- Published
- 2023
- Full Text
- View/download PDF
16. ScSF: A Scheduling Simulation Framework
- Author
-
Rodrigo, Gonzalo P., Elmroth, Erik, Östberg, Per-Olov, and Ramakrishnan, Lavanya
- Published
- 2018
- Full Text
- View/download PDF
17. Service Development Abstraction: A Design Methodology and Development Toolset for Abstractive and Flexible Service-Based Software
- Author
-
Östberg, Per-Olov, Elmroth, Erik, Ivanov, Ivan, editor, van Sinderen, Marten, editor, and Shishkov, Boris, editor
- Published
- 2012
- Full Text
- View/download PDF
18. Designing Service-Based Resource Management Tools for a Healthy Grid Ecosystem
- Author
-
Elmroth, Erik, Hernández, Francisco, Tordsson, Johan, Östberg, Per-Olov, Wyrzykowski, Roman, editor, Dongarra, Jack, editor, Karczewski, Konrad, editor, and Wasniewski, Jerzy, editor
- Published
- 2008
- Full Text
- View/download PDF
19. Grid infrastructure tools for multi-level job management
- Author
-
Elmroth, Erik, Gardfjäll, Peter, Norberg, Arvid, Tordsson, Johan, Östberg, Per-Olov, Priol, Thierry, editor, and Vanneschi, Marco, editor
- Published
- 2007
- Full Text
- View/download PDF
20. The 6G Computing Continuum (6GCC): Meeting the 6G computing challenges
- Author
-
Tärneberg, William, Fitzgerald, Emma, Bhuyan, Monowar, Townend, Paul, Årzén, Karl-Erik, Östberg, Per-Olov, Elmroth, Erik, Eker, Johan, Tufvesson, Fredrik, and Kihl, Maria
- Subjects
Communication Systems - Abstract
6G systems, such as Large Intelligent Surfaces, will require distributed, complex, and coordinated decisions throughout a very heterogeneous and cell-free infrastructure. This will require a fundamentally redesigned software infrastructure accompanied by massively distributed and heterogeneous computing resources, vastly different from current wireless networks. To address these challenges, in this paper, we propose and motivate the concept of a 6G Computing Continuum (6GCC) and two research testbeds to advance the rate and quality of research. The 6G Computing Continuum is an end-to-end compute and software platform for realizing large intelligent surfaces and their tenant users and applications. One testbed addresses the challenges of orchestrating shared computational resources in the wireless domain and is implemented on a Large Intelligent Surfaces testbed. The other, simulation-based testbed is intended to address scalability and global-scale orchestration challenges.
- Published
- 2022
21. Decentralized scalable fairshare scheduling
- Author
-
Östberg, Per-Olov, Espling, Daniel, and Elmroth, Erik
- Published
- 2013
- Full Text
- View/download PDF
22. GJMF — a composable service-oriented grid job management framework
- Author
-
Östberg, Per-Olov and Elmroth, Erik
- Published
- 2013
- Full Text
- View/download PDF
23. The ASSISTANT project: AI for high level decisions in manufacturing
- Author
-
Castañé, G., Dolgui, A., Kousi, N., Meyers, B., Thevenin, S., Vyhmeister, E., and Östberg, Per-Olov
- Abstract
This paper outlines the main idea and approach of the H2020 ASSISTANT (LeArning and robuSt deciSIon SupporT systems for agile mANufacTuring environments) project. ASSISTANT is aimed at the investigation of AI-based tools for adaptive manufacturing environments, and focuses on the development of a set of digital twins for integration with, management of, and decision support for production planning and control. The ASSISTANT tools are based on the approach of extending generative design, an established methodology for product design, to a broader set of manufacturing decision making processes, and make use of machine learning, optimisation, and simulation techniques to produce executable models capable of ethical reasoning and data-driven decision making for manufacturing systems. Combining human control and accountable AI, the ASSISTANT toolsets span a wide range of manufacturing processes and time scales, including process planning, production planning, scheduling, and real-time control. They are designed to be adaptable and applicable in both general and specific manufacturing environments.
- Published
- 2022
- Full Text
- View/download PDF
24. Priority Operators for Fairshare Scheduling
- Author
-
Rodrigo, Gonzalo P., Östberg, Per-Olov, and Elmroth, Erik
- Published
- 2015
- Full Text
- View/download PDF
25. Multivariate Time Series Synthesis Using Generative Adversarial Networks
- Author
-
Leznik, Mark, Michalsky, Patrick, Willis, Peter, Schanzel, Benjamin, Östberg, Per-Olov, and Domaschka, Jörg
- Abstract
Collection and analysis of distributed (cloud) computing workloads allows for a deeper understanding of user and system behavior and is necessary for efficient operation of infrastructures and applications. The availability of such workload data is however often limited as most cloud infrastructures are commercially operated and monitoring data is considered proprietary or falls under GDPR regulations. This work investigates the generation of synthetic workloads using Generative Adversarial Networks and addresses a current need for more data and better tools for workload generation. Resource utilization measurements such as the utilization rates of Content Delivery Network (CDN) caches are generated and a comparative evaluation pipeline using descriptive statistics and time-series analysis is developed to assess the statistical similarity of generated and measured workloads. We use CDN data open-sourced by us in a data generation pipeline as well as back-end ISP workload data to demonstrate the multivariate synthesis capability of our approach. The work contributes a generation method for multivariate time series workload generation that can provide arbitrary amounts of statistically similar data sets based on small subsets of real data. The presented technique shows promising results, in particular for heterogeneous workloads not too irregular in temporal behavior.
- Published
- 2021
- Full Text
- View/download PDF
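The comparative evaluation pipeline described above judges synthetic workloads against measured ones using descriptive statistics and time-series analysis. A minimal stdlib-only sketch of that idea, with invented series and an arbitrary 15% similarity threshold (not the paper's pipeline or thresholds):

```python
# Toy statistical-similarity check between a measured and a synthetic
# workload series: compare mean, standard deviation, and lag-1
# autocorrelation. All data here is made up.
from statistics import mean, pstdev

def lag1_autocorr(xs):
    """Lag-1 autocorrelation, normalised by the population variance."""
    m, var = mean(xs), pstdev(xs) ** 2
    if var == 0:
        return 0.0
    cov = mean((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    return cov / var

def describe(xs):
    return {"mean": mean(xs), "std": pstdev(xs), "ac1": lag1_autocorr(xs)}

measured = [10, 12, 14, 13, 11, 10, 12, 15, 14, 12]   # e.g. CDN cache load
synthetic = [11, 13, 13, 12, 10, 11, 13, 14, 13, 11]  # generator output (fake)

m, s = describe(measured), describe(synthetic)
# Flag a statistic as "similar" if it is within 15% (arbitrary threshold).
similar = {k: abs(m[k] - s[k]) <= 0.15 * abs(m[k]) for k in m}
print(similar)
```

A real pipeline would add distributional tests and cross-correlations between the variables of a multivariate series; this sketch only shows the shape of the comparison.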
26. Multivariate Time Series Synthesis Using Generative Adversarial Networks
- Author
-
Leznik, Mark, Michalsky, Patrick, Willis, Peter, Schanzel, Benjamin, Östberg, Per-Olov, and Domaschka, Jörg
- Published
- 2021
- Full Text
- View/download PDF
27. Modeling and Simulation of QoS-Aware Power Budgeting in Cloud Data Centers
- Author
-
Krzywda, Jakub, Meyer, Vinicius, Xavier, Miguel G., Ali-Eldin, Ahmed, Östberg, Per-Olov, De Rose, Cesar A. F., and Elmroth, Erik
- Abstract
Power budgeting is a commonly employed solution to reduce the negative consequences of high power consumption of large-scale data centers. While various power budgeting techniques and algorithms have been proposed at different levels of data center infrastructures to optimize the power allocation to servers and hosted applications, testing them has been challenging with no available simulation platform that enables such testing for different scenarios and configurations. To facilitate evaluation and comparison of such techniques and algorithms, we introduce a simulation model for Quality-of-Service aware power budgeting and its implementation in CloudSim. We validate the proposed simulation model against a deployment on a real testbed, showcase simulator capabilities, and evaluate its scalability.
- Published
- 2020
- Full Text
- View/download PDF
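The kind of budgeting policy such a simulator exercises can be as simple as throttling every server by a uniform factor when total demand exceeds the cluster budget. The sketch below is a hypothetical illustration of that idea, not the paper's CloudSim model:

```python
# Toy power-budgeting policy: if the cluster budget covers total demand,
# grant every server its demand; otherwise scale all demands down by the
# same factor. Numbers are invented.

def allocate_budget(demands, budget):
    """Return per-server power caps (W) under a total cluster budget."""
    total = sum(demands)
    if total <= budget:
        return list(demands)       # enough power for everyone
    scale = budget / total         # uniform throttling factor
    return [d * scale for d in demands]

demands = [120.0, 80.0, 200.0]          # hypothetical per-server demand (W)
caps = allocate_budget(demands, 300.0)  # cluster budget of 300 W
print(caps)  # every server throttled by the same factor, 300/400 = 0.75
```

A QoS-aware policy would instead weight the allocation by each hosted application's sensitivity to throttling; uniform scaling is only the simplest baseline to simulate.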
28. Workload Diffusion Modeling for Distributed Applications in Fog/Edge Computing Environments
- Author
-
Le Duc, Thang, Leznik, Mark, Domaschka, Jörg, and Östberg, Per-Olov
- Abstract
This paper addresses the problem of workload generation for distributed applications in fog/edge computing. Unlike most existing work that tends to generate workload data for individual network nodes using historical data from the targeted node, this work aims to extrapolate supplementary workloads for entire application / infrastructure graphs through diffusion of measurements from limited subsets of nodes. A framework for workload generation is proposed, which defines five diffusion algorithms that use different techniques for data extrapolation and generation. Each algorithm takes into account different constraints and assumptions when executing its diffusion task, and individual algorithms are applicable for modeling different types of applications and infrastructure networks. Experiments are performed to demonstrate the approach and evaluate the performance of the algorithms under realistic workload settings, and results are validated using statistical techniques.
- Published
- 2020
- Full Text
- View/download PDF
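As a hypothetical illustration of the diffusion idea above (extrapolating workloads for a whole application graph from measurements at a few nodes), the toy algorithm below fills each unmeasured node with the mean of its already-known neighbors, repeating until the graph is covered. The paper's five algorithms are considerably more sophisticated:

```python
# Toy workload diffusion over an application graph. Nodes without
# measurements get the mean of their known neighbors, iterated until
# no further node can be filled. Graph and values are invented.

def diffuse(graph, measured):
    """graph: node -> list of neighbors; measured: node -> workload value."""
    values = dict(measured)
    progressed = True
    while progressed:
        progressed = False
        for node, neighbors in graph.items():
            if node in values:
                continue
            known = [values[n] for n in neighbors if n in values]
            if known:
                values[node] = sum(known) / len(known)
                progressed = True
    return values

# Toy application graph: load is measured only at the two web tiers.
graph = {"lb": ["web1", "web2"], "web1": ["lb", "db"],
         "web2": ["lb", "db"], "db": ["web1", "web2"]}
vals = diffuse(graph, {"web1": 40.0, "web2": 60.0})
print(vals)  # lb and db both extrapolated to 50.0
```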
29. Final System Architecture and Integration : RECAP Deliverable 4.4
- Author
-
Domaschka, Jörg, Narvä, Linus, Östberg, Per-Olov, Svorobej, Sergej, Garcia, Rafael, Ellis, Keith, Griesinger, Frank, European Union (EU), and Horizon 2020
- Subjects
Programming (Mathematics), capacity provisioning, computer architecture, optimization, computer capacity, system architecture, edge computing, computer simulation, machine learning, DDC 000 / Computer science, information & general works - Abstract
RECAP targets the automated operation and management of applications / service chains in large-scale geographically distributed infrastructure. As such, RECAP components need to operate across distributed infrastructure, which makes the RECAP tooling itself a distributed application. Earlier documents introduced RECAP’s initial architecture (D4.2) and described the initial prototype of the platform (D4.6). This document combines both predecessor documents and enhances them with an updated description of the RECAP architecture. From an architectural point of view, RECAP consists of four functional building blocks (sub-systems): (i) landscaping and monitoring, (ii) application and infrastructure optimization, (iii) simulation and planning, and (iv) data analytics and machine learning. (i) The Landscaping and Monitoring sub-system is responsible for gathering information about the structure of the current infrastructure and application landscape and monitoring their state. This is the main source for both the Data Analytics and Machine Learning and the Application and Infrastructure Optimization sub-systems. (ii) The Application and Infrastructure optimizers make decisions based on internal models as well as live information provided by the landscaping and monitoring sub-system. Separating between dedicated application and infrastructure optimizers enables the realisation of a ‘separation of concerns’. (iii) The simulation and planning sub-system provides means for supporting the validation of RECAP models, the execution of ‘what-if’ scenarios, and the long-term planning of large-scale infrastructures. (iv) The data analytics and machine learning sub-system provides tools and means to distil statistical properties and patterns from load traces; particular focus is thereby put on workload prediction. With regard to integration, RECAP targets a loose coupling between the components, fostering independent uptake and re-use of building blocks.
- Published
- 2019
30. Service Development Abstraction: A Design Methodology and Development Toolset for Abstractive and Flexible Service-Based Software
- Author
-
Östberg, Per-Olov and Elmroth, Erik
- Published
- 2012
- Full Text
- View/download PDF
31. Workload Diffusion Modeling for Distributed Applications in Fog/Edge Computing Environments
- Author
-
Le Duc, Thang, Leznik, Mark, Domaschka, Jörg, and Östberg, Per-Olov
- Published
- 2020
- Full Text
- View/download PDF
32. Power Shepherd : Application Performance Aware Power Shifting
- Author
-
Krzywda, Jakub, Ali-Eldin, Ahmed, Wadbro, Eddie, Östberg, Per-Olov, and Elmroth, Erik
- Abstract
The constantly growing power consumption of data centers is a major concern for environmental and economic reasons. Current approaches to reducing the negative consequences of high power consumption focus on limiting the peak power consumption. During high workload periods, the power consumption of highly utilized servers is throttled to stay within the power budget. However, the peak power reduction affects the performance of hosted applications and thus leads to Quality of Service violations. In this paper, we introduce Power Shepherd, a hierarchical system for application performance aware power shifting. Power Shepherd reduces data center operational costs by redistributing the available power among applications hosted in the cluster. This is achieved by assigning server power budgets via the cluster controller, enforcing these power budgets using Running Average Power Limit (RAPL), and prioritizing applications within each server by adjusting the CPU scheduling configuration. We implement a prototype of the proposed solution and evaluate it in a real testbed equipped with power meters and using representative cloud applications. Our experiments show that Power Shepherd has the potential to manage a cluster consisting of thousands of servers and to limit the increase in operational costs by a significant amount when the cluster power budget is limited and the system is overutilized. Finally, we identify some outstanding challenges regarding model sensitivity and the fact that this approach in its current form is not beneficial in all situations, e.g., when the system is underutilized. Originally included in thesis in manuscript form.
- Published
- 2019
- Full Text
- View/download PDF
33. Machine Learning Methods for Reliable Resource Provisioning in Edge-Cloud Computing : A Survey
- Author
-
Le Duc, Thang, García Leiva, Rafael, Casari, Paolo, and Östberg, Per-Olov
- Abstract
Large-scale software systems are currently designed as distributed entities and deployed in cloud data centers. To overcome the limitations inherent to this type of deployment, applications are increasingly being supplemented with components instantiated closer to the edges of networks—a paradigm known as edge computing. The problem of how to efficiently orchestrate combined edge-cloud applications is, however, incompletely understood, and a wide range of techniques for resource and application management are currently in use. This article investigates the problem of reliable resource provisioning in joint edge-cloud environments, and surveys technologies, mechanisms, and methods that can be used to improve the reliability of distributed applications in diverse and heterogeneous network environments. Due to the complexity of the problem, special emphasis is placed on solutions to the characterization, management, and control of complex distributed applications using machine learning approaches. The survey is structured around a decomposition of the reliable resource provisioning problem into three categories of techniques: workload characterization and prediction, component placement and system consolidation, and application elasticity and remediation. Survey results are presented along with a problem-oriented discussion of the state-of-the-art. A summary of identified challenges and an outline of future research directions are presented to conclude the article.
- Published
- 2019
- Full Text
- View/download PDF
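Within the workload characterization and prediction category the survey describes, even a sliding-window moving average serves as a one-step-ahead baseline forecaster against which richer ML models are judged. A minimal sketch with invented request rates (a generic baseline, not a method taken from the survey):

```python
# Trivial workload-prediction baseline: forecast the next value as the
# mean of the last `window` observations. Data is made up.
from collections import deque

def moving_average_forecast(series, window=3):
    """Yield a one-step-ahead forecast for each point after the window fills."""
    buf = deque(maxlen=window)
    out = []
    for x in series:
        if len(buf) == window:
            out.append(sum(buf) / window)
        buf.append(x)
    return out

rates = [100, 110, 120, 130, 140, 150]  # hypothetical requests/s
print(moving_average_forecast(rates))   # [110.0, 120.0, 130.0]
```

On this steadily rising series the baseline lags by two steps, which is exactly the weakness that the learning-based predictors surveyed above aim to remove.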
34. Machine Learning Methods for Reliable Resource Provisioning in Edge-Cloud Computing
- Author
-
Duc, Thang Le, Leiva, Rafael García, Casari, Paolo, and Östberg, Per-Olov
- Published
- 2019
- Full Text
- View/download PDF
35. Power-performance tradeoffs in data center servers : DVFS, CPU pinning, horizontal, and vertical scaling
- Author
-
Krzywda, Jakub, Ali-Eldin, Ahmed, Carlson, Trevor E., Östberg, Per-Olov, and Elmroth, Erik
- Abstract
Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, horizontal scaling, and vertical scaling are four techniques that have been proposed as actuators to control the performance and energy consumption of data center servers. This work investigates the utility of these four actuators and quantifies the power-performance tradeoffs associated with them. Using replicas of the German Wikipedia running on our local testbed, we perform a set of experiments to quantify the influence of DVFS, vertical and horizontal scaling, and CPU pinning on end-to-end response time (average and tail), throughput, and power consumption under different workloads. Results of the experiments show that DVFS rarely reduces the power consumption of underloaded servers by more than 5%, but it can be used to limit the maximal power consumption of a saturated server by up to 20% (at a cost of performance degradation). CPU pinning reduces the power consumption of underloaded servers (by up to 7%) at the cost of performance degradation, which can be limited by choosing an appropriate CPU pinning scheme. Horizontal and vertical scaling improve both the average and tail response time, but the improvement is not proportional to the amount of resources added. The load balancing strategy has a large impact on the tail response time of horizontally scaled applications.
- Published
- 2018
- Full Text
- View/download PDF
36. ALPACA : Application Performance Aware Server Power Capping
- Author
-
Krzywda, Jakub, Ali-Eldin, A., Wadbro, Eddie, Östberg, Per-Olov, and Elmroth, Erik
- Abstract
Server power capping limits the power consumption of a server so that it does not exceed a specific power budget. This allows data center operators to reduce the peak power consumption at the cost of performance degradation of hosted applications. Previous work on server power capping rarely considers the Quality-of-Service (QoS) requirements of consolidated services when enforcing the power budget. In this paper, we introduce ALPACA, a framework to reduce QoS violations and overall application performance degradation for consolidated services. ALPACA reduces unnecessarily high power consumption when there is no performance gain, and divides the power among the running services in a way that reduces the overall QoS degradation when power is scarce. We evaluate ALPACA using four applications: MediaWiki, SysBench, Sock Shop, and CloudSuite’s Web Search benchmark. Our experiments show that ALPACA reduces the operational costs of QoS penalties and electricity by up to 40% compared to a non-optimized system.
- Published
- 2018
- Full Text
- View/download PDF
37. Towards understanding HPC users and systems : a NERSC case study
- Author
-
Rodrigo, Gonzalo P., Östberg, Per-Olov, Elmroth, Erik, Antypas, Katie, Gerber, Richard, and Ramakrishnan, Lavanya
- Abstract
The high performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems. In this paper, we present a methodology to characterize workloads and assess their heterogeneity, both at a particular time period and in their evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance, which includes detailed information on a year of workload (2014) and evolution through the systems' lifetime (2010–2014). Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR); we used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility, supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231. Originally included in thesis in manuscript form in 2017.
- Published
- 2018
- Full Text
- View/download PDF
38. ScSF : a scheduling simulation framework
- Author
-
Rodrigo, Gonzalo P., Elmroth, Erik, Östberg, Per-Olov, and Ramakrishnan, Lavanya
- Abstract
High-throughput and data-intensive applications, often composed as workflows, are increasingly present in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical large, tightly coupled parallel jobs over homogeneous systems. There is therefore an urgent need to investigate new scheduling algorithms that can manage the future workloads on HPC systems, yet there is a lack of appropriate models and frameworks to enable the development, testing, and validation of new scheduling ideas. In this paper, we present an open-source scheduler simulation framework (ScSF) that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to run over a distributed computing infrastructure, enabling testing at scale. We describe in detail a use case of ScSF to develop new techniques to manage scientific workflows in a batch scheduler; this technique was implemented in the framework scheduler. For evaluation purposes, 1728 experiments, equivalent to 33 years of simulated time, were run in a deployment of ScSF over a distributed infrastructure of 17 compute nodes during two months. The experimental results were analyzed in the framework, showing that the technique minimizes workflows’ turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers., Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR); we used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility, supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.
- Published
- 2018
- Full Text
- View/download PDF
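The ScSF abstract above covers workload generation and system simulation for scheduling research. A minimal sketch of the simulation step (a toy first-come-first-served simulator over a fixed node count; hypothetical, far simpler than ScSF itself) could look like:

```python
# Toy FCFS scheduling simulator (illustrative sketch, not ScSF).
import heapq

def simulate_fcfs(jobs, nodes):
    """jobs: list of (submit_time, nodes_needed, runtime) in FCFS order.
    Returns each job's turnaround time (finish time minus submit time)."""
    free = nodes
    running = []          # min-heap of (finish_time, nodes_to_release)
    clock = 0.0
    turnaround = []
    for submit, need, runtime in jobs:
        clock = max(clock, submit)
        # Wait for enough running jobs to finish before starting this one.
        while free < need:
            finish, release = heapq.heappop(running)
            clock = max(clock, finish)
            free += release
        heapq.heappush(running, (clock + runtime, need))
        free -= need
        turnaround.append(clock + runtime - submit)
    return turnaround
```

With four nodes and three two-node, ten-unit jobs submitted at time zero, the first two start immediately and the third waits, giving turnaround times of 10, 10, and 20.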
39. Application, Workload, and Infrastructure Models for Virtualized Content Delivery Networks Deployed in Edge Computing Environments
- Author
-
Le Duc, Thang, and Östberg, Per-Olov
- Abstract
Content Delivery Networks (CDNs) handle a large part of the traffic over the Internet and are of growing importance for the management and operation of coming generations of data-intensive applications. This paper addresses the modeling and scaling of content-oriented applications, and presents workload, application, and infrastructure models developed in collaboration with a provider operating large-scale CDN infrastructure, aimed at improving the performance of content delivery subsystems deployed in wide area networks. It has been shown that leveraging edge resources for the deployment of content caches greatly benefits CDNs. Therefore, the models are described from an edge computing perspective and are intended to be integrated in network-topology-aware application orchestration and resource management systems., RECAP
- Published
- 2018
- Full Text
- View/download PDF
40. CACTOS toolkit version 2: accompanying document for prototype deliverable D5.2.2
- Author
-
Groenda, Henning, Stier, Christian, Krzywda, Jakub, Byrne, James, Svorobej, Sergej, Castañé, Gabriel González, Papazachos, Zafeirios, Sheridan, Craig, Whigham, Darren, Hauser, Christopher, Tsitsipas, Athanasios, Domaschka, Jörg, Ali-Eldin, Ahmed, and Östberg, Per-Olov
- Subjects
Analytics ,Toolkit ,Datenmanagement ,Cloud Computing ,Data management ,Context-aware cloud topology ,Electric network topology ,Cloud services ,Tooling ,Optimisation ,DDC 004 / Data processing & computer science ,ddc:004 ,Cloud ,Simulation - Abstract
This document is accompanying material for the prototype deliverable D5.2.2. It describes the changes for the second version of the CACTOS Toolkit and provides details on the integration between the tools and toolkits. A main focus is on showing updated models, as this is how information is passed between the tools. Identical models are used during Runtime and Prediction time. Please refer to the accompanying material for the prototype deliverable (D5.2.1 CACTOS Toolkit Version 1) for an overview of the CACTOS toolkits and an exemplary use case. Note that there are two CACTOS toolkits: the CACTOS Runtime Toolkit (label before year 1: CACTOS Toolkit) and the CACTOS Prediction Toolkit. The CACTOS Runtime Toolkit contains the tools CactoScale and CactoOpt and is described in this deliverable. The CACTOS Prediction Toolkit is described in (D6.4 CactoSim Simulation Framework Final Prototype). The major architectural additions to the CACTOS Runtime Toolkit since year 1 are added support for monitoring and scaling of White-Box Applications. White-Box Applications allow for monitoring of application internals on top of the VM-level metrics that CACTOS collects for all VMs. White-Box Applications such as PlayGen’s DataPlay can use CACTOS AutoScaling services to let the CACTOS Runtime Toolkit adapt the degree of horizontal scaling based on the current load. This document describes both the additions to the models and to the CACTOS Runtime Toolkit that have been made to support monitoring and scaling of White-Box Applications. Finally, the document provides detailed insight into the architecture and service structure of the CACTOS Runtime Toolkit. This includes a detailed description of the Virtualisation Middleware Integration (VMI) and the VMI Controller that form the Cloud-specific connector CACTOS uses to translate its optimisation decisions to a running Cloud environment’s API. Additionally, an overview of the Extensible Services Infrastructure architecture style is given. 
This architecture style allows for a dynamic reconfiguration of used optimisation algorithms and policies. The style also eases the coupling and analysis of optimisation algorithms in the CACTOS Prediction Toolkit.
- Published
- 2017
41. Model integration method and supporting tooling: project deliverable D5.1
- Author
-
Groenda, Henning, Stier, Christian, Krzywda, Jakub, Byrne, James, Svorobej, Sergej, Papazachos, Zafeirios, Sheridan, Craig, Whigham, Darren, and Östberg, Per-Olov
- Subjects
Analytics ,Datenmanagement ,Data management ,Model integration ,Context-aware cloud topology ,Electric network topology ,Cactos Projekt ,Cloud services ,Tooling ,Cloud computing ,DDC 004 / Data processing & computer science ,ddc:004 ,Cloud - Abstract
The CACTOS project aims to improve the operational efficiency of cloud data centres by supporting data centre operators in the planning and operation of heterogeneous data centres. One major goal of CACTOS is to enable automated capacity and resource management for virtualised infrastructure environments built upon the Infrastructure as a Service (IaaS) paradigm. This document outlines the model‐driven methodology developed for the integration of runtime monitoring of cloud‐based data centres with runtime optimisation techniques. The CACTOS project develops an integrated solution for runtime monitoring, optimisation and predictive analysis of data centres. The solution supports data centre providers in managing and planning data centres. CACTOS consists of two toolkits: • The CACTOS Runtime Toolkit enables automated resource planning and optimisation for IaaS data centres. • The CACTOS Prediction Toolkit supports what‐if analyses for existing or planned data centre topologies that account for effects caused by automated resource optimisation. While the focus of this document is to describe the integration methodology developed to couple runtime monitoring and optimisation for cloud data centres in the CACTOS Runtime Toolkit, the outlined methodology was developed to facilitate the integration across all toolkits developed in CACTOS. Hence, the integration of optimisation and monitoring with the simulative predictions in the CACTOS Prediction Toolkit is also discussed. The main contributions of this deliverable are the CACTOS Cloud Infrastructure Models that define the common language through which the runtime analytics tool, CactoScale, and the optimisation tool, CactoOpt, exchange information on the data centre’s structure and operational state. The models allow for the capturing of the deployment of Virtual Machines (VMs) on the middleware used in cloud data centres. 
Additionally, they track measurements and metrics that reflect the operational efficiency of the data centre. Instances of the CACTOS Cloud Infrastructure Models are constructed and maintained by CactoScale. CactoOpt uses the captured models as input for its optimisations. This document gives an overview of the developed models and how they are utilised in the context of a holistic integration process. It relates to other deliverables by integrating the information on CactoScale’s runtime monitoring (D4.2 Preliminary offline trace analysis), CactoOpt’s topology optimisation algorithms (D3.1 Prototype Optimization Model) and the simulative what‐if analyses of CactoSim (D6.1 CactoSim Simulation Framework Initial Prototype) for data centres. Furthermore, recent (D5.2.1 CACTOS Toolkit Version 1) and planned releases of the CACTOS toolkit (D5.2.2 CACTOS Toolkit Version 2) and the licensing models proposed for the individual CACTOS tools are outlined. The feature scope and integration of these features has served as the foundation for the requirements analysis of the developed integration methodology. The current iteration of the CACTOS Cloud Infrastructure Models captures all essential characteristics required to support an integration of current and planned features in all toolkits. Future iterations will improve the usability of the developed models and extend them to address newly identified requirements. In addition, this document describes the development process of the toolkits and the infrastructure used throughout the CACTOS project. The document discusses the setup of CACTOS’ development and build infrastructure and sketches the chosen architecture for the infrastructure. A holistic development process for both the CACTOS Runtime Toolkit and the CACTOS Prediction Toolkit was chosen in order to facilitate early as well as Continuous Integration throughout and beyond the project’s life cycle. 
The build infrastructure was set up following the principle of Continuous Integration and allows for continued development and integration of all tools developed in the CACTOS projects, as well as the tools that they build upon. Finally, the document discusses different licensing models for the release of both toolkits. In line with the effort to keep the results of the CACTOS project open for further development and use by the Open Source community, this document proposes to release all major project contributions under the Eclipse Public License Version 1.
- Published
- 2017
42. Evaluation methodology for the CACTOS runtime and prediction toolkits: project deliverable D5.4
- Author
-
Stier, Christian, Groenda, Henning, Whigham, Darren, Bharbuiya, Sakil, Papazachos, Zafeirios, Hauser, Christopher, Krzywda, Jakub, and Östberg, Per-Olov
- Subjects
Toolkit ,Datenmanagement ,020206 networking & telecommunications ,02 engineering and technology ,Cloud Computing ,Data management ,Context-aware cloud topology ,Runtime ,Electric network topology ,Cloud services ,0202 electrical engineering, electronic engineering, information engineering ,Tooling ,Evaluation methodology ,020201 artificial intelligence & image processing ,Optimisation ,DDC 004 / Data processing & computer science ,ddc:004 ,Prediction ,Cloud ,Simulation - Abstract
Infrastructure as a Service (IaaS) cloud data centres enable customers to run arbitrary software systems on virtualised infrastructure. In contrast to Software or Platform as a Service approaches, customers do not need to adapt the design of their applications to be cloud-compatible. At the same time, they can benefit from easy scalability and pay-as-you-go models. Customers do not pay for dedicated physical machines. Rather, they are able to request Virtual Machines (VM) with varying characteristics, such as processing speed or memory size. Data centre providers can assign the VMs of multiple customers within their data centre to physical machines. If the VMs are deployed in a manner where the Quality of Service (QoS) of all customers is upheld, the data centre provider benefits from drastically larger economy of scale when compared to traditional one-customer-per-server hosting. The efficient utilisation of the underlying physical infrastructure including management and topology optimisation determines the costs and ultimately the business success for data centre operators. The CACTOS project develops an integrated solution for runtime monitoring, optimisation and prediction. The solution supports data centre providers in data centre management and planning. CACTOS consists of two toolkits: • The CACTOS Runtime Toolkit facilitates automated resource scheduling and optimisation for IaaS data centres. • The CACTOS Prediction Toolkit enables what-if analyses including effects caused by automated resource optimisation based on existing or planned data centre topologies. The CACTOS Runtime Toolkit collects data on a distributed data centre as input to scheduling and optimisation algorithms. Up-to-date load and topology measurements are essential for runtime monitoring, data collection and optimisation. The monitoring and data collection infrastructure introduces unavoidable load in the data centre. 
The benefit gained by using an automated monitoring and optimisation framework such as the CACTOS Runtime Toolkit strongly depends on the amount of this additional load. The CACTOS Prediction Toolkit requires resources to simulate the behaviour of a data centre. The size and complexity of the simulated data centre influences the feasibility of such a simulative analysis. If the simulative analysis takes only a brief amount of time, the data centre planner can quickly account for the results of the simulation and adjust their plans accordingly. This document presents an evaluation methodology for the CACTOS Toolkits as established in (D5.1 Model Integration Method and Supporting Tooling) and (D5.2.1 CACTOS Toolkit Version 1). The evaluation focuses on the performance and scalability of the CACTOS Runtime Toolkit. The evaluation approach is driven by the use-case-specific requirements of the scientific computing use case of the University of Ulm (cf. (D7.3.1 Validation Goals and Metrics), (D7.4.1 Validation and Result Analysis)) and Flexiant’s business analytics IaaS hosting use case. For an overview of the use cases, please refer to (D7.1 Scenario Requirements on Context-Aware Topology Optimisation and Simulation) and (D7.4.1 Validation and Result Analysis). The application of the evaluation methodology presented in this document will be outlined in (D5.5 Performance Evaluation of the CACTOS Toolkit on a Small Cloud Testbed). The use case brought into the project by PlayGen will be included in this evaluation. This document closely relates to the documents (D7.3.1 Validation Goals and Metrics) and (D7.4.1 Validation and Result Analysis). These two documents outline goals and results of a practical validation of the CACTOS Runtime Toolkit against the specific goals of each use case. Their focus is on an evaluation in small-scale testbeds and on use-case-specific benefit analyses. 
This document outlines an evaluation methodology that is concerned with the applicability of CACTOS to different testbeds with respect to the performance of the CACTOS tools.
- Published
- 2017
43. Final optimization model: project deliverable D3.4
- Author
-
Ali-Eldin, Ahmed, Krzywda, Jakub, Lakew, Ewnetu Bayuh, Sedeghat, Mina, Domaschka, Jörg, and Östberg, Per-Olov
- Subjects
Context-aware cloud topology ,Electric network topology ,Cactos Projekt ,Cloud services ,Cloud computing ,Datenmanagement ,DDC 004 / Data processing & computer science ,ddc:004 ,Cloud ,Data management - Abstract
This deliverable describes the final version of the optimization model and algorithms implemented in CactoOpt. The model and algorithms include a description of the implemented autoscaling algorithms, their integration with the CACTOS toolkits, and related performance results. In addition, the document describes research results obtained within CACTOS. There are five optimization capabilities of CactoOpt that can be performed on the logical (software) level of data center management: initial placement of virtual machines, migration of virtual machines, shutdown of physical machines for energy savings, horizontal scaling, and vertical scaling. Using these actuators, CactoOpt optimizes the power, performance, and cost tradeoffs of applications running on CACTOS-enabled datacenters. The actuators enable CactoOpt to optimize for a wide range of scenarios, including consolidation and load balancing. This document elaborates on the advances within CactoOpt since D3.3, including improvements in the optimization models and a thorough description of the new vertical scaling algorithms, horizontal scaling algorithms, fault-tolerant scheduling algorithms, and power capping and management, as well as the interplay of all optimization capabilities.
- Published
- 2017
44. CactoSim simulation framework initial prototype: project deliverable D6.1
- Author
-
Svorobej, Sergej, Byrne, James, Byrne, Peter J., Groenda, Henning, Stier, Christian, Domaschka, Jörg, Wesner, Stefan, Krzywda, Jakub, and Östberg, Per-Olov
- Subjects
Framework ,Datenmanagement ,Prototype ,Data management ,Context-aware cloud topology ,Electric network topology ,Cloud services ,Cloud computing ,Optimisation ,DDC 004 / Data processing & computer science ,ddc:004 ,Cloud ,Simulation - Abstract
This deliverable provides supporting documentation for the official deliverable D6.1, the initial release of the CactoSim simulation framework. It presents the reader with the scope of the deliverable and the initial requirements and architectural design for CactoSim. Updated requirements are given, and the foundations that CactoSim is built upon are described. A description of the architecture of the CactoSim V1.0 release is given, leading into a description of the graphical user interface through which users interact with the tool. Provisioning is described, as well as licensing information. Finally, a feature description is given for CactoSim, with an overview of planned future releases.
- Published
- 2017
45. Preliminary results from optimisation models validation and experimentation: project deliverable D6.2
- Author
-
Svorobej, Sergej, Byrne, James, Castañé, Gabriel González, Krzywda, Jakub, Groenda, Henning, Stier, Christian, Domaschka, Jörg, Ahir, Mayur, Byrne, Peter J., and Östberg, Per-Olov
- Subjects
Analytics ,Datenmanagement ,Cloud Computing ,Data management ,Context-aware cloud topology ,Electric network topology ,Validation ,Cloud services ,Optimisation ,ddc:004 ,DDC 004 / Data processing & computer science ,Cloud ,Experimentation ,Simulation ,Model - Abstract
Since the arrival of cloud computing, a significant amount of research has been and continues to be carried out towards the creation of efficient optimisation strategies for meeting optimisation goals such as energy efficiency, resource consolidation, or performance improvement within virtualised data centres. However, investigating whether specific optimisation algorithms can achieve the desired function in a production environment, and how well they operate, are quite complex tasks. Untested optimisation rules typically cannot be deployed directly in the production system, instead requiring manual test-bed experiments. This technique can be prohibitively costly and time-consuming, and cannot always account for scale and other constraints. This work presents a design-time optimisation evaluation solution based on discrete event simulation for cloud computing. By using a simulation toolkit (CactoSim) coupled with a runtime optimisation toolkit (CactoOpt), a cloud architect is able to create a direct replica model of the data centre production environment and then run simulations that take optimisation strategies into account. Results produced by such simulations can be used to estimate optimisation algorithm performance under various conditions. In order to test the CactoSim and CactoOpt integration concept, a validation process has been performed on two different scenarios. The first scenario investigates the performance of the VM placement algorithm within a simulated testbed when admitting new VMs into the system. The second scenario analyses the impact of a consolidation optimisation strategy on resource utilisation, with the objective of freeing up nodes towards the goal of energy saving. This deliverable represents the first of two iterative pieces of work.
- Published
- 2017
- Full Text
- View/download PDF
46. Predictive cloud application model: project deliverable D3.2
- Author
-
Ali-Eldin, Ahmed, Östberg, Per-Olov, Krzywda, Jakub, Hauser, Christopher, Domaschka, Jörg, and Groenda, Henning
- Subjects
020208 electrical & electronic engineering ,Datenmanagement ,02 engineering and technology ,Prediction models ,Context-aware cloud topology ,Cactos Projekt ,Cloud services ,Electronic network topology ,0202 electrical engineering, electronic engineering, information engineering ,Cloud computing ,ddc:004 ,DDC 004 / Data processing & computer science ,Cloud - Abstract
This document outlines a framework for the cloud workload and application models used in CactoOpt, the CACTOS infrastructure optimisation tool, and presents initial prototypes for cloud application behaviour models. The purpose of this deliverable is to demonstrate some of the prediction models built for different cloud workloads, and illustrate how they are integrated with the application and component models used in infrastructure and workload deployment optimization. For prediction modelling we give special focus to cloud application user behaviour modelling, including, e.g., workload burstiness and request arrival pattern modelling. To place this work in context, we also present a framework for application and infrastructure modelling focused on translation of workload and application behaviour to infrastructure load.
- Published
- 2017
- Full Text
- View/download PDF
47. CACTOS toolkit version 1: project deliverable D5.2.1
- Author
-
Groenda, Henning, Stier, Christian, Krzywda, Jakub, Byrne, James, Svorobej, Sergej, Papazachos, Zafeirios, Sheridan, Craig, Whigham, Darren, and Östberg, Per-Olov
- Subjects
Analytics ,Toolkit ,Datenmanagement ,Data management ,Context-aware cloud topology ,Electric network topology ,Cloud services ,Cloud computing ,Optimisation ,ddc:004 ,DDC 004 / Data processing & computer science ,Cloud ,Simulation - Abstract
In Infrastructure as a Service (IaaS) cloud data centres, customers can run their software on the virtualized infrastructure of a data centre. They benefit from easy scalability and pay-as-you-go payment models and are able to request Virtual Machines (VM) with varying properties, such as processing speed or memory size. Data centre providers benefit from consolidation and economies of scale if several VMs are deployed on the same physical resources without Quality of Service (QoS) conflicts, e.g. because VMs often idle and rarely use all available resources. The efficient utilisation of the underlying physical infrastructure, including management and topology optimisation, determines the costs and ultimately the business success for data centre operators. The CACTOS project develops an integrated solution for runtime monitoring, optimisation and prediction. The solution supports data centre providers in managing and planning data centres. CACTOS consists of two toolkits: • The CACTOS Runtime Toolkit facilitates automated resource planning and optimisation for Infrastructure as a Service (IaaS) data centres. • The CACTOS Prediction Toolkit enables what-if analyses including effects caused by automated resource optimisation based on existing or planned data centre topologies. This document provides an overview of both toolkits and their interactions in the completed first iteration step. The focus of this deliverable is on describing the CACTOS Runtime Toolkit, but it was extended to give a holistic view and cover the CACTOS Prediction Toolkit as well. The CACTOS Runtime Toolkit consists of independent tools for cloud infrastructure analytics and optimisation. This document describes the purpose and features of the tools as well as the base technology utilised and the interfaces provided. The analytics-oriented tool, CactoScale, already provides automated extraction of central infrastructure information and monitoring of a running data centre. 
The optimisation-oriented tool, CactoOpt, can perform optimisation operations on the basis of the extracted information. However, the execution of optimisation operations on cloud middleware requires manual effort. This document describes the provisioning of both toolkits within a data centre and enables testing and running the approach in one's own data centre. Exemplary use cases show the applicability of the toolkit and how important tasks are realized in it. This document delineates how the tools that were developed as part of the individual deliverables for CactoOpt (D3.1 Prototype Optimization Model), CactoScale (D4.1 Data Collection Framework) and CactoSim (D6.1 CactoSim Simulation Framework Initial Prototype) are integrated into the CACTOS Runtime Toolkit and the CACTOS Prediction Toolkit. Based upon the integration implementation presented in this document, (D5.1 Model Integration Method and Supporting Tooling) will outline the integration methodology that is applied in the toolkits discussed in this deliverable. Exemplary use cases presented in this document were motivated by one of the CACTOS testbeds outlined in (D7.2.1 Physical Testbed). The current version of the CACTOS Runtime Toolkit requires some manual interaction by the data centre operator to realise the optimisations. As part of the deployment of the CACTOS Runtime Toolkit in a small-scale testbed (D5.3 Operational Small Scale Cloud Testbed Managed by the CACTOS Toolkit) the integration will be fully automated. In line with the effort of promoting and enabling continued development of the CACTOS toolkits, we released both toolkits under the licensing terms of the Eclipse Public License Version 1. Deliverable (D5.1 Model Integration Method and Supporting Tooling) will contain an in-depth evaluation of different licensing models and the rationale for opting for the proposed licensing model.
- Published
- 2017
- Full Text
- View/download PDF
48. Prototype optimisation model: project deliverable D3.1
- Author
-
Krzywda, Jakub, Ali-Eldin, Ahmed, Östberg, Per-Olov, Groenda, Henning, and Stier, Christian
- Subjects
Analytics ,Datenmanagement ,Cloud Computing ,Prototype ,Data management ,Context-aware cloud topology ,Cactos Projekt ,Cloud services ,Electronic network topology ,Optimisation ,ddc:004 ,DDC 004 / Data processing & computer science ,Cloud ,Simulation - Abstract
This deliverable outlines a first prototype version of the optimisation model used in CactoOpt, the CACTOS infrastructure optimisation tool. The purpose of this deliverable is to demonstrate interfacing of the optimisation model with preliminary characterization templates describing workloads and infrastructures, i.e. model representations of cloud application workloads (describing virtual machine deployment, configuration, and load) and the data centre (hardware) resources they are executed on. This deliverable additionally shows the integration of CactoOpt in the overall architecture of CACTOS and discusses use cases and optimisation capabilities supported by the first CactoOpt prototype. CactoOpt is designed using a sensor-actuator model where the optimisation engine’s view of the surrounding world is captured in a set of infrastructure topology and load models (sensors) and the actions the optimisation engine can use to affect data centre resources (actuators) are represented using an optimisation plan language (describing a set of infrastructure actions recommended to optimise data centre layout and operation). The model does not assume that all recommended optimisation actions are immediately taken, but rather views these as a set of recommendations the optimizer gives to an external party (e.g., a virtualisation middleware integration implementation or a systems administrator) as part of a greater optimisation plan. This document describes CACTOS deliverable D3.1 – a prototype optimisation model designed for use in CactoOpt. 
As described in this document, the CactoOpt tool is one of the three main tools in the CACTOS toolkit and this document is related primarily to two other CACTOS year 1 deliverables: CACTOS deliverable D5.2.1 - CACTOS Toolkit Version 1 (2014) that describes the overall design and architecture of the toolkit, and CACTOS deliverable D5.1 - Model Integration and Supporting Tooling (2014) that details the construction of infrastructure topology and workload models and the integration of the different tools in the toolkit based on these models.
- Published
- 2017
- Full Text
- View/download PDF
49. Extended optimization model: project deliverable D3.3
- Author
-
Krzywda, Jakub, Rezaie, Ali, Papazachos, Zafeirios, Hamilton-Bryce, Ryan, Östberg, Per-Olov, Ali-Eldin, Ahmed, McCollum, Barry, and Domaschka, Jörg
- Subjects
Analytics ,Datenmanagement ,Data management ,Context-aware cloud topology ,Application model ,Electric network topology ,Cactos Projekt ,Cloud computing ,Optimisation ,ddc:004 ,DDC 004 / Data processing & computer science ,Cloud ,Simulation - Abstract
This deliverable describes an enhanced version of the optimization model that features predictive capabilities. The purpose of this deliverable is to demonstrate how the enhanced model and advanced optimization algorithms support the optimization of a data center configuration. The predictive optimization capabilities of CactoOpt mainly support three optimization activities that can be performed on the logical (software) level of data center management: initial placement of virtual machines, migration of virtual machines, and vertical scaling. To deliver against these capabilities, two software components were implemented: the Workload Analysis and Classification Tool (WAC) and the Application Behaviour Predictor. WAC is a tool that enables a cloud provider to deploy multiple auto-scaling algorithms suitable for different workload types. The tool assigns a workload to an auto-scaler based on the type of the workload; for example, some auto-scalers are better for bursty workloads while others are better for workloads with strong patterns. The Application Behaviour Predictor utilizes knowledge of how the workload and the dynamics of the applications change over time to predict the future state of the application for optimization purposes, e.g., how long a task will run before terminating on a given hardware configuration.
- Published
- 2017
- Full Text
- View/download PDF
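The D3.3 abstract above describes WAC assigning each workload to an auto-scaler based on its type, e.g. bursty versus strongly patterned. A crude illustrative sketch of that idea (hypothetical: the threshold, scaler names, and coefficient-of-variation heuristic are all invented, not WAC's actual classifier) might be:

```python
# Hypothetical sketch of workload-to-autoscaler assignment by burstiness.
# Illustrative only; WAC's real classification is more sophisticated.
from statistics import mean, stdev

def assign_autoscaler(request_rates, cv_threshold=0.5):
    """Use the coefficient of variation (stddev/mean) of observed request
    rates as a crude burstiness measure, then pick a scaler type."""
    cv = stdev(request_rates) / mean(request_rates)
    return "reactive-headroom-scaler" if cv > cv_threshold else "predictive-scaler"
```

A steady trace would be routed to a prediction-based scaler, while a highly variable trace would get a reactive scaler that keeps capacity headroom.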
50. Machine Learning Methods for Reliable Resource Provisioning in Edge-Cloud Computing: A Survey.
- Author
-
Le Duc, Thang, Leiva, Rafael García, Casari, Paolo, and Östberg, Per-Olov
- Abstract
Large-scale software systems are currently designed as distributed entities and deployed in cloud data centers. To overcome the limitations inherent to this type of deployment, applications are increasingly being supplemented with components instantiated closer to the edges of networks, a paradigm known as edge computing. The problem of how to efficiently orchestrate combined edge-cloud applications is, however, incompletely understood, and a wide range of techniques for resource and application management are currently in use. This article investigates the problem of reliable resource provisioning in joint edge-cloud environments, and surveys technologies, mechanisms, and methods that can be used to improve the reliability of distributed applications in diverse and heterogeneous network environments. Due to the complexity of the problem, special emphasis is placed on solutions to the characterization, management, and control of complex distributed applications using machine learning approaches. The survey is structured around a decomposition of the reliable resource provisioning problem into three categories of techniques: workload characterization and prediction, component placement and system consolidation, and application elasticity and remediation. Survey results are presented along with a problem-oriented discussion of the state of the art. A summary of identified challenges and an outline of future research directions conclude the article. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
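The survey above lists workload characterization and prediction as one of its three technique categories. A minimal example of that category (simple exponential smoothing of an observed request-rate series; an illustrative baseline, not a method from the surveyed papers) could be:

```python
# Illustrative baseline for workload prediction: one-step-ahead
# exponential smoothing of a request-rate series (hypothetical example).

def ewma_forecast(series, alpha=0.5):
    """Return one-step-ahead forecasts f, where f[t] predicts series[t]:
    f[t+1] = alpha * series[t] + (1 - alpha) * f[t], seeded with series[0]."""
    if not series:
        return []
    forecast = [series[0]]           # seed with the first observation
    for x in series[:-1]:
        forecast.append(alpha * x + (1 - alpha) * forecast[-1])
    return forecast
```

In an elasticity loop, such a forecast would feed a provisioning decision (e.g. scale out when the predicted rate exceeds current capacity); the surveyed ML methods replace this baseline with richer learned models.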