Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing

Authors :: Chenghao Lyu
Qi Fan
Fei Song
Arnab Sinha
Yanlei Diao
Wei Chen
Li Ma
Yihui Feng
Yaliang Li
Kai Zeng
Jingren Zhou
University of Massachusetts [Amherst] (UMass Amherst)
University of Massachusetts System (UMASS)
Rich Data Analytics at Cloud Scale (CEDAR)
Laboratoire d'informatique de l'École polytechnique [Palaiseau] (LIX)
École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS)-École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
École polytechnique (X)
Alibaba Group [Hangzhou]
Source :: VLDB 2022 - 48th International Conference on Very Large Databases, Sep 2022, Sydney, Australia
Publication Year :: 2022
Abstract: Big data processing at production scale presents a highly complex environment for resource optimization (RO), a problem crucial for meeting the performance goals and budgetary constraints of analytical users. The RO problem is challenging because it involves a set of decisions (the partition count, the placement of parallel instances on machines, and the resource allocation to each instance), requires multi-objective optimization (MOO), and is compounded by the scale and complexity of big data systems while having to meet stringent time constraints for scheduling. This paper presents a MaxCompute-based integrated system that supports multi-objective resource optimization via fine-grained instance-level modeling and optimization. We propose a new architecture that breaks RO into a series of simpler problems, new fine-grained predictive models, and novel optimization methods that exploit these models to make effective instance-level RO decisions well under a second. Evaluation using production workloads shows that our new RO system can reduce latency by 37-72% and cost by 43-78% at the same time, compared to the current optimizer and scheduler, while running in 0.02-0.23s.

Subjects :: FOS: Computer and information sciences
Computer Science - Databases
Computer Science - Distributed, Parallel, and Cluster Computing
General Engineering
[SCCO.COMP]Cognitive science/Computer science
Databases (cs.DB)
Distributed, Parallel, and Cluster Computing (cs.DC)

Details

Language :: English
Database :: OpenAIRE
Journal :: VLDB 2022 - 48th International Conference on Very Large Databases, Sep 2022, Sydney, Australia
Accession number :: edsair.doi.dedup.....3dbc5b8804cff82c864378b1143294c1
