
Performance characterization of multi-container deployment schemes for online learning inference

Authors :
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Barcelona Supercomputing Center
Universitat Politècnica de Catalunya. CROMAI - Computing Resources Orchestration and Management for AI
Liu, Peini
Guitart Fernández, Jordi
Taherkordi, Amir
Publication Year :
2023

Abstract

Online machine learning (ML) inference services provide users with an interactive way to request predictions in real time. To meet the notable computational requirements of such services, they are increasingly being deployed in the Cloud. In this context, the efficient provisioning and optimization of ML inference services in the Cloud is critical to achieve the required performance and serve the dynamic query load from end-users. Existing provisioning solutions focus on framework parameter tuning and infrastructure resource scaling, without considering deployments based on containerization technologies. The latter promise reproducibility and portability features for ML inference services. There is limited knowledge about the impact of distinct container-level deployment schemes on the performance of online ML inference services, particularly on how to exploit multi-container deployments and their relation with processor and memory affinity. In light of this, in this paper we investigate experimentally the containerization of ML inference services and analyze the performance of multi-container deployments that partition the threads belonging to an online learning application into multiple containers on each node. This paper shares the findings and lessons learned from running realistic client workload patterns against an image classification model across numerous deployment configurations, including, in particular, the impact of container granularity and its potential to exploit processor and memory affinity. Our results indicate that fine-grained multi-container deployments and affinity are useful for improving performance (both throughput and latency).
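As a rough illustration of the deployment scheme the abstract describes, the sketch below partitions one inference service into two containers, each pinned to a distinct CPU range and its local NUMA memory node. This is a hypothetical example, not the paper's actual setup: the image name, thread counts, ports, and CPU/NUMA ranges are illustrative assumptions for a two-socket node.

```shell
# Hypothetical sketch: fine-grained multi-container deployment with
# processor and memory affinity (not the paper's exact configuration).
# Each container gets half the service's threads, pinned to one socket
# (--cpuset-cpus) and its local NUMA memory node (--cpuset-mems).
docker run -d --name infer-0 \
  --cpuset-cpus="0-7"  --cpuset-mems="0" \
  -e OMP_NUM_THREADS=8 -p 8500:8500 ml-inference:latest   # assumed image

docker run -d --name infer-1 \
  --cpuset-cpus="8-15" --cpuset-mems="1" \
  -e OMP_NUM_THREADS=8 -p 8501:8500 ml-inference:latest
```

A load balancer in front of the two endpoints would then spread client queries across the containers; the affinity flags avoid cross-socket memory traffic, which is one plausible source of the throughput and latency gains the abstract reports.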
In particular, our experiments on single-node and four-node clusters show up to 69% and 87% performance improvement compared to the single-container deployment, respectively.

This work was partially supported by Lenovo as part of the Lenovo-BSC collaboration agreement, by the Spanish Government under contract PID2019-107255GB-C22, and by the Generalitat de Catalunya under contract 2021-SGR-00478 and under grant 2020 FI-B 00257.

Peer Reviewed

Postprint (author's final draft)

Details

Database :
OAIster
Notes :
11 p., application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1409474281
Document Type :
Electronic Resource