
A switch method of model inference serving oriented to serverless computing.

Authors :
WEN Xin
ZENG Tao
LI Chun-bo
XU Zi-chen
Source :
Computer Engineering & Science / Jisuanji Gongcheng yu Kexue; Jul2024, Vol. 46 Issue 7, p1210-1217, 8p
Publication Year :
2024

Abstract

The development of large-scale models has led to the widespread application of model inference services, and building stable, reliable architectural support for these services has become a focus for cloud service providers. Serverless computing is a cloud computing paradigm with fine-grained resource allocation and a high level of abstraction; its advantages, such as on-demand billing and elastic scaling, can effectively improve the computational efficiency of model inference services. However, because model inference service workflows are multi-stage, a single serverless computing framework cannot ensure that every stage executes optimally. The key problem is therefore how to exploit the performance characteristics of different serverless computing frameworks to switch model inference service workflows online and reduce their overall execution time. This paper studies the problem of switching model inference services across different serverless computing frameworks. First, a pre-trained model is used to construct model inference service functions and to derive the performance characteristics of heterogeneous serverless computing frameworks. Second, machine learning is used to build a binary classification model that combines these performance characteristics, enabling online switching of the model inference service framework. Finally, a test platform is built to generate model inference service workflows and evaluate the performance of the online switching framework prototype. Preliminary experimental results show that, compared with a standalone serverless computing framework, the online switching prototype can reduce the execution time of model inference service workflows by up to 57%.
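The abstract's core mechanism, a binary classifier trained on framework performance characteristics that decides, per workflow stage, which serverless framework should execute next, can be sketched as follows. This is a minimal illustration only: the feature set (payload size, cold-start penalty), the two frameworks "A" and "B", and the use of logistic regression are all assumptions for exposition, not the paper's actual design.

```python
# Hypothetical sketch of the online-switching decision described in the
# abstract: a binary classifier trained on observed stage characteristics
# predicts which of two serverless frameworks will run a stage faster.
# Features, labels, and framework names are illustrative assumptions.

import math
import random


def train_switch_classifier(samples, epochs=200, lr=0.1):
    """Plain logistic regression trained with SGD.

    samples: list of (features, label) pairs, where label 1 means
    framework "B" was faster for that stage and 0 means "A" was faster.
    """
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(label == 1)
            g = p - y                        # gradient of log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def choose_framework(w, b, features):
    """Online switching decision for one workflow stage."""
    z = sum(wi * xi for wi, xi in zip(w, features)) + b
    return "B" if z > 0 else "A"


# Synthetic training data standing in for profiled performance
# characteristics: features = (normalized payload size, cold-start
# penalty). Here we assume framework B wins on large payloads.
random.seed(0)
data = []
for _ in range(200):
    size, cold = random.random(), random.random()
    data.append(((size, cold), 1 if size > 0.5 else 0))

w, b = train_switch_classifier(data)
print(choose_framework(w, b, (0.9, 0.2)))  # large payload stage
print(choose_framework(w, b, (0.1, 0.2)))  # small payload stage
```

In an actual deployment, the labels would come from profiling the same inference stages on both frameworks, and the decision would be made at each stage boundary of the workflow, which is what allows a hybrid schedule to beat either framework run standalone.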

Details

Language :
Chinese
ISSN :
1007-130X
Volume :
46
Issue :
7
Database :
Complementary Index
Journal :
Computer Engineering & Science / Jisuanji Gongcheng yu Kexue
Publication Type :
Academic Journal
Accession number :
178753598
Full Text :
https://doi.org/10.3969/j.issn.1007-130X.2024.07.009