1. Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
- Authors
Lu, Yao, Bian, Song, Chen, Lequn, He, Yongjun, Hui, Yulong, Lentz, Matthew, Li, Beibin, Liu, Fei, Li, Jialin, Liu, Qi, Liu, Rui, Liu, Xiaoxuan, Ma, Lin, Rong, Kexin, Wang, Jianguo, Wu, Yingjun, Wu, Yongji, Zhang, Huanchen, Zhang, Minjia, Zhang, Qizhen, Zhou, Tianyi, and Zhuo, Danyang
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Machine Learning
- Abstract
In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges such as escalating costs and demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computing paradigm that harnesses the power of both cloud-native technologies (e.g., multi-tenancy and serverless computing) and advanced machine learning runtimes (e.g., batched LoRA inference). These joint efforts aim to optimize the cost of goods sold (COGS) and improve resource accessibility. The journey of merging these two domains is just beginning, and we hope to stimulate future research and development in this area.
- Published
- 2024