
Architecture and key technologies of a deep learning platform (深度学习平台体系架构及其关键技术).

Authors :
束柬
陈剑波
Source :
Application Research of Computers / Jisuanji Yingyong Yanjiu. Nov2023, Vol. 40 Issue 11, p3353-3357. 5p.
Publication Year :
2023

Abstract

For AI model production and training, the traditional script-based approach built on standalone physical servers or clusters has several problems: training and inference are separated, resource utilization is insufficient, computing environments are difficult to migrate, and the training process is lengthy. This paper proposed a platform architecture for deep learning model training, divided into four layers: a data platform layer, a computing platform layer, a training suite layer, and a management platform layer. First, it proposed an integrated framework that masks differences between network structures and optimizes computation graphs. Second, it studied an adaptive resource matching mechanism, based on GPU state, that reduces communication costs. In addition, it improved resource utilization with a heuristic label-matching scheduling algorithm. Moreover, tenant management and disaster recovery mechanisms ensured the security and reliability of the platform. Finally, it validated the usability, security, reliability, and scalability of the architecture on a simulation platform. Building such a deep learning platform is expected to accelerate the deployment of AI in production and promote the development of AI technology and its ecosystem. [ABSTRACT FROM AUTHOR]
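The abstract does not specify how its heuristic label-matching scheduler works. As a rough illustration only, the sketch below shows one common way such a scheduler can be built; it is not the authors' algorithm. It assumes hypothetical `Node` and `Job` types, label strings such as `"gpu=v100"`, and a hypothetical `schedule` function. Each job is kept on a single node (one simple way to avoid cross-node communication), only nodes whose labels include all of the job's required labels are considered, larger jobs are placed first, and the best-fit node (the one left with the fewest idle GPUs) is chosen to limit fragmentation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node model: a name, a set of labels, and its idle GPU count.
    name: str
    labels: frozenset
    free_gpus: int


@dataclass
class Job:
    # Hypothetical job model: labels the target node must carry, and GPUs needed.
    name: str
    required_labels: frozenset
    gpus: int


def schedule(jobs, nodes):
    """Assign each job to one node; return {job_name: node_name or None}.

    Assumed heuristic (illustrative, not the paper's method):
    1. Place larger jobs first (first-fit-decreasing style ordering).
    2. Keep only nodes that carry every required label and have enough
       free GPUs, so the whole job fits on a single node.
    3. Pick the best fit: the node left with the fewest idle GPUs
       (ties broken by node name for determinism).
    Node.free_gpus is decremented in place as jobs are placed.
    """
    placement = {}
    for job in sorted(jobs, key=lambda j: j.gpus, reverse=True):
        candidates = [
            n for n in nodes
            if job.required_labels <= n.labels and n.free_gpus >= job.gpus
        ]
        if not candidates:
            placement[job.name] = None  # no node fits; the job stays queued
            continue
        best = min(candidates, key=lambda n: (n.free_gpus - job.gpus, n.name))
        best.free_gpus -= job.gpus
        placement[job.name] = best.name
    return placement


if __name__ == "__main__":
    nodes = [
        Node("node-a", frozenset({"gpu=v100"}), 8),
        Node("node-b", frozenset({"gpu=v100"}), 4),
        Node("node-c", frozenset({"gpu=a100"}), 8),
    ]
    jobs = [
        Job("train-1", frozenset({"gpu=v100"}), 4),
        Job("train-2", frozenset({"gpu=a100"}), 8),
        Job("train-3", frozenset({"gpu=v100"}), 6),
        Job("infer-1", frozenset(), 2),
    ]
    print(schedule(jobs, nodes))
    # prints {'train-2': 'node-c', 'train-3': 'node-a', 'train-1': 'node-b', 'infer-1': 'node-a'}
```

The ordering matters in this example: placing `train-1` first on `node-a` in input order would leave 4 GPUs on each V100 node, and the 6-GPU `train-3` could then not be placed at all. Sorting by demand and choosing the best fit leaves all 24 GPUs in use.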

Details

Language :
Chinese
ISSN :
10013695
Volume :
40
Issue :
11
Database :
Academic Search Index
Journal :
Application Research of Computers / Jisuanji Yingyong Yanjiu
Publication Type :
Academic Journal
Accession number :
173767859
Full Text :
https://doi.org/10.19734/j.issn.1001-3695.2023.03.0111