
PanGu-$\alpha$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

Authors :
Zeng, Wei
Ren, Xiaozhe
Su, Teng
Wang, Hui
Liao, Yi
Wang, Zhiwei
Jiang, Xin
Yang, ZhenZhang
Wang, Kaisheng
Zhang, Xiaoda
Li, Chen
Gong, Ziyan
Yao, Yifan
Huang, Xinjing
Wang, Jun
Yu, Jianfeng
Guo, Qi
Yu, Yue
Zhang, Yan
Wang, Jin
Tao, Hengtao
Yan, Dasen
Yi, Zexuan
Peng, Fang
Jiang, Fangqing
Zhang, Han
Deng, Lingfeng
Zhang, Yehong
Lin, Zhe
Zhang, Chao
Zhang, Shaojie
Guo, Mingyue
Gu, Shanzhi
Fan, Gaojun
Wang, Yaowei
Jin, Xuefeng
Liu, Qun
Tian, Yonghong
Publication Year :
2021

Abstract

Large-scale Pretrained Language Models (PLMs) have become the new paradigm for Natural Language Processing (NLP). PLMs with hundreds of billions of parameters, such as GPT-3, have demonstrated strong performance on natural language understanding and generation with few-shot in-context learning. In this work, we present our practice on training large-scale autoregressive language models named PanGu-$\alpha$, with up to 200 billion parameters. PanGu-$\alpha$ is developed under the MindSpore framework and trained on a cluster of 2048 Ascend 910 AI processors. The training parallelism strategy is implemented based on MindSpore Auto-parallel, which composes five parallelism dimensions to scale the training task to 2048 processors efficiently: data parallelism, op-level model parallelism, pipeline model parallelism, optimizer model parallelism, and rematerialization. To enhance the generalization ability of PanGu-$\alpha$, we collect 1.1TB of high-quality Chinese data from a wide range of domains to pretrain the model. We empirically test the generation ability of PanGu-$\alpha$ in various scenarios including text summarization, question answering, and dialogue generation. Moreover, we investigate the effect of model scale on few-shot performance across a broad range of Chinese NLP tasks. The experimental results demonstrate the superior capabilities of PanGu-$\alpha$ in performing various tasks under few-shot or zero-shot settings.

Comment: The technical report for PanGu-$\alpha$
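
The abstract names five parallelism dimensions composed by MindSpore Auto-parallel. The sketch below illustrates, under stated assumptions, how such dimensions are typically combined through MindSpore's auto-parallel context, operator sharding, and cell recomputation. It is not the PanGu-$\alpha$ training configuration: the pipeline depth, shard strategy, and layer sizes are placeholders, and a real multi-device job would also call `init()` from `mindspore.communication` and launch one process per device.

```python
# Illustrative sketch only (assumed configuration, not the PanGu-alpha setup):
# composing data, op-level, pipeline, and optimizer parallelism plus
# rematerialization with MindSpore's auto-parallel facilities.
import numpy as np
import mindspore as ms
from mindspore import context, nn, ops

# Graph mode on Ascend processors, as used for large-scale training.
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

# One context call drives the hybrid strategy:
# - device_num: total Ascend 910 processors (2048 in the abstract)
# - pipeline_stages: pipeline model parallelism (placeholder depth)
# - enable_parallel_optimizer: optimizer model parallelism
# - gradients_mean: average gradients across data-parallel groups
context.set_auto_parallel_context(
    parallel_mode="semi_auto_parallel",
    device_num=2048,
    pipeline_stages=16,
    enable_parallel_optimizer=True,
    gradients_mean=True,
)

class ShardedFeedForward(nn.Cell):
    """Toy feed-forward block showing op-level sharding and rematerialization."""
    def __init__(self, hidden, ffn, model_parallel=8):
        super().__init__()
        self.w1 = ms.Parameter(ms.Tensor(np.random.randn(hidden, ffn), ms.float16))
        self.matmul = ops.MatMul()
        # Op-level model parallelism: split the weight's output dimension
        # across `model_parallel` devices via an explicit shard strategy.
        self.matmul.shard(((1, 1), (1, model_parallel)))

    def construct(self, x):
        return self.matmul(x, self.w1)

block = ShardedFeedForward(hidden=1024, ffn=4096)
# Rematerialization: recompute this cell's activations in the backward pass
# instead of storing them, trading extra compute for lower memory.
block.recompute()
```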

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2104.12369
Document Type :
Working Paper