
AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation

Authors:
Dayiheng Liu
Jiancheng Lv
Huishuang Tian
Kexin Yang
Source:
IJCNN
Publication Year:
2021
Publisher:
IEEE, 2021.

Abstract

Ancient Chinese is the essence of Chinese culture. Several natural language processing tasks exist in the ancient Chinese domain, such as ancient-modern Chinese translation, poem generation, and couplet generation. Previous studies usually use supervised models, which rely heavily on parallel data. However, large-scale parallel data for ancient Chinese is difficult to obtain. In order to make full use of the more easily available monolingual ancient Chinese corpora, we release AnchiBERT, a pre-trained language model based on the BERT architecture and trained on large-scale ancient Chinese corpora. We evaluate AnchiBERT on both language understanding and generation tasks, including poem classification, ancient-modern Chinese translation, poem generation, and couplet generation. The experimental results show that AnchiBERT outperforms BERT as well as non-pretrained models and achieves state-of-the-art results in all cases.
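Because AnchiBERT follows the standard BERT architecture, fine-tuning it on an understanding task such as poem classification works like fine-tuning any BERT checkpoint. The sketch below illustrates this with Hugging Face Transformers; the checkpoint path `path/to/anchibert` and the number of labels are assumptions for illustration, not details given in this record.

```python
# Minimal sketch: loading a BERT-style checkpoint for poem classification.
# The model path and label count are hypothetical placeholders.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

MODEL_NAME = "path/to/anchibert"  # assumed local path or hub ID for the released checkpoint

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=4)
model.eval()

# Classify a line of classical Chinese poetry into one of the assumed classes.
inputs = tokenizer("床前明月光", return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```

For the generation tasks reported in the paper (translation, poem, and couplet generation), such an encoder checkpoint would typically initialize the encoder of a sequence-to-sequence model rather than be used directly, but the exact setup is described in the paper itself rather than in this abstract.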

Details

Database:
OpenAIRE
Journal:
2021 International Joint Conference on Neural Networks (IJCNN)
Accession number:
edsair.doi...........962b5b3423bff756bfff3e570cb5d85e
Full Text:
https://doi.org/10.1109/ijcnn52387.2021.9534342