
AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation

Authors:
Dayiheng Liu
Jiancheng Lv
Huishuang Tian
Kexin Yang
Source:
IJCNN
Publication Year:
2021
Publisher:
IEEE, 2021.

Abstract

Ancient Chinese is the essence of Chinese culture. Several natural language processing tasks exist in the ancient Chinese domain, such as ancient-modern Chinese translation, poem generation, and couplet generation. Previous studies usually use supervised models, which rely heavily on parallel data. However, large-scale parallel data for ancient Chinese is difficult to obtain. In order to make full use of the more easily available monolingual ancient Chinese corpora, we release AnchiBERT, a pre-trained language model based on the BERT architecture and trained on large-scale ancient Chinese corpora. We evaluate AnchiBERT on both language understanding and generation tasks, including poem classification, ancient-modern Chinese translation, poem generation, and couplet generation. The experimental results show that AnchiBERT outperforms BERT as well as non-pretrained models and achieves state-of-the-art results in all cases.
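Because AnchiBERT follows the standard BERT architecture, fine-tuning it on an understanding task such as poem classification works like fine-tuning any BERT checkpoint. The sketch below illustrates this with Hugging Face Transformers; the checkpoint path `path/to/anchibert` and the number of labels are assumptions for illustration, not details given in this record.

```python
# Minimal sketch: loading a BERT-style checkpoint for poem classification.
# The model path and label count are hypothetical placeholders.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

MODEL_NAME = "path/to/anchibert"  # assumed local path or hub ID for the released checkpoint

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=4)
model.eval()

# Classify a line of classical Chinese poetry into one of the assumed classes.
inputs = tokenizer("床前明月光", return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```

For the generation tasks reported in the paper (translation, poem, and couplet generation), such an encoder checkpoint would typically initialize the encoder of a sequence-to-sequence model rather than be used directly, but the exact setup is described in the paper itself rather than in this abstract.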

Details

Database:
OpenAIRE
Journal:
2021 International Joint Conference on Neural Networks (IJCNN)
Accession number:
edsair.doi...........962b5b3423bff756bfff3e570cb5d85e
Full Text:
https://doi.org/10.1109/ijcnn52387.2021.9534342