Sequence Labeling of Chinese Text Based on Bidirectional GRU-CNN-CRF Model
- Author
- Xinyi Zou and Di Liu
- Subjects
- Conditional random field, Word embedding, Computer science, Text segmentation, Context (language use), Convolutional neural network, Sequence labeling, Named-entity recognition, Pattern recognition, Artificial intelligence, Hidden Markov model, Natural language processing, Languages & linguistics
- Abstract
Sequence labeling is the basis for many tasks in natural language processing (NLP). It plays an important role in tasks such as word segmentation, named entity recognition (NER), and part-of-speech (POS) tagging. The current mainstream approach to sequence labeling combines a neural network with a conditional random field (CRF). The most common model is a bidirectional RNN-CRF, which addresses the inability of traditional labeling methods to make good use of context. This paper proposes a Chinese sequence labeling model based on a bidirectional GRU-CNN-CRF architecture, which attends to both local features and contextual relationships and performs better on word segmentation and NER. The corpus provided by Chinese Wikipedia is used as the training data set, and the text is preprocessed with word embeddings. The data are then passed through a three-tier architecture of a bidirectional Gated Recurrent Unit (GRU), a Convolutional Neural Network (CNN), and a CRF to complete the sequence labeling task. Compared with traditional Chinese word segmentation systems, this method is more accurate, and it outperforms the bidirectional GRU-CRF model on NER tasks.
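The abstract describes the pipeline only at a high level. As an illustration, a minimal PyTorch sketch of a bidirectional GRU-CNN-CRF character tagger of this kind is shown below; the layer ordering, dimensions, kernel width, and the use of the pytorch-crf `CRF` layer are assumptions made for the example, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiGRUCNNCRF(nn.Module):
    """Character-level tagger: embedding -> BiGRU -> CNN -> CRF (illustrative sketch)."""

    def __init__(self, vocab_size, num_tags,
                 emb_dim=100, hidden_dim=128, conv_channels=128, kernel_size=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Bidirectional GRU captures left and right context for each character.
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        # 1-D convolution over the GRU outputs extracts local n-gram features.
        self.conv = nn.Conv1d(2 * hidden_dim, conv_channels,
                              kernel_size, padding=kernel_size // 2)
        self.proj = nn.Linear(conv_channels, num_tags)  # per-character tag scores
        # Linear-chain CRF models transitions between adjacent tags (e.g. B/M/E/S).
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, char_ids):
        x = self.embedding(char_ids)             # (batch, seq, emb_dim)
        x, _ = self.gru(x)                       # (batch, seq, 2*hidden_dim)
        x = self.conv(x.transpose(1, 2)).relu()  # (batch, channels, seq)
        return self.proj(x.transpose(1, 2))      # (batch, seq, num_tags)

    def loss(self, char_ids, tags, mask):
        # Negative log-likelihood of the gold tag sequence under the CRF.
        return -self.crf(self._emissions(char_ids), tags, mask=mask, reduction='mean')

    def decode(self, char_ids, mask):
        # Viterbi decoding of the most likely tag sequence for each sentence.
        return self.crf.decode(self._emissions(char_ids), mask=mask)
```

Here `mask` is a boolean tensor marking real characters versus padding, and `decode` returns one Viterbi tag sequence per sentence (e.g. BMES labels for word segmentation).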
- Published
- 2018