1. A Novel Chinese Word Segmentation Method Utilizing Morphology Information
- Author
Xu Shuona and Zeng Bi-qing
- Subjects
Computer science, business.industry, Text segmentation, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Scale-space segmentation, Context (language use), Pattern recognition, Statistical model, computer.software_genre, Identification (information), ComputingMethodologies_PATTERNRECOGNITION, Segmentation, Artificial intelligence, CRFS, business, computer, Word (computer architecture), Natural language processing
- Abstract
In this paper, we present a novel approach that integrates morphology information into the statistical model for Chinese word segmentation (CWS) and yields higher accuracy than the traditional CRFs-based approach. The improvements are mainly attributed to two aspects. First, the structural information within words is integrated into the CRFs model by annotating the Chinese word corpus with morphology tags, which convey the construction modes of Chinese words. Second, training adopts a joint CRFs model that integrates this structural information with other contextual features: it combines the morphology tag and the word boundary at the same state level, so that word segmentation and morphology tag identification are performed jointly and complement each other. Experimental results show that morphology information is of great use to word segmentation.
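To illustrate what combining the morphology tag and the word boundary "at the same state level" could look like, the following is a minimal sketch, not the authors' implementation. It assumes a conventional B/M/E/S boundary scheme and fuses it with a per-word morphology tag into one joint label per character. The function names `joint_labels` and `decode_words`, the `POSITION-MORPH` label format, and the morphology tag names in the example are all hypothetical; the abstract does not specify the paper's tag set.

```python
from typing import List, Tuple


def joint_labels(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (word, morphology_tag) pairs to per-character joint labels.

    Assumption: boundaries use the common B/M/E/S scheme, and the joint
    label is written as "<boundary>-<morphology_tag>".
    """
    labeled = []
    for word, morph in words:
        if len(word) == 1:
            positions = ["S"]  # single-character word
        else:
            # begin, zero or more middle characters, end
            positions = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        for ch, pos in zip(word, positions):
            labeled.append((ch, f"{pos}-{morph}"))
    return labeled


def decode_words(labeled: List[Tuple[str, str]]) -> List[str]:
    """Recover the segmentation from joint labels by reading the boundary part."""
    words, current = [], ""
    for ch, label in labeled:
        pos = label.split("-", 1)[0]
        current += ch
        if pos in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # flush an unterminated word, if any
        words.append(current)
    return words


# Hypothetical morphology tags: "COORD" (coordinate compound), "MONO" (single morpheme)
example = [("研究", "COORD"), ("的", "MONO")]
labels = joint_labels(example)
segmented = decode_words(labels)
```

Under this encoding, a sequence labeler such as a CRF would predict one joint label per character, so a single decoding pass yields both the segmentation and the morphology tags.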
- Published
- 2012