中文分词模型词典融入方法比较 (A comparison of lexicon integration methods for Chinese word segmentation models).
- Source :
- Application Research of Computers / Jisuanji Yingyong Yanjiu. Jan 2019, Vol. 36, Issue 1, p8-17. 4p.
- Publication Year :
- 2019
Abstract
- Currently the mainstream methods for Chinese word segmentation exploit statistical machine learning models. These methods usually require manually annotated segmented sentences as training corpus, yet they have neglected the large-scale annotated lexicon resources built previously. These resources can be highly valuable when cross-domain evaluation is conducted, as gold-standard sentence-level annotations are rare. Recently, the integration of lexicon information into word segmentation models has gained increasing interest. As a whole, the integration methods can be classified into two categories: one is based on character-based models that cast the word segmentation problem as sequence labeling, and the other is based on word-based models that use beam search to decode. This paper compared these two models and combined them. Experimental results on benchmark datasets show that lexicon information can be more fully exploited after combination, and the combined model achieves better performance in both in-domain and cross-domain settings. [ABSTRACT FROM AUTHOR]
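To make the character-based view mentioned in the abstract concrete, here is a minimal sketch (not the authors' code) of how casting segmentation as sequence labeling works: each character receives a BMES tag (B = begin, M = middle, E = end, S = single-character word), and a segmentation is recovered by closing a word at every E or S tag. The function name and tag scheme are illustrative assumptions, not taken from the paper.

```python
def tags_to_words(chars, tags):
    """Recover a word segmentation from per-character BMES tags.

    chars -- an iterable of characters (e.g. a Chinese sentence)
    tags  -- a parallel sequence of tags from {"B", "M", "E", "S"}
    """
    words, current = [], []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag in ("E", "S"):          # a word boundary closes here
            words.append("".join(current))
            current = []
    if current:                         # tolerate a dangling B/M at the end
        words.append("".join(current))
    return words


# Example: "中文分词" tagged as two two-character words.
print(tags_to_words("中文分词", ["B", "E", "B", "E"]))  # ['中文', '分词']
```

A word-based model, by contrast, would score whole candidate words (e.g. via lexicon-match features) and search over segmentations directly with beam search rather than tagging characters one by one.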
Details
- Language :
- Chinese
- ISSN :
- 1001-3695
- Volume :
- 36
- Issue :
- 1
- Database :
- Academic Search Index
- Journal :
- Application Research of Computers / Jisuanji Yingyong Yanjiu
- Publication Type :
- Academic Journal
- Accession number :
- 135502926
- Full Text :
- https://doi.org/10.19734/j.issn.1001-3695.2017.05.0643