A Two-Level Place Names Identification Based on the N-Shortest Path and CRFs
- Source :
- 2009 International Conference on Information Management, Innovation Management and Industrial Engineering.
- Publication Year :
- 2009
- Publisher :
- IEEE, 2009.
Abstract
- This paper presents a two-level place name identification method based on the N-shortest path algorithm and Conditional Random Fields (CRFs), aimed at solving the low recall rate of Chinese place name identification. First, at the low level, a rough segmentation method based on N-shortest paths is used to improve the recall rate of Chinese place name identification. Second, the rough segmentation result is passed to the high level as one of the features for high-level place name identification. The high-level CRF model tags the text using the feature passed up from the low level together with single and complex features of place name words. Adding the complex features helps exploit context information and improves the accuracy of place name identification, and the tagging results are finally combined with rules to identify place names. This two-level model ensures a high recall rate while improving accuracy. In the experiments, the mature People’s Daily corpus of January 1998, which contains 3,128 place names (excluding duplicates), was used as training data, and articles randomly extracted from People’s Daily in 2003 were used for testing. The experimental results achieve a high recall rate, demonstrating that the method is practical and effective.
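The abstract does not give implementation details, so the following is only a minimal sketch of the low-level idea under stated assumptions: a word lattice is built from a lexicon, the N cheapest segmentations are kept (rather than just the single best) to favour recall, and the candidates are turned into per-character features a CRF could consume. The lexicon, its costs, the `PLACE_NAMES` list, `MAX_WORD_LEN`, `SINGLE_CHAR_COST`, the BMES tag encoding, and the function names are all hypothetical. The search is a simplified k-best variant that keeps the N cheapest paths, not necessarily the authors' exact N-shortest path formulation.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: word -> cost (a real system might use -log P(word)).
# Characters not in the lexicon are allowed as single-character words at
# SINGLE_CHAR_COST so every sentence has at least one path through the lattice.
LEXICON: Dict[str, float] = {
    "南京": 1.0, "南京市": 1.0, "市长": 1.0,
    "长江": 1.0, "大桥": 1.0, "江大桥": 1.0,
}
PLACE_NAMES: Set[str] = {"南京", "南京市", "长江"}  # hypothetical place-name list
SINGLE_CHAR_COST = 2.0
MAX_WORD_LEN = 4

Path = Tuple[float, List[str]]


def n_shortest_segmentations(text: str, n: int) -> List[Path]:
    """Return up to n cheapest segmentations of text (simplified k-best DP)."""
    # best[j] holds up to n (cost, words) pairs covering the prefix text[:j].
    best: List[List[Path]] = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for j in range(1, len(text) + 1):
        candidates: List[Path] = []
        for i in range(max(0, j - MAX_WORD_LEN), j):
            word = text[i:j]
            if word in LEXICON:
                cost = LEXICON[word]
            elif j - i == 1:
                cost = SINGLE_CHAR_COST
            else:
                continue  # not an edge in the lattice
            for prev_cost, words in best[i]:
                candidates.append((prev_cost + cost, words + [word]))
        # Keep only the n cheapest partial paths ending at position j;
        # ties are broken by the word sequence so the output is deterministic.
        best[j] = heapq.nsmallest(n, candidates, key=lambda p: (p[0], p[1]))
    return best[len(text)]


def char_features(text: str, paths: List[Path]) -> List[Dict[str, str]]:
    """Per-character features: BMES tag from the cheapest path, plus a flag
    marking characters covered by a candidate place name in ANY kept path."""
    tags: List[str] = []
    for word in paths[0][1]:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    in_place = [False] * len(text)
    for _, words in paths:
        pos = 0
        for word in words:
            if word in PLACE_NAMES:
                for k in range(pos, pos + len(word)):
                    in_place[k] = True
            pos += len(word)
    return [
        {"char": ch, "rough_tag": tag, "place_candidate": str(flag)}
        for ch, tag, flag in zip(text, tags, in_place)
    ]


if __name__ == "__main__":
    sentence = "南京市长江大桥"
    paths = n_shortest_segmentations(sentence, n=3)
    for cost, words in paths:
        print(cost, "/".join(words))
    # 3.0 南京/市长/江大桥
    # 3.0 南京市/长江/大桥
    # 4.0 南京市/长/江大桥
    for feat in char_features(sentence, paths):
        print(feat)
```

In this toy example the single cheapest path misses the place name 长江, but it survives in the second candidate, so the `place_candidate` flag still marks it. This illustrates why keeping several rough segmentations can raise recall before a higher-level tagger decides the final labels.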
- Subjects :
- Conditional random field
Computer science
Feature extraction
Context (language use)
Pattern recognition
Toponymy
Shortest path problem
Feature (machine learning)
Identification (biology)
Artificial intelligence
business
CRFS
computer
Natural language processing
Details
- Database :
- OpenAIRE
- Journal :
- 2009 International Conference on Information Management, Innovation Management and Industrial Engineering
- Accession number :
- edsair.doi...........4474d159f41ff2fa05f90b74b005fa5f
- Full Text :
- https://doi.org/10.1109/iciii.2009.426