Incorporating multi-kernel function and Internet verification for Chinese person name disambiguation
- Source :
- Frontiers of Computer Science. 10:1026-1038
- Publication Year :
- 2016
- Publisher :
- Springer Science and Business Media LLC, 2016.
Abstract
- Person name disambiguation aims to distinguish different entities that share the same person name by linking each document to the entity it refers to. The traditional approach uses the words in a document as features to tell entities apart. Because it ignores word order and makes only limited use of external knowledge, its performance is limited. This paper presents an approach to named entity disambiguation through entity linking, based on a multi-kernel function and Internet verification, to improve Chinese person name disambiguation. The proposed approach extends a linear kernel over in-document word features with a string kernel to construct a multi-kernel function. The multi-kernel function computes the similarity between an input document and the entity descriptions in a named person knowledge base, producing a ranked list of candidate entities. Furthermore, Internet search results retrieved with keywords extracted from the input document and from the entity descriptions in the knowledge base are used to train classifiers for verification. Evaluations on the CIPS-SIGHAN 2012 person name disambiguation bakeoff dataset show that using word order and Internet knowledge through a multi-kernel function improves both precision and recall, and that the system achieves state-of-the-art performance.
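The multi-kernel idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not the authors' implementation: the string kernel is stood in for by a simple p-spectrum kernel over contiguous word n-grams, the two kernels are cosine-normalized and mixed with a hypothetical `weight` parameter, and documents are assumed to be already segmented into word tokens. The Internet verification step is not covered.

```python
from collections import Counter
from math import sqrt


def linear_kernel(a, b):
    """Bag-of-words dot product; ignores word order."""
    ca, cb = Counter(a), Counter(b)
    return float(sum(ca[w] * cb[w] for w in ca))


def spectrum_kernel(a, b, p=2):
    """Dot product over contiguous word p-grams; sensitive to word order.
    Assumption: a stand-in for the string kernel named in the abstract."""
    ga = Counter(tuple(a[i:i + p]) for i in range(len(a) - p + 1))
    gb = Counter(tuple(b[i:i + p]) for i in range(len(b) - p + 1))
    return float(sum(ga[g] * gb[g] for g in ga))


def normalized(kernel, a, b):
    """Cosine-style normalization so that each kernel value lies in [0, 1]."""
    denom = sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0


def multi_kernel(a, b, weight=0.5):
    """Convex combination of the two normalized kernels (hypothetical weighting)."""
    return (weight * normalized(linear_kernel, a, b)
            + (1 - weight) * normalized(spectrum_kernel, a, b))


def rank_candidates(doc, knowledge_base, weight=0.5):
    """Return (entity_id, score) pairs sorted from most to least similar."""
    scores = {eid: multi_kernel(doc, desc, weight)
              for eid, desc in knowledge_base.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data: both descriptions contain the same words, in different orders.
    doc = ["wang", "wei", "studies", "computer", "science"]
    kb = {
        "A": ["wang", "wei", "studies", "computer", "science", "daily"],
        "B": ["science", "computer", "daily", "studies", "wei", "wang"],
    }
    for entity_id, score in rank_candidates(doc, kb):
        print(entity_id, round(score, 3))
```

In this toy case the linear kernel alone cannot separate the two candidates, because they share the same bag of words; the order-sensitive term is what pushes candidate `A` to the top of the ranked list.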
- Subjects :
- Information retrieval
General Computer Science
Computer science
business.industry
010102 general mathematics
01 natural sciences
Theoretical Computer Science
Feature (linguistics)
Entity linking
Knowledge base
String kernel
Kernel (statistics)
The Internet
0101 mathematics
Precision and recall
business
Word order
Details
- ISSN :
- 2095-2236 and 2095-2228
- Volume :
- 10
- Database :
- OpenAIRE
- Journal :
- Frontiers of Computer Science
- Accession number :
- edsair.doi...........1b18adfa16bdc511b4cf475064e16716