1. Automatic Navbox Generation by Interpretable Clustering over Linked Entities
- Author
Hanghang Tong, Chenhao Xie, Wei Wang, Lihan Chen, Haixun Wang, Yanghua Xiao, Jiaqing Liang, and Kezun Zhang
- Subjects
Brown clustering, Information retrieval, Computer science, Conceptual clustering, Engineering and technology, Data science, Information extraction, Knowledge extraction, Information systems, Electrical engineering, electronic engineering, information engineering, Table (database), Artificial intelligence & image processing, Cluster analysis, Natural language
- Abstract
Little effort has been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table on a Wikipedia article page that provides a consistent navigation system for related entities. Navboxes are critical for the readership and editing efficiency of Wikipedia. In this paper, we target the automatic generation of Navboxes for Wikipedia articles. Instead of performing information extraction directly over unstructured natural language text, we explore an alternative avenue that focuses on a rich set of semi-structured data in Wikipedia articles: linked entities. The core idea of this paper is as follows: if we cluster the linked entities and interpret them appropriately, we can construct a high-quality Navbox for the article entity. We propose a clustering-then-labeling algorithm to realize this idea. Experiments show that the proposed solutions are effective. Ultimately, our approach enriches Wikipedia with 1.95 million new high-quality Navboxes.
- Published
- 2017