An integrated system for building enterprise taxonomies.

Authors :
Li Zang
Tao Li
ShiXia Liu
Yue Pan
Source :
Information Retrieval Journal. Oct2007, Vol. 10 Issue 4/5, p365-391. 27p. 1 Color Photograph, 1 Diagram, 9 Charts, 1 Graph.
Publication Year :
2007

Abstract

Although considerable research has been conducted in the field of hierarchical text categorization, little has been done on automatically collecting labeled corpora for building hierarchical taxonomies. In this paper, we propose an automatic method of collecting training samples to build hierarchical taxonomies. In our method, each category node is initially defined by a few keywords, a web search engine is then used to construct a small set of labeled documents, and a topic tracking algorithm with keyword-based content normalization is applied to enlarge the training corpus on the basis of the seed documents. We also design a method to check the consistency of the collected corpus. The above steps produce a flat category structure that contains all the categories for building the hierarchical taxonomy. Next, a linear discriminant projection approach is used to construct more meaningful intermediate levels of the hierarchy from the generated flat set of categories. Experimental results show that the training corpus is good enough for statistical classification methods. [ABSTRACT FROM AUTHOR]
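
The corpus-enlargement step described in the abstract (seed documents, then topic tracking to accept further documents) can be illustrated with a minimal sketch. This is not the authors' algorithm: the centroid-plus-cosine-similarity tracker, the function names `vectorize`, `cosine`, and `expand_corpus`, and the `threshold` value are all assumptions chosen for illustration, and the keyword-based content normalization mentioned in the abstract is not modeled here.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_corpus(seed_docs, candidate_docs, threshold=0.3):
    """Hypothetical tracker: accept candidates similar to the running topic centroid.

    The centroid starts as the summed term counts of the seed documents and is
    updated each time a candidate is accepted. The threshold is an assumed
    value, not one taken from the paper.
    """
    centroid = Counter()
    for doc in seed_docs:
        centroid.update(vectorize(doc))

    accepted = list(seed_docs)
    for doc in candidate_docs:
        if cosine(centroid, vectorize(doc)) >= threshold:
            accepted.append(doc)
            centroid.update(vectorize(doc))
    return accepted


seeds = [
    "enterprise taxonomy text categorization",
    "hierarchical text categorization taxonomy",
]
candidates = [
    "taxonomy categorization of enterprise text documents",
    "football match results and scores",
]
corpus = expand_corpus(seeds, candidates)
```

With these toy inputs, the first candidate shares several terms with the seed centroid and is added to the corpus, while the second shares none and is left out. In practice the seeds would come from search-engine results for the category keywords, as the abstract describes.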

Details

Language :
English
ISSN :
13864564
Volume :
10
Issue :
4/5
Database :
Academic Search Index
Journal :
Information Retrieval Journal
Publication Type :
Academic Journal
Accession number :
29360506
Full Text :
https://doi.org/10.1007/s10791-007-9028-6