An integrated system for building enterprise taxonomies.
- Source :
- Information Retrieval Journal. Oct 2007, Vol. 10 Issue 4/5, p365-391. 27p. 1 Color Photograph, 1 Diagram, 9 Charts, 1 Graph.
- Publication Year :
- 2007
Abstract
- Although considerable research has been conducted in the field of hierarchical text categorization, little has been done on automatically collecting labeled corpora for building hierarchical taxonomies. In this paper, we propose an automatic method of collecting training samples to build hierarchical taxonomies. In our method, each category node is initially defined by a few keywords, a web search engine is then used to construct a small set of labeled documents, and a topic tracking algorithm with keyword-based content normalization is applied to enlarge the training corpus on the basis of these seed documents. We also design a method to check the consistency of the collected corpus. These steps produce a flat category structure that contains all the categories for building the hierarchical taxonomy. Next, a linear discriminant projection approach is used to construct more meaningful intermediate levels of hierarchy within the generated flat set of categories. Experimental results show that the training corpus is good enough for statistical classification methods. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 1386-4564
- Volume :
- 10
- Issue :
- 4/5
- Database :
- Academic Search Index
- Journal :
- Information Retrieval Journal
- Publication Type :
- Academic Journal
- Accession number :
- 29360506
- Full Text :
- https://doi.org/10.1007/s10791-007-9028-6