Weighting tags and paths in XML documents according to their topic generalization.
- Source :
- Information Sciences. Nov 2013, Vol. 249, p48-66. 19p.
- Publication Year :
- 2013
Abstract
- Abstract: Text-centric (or document-centric) XML document retrieval aims to rank search results according to their relevance to a given query. To do this, most existing methods mainly rely on content terms and often ignore an important factor – the XML tags and paths, which are useful in determining the important contents of a document. In some previous studies, each unique tag/path is assigned a weight based on domain (expert) knowledge. However, such a manual assignment is both inefficient and subjective. In this paper, we propose an automatic method to infer the weights of tags/paths according to the topical relationship between the corresponding elements and the whole document. The more the corresponding element can generalize the document’s topic, the more the tag/path is considered to be important. We define a model based on Average Topic Generalization (ATG), which integrates several features used in previous studies. We evaluate the performance of the ATG-based model on two real data sets, the IEEECS collection and the Wikipedia collection, from two different perspectives: the correlation between the weights generated by ATG and those set by experts, and the performance of XML retrieval based on ATG. Experimental results show that the tag/path weights generated by ATG are highly correlated with the manually assigned weights, and the ATG model significantly improves XML retrieval effectiveness. [Copyright Elsevier]
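The abstract does not give the ATG formula, so the following is only a rough illustration of the general idea it describes: a tag counts as more important when the text of its elements resembles the text of the whole document. The sketch assumes plain term-frequency vectors and cosine similarity as the "topic generalization" measure, averaged per tag; these choices, and the names `term_vector`, `cosine`, and `tag_weights`, are hypothetical and are not taken from the paper.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def term_vector(text):
    """Bag-of-words term frequencies (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag_weights(xml_docs):
    """Average, per tag, the similarity between each element and its document.

    Assumption: cosine similarity over raw term counts stands in for the
    paper's topic-generalization measure, which the abstract does not define.
    """
    scores = defaultdict(list)
    for xml in xml_docs:
        root = ET.fromstring(xml)
        doc_vec = term_vector(" ".join(root.itertext()))
        for elem in root.iter():
            if elem is root:
                continue  # the root trivially matches the whole document
            elem_vec = term_vector(" ".join(elem.itertext()))
            scores[elem.tag].append(cosine(elem_vec, doc_vec))
    return {tag: sum(vals) / len(vals) for tag, vals in scores.items()}


docs = [
    "<article><title>xml retrieval</title>"
    "<body>xml retrieval ranks xml elements</body>"
    "<note>thanks</note></article>"
]
weights = tag_weights(docs)
```

In this toy document, the `note` element shares little vocabulary with the document as a whole and so receives a lower weight than `title` or `body`. Weights keyed by full paths rather than tags could be obtained the same way by tracking each element's ancestor chain.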
Details
- Language :
- English
- ISSN :
- 00200255
- Volume :
- 249
- Database :
- Academic Search Index
- Journal :
- Information Sciences
- Publication Type :
- Periodical
- Accession number :
- 90147785
- Full Text :
- https://doi.org/10.1016/j.ins.2013.06.019