Weighting tags and paths in XML documents according to their topic generalization.

Authors :
Liu, Dexi
Wan, Changxuan
Chen, Lei
Liu, Xiping
Nie, Jian-Yun
Source :
Information Sciences. Nov 2013, Vol. 249, pp. 48-66. 19 pp.
Publication Year :
2013

Abstract

Text-centric (or document-centric) XML document retrieval aims to rank search results according to their relevance to a given query. To do this, most existing methods mainly rely on content terms and often ignore an important factor – the XML tags and paths, which are useful in determining the important contents of a document. In some previous studies, each unique tag/path is assigned a weight based on domain (expert) knowledge. However, such a manual assignment is both inefficient and subjective. In this paper, we propose an automatic method to infer the weights of tags/paths according to the topical relationship between the corresponding elements and the whole documents. The more the corresponding element can generalize the document’s topic, the more the tag/path is considered to be important. We define a model based on Average Topic Generalization (ATG), which integrates several features used in previous studies. We evaluate the performance of the ATG-based model on two real data sets, the IEEECS collection and the Wikipedia collection, from two different perspectives: the correlation between the weights generated by ATG and those set by experts, and the performance of XML retrieval based on ATG. Experimental results show that the tag/path weights generated by ATG are highly correlated with the manually assigned weights, and the ATG model significantly improves XML retrieval effectiveness. [Copyright Elsevier]

Details

Language :
English
ISSN :
0020-0255
Volume :
249
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
90147785
Full Text :
https://doi.org/10.1016/j.ins.2013.06.019