1. SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents
- Author
- Li, Guoliang; Li, Chen; Feng, Jianhua; Zhou, Lizhu
- Subjects
- *KEYWORD searching, *INDEXING, *XML (Extensible Markup Language), *INFORMATION storage & retrieval systems, *QUERY (Information retrieval system), *DATA structures, *TREE graphs, *RELEVANCE ranking (Information science)
- Abstract
- Keyword search in XML documents has recently gained a lot of research attention. Given a keyword query, existing approaches first compute the lowest common ancestors (LCAs), or their variants, of the XML elements that contain the input keywords, and then identify the subtrees rooted at the LCAs as the answers. In this paper we study how to use the rich structural relationships embedded in XML documents to facilitate the processing of keyword queries. We develop a novel method, called SAIL, to index such structural relationships for efficient XML keyword search. We propose the concept of minimal-cost trees to answer keyword queries and devise structure-aware indices to maintain the structural relationships for efficiently identifying the minimal-cost trees. For effectively and progressively identifying the top-k answers, we develop techniques using link-based relevance ranking and keyword-pair-based ranking. To reduce the index size, we incorporate a numbering scheme, namely schema-aware Dewey code, into our structure-aware indices. Experimental results on real data sets show that our method outperforms state-of-the-art approaches significantly, in both answer quality and search efficiency. [Copyright Elsevier]
- Published
- 2009