Ranked document retrieval for multiple patterns.

Authors :
Biswas, Sudip
Ganguly, Arnab
Shah, Rahul
Thankachan, Sharma V.
Source :
Theoretical Computer Science. Oct 2018, Vol. 746, p98-111. 14p.
Publication Year :
2018

Abstract

Let $\mathcal{D} = \{T_1, T_2, \ldots, T_D\}$ be a collection of $D$ documents having $n$ characters in total. Given two patterns $P$ and $Q$, and an integer $k > 0$, we consider the following queries.
• top-$k$ forbidden pattern query: Among all documents containing $P$, but not $Q$, report the $k$ documents most relevant to $P$.
• top-$k$ two pattern query: Among all documents that contain both $P$ and $Q$, report the $k$ documents most relevant to $P$.

For the above two queries, we provide a linear space index with $O(|P| + |Q| + \sqrt{nk})$ query time, under some standard relevance functions such as PageRank and TermFrequency. The document listing version of the above two problems asks to report all $t$ documents that either contain $P$, but not $Q$, or contain both $P$ and $Q$, depending on the query type. As a corollary of the top-$k$ result, we obtain a linear space and $O(|P| + |Q| + \sqrt{nt})$ query time solution for the document listing problems. We conjecture that any significant improvement over these results is highly unlikely. We also consider the scenario when the query consists of more than two patterns. Finally, we present space-efficient indexes for these problems. [ABSTRACT FROM AUTHOR]
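To make the query semantics concrete, here is a minimal brute-force sketch. It scans every document on each query, so it is only a reference for what the forbidden pattern, two pattern, and document listing queries return; it is not the indexed solution described in the abstract. It assumes TermFrequency (overlapping occurrences of $P$) as the relevance function and breaks ties by smaller document id; all function names are hypothetical.

```python
from typing import Callable, List, Sequence

# Relevance function of (document, pattern); TermFrequency is assumed below.
Relevance = Callable[[str, str], int]


def term_frequency(doc: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in doc."""
    count, pos = 0, doc.find(pattern)
    while pos != -1:
        count += 1
        pos = doc.find(pattern, pos + 1)
    return count


def list_forbidden(docs: Sequence[str], p: str, q: str) -> List[int]:
    """Document listing: ids of all documents containing p but not q."""
    return [i for i, d in enumerate(docs) if p in d and q not in d]


def list_two_pattern(docs: Sequence[str], p: str, q: str) -> List[int]:
    """Document listing: ids of all documents containing both p and q."""
    return [i for i, d in enumerate(docs) if p in d and q in d]


def _rank(docs: Sequence[str], ids: List[int], p: str, k: int,
          relevance: Relevance) -> List[int]:
    # Most relevant to p first; ties broken by smaller id (an assumption).
    return sorted(ids, key=lambda i: (-relevance(docs[i], p), i))[:k]


def top_k_forbidden(docs: Sequence[str], p: str, q: str, k: int,
                    relevance: Relevance = term_frequency) -> List[int]:
    """Top-k forbidden pattern query, computed naively."""
    return _rank(docs, list_forbidden(docs, p, q), p, k, relevance)


def top_k_two_pattern(docs: Sequence[str], p: str, q: str, k: int,
                      relevance: Relevance = term_frequency) -> List[int]:
    """Top-k two pattern query, computed naively."""
    return _rank(docs, list_two_pattern(docs, p, q), p, k, relevance)


if __name__ == "__main__":
    docs = ["abab", "abc", "bcbc", "ababab"]
    print(top_k_forbidden(docs, "ab", "c", 2))    # [3, 0]
    print(top_k_two_pattern(docs, "ab", "c", 2))  # [1]
```

In the usage example, documents 0 and 3 contain "ab" but not "c", with term frequencies 2 and 3, so document 3 ranks first; only document 1 contains both patterns. Note that both relevance values depend on $P$ alone, matching the query definitions, while $Q$ serves purely as a filter.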

Details

Language :
English
ISSN :
0304-3975
Volume :
746
Database :
Academic Search Index
Journal :
Theoretical Computer Science
Publication Type :
Academic Journal
Accession number :
131658368
Full Text :
https://doi.org/10.1016/j.tcs.2018.06.029