Ranked document retrieval for multiple patterns.
- Source :
- Theoretical Computer Science. Oct2018, Vol. 746, p98-111. 14p.
- Publication Year :
- 2018
Abstract
- Let $\mathcal{D} = \{T_1, T_2, \dots, T_D\}$ be a collection of $D$ documents having $n$ characters in total. Given two patterns $P$ and $Q$, and an integer $k > 0$, we consider the following queries.
• top-$k$ forbidden pattern query: Among all documents containing $P$, but not $Q$, report the $k$ documents most relevant to $P$.
• top-$k$ two pattern query: Among all documents that contain both $P$ and $Q$, report the $k$ documents most relevant to $P$.
For the above two queries, we provide a linear space index with $O(|P| + |Q| + \sqrt{nk})$ query time, under some standard relevance functions such as PageRank and TermFrequency. The document listing version of the above two problems asks to report all $t$ documents that either contain $P$, but not $Q$, or contain both $P$ and $Q$, depending on the query type. As a corollary of the top-$k$ result, we obtain a linear space and $O(|P| + |Q| + \sqrt{nt})$ query time solution for the document listing problems. We conjecture that any significant improvement over these results is highly unlikely. We also consider the scenario when the query consists of more than two patterns. Finally, we present space-efficient indexes for these problems. [ABSTRACT FROM AUTHOR]
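To make the two query types in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the paper's index: it scans every document on each query and so does not achieve the stated query time bounds. It assumes TermFrequency (the number of occurrences of P in a document) as the relevance function, and the function names and the tie-breaking rule (lower document index first) are illustrative assumptions.

```python
import heapq
from typing import List


def term_frequency(doc: str, pattern: str) -> int:
    """Count occurrences of pattern in doc (overlapping occurrences included)."""
    if not pattern:
        return 0
    count = 0
    start = doc.find(pattern)
    while start != -1:
        count += 1
        start = doc.find(pattern, start + 1)
    return count


def _top_k(docs: List[str], p: str, k: int, keep) -> List[int]:
    # Score each qualifying document by term frequency of p.
    # The key (score, -index) breaks ties in favour of the lower index
    # (an assumption made here, not something the abstract specifies).
    candidates = [
        (term_frequency(d, p), -i) for i, d in enumerate(docs) if keep(d)
    ]
    return [-neg_i for _, neg_i in heapq.nlargest(k, candidates)]


def top_k_forbidden(docs: List[str], p: str, q: str, k: int) -> List[int]:
    """Indices of the k documents most relevant to p among those with p but not q."""
    return _top_k(docs, p, k, lambda d: p in d and q not in d)


def top_k_two_pattern(docs: List[str], p: str, q: str, k: int) -> List[int]:
    """Indices of the k documents most relevant to p among those with both p and q."""
    return _top_k(docs, p, k, lambda d: p in d and q in d)


if __name__ == "__main__":
    docs = ["abab", "abc", "ababab", "cab"]
    print(top_k_forbidden(docs, "ab", "c", 2))
    print(top_k_two_pattern(docs, "ab", "c", 2))
```

Setting k to the number of documents turns either function into the corresponding document listing query, which mirrors how the abstract derives the listing result as a corollary of the top-k result.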
Details
- Language :
- English
- ISSN :
- 03043975
- Volume :
- 746
- Database :
- Academic Search Index
- Journal :
- Theoretical Computer Science
- Publication Type :
- Academic Journal
- Accession number :
- 131658368
- Full Text :
- https://doi.org/10.1016/j.tcs.2018.06.029