Back to Search Start Over

Keyword Search over Dynamic Categorized Information

Authors :
Krithi Ramamritham
Manish A. Bhide
Venkatesan T. Chakaravarthy
Prasan Roy
Source :
ICDE
Publication Year :
2009
Publisher :
IEEE, 2009.

Abstract

Consider an information repository whose content is categorized. A data item (in the repository) can belong to multiple categories and new data is continuously added to the system. In this paper, we describe a system, CS*, which takes a keyword query and returns the relevant top-K categories. In contrast, traditional keyword search returns the top-K documents (i.e., data items) relevant to a user query. The need to dynamically categorize new data and also update the meta-data required for fast responses to user queries poses interesting challenges. The brute force approach of updating the meta-data by comparing each new data item with all the categories is impractical due to (i) the large cost involved in finding the categories associated with a data item and (ii) the high rate of arrival of new data items. We show that a sampling based approach which provides statistical guarantees on the reported results is also impracticable. We hence develop the CS* approach whose effectiveness results from its ability to focus on a strategically chosen subset of categories on the one hand and a subset of new data on the other. Given a query, CS* finds the top-K categories with high accuracy even in time-constrained situations. An experimental evaluation of the CS* system using real world data shows that it can easily achieve accuracy in excess of 90%, whereas other approaches demand at least 57% more resources (i.e., processing power), for providing similar results. Our experimental results also show that, contrary to expectations, if the rate of arrival of data items doubles, whereas CS* continues to provide high accuracy without a significant increase in resources, other approaches require more than double the number of resources.

Details

ISSN :
10844627
Database :
OpenAIRE
Journal :
2009 IEEE 25th International Conference on Data Engineering
Accession number :
edsair.doi...........20061c187b2b70a43650ba084d14b371
Full Text :
https://doi.org/10.1109/icde.2009.91