1. Focussed crawling of environmental Web resources based on the combination of multimedia evidence.
- Author
- Tsikrika, Theodora; Moumtzidou, Anastasia; Vrochidis, Stefanos; and Kompatsiaris, Ioannis
- Subjects
- WEBSITE research, HYPERLINKS, MULTIMEDIA systems, WEB development, WEB design
- Abstract
Focussed crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic based on evidence obtained from the already downloaded pages. This work proposes a classifier-guided focussed crawling approach that estimates the relevance of a hyperlink to an unvisited Web resource based on the combination of textual evidence representing its local context, namely the textual content appearing in its vicinity in the parent page, with visual evidence associated with its global context, namely the presence of images relevant to the topic within the parent page. The proposed focussed crawling approach is applied towards the discovery of environmental Web resources that provide air quality measurements and forecasts, since such measurements (and particularly the forecasts) are not only provided in textual form, but are also commonly encoded as multimedia, mainly in the form of heatmaps. Our evaluation experiments indicate the effectiveness of incorporating visual evidence in the link selection process applied by the focussed crawler over the use of textual features alone, particularly in conjunction with hyperlink exploration strategies that allow for the discovery of highly relevant pages that lie behind apparently irrelevant ones. [ABSTRACT FROM AUTHOR]
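The link selection idea in the abstract can be illustrated with a minimal best-first crawler sketch. This is not the authors' implementation: the names `link_score`, `crawl`, `fetch_links`, `text_relevance`, and `page_has_relevant_image` are hypothetical, the two relevance functions stand in for the trained classifiers the abstract mentions, and the weighted sum used to combine textual and visual evidence is an assumption made only for illustration.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

# Hypothetical link record: (target URL, text appearing near the link in the parent page).
Link = Tuple[str, str]


def link_score(text_score: float, visual_score: float, weight: float = 0.5) -> float:
    """Combine local textual evidence with global visual evidence.

    The weighted sum is an illustrative assumption, not the paper's combination rule.
    """
    return weight * text_score + (1.0 - weight) * visual_score


def crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], List[Link]],
    text_relevance: Callable[[str], float],
    page_has_relevant_image: Callable[[str], bool],
    max_pages: int = 100,
    weight: float = 0.5,
) -> List[str]:
    """Visit pages in order of estimated hyperlink relevance (best-first)."""
    seed_list = list(seeds)
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seed_list]
    heapq.heapify(frontier)
    seen = set(seed_list)
    visited: List[str] = []

    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        # Global context: does the parent page contain a topic-relevant image?
        visual = 1.0 if page_has_relevant_image(url) else 0.0
        for target, context in fetch_links(url):
            if target in seen:
                continue
            seen.add(target)
            # Local context: text in the vicinity of the hyperlink.
            score = link_score(text_relevance(context), visual, weight)
            heapq.heappush(frontier, (-score, target))
    return visited
```

In this sketch, low-scoring links stay in the frontier rather than being discarded, so a page reached through an apparently irrelevant link can still be visited once better candidates are exhausted. That is only a loose stand-in for the hyperlink exploration strategies the abstract refers to.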
- Published
- 2016