Back to Search Start Over

A Focused Event Crawler with Temporal Intent.

Authors :
Wu, Hao
Hou, Dongyang
Source :
Applied Sciences (2076-3417); Apr2023, Vol. 13 Issue 7, p4149, 14p
Publication Year :
2023

Abstract

Temporal intent is an important component of events. It plays an important role in collecting them from the web with focused crawlers. However, traditionally focused crawlers usually only consider factors such as topic keywords, web page content, and anchor text, ignoring the relationship between web pages and the temporal intent of events. This leads to their poor crawling performance. This paper aims to understand the temporal intent of events and apply it within focused crawlers. First, a new temporal intent identification method is proposed based on Google Trends data. The method can automatically identify the start time of an event and quantify the temporal distribution of the event. Then, a new focused event crawler with temporal intent is proposed. The crawler incorporates the start time of the event into the similarity calculation module, and a new URL (Uniform Resource Locator) priority assignment method is developed using the quantified temporal distribution of temporal intent as the independent variable of a natural exponential function. Experimental results show that our method is effective in identifying the start time of events at the month level and quantifying the temporal distribution of events. Furthermore, compared to the traditional best-first crawling method, the precision of our method improves by an average of 10.28%, and a maximum of 25.21%. These results indicate that our method performs better in retrieving relevant pages and assigning URL priority. This also illustrates the importance of the relationship between web pages and the temporal intent of events. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
20763417
Volume :
13
Issue :
7
Database :
Complementary Index
Journal :
Applied Sciences (2076-3417)
Publication Type :
Academic Journal
Accession number :
163038034
Full Text :
https://doi.org/10.3390/app13074149