
A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix.

Authors :
Rajiv, S
Navaneethan, C
Source :
Expert Systems. May 2023, Vol. 40 Issue 4, p1-13. 13p.
Publication Year :
2023

Abstract

Irrelevant search results for a given topic waste search engine users' time. A learning focused web crawler uses machine‐learning algorithms to download URLs relevant to a given topic. The dynamic nature of the web makes the related computation challenging for focused web crawlers. Studies have shown that learning focused crawlers use term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. TF‐IDF measures the similarity of the given topic to its co‐occurrence on the web page. The need for an efficient mechanism to compute the relevance of URLs both syntactically and semantically motivates this paper, which proposes a word embedding approach to compute the relevance of a web page. The cosine similarity between the global vector representations of a topic and of the web page contents is calculated. This cosine similarity is provided as input to a trained random forest classifier, which predicts the relevance of the web page. In the evaluation, the proposed crawler achieved an average hrate of 0.41 and prate of 0.59, outperforming other learning‐focused crawlers based on support vector machines, Naive Bayes and artificial neural networks. [ABSTRACT FROM AUTHOR]
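
The similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny embedding table, the averaging of word vectors into one vector per text, and the function names are all assumptions made for illustration. A real system would load pretrained global vector embeddings instead of the made-up values below.

```python
import math

# Toy 3-dimensional vectors standing in for pretrained word embeddings.
# These values are invented for illustration only (assumption).
TOY_VECTORS = {
    "medical": [0.9, 0.1, 0.0],
    "device": [0.8, 0.2, 0.1],
    "sensor": [0.7, 0.3, 0.1],
    "patient": [0.85, 0.15, 0.05],
    "football": [0.0, 0.1, 0.9],
    "score": [0.1, 0.2, 0.8],
}


def mean_vector(tokens, vectors):
    """Average the vectors of all in-vocabulary tokens; None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_page_similarity(topic, page_text, vectors=TOY_VECTORS):
    """Hypothetical helper: similarity between a topic and page text.

    Each text is reduced to the mean of its word vectors (an assumption;
    the abstract does not say how word vectors are combined).
    """
    topic_vec = mean_vector(topic.lower().split(), vectors)
    page_vec = mean_vector(page_text.lower().split(), vectors)
    if topic_vec is None or page_vec is None:
        return 0.0
    return cosine_similarity(topic_vec, page_vec)


relevant = topic_page_similarity("medical device", "patient sensor")
irrelevant = topic_page_similarity("medical device", "football score")
print(relevant > irrelevant)
```

In the pipeline the abstract outlines, a score like this would then serve as an input feature to a trained classifier (a random forest in the paper) that labels the page as relevant or not; that step is omitted here.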

Details

Language :
English
ISSN :
0266-4720
Volume :
40
Issue :
4
Database :
Academic Search Index
Journal :
Expert Systems
Publication Type :
Academic Journal
Accession number :
163094811
Full Text :
https://doi.org/10.1111/exsy.12993