Effectively and efficiently detect web page duplication

Authors :
Zhongming Han
Qian Mo
Jianzhi Sun
Hongzhi Liu
Source :
ICDIM
Publication Year :
2009
Publisher :
IEEE, 2009.

Abstract

There are many redundant web pages on the Internet. We present a novel multilayer framework for detecting duplicated web pages, based on tag statistics and text similarity comparison. We propose two algorithms for detecting similar text paragraphs and implement the framework. Experimental results show that our approach achieves high performance, indicating that duplicated web pages can be detected efficiently using only tag statistics and text comparison.
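
The abstract does not give the details of the framework or of its two paragraph-detection algorithms, so the following is only a minimal sketch of the general idea of layering a cheap tag-statistics filter in front of a text comparison. Everything in it is an assumption made for illustration rather than the authors' method: cosine similarity over tag-count vectors, Jaccard similarity over word shingles, the function names, and both threshold values.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from math import sqrt


class _TagAndTextCollector(HTMLParser):
    """Collects start-tag counts and raw text from an HTML string."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def parse_page(html):
    """Return (tag-count Counter, concatenated text) for one page."""
    collector = _TagAndTextCollector()
    collector.feed(html)
    collector.close()
    return collector.tag_counts, " ".join(collector.text_parts)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def shingles(text, k=3):
    """Set of k-word shingles; texts shorter than k words yield one shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(html_a, html_b, tag_threshold=0.9, text_threshold=0.8):
    """Hypothetical two-layer check; thresholds are illustrative only."""
    tags_a, text_a = parse_page(html_a)
    tags_b, text_b = parse_page(html_b)
    # Layer 1: inexpensive structural filter on tag statistics.
    if cosine_similarity(tags_a, tags_b) < tag_threshold:
        return False
    # Layer 2: text comparison, run only for pairs that pass layer 1.
    return jaccard(shingles(text_a), shingles(text_b)) >= text_threshold
```

In this sketch, calling `is_duplicate(page_a, page_b)` on two HTML strings rejects structurally dissimilar pairs before any text is compared, which is one plausible reading of why a layered design can be efficient.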

Details

Database :
OpenAIRE
Journal :
2009 Fourth International Conference on Digital Information Management
Accession number :
edsair.doi...........f7740f3a8ffc0cce5885de82b0324f1b