
Improvement of Web Data Clustering Using Web Page Contents

Authors :
Yue Xu
Li-Tung Weng
Source :
Intelligent Information Processing II ISBN: 038723151X, Intelligent Information Processing
Publication Year :
2006
Publisher :
Springer-Verlag, 2006.

Abstract

This paper presents an approach that discovers clusters of Web pages based on both Web log data and Web page contents. Most existing Web log mining techniques are access-based: they statistically analyze the log data without paying much attention to the contents of the pages. Log data contain various kinds of noise, which can significantly degrade the performance of purely access-based Web log mining. The method proposed in this paper considers not only the frequency of page co-occurrence in user access logs but also the Web page contents when clustering Web pages. We also present a method that uses information entropy to prune irrelevant pages, which improves the performance of Web page clustering.
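The abstract does not give the paper's formulas, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes that sessions are lists of page IDs, that each page's content is a bag-of-words dict, that access similarity is a Jaccard ratio over sessions, that content similarity is cosine similarity, and that the two are mixed linearly with a weight `alpha`. The entropy rule is also an assumption: a page whose co-occurrences are spread evenly over many other pages (such as a home page) has high entropy and is pruned as irrelevant. Clustering is plain single-link grouping over a similarity threshold. All function names and thresholds are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sessions):
    """Count sessions per page and per unordered page pair."""
    page_counts, pair_counts = Counter(), Counter()
    for session in sessions:
        pages = sorted(set(session))                 # ignore repeat visits within a session
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))   # keys are (a, b) with a < b
    return page_counts, pair_counts


def pair_count(pair_counts, a, b):
    """Number of sessions containing both a and b."""
    return pair_counts.get(tuple(sorted((a, b))), 0)


def page_entropy(page, pages, pair_counts):
    """Shannon entropy (bits) of a page's co-occurrence distribution over other pages."""
    weights = [pair_count(pair_counts, page, other) for other in pages if other != page]
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def cosine(u, v):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * v.get(term, 0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def cluster_web_pages(sessions, term_vectors, alpha=0.5,
                      entropy_threshold=1.5, sim_threshold=0.5):
    """Hypothetical pipeline: entropy pruning, then single-link clustering."""
    page_counts, pair_counts = cooccurrence_counts(sessions)
    pages = sorted(page_counts)

    # Step 1 (assumed rule): prune pages whose co-occurrence entropy is too high.
    pruned = [p for p in pages if page_entropy(p, pages, pair_counts) > entropy_threshold]
    kept = [p for p in pages if p not in pruned]

    def similarity(a, b):
        both = pair_count(pair_counts, a, b)
        either = page_counts[a] + page_counts[b] - both
        access_sim = both / either if either else 0.0   # Jaccard over sessions
        content_sim = cosine(term_vectors.get(a, {}), term_vectors.get(b, {}))
        return alpha * access_sim + (1 - alpha) * content_sim

    # Step 2: single-link clustering = connected components of the graph
    # whose edges are page pairs with similarity >= sim_threshold.
    parent = {p: p for p in kept}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path halving
            p = parent[p]
        return p

    for a, b in combinations(kept, 2):
        if similarity(a, b) >= sim_threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in kept:
        groups.setdefault(find(p), []).append(p)
    clusters = sorted(sorted(g) for g in groups.values())
    return clusters, pruned


if __name__ == "__main__":
    sessions = [["index", "a1", "a2"], ["index", "b1", "b2"],
                ["index", "a1", "a2"], ["index", "b1", "b2"]]
    term_vectors = {"index": {"home": 1, "welcome": 1},
                    "a1": {"python": 2, "code": 1}, "a2": {"python": 1, "code": 2},
                    "b1": {"garden": 2, "soil": 1}, "b2": {"garden": 1, "soil": 2}}
    clusters, pruned = cluster_web_pages(sessions, term_vectors)
    print(clusters, pruned)   # [['a1', 'a2'], ['b1', 'b2']] ['index']
```

In this toy log, `index` co-occurs equally with all four other pages (entropy 2 bits), so it is pruned. Without pruning, it would act as a bridge that links unrelated pages in an access-only view. The linear mix is just one simple way to combine the two signals, and `alpha` controls how much weight the page contents receive relative to the access log.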

Details

ISBN :
978-0-387-23151-8
0-387-23151-X
ISBNs :
9780387231518 and 038723151X
Database :
OpenAIRE
Journal :
Intelligent Information Processing II ISBN: 038723151X, Intelligent Information Processing
Accession number :
edsair.doi...........7faf2f6fd97574a056bbbcc16e85c9b7
Full Text :
https://doi.org/10.1007/0-387-23152-8_65