Improvement of Web Data Clustering Using Web Page Contents
- Source :
- Intelligent Information Processing II ISBN: 038723151X, Intelligent Information Processing
- Publication Year :
- 2006
- Publisher :
- Springer-Verlag, 2006.
Abstract
- This paper presents an approach that discovers clusters of Web pages based on both Web log data and Web page contents. Most existing Web log mining techniques are access-based approaches that statistically analyze the log data without paying much attention to the contents of the pages. The log data contains various kinds of noise, which can significantly influence the performance of pure access-based Web log mining. The method proposed in this paper considers not only the frequency of page co-occurrence in user access logs but also the Web page contents when clustering Web pages. We also present a method that uses information entropy to prune away irrelevant pages, which improves the performance of the Web page clustering.
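The abstract names three ingredients (page co-occurrence in access logs, page contents, and information entropy) without giving formulas. The following is a minimal Python sketch of how such ingredients might be computed and combined. It is not the authors' method: the function names, the bag-of-words cosine measure, the linear weighting `alpha`, and the idea of scoring a page by the entropy of its access counts are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence(sessions):
    """Count how often each unordered page pair appears in the same session."""
    counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def cosine(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def combined_similarity(a, b, cooc, n_sessions, contents, alpha=0.5):
    """Hypothetical blend: alpha weights access evidence, the rest content."""
    key = tuple(sorted((a, b)))
    access = cooc[key] / n_sessions if n_sessions else 0.0
    return alpha * access + (1 - alpha) * cosine(contents[a], contents[b])
```

Under these assumptions, a page whose accesses are spread evenly over many unrelated contexts would receive a high `entropy` score and could be treated as a pruning candidate before clustering on `combined_similarity`. The threshold and the exact distribution to measure are left open because the abstract does not specify them.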
Details
- ISBN :
- 978-0-387-23151-8
- 0-387-23151-X
- ISBNs :
- 9780387231518 and 038723151X
- Database :
- OpenAIRE
- Journal :
- Intelligent Information Processing II ISBN: 038723151X, Intelligent Information Processing
- Accession number :
- edsair.doi...........7faf2f6fd97574a056bbbcc16e85c9b7
- Full Text :
- https://doi.org/10.1007/0-387-23152-8_65