Pagerank based clustering of hypertext document collections

Authors :
Danil Nemirovsky
Son K. Pham
Vladimir Dobrynin
Elena Smirnova
Konstantin Avrachenkov
Models for the performance analysis and the control of networks (MAESTRO)
Inria Sophia Antipolis - Méditerranée (CRISAM)
Institut National de Recherche en Informatique et en Automatique (Inria)
St Petersburg State University (SPbU)
University of California [San Diego] (UC San Diego)
University of California (UC)
Source :
International ACM SIGIR Conference on Research & Development in Information Retrieval, Jul 2008, Singapore, Singapore. pp. 873-874, ⟨10.1145/1390334.1390549⟩, SIGIR
Publication Year :
2008
Publisher :
HAL CCSD, 2008.

Abstract

Clustering hypertext document collections is an important task in Information Retrieval. Most clustering methods are based on document content and do not take the hypertext links into account. Here we propose a novel PageRank-based clustering (PRC) algorithm that uses the hypertext structure. The PRC algorithm produces graph partitionings with high modularity and coverage. A comparison of the PRC algorithm with two content-based clustering algorithms shows a good match between PRC clustering and content-based clustering.
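
The abstract evaluates partitions by modularity and coverage but does not define them. As a rough illustration only, the sketch below computes both for a toy graph, assuming the common textbook definitions for an undirected, unweighted graph: coverage as the fraction of edges that lie inside clusters, and modularity in the Newman-Girvan form. The paper's own definitions (for example, for directed hyperlink graphs) may differ, and the function names, the toy graph, and the cluster labels are hypothetical. This is not the PRC algorithm itself, which the abstract does not describe.

```python
from collections import defaultdict


def coverage(edges, cluster_of):
    """Fraction of edges whose two endpoints fall in the same cluster."""
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra / len(edges)


def modularity(edges, cluster_of):
    """Newman-Girvan modularity for an undirected, unweighted graph.

    Q = sum over clusters c of (e_c / m) - (d_c / (2m))^2, where e_c is the
    number of edges inside c, d_c the total degree of c, and m the number
    of edges. (Assumed definition; the paper may use a variant.)
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the cluster
    degree = defaultdict(int)  # sum of node degrees per cluster
    for u, v in edges:
        degree[cluster_of[u]] += 1
        degree[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            intra[cluster_of[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles of pages joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cluster_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

print(coverage(edges, cluster_of))    # 6 of the 7 edges are intra-cluster
print(modularity(edges, cluster_of))  # equals 5/14 for this partition
```

On this toy graph the two-triangle partition leaves only the bridging link between clusters, so both scores are high compared with putting all six pages in one cluster, where coverage is 1 but modularity drops to 0.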

Details

Language :
English
Database :
OpenAIRE
Journal :
International ACM SIGIR Conference on Research & Development in Information Retrieval, Jul 2008, Singapore, Singapore. pp. 873-874, ⟨10.1145/1390334.1390549⟩, SIGIR
Accession number :
edsair.doi.dedup.....03b837669abc3eb25f86105e8e833a61
Full Text :
https://doi.org/10.1145/1390334.1390549