PageRank based clustering of hypertext document collections
- Source :
- International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR), Jul 2008, Singapore, Singapore. pp. 873-874, doi:10.1145/1390334.1390549
- Publication Year :
- 2008
- Publisher :
- HAL CCSD, 2008.
Abstract
- Clustering hypertext document collections is an important task in information retrieval. Most clustering methods are based on document content and do not take the hypertext links into account. Here we propose a novel PageRank-based clustering (PRC) algorithm that uses the hypertext structure. The PRC algorithm produces graph partitionings with high modularity and coverage. A comparison of the PRC algorithm with two content-based clustering algorithms shows a good match between PRC clustering and content-based clustering.
- Subjects :
- Clustering high-dimensional data
DBSCAN
Fuzzy clustering
Computer science
Single-linkage clustering
Correlation clustering
Conceptual clustering
02 engineering and technology
01 natural sciences
010305 fluids & plasmas
Biclustering
[INFO.INFO-NI]Computer Science [cs]/Networking and Internet Architecture [cs.NI]
PageRank
CURE data clustering algorithm
0103 physical sciences
0202 electrical engineering, electronic engineering, information engineering
Cluster analysis
Brown clustering
Information retrieval
Graph partition
Directed graph
Document clustering
Data stream clustering
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Canopy clustering algorithm
FLAME clustering
Affinity propagation
020201 artificial intelligence & image processing
Data mining
Hierarchical clustering of networks
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR), Jul 2008, Singapore, Singapore. pp. 873-874, doi:10.1145/1390334.1390549
- Accession number :
- edsair.doi.dedup.....03b837669abc3eb25f86105e8e833a61
- Full Text :
- https://doi.org/10.1145/1390334.1390549