Back to Search Start Over

\mathsf pSCAN : Fast and Exact Structural Graph Clustering.

Authors :
Chang, Lijun
Li, Wei
Qin, Lu
Zhang, Wenjie
Yang, Shiyu
Source :
IEEE Transactions on Knowledge & Data Engineering. Feb2017, Vol. 29 Issue 2, p387-401. 15p.
Publication Year :
2017

Abstract

We study the problem of structural graph clustering, a fundamental problem in managing and analyzing graph data. Given an undirected unweighted graph, structural graph clustering is to assign vertices to clusters, and to identify the sets of hub vertices and outlier vertices as well, such that vertices in the same cluster are densely connected to each other while vertices in different clusters are loosely connected. In this paper, we develop a new two-step paradigm for scalable structural graph clustering based on our three observations. Then, we present a \mathsf pSCAN<alternatives> <inline-graphic xlink:href="chang-ieq2-2618795.gif"/></alternatives> approach, within the paradigm, aiming to reduce the number of structural similarity computations, and propose optimization techniques to speed up checking whether two vertices are structure-similar. \mathsf pSCAN <alternatives><inline-graphic xlink:href="chang-ieq3-2618795.gif"/></alternatives> outputs exactly the same clusters as the existing approaches \mathsf SCAN <alternatives><inline-graphic xlink:href="chang-ieq4-2618795.gif"/></alternatives> and \mathsf SCAN\text++<alternatives> <inline-graphic xlink:href="chang-ieq5-2618795.gif"/></alternatives>, and we prove that \mathsf pSCAN<alternatives> <inline-graphic xlink:href="chang-ieq6-2618795.gif"/></alternatives> is worst-case optimal. Moreover, we propose efficient techniques for updating the clusters when the input graph dynamically changes, and we also extend our techniques to other similarity measures, e.g., Jaccard similarity. Performance studies on large real and synthetic graphs demonstrate the efficiency of our new approach and our dynamic cluster maintenance techniques. Noticeably, for the twitter graph with 1 billion edges, our approach takes 25 minutes while the state-of-the-art approach cannot finish even after 24 hours. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
10414347
Volume :
29
Issue :
2
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
120763983
Full Text :
https://doi.org/10.1109/TKDE.2016.2618795