Back to Search
Start Over
Universal Graph Compression: Stochastic Block Models
- Source :
- ISIT
- Publication Year :
- 2021
- Publisher :
- IEEE, 2021.
-
Abstract
- Motivated by the prevalent data science applications of processing and mining large-scale graph data such as social networks, web graphs, and biological networks, as well as the high I/O and communication costs of storing and transmitting such data, this paper investigates lossless compression of data appearing in the form of a labeled graph. A universal graph compression scheme is proposed, which does not depend on the underlying statistics/distribution of the graph model. For graphs generated by a stochastic block model, which is a widely used random graph model capturing the clustering effects in social networks, the proposed scheme achieves the optimal theoretical limit of lossless compression without the need to know edge probabilities, community labels, or the number of communities. The key ideas in establishing universality for stochastic block models include: 1) block decomposition of the adjacency matrix of the graph; 2) generalization of the Krichevsky-Trofimov probability assignment, which was initially designed for i.i.d. random processes. In four benchmark graph datasets (protein-to-protein interaction, LiveJournal friendship, Flickr, and YouTube), the compressed files from competing algorithms (including CSR, Ligra+, PNG image compressor, and Lempel-Ziv compressor for two-dimensional data) take 2.4 to 27 times the space needed by the proposed scheme.
- Subjects :
- FOS: Computer and information sciences
Random graph
Discrete mathematics
Computer science
Information Theory (cs.IT)
Computer Science - Information Theory
Databases (cs.DB)
Mathematics - Statistics Theory
020206 networking & telecommunications
Data compression ratio
Statistics Theory (math.ST)
Data_CODINGANDINFORMATIONTHEORY
02 engineering and technology
Computer Science - Databases
Stochastic block model
FOS: Mathematics
0202 electrical engineering, electronic engineering, information engineering
Entropy (information theory)
Adjacency matrix
Cluster analysis
Block (data storage)
Universal graph
Subjects
Details
- Database :
- OpenAIRE
- Journal :
- 2021 IEEE International Symposium on Information Theory (ISIT)
- Accession number :
- edsair.doi.dedup.....42d7a0affd9be0e13ee911c4d8e688f3