1. Partition-Merge: Distributed Inference and Modularity Optimization
- Author
Vincent D. Blondel, Kyomin Jung, Seungpil Won, Pushmeet Kohli, Devavrat Shah, and UCL - SST/ICTM - Institute of Information and Communication Technologies, Electronics and Applied Mathematics
- Subjects
Polynomial; General Computer Science; Computer science; Computation; 0102 computer and information sciences; 02 engineering and technology; 01 natural sciences; Approximate MAP; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Maximum a posteriori estimation; General Materials Science; Cluster analysis; Modularity (networks); Markov random field; General Engineering; Approximation algorithm; graphical model; partition; modularity optimization; 010201 computation theory & mathematics; Distributed algorithm; lcsh:Electrical engineering. Electronics. Nuclear engineering; lcsh:TK1-9971; Algorithm
- Abstract
This paper presents a novel meta-algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our novel randomized partitioning scheme, runs the centralized algorithm on each partition separately, and then stitches the resulting solutions together to produce a global solution. We demonstrate the efficiency of the PM algorithm on two popular problems: computation of the Maximum A Posteriori (MAP) assignment in an arbitrary pairwise Markov Random Field (MRF), and modularity optimization for community detection. We show that the resulting distributed algorithms for these problems are fast, running in time linear in the number of nodes in the graph. Furthermore, PM achieves performance comparable to, or even better than, that of the centralized algorithms as long as the graph has the polynomial growth property. More precisely, if the centralized algorithm is a $\mathcal{C}$-factor approximation with constant $\mathcal{C} \ge 1$, the resulting distributed algorithm is a $(\mathcal{C}+\delta)$-factor approximation for any small $\delta > 0$; and even if the centralized algorithm is only a non-constant (e.g., logarithmic) factor approximation, the resulting distributed algorithm is a constant-factor approximation. For general graphs, we compute explicit bounds on the loss of performance of the resulting distributed algorithm with respect to the centralized algorithm. To show the efficiency of our algorithm, we conducted extensive experiments on both real-world and synthetic networks. The experiments demonstrate that the PM algorithm provides a good trade-off between accuracy and running time.
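The partition, solve, and merge steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the randomized partitioning scheme, so `random_ball_partition` (growing breadth-first balls of a fixed radius around randomly ordered centers) is a hypothetical stand-in, and `toy_solver` is a placeholder for whatever centralized algorithm would be plugged in. The names `partition_merge`, `radius`, and `seed` are likewise hypothetical.

```python
import random
from collections import deque


def random_ball_partition(adj, radius, rng):
    """Split the nodes into disjoint parts by growing BFS balls.

    Hypothetical stand-in for the paper's randomized partitioning scheme,
    which the abstract does not describe.
    """
    unassigned = set(adj)
    order = list(adj)
    rng.shuffle(order)  # visit candidate centers in random order
    parts = []
    for center in order:
        if center not in unassigned:
            continue  # already absorbed into an earlier ball
        part = {center}
        unassigned.discard(center)
        queue = deque([(center, 0)])
        while queue:
            node, dist = queue.popleft()
            if dist == radius:
                continue  # do not grow past the ball radius
            for nb in adj[node]:
                if nb in unassigned:
                    unassigned.discard(nb)
                    part.add(nb)
                    queue.append((nb, dist + 1))
        parts.append(part)
    return parts


def partition_merge(adj, solve, radius=2, seed=0):
    """Partition the graph, run `solve` on each induced subgraph, merge."""
    rng = random.Random(seed)
    parts = random_ball_partition(adj, radius, rng)
    solution = {}
    for part in parts:
        # Induced subgraph: keep only edges with both endpoints in the part.
        sub = {u: [v for v in adj[u] if v in part] for u in part}
        # Parts are disjoint, so merging is a plain dictionary union.
        solution.update(solve(sub))
    return parts, solution


def toy_solver(sub):
    """Placeholder 'centralized algorithm': one label per subgraph."""
    label = min(sub)
    return {u: label for u in sub}
```

Because the parts are disjoint, each call to `solve` is independent of the others and could run on a separate worker; the sequential loop here is only for brevity. This sketch ignores edges that cross part boundaries, so it says nothing about the approximation guarantees stated in the abstract.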
- Published
- 2021