1. PDTL: Parallel and distributed triangle listing for massive graphs
- Author
-
Eiko Yoneki, George Panagopoulos, and Ilias Giechaskiel
- Subjects
Big Data ,Theoretical computer science ,Triangle Counting ,Computer science ,Triangle Listing ,Parallel algorithm ,Triangulation (social science) ,Parallel computing ,Orientation (graph theory) ,Graph ,Vertex (geometry) ,Indifference graph ,Massive Graphs ,Memory management ,Pathwidth ,Distributed algorithm ,Distributed Algorithm ,I/O-Efficient Algorithm ,Maximal independent set ,Parallel Algorithm ,MathematicsofComputing_DISCRETEMATHEMATICS - Abstract
This paper presents the first distributed triangle listing algorithm with provable CPU, I/O, Memory, and Network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectivity metrics. The majority of existing algorithms for massive graphs are sequential processing and distributed versions of algorithms do not guarantee their CPU, I/O, Memory or Network requirements. Our Parallel and Distributed Triangle Listing (PDTL) framework focuses on efficient external-memory access in distributed environments instead of fitting subgraphs into memory. It works by performing efficient orientation and load-balancing steps, and replicating graphs across machines by using an extended version of Hu et al.'s Massive Graph Triangulation algorithm. As a result, PDTL suits a variety of computational environments, from single-core machines to high-end clusters. PDTL computes the exact triangle count on graphs of over 6 billion edges and 1 billion vertices (e.g. Yahoo graphs), outperforming and using fewer resources than the state-of-the-art systems PowerGraph, OPT, and PATRIC by 2 times to 4 times. Our approach highlights the importance of I/O considerations in a distributed environment, which has received less attention in the graph processing literature.
- Published
- 2018
- Full Text
- View/download PDF