1. Heterogeneous Network Crawling: Reaching Target Nodes by Motif-Guided Navigation
- Author
- Kevin Chen-Chuan Chang, Tao Qin, Pinghui Wang, Xiaohong Guan, and Changyu Wang
- Subjects
Computational Theory and Mathematics, Exploit, Computer science, Distributed computing, Motif (music), Crawling, Probabilistic inference, Web crawler, Small set, Heterogeneous network, Computer Science Applications, Information Systems
- Abstract
With numerous nodes on online heterogeneous networks, how to reach and extract target nodes of our specific interests is a pressing problem. In this paper, we propose a novel heterogeneous network crawler, MCrawl. It addresses the problem via iterative online heterogeneous network crawling by navigating its available APIs, starting from a set of target nodes, i.e., seed nodes. We are facing two challenges towards addressing the problem. First, to navigate within a vast network, how do we start from a small set of target nodes? In other words, which nodes in the current frontier and which direction shall we expand, to reach promising target nodes quickly? We propose motif-based crawling to exploit the complex structures and rich semantics of heterogeneous networks. Second, in many scenarios, we do not have a classifier to assess the quality of the harvested nodes and thus the motifs to expand. We develop a probabilistic inference framework to estimate the yield and harvest rates of motifs, achieving principled bootstrapping for crawling. In our experiments on real networks, MCrawl achieves significant margins over baselines.
- Published
- 2022