Start Over

Cohesive Subgraph Search Using Keywords in Large Networks.

Authors :: Zhu, Yuanyuan
Zhang, Qian
Qin, Lu
Chang, Lijun
Yu, Jeffrey Xu
Source :: IEEE Transactions on Knowledge & Data Engineering. Jan2022, Vol. 34 Issue 1, p178-191. 14p.
Publication Year :: 2022
Abstract: Keyword search has been widely studied to retrieve relevant substructures from graphs for a given set of keywords. However, existing well-studied approaches aim at finding compact trees/subgraphs containing the keywords, and ignore a critical measure, density, to represent how strongly and stably the keyword nodes are connected in the substructure. In this paper, given a set of keywords $Q = \lbrace w_1, w_2, \ldots, w_l\rbrace$ Q = { w 1 , w 2 ,... , w l } , we study the problem of finding a cohesive subgraph containing $Q$ Q with high density and compactness from a graph $G$ G . We model the cohesive subgraph based on a carefully chosen $k$ k -truss model, and formulate the problem of finding cohesive subgraphs for keyword queries as minimal dense truss search problem, i.e., finding minimal subgraph that maximizes the trussness covering $Q$ Q . However, unlike $k$ k -truss based community search that can be efficiently done based on the local search from a given set of nodes, minimal dense truss search for keyword queries is a nontrivial task as the subset of keyword nodes to be included in the retrieved substructure is previously unknown. To tackle this problem, we first design a novel hybrid KT-Index to keep the keyword and truss information compacly, and then propose an efficient algorithm that carries the search on KT-Index directly to find the dense truss with the maximum trussness $G_{den}$ G d e n without repeated accesses to the original graph. Then, we develop a novel refinement approach to extract minimal dense truss from the dense truss $G_{den}$ G d e n , by checking each node at most once based on the anti-monotonicity property derived from $k$ k -truss, together with several optimization strategies including batch based deletion, early-stop based deletion, and local exploration. Moreover, we also extend the proposed method to deal with the top- $r$ r search. Extensive experimental studies on real-world networks validated the effectiveness and efficiency of our approaches. [ABSTRACT FROM AUTHOR]