Efficient and Effective Academic Expert Finding on Heterogeneous Graphs through (k, P)-Core based Embedding.

Authors :
YUXIANG WANG
JUN LIU
XIAOLIANG XU
XIANGYU KE
TIANXING WU
XIAOXUAN GOU
Source :
ACM Transactions on Knowledge Discovery from Data; Jul2023, Vol. 17 Issue 6, p1-35, 35p
Publication Year :
2023

Abstract

Expert finding is crucial for a wealth of applications in both academia and industry. Given a user query and a trove of academic papers, expert finding aims at retrieving the most relevant experts for the query from the academic papers. Existing studies focus on embedding-based solutions that consider academic papers’ textual semantic similarities to a query via document representation and extract the top-n experts from the most similar papers. Beyond implicit textual semantics, however, papers’ explicit relationships (e.g., co-authorship) in a heterogeneous graph (e.g., DBLP) are critical for expert finding, because they help improve the representation quality. Despite their importance, the explicit relationships of papers generally have been ignored in the literature. In this article, we study expert finding on heterogeneous graphs by considering both the explicit relationships and implicit textual semantics of papers in one model. Specifically, we define the cohesive (k, P)-core community of papers w.r.t. a meta-path P (i.e., relationship) and propose a (k, P)-core based document embedding model to enhance the representation quality. Based on this, we design a proximity graph-based index (PG-Index) of papers and present a threshold algorithm (TA)-based method to efficiently extract top-n experts from papers returned by PG-Index. We further optimize our approach in two ways: (1) we boost effectiveness by considering the (k, P)-core community of experts and the diversity of experts’ research interests, to achieve high-quality expert representation from paper representation; and (2) we streamline expert finding, going from “extract top-n experts from top-m (m > n) semantically similar papers” to “directly return top-n experts”. This avoids returning a large number of top-m papers as intermediate data, thereby improving efficiency. Extensive experiments using real-world datasets demonstrate our approach’s superiority. [ABSTRACT FROM AUTHOR]
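
The abstract does not spell out the formal definition of a (k, P)-core, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes the meta-path P is Paper-Author-Paper (two papers are P-neighbors if they share an author) and that a (k, P)-core is the maximal set of papers in which every paper has at least k P-neighbors inside the set. The function names `p_neighbors` and `k_p_core` and the toy data are hypothetical.

```python
from collections import defaultdict


def p_neighbors(paper_authors):
    """Map each paper to the papers it reaches via Paper-Author-Paper (assumed meta-path)."""
    author_papers = defaultdict(set)
    for paper, authors in paper_authors.items():
        for author in authors:
            author_papers[author].add(paper)
    neighbors = {paper: set() for paper in paper_authors}
    for papers in author_papers.values():
        for paper in papers:
            neighbors[paper] |= papers - {paper}
    return neighbors


def k_p_core(paper_authors, k):
    """Peel papers with fewer than k P-neighbors until none remain (assumed definition)."""
    neighbors = p_neighbors(paper_authors)
    alive = set(neighbors)
    queue = [p for p in alive if len(neighbors[p]) < k]
    while queue:
        p = queue.pop()
        if p not in alive:
            continue  # already peeled via an earlier queue entry
        alive.discard(p)
        for q in neighbors[p]:
            if q in alive:
                neighbors[q].discard(p)
                if len(neighbors[q]) < k:
                    queue.append(q)
    return alive


# Hypothetical toy data: paper id -> set of author ids.
toy = {
    "p1": {"a", "b"},
    "p2": {"a", "b"},
    "p3": {"a", "c"},
    "p4": {"c"},
    "p5": {"d"},
}
core = k_p_core(toy, 2)
```

With this toy input and k = 2, papers p4 and p5 are peeled away because each has fewer than two co-authorship neighbors, leaving p1, p2, and p3 as the cohesive community. How such a community then feeds into the document embedding is described in the full article, not in this record.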

Subjects

Subjects :
COMMUNITIES
SEMANTICS
ALGORITHMS

Details

Language :
English
ISSN :
15564681
Volume :
17
Issue :
6
Database :
Complementary Index
Journal :
ACM Transactions on Knowledge Discovery from Data
Publication Type :
Academic Journal
Accession number :
163028861
Full Text :
https://doi.org/10.1145/3578365