1. Exploiting similarities across multiple dimensions for author name disambiguation
- Author
- Joydeep Chandra, Samrat Mondal, and K. M. Pooja
- Subjects
- Computer science, General Social Sciences, Ambiguity, Library and Information Sciences, Computer Science Applications, Margin (machine learning), Metric (mathematics), Pairwise comparison, Artificial intelligence, Cluster analysis, Representation (mathematics), Feature learning, Natural language processing, Statistical hypothesis testing
- Abstract
- In bibliometric analysis, ambiguity in author names may lead to erroneous aggregation of records. Author name disambiguation techniques attempt to address this issue by attributing records to their respective authors. Name disambiguation has been widely studied as a clustering task. However, maintaining consistent accuracy across datasets remains a major challenge. Recent efforts use representation-learning-based techniques to map records to an embedding space from which the clusters can be determined. However, some of these models, which use supervised global embeddings, fail to generalize across datasets, while others lag in accuracy. In this paper, we propose a method that uses two independent relations among documents, co-authorship and document meta-content, to generate a latent representation of documents that generalizes over various datasets (consisting of different sets of features). Through rigorous validation, we find that the proposed approach outperforms several state-of-the-art methods by a significant margin on standard measures such as pairwise F1, K metric, and BF1 scores. We have also validated the performance of our method with a statistical test.
- Published
- 2021