Using citation networks to evaluate the impact of text length on keyword extraction.

Authors :
Tohalino, Jorge A. V.
Silva, Thiago C.
Amancio, Diego R.
Source :
PLoS ONE; 11/27/2023, Vol. 18 Issue 11, p1-17, 17p
Publication Year :
2023

Abstract

The identification of key concepts within unstructured data is of paramount importance in practical applications. Despite the abundance of proposed methods for extracting primary topics, only a few works have investigated the influence of text length on the performance of keyword extraction (KE) methods. Specifically, many studies rely on abstracts and titles for content extraction from papers, leaving it uncertain whether leveraging the complete content of papers can yield consistent results. Hence, in this study, we employ a network-based approach to evaluate the concordance between keywords extracted from abstracts and those from the entire papers. Community detection methods are utilized to identify interconnected papers in citation networks. Subsequently, paper clusters are formed to identify salient terms within each cluster, employing a methodology akin to the term frequency-inverse document frequency (tf-idf) approach. Once each cluster has been assigned its distinctive set of key terms, these terms serve as representative keywords at the paper level. The top-ranked words at the cluster level, which also appear in the abstract, are chosen as keywords for the paper. Our findings indicate that the various community detection methods used in KE yield similar levels of accuracy. Notably, text clustering approaches outperform all citation-based methods, while all approaches yield relatively low accuracy values. We also identified a lack of concordance between keywords extracted from the abstracts and those extracted from the corresponding full-text source. Considering that citations and text clustering yield distinct outcomes, combining them in hybrid approaches could offer improved performance. [ABSTRACT FROM AUTHOR]
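The pipeline described in the abstract (cluster papers, score terms per cluster with a tf-idf-like weight, then keep the top-ranked cluster terms that also occur in a paper's abstract) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the cluster labels in `cluster_of` have already been produced by some community detection method, that each cluster is treated as one "document" for the idf term, and that whitespace tokenization is sufficient. The function names, the `top_k` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase whitespace tokenizer; keeps alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]

def cluster_term_scores(texts, cluster_of):
    """Score terms per cluster with a tf-idf-like weight.

    texts: paper id -> text; cluster_of: paper id -> cluster label
    (assumed to come from a community detection step, not shown here).
    """
    counts = {}
    for pid, text in texts.items():
        counts.setdefault(cluster_of[pid], Counter()).update(tokenize(text))
    n_clusters = len(counts)
    # Document frequency: number of clusters in which each term occurs.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    return {
        label: {t: f * math.log(n_clusters / df[t]) for t, f in c.items()}
        for label, c in counts.items()
    }

def paper_keywords(abstract, term_scores, top_k=3):
    # Keep the highest-scoring cluster terms that also appear in the abstract.
    words = set(tokenize(abstract))
    ranked = sorted(term_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, s in ranked if s > 0 and t in words][:top_k]

# Toy data (hypothetical): four short "abstracts" in two assumed clusters.
texts = {
    "p1": "community detection in citation networks",
    "p2": "citation networks and community structure",
    "p3": "keyword extraction from text",
    "p4": "text length and keyword extraction",
}
cluster_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}

scores = cluster_term_scores(texts, cluster_of)
for pid, text in texts.items():
    print(pid, paper_keywords(text, scores[cluster_of[pid]]))
```

In this sketch a term that occurs in every cluster (here, "and") receives a score of zero and is never selected, which mirrors the role of the idf factor in down-weighting terms that do not distinguish one cluster from another.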

Subjects

Subjects :
CITATION networks

Details

Language :
English
ISSN :
19326203
Volume :
18
Issue :
11
Database :
Complementary Index
Journal :
PLoS ONE
Publication Type :
Academic Journal
Accession number :
173857672
Full Text :
https://doi.org/10.1371/journal.pone.0294500