GKEEP: An Enhanced Graph‐Based Keyword Extractor With Error‐Feedback Propagation for Geoscience Reports.
- Source :
- Earth & Space Science; May 2021, Vol. 8 Issue 5, p1-22, 22p
- Publication Year :
- 2021
Abstract
- As the amount of published geoscience literature grows, reading and summarizing large collections of texts has become a challenging task. Publication keywords can be considered basic components of knowledge structure representations and have been used to reveal knowledge concerning research domains. Compared with other research domains, few studies have addressed keyword extraction from textual geoscience data. In this paper, we propose an unsupervised algorithm, the graph‐based keyword extractor with error‐feedback propagation (GKEEP), that enhances graph‐based keyword extraction approaches by using an error‐feedback mechanism similar to the concept of backpropagation. The proposed approach comprises the following steps. A preprocessed document is used as the input of the proposed model and is represented as a weighted undirected graph, where the vertices represent words and the edges represent the cooccurrence relationship between the words constrained by a window size. Subsequently, its nodes are ranked by their importance scores calculated by a graph‐based ranking algorithm. Each word thus receives its own score, and these word scores are used to compute the scores of keyword candidates. Subsequently, the Word2Vec method is applied to recalculate the scores of keyword candidates and rank the keyword candidates to select the final keywords. The model also utilizes error feedback to boost the rankings of the most salient terms that would otherwise be deemed less important. In empirical experiments on two real data sets (including our newly built data set), the proposed GKEEP model outperforms state‐of‐the‐art unsupervised models and the existing graph‐based ranking models. The proposed method can effectively reflect intrinsic keyword semantics and interrelationships.
Plain Language Summary: Common or frequently used terms receive higher scores in traditional graph‐based extraction because more edges are connected to them. This paper proposes a graph‐based keyword extraction (KE) algorithm called KE using error‐feedback propagation, which utilizes the semantics of word embedding to assist in extracting keywords from geoscience reports. We hope that our approach will serve as an alternative method that deserves further study.
Key Points:
- Word embedding is incorporated to capture the dependency structure as well as the data distribution, and it computes semantic relations to solve the content sparsity problem
- Error feedback is utilized to boost the most salient terms that graph‐based approaches deem less important
- A set of experiments verifies the effectiveness of the proposed method on two available manually constructed data sets
[ABSTRACT FROM AUTHOR]
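The graph construction and ranking steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' GKEEP implementation: the abstract does not name the ranking algorithm or the candidate-scoring rule, so the sketch assumes a TextRank-style weighted ranking and assumes that a candidate's score is the sum of its word scores. The Word2Vec rescoring and error-feedback steps are omitted because the abstract gives no detail on them. All function names, parameters, and the sample tokens are hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build a weighted undirected graph from a preprocessed token list.

    Vertices are words; an edge weight counts how often two distinct words
    appear at most `window - 1` positions apart.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Score vertices with a TextRank-style weighted ranking (an assumption;
    the abstract only says "a graph-based ranking algorithm")."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        updated = {}
        for word in graph:
            # Each neighbor passes on its score in proportion to edge weight.
            incoming = sum(
                graph[nbr][word] / sum(graph[nbr].values()) * scores[nbr]
                for nbr in graph[word]
            )
            updated[word] = (1.0 - damping) + damping * incoming
        scores = updated
    return scores


def score_candidate(candidate_words, scores):
    """Hypothetical candidate score: the sum of the member word scores."""
    return sum(scores.get(word, 0.0) for word in candidate_words)


# Toy input standing in for a preprocessed geoscience report.
tokens = "seismic fault rock fault basin fault mantle".split()
graph = build_cooccurrence_graph(tokens, window=2)
scores = rank_words(graph)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy input the word "fault" is connected to every other word, so it ranks first. That is the bias toward frequent terms noted in the Plain Language Summary, which the embedding-based rescoring and error feedback are meant to offset.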
Details
- Language :
- English
- ISSN :
- 23335084
- Volume :
- 8
- Issue :
- 5
- Database :
- Complementary Index
- Journal :
- Earth & Space Science
- Publication Type :
- Academic Journal
- Accession number :
- 150673308
- Full Text :
- https://doi.org/10.1029/2020EA001602