1. An Empirical Evaluation of Document Embeddings and Similarity Metrics for Scientific Articles
- Author
Joaquin Gomez Sanchez, Pere-Pau Vázquez, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. ViRVIG - Grup de Recerca en Visualització, Realitat Virtual i Interacció Gràfica
- Subjects
Fluid Flow and Transfer Processes, Natural language processing, Process Chemistry and Technology, General Engineering, Computational linguistics, Deep learning, Document similarity, Similarity measures, Word embeddings, Computer Science Applications, General Materials Science, Computer science::Artificial intelligence::Natural language [UPC subject areas], Instrumentation
- Abstract
Comparing documents has a wide range of applications in several fields, such as article or patent search, bibliography recommendation systems, and the visualization of document collections. One of the key tasks that such problems have in common is the evaluation of a similarity metric. Many such metrics have been proposed in the literature, and deep learning techniques have lately gained a lot of popularity. However, it is difficult to analyze how those metrics perform against each other. In this paper, we present a systematic empirical evaluation of several of the most popular similarity metrics when applied to research articles. We analyze the results of those metrics in two ways: with a synthetic test that uses scientific papers and Ph.D. theses, and in a real-world scenario where we evaluate their ability to cluster papers from different areas of research. This research was supported by Project TIN2017-88515-C2-1-R, funded by the Ministerio de Economía y Competitividad under MCIN/AEI/10.13039/501100011033/FEDER “A way to make Europe”.
- Published
2022