1. A Comprehensive Evaluation of Metadata-Based Features to Classify Research Paper’s Topics
- Author
- Muhammad Usman, Ghulam Mustafa, Muhammad Afzal, Anis Koubaa, and Abdul Shahid
- Subjects
- Root (linguistics), Hierarchy, Information retrieval, General Computer Science, Exploit, business.industry, Computer science, Document classification, Deep learning, General Engineering, Decision tree, computer.software_genre, Random forest, Metadata, General Materials Science, Artificial intelligence, Electrical and Electronic Engineering, business, computer
- Abstract
- The existing plethora of document classification techniques exploits different data sources, drawn either from the content or from the metadata of research articles. Various journal publishers, such as Springer, Elsevier, and IEEE, do not provide open access to the content of research articles, whereas the metadata is freely available. Metadata such as title, keywords, and abstract can serve as a better alternative to the content in various scenarios. In the current literature, researchers have assessed the role of some metadata elements individually. We believe that the collective contribution of metadata parameters can play a significant role in classifying research papers. This paper presents a comprehensive evaluation of the role of metadata, individually as well as in combination, in research paper classification. Moreover, we have classified the research articles into the root categories of the ACM hierarchy (e.g., general literature, hardware, software). In this evaluation, we have assessed all possible combinations of metadata features against different classifiers such as Random Forest, K-Nearest Neighbor, and Decision Tree. The results reveal that the title and keywords combination outperforms the other combinations, with an F-measure of 0.88.
- Published
- 2021
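
The abstract describes assessing every combination of metadata features and reporting an F-measure. The following is a minimal sketch of that enumeration step only, not the authors' implementation. The field list (`title`, `keywords`, `abstract`) is an assumption taken from the examples named in the abstract, the helper names are hypothetical, and the F-measure is assumed to be the balanced (F1) form.

```python
from itertools import combinations

# Assumed field names, taken from the examples in the abstract;
# the paper itself may evaluate a different or larger set.
METADATA_FIELDS = ("title", "keywords", "abstract")


def feature_combinations(fields):
    """Return every non-empty subset of fields, smallest subsets first."""
    return [
        combo
        for size in range(1, len(fields) + 1)
        for combo in combinations(fields, size)
    ]


def combine_text(record, combo):
    """Join the chosen metadata fields of one paper into a single string."""
    return " ".join(record[field] for field in combo)


def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


combos = feature_combinations(METADATA_FIELDS)
for combo in combos:
    print(" + ".join(combo))
```

Each combination would then be turned into text with `combine_text`, vectorized, and passed to a classifier of choice; that part is omitted here because the abstract does not specify the feature representation.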