Linked Data Triples Enhance Document Relevance Classification

Authors :
Peter W. Eklund
Bahadorreza Ofoghi
Dinesh Nagumothu
Mohamed Reda Bouadjenek
Source :
Applied Sciences, Vol 11, Iss 14, p 6636 (2021)
Publication Year :
2021
Publisher :
MDPI AG, 2021.

Abstract

Standardized approaches to relevance classification in information retrieval use generative statistical models to identify the presence or absence of certain topics that might make a document relevant to the searcher. These approaches have been used to better predict relevance on the basis of what the document is “about”, rather than on a simple analysis of the bag of words contained within the document. More recently, this idea has been extended by using pre-trained deep learning models and text representations, such as GloVe or BERT. These use an external corpus as a knowledge base that conditions the model to help predict what a document is about. This paper adopts a hybrid approach that leverages the structure of knowledge embedded in a corpus. In particular, the paper reports on experiments in which linked data triples (subject-predicate-object), constructed from natural language elements, are derived using deep learning. These are evaluated as additional latent semantic features for a relevant-document classifier in a customized news-feed website. The research is a synthesis of current thinking in deep learning models in NLP and information retrieval and the predicate structure used in semantic web research. Our experiments indicate that linked data triples increased the F-score of the baseline GloVe representations by 6% and show significant improvement over state-of-the-art models, like BERT. The findings are tested and empirically validated on an experimental dataset and on two standardized pre-classified news sources, namely the Reuters and 20 Newsgroups datasets.
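
As a rough illustration of the idea in the abstract, the sketch below shows how subject-predicate-object triples could be added to a document's text features before classification. It is not the paper's method: the paper derives triples with deep learning and uses GloVe representations, whereas this sketch assumes a hand-written triple and plain counts as a stand-in. The function names (`bag_of_words`, `triple_features`, `combine`) and the example document are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercased, whitespace-separated tokens."""
    return Counter(text.lower().split())


def triple_features(triples):
    """Count each (subject, predicate, object) triple as a single feature."""
    return Counter("|".join(t).lower() for t in triples)


def combine(text, triples):
    """Merge word counts and triple counts into one feature dictionary."""
    features = {f"word={w}": c for w, c in bag_of_words(text).items()}
    features.update(
        {f"triple={t}": c for t, c in triple_features(triples).items()}
    )
    return features


# Hypothetical document and triple; in the paper the triples come from a
# deep learning extraction step, which is not reproduced here.
doc = "Reuters reports that Acme acquired Widgets"
triples = [("Acme", "acquired", "Widgets")]
features = combine(doc, triples)
```

The combined dictionary could then be passed to any classifier that accepts sparse features. Prefixing the keys keeps word features and triple features from colliding.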

Details

Language :
English
ISSN :
20763417
Volume :
11
Issue :
14
Database :
OpenAIRE
Journal :
Applied Sciences
Accession number :
edsair.doi.dedup.....37ecdb9af2f831739b1e09ac6b811bbe