Linked Data Triples Enhance Document Relevance Classification
- Source :
- Applied Sciences, Vol 11, Iss 14, p 6636 (2021)
- Publication Year :
- 2021
- Publisher :
- MDPI AG, 2021.
- Abstract :
- Standardized approaches to relevance classification in information retrieval use generative statistical models to identify the presence or absence of certain topics that might make a document relevant to the searcher. These approaches have been used to better predict relevance on the basis of what the document is “about”, rather than a simple-minded analysis of the bag of words contained within the document. More recently, this idea has been extended by using pre-trained deep learning models and text representations, such as GloVe or BERT. These use an external corpus as a knowledge base that conditions the model to help predict what a document is about. This paper adopts a hybrid approach that leverages the structure of knowledge embedded in a corpus. In particular, the paper reports on experiments in which linked data triples (subject-predicate-object), constructed from natural language elements, are derived using deep learning. These are evaluated as additional latent semantic features for a document relevance classifier in a customized news-feed website. The research is a synthesis of current thinking in deep learning models in NLP and information retrieval and the predicate structure used in semantic web research. Our experiments indicate that linked data triples increased the F-score of the baseline GloVe representations by 6% and showed significant improvement over state-of-the-art models such as BERT. The findings are tested and empirically validated on an experimental dataset and on two standardized pre-classified news sources, namely the Reuters and 20 Newsgroups datasets.
- Subjects :
- Topic model
relevance classification
Technology
linked data triples
Computer science
QH301-705.5
named entities
QC1-999
topic modeling
02 engineering and technology
computer.software_genre
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
General Materials Science
Relevance (information retrieval)
Biology (General)
Instrumentation
Semantic Web
QD1-999
Fluid Flow and Transfer Processes
business.industry
Process Chemistry and Technology
Deep learning
Physics
General Engineering
deep learning
Linked data
Engineering (General). Civil engineering (General)
Computer Science Applications
Chemistry
Bag-of-words model
020201 artificial intelligence & image processing
Artificial intelligence
TA1-2040
business
computer
Classifier (UML)
Natural language
Natural language processing
Details
- Language :
- English
- ISSN :
- 20763417
- Volume :
- 11
- Issue :
- 14
- Database :
- OpenAIRE
- Journal :
- Applied Sciences
- Accession number :
- edsair.doi.dedup.....37ecdb9af2f831739b1e09ac6b811bbe