A new similarity measure for vector space models in text classification and information retrieval.

Authors :: Eminagaoglu, Mete
Source :: Journal of Information Science. Aug2022, Vol. 48 Issue 4, p463-476. 14p.
Publication Year :: 2022
Abstract: There are various models, methodologies and algorithms that can be used today for document classification, information retrieval and other text mining applications and systems. One of them is the vector space–based models, where distance metrics or similarity measures lie at the core of such models. Vector space–based model is one of the fast and simple alternatives for the processing of textual data; however, its accuracy, precision and reliability still need significant improvements. In this study, a new similarity measure is proposed, which can be effectively used for vector space models and related algorithms such as k-nearest neighbours (k-NN) and Rocchio as well as some clustering algorithms such as K-means. The proposed similarity measure is tested with some universal benchmark data sets in Turkish and English, and the results are compared with some other standard metrics such as Euclidean distance, Manhattan distance, Chebyshev distance, Canberra distance, Bray–Curtis dissimilarity, Pearson correlation coefficient and Cosine similarity. Some successful and promising results have been obtained, which show that this proposed similarity measure could be alternatively used within all suitable algorithms and models for information retrieval, document clustering and text classification. [ABSTRACT FROM AUTHOR]
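
Note: the abstract names seven standard baseline metrics but does not define the proposed measure. The following is a minimal sketch, in Python, of those baselines only, using their textbook definitions on dense term-weight vectors. The function names and the example vectors are assumptions made for illustration; nothing here reproduces the article's own measure or its experiments.

```python
import math

# Textbook definitions of the baseline metrics named in the abstract.
# Assumption: a and b are equal-length lists of non-negative term weights,
# and neither is the all-zero vector (otherwise some ratios are undefined).

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    # Components where both values are zero are skipped (a common convention).
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if x != 0 or y != 0)

def bray_curtis(a, b):
    return (sum(abs(x - y) for x, y in zip(a, b))
            / sum(abs(x + y) for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson(a, b):
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical term-weight vectors for two short documents.
doc_a = [1.0, 0.0, 2.0]
doc_b = [0.0, 1.0, 2.0]
```

The first five functions are distances (smaller means more alike), while cosine similarity and the Pearson coefficient are similarities (larger means more alike), so a k-NN or Rocchio classifier would need to sort neighbours in the opposite order depending on which kind it is given.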

Subjects :: *DOCUMENT clustering
*VECTOR spaces
*INFORMATION retrieval
*TEXT mining
*RESEMBLANCE (Philosophy)
*PEARSON correlation (Statistics)
*SIMILARITY (Psychology)

Details

Language :: English
ISSN :: 01655515
Volume :: 48
Issue :: 4
Database :: Academic Search Index
Journal :: Journal of Information Science
Publication Type :: Academic Journal
Accession number :: 157933773
Full Text :: https://doi.org/10.1177/0165551520968055

