Duplicate detection in web shops using LSH to reduce the number of computations
- Source :
- SAC, 31st Annual ACM Symposium on Applied Computing (SAC 2016), pp. 772-779
- Publication Year :
- 2016
- Publisher :
- ACM, 2016.
Abstract
- The number of online shops is growing daily, and many Web shops focus on the same product types, such as consumer electronics. Since Web shops use different product representations, it is hard to compare products among different Web shops. Duplicate detection methods aim to solve this problem by identifying the same products in different Web shops. In this paper, we focus on reducing the computation time of a state-of-the-art duplicate detection algorithm. First, we construct uniform vector representations for the products. We use these vectors as input for a Locality Sensitive Hashing (LSH) algorithm, which pre-selects potential duplicates. Finally, duplicate products are found by applying the Multi-component Similarity Method (MSM). Compared to the original MSM, the number of needed computations can be reduced by 95% with only a minor decrease of 9% in the F1-measure.
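The LSH pre-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the vector representation or the hash family, so everything below is an assumption made for illustration: products are represented by the set of word tokens in a hypothetical title string, signatures are computed with MinHash, and candidates are found by banding. The function names and the parameters `num_hashes` and `bands` are hypothetical, and MSM itself is not implemented; the returned pairs are only the candidates that a more expensive similarity method would then compare.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(title):
    """Represent a product title as a set of lowercase word tokens (assumption)."""
    return set(title.lower().split())


def _hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash of the token set under each seed.

    Assumes the token set is non-empty.
    """
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(titles, num_hashes=32, bands=16):
    """Return index pairs whose signatures agree on at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for idx, title in enumerate(titles):
        sig = minhash_signature(shingles(title), num_hashes)
        for b in range(bands):
            # Products sharing a full band of signature rows land in one bucket.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


titles = [
    "Samsung 40 inch LED TV",
    "samsung 40 inch led tv",
    "Philips coffee maker 1000W",
]
pairs = candidate_pairs(titles)
```

Only the pairs returned by `candidate_pairs` would be passed on to the expensive comparison, which is where a reduction in the number of computations would come from. More bands with fewer rows each make the filter more permissive; fewer bands with more rows make it stricter.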
- Subjects :
- Focus (computing)
Similarity (geometry)
Computer science
Computation
02 engineering and technology
Construct (python library)
computer.software_genre
Duplicate detection
Locality-sensitive hashing
020204 information systems
Product (mathematics)
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
computer
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 31st Annual ACM Symposium on Applied Computing
- Accession number :
- edsair.doi.dedup.....cf7311cd3c24bb94547b9d16e9dbbcce