A new method for duplicate document image detection with page layout

Authors :
Yafeng Li
Source :
2nd International Conference on Computer Vision, Image, and Deep Learning.
Publication Year :
2021
Publisher :
SPIE, 2021.

Abstract

Document images often appear in digital libraries, social media, e-mail, and similar channels. Duplicate copies of the same content burden the management system and waste network traffic and storage resources. This paper proposes a new algorithm for detecting duplicate document images in large-scale image data sets. The key idea of the proposed algorithm is to exploit the structure that page layout gives a document image. Text lines are extracted and used as the elementary features of the document image, and the Fréchet distance is introduced to measure the similarity of these features. Experimental results on different types of electronic documents show the advantages of the proposed algorithm in accuracy and stability.
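
The abstract does not say how text lines are encoded or which variant of the Fréchet distance is used, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes each page is reduced to an ordered list of text-line descriptors (here a hypothetical pair of normalized vertical position and normalized line width) and compares two pages with the discrete Fréchet distance computed by dynamic programming. The function name and the sample values are illustrative only.

```python
import math


def discrete_frechet(p, q):
    """Discrete Frechet distance between two ordered point sequences.

    p and q are non-empty sequences of equal-dimension points, for example
    one (y_position, width) tuple per text line. That feature choice is an
    assumption made for this sketch, not taken from the paper.
    """
    if not p or not q:
        raise ValueError("both sequences must be non-empty")
    n, m = len(p), len(q)
    # ca[i][j] holds the discrete Frechet distance between p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # Euclidean distance between the two points
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Cheapest way to reach (i, j), bounded below by the current pair.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Hypothetical text-line descriptors: (normalized top position, normalized width).
page_a = [(0.10, 0.80), (0.15, 0.78), (0.20, 0.40)]
page_b = [(0.10, 0.80), (0.15, 0.78), (0.20, 0.40)]  # same layout as page_a
page_c = [(0.10, 0.80), (0.30, 0.50), (0.60, 0.80)]  # different layout

same_layout = discrete_frechet(page_a, page_b)
other_layout = discrete_frechet(page_a, page_c)
```

Under these assumptions, a small distance suggests that two pages share a layout and are duplicate candidates, while a large distance suggests different pages. Any decision threshold would have to be tuned on real data.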

Details

Database :
OpenAIRE
Journal :
2nd International Conference on Computer Vision, Image, and Deep Learning
Accession number :
edsair.doi...........5ca38e60c9ba96a36962e8f4a1b48302