A Preprocessing and Analyzing Method of Images in PDF Documents for Mathematical Expression Retrieval

Authors :
Jing Sun
Xuedong Tian
Botao Yu
Source :
TELKOMNIKA Indonesian Journal of Electrical Engineering. 12
Publication Year :
2014
Publisher :
Institute of Advanced Engineering and Science, 2014.

Abstract

PDF documents are an important information resource for mathematical expression retrieval systems. Image objects are a major component of PDF documents, and before content-based searching is possible they must first be converted to coded form with the help of character recognition and document analysis technology. The quality of these images is therefore the key factor that determines the correctness of this conversion. Considering the characteristics of PDF images and mathematical expressions, a preprocessing and analysis method is proposed that comprises modules for PDF image extraction, graying, binarization, denoising, skew correction, and layout parameter detection. The features of mathematical expressions are taken into account to avoid information loss during image conversion and to keep formulas from interfering with the analysis and correction processes. The experimental results show that the method is effective in improving the accuracy and efficiency of document image recognition, analysis, and retrieval.

DOI : http://dx.doi.org/10.11591/telkomnika.v12i6.5440
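
The graying and binarization modules named in the abstract can be illustrated with a minimal sketch. This record does not describe the algorithms the authors actually used, so the code below substitutes two common choices as assumptions: BT.601 luma weights for graying and Otsu's global threshold for binarization. The function names (`to_gray`, `otsu_threshold`, `binarize`) are hypothetical, and denoising, skew correction, and layout parameter detection are not covered.

```python
# Illustrative sketch only: NOT the method from the paper.
# Assumptions: BT.601 luma weights for graying, Otsu's method for binarization.

def to_gray(pixels):
    """Convert rows of (R, G, B) tuples into rows of 0-255 gray levels."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in pixels
    ]

def otsu_threshold(gray):
    """Return the gray level t that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with level <= t
    sum0 = 0    # sum of levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break             # no pixels left in the bright class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray, t):
    """Mark pixels at or below the threshold as ink (1), the rest as background (0)."""
    return [[1 if v <= t else 0 for v in row] for row in gray]

# Usage: a tiny white page with one dark vertical stroke.
WHITE, DARK = (255, 255, 255), (20, 20, 20)
page = [[WHITE, DARK, WHITE],
        [WHITE, DARK, WHITE]]
gray = to_gray(page)
binary = binarize(gray, otsu_threshold(gray))
```

A global threshold is the simplest option; scanned pages with uneven illumination often call for a locally adaptive threshold instead, which this sketch does not attempt.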

Details

ISSN :
2087-278X and 2302-4046
Volume :
12
Database :
OpenAIRE
Journal :
TELKOMNIKA Indonesian Journal of Electrical Engineering
Accession number :
edsair.doi...........5777098085df64dda8daaf28822a45c1
Full Text :
https://doi.org/10.11591/telkomnika.v12i6.5440