A Preprocessing and Analyzing Method of Images in PDF Documents for Mathematical Expression Retrieval
- Source :
- TELKOMNIKA Indonesian Journal of Electrical Engineering. 12
- Publication Year :
- 2014
- Publisher :
- Institute of Advanced Engineering and Science, 2014.
Abstract
- PDF documents are an important information resource for a mathematical expression retrieval system. Image objects are a major component of PDF documents, and before content-based searching is possible they must first be converted to coded form using character recognition and document analysis technology. The quality of these images is therefore the key factor that determines the correctness of the conversion. Considering the characteristics of PDF images and mathematical expressions, the paper proposes a preprocessing and analysis method comprising modules for PDF image extraction, graying, binarization, denoising, skew correction, and layout parameter detection. The features of mathematical expressions are taken into account to avoid information loss during image conversion and to avoid the adverse interference that formulas cause in the analysis and correction steps. The experimental results show that the method is effective in improving the accuracy and efficiency of document image recognition, analysis, and retrieval. DOI: http://dx.doi.org/10.11591/telkomnika.v12i6.5440
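The graying and binarization modules named in the abstract can be illustrated with a minimal sketch. This record does not describe the paper's actual algorithms, so everything below is an assumption: BT.601 luminance weights stand in for the graying step, Otsu's global threshold stands in for binarization, and the function names `to_gray`, `otsu_threshold`, and `binarize` are hypothetical.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to 0-255 luminance (BT.601 weights, assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_rows]


def otsu_threshold(gray_rows):
    """Return the gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray_rows:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]          # pixels with value <= t
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue             # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray_rows, threshold):
    """Map dark pixels (<= threshold) to 1 (ink) and light pixels to 0 (paper)."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray_rows]


if __name__ == "__main__":
    # Tiny synthetic "page": black ink on a white background.
    page = [[(0, 0, 0), (255, 255, 255)],
            [(255, 255, 255), (0, 0, 0)]]
    gray = to_gray(page)
    print(binarize(gray, otsu_threshold(gray)))
```

A global threshold is the simplest choice. Pages with uneven illumination, or with thin strokes such as fraction bars and small subscripts, may call for a locally adaptive threshold instead.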
Details
- ISSN :
- 2087-278X and 2302-4046
- Volume :
- 12
- Database :
- OpenAIRE
- Journal :
- TELKOMNIKA Indonesian Journal of Electrical Engineering
- Accession number :
- edsair.doi...........5777098085df64dda8daaf28822a45c1
- Full Text :
- https://doi.org/10.11591/telkomnika.v12i6.5440