PROCESSING IMAGES OF SALES RECEIPTS FOR ISOLATING AND RECOGNISING TEXT INFORMATION
- Source :
- Vestnik Dagestanskogo Gosudarstvennogo Tehničeskogo Universiteta: Tehničeskie Nauki, Vol 46, Iss 4, Pp 113-122 (2020)
- Publication Year :
- 2020
- Publisher :
- Daghestan State Technical University, 2020.
Abstract
- Objectives. This article presents an application that processes scanned images of sales receipts so that their text can then be extracted with the Tesseract OCR engine. Such an application is useful for keeping a family budget or for bookkeeping in small companies. The main difficulty in recognising receipts is the low quality of the ink and paper, which leads to creases and tears as well as rapid fading of the printed characters.
  Methods. The study relies on a set of algorithms built on mathematical morphology, namely opening, closing and the morphological gradient, together with image conversion. These steps can significantly improve Tesseract's final character recognition.
  Results. To address this problem, a dedicated image normalisation algorithm is proposed. It locates the receipt in the image, processes the extracted region, removes defects introduced by image capture and by the medium (the receipt paper), and applies point processing to restore missing characters. The resulting application achieves higher text recognition accuracy with Tesseract OCR.
  Conclusion. The developed system recognises characters with fairly high accuracy and outperforms unmodified Tesseract, although it is still less accurate than ABBYY FineReader. Methods for further improving the algorithm are also proposed.
- Subjects :
- Technology
Morphological gradient
Artificial neural network
Point (typography)
Computer science
business.industry
sales receipts
020207 software engineering
Pattern recognition
Image processing
02 engineering and technology
Mathematical morphology
neural networks
Image conversion
ocr
image analysis
0202 electrical engineering, electronic engineering, information engineering
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
020201 artificial intelligence & image processing
Tesseract
Artificial intelligence
Closing (morphology)
business
Details
- Language :
- Russian
- ISSN :
- 2073-6185
- Volume :
- 46
- Issue :
- 4
- Database :
- OpenAIRE
- Journal :
- Vestnik Dagestanskogo Gosudarstvennogo Tehničeskogo Universiteta: Tehničeskie Nauki
- Accession number :
- edsair.doi.dedup.....c0db8e6adbde91e59f5affe076949d6c
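The abstract above names morphological opening, closing and the morphological gradient as the core of the receipt normalisation step, but the record does not give the authors' structuring elements, thresholds or order of operations. The sketch below is a hypothetical, pure-Python illustration of those three operations, assuming an image already binarised so that ink pixels are 1 and paper is 0, and a square structuring element of a chosen radius. The names `binarize`, `dilate`, `erode`, `opening`, `closing` and `gradient` are illustrative only; in practice OpenCV's `cv2.morphologyEx` with `cv2.MORPH_OPEN`, `cv2.MORPH_CLOSE` or `cv2.MORPH_GRADIENT` performs the same operations on real images.

```python
from typing import Callable, List, Sequence

# Hypothetical image type: rows of pixel values (binary 0/1 or grayscale).
Image = List[List[int]]


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Mark dark (ink) pixels as 1 and light (paper) pixels as 0.

    The fixed threshold is an assumption; faded receipts usually need
    an adaptive threshold instead.
    """
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def _window_filter(img: Image, radius: int,
                   reducer: Callable[[Sequence[int]], int]) -> Image:
    """Apply `reducer` (max or min) over a (2*radius+1)-square window.

    Neighbours outside the image are skipped, which behaves like padding
    with -inf for max and +inf for min, so the border itself never adds
    or removes foreground.
    """
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(reducer(window))
        out.append(row)
    return out


def dilate(img: Image, radius: int = 1) -> Image:
    # Grows foreground (ink) regions: each pixel takes its window maximum.
    return _window_filter(img, radius, max)


def erode(img: Image, radius: int = 1) -> Image:
    # Shrinks foreground regions: each pixel takes its window minimum.
    return _window_filter(img, radius, min)


def opening(img: Image, radius: int = 1) -> Image:
    # Erosion then dilation: deletes foreground specks smaller than the window.
    return dilate(erode(img, radius), radius)


def closing(img: Image, radius: int = 1) -> Image:
    # Dilation then erosion: bridges small gaps inside foreground strokes.
    return erode(dilate(img, radius), radius)


def gradient(img: Image, radius: int = 1) -> Image:
    # Dilation minus erosion: highlights the outlines of strokes.
    d, e = dilate(img, radius), erode(img, radius)
    return [[a - b for a, b in zip(rd, re)] for rd, re in zip(d, e)]
```

With a radius of 1, `closing` bridges a one-pixel break in a horizontal stroke (for example a faded dot in a printed digit), while `opening` deletes isolated specks smaller than the 3x3 window. Opening would also erase strokes only one pixel thick, so the window size has to be matched to the stroke width of the receipt font.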