A Novel Method for Image to Text Extraction Using Tesseract-OCR

Authors :: Sayan Kumar Garai
Ojaswita Paul
Upayan Dey
Sayan Ghoshal
Neepa Biswas
Sandip Mondal
Source :: American Journal of Electronics & Communication. 3:8-11
Publication Year :: 2022
Publisher :: Society for Makers, Artist, Researchers and Technologists, 2022.
Abstract: Text extraction can play a vital role in detecting valuable information in a selected image. The extraction process involves text detection, localization, marking, tracking, extraction, enhancement and, finally, recognition. Detecting text characters is difficult because they vary in size, style, font, orientation, alignment, contrast and color, and may appear on a textured background. Demand for information detection, indexing and retrieval from multimedia documents is growing, and several methods have been developed for extracting text from images. This article proposes a novel method for image-to-text extraction. We present a multiresolution morphology-based text segmentation process suited to images that contain various types of non-text elements, such as drawings, pictures and halftones. The Python library OpenCV is used for image processing and Tesseract is used for text extraction, while the Python Imaging Library (PIL) handles opening and manipulating images in many formats. We are also testing an application intended to produce correct output in every language.