Co-occurrence Matrix of Oriented Gradients for Word Script and Nature Identification
- Authors
Asma Saidani, Afef Kacem, Abdel Belaïd; Technologie de l'Information et de la Communication (UTIC), École Supérieure des Sciences et Technologies de Tunis; Recognition of writing and analysis of documents (READ), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
Computer science, Arabic, 02 engineering and technology, computer.software_genre, k-nearest neighbors algorithm, Handwriting, Classifier (linguistics), 0202 electrical engineering, electronic engineering, information engineering, [INFO]Computer Science [cs], business.industry, 020207 software engineering, Pattern recognition, language.human_language, Support vector machine, Co-occurrence matrix, Identification (information), ComputingMethodologies_PATTERNRECOGNITION, Scripting language, language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Word (computer architecture), Natural language processing
- Abstract
In this paper, we propose a new scheme for script and nature identification. The objective is to discriminate, at the word level, between machine-printed and handwritten text and between Latin and Arabic scripts. This is a relatively complex task because of the possible use of multiple fonts and sizes, and because of the complexity and variability of handwriting. In the proposed script identification system, we extract features from word images using the Co-occurrence Matrix of Oriented Gradients (Co-MOG). Classification is performed with several different classifiers. Extensive experiments have been carried out on 24,000 words extracted from standard databases. The k-Nearest Neighbors (k-NN) classifier achieves an average identification accuracy of 99.85%, which clearly outperforms the results of some existing systems.
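The Co-MOG feature extraction and k-NN classification named in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation: the number of orientation bins (8), the four pixel offsets, the gradient-magnitude threshold, computing one matrix per offset over the whole word image, and L1 normalization are all assumptions chosen for illustration, in the spirit of co-occurrence histograms of oriented gradients (Co-HOG). The helper names `co_mog`, `knn_predict` and `make_toy` are hypothetical, and the toy "h"/"v" stripe images only stand in for word images; they are not Latin/Arabic or printed/handwritten data.

```python
import numpy as np

# Hypothetical sketch of Co-MOG-style features; all parameters are assumptions, not the paper's.
OFFSETS = ((0, 1), (1, 0), (1, 1), (1, -1))  # (dy, dx) neighbour offsets (assumed)


def co_mog(image, n_bins=8, offsets=OFFSETS, mag_thresh=0.1):
    """Concatenated, L1-normalised co-occurrence matrices of quantised gradient orientations."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                        # derivatives along rows (y), then columns (x)
    mag = np.hypot(gx, gy)
    theta = np.mod(np.arctan2(gy, gx), 2 * np.pi)    # signed orientation in [0, 2*pi)
    bins = np.minimum((theta * n_bins / (2 * np.pi)).astype(int), n_bins - 1)
    valid = mag > mag_thresh                         # ignore flat pixels (assumed threshold)
    h, w = img.shape
    feats = []
    for dy, dx in offsets:
        # Region where both p = (y, x) and q = (y + dy, x + dx) lie inside the image.
        y0, y1 = max(0, -dy), min(h, h - dy)
        x0, x1 = max(0, -dx), min(w, w - dx)
        a = bins[y0:y1, x0:x1]
        b = bins[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
        m = valid[y0:y1, x0:x1] & valid[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
        mat = np.zeros((n_bins, n_bins))
        np.add.at(mat, (a[m], b[m]), 1)              # count orientation pairs (i, j)
        total = mat.sum()
        feats.append((mat / total if total else mat).ravel())
    return np.concatenate(feats)


def knn_predict(train_X, train_y, X, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    train_X, train_y, X = np.asarray(train_X), np.asarray(train_y), np.asarray(X)
    dists = np.linalg.norm(X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row in nearest:
        labels, counts = np.unique(train_y[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


def make_toy(kind, rng, size=32):
    """Hypothetical toy image: horizontal ('h') or vertical ('v') strokes plus noise."""
    img = rng.normal(0.0, 0.05, (size, size))
    if kind == "h":
        img[::4, :] += 1.0
    else:
        img[:, ::4] += 1.0
    return img


rng = np.random.default_rng(0)
train_y = ["h"] * 5 + ["v"] * 5
train_X = [co_mog(make_toy(c, rng)) for c in train_y]
test_y = ["v", "h", "h", "v"]
preds = knn_predict(train_X, train_y, [co_mog(make_toy(c, rng)) for c in test_y], k=3)
print(preds)
```

Here the two toy classes should separate because their dominant gradient orientations fall into different bins (near ±π/2 for horizontal strokes, near 0 and π for vertical ones), so their co-occurrence matrices differ strongly. Real word images would need the paper's own preprocessing and parameter choices.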
- Published
2015