Automatic Multi-lingual Script Recognition Application
- Source :
- GEMA Online® Journal of Language Studies. 18:203-221
- Publication Year :
- 2018
- Publisher :
- Penerbit Universiti Kebangsaan Malaysia (UKM Press), 2018.
Abstract
- Document Image Analysis and Recognition (DIAR) techniques are used to recognize text components and convert them into an editable format. A script is a set of graphical representations used to express a particular writing system, as well as subsets belonging to that writing system. A single language may adopt the writing styles of more than one script family: the old Malay language (Jawi) adopts the Arabic script, while modern Malay adopts the Roman script. The seven major scripts used in this research, all in handwritten style, are Arabic, Devanagari, Hebrew, Thai, Greek, Cyrillic and Korean. Automatic Multi-lingual Script Recognition (AMSR) is one of the main challenges in the DIAR domain. To date, only a few attempts have been made at automated script identification of off-line handwritten document images. Most available AMSR applications deal only with printed documents and script types, and neglect handwritten and multi-lingual documents. The objective of this study is to propose a multi-lingual AMSR framework, which also forms the core of the research methodology. The framework is tested on the Multilingual-HW datasets, which contain more than seven international unconstrained handwritten scripts, using the Grey-Level Co-occurrence Matrix and Local Binary Pattern methods. The two methods achieve average accuracies of about 97.01% and 85.29%, respectively. The proposed multi-lingual AMSR is expected to benefit communities that need to sort multi-lingual documents automatically. The research could also be extended to document forensics, or to international relations agencies, to identify unknown native documents.
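The abstract names two texture descriptors, the Grey-Level Co-occurrence Matrix and the Local Binary Pattern, without giving their settings. The following is a minimal, generic sketch of both, assuming a small integer-valued greyscale image held in a NumPy array. The function names, the single horizontal offset, the contrast statistic and the basic 8-neighbour LBP variant are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np


def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for one offset.

    Assumption: non-negative offsets (dx, dy) and integer pixel values
    in the range [0, levels). The paper's offsets are not stated.
    """
    h, w = image.shape
    ref = image[0:h - dy, 0:w - dx]   # reference pixels
    nbr = image[dy:h, dx:w]           # neighbours at the chosen offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Accumulate one count per (reference value, neighbour value) pair.
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    return counts / counts.sum()


def glcm_contrast(p):
    """Contrast statistic of a normalised GLCM: sum of p[i, j] * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))


def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes.

    Assumption: the plain 3x3 LBP computed on interior pixels only;
    the paper may use a different variant.
    """
    h, w = image.shape
    centre = image[1:-1, 1:-1]
    # Neighbour offsets (row, column), clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (oy, ox) in enumerate(offsets):
        nbr = image[1 + oy:h - 1 + oy, 1 + ox:w - 1 + ox]
        # Set this bit where the neighbour is at least as bright as the centre.
        codes |= (nbr >= centre).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


if __name__ == "__main__":
    # Hypothetical 3-level toy image standing in for a handwriting patch.
    patch = np.array([[0, 0, 1],
                      [1, 2, 2],
                      [2, 2, 0]])
    print(glcm_contrast(glcm(patch, levels=3)))
    print(np.flatnonzero(lbp_histogram(patch)))
```

In a script-identification pipeline, feature vectors of this kind (GLCM statistics or LBP histograms) would typically be passed to a classifier; the abstract does not say which classifier the study uses.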
- Subjects :
- Linguistics and Language
Literature and Literary Theory
Hebrew
business.industry
Computer science
Latin script
02 engineering and technology
computer.software_genre
Language and Linguistics
language.human_language
Writing style
Identification (information)
Writing system
Scripting language
Devanagari
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
0202 electrical engineering, electronic engineering, information engineering
language
020201 artificial intelligence & image processing
Artificial intelligence
business
Arabic script
computer
Natural language processing
Details
- ISSN :
- 2550-2131 and 1675-8021
- Volume :
- 18
- Database :
- OpenAIRE
- Journal :
- GEMA Online® Journal of Language Studies
- Accession number :
- edsair.doi...........09e5c15f35f3ccb97165cc029c61be0a
- Full Text :
- https://doi.org/10.17576/gema-2018-1803-12