Script Identification of Central Asian Printed Document Images based on Nonsubsampled Contourlet Transform.

Authors :
Han, Xing-kun
Aysa, Alimjan
Mamt, Hornisa
Ubul, Kurban
Source :
Engineering Letters. Dec2017, Vol. 25 Issue 4, p389-395. 7p.
Publication Year :
2017

Abstract

Document images of various scripts must be identified and processed in today's international environment. As the front-end technology of Optical Character Recognition (OCR), script identification is an indispensable part of automatic document image analysis. Because document images are rich in texture, a 3-level Nonsubsampled Contourlet Transform (NSCT) was used in this paper to extract 30-dimensional texture features. A Support Vector Machine (SVM) classifier and a K Nearest Neighbor (KNN) classifier were used for classification. A total of 10,000 document images in 10 scripts used in Central Asia (Arabic, Russian, Tibetan, Chinese, Uyghur, English, Mongolian, Kyrgyz, Kazakh, and Turkish) were classified. The identification performance of SVM and KNN was analyzed and compared; in the experiment, the SVM classifier obtained an average accuracy of 99.5%, higher than that of KNN. The validity of the proposed method was demonstrated by comparing it with two other script-identification methods, one based on Wavelet Transforms (WT) and one based on Local Binary Patterns (LBP). [ABSTRACT FROM AUTHOR]
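The classification stage described in the abstract can be illustrated with a minimal sketch of a K Nearest Neighbor classifier over 30-dimensional feature vectors. This is not the authors' implementation: the feature vectors are synthetic stand-ins rather than NSCT outputs, and the helper names (make_synthetic_features, knn_predict, evaluate), the class centres, the noise level, k = 3, and the 75/25 split are all assumptions made for illustration. Only the feature length of 30 and the use of KNN come from the abstract.

```python
import math
import random
from collections import Counter

FEATURE_DIM = 30  # feature length stated in the abstract


def make_synthetic_features(labels, per_class, rng):
    """Generate well-separated toy vectors (assumption: NOT real NSCT features)."""
    samples = []
    for class_index, label in enumerate(labels):
        center = 5.0 * class_index  # hypothetical class centre
        for _ in range(per_class):
            vector = [rng.gauss(center, 0.5) for _ in range(FEATURE_DIM)]
            samples.append((vector, label))
    return samples


def knn_predict(train, query, k=3):
    """Return the majority label among the k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def evaluate(train, test, k=3):
    """Fraction of test vectors whose predicted label matches the true label."""
    correct = sum(knn_predict(train, vec, k) == label for vec, label in test)
    return correct / len(test)


rng = random.Random(0)  # fixed seed so the run is repeatable
labels = ["Arabic", "Tibetan", "Chinese"]  # three of the ten scripts, for brevity
data = make_synthetic_features(labels, per_class=40, rng=rng)
rng.shuffle(data)
split = int(0.75 * len(data))  # hypothetical 75/25 train/test split
train, test = data[:split], data[split:]
accuracy = evaluate(train, test, k=3)
print(f"KNN accuracy on synthetic data: {accuracy:.3f}")
```

Because the toy classes are deliberately far apart, the accuracy printed here says nothing about the reported 99.5% figure; the sketch only shows how a fixed-length texture descriptor feeds a distance-based classifier.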

Details

Language :
English
ISSN :
1816-093X
Volume :
25
Issue :
4
Database :
Academic Search Index
Journal :
Engineering Letters
Publication Type :
Academic Journal
Accession number :
126541789