1. Script Identification of Central Asian Printed Document Images based on Nonsubsampled Contourlet Transform.
- Author
- Xing-kun Han, Alimjan Aysa, Hornisa Mamt, and Kurban Ubul
- Subjects
- *
OPTICAL character recognition , *DOCUMENT imaging systems , *IMAGE analysis , *TEXTURE analysis (Image processing) , *CONTOURS (Cartography) , *SUPPORT vector machines - Abstract
Document images in many different scripts must be identified and processed in today's international environment. As a front-end stage of Optical Character Recognition (OCR), script identification is an indispensable part of automatic document image analysis. Because document images are rich in texture, this paper uses a 3-level Nonsubsampled Contourlet Transform (NSCT) to extract 30-dimensional texture features. Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers were used for classification. A total of 10,000 document images in 10 Central Asian scripts (Arabic, Russian, Tibetan, Chinese, Uyghur, English, Mongolian, Kyrgyz, Kazakh, and Turkish) were classified. The identification performance of SVM and KNN was analyzed and compared: the SVM classifier achieved an average accuracy of 99.5%, higher than that of KNN. The proposed method was further validated by comparison with two other script-identification methods, one based on Wavelet Transforms (WT) and one on Local Binary Patterns (LBP). [ABSTRACT FROM AUTHOR]
- Published
- 2017
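
The classification stage mentioned in the abstract can be illustrated with a minimal K Nearest Neighbor sketch over 30-dimensional feature vectors. This is not the authors' implementation: the feature vectors are synthetic Gaussian clusters standing in for NSCT texture features (no NSCT is computed here), the SVM classifier is not shown, and the function names `make_synthetic_features`, `knn_predict`, and `accuracy`, along with the choice of `k=3` and Euclidean distance, are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

# Matches the feature dimensionality stated in the abstract.
FEATURE_DIM = 30


def make_synthetic_features(labels, per_class, seed=0):
    """Generate placeholder feature vectors: one Gaussian cluster per label.

    Hypothetical stand-in for real NSCT texture features.
    """
    rng = random.Random(seed)
    samples = []
    for label in labels:
        center = [rng.uniform(0.0, 10.0) for _ in range(FEATURE_DIM)]
        for _ in range(per_class):
            vec = [c + rng.gauss(0.0, 0.5) for c in center]
            samples.append((vec, label))
    return samples


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test vectors whose predicted label equals the true label."""
    correct = sum(1 for vec, label in test if knn_predict(train, vec, k) == label)
    return correct / len(test)


if __name__ == "__main__":
    # Three of the ten script names, used only as class labels for synthetic data.
    scripts = ["Arabic", "Russian", "Tibetan"]
    data = make_synthetic_features(scripts, per_class=40)
    random.Random(1).shuffle(data)
    split = int(0.75 * len(data))
    train, test = data[:split], data[split:]
    print(f"KNN accuracy on synthetic data: {accuracy(train, test):.3f}")
```

Any accuracy printed by this sketch reflects only the synthetic clusters and says nothing about the 99.5% figure reported in the abstract; reproducing that result would require the actual NSCT features and the document image dataset.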