Author: "Ubul, Kurban" / Journal: engineering letters - Searchworks@Jio Institute Digital Library Search Results

Author: Han, Xing-kun; Aysa, Alimjan; Mamt, Hornisa; and Ubul, Kurban
Subjects: *OPTICAL character recognition, *DOCUMENT imaging systems, *IMAGE analysis, *TEXTURE analysis (Image processing), *CONTOURS (Cartography), *SUPPORT vector machines
Abstract: Document images of various scripts must be identified and processed in today's international environment. As the front-end technology of Optical Character Recognition (OCR), script identification is an indispensable part of automatic document image analysis. Because document images are rich in texture features, this paper uses a 3-level Nonsubsampled Contourlet Transform (NSCT) to extract 30-dimensional texture features. Support Vector Machine (SVM) and K Nearest Neighbor (KNN) classifiers were used for classification. A total of 10,000 document images in 10 Central Asian scripts--Arabic, Russian, Tibetan, Chinese, Uyghur, English, Mongolian, Kyrgyz, Kazakh, and Turkish--were classified. The identification efficiency of SVM and KNN was analyzed and compared; in the experiment, the SVM classifier obtained 99.5% average accuracy, higher than that of KNN. The validity of the proposed method was confirmed by comparing it with two other script-identification methods, based on Wavelet Transforms (WT) and Local Binary Patterns (LBP). [ABSTRACT FROM AUTHOR]
Published: 2017
