Script Identification of Central Asian Printed Document Images based on Nonsubsampled Contourlet Transform.
- Source :
- Engineering Letters. Dec 2017, Vol. 25 Issue 4, p389-395. 7p.
- Publication Year :
- 2017
Abstract
- Document images in many different scripts must be identified and processed in today's international environment. As a front-end step of Optical Character Recognition (OCR), script identification is an indispensable part of automatic document image analysis. Because document images are rich in texture, this paper uses a 3-level Nonsubsampled Contourlet Transform (NSCT) to extract 30-dimensional texture features. Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers were used for classification. A total of 10,000 document images in 10 Central Asian scripts (Arabic, Russian, Tibetan, Chinese, Uyghur, English, Mongolian, Kyrgyz, Kazakh, and Turkish) were classified. The identification performance of SVM and KNN was analyzed and compared: in the experiment, the SVM classifier achieved 99.5% average accuracy, higher than KNN. The validity of the proposed method was demonstrated by comparing it with two other script-identification methods, based on Wavelet Transforms (WT) and Local Binary Patterns (LBP). [ABSTRACT FROM AUTHOR]
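The classification stage summarized above can be illustrated with a minimal sketch. It is not the authors' code. It assumes the NSCT subband coefficient arrays are already produced by some NSCT implementation (not shown). The abstract does not say which statistics make up the 30 dimensions, so the hypothetical `subband_features` function assumes two per subband (mean absolute value and standard deviation), which would correspond to 15 subbands. The hypothetical `knn_predict` function is a plain Euclidean K-Nearest Neighbor vote, and the value of `k` is likewise an assumption.

```python
import math
from collections import Counter


def subband_features(subbands):
    """Build a texture feature vector from NSCT subbands.

    ASSUMPTION: each subband is a 2-D list of coefficients, and each
    contributes two statistics (mean |c| and standard deviation), so
    15 subbands would give a 30-dimensional vector.
    """
    feats = []
    for band in subbands:
        coeffs = [c for row in band for c in row]
        n = len(coeffs)
        mean_abs = sum(abs(c) for c in coeffs) / n
        mean = sum(coeffs) / n
        std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / n)
        feats.extend([mean_abs, std])
    return feats


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors.

    Uses Euclidean distance; k=3 is an illustrative choice, not the paper's.
    """
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real 30-D NSCT features.
    X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
    y = ["Arabic", "Arabic", "Tibetan", "Tibetan"]
    print(knn_predict(X, y, [0.05, 0.02], k=3))  # two of three neighbors are "Arabic"
```

An SVM, which the abstract reports as the more accurate classifier, would replace `knn_predict` with a trained margin-based model; a library implementation is the usual choice for that step.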
Details
- Language :
- English
- ISSN :
- 1816093X
- Volume :
- 25
- Issue :
- 4
- Database :
- Academic Search Index
- Journal :
- Engineering Letters
- Publication Type :
- Academic Journal
- Accession number :
- 126541789