Language identification in historical Afghan manuscripts

Authors :
Faisal Farooq
Venu Govindaraju
Source :
ISSPA
Publication Year :
2007
Publisher :
IEEE, 2007.

Abstract

Automatic language identification is an important step prior to optical character recognition (OCR). In this paper, we present a system to discriminate between Arabic and Persian in historical Afghan manuscripts. Classification is performed at the sub-sentence level. We propose a feature extraction algorithm for sub-sentences based on Gabor filters, followed by classification using a support vector machine (SVM). Overall precisions of 96.72% and 94.90% are obtained for Persian and Arabic, respectively.
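The pipeline named in the abstract (Gabor-filter features followed by an SVM) can be illustrated with a minimal sketch. This record does not give the paper's filter bank, feature definition, or SVM kernel, so everything below is an assumption made for illustration: the function names, the four orientations, the kernel size, wavelength and sigma, the per-orientation energy feature, and the linear SVM trained by hinge-loss subgradient descent. Toy striped images stand in for sub-sentence image patches; they are not manuscript data.

```python
import math

def gabor_kernel(size, theta, wavelength, sigma):
    """Real (cosine) part of a 2-D Gabor filter, returned as a size x size grid."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that flat image regions give zero response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter_energy(image, kernel):
    """Mean squared filter response over all fully overlapping positions."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            r = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(k) for b in range(k))
            total += r * r
            count += 1
    return total / count

def gabor_features(image, orientations=4, size=7, wavelength=4.0, sigma=2.0):
    """One normalized energy value per orientation (assumed feature design)."""
    energies = [filter_energy(image, gabor_kernel(size, n * math.pi / orientations,
                                                  wavelength, sigma))
                for n in range(orientations)]
    total = sum(energies) or 1.0
    return [e / total for e in energies]

def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM via hinge-loss subgradient descent (a stand-in, not the paper's SVM)."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 regularization shrinkage
            if margin < 1.0:                         # sample violates the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def striped_image(size, vertical, phase):
    """Hypothetical toy texture: period-4 stripes, vertical or horizontal."""
    return [[1.0 if (((x if vertical else y) + phase) // 2) % 2 == 0 else 0.0
             for x in range(size)] for y in range(size)]

# Toy two-class problem: vertical stripes (+1) versus horizontal stripes (-1).
train_images = [striped_image(16, True, 0), striped_image(16, False, 0),
                striped_image(16, True, 2), striped_image(16, False, 2)]
train_labels = [1, -1, 1, -1]
w, b = train_linear_svm([gabor_features(im) for im in train_images], train_labels)

print(predict(w, b, gabor_features(striped_image(16, True, 1))))
print(predict(w, b, gabor_features(striped_image(16, False, 1))))
```

In a real system, each training sample would be a sub-sentence image cropped from a manuscript page and labeled Arabic or Persian, and the filter parameters and classifier settings would need to be tuned on such data rather than taken from this sketch.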

Details

Database :
OpenAIRE
Journal :
2007 9th International Symposium on Signal Processing and Its Applications
Accession number :
edsair.doi...........7b56af2ac3ecb7dbb10245d01df5d923