An efficient, font independent word and character segmentation algorithm for printed Arabic text
- Source :
- Journal of King Saud University: Computer and Information Sciences, Vol 34, Iss 1, Pp 1330-1344 (2022)
- Publication Year :
- 2022
- Publisher :
- Elsevier, 2022.
Abstract
- Character segmentation is a necessary and the most critical stage in an Arabic OCR system, and it has attracted the interest of a wide range of researchers. However, the cursive nature of Arabic script poses extra challenges that need further investigation, so a reliable and efficient Arabic OCR system that is independent of font variations is highly desirable. In this paper, an indirect, font-independent word and character segmentation algorithm for printed Arabic text is investigated. The proposed algorithm takes a binary line image as input and produces as output a set of binary images, each containing one character or ligature. Segmentation is performed at two levels. At the first level, word segmentation applies a vertical projection to the input line image and uses the interquartile range (IQR) method to distinguish gaps between words from gaps within words. At the second level, a projection profile method is used together with a set of font-independent statistical and topological features to identify the correct segmentation points among all potential points. The algorithm was tested on the APTI dataset with a variety of font types, sizes, and styles. On 1800 lines (approximately 24,816 words), it achieved an average accuracy of 97.7% for word segmentation and 97.51% for character segmentation.
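The first-level word segmentation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the exact IQR rule, so the sketch assumes the common upper-fence form (Q3 + k * IQR with k = 1.5), and the function names `vertical_projection`, `find_gaps`, `word_gap_threshold`, and `split_words` are hypothetical. The image is assumed to be a list of rows of 0/1 pixels, with 1 marking ink.

```python
from statistics import quantiles


def vertical_projection(image):
    """Count foreground (1) pixels in each column of a binary line image."""
    return [sum(column) for column in zip(*image)]


def find_gaps(profile):
    """Return (start, width) for each run of empty columns lying between ink.

    Leading and trailing margins are ignored because they do not separate
    two pieces of text.
    """
    gaps = []
    start = None
    seen_ink = False
    for x, count in enumerate(profile):
        if count == 0:
            if seen_ink and start is None:
                start = x
        else:
            if start is not None:
                gaps.append((start, x - start))
                start = None
            seen_ink = True
    return gaps


def word_gap_threshold(widths, k=1.5):
    """Upper IQR fence over gap widths (an assumed rule, not the paper's)."""
    q1, _, q3 = quantiles(widths, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def split_words(image, k=1.5):
    """Return the column index at the middle of each gap judged a word gap."""
    gaps = find_gaps(vertical_projection(image))
    if len(gaps) < 2:
        return []  # quantiles() needs at least two gap widths
    threshold = word_gap_threshold([width for _, width in gaps], k)
    return [start + width // 2 for start, width in gaps if width > threshold]
```

Gaps wider than the fence are treated as gaps between words, and narrower ones as gaps within a word (for example, between unconnected letters). The second-level character segmentation, which relies on statistical and topological features, is not covered by this sketch.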
- Subjects :
- General Computer Science
Computer science
Binary image
Text segmentation
020206 networking & telecommunications
Character segmentation
Cursive script
02 engineering and technology
QA75.5-76.95
Arabic OCR
Segmentation techniques
Character (mathematics)
Word segmentation
Baseline
Electronic computers. Computer science
Font
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Segmentation
Projection (set theory)
Cursive
Algorithm
Word (computer architecture)
Details
- Language :
- English
- ISSN :
- 13191578
- Volume :
- 34
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Journal of King Saud University: Computer and Information Sciences
- Accession number :
- edsair.doi.dedup.....9093cbeb1d1227a17942f5dfd3f5faff