Arabic short Text compression
- Source :
- Journal of Computer Science. Jan, 2010, Vol. 6 Issue 1, p24, 5 p.
- Publication Year :
- 2010
Abstract
- Problem statement: Text compression allows a document to be represented in less space. This is useful not only to save disk space but, more importantly, to reduce disk transfer and network transmission time. With the continuous increase in the number of Arabic short text messages sent by mobile phones, a suitable compression scheme would allow users to send more characters than the default limit specified by the provider. Developing an efficient compression scheme for short Arabic texts is not a straightforward task. Approach: This study combined the benefits of pre-processing, entropy reduction through file splitting, and hybrid dynamic coding, a new technique proposed in this study that exploits the fact that Arabic text has single-case letters. Experimental tests were performed on short Arabic texts, and the proposed scheme was compared with the well-known plain Huffman compression to measure its performance on Arabic short text. Results: The proposed scheme can achieve a compression ratio of around 4.6 bits per byte for very short Arabic text sequences of 15 bytes and around 4 bits per byte for 50-byte text sequences, using only 8 Kbytes of memory overhead. Conclusion: Furthermore, a reasonable compression ratio can be achieved using less than 0.4 KB of memory overhead. We recommend the proposed scheme for compressing small Arabic texts when resources are limited. Key words: Short text compression, Huffman coding, Arabic language, dynamic hybrid coding
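The abstract reports results in bits per byte against a plain Huffman baseline. As a rough illustration of how such a figure can be computed, the sketch below builds Huffman code lengths over the bytes of a short message and divides the encoded size by the input size. It is not the paper's method: the function names `huffman_code_lengths` and `bits_per_byte`, the sample sentence, and the choice of the single-byte Windows-1256 encoding are all assumptions made for illustration, and the figure counts only the encoded payload, not the code table a decoder would also need.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Return a Huffman code length (in bits) for each distinct byte value."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def bits_per_byte(data: bytes) -> float:
    """Encoded payload size in bits divided by input size in bytes."""
    if not data:
        raise ValueError("input must not be empty")
    lengths = huffman_code_lengths(data)
    return sum(lengths[b] for b in data) / len(data)


if __name__ == "__main__":
    # Assumption: one byte per Arabic character (Windows-1256).
    sample = "مرحبا بالعالم".encode("cp1256")
    print(f"{len(sample)} bytes, {bits_per_byte(sample):.2f} bits per byte")
```

On very short inputs the code table can outweigh the savings on the payload, which is one reason plain Huffman coding is a demanding baseline for messages of 15 to 50 bytes.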
Details
- Language :
- English
- ISSN :
- 15493636
- Volume :
- 6
- Issue :
- 1
- Database :
- Gale General OneFile
- Journal :
- Journal of Computer Science
- Publication Type :
- Academic Journal
- Accession number :
- edsgcl.216897438