
Turkish Text Compression via Characters Encoding.

Authors :
Hilal, Tariq Abu
Hilal, Hasan Abu
Source :
Procedia Computer Science; 2020, Vol. 175, p286-291, 6p
Publication Year :
2020

Abstract

In this paper, we propose an efficient, space-saving conversion of Turkish character strings from UTF-8 to ANSI character encoding. We also present a decoding method that transforms the encoded ANSI string back to its original form. Unlike one-byte ANSI characters, some Turkish letters are stored as 2 bytes in UTF-8, and all that space comes at a price. The developed sequential encoding technique reduces the size of the text file, and the encoded Turkish text regains its original form after decoding. Our proposal is therefore a form of lossless text compression, which is a common concern today; accordingly, many parties have become interested in Unicode compression. Essentially, our algorithm maps Unicode Turkish characters into ANSI by using the available 8-bit legacy encoding. [ABSTRACT FROM AUTHOR]
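As a rough illustration of the idea in the abstract (not the authors' implementation), the Python sketch below maps the twelve non-ASCII Turkish letters, each 2 bytes in UTF-8, to single bytes and back. It assumes the "ANSI" target is the Windows-1254 (Turkish) code page; the table `TURKISH_TO_CP1254` and the functions `encode_turkish` and `decode_turkish` are hypothetical names. ASCII characters are identical in both encodings and pass through unchanged, so every Turkish letter saves one byte.

```python
# Sketch only: compress Turkish UTF-8 text to one byte per character and back.
# ASSUMPTION: the target "ANSI" code page is Windows-1254 (Turkish); the
# paper's own mapping table may differ.

# Non-ASCII Turkish letters and their Windows-1254 byte values.
# Each occupies 2 bytes in UTF-8 but only 1 byte here.
TURKISH_TO_CP1254 = {
    "Ç": 0xC7, "ç": 0xE7,
    "Ğ": 0xD0, "ğ": 0xF0,
    "İ": 0xDD, "ı": 0xFD,
    "Ö": 0xD6, "ö": 0xF6,
    "Ş": 0xDE, "ş": 0xFE,
    "Ü": 0xDC, "ü": 0xFC,
}
# Inverse table used for decoding.
CP1254_TO_TURKISH = {b: ch for ch, b in TURKISH_TO_CP1254.items()}


def encode_turkish(utf8_data: bytes) -> bytes:
    """Map UTF-8 Turkish text to one byte per character."""
    out = bytearray()
    for ch in utf8_data.decode("utf-8"):
        if ord(ch) < 0x80:              # ASCII: same byte in both encodings
            out.append(ord(ch))
        elif ch in TURKISH_TO_CP1254:   # Turkish letter: 2 bytes -> 1 byte
            out.append(TURKISH_TO_CP1254[ch])
        else:
            raise ValueError(f"no single-byte mapping for {ch!r}")
    return bytes(out)


def decode_turkish(encoded: bytes) -> bytes:
    """Reverse the mapping and re-encode as UTF-8 (lossless round trip).

    Bytes >= 0x80 that are not in the table raise KeyError.
    """
    chars = []
    for b in encoded:
        if b < 0x80:
            chars.append(chr(b))
        else:
            chars.append(CP1254_TO_TURKISH[b])
    return "".join(chars).encode("utf-8")


if __name__ == "__main__":
    original = "Çalışkan öğrenciler şüphesiz İstanbul'a gidecek.".encode("utf-8")
    packed = encode_turkish(original)
    assert decode_turkish(packed) == original  # nothing is lost
    print(len(original), "->", len(packed), "bytes")
```

The explicit table is shown for clarity; Python's standard `cp1254` codec (`str.encode("cp1254")` and `bytes.decode("cp1254")`) performs the same mapping for these letters. A practical encoder would also need a policy for characters that have no single-byte equivalent; this sketch simply raises `ValueError`.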

Details

Language :
English
ISSN :
1877-0509
Volume :
175
Database :
Supplemental Index
Journal :
Procedia Computer Science
Publication Type :
Academic Journal
Accession number :
144992613
Full Text :
https://doi.org/10.1016/j.procs.2020.07.042