Turkish Text Compression via Characters Encoding.

Authors :: Hilal, Tariq Abu; Hilal, Hasan Abu
Source :: Procedia Computer Science; 2020, Vol. 175, p286-291, 6p
Publication Year :: 2020
Abstract: In this paper, we propose an efficient conversion of Turkish character strings from UTF-8 to ANSI character coding in order to save space. We also present a decoding method that transforms the encoded ANSI string back to its original format. Unlike one-byte ANSI characters, some letters of the Turkish alphabet are stored in 2 bytes in UTF-8, and that extra space comes at a price. The sequential encoding technique we developed reduces the size of the text file, and the encoded Turkish text regains its original form after decoding. Our proposal is therefore a form of lossless text compression, which is a common concern today, and many parties have become interested in Unicode compression. In essence, our algorithm maps Unicode Turkish characters into ANSI by using the available 8-bit legacy encoding. [ABSTRACT FROM AUTHOR]
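
The size effect described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' mapping table, so this example is not their algorithm: it assumes the Windows-1254 Turkish code page (Python codec "cp1254") as a stand-in for the "ANSI" target, and the names ANSI_CODEC, encode_to_ansi, decode_from_ansi and the sample sentence are hypothetical.

```python
# Assumption: Windows-1254 ("cp1254") stands in for the "ANSI" target encoding.
# The paper's actual mapping is not given in the abstract.
ANSI_CODEC = "cp1254"


def encode_to_ansi(text: str) -> bytes:
    """Store each character in one byte.

    Raises UnicodeEncodeError if a character has no slot in cp1254,
    so this sketch only covers text within that repertoire.
    """
    return text.encode(ANSI_CODEC)


def decode_from_ansi(data: bytes) -> str:
    """Recover the original string from its one-byte-per-character form."""
    return data.decode(ANSI_CODEC)


# Hypothetical sample containing Turkish letters that take 2 bytes in UTF-8.
sample = "Çağrı şöyle düşündü: İstanbul çok güzel."

utf8_bytes = sample.encode("utf-8")
ansi_bytes = encode_to_ansi(sample)
restored = decode_from_ansi(ansi_bytes)

print("UTF-8 size:", len(utf8_bytes), "bytes")
print("ANSI size: ", len(ansi_bytes), "bytes")
print("Lossless round trip:", restored == sample)
```

Under this assumption, the saving is one byte per Turkish-specific letter (ç, ğ, ı, ö, ş, ü and their capitals), so the gain depends on how often those letters occur in the text.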