Turkish Text Compression via Characters Encoding.
- Source :
- Procedia Computer Science; 2020, Vol. 175, p286-291, 6p
- Publication Year :
- 2020
Abstract
- In this paper, we propose an efficient conversion of Turkish character strings from UTF-8 to ANSI character coding in order to save space. We also present a decoding method that transforms the encoded ANSI string back to its original format. Unlike one-byte ANSI characters, some letters of the Turkish alphabet are stored in two bytes under UTF-8, and that extra space comes at a price. The sequential encoding technique we developed reduces the size of the text file, and the encoded Turkish text retains its original form after decoding. The proposal is therefore a lossless text compression method, which is a common concern today, and many parties have become interested in Unicode compression. In essence, our algorithm maps Unicode Turkish characters to ANSI by using the available 8-bit legacy encoding. [ABSTRACT FROM AUTHOR]
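The size saving described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose mapping is not given in this record. It assumes that "ANSI" can be approximated by the Windows-1254 (Turkish) code page, exposed in Python as the `cp1254` codec. The helper names `encode_single_byte` and `decode_single_byte` and the sample sentence are hypothetical.

```python
# Assumption: Windows-1254 (cp1254) stands in for the paper's "ANSI" target.
# The Turkish-specific letters below each take 2 bytes in UTF-8
# but a single byte in cp1254.
TURKISH_LETTERS = "çÇğĞıİöÖşŞüÜ"


def encode_single_byte(text: str) -> bytes:
    """Encode text with one byte per character (hypothetical helper).

    Raises UnicodeEncodeError if a character has no cp1254 code point.
    """
    return text.encode("cp1254")


def decode_single_byte(data: bytes) -> str:
    """Restore the original text from its one-byte-per-character form."""
    return data.decode("cp1254")


sample = "Çağdaş Türkçe öğretimi: ışık, şişe, İstanbul"

utf8_bytes = sample.encode("utf-8")
packed = encode_single_byte(sample)

# Lossless round trip: decoding gives back exactly the original string.
assert decode_single_byte(packed) == sample

# Each Turkish-specific letter saves one byte relative to UTF-8.
saved = len(utf8_bytes) - len(packed)
print(f"UTF-8: {len(utf8_bytes)} bytes, single-byte: {len(packed)} bytes, saved: {saved}")
```

The round trip is lossless only for text whose characters all exist in the chosen code page. Anything outside it raises an error in this sketch, so a complete scheme would need an escape mechanism for such characters.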
- Subjects :
- TEXT files
- CHARACTER
- ALGORITHMS
- DATA compression
- STATE-space methods
Details
- Language :
- English
- ISSN :
- 18770509
- Volume :
- 175
- Database :
- Supplemental Index
- Journal :
- Procedia Computer Science
- Publication Type :
- Academic Journal
- Accession number :
- 144992613
- Full Text :
- https://doi.org/10.1016/j.procs.2020.07.042