PPM performance with BWT complexity: a fast and effective data compression algorithm
- Source :
- Proceedings of the IEEE. 88:1703-1712
- Publication Year :
- 2000
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2000.
- Abstract :
- This paper introduces a new data compression algorithm. The goal underlying this new code design is to achieve a single lossless compression algorithm with the excellent compression ratios of the prediction by partial matching (PPM) algorithms and the low complexity of codes based on the Burrows-Wheeler transform (BWT). Like the BWT-based codes, the proposed algorithm requires worst-case O(n) computational complexity and memory; in contrast, the unbounded-context PPM algorithm, called PPM*, requires worst-case O(n^2) computational complexity. Like PPM*, the proposed algorithm allows the use of unbounded contexts. Using standard data sets for comparison, the proposed algorithm achieves compression performance better than that of the BWT-based codes and comparable to that of PPM*. In particular, the proposed algorithm yields an average rate of 2.29 bits per character (bpc) on the Calgary corpus; this result compares favorably with the 2.33 and 2.34 bpc of PPM5 and PPM* (PPM algorithms), respectively, the 2.43 bpc of BW94 (the original BWT-based code), and the 3.64 and 2.69 bpc of compress and gzip (popular Unix compression algorithms based on Lempel-Ziv (LZ) coding techniques), respectively, on the same data set. The given code does not, however, match the best reported compression performance (2.12 bpc with PPMZ9) listed on the Calgary corpus results web page at the time of this publication. Results on the Canterbury corpus give a similar relative standing. The proposed algorithm gives an average rate of 2.15 bpc on the Canterbury corpus, while the Canterbury corpus web page gives average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7, 2.23 bpc for BZIP2 (a popular BWT-based code), and 3.31 and 2.53 bpc for compress and gzip, respectively.
- Subjects :
- Source code
Burrows–Wheeler transform
Computational complexity theory
Computer science
Coding and information theory
Calgary corpus
Data set
Compression ratio
Electrical and Electronic Engineering
Algorithm
Caltech Library Services
Data compression
Coding (social sciences)
Details
- ISSN :
- 1558-2256 and 0018-9219
- Volume :
- 88
- Database :
- OpenAIRE
- Journal :
- Proceedings of the IEEE
- Accession number :
- edsair.doi.dedup.....7420d1c47803ddf2efef861367e572e3
- Full Text :
- https://doi.org/10.1109/5.892706