How Deflater works (ctd)
On the previous page, we took a brief look at dictionary compression in the DEFLATE algorithm. After applying dictionary compression, DEFLATE then applies a form of Huffman encoding, also known as Huffman compression.
It is worth understanding Huffman encoding because one of the options that the Deflater offers is to apply Huffman encoding only.
Huffman encoding is one of the best-known compression schemes. Even when it is not used on its own, it is often used in conjunction with another compression scheme (as is indeed the case here).
When does Huffman coding perform well?
The idea of Huffman encoding is to assign each symbol a code whose length is inversely related to the symbol's frequency: the more frequent a symbol, the shorter its code. But in practice, exact proportionality is not usually possible.
The problem is that the ideal code length for a symbol, measured in bits, can be a fractional number, whereas a Huffman code must use a whole number of bits per symbol. Huffman codes are only actually optimal when the probabilities of symbols are negative powers of 2 (1/2, 1/4 etc): in other words, where there's one symbol occurring around 50% of the time, then another around 25% of the time, etc.
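To make this concrete, here is an illustrative sketch (the symbol probabilities are hypothetical, not from the article) that computes the ideal code length for each symbol as -log2 of its probability. For negative powers of 2 the result is a whole number of bits, so a Huffman code can match it exactly; for other probabilities it is fractional, and a whole-bit code must round:

```java
public class IdealCodeLengths {
    public static void main(String[] args) {
        // Hypothetical distribution: negative powers of 2, so ideal
        // code lengths come out as whole numbers of bits.
        double[] probs = {0.5, 0.25, 0.125, 0.125};
        for (double p : probs) {
            double idealBits = -Math.log(p) / Math.log(2);
            System.out.printf("p=%.3f -> ideal length %.2f bits%n", p, idealBits);
        }
        // By contrast, a probability of 0.3 gives roughly 1.74 bits,
        // which a whole-bit Huffman code cannot represent exactly.
        System.out.printf("p=0.300 -> ideal length %.2f bits%n",
                -Math.log(0.3) / Math.log(2));
    }
}
```

Note that the four ideal lengths in the first loop (1, 2, 3 and 3 bits) are exactly the code lengths a Huffman tree would produce for that distribution.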
However, if you do have data distributed in this way, you may be able to take advantage of the Deflater's HUFFMAN_ONLY mode. You may also be able to transform your data so that the distribution of bytes is closer to the negative-power-of-2 relationship just described. On the next page, I discuss a simple transformation that generally improves text compression with the Deflater by between 10 and 20%.
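As a minimal sketch of selecting this mode (the sample input string is made up for illustration), you set the HUFFMAN_ONLY strategy on a standard java.util.zip.Deflater before compressing, which skips dictionary matching and applies Huffman encoding alone:

```java
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class HuffmanOnlyExample {
    public static void main(String[] args) throws Exception {
        // Skewed byte distribution: 'a' dominates, so Huffman-only
        // compression should still shrink the data noticeably.
        byte[] input = "aaaaabbbccd".repeat(100).getBytes("US-ASCII");

        Deflater deflater = new Deflater();
        deflater.setStrategy(Deflater.HUFFMAN_ONLY);  // no dictionary matching
        deflater.setInput(input);
        deflater.finish();

        byte[] buf = new byte[input.length * 2];
        int compressedLen = deflater.deflate(buf);
        deflater.end();

        // Round-trip through an Inflater to confirm the output is valid.
        Inflater inflater = new Inflater();
        inflater.setInput(buf, 0, compressedLen);
        byte[] restored = new byte[input.length];
        inflater.inflate(restored);
        inflater.end();

        System.out.println("original=" + input.length
                + " bytes, compressed=" + compressedLen + " bytes");
    }
}
```

The decompression side needs no special handling: the strategy only affects how the Deflater chooses its encoding, and the output remains an ordinary DEFLATE stream that any Inflater can read.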
Written by Neil Coffey. Copyright © Javamex UK 2012. All rights reserved.