
What's wrong with using standard compression algorithms such as Huffman coding or patent-free variants of LZW?



SCSU bridges the gap between an 8-bit-based LZW and 16-bit-encoded Unicode text by removing the extra redundancy that is part of the encoding (sequences where every other byte is the same) rather than redundancy in the content itself. The output of SCSU should be fed to LZW for block compression where that is desired. To get the same effect with one of the popular general-purpose algorithms, such as Huffman coding or any of the Lempel-Ziv variants, the algorithm would have to be retargeted to 16-bit units, losing effectiveness due to the larger alphabet size. It is relatively easy to work out the math for the Huffman case to show how many extra bits the compressed text would need just because the alphabet is larger; similar effects exist for LZW. For a detailed discussion of general text-compression issues, see Text Compression by Bell, Cleary, and Witten (Prentice Hall, 1990).
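A minimal sketch of the encoding-redundancy point, assuming Python, with zlib's DEFLATE standing in for a byte-oriented LZ compressor and plain ASCII bytes as a rough stand-in for the single-byte output SCSU would produce for Latin-script text (this is an illustration of the effect, not the SCSU algorithm itself):

    import zlib

    # Latin-script text: in UTF-16 every other byte is 0x00, which is
    # redundancy introduced by the encoding, not by the content.
    text = "Unicode text in a small script wastes half its bytes in UTF-16. " * 50

    utf16 = text.encode("utf-16-le")   # 2 bytes per character
    compact = text.encode("ascii")     # 1 byte per character; rough stand-in
                                       # for SCSU output on Latin-script text

    # zlib (DEFLATE) stands in for a byte-oriented LZ-style compressor.
    print("UTF-16 input:", len(utf16), "->", len(zlib.compress(utf16, 9)))
    print("8-bit input: ", len(compact), "->", len(zlib.compress(compact, 9)))

Running this shows the 16-bit input starting at twice the size and typically still compressing to a larger output than the 8-bit form, which is the gap a pre-pass like SCSU is meant to close before the general-purpose compressor runs.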


