Entropy coding in Oodle Data: the big picture

2021-07-10 17:30:06

April 26, 2016 was the release date of Oodle 2.1.5, which introduced Kraken, so Kraken celebrated its 5-year anniversary recently. A few months later we launched Selkie and Mermaid, which were already deep in development at the time of the Kraken release but not quite finished yet, and in early 2018 we added Leviathan as a higher-compression option, completing our current suite of codecs, the “sea bestiary”.

There are a few interesting things to talk about, and the 5-year anniversary seems as good a reason as any to get started; this is the first post in a series of yet-to-be-determined length, in which I’ll do a deep-dive on one interesting aspect of these codecs, namely the way they handle entropy coding.

Before I can get into any details, first some general notes on how things fit together. Kraken, Mermaid, Selkie, and Leviathan are lossless data compression algorithms, all variations on the basic LZ77 + entropy coding formula that has been the de facto standard in general-purpose compressors since the late 80s, because such codecs can achieve a good balance of compression ratio and compression/decompression speed for practical applications. Other codecs belonging to this family include Deflate (ZIP/gzip), LZX (Amiga LZX/CAB), LZMA (7zip/xz), Zstd, Brotli, LZHAM, and many others, including most of the older Oodle Data codecs (Oodle LZH/LZHLW, LZA, LZNA, and BitKnit).

The “LZ77” portion here refers to the LZ77 algorithm, which, broadly speaking, compresses data by replacing repeated byte sequences in a stream with references to prior occurrences; as long as a back-reference takes less space to encode than the bytes it replaces, this will result in compression. Nobody actually uses the original LZ77 algorithm per se (which has a very inefficient encoding by today’s standards), but the general idea remains the same. Some codecs, especially ones designed for faster decoding or encoding, use just this byte/string matching approach without any entropy coding; this includes many well-known faster codecs such as LZ4, Snappy, LZO, and again many others, including the remaining older Oodle Data codecs (Oodle LZB/LZBLW and LZNib).
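To make the general idea concrete, here is a minimal Python sketch of LZ77-style matching. It is a toy illustration, not how Kraken or any other Oodle codec is implemented: the token representation, the names `lz_compress` and `lz_decompress`, the `MIN_MATCH` threshold, and the brute-force match search are all assumptions chosen for readability. It only produces a token list; a real codec would still have to encode those tokens into bits, which is where entropy coding comes in.

```python
from typing import List, Tuple, Union

# A token is either a literal byte value (int) or a back-reference
# (offset, length) meaning "copy `length` bytes starting `offset` bytes
# back in the output produced so far". This representation is hypothetical.
Token = Union[int, Tuple[int, int]]

MIN_MATCH = 3  # assumed threshold: shorter matches are emitted as literals


def lz_compress(data: bytes, window: int = 4096) -> List[Token]:
    """Greedy, brute-force LZ77-style parse (quadratic time; illustration only)."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        best_len, best_off = 0, 0
        # Try every earlier position inside the window as a match candidate.
        for cand in range(max(0, pos - window), pos):
            length = 0
            # The match may run past `pos` and overlap the bytes being encoded.
            while (pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append((best_off, best_len))
            pos += best_len
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def lz_decompress(tokens: List[Token]) -> bytes:
    """Rebuild the original bytes from literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            # Copy one byte at a time so overlapping matches work correctly.
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(tok)
    return bytes(out)
```

For the input `b"abcabcabcabc"`, this parse emits three literals followed by a single back-reference with offset 3 and length 9. The match is longer than its offset, so it overlaps the bytes it is producing, which is why the decoder copies byte by byte.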
