XZ/LZMA Worked Example Part 1: Range Coding | Nigel Tao

2024-04-18 22:30:04

XZ is a general purpose compression file format that achieves very good compression ratios (smaller compressed file sizes): almost always better than gzip/deflate and usually better than bzip2. Depending on your test corpus, newer formats like brotli and zstd are now pretty competitive (and also offer better compression or decompression speeds), but XZ is still widely used.

To be pedantic, XZ is a container format and LZMA is the compression algorithm. The 7z and LZIP file formats aren’t XZ but can also use LZMA.
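
To make the container-versus-algorithm distinction concrete, here is a minimal sketch using Python's standard `lzma` module (my choice of tool, not something the original post uses). It compresses the same bytes twice: once wrapped in the `.xz` container and once in the legacy `.lzma` ("alone") container. The sample `data` is made up for illustration. Note that Python writes `.xz` streams with the LZMA2 filter, a framing layer around LZMA, so the two payloads are related rather than byte-identical.

```python
import lzma

# Hypothetical sample input: repetitive text compresses well.
data = b"hello, hello, hello, world\n" * 100

# Same family of compression, two different containers.
xz_bytes = lzma.compress(data, format=lzma.FORMAT_XZ)        # .xz container
alone_bytes = lzma.compress(data, format=lzma.FORMAT_ALONE)  # legacy .lzma container

# Every .xz stream starts with this 6-byte magic number.
XZ_MAGIC = b"\xfd7zXZ\x00"
is_xz = xz_bytes.startswith(XZ_MAGIC)
alone_is_xz = alone_bytes.startswith(XZ_MAGIC)

# Both containers decode back to the original bytes
# (lzma.decompress auto-detects the container by default).
roundtrip_xz = lzma.decompress(xz_bytes)
roundtrip_alone = lzma.decompress(alone_bytes)

print(is_xz, alone_is_xz, roundtrip_xz == data, roundtrip_alone == data)
```

Only the first output carries the XZ magic bytes, yet both decompress to the same data, which is the sense in which the container and the algorithm are separate things.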

For further pedantry, XZ is the name of the file format (such files are conventionally named foobar.xz) but also the name of a git repository of software that implements that file format. liblzma and /usr/bin/xz are example artifacts built from that project.

A few weeks ago, a backdoor was discovered in xz/liblzma. It targeted SSH servers, since sshd can depend on libsystemd, which can in turn depend on liblzma. Planting that backdoor exploited the build process, rather than a weakness in the file format or its C code implementation. Still, xz is having its 15 minutes of infamy and some of you might be curious about how LZMA compression actually works. How does it achieve such a good compression ratio?
