A research library with integer compression schemes. It is broadly applicable to the compression of arrays of 32-bit integers where most integers are

lemire / FastPFor

submited by
Style Pass
2021-07-20 10:30:04

A research library with integer compression schemes. It is broadly applicable to the compression of arrays of 32-bit integers where most integers are small. The library seeks to exploit SIMD instructions (SSE) whenever possible.

This library can decode at least 4 billions of compressed integers per second on most desktop or laptop processors. That is, it can decompress data at a rate of 15 GB/s. This is significantly faster than generic codecs like gzip, LZO, Snappy or LZ4.

It is used by the zsearch engine as well as in GMAP and GSNAP. It has been ported to Java, C# and Go. The Java port is used by ClueWeb Tools.

Fact: This is not true. Our fastest scheme (SIMDBinaryPacking) works over blocks of 128 integers. Another very fast scheme (Stream VByte) works over blocks of four integers.

If you are working primarily with sorted lists of integers, then you might want to use differential coding. That is you may want to compress the deltas instead of the integers themselves. The current library (fastpfor) is generic and was not optimized for this purpose. However, we have another library designed to compress sorted integer lists:

Leave a Comment
Related Posts