GitHub - mxmlnkn/pragzip: Parallel Random Access Gzip Decoder and Library

submitted by
Style Pass
2022-08-06 11:30:07

This module provides a PragzipFile class, which can be used to seek inside gzip files without having to decompress them first. Alternatively, you can use this simply as a parallelized gzip decoder as a replacement for Python's builtin gzip module in order to fully utilize all your cores.

The random seeking support is the same as that provided by indexed_gzip, but a least-recently-used cache combined with a parallelized prefetcher yields further speedups at the cost of higher memory usage.
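To illustrate the caching idea, here is a minimal sketch of a least-recently-used cache for decompressed chunks. It is not pragzip's actual implementation: the class name `LruChunkCache`, its methods, and the chunk indices are hypothetical and chosen only for this example.

```python
from collections import OrderedDict


class LruChunkCache:
    """Hypothetical cache mapping a chunk index to its decompressed bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._chunks = OrderedDict()  # oldest entry first, newest entry last

    def get(self, index):
        if index not in self._chunks:
            return None
        # A hit makes the chunk the most recently used one.
        self._chunks.move_to_end(index)
        return self._chunks[index]

    def put(self, index, data):
        self._chunks[index] = data
        self._chunks.move_to_end(index)
        # Evict least recently used chunks until the cache fits again.
        while len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)


cache = LruChunkCache(capacity=2)
cache.put(0, b"chunk 0")
cache.put(1, b"chunk 1")
cache.get(0)              # chunk 0 is now the most recently used
cache.put(2, b"chunk 2")  # evicts chunk 1, the least recently used
```

A larger capacity keeps more decompressed data available for repeated or backward seeks, which is where the higher memory usage comes from.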

The first call to seek ensures that the block offset list is complete and therefore might have to create it first. Because of this, the first call to seek might take a while.

The creation of the list of gzip blocks can take a while because it has to decode the gzip file completely. To avoid this setup when opening a gzip file, the block offset list can be exported and imported.
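The following sketch shows the general idea of saving and reusing such an offset list. It uses only the Python standard library and does not call pragzip: the offset values, the JSON file format, and the helper names `save_offsets`, `load_offsets`, and `find_block` are assumptions made for illustration. A real gzip index likely has to store more than offsets, because decoding from the middle of a deflate stream needs the preceding 32 KiB of decompressed data as a window.

```python
import bisect
import json
import os
import tempfile

# Hypothetical offset list: (decompressed byte offset, compressed bit offset).
block_offsets = [(0, 80), (4096, 9123), (8192, 18877)]


def save_offsets(offsets, path):
    """Write the offset list to a JSON file."""
    with open(path, "w") as f:
        json.dump(offsets, f)


def load_offsets(path):
    """Read the offset list back; JSON stores tuples as lists."""
    with open(path) as f:
        return [tuple(pair) for pair in json.load(f)]


def find_block(offsets, position):
    """Return the entry of the block containing the decompressed position."""
    starts = [start for start, _ in offsets]
    return offsets[bisect.bisect_right(starts, position) - 1]


path = os.path.join(tempfile.mkdtemp(), "example.gz.index.json")
save_offsets(block_offsets, path)

# A later session can load the list instead of decoding the whole file again.
restored = load_offsets(path)
block = find_block(restored, 5000)
```

With the list loaded, a seek to decompressed position 5000 only needs a binary search to find the block to start decoding from.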

Because pragzip can be used as a backend in ratarmount, you can use ratarmount to mount single gzip files easily. Furthermore, since ratarmount 0.11.0, parallelization is the default and does not have to be specified explicitly with -P.
