Large Text Compression Benchmark

submitted by
Style Pass
2022-09-05 03:30:09

This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 10^9 bytes of the XML text dump of the English version of Wikipedia on Mar. 3, 2006.

The goal of this benchmark is not to find the best overall compression program, but to encourage research in artificial intelligence and natural language processing (NLP). A fundamental problem in both NLP and text compression is modeling: the ability to distinguish between high probability strings like "recognize speech" and low probability strings like "reckon eyes peach".
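To make the link between modeling and compression concrete: under a model that assigns probability P to a string, an ideal coder needs about -log2 P bits for it, so a better model means a shorter code. The sketch below is only an illustration, not anything used by the benchmark. The toy corpus, the add-one smoothed word-bigram model, and the helper names bigram_prob and code_length_bits are all assumptions made up for this example.

```python
import math
from collections import Counter

# Toy training text, invented for this illustration (not benchmark data).
CORPUS = (
    "we recognize speech with a model . "
    "machines recognize speech from audio . "
    "i reckon it will rain . "
    "her eyes are blue . "
    "he ate a peach ."
).split()

VOCAB = set(CORPUS)
UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))


def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing over the toy vocabulary."""
    return (BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + len(VOCAB))


def code_length_bits(text):
    """Ideal code length -log2 P(text), starting from a sentence boundary '.'."""
    bits = 0.0
    prev = "."
    for word in text.split():
        bits -= math.log2(bigram_prob(prev, word))
        prev = word
    return bits


if __name__ == "__main__":
    for s in ("recognize speech", "reckon eyes peach"):
        print(f"{s!r}: {code_length_bits(s):.2f} bits")
```

With this toy model the likely phrase costs fewer bits than the unlikely one. A real compressor predicts bytes or bits rather than whole words, but the relationship between probability and code length is the same.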

Open source compression improvements to this benchmark with certain hardware restrictions may be eligible for the Hutter Prize.

Benchmark Results

Compressors are ranked by the compressed size of enwik9 (10^9 bytes) plus the size of a zip archive containing the decompresser. Options are selected for maximum compression at the cost of speed and memory. Other data in the table does not affect rankings. This benchmark is for informational purposes only. There is no prize money for a top ranking.

Notes about the table:

Program: The version believed to give the best compression. A | denotes a combination of 2 programs.

Compression options: selected for what I believe gives the best compression.

enwik8: compressed size of the first 10^8 bytes of enwik9. This data is used for the Hutter Prize, and is also ranked here but has no effect on this ranking.

enwik9: compressed size of the first 10^9 bytes of enwiki-20060303-pages-articles.xml.

decompresser size: size of a zip archive containing the decompression program (source code or executable) and all associated files needed to run it (e.g. dictionaries). A letter following the size has the following meaning:
  x = executable size.
  s = source code size (if available and smaller).
  d = size of a separate decompression program (separate from compression).
For self extracting archives (SFX), the size is 0 because the decompresser and compressed data are combined into one file. For testing, if no zip file is supplied I create archives using InfoZIP 2.32 -9. (Prior to July 1, 2008 I used 7zip 4.32 -tzip -mx=9).

Total size: total size of compressed enwik9 + decompresser size, ranked smallest to largest.

Comp: compression time in nanoseconds per byte on the largest file tested (for enwik9, this is numerically equal to the time in seconds). Speed is approximate and has no effect on ranking. A ~ means "very approximate". Not all tests are done on the same computer. Times reported are the smaller of process time (summed over processors if multi-threaded) or real time, as measured with timer. If there is no note then the program was tested on a Compaq Presario 5440, 2.188 GHz, Athlon-64 3500+ in 32 bit Windows XP. An underlined time means that no better compressor is faster.

Decomp: decompression time as above. If blank, decompression was not tested yet and ranking is pending verification that the output is identical. An underlined time means that no better compressor is faster.

Mem: approximate memory used for compression in MB. Decompression uses the same or possibly less. There is some ambiguity whether a megabyte means 10^6 bytes or 2^20 bytes. The approximation is coarse enough that it doesn't matter. I use peak memory as measured with Windows Task Manager during compression (so if you really want to know, 1 MB = 1,024,000 bytes :) Memory does not include swap or temporary files. An underlined value means that no better compressor uses less memory.

Alg: compression algorithm, referring to the method of parsing the input into symbols (strings, bytes, or bits) and estimating their probabilities (modeling) for choosing code lengths. Symbols may be arithmetic coded (fractional bit length for best compression), Huffman coded (bit aligned for speed), or byte aligned as a preprocessing step.
  Dict (Dictionary): symbols are words, coded as 1 or 2 bytes, usually as a preprocessing step.
  LZ (Lempel Ziv): symbols are strings.
    LZ77: repeated strings are coded by offset and length of previous occurrence.
    LZW (LZ Welch): repeats are coded as indexes into a dynamically built dictionary.
    ROLZ (Reduced Offset LZ): LZW with multiple small dictionaries selected by context.
    LZP (LZ predictive): ROLZ with a dictionary size of 1.
  on (Order-n, e.g. o0, o1, o2...): symbols are bytes, modeled by frequency distribution in context of last n bytes.
  PPM (Prediction by Partial Match): order-n, modeled in longest context matched, but dropping to lower orders for byte counts of 0.
  SR (Symbol Ranking): order-n, modeled by time since last seen.
  BWT (Burrows Wheeler Transform): bytes are sorted by context, then modeled by order-0 SR.
  ST (Sort Transform): BWT using stable sort with truncated string comparison.
  DMC (Dynamic Markov Coding): bits modeled by PPM.
  CM (Context Mixing): bits, modeled by combining predictions of independent models.
  LSTM (long short term memory): CM using neural network models.
  Tr: Transformer, CM using neural network with attention mechanism.
Some compressors combine multiple steps such as Dict+PPM or LZP+DMC. I indicate the last stage before coding.

Notes: Brief notes. See program descriptions for details. Usually this means the result was reported by somebody else on a different computer.

Compression                                     Compressed size          Decompresser  Total size   Time (ns/byte)
Program  Options                                    enwik8       enwik9  size (zip)    enwik9+prog     Comp   Decomp     Mem  Alg   Note
----------------------------------------------  ----------  -----------  ------------  -----------  -------  -------  ------  ----  ----
nncp v3.1                                       14,969,569  108,378,032    201,620 xd  108,579,652   212766   210970    6000  Tr    88
cmix v19                                        14,837,987  111,470,932    223,485 sd  111,694,417   605110   601825   25528  CM    83
tensorflow-compress v4                          15,905,037  113,542,413     55,283 sd  113,597,696   291394   290803   45360  LSTM  94
cmix-hp 10 Jun 2021                             15,957,339  113,712,798          0 xd  113,712,798   189420   194280    6873  CM    89
starlit 31 May 2021                             15,215,107  114,951,433          0 xd  114,951,433   173953   171682   10233  CM    89
phda9 1.8                                       15,010,414  116,544,849     42,944 xd  116,587,793    86182    86305    6319  CM    83
paq8px_v206fix1 -12L                            15,849,084  124,696,410    402,949 s   125,099,359   291916   294847   28151  CM    93
durilca'kingsize -m13000 -o40 -t2               16,209,167  127,377,411    407,477 xd  127,784,888     1398     1797   13000  PPM   31
cmve 0.2.0 -m2,3,0x7fed7dfd                     16,424,248  129,876,858    307,787 x   130,184,645  1140801            19963  CM    81
paq8hp12any -8                                  16,230,028  132,045,026    330,700 x   132,375,726    37660    37584    1850  CM    41
drt|emma 1.23                                   16,523,517  134,164,521  1,358,251 xd  135,522,772    73006    67097    3800  CM    81
zpaq 6.42 -m s10.0.5fmax6                       17,855,729  142,252,605      4,760 sd  142,257,365     6699    14739   14000  CM    61
drt|lpaq9m 9                                    17,964,751  143,943,759    110,579 x   144,054,338      868      898    1542  CM    41
mcm 0.83 -x11                                   18,233,295  144,854,575     79,574 s   144,934,149      394      281    5961  CM    72
nanozip 0.09a -cc -m32g -p1 -t1 -nm             18,594,163  148,545,179    783,642 x   149,328,821     1149     1141   32000  CM    74
xwrt 3.2 -l14 -b255 -m96 -s -e40000 -f200       18,679,742  151,171,364     52,569 s   151,223,933     2537     2328    1691  CM
fp8 v3 -8                                       18,438,169  153,188,176     50,068 s   153,238,244    20605    22593    1192  CM    26
WinRK 3.03 pwcm +td 800MB SFX                   18,612,453  156,291,924     99,665 xd  156,391,589    68555              800  CM    10
ppmonstr J -m1700 -o16                          19,055,092  157,007,383     42,019 x   157,049,402     3574    ~3600    1700  PPM
zcm 0.93 -m8 -t1                                19,572,089  159,135,549    227,659 x   159,363,208      421      411    3100  CM    48
slim 23d -m1700 -o12                            19,077,276  159,772,839     69,453 x   159,842,292     5232    ~5400    1700  PPM
bwmonstr 0.02                                   20,307,295  160,468,597     69,401 x   160,537,998   331801   156147     590  BWT   30
nanozipltcb 0.09                                20,537,902  161,581,290    133,784 x   161,715,074       64       30    3350  BWT   40
M03 1.1b 1000000000                             20,710,197  163,667,431     50,468 x   163,717,899      457      406    5735  BWT   52
glza 0.10.1 -x -p3                              20,356,097  163,768,203     69,935 s   163,838,138     8184     11.9    8205  Dict  67
bcm 0.14 c1000                                  20,736,614  163,885,873     74,569 x   163,960,442      162      153    5000  BWT   60
bsc 2.00 -b1000p                                20,789,147  163,888,465    122,581 s   164,011,046      237      199    5095  BWT   39
bbb m1000                                       20,847,290  164,032,650     11,227 s   164,043,877     4524     2619    1401  BWT
pcompress 3.1 -c libbsc -l14 -s1000m            20,769,968  163,391,884  1,370,611 x   164,762,495      359       74    3300  BWT   48
paq9a -9                                        19,974,112  165,193,368     13,749 s   165,207,117     3997     4021    1585  CM
uda 0.300                                       19,393,460  166,272,261     11,264 x   166,283,525    25282    25174     180  CM
BWTmix v1 c10000                                20,608,793  167,852,106      9,565 x   167,861,671     1794      690    5000  BWT   49
lrzip 0.612 -z -L 9 -p 1                        19,847,690  169,318,794     99,363 x   169,418,157     2987     2929    2700  CM    33
cm4_ext                                         20,188,048  170,566,799    204,782 x   170,771,581     4123     4130    1906  CM    26
M1x2 v0.6 7 enwik7.txt                          20,723,056  172,212,773     38,467 s   172,251,240      711      715    1051  CM    26
cmm4 v0.1e 96                                   20,569,034  172,669,955     31,314 x   172,701,269     2052     2056    1321  CM
lstm-compress v3                                20,318,653  173,874,407    144,567 s   174,018,974    92342    91876       9  LSTM  83
ccmx 1.30 7                                     20,857,925  174,142,092     15,014 x   174,157,106     1313     1338    1332  CM
bit 0.7 -p=5                                    20,823,204  174,425,039     62,493 x   174,487,532     2050     2100     663  CM    26
mcomp 2.00 -mw -M320m                           21,103,670  174,388,351    172,531 x   174,560,882      473      399    1643  BWT   26
epmopt|epm r9 -m800 -n20 --fixedorder:12        19,713,502  174,817,424    141,101 x   174,958,525     3179     3376     800  PPM
WinUDA 2.91 mode 3 (194 MB)                     20,332,366  174,975,730     17,203 x   174,992,933    23610    23473     194  CM
dark 0.51 -b333mf                               21,169,819  175,471,417     34,797 x   175,506,214      533      453    1692  BWT
FreeArc 0.40pre-4 -mppmd:1012m:o13:r1           20,931,605  175,254,732    748,202 x   176,002,934     1175     1216    1046  PPM
hook v1.4 1700                                  21,990,502  176,648,663     37,004 x   176,685,667      741      695    1777  DMC   26
7zip 4.46a -m0=ppmd:mem=1630m:o=10 ...          21,197,559  178,965,454          0 xd  178,965,454      503      546    1630  PPM   23
rings 2.5 -m8 -t1                               20,873,959  178,747,360    240,523 x   178,987,883      280      163    2518  BWT   48
pimple2                                         20,871,457  180,251,530     78,642 x   180,330,172    18474    17992     128  CM
ash 04a /m700 /o10                              19,963,105  180,735,542     11,137 x   180,746,679     6100     5853     700  CM
bce3                                            22,729,148  180,732,702     19,889 s   180,752,591     1151     2444    5000  CM    71
ocamyd LTCB 1.0 -s0 -m3                         21,285,121  182,359,986     21,030 x   182,381,016   108960  ~110000     300  DMC   6
bee 0.79 b0154 -m3 -d8                          20,975,994  182,373,904     57,046 x   182,430,950     9295     9285     512  PPM
uhbc 1.0 -m3 -b100m                             20,930,838  182,918,172     56,242 x   182,974,414     1569      809     800  BWT
smac 1.20                                       21,781,544  183,190,888      4,356 x   183,195,244     4249     4399    1542  CM    26
ppmd J1 -m256 -o10 -r1                          21,388,296  183,964,915     11,099 s   183,976,014      880      895     256  PPM
tc 5.2 dev 2                                    21,481,399  184,939,711     41,112 x   184,980,823     3637     3655     230  CM
bwtsdc v1                                       23,414,955  185,709,858      8,421 s   185,718,279     2100      420    5213  BWT   47
fbc v1.1 333333334                              22,554,133  185,975,548     23,576 x   185,999,124      451      415    1647  BWT   55
ppmvc v1.1 -m256 -o8 -r1                        21,484,294  186,208,405     25,241 x   186,233,646      898      913     272  PPM
chile 0.4 -b=244141                             22,218,917  186,979,614     11,530 s   186,991,144     2513      512    1426  BWT
bwtdisk 0.9.0 -b 2 -m 3500                      24,725,277  190,004,306    169,579 s   190,173,885     1124             3500  BWT   48
CTXf 0.75 pre b1 -me                            22,072,783  191,008,871     57,337 x   191,066,298     1112     1037      78  PPM
m03exp 2005-02-15 32MB blocks                   21,948,192  191,250,500     44,593 x   191,295,093    ~4800    ~2100     256  BWT
Stuffit 12.0.0.17 -m=4 -l=16 -x=30              22,105,654  190,372,707  2,658,122 xd  193,030,829      628      658    1062  PPM
plzma v3b c2 ... (see below)                    24,206,571  193,240,160    101,221 x   193,341,381     8889       55   10110  LZ77  58
crook v0.1 -m1600 -O8                           22,503,627  193,333,159      8,539 s   193,341,698      483      513    1641  PPM   26
ppmx 0.03                                       22,572,808  193,643,464     54,964 x   193,698,428      777      784     609  PPM   26
lzturbo 1.1 -49 -b1000 -p0                      24,416,777  194,681,713    110,670 x   194,792,383     1920        9   14700  LZ77  59
enc 0.15 aq                                     22,156,982  195,604,166     94,888 x   195,699,054     6843     6868      50  CM
comprolz 0.11.0-bugfix1 -b250 -f                22,813,215  196,651,379     29,453 x   196,680,832      984      308     688  ROLZ  26
sbc 0.970r2 -ad -m3 -b63                        22,470,539  197,066,203     99,094 xd  197,165,297     1733      313     224  BWT
xz 5.2.1 --lzma2=preset=9e,dict=1GiB,lc=4,pb=0  24,703,772  197,331,816     36,752 xd  197,368,568     5876       20    6000  LZ77  73
WinRAR 3.60b3 -mc7:128t+ -sfxWinCon.sfx         22,713,569  198,454,545          0 xd  198,454,545      506      415     128  PPM
quark v0.95r beta -m1 -d25 -l8                  22,988,924  198,600,023     80,264 x   198,680,287    27952      217     534  LZ77
lzip 1.14-rc3 -9 -s512MiB                       24,756,063  199,410,543     21,682 s   199,432,225     2409       21    5632  LZ77  57
comprox 0.11.0-bugfix1 -b250 -f -m100           23,064,386  199,515,912     34,176 x   199,550,088      917      153     688  LZ77  26
bssc 0.95 alpha -b16383                         23,117,061  201,810,709     45,489 x   201,856,198      578      217     140  BWT   4
flashzip 1.0.0 -mx7 -b7                         23,869,034  202,363,445    123,053 x   202,486,498     1296      122     802  ROLZ  26
lzham 1.0 -d29 -x                               25,002,070  202,237,199    191,600 s   202,428,799     1096      6.6    7800  LZ77  70
csarc 3.3 -m5 -d1024m                           24,516,202  203,995,005     69,848 s   204,064,853      621       22    2463  LZ77  48
packet 1.9 -mx -b512 -h8                        24,968,492  204,195,438    261,967 x   204,457,405      974       14    2824  LZ77  48
uharc 0.6b -mx -md32768                         23,911,123  208,026,696     73,608 xd  208,100,304     1666     1330      50  PPM
TarsaLZP Jan 29 2012                            24,751,389  208,867,187     13,081 s   208,880,268      203            ~2000  LZP   54
GRZipII 0.2.4 -b8m                              23,846,878  208,993,966     41,645 s   209,035,641      312      216      58  BWT
4x4 0.2a 4t (grzip:m1:h18)                      23,833,244  208,787,642    317,097 x   209,104,739      386      240     269  BWT
rzm 0.07h                                       24,361,070  210,126,103     17,667 x   210,143,770     2336       81     160  ROLZ
pim 2.50 best                                   24,303,638  210,124,895    330,901 x   210,455,796      764     ~764      88  PPM
CTW 0.1 -d6 -n16M -f16M                         23,670,293  211,995,206     43,247 x   212,038,452    19221    19524     144  CM
boa 0.58b -m15                                  24,322,643  213,845,481     55,813 x   213,901,294     3953    ~4100      17  PPM
yxz 0.11 -m9 -b7 -h6                            25,754,856  214,317,684    131,062 x   214,448,746      642       77    1590  LZ    26
zstd 0.6.0 -22 --ultra                          25,405,601  215,674,670     69,687 s   215,744,357      701      2.2     792  LZ77  76
tornado 0.6 -16                                 25,768,105  217,749,028     83,694 s   217,832,722     1482        9    1290  LZ77  48
LZPXj 1.2h 9                                    25,205,783  217,880,584      4,853 s   217,885,437      783      717    1316  PPM
scmppm 0.93.3 -l 9                              25,198,832  217,867,392     37,043 s   217,904,435      708      644      20  PPM
acb 2.00c u                                     25,063,656  218,473,968     38,976 x   218,512,944    10656    10883      16  LZ77  26
crushm                                          25,013,576  218,656,416     30,097 x   218,686,513      617      649      39  CM    26
PX v1.0                                         24,971,871  219,091,398      3,054 s   219,094,452     1838     1809      66  CM    3
DGCA 1.10 default+SFX                           25,203,248  219,655,072          0 xd  219,655,072      858      270      76
Squeez 5.20.4600 sqx2.0 32MB Ultra              25,118,441  220,004,873     91,019 xd  220,095,892     2575      116     365
fpaq2                                           25,287,775  221,242,386      3,429 s   221,245,815    20183    20186     131  CM
TinyCM 0.1 9                                    25,913,605  221,773,542     12,553 x   221,786,095     1342     1330    1083  CM    26
dmc c 1800000000                                25,320,517  222,605,607      2,220 s   222,607,827      676      721    1800  DMC
lza 0.82b -mx9 -b7 -h7                          26,396,613  222,808,457    285,766 x   223,094,223      449      9.7    2000  LZ77  48
brotli 18-Feb-2016 -q 11 -w 24                  25,764,698  223,597,884    542,385 s   224,140,269     3400      5.9     437  LZ77  48
szip 1.12a -b41o16                              26,120,472  227,586,463     31,708 x   227,618,171     1191      289      21  BWT   26
balz 1.13 ex                                    26,421,416  228,337,644     49,024 x   228,286,668     3700      190     206  ROLZ
lzpm 0.11 9                                     26,501,542  229,083,971     46,824 x   229,130,795    15395       57     740  ROLZ
qazar 0.0pre5 -l7 -d9 -x7                       26,455,170  229,846,871     71,959 x   229,918,830     5738      903     105  LZP
KuaiZip 2.3.2 x86                               25,895,915  227,905,650  3,857,649 x   231,763,299     1061       47     197  LZ77  26
qc 0.050 -8                                     26,763,343  232,784,501     46,100 x   232,830,601     8218     1503     151
ppms J -o5                                      26,310,248  233,442,414     16,467 x   233,458,881      330      354     1.8  PPM
dzo beta                                        26,616,115  235,056,859    618,883 x   235,675,742     1088      159     200  LZ77  26
comprox_ba 20110929                             27,828,189  242,846,243      4,134 s   242,850,377      397      101     226  BWTS  48
WinTurtle 1.60 512 MB buffer                    28,379,612  245,217,944    160,090 x   245,378,034      273      237     583  PPM
diz                                             26,545,256  246,679,382     12,945 s   246,692,327    21240    22746    1350  PPM   26
cabarc 1.00.0601 -m lzx:21                      28,465,607  250,756,595     51,917 xd  250,808,853     1619       15      20  LZ77
sr3                                             28,926,691  253,031,980      9,399 s   253,054,625      148      160      68  SR    26
bzip2 1.0.2 -9                                  29,008,736  253,977,839     30,036 x   254,007,875      379      129       8  BWT
rh5_x64 -window:27 c6                           29,078,552  254,220,469     36,744 x   254,257,213      196      9.4     145  ROLZ  48
RangeCoderC v1.7 c7 26                          28,788,013  254,527,369      7,858 x   254,535,227     2460     2436    1116  CM    26
quad v1.11 -x                                   29,110,579  256,145,858     13,387 s   256,159,245      956      116      34  ROLZ
WinACE -sfx -m5 -d4096                          29,481,470  257,237,710          0 xd  257,237,710     1080       77       4
lzsr 0.01                                       29,433,834  258,912,605     40,287 x   258,952,892      194       88       6  LZ77  26
libzling 20160107 e4                            29,721,114  259,475,639     35,582 s   259,511,221       83       27      28  ROLZ  48
xpv5 c2                                         29,963,217  262,525,246     14,371 x   262,539,617     2359      516       9  ROLZ  26
sr3c 1.0                                        29,731,019  266,035,006      7,701 x   266,042,707      160      145       5  SR    26
lzc v0.08 10                                    30,611,315  266,565,255     11,364 x   266,576,619      302       63     550  LZ77
nakamichi 2019-Jul-01                           32,917,888  277,293,058    112,899 s   277,405,957  8200000      1.3  302000  LZSS  85
crush 1.00 cx                                   31,731,711  279,491,430      2,489 s   279,493,919      948      2.9     148  LZ77  60
xeloz 0.3.5.3 c889                              32,441,272  283,621,211     18,771 s   283,639,982     1079        8     230  LZ77  48
bzp 0.2                                         31,563,865  283,908,295     36,808 x   283,945,103      110      120       3  LZP
ha 0.98 a2                                      31,250,524  285,739,328     28,404 x   285,767,732     2010     1800     0.8  PPM
ulz 0.06 c9                                     32,945,292  291,028,084     49,450 x   291,077,534      325      1.1     490  LZ77  82
irolz                                           33,310,676  292,448,365      4,584 s   292,452,949      274      144      17  ROLZ  26
lcssr 0.2 -b7 -l9                               34,549,048  296,160,661      8,802 x   296,169,463     8186     8281    1184  SR
zlite                                           33,975,840  298,470,807      4,880 s   298,475,687       61       28      36  ROLZ  26
lazy 1.00 5                                     35,024,082  306,245,949      5,986 s   306,251,935      273       24      96  LZ77  26
zhuff 0.97 beta -c2                             34,907,478  308,530,122     63,209 x   308,593,331       24      3.5      32  LZ77  48
slug 1.27                                       35,093,954  309,201,454      6,809 x   309,208,263       32       28      14  ROLZ
pigz 2.3 -11                                    35,002,893  309,812,953     52,717 s   309,865,670     2237       13      25  LZ77  48
kzip May 13 2006 /b1024                         35,016,649  310,188,783     29,184 xd  310,217,967     6063       62     121  LZ77  2
uc2 rev 3 pro -tst                              35,384,822  312,767,652    123,031 x   312,890,683      360       63       4  LZ77
thor 0.95 e4                                    35,795,184  314,092,324     49,925 x   314,142,249       64       34      16  LZP
etincelle a3                                    35,776,971  314,801,710     44,103 x   314,845,813       29       18     976  ROLZ  26
lz5 1.3.3 -18                                   36,514,408  319,510,433    138,210 s   319,648,643    10578      3.7    1139  LZ77  48
gzip124hack 1.2.4 -9                            36,273,716  321,050,648     62,653 x   321,113,301      149       19       1  LZ77
doboz 0.1                                       36,367,430  322,415,409     83,591 x   322,499,000      533      3.4    1200  LZ77  48
gzip 1.3.5 -9                                   36,445,248  322,591,995     38,801 x   322,630,796      101       17     1.6  LZ77
Info-ZIP 2.3.1 -9                               36,445,373  322,592,120     57,583 x   322,649,703      104       35     0.1  LZ77
pkzip 2.0.4 -ex                                 36,556,552  323,403,526     29,184 xd  323,432,710      171       50     2.5  LZ77
jar (Java) 0.98-gcc cvfM                        36,520,144  323,747,582     19,054 x   323,766,636      118       95     1.2  LZ77
PeaZip better, no integrity check               36,580,548  323,884,274    561,079 x   324,445,353      243      243       8  LZ77  20
arj 3.10 -m1                                    37,091,317  328,553,982    143,956 x   328,697,938      262       67       3  LZ77  26
lzgt3a                                          37,444,440  334,405,713      4,387 xd  334,410,100     1581     2886       2  LZ77
lzuf Apr.15.2009                                38,036,810  338,488,945      4,070 xd  338,493,015      446       40       2  LZ77  26
pucrunch -d -c0                                 39,199,165  350,265,471     34,359 s   350,299,830     2649      463       2  LZ77
packARC v0.7RC11 -sfx -np                       38,375,065  361,905,425          0 xd  361,905,425     1359     1486      23  CM
urban                                           38,215,763  362,677,440      4,280 s   362,681,720      381      450       6  o2    48
lzop v1.01 -9                                   41,217,688  366,349,786     54,438 x   366,404,224      289       12     1.8  LZ77
lzw 0.2                                         41,960,994  367,633,910        671 s   367,634,581     3597       31      18  LZW
MTCompressor v1.0                               41,295,546  370,152,396      3,620 x   370,156,016      173      117      74  LZ77  26
lz4x 1.02 c4                                    41,950,112  372,068,437     48,609 x   372,117,046       79      1.4     114  LZ77  68
arbc2z                                          38,756,037  379,054,068      6,255 sd  379,060,323     2659     2674      68  PPM
lz4 v1.2 -c2                                    42,870,164  379,999,522     49,128 x   380,048,650       91        6      20  LZ77  26
lzss 0.02 cx                                    42,874,387  380,192,378     48,114 x   380,240,492      107      2.3     145  LZSS  63
xdelta 3.0u -9                                  44,288,463  389,302,725    107,985 x   389,410,710     1021       30      47  LZ77
brieflz 1.1.0                                   43,300,800  390,122,722     14,907 s   390,137,629       21      7.5       3  LZ77  48
mtari 0.2                                       41,655,528  397,232,608      4,156 s   397,236,764       80       99      18  CM    26
lzf 1.02 cx                                     45,198,298  406,805,983     48,359 x   406,854,342       68      2.2     151  LZ77  68
srank 1.1 -C8                                   43,091,439  409,217,739      6,546 x   409,224,285       51       45       2  SR
QuickLZ 1.30b (quick3)                          46,378,438  410,633,262     44,202 x   410,677,464       48       12       3  LZ77
stz 0.7.2 -c2                                   47,192,312  416,524,596     41,941 x   416,566,537       14       13       3  LZ77  26
compress 4.3d                                   45,763,941  424,588,663     16,473 x   424,605,136      103       70     1.8  LZW
lzrw3-a                                         48,009,194  438,253,704      4,750 x   438,258,454       38       17       2  LZ77
fcm1                                            45,402,225  447,305,681      1,116 s   447,306,797      228      261       1  CM1
runcoder1                                       46,883,939  458,125,932      5,488 s   458,131,420      140      156       4  o1    26
data-shrinker 23Mar2012                         51,658,517  459,825,318      3,706 s   459,829,024       14        4       2  LZ77  26
lzwc_bitwise 0.7                                46,639,414  463,884,550      4,183 x   463,888,733      123      134      71  LZW   26
exdupe 0.3.3                                    53,717,422  478,788,378  1,092,986 x   479,881,364       27        5    1000  LZ77  48
lzv 0.1.0                                       54,950,847  488,436,027     10,385 x   488,446,412        4      2.6       3  LZ77  48
FastLZ Jun 12 2007                              54,658,924  493,066,558      7,065 xd  493,073,623       18       13       1  LZ77
sharc 0.9.11b -c2                               53,175,042  494,421,068     81,001 s   494,502,069       15       14       6  LZP   26
flzp v1                                         57,366,279  497,535,428      3,942 s   497,539,370       78       38       8  LZP
alba 0.5.1 cd                                   52,728,620  515,760,096      4,870 s   515,764,966      239       10       4  BPE   48
snappy 1.0.1                                    58,350,605  527,772,054     23,844 s   527,795,898       25       12     0.1  LZ77  26
bpe 5000 4096 200 3                             53,906,667  532,250,688      1,037 sd  532,251,725      639       28     0.5  Dict  26
kwc                                             54,097,740  532,622,518     15,186 x   532,637,704      438      145     668  Dict  26
bpe2 v3                                         55,289,197  542,748,980      2,979 s   542,751,959      518      132     0.5  Dict  26
fpaq0f2                                         56,916,872  558,645,708      3,066 x   558,648,769      222      207     0.4  o0
ppp                                             61,657,971  579,352,307      1,472 s   579,353,779       80       59       1  SR
ksc 4                                           59,511,259  580,557,413     13,507 x   580,570,920    40050     7917    1700  SR    48
lzbw1 0.8                                       67,620,436  590,235,688     21,751 x   590,257,439       15       12      55  LZP   26
lzp2 0.7c                                       67,909,076  598,076,882     40,819 x   598,117,701       11        8      15  LZP   26
NTFS LZNT1                                      76,955,648  636,870,656          0     636,870,656       10        9     0.1  LZ77  26
shindlet_fs                                     62,890,267  637,390,277      1,275 xd  637,391,552      113      103     0.6  o0
arb255                                          63,501,996  644,561,595      4,871 sd  644,566,466     2551     2574     1.6  o0
compact                                         63,862,371  648,370,029      3,600 sd  648,373,629      216      164     0.2  o0
TinyLZP 0.1                                     79,220,546  694,274,932      2,811 s   694,277,743       32       38      10  LZP   26
smile                                           71,154,788  695,562,502        207 xd  695,562,709    10517    10414     0.6  MTF   26
barf (2 passes)                                 76,074,327  758,482,743    983,782 s   759,466,525      756       53       4  LZ77
arb2x v20060602                                 99,642,909  995,674,993      3,433 sd  995,678,426     2616     2464     1.6  o0b

Fails on enwik9

Compression                                     Compressed size          Decompresser  Total size   Time (ns/byte)
Program  Options                                    enwik8       enwik9  size (zip)    enwik9+prog     Comp   Decomp     Mem  Alg   Note
----------------------------------------------  ----------  -----------  ------------  -----------  -------  -------  ------  ----  ----
hipp 5819 /o8                                   20,555,951      (fails)     36,724 x                    5570     5670     719  CM
ppmz2                                           23,557,867      (fails)     29,362 s                   92210    88070    1497  PPM   26
XMill 0.8 -w -P -9 -m800                        26,579,004      (fails)    114,764 xd                    616      530     800  PPM
lzp3o2                                          33,041,439      (fails)     23,427 xd                    230      270     151  LZP
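
The ranking rule and the "underlined time" rule from the table notes can be sketched in a few lines. This is an illustration under stated assumptions, not the script used to build the table: the Entry type and the functions total_size, rank, seconds_for_enwik9 and underlined_comp are hypothetical names, and only five rows copied from the main table are used, so the "underlined" result applies to that subset only.

```python
from typing import NamedTuple

ENWIK9_BYTES = 10**9


class Entry(NamedTuple):
    program: str
    enwik9: int            # compressed size of enwik9 in bytes
    decompresser_zip: int  # size of the zipped decompresser in bytes
    comp_ns_per_byte: float


# Five rows copied from the main table (a subset, for illustration only).
ROWS = [
    Entry("cmix v19", 111_470_932, 223_485, 605110),
    Entry("nncp v3.1", 108_378_032, 201_620, 212766),
    Entry("phda9 1.8", 116_544_849, 42_944, 86182),
    Entry("durilca'kingsize", 127_377_411, 407_477, 1398),
    Entry("paq8px_v206fix1", 124_696_410, 402_949, 291916),
]


def total_size(e: Entry) -> int:
    """Ranking key: compressed enwik9 plus zipped decompresser size."""
    return e.enwik9 + e.decompresser_zip


def rank(rows):
    """Smallest total size first."""
    return sorted(rows, key=total_size)


def seconds_for_enwik9(ns_per_byte: float) -> float:
    """ns/byte times 10^9 bytes, converted from nanoseconds to seconds."""
    return ns_per_byte * ENWIK9_BYTES / 1e9


def underlined_comp(rows):
    """Programs for which no better-ranked compressor is faster."""
    result = []
    best_time = float("inf")  # fastest time among better-ranked entries
    for e in rank(rows):
        if e.comp_ns_per_byte <= best_time:
            result.append(e.program)
        best_time = min(best_time, e.comp_ns_per_byte)
    return result


if __name__ == "__main__":
    for e in rank(ROWS):
        print(e.program, total_size(e), seconds_for_enwik9(e.comp_ns_per_byte))
    print(underlined_comp(ROWS))
```

Because enwik9 is exactly 10^9 bytes, the ns/byte figure and the time in seconds come out numerically equal, which is why the table can report a single number for both.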
