Zstandard is awesome; there is nothing else close to it that I've seen. There are plenty of codecs that give you either obscene ratios or low CPU usage, but none that support both in combination. Zstandard on its highest setting easily competes with xz for anything I've thrown at it. Decompression throughput is unaffected by the compression setting -- higher settings just burn more memory. So you can get xz-like ratios with decompression throughput approaching 1 GB/sec.
It allows trading lower CPU for gzip-or-worse compression, and you can mix the settings within a single file. This means you can e.g. use the lowest setting (or no compression at all - it supports that too) to append to a file, while occasionally recompressing recent appends into a single block using the highest setting -- so the cost of compression can be amortized
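A rough shell-level sketch of that append-then-recompress pattern (file names here are made up; it relies on the fact that concatenated zstd frames form a valid stream):

zstd -1 -c new-chunk >> data.zst                      # cheap append: each chunk becomes its own frame
zstd -dc data.zst | zstd -19 -o data-repacked.zst     # later, repack everything into one high-ratio frame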
The only petty annoyance with it is ecosystem support - e.g. GNU tar has no option for it, so it's slightly more painful to work with
tar-1.31 added support for zstd with the `--zstd` option, and `-a`, the auto-compress flag, also supports zstd. Older tar versions also have the `-I` flag, which you can use to (de)compress with zstd.
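For illustration (archive and directory names are placeholders):

tar --zstd -cf archive.tar.zst somedir/
tar -caf archive.tar.zst somedir/      # -a picks zstd from the .zst suffix
tar -I zstd -xf archive.tar.zst        # the -I route, which also works on older tar versions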
We're working to improve zstd support in the ecosystem over time, but this work moves slowly, and it takes a long time for upstream work to make it to users' systems, especially LTS systems.
Starting with zstd-1.3.8 we support the `ZSTD_CLEVEL` environment variable. We've started with a small scope for the variable, because we don't want users to unexpectedly remove the source file, for instance.
If you want to pass extra options, you can pipe the output to zstd, which is exactly what tar is doing internally.
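For example (names are placeholders):

ZSTD_CLEVEL=19 tar --zstd -cf archive.tar.zst somedir/            # pick the level via the environment variable
tar -cf - somedir/ | zstd -19 --long=27 -T0 -o archive.tar.zst    # or pipe to zstd yourself to pass any option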
My understanding is that Oodle generally performs better (less CPU on both ends for any desired compression ratio) than Zstd in pretty much every context.
I don't see comparisons against any version of Zstandard, much less Zstandard 1.4, there.
(Besides, regardless of the state of benchmarks today, the momentum is clearly with Zstandard, with Facebook, Intel, and the open source community behind it. The basic lossless compression algorithms haven't changed much since the late 1970s. Making compression fast is mostly just a long slog of engineering hurdles, the kind that big companies are very good at doing.)
> regardless of the state of benchmarks today, the momentum is clearly with Zstandard, with Facebook, Intel, and the open source community behind it.
This argument by hand-wave doesn’t match up with demonstrated progress over the past few years.
Irrespective of the skill or insight of its individual engineers, I would be surprised if a schizophrenic hack-it-with-duct-tape kind of engineering culture like Facebook could keep up with a focused and motivated expert like cbloom over the medium term (though I suppose the latter could conceivably at some point lose interest in the domain and switch to building something else).
I've seen this play out before. The history of jemalloc, particularly how it rapidly outpaced virtually every other allocator while under development at Facebook, suggests otherwise. At this point jemalloc is so good that there is little reason other than NIH to use anything else (unless you want an especially hardened allocator for security and are willing to give up some performance). Even Google uses it in Android.
Zstandard is likewise deservedly on track to dominate the lossless compression space.
> [Ruby's] memory usage only reduces when using jemalloc 3; memory usage is still high when using jemalloc 5. Nobody knows why, so that makes the choice of defaulting to jemalloc very dodgy.
Zstd is from Facebook, sure, but to be more specific its development is led by Yann Collet (of LZ4 fame), who is unquestionably a focused and motivated expert.
But it’s not WinZip or WinRAR. This is a product marketed for professional use(r)s. How could it ever become more popular than something you just have?
Could be that Zstandard outperforms it, but I also have similarly good experience with brotli.
Good compression-speed-to-ratio tradeoff (use -q 1 for brotli lower than 0.6):
tar -I"brotli -q 2" -cvf file.tar.br inputfiles
Decompression:
tar -Ibrotli -xvf file.tar.br
It outperforms gzip and is near the xz region without needing a lot of CPU power, and it is also blazing fast. Really useful if you have e.g. a big PostgreSQL db dump which you want to transfer to your own machine. Examples for Postgres dumping and restoring:
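(A sketch only; "mydb" is a placeholder database name.)

pg_dump mydb | brotli -c -q 2 > mydb.sql.br    # dump and compress on the fly
brotli -dc mydb.sql.br | psql mydb             # decompress and restore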
I got to experiment with it on some pretty poor-spec MIPS processors not long after it was made public. Even there, on an architecture it wasn't designed for or specifically optimised for, it seriously outperformed competitors.
In my experience, for the best balance between compression speed and compression ratio, nothing beats 7zip with the right options:
-mmt=$(nproc) # use all available cores
-ms=off # disable solid archives (compress each file separately)
-m0=lzma2 # lzma2 has better threading than lzma1
-md=64m # dictionary size
-ma=0 # "fast" mode
-mmf=hc4 # hash chain match finder
-mfb=64 # number of "fast bits"
-mf=off # disable filters
The biggest gains are: 1) using all available cores, 2) setting the match finder (the binary tree match finders are terribly slow; I haven't played much with the newer patricia tree match finders), 3) disabling solid archives (this seems to cause 7zip to distribute the work more evenly between cores, though it still may only use a few cores if there are many small files), 4) using "fast" mode (whatever that is, it gives a noticeable performance boost and doesn't seem to affect compression ratio much).
Every few years I try zstd and others, and for the data I work with (primarily a mix of json and fixed-width-field binary data), lots of tools beat 7zip out of the box, but they fall short of 7zip with the above command-line options.
A comparable zstd call that uses a 64 MB window size and all cores is:
zstd --long=26 -T0
From there you can tune the compression level, or increase the window size up to 2 GB (--long=31). zstd won't beat the compression of xz, but it can compress much faster if you trade off some space.
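For instance (the file name is a placeholder):

zstd -15 --long=27 -T0 big.tar     # raise the level for more ratio, 128 MB window
zstd -19 --long=31 -T0 big.tar     # 2 GB window; decompression then also needs --long=31 (or --memory=2048MB)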
Perhaps it's the use of a dictionary? As far as I'm aware, tar, zstd, and xz do not use one by default; it's an extra set of hoops to create a training set, create the dictionary, use it for compression, pack it away somewhere so that it's available for decompression, and then actually use it for decompression. If all of that is being done by 7zip just by passing -md=64m, that's pretty cool.
Edit: Ahh, I was confused. Neither require a separate training step. Zstd offers an option to do a training step. Both always use dictionaries with a default size that can optionally be changed.
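For reference, zstd's optional training flow looks roughly like this (paths are illustrative):

zstd --train samples/*.json -o my.dict     # build a dictionary from a set of small sample files
zstd -D my.dict record.json                # compress with it
zstd -D my.dict -d record.json.zst         # the same dictionary is needed to decompress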
$ time 7zr a -mmt=$(nproc) -ms=off -m0=lzma2 -md=64m -ma=0 -mmf=hc4 -mfb=64 -mf=off linux-5.0.8.tar{.7z,}
real 60.49 user 158.94 sys 3.06 maxrss 8995040
$ stat -c '%s %n' linux-5.0.8.tar.7z
127700475 linux-5.0.8.tar.7z
$ time 7zr e -so linux-5.0.8.tar.7z >/dev/null
real 14.09 user 13.96 sys 0.12 maxrss 282208
Basically:
- it took twice the time to compress data even compared to xz -2 (which also uses lzma2 under the hood),
- it is comparable to zstd/bzip2 ratio-wise,
- it used almost 6 times (!) more RAM than even zstd -12 --long,
- it only used about 2.5 CPU cores out of 4 while compressing (which aligns pretty well with your reasoning for using -ms=off).
----
But hey, source code is not that regular. Since you mentioned JSON and fixed-width-field binary data, I decided to re-run the benchmarks on 10M lines of nginx access logs: they're way more regular in their structure (repetitive URLs, timestamps, Mozilla/5.0, stuff like that), which might benefit from larger window sizes.
$ time lbzip2 -k access-log-10m.log
real 90.59 user 313.04 sys 18.46 maxrss 117904
$ time ~/zstd-1.4.0/zstd -T0 -k -12 access-log-10m.log -o access-log-10m.log.zst-12
real 77.34 user 277.21 sys 1.55 maxrss 886416
$ time ~/zstd-1.4.0/zstd -T0 -k -12 --long access-log-10m.log -o access-log-10m.log.zst-12-long
real 69.24 user 242.18 sys 1.85 maxrss 1411872
$ time 7zr a -mmt=$(nproc) -ms=off -m0=lzma2 -md=64m -ma=0 -mmf=hc4 -mfb=64 -mf=off access-log-10m.log{.7z,}
real 109.10 user 356.42 sys 4.69 maxrss 9777664
$ stat -c '%s %n' access-log-10m.log* | sort -n
208537395 access-log-10m.log.bz2
231953002 access-log-10m.log.zst-12-long
237566691 access-log-10m.log.zst-12
249412192 access-log-10m.log.7z
3386733539 access-log-10m.log
Now the tweaked 7z did better CPU- and time-wise, but it's still behind zstd and bz2 on every metric, especially RAM, of which it requires so much (literally gigabytes) that it becomes impractical in a number of situations. And we needed a pretty regular input (not just some reasonably compressible text like source code or a Wikipedia dump) to close that gap. So I can't really recommend your suggestion, unless you have some niche input that benefits from that particular set of options (but then, who has time to learn the lzma internals and how every option plays with different kinds of input?).
> There are plenty of codecs that give you either obscene ratios or low CPU usage, but none that support both in combination.
I still find lbzip2 (which is a bzip2 reimplementation with better algorithms and support for multithreading) quite competitive for highly compressible data. Here's a quick and unscientific test showing that lbzip2 (-9) is still faster and has a better ratio than zstd (-12, --long or not) while also using the least amount of RAM (tmpfs, multithreaded compression using a 4-core Xeon E5-2603 v1):
$ time lbzip2 -k linux-5.0.8.tar
real 24.21 user 84.87 sys 4.07 maxrss 105472
$ time ~/zstd-1.4.0/zstd -T0 -k -12 linux-5.0.8.tar -o linux-5.0.8.tar.zst-12
real 30.69 user 105.27 sys 0.61 maxrss 942416
$ time ~/zstd-1.4.0/zstd -T0 -k -12 --long linux-5.0.8.tar -o linux-5.0.8.tar.zst-12-long
real 31.28 user 107.90 sys 0.86 maxrss 1532432
$ time xz -T0 -k -2 linux-5.0.8.tar
real 34.40 user 123.59 sys 0.57 maxrss 410192
$ stat -c '%s %n' linux-5.0.8.tar* | sort -n
126382954 linux-5.0.8.tar.bz2
126394210 linux-5.0.8.tar.zst-12-long
128003669 linux-5.0.8.tar.zst-12
131418488 linux-5.0.8.tar.xz
863426560 linux-5.0.8.tar
The only clear advantage zstd has is decompression speed:
$ time xzcat -T0 linux-5.0.8.tar.xz >/dev/null
real 17.25 user 17.06 sys 0.17 maxrss 17312
$ time ~/zstd-1.4.0/zstd -dc -T0 linux-5.0.8.tar.zst-12 >/dev/null
real 2.08 user 1.97 sys 0.08 maxrss 27088
$ time ~/zstd-1.4.0/zstd -dc -T0 linux-5.0.8.tar.zst-12-long >/dev/null
real 2.26 user 2.03 sys 0.17 maxrss 535360
$ time lbzcat linux-5.0.8.tar.bz2 >/dev/null
real 10.34 user 33.74 sys 3.53 maxrss 127088
Given that it's the decompression speed that is typically the "user-facing" part in many contexts (with compression being done by automatic jobs etc), such a difference in decompression speed is pretty awesome indeed.
(which is also what makes it a near-perfect codec for HDF5, via blosc-hdf5)
I mostly use lbzip2 for day-to-day tasks too. But shouldn't you compare to pzstd to be fair? It's not very surprising that running 4 threads is faster than 1 thread in wall clock time.
EDIT: no, I was wrong, `zstd -T0` is basically the same as `pzstd`.
I'll just add that I tried it and pzstd -12 is still significantly slower than lbzip2 -9 on my machine, with approximately the same compression ratio for linux-5.0.8.tar.
EDIT: no surprise, as -T0 also enables multithreading.
There are basically three scenarios where I choose various compression algorithms (a few exceptions excluded):
- maximum compatibility (while tolerating low performance) - gzip
- great performance (while tolerating larger files) - snappy
- very good performance with good (not best) compression ratios - zstd
I don't really want to use Any New Shiny Algo to compress some data that might outlive this piece of software, that's why I use gzip very often, because I know I'll always be able to decompress it. But I've been increasingly adopting zstd and snappy for one single reason - they are becoming widely supported within the ecosystems I work in (data processing).
That, to me, is more important than compression ratios and decompression speeds.
lz4 compresses and decompresses faster than snappy, and compresses similarly. You can see some comparisons in the GitHub readme: https://github.com/lz4/lz4.
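Quick usage for comparison (file names are placeholders):

lz4 -1 data.bin data.bin.lz4    # fast compression (level 1 is the default)
lz4 -d data.bin.lz4 data.out    # decompression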
I've been using zstd compression on btrfs for a while now and it's excellent, most of my stuff is already compressed (movies, music) but my home directory (which is mostly comprised of text files) has shrunk greatly.
The next GRUB release (grub-2.04) includes my patch to add support for zstd compressed BtrFS filesystems, which should solve one of the major pain points of Zstd BtrFS compression.
As a temporary workaround in the meantime, I've used `chattr +C` on the directories I want to be exempt from zstd compression, so that grub can read those files.
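Something like this, assuming /boot is the directory grub needs to read (the +C no-COW flag also disables btrfs compression, and only takes effect for files created after it is set):

chattr +C /boot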
We have many terabytes of large files that are currently compressed using xz (well, pixz actually). In terms of compression speed and ratio, zstd is pretty comparable, and for single-threaded decompression it's faster. At this point the only thing stopping us from using it instead of xz/pixz is the fact that multi-threaded pixz decompression is faster. Are there any plans to add MT decompression to zstd?
There aren't any technical limitations to adding multithreaded decompression to zstd. We just need a compelling enough use case to justify the work it would take to add it.
pzstd is now obsoleted by zstd -T0, but it offers multithreaded decompression for files compressed by pzstd (it will still be single threaded for files compressed by zstd).
On three test machines, HDD and SSD, the single decompression thread ranges from 9% to 15% CPU, and has maxed out the read+write capacity for my storage. But maybe you have super fast source and target storage?
In my scenario, I'm decompressing from HDD and piping the decompressed data directly to another process. I may be limited by disk sometimes, but often the data is already in the filesystem cache. To give a concrete example:
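Roughly, the pipeline looks like this ("some-consumer" is a stand-in for the downstream process; the file names are illustrative):

pixz -d < data.tar.xz | some-consumer      # multi-threaded xz decompression feeding the next stage
zstd -dc data.tar.zst | some-consumer      # the zstd equivalent runs decompression on a single thread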
Granted, zstd is far, far more efficient per core, but there are plenty of workloads where I can afford to use a lot of cores for decompression. Also pixz still compresses slightly better than zstd -19, but I'd be willing to trade that for more efficient decompression if I could still have the option of really fast decompression using multiple threads.
Note also that with this particular data, I'm seeing a compression ratio of only about 4.3:1 using zstd -19. I can imagine that zstd would use less CPU when decompressing if the ratio was higher.
pzstd does support the -d (decompression) option, with a default of 4 processes, which can be set with -p. It's part of the zstd package, but I guess it's separate because it's experimental? Not sure what the difference is between zstd -T4 for compressing and pzstd -p4 for compressing.
Anyway, for a test file I get ~125% CPU with pzstd -d, so it is able to do more work, and slightly decreases time.
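For example:

pzstd -p4 big.file           # compress with 4 threads, roughly equivalent to zstd -T4
pzstd -d -p4 big.file.zst    # multi-threaded decompression (only parallel for pzstd-produced files)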
Disclaimer: I'm a maintainer of zstd, so I'm biased.
Brotli dominates HTTP compression. Zstd just got its RFC approved a few months ago, but Brotli has been present in browsers for years.
However, zstd is more widely adopted everywhere else, especially in lower level systems. Zstd is present in compressed file systems (BtrFS, SquashFS, and ZFS), Mercurial, databases, caches, tar, libarchive, package managers (rpm and soon pacman). There is a pretty complete list here https://facebook.github.io/zstd/.
Again, I'm biased because I know almost everywhere where zstd is deployed, but not everywhere that Brotli is.
I'm also biased, as I'm the author of Brotli. With Brotli you get about 5 % more density. Brotli decompresses at about 500 MB/s while Zstd decompresses at about 700 MB/s. A typical web page is 100 kB, so you need to wait 200 us for decompression; for zstd, you'd only need to wait 140 us.
That 60 us saving comes with a cost: you'll be transferring 5 % more bytes, which can cost you hundreds of milliseconds. Brotli is also more streaming-oriented, so you get your bytes out earlier during the data transfer. This allows pages to be rendered with partial content and further fetches to be issued earlier.
Zstd supporters have used comparisons against brotli where they compare small-window brotli against large-window zstd. This makes it seem like zstd can compete in density too, but that is just an apples-to-oranges comparison.
Brotli is great at compressing static web content, zstd without a dictionary is unlikely to outperform it. For static content you'd probably rather save 5% of space over some decompression costs, since Brotli decompression is fast enough.
Zstd has an advantage if you don't have the CPU to compress at the maximum level, since zstd is generally faster than Brotli at the lower levels.
Even so, for web compression Brotli has the advantage of already being present in browsers, so you're better off using it as it stands today.
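To illustrate the static-asset case: assets can be compressed once at build time and served pre-compressed (the path is a placeholder):

brotli -q 11 -k static/app.js    # writes static/app.js.br next to the original

The web server then sends the .br file with Content-Encoding: br to clients that advertise Accept-Encoding: br.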
For encoding there shouldn't be a format-specific difference. If there is, it is based on the immaturity of the implementations. There were stages when brotli:0 used to be faster to compress than any setting in zstd, but now zstd has played catch-up and is the leader.
Zstd can be significantly slower at medium levels -- you just need to be careful not to be tricked into comparing compressors at different window sizes. Zstd changes window sizes under the hood; with brotli you have to explicitly decide on your decoding resource use.
If you use the same window size and aim for the same density of compression, brotli actually tends to compress faster in the middle qualities, too.
Excited for Zstd getting into browsers. Brotli is great for static assets, but it doesn't really outperform gzip by much in dynamic compression scenarios.
You just cannot compare two things with a commutative function, and that blog post is based on an assumption that you can. Math just doesn't work like that.
Brotli's fastest compression modes are 3-5x faster than zlib's fastest modes. For every gzip quality setting there is a brotli setting that is both faster and more dense than that gzip setting.
We used https://quixdb.github.io/squash-benchmark/ mostly. It's not that Brotli wasn't a winner compared to our gzip, it just didn't outperform it substantially and came with challenges integrating it with our product for supporting dynamic payloads. Meanwhile static assets just need brotli support at build time.
While it is a great benchmark, that is about 3 year old data. The brotli quality 1 was moved to be quality 2, and two new levels have been added. Everything has been sped up and a few levels (possibly 5-10) have got a ~5 % density boost from improved context modeling. Level 10 has been added (about 2-3x faster than level 11) -- in squash benchmark you still see the initial behavior where level 10 copied level 11 performance.
Out of curiosity: Why doesn't pacman just use HTTP's built-in compression? It could cache packages in gzipped form, but there's no reason to re-compress them over the wire.
Mahoney's benchmark is missing the large-window brotli numbers, which are about 5 % better than those of zstd and 10 % better than those of small-window brotli.
True! I suppose what I actually want is a Windows utility that can make a .tar.zst archive, ideally from a GUI.
In the Windows world, archiving and compression are usually in a single file type (.zip, .rar, .7z). Zstd follows the unix style where it can't directly compress folders of files, they need to be in a tar (or other archive format) first. This isn't really an issue on Linux, since Zstd support is built into tar, which ships on pretty much every system.
There haven't yet been any extensions to zip or 7z for zstd support. There is a branch of wimlib that has experimental zstd support, though it's unlikely it will ever be merged into the master branch.
You could make uncompressed zip or 7z files and compress that independently as a zst file, but that's a bit baroque compared to just using tar. :)
7-Zip does often seem bent on supporting everything, I imagine some day in the future it'll support zstd at least as an independent archive, if not extending the Zip and 7z formats as well.
There are two ways of doing compression or archives. One where a local small corruption destroys one data entity, and another where a local small corruption destroys most if not all the archive. The latter kind gives a small density improvement, but can prove to be the wrong option some time later.
And for resources that don't compress well, web servers like nginx (and I presume others) support listing what mimetypes to compress, so they won't double-compress those things.
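A minimal sketch of that kind of nginx configuration (the type list is just an example):

gzip on;
gzip_types text/plain text/css application/json application/javascript image/svg+xml;
# types that are already compressed (image/jpeg, video/mp4, ...) are simply left off the list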