[Solved] ModDeflate questions (possibly some feature requests too)

Added by da almost 3 years ago

I’ve got a few questions regarding the behavior of ModDeflate. I couldn’t find answers to any of these questions in the mod_deflate documentation.

  1. Issue #3040 defines the valid range of deflate.compression-level as 1–9. However, Brotli goes from 1 to 11, and Zstandard goes from -3 to 22. So, is the range really limited to Gzip’s range?
  2. How do I set different ranges for different encodings? E.g. 9 for gzip, 10 for Brotli, and 20 for Zstd. (The two latter encodings needlessly waste cycles on their highest settings.)
  3. Are client preferences respected? Assuming that the encodings are enabled, what gets served when the client sends Accept-Encoding: gzip;q=0.6, br;q=1.0, zstd;q=0.8?
  4. How do I specify the server’s preferred order of encodings when the client has no preference? E.g. when the client sends Accept-Encoding: gzip, br (equivalent to Accept-Encoding: gzip;q=1.0, br;q=1.0), how do I get the server to return Brotli? Does the deflate.allowed-encodings option also specify the server’s preference?
  5. Variation on the above: Given that the deflate.cache-dir is enabled, can Lighttpd return the smallest already compressed file supported by the client? For this to work, the first request for an uncached file would receive the first supported compression encoding (e.g. gzip), and the second request would get the next supported encoding (e.g. br), until all supported encodings exist in the cache. When all supported types are cached, future requests should compare the file sizes and serve the smallest supported encoding.

Replies (11)

RE: ModDeflate questions (possibly some feature requests too) - Added by gstrauss almost 3 years ago

Thank you for taking a moment to update the mod_deflate documentation (and thank you for reading the documentation).

TBH, lighttpd developers get very little feedback from technically competent users, so many of the features that have been added were designed to provide reasonable functionality without too much overhead and without adding too much complexity to the code or to the config options.

1. Issue #3040 defines the valid range of deflate.compression-level as 1–9. However, Brotli goes from 1 to 11, and Zstandard goes from -3 to 22. So, is the range really limited to Gzip’s range?
2. How do I set different ranges for different encodings? E.g. 9 for gzip, 10 for Brotli, and 20 for Zstd. (The two latter encodings needlessly waste cycles on their highest settings.)

As a common denominator, yes, the range is limited to 0-9 (and 0 uses the default compression level compiled into zlib). The code contains the following comment for brotli and zstd request init routines:

    /* future: consider allowing tunables by encoder algorithm,
     * (i.e. not generic "compression_level") */

If such fine configuration per-encoder is important to you, would you help us to understand why? The general assumption that has been made thus far is that if an admin needs such fine control, then for static content, the admin can prep the deflate.cache-dir in advance, using desired compression settings per compression type; and for dynamic content, the admin can have the content generator perform the desired compression at the desired compression setting per compression type, rather than deferring to lighttpd mod_deflate.

If you can provide convincing reasons why such tunables are very important to have in lighttpd, I will look into adding additional tuning options.

3. Are client preferences respected? Assuming that the encodings are enabled, what gets served when the client sends Accept-Encoding: gzip;q=0.6, br;q=1.0, zstd;q=0.8?
4. How do I specify the server’s preferred order of encodings when the client has no preference? E.g. when the client sends Accept-Encoding: gzip, br (equivalent to Accept-Encoding: gzip;q=1.0, br;q=1.0), how do I get the server to return Brotli? Does the deflate.allowed-encodings option also specify the server’s preference?

The client q= preferences are not currently respected, and you currently cannot specify the server's preferred order. lighttpd creates a simple bitmask of the encodings supported in lighttpd, combines that with the encodings listed in Accept-Encoding, and then chooses the first match in the following order: zstd, br, bzip2, gzip, deflate

This occurs in the code: mod_deflate_choose_encoding()
https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/mod_deflate.c#L1384
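
As a rough illustration of the selection logic described above (not the actual C code in mod_deflate_choose_encoding()), the following Python sketch folds the request header and the allowed encodings into bitmasks, intersects them, and returns the first match in the fixed order. The names ENCODING_BITS, SERVER_ORDER, encoding_mask, and choose_encoding are hypothetical, the bit values are made up, and the header parsing is deliberately simplified: any q= parameter is simply dropped.

```python
# Hypothetical bit assignments; the real values in mod_deflate.c may differ.
ENCODING_BITS = {"zstd": 1, "br": 2, "bzip2": 4, "gzip": 8, "deflate": 16}

# Fixed server preference order quoted in the post above.
SERVER_ORDER = ("zstd", "br", "bzip2", "gzip", "deflate")


def encoding_mask(names):
    """Fold encoding names into a bitmask, skipping unknown names."""
    mask = 0
    for name in names:
        mask |= ENCODING_BITS.get(name, 0)
    return mask


def choose_encoding(accept_encoding, allowed):
    """Return the first encoding in SERVER_ORDER that both sides support."""
    # Drop any ";q=..." parameter: client weights are ignored in this sketch.
    offered = [item.split(";")[0].strip().lower()
               for item in accept_encoding.split(",")]
    common = encoding_mask(offered) & encoding_mask(allowed)
    for name in SERVER_ORDER:
        if common & ENCODING_BITS[name]:
            return name
    return None  # no shared encoding: send the response uncompressed
```

With allowed set to ("zstd", "br", "gzip"), the header value gzip;q=0.6, br;q=1.0, zstd;q=0.8 selects zstd in this sketch even though the client weighted br highest, and gzip, br selects br.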

As above, if you can provide convincing reasons why such tunables are very important to have in lighttpd, I will look into adding additional tuning options. I posit that for typical use, parsing the q= and ordering them so precisely provides little to no benefit for most servers, again, for typical use.

You can configure lighttpd.conf with conditions for certain content, and set deflate.allowed-encodings = only ("br","gzip") or ("zstd","gzip") if certain content types (or certain classes of requests) are better handled by "br" or by "zstd", and you expect a client to support both. No, this is not perfect, but is a ready option if it makes a difference for you.

5. Variation on the above: Given that the deflate.cache-dir is enabled, can Lighttpd return the smallest already compressed file supported by the client? For this to work, the first request for an uncached file would receive the first supported compression encoding (e.g. gzip), and the second request would get the next supported encoding (e.g. br), until all supported encodings exist in the cache. When all supported types are cached, future requests should compare the file sizes and serve the smallest supported encoding.

Similar to my comment in https://redmine.lighttpd.net/boards/3/topics/9787, this sounds to me more like a very specific use case for mobile, and something that might be better handled by your specific application. If you were to pre-fill deflate.cache-dir with all the types you would like to support, then that might get you 90%+ of the way there.

RE: ModDeflate questions (possibly some feature requests too) - Added by da almost 3 years ago

Uhm. Can you rescue my reply to this thread from the server? It ate it and returned a Redmine error message. Going back erased it.

RE: ModDeflate questions (possibly some feature requests too) - Added by gstrauss almost 3 years ago

Sorry, that may have been tripped up when I fixed a spelling error in my post. If the text did not pre-fill when you posted the above, I don't think it is in the database, though I also admit I am not an expert with redmine.

RE: ModDeflate questions (possibly some feature requests too) - Added by da almost 3 years ago

Crap. I’d written up such a nice reply with rationales and reasoning. Here’s a quick retype instead.

Lighttpd doesn’t need individual switches for each encoder. But it does need to be able to access Zstd and Brotli’s higher compression levels without erroring out gzip. Pseudo code to normalize different encoders to the gzip scale:

compression_scale_max = 9

if encoder == 'zstd'
  encoder_max = 22
else if encoder == 'br'
  encoder_max = 11
else
  encoder_max = compression_scale_max
end

config_comp_level = 8
compress_level = round( ( encoder_max / compression_scale_max ) * config_comp_level )
// gzip = 8, zstd = 20, br = 10

config_comp_level = 5
compress_level = round( ( encoder_max / compression_scale_max ) * config_comp_level )
// gzip = 5, zstd = 12, br = 6

config_comp_level = 2
compress_level = round( ( encoder_max / compression_scale_max ) * config_comp_level )
// gzip = 2, zstd = 5, br = 2

This ignores Zstd’s fastest, lowest-ratio compression levels (-3, -2, -1, 0), but a config level of 2 probably covers most situations.
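
For anyone who wants to try the numbers, here is a minimal runnable Python version of the pseudocode above. It assumes the encoder maxima given in this thread (22 for Zstd, 11 for Brotli, 9 for gzip/deflate). The names ENCODER_MAX and scaled_level are made up for this sketch and are not lighttpd options, and the integer arithmetic is just one way to round half up without floating point.

```python
GZIP_SCALE_MAX = 9  # deflate.compression-level tops out at 9

# Highest level of each encoder, as given in this thread.
ENCODER_MAX = {"gzip": 9, "deflate": 9, "br": 11, "zstd": 22}


def scaled_level(encoder, config_level):
    """Map a 1-9 config level onto the encoder's own range, rounding half up."""
    if not 1 <= config_level <= GZIP_SCALE_MAX:
        raise ValueError("config_level must be between 1 and 9")
    # Unknown encoders fall back to the gzip scale, as in the pseudocode.
    encoder_max = ENCODER_MAX.get(encoder, GZIP_SCALE_MAX)
    # Integer form of round(encoder_max / 9 * config_level).
    return (2 * encoder_max * config_level + GZIP_SCALE_MAX) // (2 * GZIP_SCALE_MAX)


for level in (8, 5, 2):
    print(level, {name: scaled_level(name, level) for name in ("gzip", "zstd", "br")})
```

Config levels 8, 5, and 2 reproduce the values in the pseudocode comments, and level 9 maps to each encoder's maximum.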

RE: ModDeflate questions (possibly some feature requests too) - Added by da almost 3 years ago

I don’t actually need to configure the encoder order. I was just curious as to which encoders were used in different situations. A hard-coded list is probably good enough, but I disagree with the current order.

Here’s another pseudo patch:

-the following order: zstd, br, bzip2, gzip, deflate
+the following order: br, zstd, gzip, deflate, bzip2

… and the rationale behind the reordering:

  • Brotli always yields better compression than Zstd in benchmarks with the same window size (which is the case in Lighttpd.) It uses more CPU than Zstd but not by that much, and the compression gains are worth it.
  • Zstd and Gzip/deflate yield similar compression for small files. Zstd output is ever so slightly larger for small files (like typical webpages and their assets), but tends to be better for huge files. Zstd also compresses and decompresses faster than Gzip/deflate. So keep it in front of Gzip/deflate.
  • Deflate is the same as Gzip sans the format header. It’s a legacy format that’s still in use by bots and all sorts of things. Keeping it behind Gzip increases the chance that clients that support both get served an already disk-cached gzip.
  • Bzip2 is disabled in the default build flags. (I think it was only ever implemented by OmniWeb on macOS?) Slightly better compression than Gzip, but slower compression and decompression. Keep it at the end of the list, as it’s better than no compression at all. Moving it to the back increases likelihood of clients being served already disk-cached gzip.

RE: ModDeflate questions (possibly some feature requests too) - Added by gstrauss almost 3 years ago

Bzip2 is disabled in the default build flags. (I think it was only ever implemented by OmniWeb on macOS?) Slightly better compression than Gzip, but slower compression and decompression. Keep it at the end of the list, as it’s better than no compression at all. Moving it to the back increases likelihood of clients being served already disk-cached gzip.

I do not think the bzip2 order should change. bzip2 compression can be slow, especially at higher levels. I disabled it in the build by default since brotli is clearly the more widely supported and better successor for the web. I don't see much value in changing the bzip2 place in the ordering since if you do want to use bzip2, and you have enabled it in the build, and especially if you pre-fill the deflate.cache-dir, you probably want that to take preference over gzip. Also, you still have to explicitly specify "bzip2" in deflate.allowed-encodings. Put another way, the only reason I haven't deleted bzip2 support just yet is just in case someone is using it. I don't plan to change it unless it is causing demonstrable issues.

Brotli always yields better compression than Zstd in benchmarks with the same window size (which is the case in Lighttpd.) It uses more CPU than Zstd but not by that much, and the compression gains are worth it.
Zstd and Gzip/deflate yield similar compression for small files. Zstd output is ever so slightly larger for small files (like typical webpages and their assets), but tends to be better for huge files. Zstd also compresses and decompresses faster than Gzip/deflate. So keep it in front of Gzip/deflate.

I am open to swapping the server preference order to br, zstd in a future lighttpd release, but need to look into it further. Do you have any links or data to which you could point me to support your statements above?

FYI: The reason I put zstd in front of brotli in the server preference order is that I conjectured that if a client supports zstd, then that client likely also supports brotli. However, the opposite is less likely to be true. Either way, you can configure deflate.allowed-encodings if you want to exclude one or the other or both.

If a portion of your site, e.g. /css, consists of small files that are better served with brotli, and you want zstd enabled elsewhere, then configure lighttpd.conf:

deflate.allowed-encodings = ("zstd","br","gzip","deflate")
$HTTP["url"] =~ "^/css/" {
    deflate.allowed-encodings = ("br","gzip","deflate")
}

I think it more likely that you might want the global setting to be deflate.allowed-encodings = ("br","gzip","deflate") and for a location with large assets to prefer zstd:

deflate.allowed-encodings = ("br","gzip","deflate")
$HTTP["url"] =~ "^/large_and_highly_compressible_content/" {
    deflate.allowed-encodings = ("zstd","br","gzip","deflate")
}

Overall, I think lighttpd provides decent control of encoding preferences, but it could be better. I don't really think debating zstd vs brotli ordering is the best use of our time, so I'll consider changing deflate.allowed-encodings to specify the server preference order. I know how I would do it, but I consider doing so a lower priority feature request. (Don't expect it anytime soon.)

.

Lighttpd doesn’t need individual switches for each encoders. But it does need to be able to access Zstd and Brotli’s higher compression levels without erroring out gzip.

I do not think further overloading deflate.compression-level is the best approach. I'll try next weekend to see about sketching out per-encoder detailed configuration options.

RE: ModDeflate questions (possibly some feature requests too) - Added by gstrauss almost 3 years ago

I posted some (very lightly tested) patches to my git development branch personal/gstrauss/master. Feedback appreciated.

Aside: I am still not convinced that any of these changes are necessary or have any real impact, and I think that lighttpd can already be configured effectively when using zstd is desired (or else zstd should not be configured). That said, since lighttpd 1.4.56, lighttpd generally parses configuration options into more structured data at startup, so the runtime impact is minimal for adding the features being discussed in this topic (as sketched out in the patches on my development branch).

.

An interesting discussion of brotli and zstd can be found in the Mozilla issue tracker feature request (still open) for zstd support in Firefox
https://bugzilla.mozilla.org/show_bug.cgi?id=1301878

RE: ModDeflate questions (possibly some feature requests too) - Added by da almost 3 years ago

gstrauss wrote in RE: ModDeflate questions (possibly some feature requests ...:

I posted some (very lightly tested) patches to my git development branch personal/gstrauss/master. Feedback appreciated.

On line 563 in commit 47ca6c9e6e:

This should probably say USE_GZIP and not USE_ZSTD.

On commit 47ca6c9e6e:

Maybe normalize the name as clevel instead of using quality for Brotli? It’s the same thing using different terminologies, and using one name would reduce configuration errors (e.g. setting quality on one of the others or clevel on Brotli.)

On commit 64198e4d00:

That plus documentation is a good way to do it. (And you avoided the trap Caddy fell in with its implementation of the same idea.)

On the default compression levels in commit 7048c4eb91:

Google recommends using Brotli 5 and Gzip 6 for dynamically compressed content. Cloudflare also found Brotli 5 to be the best trade-off between speed and compression level. The current default for Gzip is 0? I’m unsure if these values behave similarly to ZSTD_CLEVEL_DEFAULT’s default of 3, though.

On lines 1067–1069 in commit eabb17562e:

(Minor enhancement): Should check if the string ends with /json, +json, /xml, or +xml instead. That would cover image/svg+xml, application/rss+xml, application/feed+json, and three dozen more in a generic way.
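
The suffix test suggested above could look roughly like this in Python. This is only a sketch of the idea, not the C code in the commit: the names STRUCTURED_TEXT_SUFFIXES and is_structured_text are hypothetical, and stripping parameters such as a charset before comparing is an assumption added here.

```python
STRUCTURED_TEXT_SUFFIXES = ("/json", "+json", "/xml", "+xml")


def is_structured_text(content_type):
    """True if the media type ends in /json, +json, /xml or +xml."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.endswith(STRUCTURED_TEXT_SUFFIXES)
```

This matches image/svg+xml, application/rss+xml, and application/feed+json without listing each type, while image/png and text/html do not match.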

gstrauss wrote in RE: ModDeflate questions (possibly some feature requests ...:

An interesting discussion of brotli and zstd can be found in the Mozilla issue tracker feature request (still open) for zstd support in Firefox
https://bugzilla.mozilla.org/show_bug.cgi?id=1301878

Interesting indeed. Zstd’s future on the web greatly depends on whether IoT devices and budget phones adopt it. The reduced memory and processor resources it requires for decoding can prolong battery life compared to Gzip and Brotli. It would be very Apple-like to force everyone to support Zstd for compression on iPhones too, though.

RE: ModDeflate questions (possibly some feature requests too) - Added by da almost 3 years ago

You’ve mentioned pre-populating the deflate.cache-dir, but is this documented? I didn’t find any details on how this directory works: naming scheme, pre-population, cache invalidation, etc.

Known limitations on the ModDeflate wiki wrote:

mod_deflate deflate.cache-dir, if set, contains cached output of static files, but does not cache dynamic responses unless the response contains an ETag response header. (dynamic responses, if eligible, are still compressed by mod_deflate before the response is sent to the client.)

Does that also apply to static content when unsetting the ETag response header?

RE: ModDeflate questions (possibly some feature requests too) - Added by gstrauss almost 3 years ago

On line 563 in commit 47ca6c9e6e:
This should probably say USE_GZIP and not USE_ZSTD.

Good catch. Fixed.

Maybe normalize the name as clevel instead of using quality for Brotli? It’s the same thing using different terminologies, and using one name would reduce configuration errors (e.g. setting quality on one of the others or clevel on Brotli.)

I would rather directly map to C library headers. If you're modifying these parameters, you ought to have read the documentation for the encoder library, and perhaps looked in the encoder library header for that documentation.

I did add some trace so that lighttpd will issue a warning if an unrecognized param is encountered in the deflate.params list.

Google recommends using Brotli 5 and Gzip 6 for dynamically compressed content.

Thanks for the links. I changed the lighttpd default for brotli to 5. NBD to me between brotli 4 and brotli 5, but a huge difference from brotli 11. (gzip default is already 6)

You’ve mentioned pre-populating the deflate.cache-dir, but is this documented? I didn’t find any details on how this directory works: naming scheme, pre-population, cache invalidation, etc.

Unfortunately, it is not currently well-documented other than being defined in the code. To use the same compression settings, you can simply hit lighttpd front-end with requests (e.g. from localhost) to trigger compression and caching. To use a different setting, you'll have to look in the code, and it is possible that the structure might change. See comments in routine mod_deflate_cache_file_name(). Currently: /<cache dir> + /<fully-qualified-path-to-file> + "-" + ETag (without surrounding '"'). As you can see, the ETag is part of the compressed filename so that lighttpd creates a different entry in the cache if the ETag of the base resource changes.
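
As a loose illustration of the naming scheme just described, the following Python sketch joins the three parts into a cache path. It is not taken from mod_deflate_cache_file_name(): the function name cache_file_name is hypothetical, the quote check is an assumption added here, and the real layout may change as noted above.

```python
def cache_file_name(cache_dir, physical_path, etag):
    """Build <cache dir>/<fully-qualified-path-to-file>-<ETag without quotes>."""
    # Assumption: the ETag arrives wrapped in double quotes, as in a response header.
    if len(etag) < 2 or not (etag.startswith('"') and etag.endswith('"')):
        raise ValueError("expected an ETag wrapped in double quotes")
    # physical_path is absolute, so plain concatenation keeps its leading "/".
    return cache_dir.rstrip("/") + physical_path + "-" + etag[1:-1]
```

For example, with the made-up inputs /var/cache/lighttpd/deflate, /var/www/css/site.css, and "1234567890" (quotes included), the sketch yields /var/cache/lighttpd/deflate/var/www/css/site.css-1234567890.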

I already have an item on my wishlist to have a supplemental script which ships with lighttpd which can be used to pre-populate the cache, and use higher compression levels. However, it is not likely to be done any time soon. You're welcome to take a crack at it and submit a pull request on github. Could be as simple as a shell script which takes two arguments: cache-dir and fully-qualified-path-to-file, calculates ETag according to http_etag.c, and runs encoders at highest compression level to produce the encoded paths. Probably should have a third argument to specify the encoding to use. Probably reasonable to assume default etag settings, using inode, file size, and high-precision mtime timestamp. Probably easier to compute the DEK hash if the script were written in Perl.

Does that also apply to static content when unsetting the ETag response header?

Same as above: if there is no ETag, then there is no caching in deflate.cache-dir.

(Minor enhancement): Should check if the string ends with /json, +json, /xml, or +xml instead. That would cover image/svg+xml, application/rss+xml, application/feed+json, and three dozen more in a generic way.

I had copied that list from someone's example for setting BROTLI_PARAM_MODE. Your suggestion makes sense.

Thanks for all the suggestions! My git development branch personal/gstrauss/master has been updated.

RE: [Solved] ModDeflate questions (possibly some feature requests too) - Added by gstrauss over 2 years ago

Thanks for all the suggestions! The next release of lighttpd will contain the changes discussed above.
