Bug #3089

closed

Slow upload / Increase CPU/Memory usage with HTTP/2 enabled

Added by n00b42 over 2 years ago. Updated over 1 year ago.

Status: Fixed
Priority: Normal
Category: core
Target version: 1.4.65
ASK QUESTIONS IN Forums: No

Description

After upgrading from 1.4.58 to 1.4.59 we noticed our uploads slowing down.

Our uploads are ~200MB binary files which are handed to a cgi program.
Before the upgrade, an upload took around 1-2 min; after the upgrade, it takes ~30 min.

Our environment is quite feeble and with .59 reached 100% CPU usage which probably is the reason for the slowdown.

After some testing, we noticed that disabling http/2 solved the problem.

Attached is our config, with only the last line changed to disable HTTP/2.


Files

lighttpd.conf (2.1 KB), n00b42, 2021-07-13 12:20
lighttpd-bug-3089.tar.gz (1012 Bytes), DamienT, 2022-06-02 09:06

Actions #1

Updated by n00b42 over 2 years ago

Actions #2

Updated by gstrauss over 2 years ago

If you have sufficient disk space in /var/tmp but other resources are feeble, as you say, then you might consider using

server.stream-response-body = 1
server.stream-request-body = 1

or even the defaults of 0, so that client and backend offload to lighttpd as quickly as possible, freeing up those resources used by your CGI, which are (often) much less efficient than lighttpd.

If your server has feeble resources, then server.max-connections = 1024 is also likely way too high for that feeble server. Since you describe the server as feeble, you might consider server.max-connections = 16, and possibly less. How many simultaneous requests do you typically expect to be serviced by this feeble server?

Actions #3

Updated by n00b42 over 2 years ago

Thanks for your suggestions.

Note: It is running in an embedded device.

If you have sufficient disk space in /var/tmp

All writable disk space is mapped to a ramdisk, therefore dependent on free memory.

server.stream-response-body = 1
server.stream-request-body = 1

We used "2" to potentially decrease our memory consumption as our backend could already start working.

How many simultaneous requests do you typically expect to be serviced by this feeble server?

Typically just 1. So yes, we could reduce "max-connections", but we never have that many connections anyway.

Actions #4

Updated by gstrauss over 2 years ago

What is the CGI script? Is it C, or is it a scripting language? Scripting languages can consume much more memory than reasonably-written C programs.

Are you able to reproduce the issue with server.max-connections = 1 along with leaving HTTP/2 enabled? What about server.max-connections = 4 ? I am curious if your system is already really short on memory. How much memory is on the embedded system and how much is free (available) when lighttpd is not running? How about when lighttpd is running?

HTTP/2 support in lighttpd legitimately uses more memory than HTTP/1.1, #3084 notwithstanding.

Actions #5

Updated by gstrauss over 2 years ago

  • Status changed from New to Need Feedback
Actions #6

Updated by n00b42 over 2 years ago

Actually, I was able to pinpoint the problem to CPU utilization rather than memory consumption. Sorry for the wrong suggestions...

The system is an embedded device with only 1 GB of memory and a single core at ~1 GHz.
When idle, it consumes ~125 MB RAM and ~25% CPU (lighttpd is running, but no transfer is happening).

The upload process then results in a very high cpu usage
With http/2 enabled: nearly 99% (lighttpd only shows ~30%, and the cgi backend ~10%, but kernel/network take the rest)
Without http/2: around 90% (lighttpd only shows ~25%, and the cgi backend ~2.5%, but kernel/network take the rest)

The CGI backend is a C++ program; for testing purposes, it was temporarily replaced by a C++ program that just reads from stdin (in 8B blocks) and discards the data.

Setting

server.max-connections = 1
did not change the CPU usage in a noticeable manner.
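
For anyone who wants to repeat this test without the original backend, a stand-in for the discard program could look like the sketch below. It is written in Python rather than the C++ used in this report, and it reads "8B blocks" as 8-byte reads, which is an assumption. The names `discard` and `run_cgi` are hypothetical and chosen only for illustration.

```python
def discard(stream, block_size=8):
    """Read `stream` to EOF in `block_size`-byte reads and throw the data away.

    Returns the number of bytes consumed. The 8-byte default is an
    assumption about what "8B blocks" meant in the report.
    """
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:          # empty bytes object means EOF
            return total
        total += len(block)


def run_cgi(stdin, stdout):
    """Consume the request body, then emit a minimal CGI response."""
    consumed = discard(stdin)
    # CGI response: header block, blank line, then the body.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"discarded {consumed} bytes\n")


# To use this as an actual CGI program, uncomment:
# import sys
# run_cgi(sys.stdin.buffer, sys.stdout)
```

A Python backend will itself use more CPU than a compiled one, so it is only suitable for checking whether the slowdown is independent of the real backend, not for comparing absolute CPU numbers.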

Actions #7

Updated by gstrauss over 2 years ago

The system is an embedded device with only 1 GB of memory and a single core at ~1 GHz.

FYI: That is not feeble.
Also, the system is active if "idle" CPU usage is 25% CPU. (That's not idle.)

If you are not memory bound, I suggest testing with
server.stream-request-body = 0
server.stream-response-body = 0
and also with your C++ discard program as the CGI.

All writable disk space is mapped to a ramdisk, therefore dependent on free memory.

If memory is not the issue, but the kernel/network is, then you should look into what the kernel/network is doing.

Ensure that if ssl.read-ahead is set, that it is set to ssl.read-ahead = "disable"

Test on localhost (on the embedded device) with a client program that uses HTTP (without TLS) and generates 200 MB of 'a's to upload.

What is the additional ~35% of CPU going towards when uploaded to lighttpd? If you have iostat available, try using iostat to gather some stats. Do you have auditing or some other logging enabled on the system? What happens to the processing using that 25% CPU usage during the upload to the CGI. Do those other processes increase CPU usage?

Actions #8

Updated by gstrauss over 2 years ago

Why are you using server.network-backend = "writev" instead of the default ("sendfile")?

If you're using lighttpd 1.4.59, then depending on the performance of your PAM backend, you might also try auth.cache = ("max-age" => "600") to cache passwords for 10 mins. (Please do consider the security implications of caching passwords, even for only 10 mins.)

If all your filesystems are directed to in-memory filesystems, e.g. server.upload-dirs = ( "/var/tmp" ), then check the usage of these filesystem during an upload. You have posted that memory is not an issue, but please verify.

Even with the above, you still need to find out where the CPU is going.

Actions #9

Updated by n00b42 over 2 years ago

FYI: That is not feeble.
Also, the system is active if "idle" CPU usage is 25% CPU. (That's not idle.)

Sorry if my description was misleading, was comparing it to kinda powerful server HW.
Also with "idle" I was referring to the system when lighttpd has nothing to do.

If you are not memory bound, I suggest testing with
server.stream-request-body = 0
server.stream-response-body = 0
and also with your C++ discard program as the CGI.

1. We are "kinda" memory bound: We have no writeable storage, just tmpfs volumes (/tmp, /var/tmp).
I.e., as far as I understand it, buffering to /var/tmp will double the required memory (as the data is stored and then handed to the application).
Combined with the CGI backend's decoding, this can quickly fill up RAM (as the file size is ~200MB).
That was the initial reason to switch to server.stream-*-body = 2.

2. Tried the settings with 0. No noticeable change.

If memory is not the issue, but the kernel/network is, then you should look into what the kernel/network is doing.

We are investigating this as well. As we needed to patch drivers to work with newer kernel versions, there is always the possibility that something went wrong.

Ensure that if ssl.read-ahead is set, that it is set to ssl.read-ahead = "disable"

It is set to disable.

Test on localhost (on the embedded device) with a client program that uses HTTP (without TLS) and generates 200 MB of 'a's to upload.

Used curl to send data on localhost:
- Sending using https slows down and shows the previously observed behavior.
- Using http is fine.
- Using --http1.1 is also fine, with both http and https.

What is the additional ~35% of CPU going towards when uploaded to lighttpd? If you have iostat available, try using iostat to gather some stats. Do you have auditing or some other logging enabled on the system? What happens to the processing using that 25% CPU usage during the upload to the CGI. Do those other processes increase CPU usage?

According to htop, the additional ~35%+ are different kernel threads (K_TXQ_TASK, kworker/0:*events, etc), the wifi chip driver is probably involved as well.
The "other" processes more or less keep their CPU usage, or drop a little as they might not get enough.
iostat shows

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.26    0.00   21.28    0.00    0.00   74.47
After starting upload:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.14    0.00   79.80    0.00    0.00    6.06

Why are you using server.network-backend = "writev" instead of the default ("sendfile")?

On the one hand, the default config we based our config on (some time ago) states: "sendfile [...] for[..] small files. [...] writev [...] for [...] large files"
On the other hand, I do not know whether we have compatibility reasons or just kept it from some old config.

If you're using lighttpd 1.4.59, then depending on the performance of your PAM backend, you might also try auth.cache = ("max-age" => "600") to cache passwords for 10 mins. (Please do consider the security implications of caching passwords, even for only 10 mins.)

Tested the upload without PAM. No noticeable improvement.

I understand that this is quite a specific issue: it might occur when HTTP/2 plus HTTPS causes higher CPU usage and, once the CPU is fully utilized, timings/buffering break somehow (maybe in the network stack/driver).
Therefore, I fully understand if you would rather not spend further time diagnosing it.

Regards

Actions #10

Updated by gstrauss over 2 years ago

I am willing to continue the conversation here to see if there are things lighttpd could do better, even though I do not think your issue is attributable to a bug in lighttpd.

HTTP/2 processing is (typically) expected to use more CPU and memory than HTTP/1.1, but the tradeoff is often lower-latency responses to the client, and overall better performance in the aggregate (e.g. with each client making 1 socket connection and multiplexing requests, instead of making 8 socket connections).

The upload process then results in a very high cpu usage
With http/2 enabled: nearly 99% (lighttpd only shows ~30%, and the cgi backend ~10%, but kernel/network take the rest)
Without http/2: around 90% (lighttpd only shows ~25%, and the cgi backend ~2.5%, but kernel/network take the rest)

lighttpd using 5% more CPU with HTTP/2 (and TLS) seems quite reasonable to me.

I have a hunch that you may be on the brink of running out of resources for a combination of reasons, and the slightly additional memory/CPU usage with HTTP/2 and TLS in lighttpd results in the system thrashing.

Does the system go to 99% CPU if you upload a 100 MB test file instead of a 200 MB test file (using lighttpd with HTTP/2 and TLS)?

Have you considered testing with an alternative lighttpd TLS module better for low-resource and embedded systems, such as lighttpd mod_mbedtls or lighttpd mod_wolfssl? See lighttpd TLS docs. The memory footprint of TLS using lighttpd mod_mbedtls or mod_wolfssl is smaller than that of mod_openssl, though mod_openssl and mod_gnutls are more performant.

.

Thinking aloud:

Since you're using
server.stream-response-body = 2
server.stream-request-body = 2
then even if lighttpd stores some of the data in temp files, it should be no more than a MB or so as lighttpd empties the kernel socket and pipe buffers to the temporary files. There should not be anywhere near 200 MB (or more) memory or disk usage by lighttpd to serve that request. Combined with your C++ test program which reads and discards the request body, you should be able to verify that you have plenty of memory available on the system during the file upload.

Have you tried testing a file upload with curl over HTTP/2 without TLS?
curl --http2 ... or curl --http2-prior-knowledge ... to an "http://" address instead of to an "https://" address.

What kernel version are you using? What is the architecture? x86? x86_64? ARM? ARM64?
On what system are you building/packaging lighttpd? 32-bit lighttpd? 64-bit lighttpd? If using an old kernel, you should ./configure --enable-mmap so that lighttpd will use mmap with the temporary files. On newer kernels, lighttpd uses splice(), where available, to send data from temporary files to the pipe to the CGI. With modern kernels, you should prefer server.network-backend = "sendfile", though I don't think that will help your issue (and it shouldn't hurt any, either).

Actions #11

Updated by gstrauss over 2 years ago

You might also test with a slower curl --limit-rate <speed> to see if the HTTP/2 protocol, including flow control, is overwhelming the network driver and wifi. (HTTP/2 is chattier than HTTP/1.1; HTTP/1.1 has no application-level flow control.)

Actions #12

Updated by n00b42 over 2 years ago

Does the system go to 99% CPU if you upload a 100 MB test file instead of a 200 MB test file (using lighttpd with HTTP/2 and TLS)?

The file size actually does not matter much (tested with 50/100/200MB), as the CPU peak happens right after starting.

Have you considered testing with an alternative lighttpd TLS module better for low-resource and embedded systems, such as lighttpd mod_mbedtls or lighttpd mod_wolfssl? See lighttpd TLS docs. The memory footprint of TLS using lighttpd mod_mbedtls or mod_wolfssl is smaller than that of mod_openssl, though mod_openssl and mod_gnutls are more performant.

Thanks for the hint, tried using mbedtls as it is already provided:
I think I noticed a small delay before the upload slows down (but quite small).

Thinking aloud:

Since you're using
server.stream-response-body = 2
server.stream-request-body = 2
then even if lighttpd stores some of the data in temp files, it should be no more than a MB or so as lighttpd empties the kernel socket and pipe buffers to the temporary files. There should not be anywhere near 200 MB (or more) memory or disk usage by lighttpd to serve that request. Combined with your C++ test program which reads and discards the request body, you should be able to verify that you have plenty of memory available on the system during the file upload.

Yes, using server.stream-*-body = 2 and my test cgi tool, the memory usage is not of concern.

Have you tried testing a file upload with curl over HTTP/2 without TLS?
curl --http2 ... or curl --http2-prior-knowledge ... to an "http://" address instead of to an "https://" address.

Yes, as stated before: Upload via HTTP/2 without TLS also works well (CPU not overloaded)

What kernel version are you using? What is the architecture? x86? x86_64? ARM? ARM64?

Quite recent kernel 5.13.0. ARM architecture

On what system are you building/packaging lighttpd?

Using Buildroot on a 64-bit debian system.

With modern kernels, you should prefer server.network-backend = "sendfile", though I don't think that will help your issue (and it shouldn't hurt any, either).

Tested. As you already noted, no major change

You might also test with a slower curl --limit-rate <speed> to see if the HTTP/2 protocol, including flow control, is overwhelming the network driver and wifi. (HTTP/2 is chattier than HTTP/1.1; HTTP/1.1 has no application-level flow control.)

I will test this.

Interestingly, while switching to HTTP/1.1 or disabling TLS drops the CPU usage from 97-100% to 91-92%, switching to HTTP/1.1 and disabling TLS does not drop the CPU usage further.

Actions #13

Updated by gstrauss over 2 years ago

Yes, as stated before: Upload via HTTP/2 without TLS also works well (CPU not overloaded)

This suggests that HTTP/2 in lighttpd is not the issue, though the combination of lighttpd HTTP/2 + TLS on your overloaded system appears to be enough to push your system to thrash on CPU, possibly through excessive task switching and inefficient network use.

Interestingly, while switching to HTTP/1.1 or disabling TLS drops the CPU usage from 97-100% to 91-92%, switching to HTTP/1.1 and disabling TLS does not drop the CPU usage further.

If you're at 91-92% CPU without TLS (or with HTTP/1.1 with TLS), then you're already running low on CPU. Since lighttpd and your test CGI program are less than 50% of that, your issue probably lies in whatever kernel/driver/other is using the other 40-50%.

If you're using a custom kernel, then you may have omitted some important features which affect CPU usage. Consider testing with a stock kernel for comparison.

To potentially (slightly) lower the CPU usage when TLS is in use, please consider the RECOMMENDED or STRONGER lighttpd TLS configs in lighttpd TLS Perfect Forward Secrecy. Your choice of algorithms and whether or not to include "Options" => "-ServerPreference" comes down to the features on your ARM64 chip and whether some TLS algorithms are preferred (for performance) over others for your specific ARM64 chip feature set. You may want to try putting CHACHA20 first in the cipher list, and using "Options" => "+ServerPreference" (the default in lighttpd; note the '+')

Actions #14

Updated by gstrauss over 2 years ago

  • Status changed from Need Feedback to Missing Feedback

It appears that you do not have the interest or the time to follow up here, so I am marking this issue as "Missing Feedback".

I will still get messages if you post updates here.

Actions #15

Updated by n00b42 over 2 years ago

Thank you for all your assistance.

I will further investigate this, and test your suggested settings, try to reduce our overall CPU usage as well as check kernel/drivers.

But it will take some time and since I am not sure if I can contribute anything useful in the near future, you may as well close this issue.

Best Regards

Actions #16

Updated by gstrauss over 2 years ago

lighttpd 1.4.60 contains some tuning for HTTP/2 and generally uses less memory than lighttpd 1.4.59. You might give it a try.

Actions #17

Updated by DamienT almost 2 years ago

I have a similar problem since upgrading (from 1.4.58) to 1.4.64, which now has HTTP/2 enabled by default.

Uploading a ~100MB file onto the embedded system goes from ~2s (HTTP/1.1 SSL) to ~76s (HTTP/2 SSL).

I tested a few things; only setting ssl.read-ahead = "enable" fixes the issue, which seems strange because the default is "disable", which also seems to be the recommended value (including in this thread).

Other options (server.stream-request/response-body, server.network-backend) have almost no effect. HTTP/2 without SSL has good performance, similar to HTTP/1.1 (around 1.2 to 1.4s).

curl using HTTP/2 with SSL negotiates TLSv1.3 / TLS_AES_256_GCM_SHA384 with lighttpd.

Are there any other ssl parameters that I could try?
What are the disadvantages of using ssl.read-ahead = "enable"?

Actions #18

Updated by gstrauss almost 2 years ago

Are there any other ssl parameters that I could try?

lighttpd Performance Tuning
lighttpd TLS documentation

What are the disadvantages of using ssl.read-ahead = "enable" ?

lighttpd Performance Tuning
ssl.read-ahead = "disable" (default) is strongly recommended for slower, embedded systems which process TLS packets more slowly than network wire-speed. For faster systems, test if ssl.read-ahead = "enable" improves performance (or not)

I have a similar problem since upgrading (from 1.4.58) to 1.4.64, which now has HTTP/2 enabled by default.
Uploading a ~100MB file onto the embedded system goes from ~2s (HTTP/1.1 SSL) to ~76s (HTTP/2 SSL).

Are both of those times using lighttpd 1.4.64? You probably upgraded your TLS libraries when upgrading lighttpd. What TLS library and what version are you using? (You're using mod_openssl if ssl.read-ahead = "enable" made a difference) You might try performance testing using a different lighttpd TLS module, such as mod_gnutls which should have similar performance to mod_openssl. (mbedTLS and wolfSSL are aimed at embedded systems and tend to sacrifice some performance for lower memory usage)

Actions #19

Updated by gstrauss almost 2 years ago

Since you're on an embedded system, the quality of the network driver makes a difference, and also whether or not the embedded CPU contains hardware AES acceleration. You might try using the ChaCha20-Poly1305 cipher.

You might also try testing curl --http2-prior-knowledge ... to connect to non-TLS port to see the performance using lighttpd HTTP/2 without TLS.

Actions #20

Updated by gstrauss almost 2 years ago

What version of curl and what OS is the client?
Have you tested with any other clients on other OS?
You can try any of these options individually or together.
curl -v --tls-max 1.2 --no-sessionid --cipher ECDHE-ECDSA-CHACHA20-POLY1305 https://.....

Please ensure that you are running the latest version of curl and latest TLS libraries used by curl on your client machine, and test with multiple client machines (with different OS).

You should also consider testing your lighttpd config on a machine that is not your embedded hardware, e.g. a VPS running Linux with the same config as you have on your embedded system.

Actions #21

Updated by DamienT almost 2 years ago

gstrauss wrote in #note-18:

I have a similar problem since upgrading (from 1.4.58) to 1.4.64, which now has HTTP/2 enabled by default.
Uploading a ~100MB file onto the embedded system goes from ~2s (HTTP/1.1 SSL) to ~76s (HTTP/2 SSL).

Are both of those times using lighttpd 1.4.64? You probably upgraded your TLS libraries when upgrading lighttpd. What TLS library and what version are you using? (You're using mod_openssl if ssl.read-ahead = "enable" made a difference) You might try performance testing using a different lighttpd TLS module, such as mod_gnutls which should have similar performance to mod_openssl. (mbedTLS and wolfSSL are aimed at embedded systems and tend to sacrifice some performance for lower memory usage)

Yes both 1.4.64. For TLS it's openssl, up-to-date version 1.1.1n in all cases.

Thank you for the information

Actions #22

Updated by gstrauss almost 2 years ago

You might try performance testing using a different lighttpd TLS module, such as mod_gnutls which should have similar performance to mod_openssl. (mbedTLS and wolfSSL are aimed at embedded systems and tend to sacrifice some performance for lower memory usage)

I am still interested in finding out if an alternate TLS library makes a difference in your environment.

Actions #23

Updated by gstrauss almost 2 years ago

You might try performance testing using a different lighttpd TLS module, such as mod_gnutls which should have similar performance to mod_openssl. (mbedTLS and wolfSSL are aimed at embedded systems and tend to sacrifice some performance for lower memory usage)

I am still interested in finding out if an alternate TLS library makes a difference in your environment, or if a (temporarily) downgraded version of openssl makes a difference, or if using openssl 3.0.x makes a difference.

Actions #24

Updated by DamienT almost 2 years ago

Sorry for the late reply.

I tested with gnutls and mbedtls, both have unfortunately the same problem as openssl (and the ssl.read-ahead workaround doesn't seem to work as it's not implemented).

I wasn't able to make wolfssl work.

I also had to change mod_mbedtls.c because of errors caused by mod_mbedtls_construct_crt_chain().
I think this function is supposed to return 0 on success, but the check in the code is if(!mod_mbedtls_construct_crt_chain(...)) { /* error */ }.

Actions #25

Updated by gstrauss almost 2 years ago

Thanks for noticing the logic inversion checking return value from mod_mbedtls_construct_crt_chain(). I have added a patch to fix that.

And thanks for testing alternative TLS libs. It is useful to know that this occurs with other TLS libs, so that it is not unique to openssl, though it is still curious to me that using mod_openssl config ssl.read-ahead = "enable" works around the issue for you.

In https://redmine.lighttpd.net/issues/3089#note-19 I asked:

Since you're on an embedded system, the quality of the network driver makes a difference, and also whether or not the embedded CPU contains hardware AES acceleration. You might try using the ChaCha20-Poly1305 cipher.

You might also try testing curl --http2-prior-knowledge ... to connect to non-TLS port to see the performance using lighttpd HTTP/2 without TLS.

In https://redmine.lighttpd.net/issues/3089#note-20 I asked:

What version of curl and what OS is the client?
Have you tested with any other clients on other OS?
You can try any of these options individually or together.
curl -v --tls-max 1.2 --no-sessionid --cipher ECDHE-ECDSA-CHACHA20-POLY1305 https://.....

Please ensure that you are running the latest version of curl and latest TLS libraries used by curl on your client machine, and test with multiple client machines (with different OS).

You should also consider testing your lighttpd config on a machine that is not your embedded hardware, e.g. a VPS running Linux with the same config as you have on your embedded system.

If there is something lighttpd can do better, then I need some help figuring out how to reproduce this, as I have not been able to reproduce what you are seeing.

Actions #26

Updated by gstrauss almost 2 years ago

What are the specifications of your embedded device? What is the OS (distro, version, etc)? Is the device something that can be purchased for a reasonable cost? How much memory is on the embedded device? How do the TCP tunables compare to a desktop system?

Does the embedded device have wireless and/or wired connections? Is there a difference in performance between using one or the other?

Actions #27

Updated by DamienT almost 2 years ago

It's a custom board with a SoC that has a quad-core cortex A53. I think it should be quite like a raspberry 3B/3B+ from the Linux point of view.

But actually, I'm reproducing the problem locally on my computer as well, so I will try to provide a small test config.

Actions #28

Updated by DamienT almost 2 years ago

Here is a directory with files that reproduce the problem.

To reproduce, first run gen-cert.sh to generate the HTTPS certificate, then run.sh to start lighttpd with the config file in this directory.
Then run either upload-cgi-https.sh or upload-proxy-https.sh; each generates a 100MiB file from /dev/urandom and sends it to lighttpd using curl. On my computer it takes around 15 to 16s (including the ~1.3s of file generation).

To compare with HTTP/1.1, the scripts can be run with --http1.1 (forwarded to curl), which reduces the total time from 15-16s to ~1.4s (so not much more than the time to generate the file). With plain http it is fast as well.

My original use-case was with a reverse proxy, but the issue is similar with cgi. This configuration uses an invalid reverse proxy address, but it reproduces the problem anyway.

My computer is running archlinux kernel 5.12.14 with package lighttpd 1.4.64-1 on an Intel i7-4770 (~3.4GHz x86_64, 16GB RAM), and this issue is reproduced on localhost.
My original issue was on a Quad-Core Cortex A53 (~1.2 GHz, 4GB RAM) on a buildroot 2022.02 rootfs, linux 5.10, lighttpd 1.4.64, accessed through the network.

The effect is even worse locally than on the board, as I get 11s remote upload time on the board but 16s locally on my computer (including file generation).

I left ssl.read-ahead = "enable" commented in the config file, it can be un-commented to test as well.

Actions #29

Updated by gstrauss almost 2 years ago

@DamienT thank you for the repro instructions.

I was able to reproduce what you are seeing using your scripts, and it appears to me that it might be specific to curl. During the upload, curl soon starts sending one byte at a time, even though lighttpd has sent HTTP/2 WINDOW_UPDATE frames for each data frame received by lighttpd. In my coarse testing, having lighttpd artificially send increased space in WINDOW_UPDATE for the request stream id (e.g. stream id 1) does not make a difference, but having lighttpd artificially send increased space in WINDOW_UPDATE for the control stream (stream id 0) eliminates the slowdown. With curl sending 1 byte at a time, lighttpd is taking much more CPU as lighttpd is executing well over 100x the number of system calls to read() the 100MB upload 1 byte at a time (versus 16k bytes at a time).
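
To put rough numbers on the cost of tiny DATA frames, the sketch below counts the frames and the framing overhead for a 100 MiB upload at two payload sizes. The only protocol fact it relies on is the fixed 9-byte HTTP/2 frame header; the function names are hypothetical. It treats every frame as the same size, so the 1-byte figures are an upper bound and not a measurement from this report, and it ignores TLS record overhead and the fact that one read() on a TLS socket does not map one-to-one to a frame.

```python
FRAME_HEADER = 9  # bytes; every HTTP/2 frame starts with a fixed 9-octet header


def data_frames(total_bytes, payload_per_frame):
    """Number of DATA frames needed if every frame carries the same payload."""
    return (total_bytes + payload_per_frame - 1) // payload_per_frame


def framing_overhead(total_bytes, payload_per_frame):
    """Total bytes spent on frame headers for the whole upload."""
    return data_frames(total_bytes, payload_per_frame) * FRAME_HEADER


if __name__ == "__main__":
    upload = 100 * 1024 * 1024  # 100 MiB, as in the repro scripts
    for payload in (16384, 1):
        print(
            f"payload {payload:>5} B: "
            f"{data_frames(upload, payload)} frames, "
            f"{framing_overhead(upload, payload)} header bytes"
        )
```

At 16384 bytes per frame the upload needs 6400 DATA frames; at 1 byte per frame it would need one frame per byte, each carrying nine times more header than payload.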

I tested your script upload-cgi-https.sh and on my system it took over 23 seconds.

#!/bin/sh
time dd if=/dev/urandom bs=1M count=100 \
| curl -k --data-binary '@-' "$@" https://localhost:44300/cgi-bin/md5sum.sh

When I changed the client from curl to nghttp, running similar commands on my system took less than 2 seconds (and a large part of that 2 seconds is dd reading /dev/urandom)
#!/bin/sh
time dd if=/dev/urandom bs=1M count=100 \
| nghttp -d - https://localhost:44300/cgi-bin/md5sum.sh

When I wrote a simple HTML upload form and used Firefox to POST a 100MB file to /cgi-bin/md5sum.sh, the upload was likewise quick, easily taking less than 2 seconds, too.

When I used non-TLS, but telling curl to use HTTP/2, the upload took less than 1 second: sh upload-cgi-http.sh --http2-prior-knowledge

Likewise, it is curious why ssl.read-ahead = "enable" in lighttpd makes a difference for curl, eliminating the slowdown otherwise seen with sh upload-cgi-https.sh.

I wonder why curl appears to be miscounting the WINDOW_UPDATE frames from lighttpd with curl using TLS and HTTP/2, but not curl using HTTP/2 (non-TLS). From an strace, it appears that there is an off-by-one in the WINDOW_UPDATE count somewhere. I do not believe this is an issue with lighttpd, as lighttpd always aims to send WINDOW_UPDATE on stream id 0 with the length of the DATA frame received. Also, other HTTP/2 clients using TLS do not exhibit the slowdown you are seeing.

Would you please confirm whether or not you see slow uploads to lighttpd using TLS and HTTP/2 clients besides curl?

Actions #30

Updated by gstrauss almost 2 years ago

  • Status changed from Missing Feedback to Patch Pending
  • Target version changed from 1.4.x to 1.4.65

curl uses libnghttp2 under the covers. Consider a degenerative case where libnghttp2 is sending data as quickly as it is receiving WINDOW_UPDATE, and the default SETTINGS_INITIAL_WINDOW_SIZE is 65535 (1 byte less than 64k). If libnghttp2 sends a sequence of DATA frames with the following payload bytes: 16384, 16384, 16384, 16383, 1, then the next round will be 16383, 16384, 16384, 16383, 1, 1, and the next round 16382, 16384, 16384, 16383, 1, 1, 1, continuing until most of the DATA frames are sent with a single byte. To avoid this degenerative case, my suggestion to libnghttp2 would be to have a low watermark of 8 or 16, and to wait for the window size to exceed the low watermark before sending more data (if the data queued to be sent also exceeds the low watermark). However, since there are so many existing versions of libnghttp2 out there which may trigger this degenerative case, I'll see about implementing a workaround in lighttpd.
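
The frame sequence described above can be reproduced with a small toy model. This is a sketch under stated assumptions, not the actual logic of curl or libnghttp2: it assumes the client drains a 64 KiB upload buffer before refilling it (`UPLOAD_BUFFER` is a hypothetical value), that the client is always limited by the flow-control window, and that the server returns one WINDOW_UPDATE per DATA frame with exactly that frame's length.

```python
from collections import deque

MAX_FRAME = 16384        # default SETTINGS_MAX_FRAME_SIZE
INITIAL_WINDOW = 65535   # default SETTINGS_INITIAL_WINDOW_SIZE
UPLOAD_BUFFER = 65536    # ASSUMPTION: client refills a 64 KiB upload buffer


def simulate(total_bytes):
    """Return the DATA frame payload sizes sent by the toy client."""
    window = INITIAL_WINDOW
    unread = total_bytes   # bytes not yet loaded into the upload buffer
    buffered = 0           # bytes waiting in the upload buffer
    in_flight = deque()    # frame sizes the server has not yet acknowledged
    frames = []
    while unread > 0 or buffered > 0:
        if buffered == 0:
            # Buffer fully drained: refill it from the input.
            buffered = min(UPLOAD_BUFFER, unread)
            unread -= buffered
        if window == 0:
            # Blocked: the next WINDOW_UPDATE returns exactly the length
            # of the oldest unacknowledged DATA frame.
            window += in_flight.popleft()
        # Send as much as frame size, window, and buffer all allow.
        size = min(MAX_FRAME, window, buffered)
        frames.append(size)
        in_flight.append(size)
        window -= size
        buffered -= size
    return frames


if __name__ == "__main__":
    frames = simulate(1024 * 1024)
    print(frames[:18])
    print(len(frames), "frames,", frames.count(1), "of them 1 byte")
```

In this model the first 65536-byte buffer leaves a 1-byte remainder because the window is 65535, and each later buffer adds one more 1-byte frame: the first rounds come out as 16384, 16384, 16384, 16383, 1, then 16383, 16384, 16384, 16383, 1, 1, then 16382, 16384, 16384, 16383, 1, 1, 1.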

Actions #31

Updated by gstrauss almost 2 years ago

For others who might find this: I was able to more reliably reproduce the issue using DamienT's tests when I used taskset to assign lighttpd and curl to separate CPUs on the same machine. At some points, I was also using strace on lighttpd and sending the strace output to a file (not console).

Actions #32

Updated by gstrauss almost 2 years ago

  • Status changed from Patch Pending to Fixed
Actions #33

Updated by gstrauss almost 2 years ago

Reported the degenerative behavior to libnghttp2: https://github.com/nghttp2/nghttp2/issues/1722

Actions #34

Updated by gstrauss almost 2 years ago

Reported degenerative behavior to curl and proposed a mitigation patch: https://github.com/curl/curl/pull/8965

Actions #35

Updated by gstrauss almost 2 years ago

My patch in https://github.com/curl/curl/pull/8965 has been accepted, so a future version of curl (some version soon after curl 7.83.1) will not have this issue.

@n00b42 what clients were you using to upload files to lighttpd when you saw this issue? Did you see this problem with clients other than curl?

Actions #36

Updated by gstrauss over 1 year ago

curl 7.84 (released 27 Jun 2022) contains the patches I submitted in https://github.com/curl/curl/pull/8965

Actions #37

Updated by DamienT over 1 year ago

Thank you !
