Bug #3089

closed

Slow upload / Increased CPU/memory usage with HTTP/2 enabled

Added by n00b42 2 months ago. Updated about 2 months ago.

Status: Missing Feedback
Priority: Normal
Category: core
Target version:
ASK QUESTIONS IN Forums: No

Description

After upgrading from 1.4.58 to 1.4.59 we noticed our uploads slowing down.

Our uploads are ~200 MB binary files, which are handed to a CGI program.
Before the upgrade, an upload took around 1-2 min; after the upgrade, it took ~30 min.

Our environment is quite feeble. With .59 it reached 100% CPU usage, which is probably the reason for the slowdown.

After some testing, we noticed that disabling HTTP/2 solved the problem.

Attached is our config; only the last line was changed to disable HTTP/2.


Files

lighttpd.conf (2.1 KB), n00b42, 2021-07-13 12:20
#1

Updated by n00b42 2 months ago

#2

Updated by gstrauss 2 months ago

If you have sufficient disk space in /var/tmp but other resources are feeble, as you say, then you might consider using

server.stream-response-body = 1
server.stream-request-body = 1

or even the defaults of 0, so that client and backend offload to lighttpd as quickly as possible, freeing up those resources used by your CGI, which are (often) much less efficient than lighttpd.

If your server has feeble resources, then server.max-connections = 1024 is also likely way too high for that feeble server. Since you describe the server as feeble, you might consider server.max-connections = 16, and possibly less. How many simultaneous requests do you typically expect to be serviced by this feeble server?

#3

Updated by n00b42 2 months ago

Thanks for your suggestions.

Note: It is running on an embedded device.

If you have sufficient disk space in /var/tmp

All writable disk space is mapped to a ramdisk, therefore dependent on free memory.

server.stream-response-body = 1
server.stream-request-body = 1

We used "2" to potentially reduce our memory consumption, since it lets our backend start working before the whole body has been received.

How many simultaneous requests do you typically expect to be serviced by this feeble server?

Typically just 1. So yes, we could reduce max-connections, but we never come close to that many anyway.

#4

Updated by gstrauss 2 months ago

What is the CGI script? Is it C, or is it a scripting language? Scripting languages can consume much more memory than reasonably written C programs.

Are you able to reproduce the issue with server.max-connections = 1 along with leaving HTTP/2 enabled? What about server.max-connections = 4 ? I am curious if your system is already really short on memory. How much memory is on the embedded system and how much is free (available) when lighttpd is not running? How about when lighttpd is running?

HTTP/2 support in lighttpd legitimately uses more memory than HTTP/1.1, #3084 notwithstanding.

#5

Updated by gstrauss 2 months ago

  • Status changed from New to Need Feedback
#6

Updated by n00b42 2 months ago

Actually, I was able to pin the problem down to CPU utilization rather than memory consumption. Sorry for the misleading suggestions.

The system is an embedded device with only 1 GB of memory and a single core at ~1 GHz.
When idle, it consumes ~125 MB of RAM and ~25% CPU (lighttpd is running, but no transfer is happening).

The upload process then causes very high CPU usage:
With HTTP/2 enabled: nearly 99% (lighttpd itself shows ~30% and the CGI backend ~10%; kernel/network take the rest)
Without HTTP/2: around 90% (lighttpd itself shows ~25% and the CGI backend ~2.5%; kernel/network take the rest)

The CGI backend is a C++ program. For testing purposes, it was temporarily replaced by a C++ program that just reads from stdin (in 8B blocks) and discards the data.

Setting server.max-connections = 1 did not noticeably change the CPU usage.

#7

Updated by gstrauss 2 months ago

The system is an embedded device with only 1 GB of memory and a single core at ~1 GHz.

FYI: That is not feeble.
Also, the system is active if "idle" CPU usage is 25% CPU. (That's not idle.)

If you are not memory bound, I suggest testing with
server.stream-request-body = 0
server.stream-response-body = 0
and also with your C++ discard program as the CGI.

All writable disk space is mapped to a ramdisk, therefore dependent on free memory.

If memory is not the issue, but the kernel/network is, then you should look into what the kernel/network is doing.

If ssl.read-ahead is set, ensure that it is set to ssl.read-ahead = "disable"

Test on localhost (on the embedded device) with a client program that uses HTTP (without TLS) and generates 200 MB of 'a's to upload.

What is the additional ~35% of CPU going towards when uploading to lighttpd? If you have iostat available, try using it to gather some stats. Do you have auditing or some other logging enabled on the system? What happens to the processes using that 25% CPU during the upload to the CGI? Do those other processes increase their CPU usage?

#8

Updated by gstrauss 2 months ago

Why are you using server.network-backend = "writev" instead of the default ("sendfile")?

If you're using lighttpd 1.4.59, then depending on the performance of your PAM backend, you might also try auth.cache = ("max-age" => "600") to cache passwords for 10 mins. (Please do consider the security implications of caching passwords, even for only 10 mins.)

If all your filesystems are in-memory filesystems, e.g. server.upload-dirs = ( "/var/tmp" ), then check the usage of these filesystems during an upload. You have posted that memory is not an issue, but please verify.

Even with the above, you still need to find out where the CPU is going.

#9

Updated by n00b42 2 months ago

FYI: That is not feeble.
Also, the system is active if "idle" CPU usage is 25% CPU. (That's not idle.)

Sorry if my description was misleading; I was comparing it to fairly powerful server hardware.
Also, by "idle" I meant the state of the system when lighttpd has nothing to do.

If you are not memory bound, I suggest testing with
server.stream-request-body = 0
server.stream-response-body = 0
and also with your C++ discard program as the CGI.

1. We are somewhat memory bound: we have no writable storage, just tmpfs volumes (/tmp, /var/tmp).
As far as I understand it, buffering to /var/tmp doubles the required memory, because the body is stored first and then handed to the application.
Combined with the CGI backend decoding quickly, this can fill up RAM, since the file size is ~200 MB.
That was the original reason for switching to server.stream-*-body = 2.

2. Tried the settings with 0. No noticeable change.

If memory is not the issue, but the kernel/network is, then you should look into what the kernel/network is doing.

We are investigating this as well. Since we had to patch drivers to work with newer kernel versions, something may well have gone wrong there.

If ssl.read-ahead is set, ensure that it is set to ssl.read-ahead = "disable"

It is set to disable.

Test on localhost (on the embedded device) with a client program that uses HTTP (without TLS) and generates 200 MB of 'a's to upload.

Used curl to send data on localhost:
- Sending over HTTPS slows down and shows the previously observed behavior.
- Using HTTP is fine.
- Using --http1.1 is also fine, with both HTTP and HTTPS.

What is the additional ~35% of CPU going towards when uploading to lighttpd? If you have iostat available, try using it to gather some stats. Do you have auditing or some other logging enabled on the system? What happens to the processes using that 25% CPU during the upload to the CGI? Do those other processes increase their CPU usage?

According to htop, the additional ~35%+ goes to various kernel threads (K_TXQ_TASK, kworker/0:*events, etc.). The Wi-Fi chip driver is probably involved as well.
The "other" processes more or less keep their CPU usage, or drop a little, as they might not get enough CPU time.
iostat shows, before the upload:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.26    0.00   21.28    0.00    0.00   74.47

After starting the upload:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.14    0.00   79.80    0.00    0.00    6.06

Why are you using server.network-backend = "writev" instead of the default ("sendfile")?

On the one hand, the default config we based ours on (some time ago) states: "sendfile [...] for [...] small files. [...] writev [...] for [...] large files".
On the other hand, I do not know whether we had compatibility reasons or just kept it from some old config.

If you're using lighttpd 1.4.59, then depending on the performance of your PAM backend, you might also try auth.cache = ("max-age" => "600") to cache passwords for 10 mins. (Please do consider the security implications of caching passwords, even for only 10 mins.)

Tested the upload without PAM. No noticeable improvement.

I understand that this is quite a specific issue. HTTP/2 + HTTPS may cause higher CPU usage, and once the CPU is fully utilized, timing/buffering somehow breaks down (maybe in the network stack/driver).
Therefore, I fully understand if you would rather not spend more time diagnosing it.

Regards

#10

Updated by gstrauss 2 months ago

I am willing to continue the conversation here to see if there are things lighttpd could do better, even though I do not think your issue is attributable to a bug in lighttpd.

HTTP/2 processing is (typically) expected to use more CPU and memory than HTTP/1.1, but the tradeoff is often lower-latency responses to the client, and overall better performance in the aggregate (e.g. with each client making 1 socket connection and multiplexing requests, instead of making 8 socket connections).

The upload process then causes very high CPU usage:
With HTTP/2 enabled: nearly 99% (lighttpd itself shows ~30% and the CGI backend ~10%; kernel/network take the rest)
Without HTTP/2: around 90% (lighttpd itself shows ~25% and the CGI backend ~2.5%; kernel/network take the rest)

lighttpd using 5% more CPU with HTTP/2 (and TLS) seems quite reasonable to me.

I have a hunch that you may be on the brink of running out of resources for a combination of reasons, and the slight additional memory/CPU usage of HTTP/2 and TLS in lighttpd is enough to make the system thrash.

Does the system go to 99% CPU if you upload a 100 MB test file instead of a 200 MB test file (using lighttpd with HTTP/2 and TLS)?

Have you considered testing with an alternative lighttpd TLS module better for low-resource and embedded systems, such as lighttpd mod_mbedtls or lighttpd mod_wolfssl? See lighttpd TLS docs. The memory footprint of TLS using lighttpd mod_mbedtls or mod_wolfssl is smaller than that of mod_openssl, though mod_openssl and mod_gnutls are more performant.


Thinking aloud:

Since you're using
server.stream-response-body = 2
server.stream-request-body = 2
then even if lighttpd stores some of the data in temp files, it should be no more than a MB or so as lighttpd empties the kernel socket and pipe buffers to the temporary files. There should not be anywhere near 200 MB (or more) memory or disk usage by lighttpd to serve that request. Combined with your C++ test program which reads and discards the request body, you should be able to verify that you have plenty of memory available on the system during the file upload.

Have you tried testing a file upload with curl over HTTP/2 without TLS?
curl --http2 ... or curl --http2-prior-knowledge ... to an "http://" address instead of to an "https://" address.

What kernel version are you using? What is the architecture? x86? x86_64? ARM? ARM64?
On what system are you building/packaging lighttpd? 32-bit lighttpd? 64-bit lighttpd? If using an old kernel, you should ./configure --enable-mmap so that lighttpd will use mmap with the temporary files. On newer kernels, lighttpd uses splice(), where available, to send data from temporary files to the pipe to the CGI. With modern kernels, you should prefer server.network-backend = "sendfile", though I don't think that will help your issue (and it shouldn't hurt any, either).

#11

Updated by gstrauss 2 months ago

You might also test with a slower curl --limit-rate <speed> to see whether the HTTP/2 protocol, including flow control, is overwhelming the network driver and Wi-Fi. (HTTP/2 is chattier than HTTP/1.1; HTTP/1.1 has no application-level flow control.)

#12

Updated by n00b42 2 months ago

Does the system go to 99% CPU if you upload a 100 MB test file instead of a 200 MB test file (using lighttpd with HTTP/2 and TLS)?

The file size actually does not matter much (tested with 50/100/200 MB), as the CPU peak happens right after the upload starts.

Have you considered testing with an alternative lighttpd TLS module better for low-resource and embedded systems, such as lighttpd mod_mbedtls or lighttpd mod_wolfssl? See lighttpd TLS docs. The memory footprint of TLS using lighttpd mod_mbedtls or mod_wolfssl is smaller than that of mod_openssl, though mod_openssl and mod_gnutls are more performant.

Thanks for the hint. I tried mbedtls, since it is already provided.
I think I noticed a short delay before the upload slows down, but it is quite small.

Thinking aloud:

Since you're using
server.stream-response-body = 2
server.stream-request-body = 2
then even if lighttpd stores some of the data in temp files, it should be no more than a MB or so as lighttpd empties the kernel socket and pipe buffers to the temporary files. There should not be anywhere near 200 MB (or more) memory or disk usage by lighttpd to serve that request. Combined with your C++ test program which reads and discards the request body, you should be able to verify that you have plenty of memory available on the system during the file upload.

Yes, using server.stream-*-body = 2 and my test cgi tool, the memory usage is not of concern.

Have you tried testing a file upload with curl over HTTP/2 without TLS?
curl --http2 ... or curl --http2-prior-knowledge ... to an "http://" address instead of to an "https://" address.

Yes, as stated before: Upload via HTTP/2 without TLS also works well (CPU not overloaded)

What kernel version are you using? What is the architecture? x86? x86_64? ARM? ARM64?

A fairly recent kernel, 5.13.0, on ARM.

On what system are you building/packaging lighttpd?

Using Buildroot on a 64-bit Debian system.

With modern kernels, you should prefer server.network-backend = "sendfile", though I don't think that will help your issue (and it shouldn't hurt any, either).

Tested. As you already noted, no major change

You might also test with a slower curl --limit-rate <speed> to see whether the HTTP/2 protocol, including flow control, is overwhelming the network driver and Wi-Fi. (HTTP/2 is chattier than HTTP/1.1; HTTP/1.1 has no application-level flow control.)

I will test this.

Interestingly, switching to HTTP/1.1 or disabling TLS drops the CPU usage from 97-100% to 91-92%, but doing both (HTTP/1.1 without TLS) does not drop it any further.

#13

Updated by gstrauss 2 months ago

Yes, as stated before: Upload via HTTP/2 without TLS also works well (CPU not overloaded)

This suggests that HTTP/2 in lighttpd is not the issue. Still, the combination of lighttpd HTTP/2 + TLS on your overloaded system appears to be enough to push it into thrashing on CPU, possibly through excessive task switching and inefficient network use.

Interestingly, switching to HTTP/1.1 or disabling TLS drops the CPU usage from 97-100% to 91-92%, but doing both (HTTP/1.1 without TLS) does not drop it any further.

If you're at 91-92% CPU without TLS (or with HTTP/1.1 and TLS), then you're already running low on CPU. Since lighttpd and your test CGI program account for less than 50% of that, your issue probably lies in whatever kernel/driver/other component is using the other 40-50%.

If you're using a custom kernel, then you may have omitted some important features which affect CPU usage. Consider testing with a stock kernel for comparison.

To potentially (slightly) lower the CPU usage when TLS is in use, please consider the RECOMMENDED or STRONGER lighttpd TLS configs in lighttpd TLS Perfect Forward Secrecy. Your choice of algorithms and whether or not to include "Options" => "-ServerPreference" comes down to the features on your ARM64 chip and whether some TLS algorithms are preferred (for performance) over others for your specific ARM64 chip feature set. You may want to try putting CHACHA20 first in the cipher list, and using "Options" => "+ServerPreference" (the default in lighttpd; note the '+')

#14

Updated by gstrauss about 2 months ago

  • Status changed from Need Feedback to Missing Feedback

It appears that you either lack the interest or do not have the time to follow up here, so I am marking this issue as "Missing Feedback".

I will still get messages if you post updates here.

#15

Updated by n00b42 about 2 months ago

Thank you for all your assistance.

I will investigate this further: test your suggested settings, try to reduce our overall CPU usage, and check the kernel/drivers.

But it will take some time, and since I am not sure I can contribute anything useful in the near future, you may as well close this issue.

Best Regards
