Bug #3033 (closed)

Memory Growth with PUT and full buffered streams

Added by flynn about 4 years ago. Updated almost 4 years ago.

Status: Fixed
Priority: Normal
Category: core
Target version: 1.4.58
ASK QUESTIONS IN Forums: No

Description

My memory monitoring for lighttpd shows that some lighttpd instances leak memory.

I tracked it down to PUT requests:
- the memory leak only occurs with the setting server.stream-request-body = 0 (buffer entire request body)
- the big memory allocation happens at the end of the upload, before sending to the backend (here php/sabredav)
- no difference whether the request has a Content-Length or not (chunked encoding)
- the amount of leaked memory is smaller than the uploaded file, about 15% of the file size in my tests
- no memory leak with the setting server.stream-request-body = 1 or 2
- no difference whether lighttpd version 1.4.55 or the upcoming 1.4.56 is used


Files

lighttpd-memory.png (71.9 KB) - Lighttpd memory leak - flynn, 2020-11-12 07:47
mod_proxy.c.diff (606 Bytes) - Adapt patch from mod_fastcgi.c to mod_proxy.c - flynn, 2020-12-22 11:01
Actions #1

Updated by gstrauss about 4 years ago

Does a program such as valgrind report the memory as leaked? Or are you merely seeing the lighttpd memory footprint grow? Is some of the memory released after a few minutes? If you repeat the upload, does memory grow again, or is the memory reused?

Are you using PHP/sabredav with mod_cgi or mod_fastcgi? (Probably the latter, but best to confirm.)

Actions #2

Updated by gstrauss about 4 years ago

  • Description updated (diff)
Actions #3

Updated by flynn about 4 years ago

- currently I do not use valgrind
- I use the processes plugin of collectd and a self-written plugin which sums up the private dirty segments (see the sketch below); both show the same values
- memory only grows, I do not see any memory reduction/release, even after hours
- repeated uploads let the memory grow again
- php/sabredav is connected via FastCGI (PHP-FPM)
- I now see the same problem with backends connected through mod_proxy (GitLab runners uploading artifacts with chunked encoding)
Here I must use server.stream-request-body = 0, otherwise the request fails with 411 Length Required (this limitation should at least be documented?)
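
A minimal sketch of this kind of measurement -- summing the Private_Dirty fields from /proc/<pid>/smaps -- might look like the following. It only illustrates the approach; it is not the actual collectd plugin, and reading another process's smaps requires running as the same user or root.

pdirty.c (illustration only; build with: cc -o pdirty pdirty.c)
#include <stdio.h>

int main (int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    char path[64];
    snprintf(path, sizeof(path), "/proc/%s/smaps", argv[1]);
    FILE *fp = fopen(path, "r");
    if (!fp) { perror(path); return 1; }

    char line[256];
    unsigned long long total_kb = 0;
    while (fgets(line, sizeof(line), fp)) {
        unsigned long long kb;
        /* each mapping reports a line like "Private_Dirty:      1234 kB" */
        if (sscanf(line, "Private_Dirty: %llu kB", &kb) == 1)
            total_kb += kb;
    }
    fclose(fp);
    printf("Private_Dirty total: %llu kB\n", total_kb);
    return 0;
}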

Actions #4

Updated by gstrauss about 4 years ago

Trying to reproduce. Unable to reproduce using basic CGI.

lighttpd.conf

server.document-root = "/dev/shm" 
server.bind = "127.0.0.1" 
server.port = 8080

server.modules += ("mod_cgi")
cgi.assign = ("" => "")

/dev/shm/up.cgi
#!/bin/sh
printf "Status: 200\n\nContent-Length: $CONTENT_LENGTH\nRequest-Method: $REQUEST_METHOD\n" 

/dev/shm/up contains 1 MB text

while true; do curl -X PUT -T /dev/shm/up http://127.0.0.1:8080/up.cgi; done

Resident memory used goes up a little bit, but then stabilizes.

Aside: testing with a 2 MB text file exposed a bug in mod_cgi that had been committed to the lighttpd master branch last month; it is now fixed on master.

I'll try with mod_fastcgi next. @flynn: approximately what size files are you uploading?

Actions #5

Updated by flynn about 4 years ago

I use big files to trigger the memory leak; currently I use a Debian netinst ISO image of about 350 MB.

The first upload results in a memory "step" of 49 MB.
The second upload results in an additional memory "step" of 46 MB.
The third upload results in an additional memory "step" of 8 MB.
The fourth upload: no change in memory, but still the same high value.

In my tests I never see memory being released; on the real servers, memory is sometimes released.

I attached an image from last night; it does NOT illustrate the tests above with the Debian netinst image, but real server usage.
Sometimes the memory goes up and comes back down by the same amount, but often it does not ...

Actions #6

Updated by gstrauss about 4 years ago

Something is clearly wrong in lighttpd for it to use that much memory from sequential PUT requests which ought to be buffered to disk.

Are the connections over TLS? Have you modified other default settings? For example, ssl.read-ahead = "disable" is the default, but if you enable it on slow systems (where TLS processing may be slower than the network), lighttpd may read a large amount of data into memory there. For that reason, ssl.read-ahead should remain "disable" on resource-constrained systems.

Here I must use server.stream-request-body = 0, otherwise the request fails with 411 Length Required (this limitation should at least be documented?)

FYI: the CGI gateway specification requires CONTENT_LENGTH. This is a requirement for CGI, FastCGI, SCGI, etc., namely any CGI-based gateway. Other (non-CGI-gateway) protocols include the HTTP protocol (used by mod_proxy) or AJP.

Actions #7

Updated by flynn about 4 years ago

I did not change/configure ssl.read-ahead.

The connections are all over TLS; I will try to run tests without TLS ...

Actions #8

Updated by flynn about 4 years ago

FYI: we already had a discussion about PUT and chunked encoding; see ticket #2156. You fixed it, but only if server.stream-request-body = 0 is set.

I know of at least two clients in use that do PUT requests without Content-Length, so this is important for me.

Actions #9

Updated by gstrauss about 4 years ago

I know of at least two clients in use that do PUT requests without Content-Length, so this is important for me.

No worries. My comment was about the requirement that lighttpd send the CONTENT_LENGTH environment variable to the backend via the CGI gateway protocol, not that the client must provide a Content-Length HTTP request header. It is the CGI gateway protocol that necessitates lighttpd buffering the HTTP request body and determining its size, so that CONTENT_LENGTH can be sent to the backend when the HTTP client does not provide Content-Length.

I am trying some simple tests with FastCGI, and have so far not reproduced the large memory usage.

lighttpd.conf

server.document-root = "/dev/shm" 
server.bind = "127.0.0.1" 
server.port = 8080

server.modules += ("mod_fastcgi")
fastcgi.server = ( ".fcgi" =>
  (( "socket" => "/dev/shm/up.sock",
     "bin-path" => "/dev/shm/up.fcgi",
     "max-procs" => 1
  ))
)

/dev/shm/up.c (compiled into /dev/shm/up.fcgi)
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <fcgi_stdio.h>

int main (void) {
    char buf[16384];

    while (FCGI_Accept() >= 0) {
        char * const p = getenv("CONTENT_LENGTH");
        ssize_t clen = p ? (ssize_t)atoi(p) : 0;
        while (clen > 0) {
            size_t rd = FCGI_fread(buf, 1, sizeof(buf), FCGI_stdin);
            if (feof(FCGI_stdin) || ferror(FCGI_stdin))
                break;
            clen -= (ssize_t)rd;
        }
        printf("Status: 200 OK\r\nContent-Length: 0\r\n\r\n");
    }

    return 0;
}

/dev/shm/up contains 64 MB text

while true; do curl -X PUT -T /dev/shm/up -H 'Expect:' http://127.0.0.1:8080/up.fcgi; done

Actions #10

Updated by flynn about 4 years ago

Same behaviour without a TLS connection: the memory steps for each iteration are different, but the memory amount after four iterations is the same as with a TLS connection. Even after 10 and more iterations with the same file it does not grow again (it stays at around 25% of the file size). Using a bigger file, memory grows again.

I also tried valgrind and clang LeakSanitizer, but both tools need a proper end of the program to report memory leaks.
With Ctrl+C after the request I see no memory leak with either tool.
Do you have an idea how to use valgrind correctly in this case?

Using valgrind with big files is very slow, it takes hour(s).

Actions #11

Updated by gstrauss about 4 years ago

Tested my config above adding mod_openssl, and, separately, adding mod_mbedtls. Testing over TLS used more memory, as expected, but memory usage stabilized for both.

So far, I have been testing with a single shell script with a single loop calling curl. I'll try a few in parallel, but I think I need some additional help reproducing this. I also need some sleep, and so will pick this up again tomorrow. :)

Based on your recent bug reports, it is quite likely you are using lighttpd-1.4.56 (pre-release) from lighttpd master branch. Upon which commit have you built? (If you are proxying back to a lighttpd instance prior to 1.4.54, then lighttpd has a bug in #2948 where lighttpd 1.4.52 and lighttpd 1.4.53 used a lot of memory for HTTP request bodies. That was fixed in lighttpd 1.4.54, released 27 May 2019.)

Actions #12

Updated by gstrauss about 4 years ago

With larger files, I do see more memory usage, so I have something to look into and instrument. More tomorrow.

Actions #13

Updated by flynn about 4 years ago

I can reproduce the problem with your up.fcgi backend and a 900MB file upload.

I always use the current Debian packages, in the following versions:
- 1.4.55-1 on the servers from which the attached image was taken
- 1.4.56-rc for my local tests, with the patches from #3030 and #3031

The deeper reason for this memory monitoring is to get a memory "baseline" before enabling HTTP/2.

Actions #14

Updated by gstrauss about 4 years ago

I need to verify this further, as this might be an incomplete patch, but the issue appears to be in mod_fastcgi aggressively framing ready input to send to the backend, and doing so in memory. I don't think the memory is leaking, but clearly too much is being used by the process. Even if the memory does get reused, we want to avoid expanding the address space so much.

--- a/src/mod_fastcgi.c
+++ b/src/mod_fastcgi.c
@@ -221,8 +221,9 @@ static handler_t fcgi_stdin_append(handler_ctx *hctx) {
        FCGI_Header header;
        chunkqueue * const req_cq = &hctx->r->reqbody_queue;
        off_t offset, weWant;
-       const off_t req_cqlen = chunkqueue_length(req_cq);
+       off_t req_cqlen = chunkqueue_length(req_cq);
        int request_id = hctx->request_id;
+       if (req_cqlen > MAX_WRITE_LIMIT) req_cqlen = MAX_WRITE_LIMIT;

        /* something to send ? */
        for (offset = 0; offset != req_cqlen; offset += weWant) {

Actions #15

Updated by gstrauss about 4 years ago

At the very least, I need to reschedule the request once the queue to the backend clears, in the case where we have additional data buffered from the client, ready to be framed and sent to backend FastCGI.

Actions #16

Updated by flynn about 4 years ago

I applied your patch and tested three iterations of the 350 MB upload: still only 5 MB private dirty memory (before this patch ~80-90 MB).

But this fixes only the fastcgi case, I have a similar issue with mod_proxy:

gitlab_runner -> upload artifacts with chunked encoding -> lighttpd reverse proxy -> gitlab

At the moment I have not verified this in a single test instance; I see it only on production servers.

Actions #17

Updated by gstrauss about 4 years ago

  • Subject changed from Memory Leak with PUT and full buffered streams to Memory Growth with PUT and full buffered streams
Actions #18

Updated by gstrauss about 4 years ago

  • Status changed from New to Patch Pending

I pushed a small patch to the tip of my dev branch personal/gstrauss/master. This should address the potential large memory use with mod_fastcgi and server.stream-request-body = 0.

For mod_proxy, a similar two-line patch could be applied to proxy_stdin_append() in mod_proxy as was applied to fcgi_stdin_append() in mod_fastcgi, but I am not yet convinced this additional patch is needed for mod_proxy. What you are seeing with mod_proxy might be a different issue, or it might already be addressed with the changes I made to gw_backend.c on my dev branch as part of the patch for fastcgi memory use. I need to look further.

flynn: are you using server.stream-request-body = 0 with reverse proxy to gitlab? If so, I need to try to repro large memory use with mod_proxy. If not, then the patch on my dev branch likely addresses this, too, as the framing (whether FastCGI or Transfer-Encoding: chunked) is now done when the write buffer to the backend drains below a threshold, rather than on all available input all at once.
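
Conceptually, that back-pressure works roughly like the standalone sketch below (illustration only, not lighttpd code; the MAX_WRITE_LIMIT name appears in the patch above, but its value and all other names here are assumptions for this sketch):

backpressure.c (conceptual illustration only)
#include <stdio.h>

#define MAX_WRITE_LIMIT (256 * 1024)  /* cap on bytes framed per pass (value assumed) */

/* frame at most MAX_WRITE_LIMIT bytes of buffered input per call */
static long frame_some (long *to_backend, long *buffered_input) {
    long n = *buffered_input;
    if (n > MAX_WRITE_LIMIT) n = MAX_WRITE_LIMIT;
    *buffered_input -= n;
    *to_backend += n;
    return n;
}

int main (void) {
    long to_backend = 0;                      /* bytes queued to the backend */
    long buffered_input = 10L * 1024 * 1024;  /* 10 MB buffered from the client */
    while (buffered_input > 0 || to_backend > 0) {
        /* only frame more once the outgoing queue drains below the threshold */
        if (to_backend < MAX_WRITE_LIMIT && buffered_input > 0)
            printf("framed %ld bytes\n", frame_some(&to_backend, &buffered_input));
        /* simulate the backend draining up to 64 KB of the queue */
        to_backend -= (to_backend < 64 * 1024) ? to_backend : 64 * 1024;
    }
    return 0;
}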

Actions #19

Updated by gstrauss about 4 years ago

  • Description updated (diff)
Actions #20

Updated by gstrauss about 4 years ago

@flynn: I have seen clean behavior from mod_proxy when reverse-proxying a 256 MB PUT.

If I understand correctly, your production systems are running Debian package lighttpd-1.4.55-1 and that is where you are seeing the increased memory usage from the uploads? Are you able to reproduce this issue with the lighttpd-1.4.56 release candidate?

BTW, lighttpd 1.4.56 does not require server.stream-request-body = 0 for mod_proxy, as mod_proxy in lighttpd 1.4.56 supports making HTTP/1.1 requests to the reverse-proxy backend, including both sending and receiving Transfer-Encoding: chunked.

Actions #21

Updated by gstrauss about 4 years ago

  • Status changed from Patch Pending to Fixed
Actions #22

Updated by gstrauss about 4 years ago

(issue automatically closed with patch push)

If this issue is still unresolved, please reopen. If a new or different issue, please open a new issue.

Actions #23

Updated by flynn about 4 years ago

After 1-2 days of testing with the new version, memory looks good:
- baseline memory between 4-15 MB
- some intermediate peaks up to 50 MB, which go back down to the baseline

So normal memory consumption now.

Actions #24

Updated by flynn about 4 years ago

Most of the servers now have very low memory values over long periods, but one server still has high values (lower than before the patches, but still too high): PUT requests to mod_proxy-connected backends.

gstrauss:
mod_fastcgi and mod_proxy share the same code handling the chunkqueue length; I think the patch in f2b33e752009065f85611a71c18ac5df94247ce7 to mod_fastcgi should be applied to mod_proxy too?
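
For illustration, such an adaptation would amount to roughly the following two-line change in proxy_stdin_append(), mirroring the mod_fastcgi patch in f2b33e75 (the exact surrounding context is assumed here; the attached mod_proxy.c.diff is the actual adaptation):

--- a/src/mod_proxy.c
+++ b/src/mod_proxy.c
@@ static handler_t proxy_stdin_append(handler_ctx *hctx) {
-       const off_t req_cqlen = chunkqueue_length(req_cq);
+       off_t req_cqlen = chunkqueue_length(req_cq);
+       if (req_cqlen > MAX_WRITE_LIMIT) req_cqlen = MAX_WRITE_LIMIT;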

Actions #26

Updated by gstrauss about 4 years ago

mod_proxy sets the function pointer hctx->gw.stdin_append = proxy_stdin_append, and the code in gw_backend.c in f2b33e75 should handle this, similar to the code in mod_fastcgi.c (in the same commit) handling FastCGI framing.

So that I have a place to start: what are your settings for server.stream-request-body and server.stream-response-body ? Do you know if the server with high memory usage is receiving large request bodies (likely) and/or sending large responses? Are the requests from clients mostly HTTP/1.1, mostly HTTP/2, or a mix? (I am trying to get an idea of which direction to look for places that might need back-pressure.)

I'll take a closer look after I get some sleep.

Actions #27

Updated by flynn about 4 years ago

server.stream-response-body = 1
server.stream-request-body = 0

The memory growth correlates with HTTP/1.1 PUT requests to a mod_proxy-connected backend. The PUT requests are not that big (1-10 MB), but many run in parallel (the backend has 12 workers at the moment).

Actions #28

Updated by flynn about 4 years ago

Contrary to what I claimed before, I can now reproduce the problem with a pure GET request:
  • big response, e.g. 50MB
  • it happens with HTTP/1.1 and HTTP/2.0
  • backend fastcgi
  • in most of my tests server.stream-response-body = 1 and server.stream-request-body = 1, but I see the same behaviour also with values of 0 or 2

The memory growth corresponds to the size of the download; a subsequent download of the same size does not grow memory again. A bigger download than before grows memory only by the difference in size. (A minimal backend sketch for reproducing this case follows below.)
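
A minimal FastCGI backend that only produces a large response, in the style of up.c above, can be used to reproduce this case (the 50 MB size and the file names are arbitrary):

/dev/shm/down.c (compiled into /dev/shm/down.fcgi; illustration only)
#include <string.h>

#include <fcgi_stdio.h>

int main (void) {
    static char buf[16384];
    memset(buf, 'x', sizeof(buf));
    while (FCGI_Accept() >= 0) {
        const unsigned long total = 50UL * 1024 * 1024;  /* ~50 MB response body */
        printf("Status: 200 OK\r\nContent-Type: text/plain\r\n"
               "Content-Length: %lu\r\n\r\n", total);
        /* stream the body in 16 KB pieces */
        for (unsigned long sent = 0; sent < total; sent += sizeof(buf))
            FCGI_fwrite(buf, 1, sizeof(buf), FCGI_stdout);
    }
    return 0;
}

while true; do curl -s -o /dev/null http://127.0.0.1:8080/down.fcgi; done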

Actions #29

Updated by gstrauss about 4 years ago

Quick update:
I was able to reproduce some bad behavior when doing large downloads in parallel.
I do see a large increase in memory usage with HTTP/1.1, but not with HTTP/2.
I did not see an issue with server.stream-response-body = 0.
I will continue looking into this tomorrow.

Actions #30

Updated by gstrauss about 4 years ago

Disabling the cache in chunk.c for oversized chunks appears to address the memory use in the few cases I tried.

--- a/src/chunk.c
+++ b/src/chunk.c
@@ -194,11 +194,14 @@ static void chunk_release(chunk *c) {
         chunks = c;
     }
     else if (sz > chunk_buf_sz) {
+        chunk_free(c);
+#if 0
         chunk_reset(c);
         chunk **co = &chunks_oversized;
         while (*co && sz < (*co)->mem->size) co = &(*co)->next;
         c->next = *co;
         *co = c;
+#endif
     }
     else {
         chunk_free(c);

Actions #31

Updated by flynn about 4 years ago

I'll test this and report in a few days ...

Actions #32

Updated by gstrauss about 4 years ago

  • Status changed from Fixed to Patch Pending
  • Target version changed from 1.4.56 to 1.4.58

In http-header-glue.c:http_response_read() there is a call to buffer_string_prepare_append(). This call should instead be changed to reuse buffers from cached oversized chunks.

The chunks saved in the oversized chunk cache are otherwise unused and get freed once every 64 seconds by lighttpd's periodic maintenance. However, quite a few can build up in that timeframe, and if other allocations occur in the meantime, the process might not be able to return the memory to the OS. The memory is reused -- it is not a memory leak -- but it can result in a much larger memory footprint for the process.

The quick patch is simply to disable the oversized chunk cache, as is done in the patch above.

I am working on an improved patch to properly reuse the chunks, including setting a limit on the number of chunks in the oversized chunk cache.
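
To illustrate the idea of such a limit (a standalone sketch only, not lighttpd's chunk.c implementation; the names and the slot count are made up for this example): releases beyond a fixed number of cached oversized buffers free the memory instead of caching it, and acquisition reuses a cached buffer that is already large enough.

oversized_cache.c (conceptual illustration only)
#include <stdio.h>
#include <stdlib.h>

#define CACHE_SLOTS 4  /* hypothetical limit on cached oversized buffers */

static struct { void *mem; size_t size; } cache[CACHE_SLOTS];

/* get a buffer of at least `size` bytes, reusing a cached one if possible */
static void *obuf_acquire (size_t size, size_t *actual) {
    for (int i = 0; i < CACHE_SLOTS; ++i) {
        if (cache[i].mem && cache[i].size >= size) {
            void *mem = cache[i].mem;
            *actual = cache[i].size;
            cache[i].mem = NULL;
            return mem;
        }
    }
    *actual = size;
    return malloc(size);
}

/* return a buffer: cache it if a slot is free, otherwise free it */
static void obuf_release (void *mem, size_t size) {
    for (int i = 0; i < CACHE_SLOTS; ++i) {
        if (!cache[i].mem) {
            cache[i].mem = mem;
            cache[i].size = size;
            return;
        }
    }
    free(mem);  /* cache full: do not let the footprint grow unbounded */
}

int main (void) {
    size_t sz;
    void *a = obuf_acquire(1 << 20, &sz);  /* 1 MB: nothing cached, so malloc */
    obuf_release(a, sz);
    void *b = obuf_acquire(1 << 19, &sz);  /* 512 KB: reuses the cached 1 MB buffer */
    printf("got a buffer of %zu bytes\n", sz);
    obuf_release(b, sz);
    for (int i = 0; i < CACHE_SLOTS; ++i) free(cache[i].mem);  /* cleanup */
    return 0;
}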

Actions #33

Updated by gstrauss about 4 years ago

  • Status changed from Patch Pending to Fixed
Actions #34

Updated by gstrauss almost 4 years ago

Additional bug fixed in https://git.lighttpd.net/lighttpd/lighttpd1.4/commit/e16b4503e2df94bdcac356d2738668611a586364
(on my dev branch; not yet on master)

The bug could result in corrupted FastCGI uploads. The bug was introduced in f2b33e75 when adding back-pressure (for this issue, #3033); I erroneously combined two conditions which need to be separate.
