Bug #3033

Closed

Memory Growth with PUT and full buffered streams

Added by flynn over 3 years ago. Updated over 3 years ago.

Status:
Fixed
Priority:
Normal
Category:
core
Target version:
1.4.58
ASK QUESTIONS IN Forums:
No

Description

My memory monitoring for lighttpd shows that some lighttpd instances leak memory.

I could track it down to PUT-requests:
- memory leak only occurs with the setting server.stream-request-body = 0 (buffer entire request body)
- the big memory allocation happens at the end of upload before sending to the backend (here php/sabredav)
- no difference whether the request is with Content-Length or without (chunked-encoding)
- the amount of leaked memory is smaller than the uploaded file, about 15% of the file size in my tests
- no memory leak with setting server.stream-request-body = 1 or 2
- no difference whether lighttpd version 1.4.55 or the upcoming 1.4.56 is used


Files

lighttpd-memory.png (71.9 KB) lighttpd-memory.png Lighttpd memory leak flynn, 2020-11-12 07:47
mod_proxy.c.diff (606 Bytes) mod_proxy.c.diff Adapt patch from mod_fastcgi.c to mod_proxy.c flynn, 2020-12-22 11:01
Actions #1

Updated by gstrauss over 3 years ago

Does a program such as valgrind report the memory as leaked? Or are you merely seeing the lighttpd memory footprint grow? Is some of the memory released after a few minutes? If you repeat the upload, does memory grow again, or is the memory reused?

Are you using PHP/sabredav with mod_cgi or mod_fastcgi? (Probably the latter, but best to confirm.)

Actions #2

Updated by gstrauss over 3 years ago

  • Description updated (diff)
Actions #3

Updated by flynn over 3 years ago

- currently I do not use valgrind
- I use the processes plugin of collectd and a self-written plugin which sums up the private dirty segments; both show the same values (a sketch of the idea follows after this list)
- memory only grows; I do not see any memory reduction/release, even after hours
- repeated uploads let the memory grow again
- php/sabredav is connected via FastCGI/FPM
- I now see the same problem with backends connected through mod_proxy (gitlab runners uploading artifacts with chunked encoding)
Here I must use server.stream-request-body = 0, otherwise it fails with 411 Length Required (this limitation should at least be documented?)
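
A minimal sketch of what the self-written plugin does, assuming a single lighttpd process (just the idea, not the actual plugin):

# sum the Private_Dirty values (in kB) over all mappings of the lighttpd process
pid=$(pidof lighttpd)
awk '/^Private_Dirty:/ { sum += $2 } END { print sum " kB" }' "/proc/$pid/smaps"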

Actions #4

Updated by gstrauss over 3 years ago

Trying to reproduce. Unable to reproduce using basic CGI.

lighttpd.conf

server.document-root = "/dev/shm" 
server.bind = "127.0.0.1" 
server.port = 8080

server.modules += ("mod_cgi")
cgi.assign = ("" => "")

/dev/shm/up.cgi
#!/bin/sh
# send a 200 response whose body echoes the request's Content-Length and method
printf "Status: 200\n\nContent-Length: $CONTENT_LENGTH\nRequest-Method: $REQUEST_METHOD\n" 

/dev/shm/up contains 1 MB text
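
(The test file can be created with something like the following; the exact content does not matter.)

# generate 1 MB of text for the upload test
base64 /dev/urandom | head -c 1M > /dev/shm/up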

while true; do curl -X PUT -T /dev/shm/up http://127.0.0.1:8080/up.cgi; done

Resident memory used goes up a little bit, but then stabilizes.

Aside: testing with a 2 MB text file exposed a bug in mod_cgi, committed to the lighttpd master branch last month, and now fixed on master.

I'll try with mod_fastcgi next. @flynn: approximately what size files are you uploading?

Actions #5

Updated by flynn over 3 years ago

I use big files to trigger the memory leak; currently I use a Debian netinst ISO image of about 350 MB.

The first upload results in a memory "step" of 49 MB.
The second upload results in an additional memory "step" of 46 MB.
The third upload results in an additional memory "step" of 8 MB.
The fourth upload: no change in memory, but still the same high value.

In my tests I never see memory being released; in real-world usage memory is sometimes released.

I attached a graph from last night; it does NOT illustrate the tests above with the Debian netinst image,
but real server usage.
Sometimes the memory goes up and back down by the same amount, but often it does not ...

Actions #6

Updated by gstrauss over 3 years ago

Something is clearly wrong in lighttpd for it to use that much memory from sequential PUT requests which ought to be buffered to disk.

Are the connections over TLS? Have you modified other default settings? For example, ssl.read-ahead = "disable" is the default, but if you enable it on slow systems (where TLS processing may be slower than the network speed), you might end up reading a whole lot into memory there. For that reason, ssl.read-ahead should remain "disable" on embedded systems.

Here I must use server.stream-request-body = 0, otherwise it fails with 411 Length required (this limitation should be at least documented?)

FYI: the CGI gateway specification requires CONTENT_LENGTH. This is a requirement for CGI, FastCGI, SCGI, etc., namely any CGI-based gateway. Other (non-CGI-gateway) protocols include the HTTP protocol (used by mod_proxy) or AJP.

Actions #7

Updated by flynn over 3 years ago

I did not change/configure ssl.read-ahead.

The connections are all over TLS; I will try to run tests without TLS ...

Actions #8

Updated by flynn over 3 years ago

FYI: we already had a discussion about PUT and chunked encoding, see ticket #2156. You fixed it, but only for server.stream-request-body = 0.

I know of at least two clients in use which do PUT requests without Content-Length, so this is important for me.

Actions #9

Updated by gstrauss over 3 years ago

I know of at least two clients in use which do PUT requests without Content-Length, so this is important for me.

No worries. My comment was about the requirement that lighttpd send the CONTENT_LENGTH environment variable to the backend via the CGI gateway protocol, not that the client must provide a Content-Length HTTP request header. It is the CGI gateway protocol that necessitates lighttpd buffering the HTTP request body and then determining its size, so that CONTENT_LENGTH can be sent to the backend (if Content-Length is not provided by the HTTP client).
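
For illustration, one way to produce a PUT without Content-Length against the test setup below is to feed curl from a pipe, in which case curl sends Transfer-Encoding: chunked (a sketch; the URL matches the FastCGI test below):

# curl reads the body from stdin; with unknown size it uses Transfer-Encoding: chunked
cat /dev/shm/up | curl -X PUT -T - -H 'Expect:' http://127.0.0.1:8080/up.fcgi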

I am trying some simple tests with FastCGI, and have so far not reproduced the large memory usage.

lighttpd.conf

server.document-root = "/dev/shm" 
server.bind = "127.0.0.1" 
server.port = 8080

server.modules += ("mod_fastcgi")
fastcgi.server = ( ".fcgi" =>
  (( "socket" => "/dev/shm/up.sock",
     "bin-path" => "/dev/shm/up.fcgi",
     "max-procs" => 1
  ))
)

/dev/shm/up.c (compiled into /dev/shm/up.fcgi)
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <fcgi_stdio.h>

int main (void) {
    char buf[16384];

    while (FCGI_Accept() >= 0) {
        char * const p = getenv("CONTENT_LENGTH");
        ssize_t clen = p ? (ssize_t)atoi(p) : 0;
        /* read and discard the request body */
        while (clen > 0) {
            size_t rd = FCGI_fread(buf, 1, sizeof(buf), FCGI_stdin);
            if (feof(FCGI_stdin) || ferror(FCGI_stdin))
                break;
            clen -= (ssize_t)rd;
        }
        printf("Status: 200 OK\r\nContent-Length: 0\r\n\r\n");
    }

    return 0;
}

/dev/shm/up contains 64 MB text

while true; do curl -X PUT -T /dev/shm/up -H 'Expect:' http://127.0.0.1:8080/up.fcgi; done

Actions #10

Updated by flynn over 3 years ago

Same behaviour without a TLS connection: the memory steps for each iteration are different, but the total memory after four iterations is the same as with a TLS connection. Even after 10 and more iterations with the same file it does not grow again (around 25% of the file size). Using a bigger file, memory grows again.

I also tried valgrind and clang LeakSanitizer, but both tools need a proper end of the program to report memory leaks.
With Ctrl+C after the request I see no memory leak with either tool.
Do you have an idea how to use valgrind correctly in this case?

Using valgrind with big files is very slow; it takes hour(s).
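
For reference, a sketch of the kind of invocation meant here (paths are assumptions); valgrind only prints its leak report after lighttpd exits cleanly:

# run lighttpd in the foreground under valgrind, do the uploads, then stop it
# cleanly (Ctrl+C / SIGINT for a graceful shutdown) so the report is printed
valgrind --leak-check=full --show-leak-kinds=all \
    /usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf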

Actions #11

Updated by gstrauss over 3 years ago

Tested my config above adding mod_openssl, and, separately, adding mod_mbedtls. Testing over TLS used more memory, as expected, but memory usage stabilized for both.

So far, I have been testing with a single shell script with a single loop calling curl. I'll try a few in parallel, but I think I need some additional help reproducing this. I also need some sleep, and so will pick this up again tomorrow. :)

Based on your recent bug reports, it is quite likely you are using lighttpd-1.4.56 (pre-release) from lighttpd master branch. Upon which commit have you built? (If you are proxying back to a lighttpd instance prior to 1.4.54, then lighttpd has a bug in #2948 where lighttpd 1.4.52 and lighttpd 1.4.53 used a lot of memory for HTTP request bodies. That was fixed in lighttpd 1.4.54, released 27 May 2019.)

Actions #12

Updated by gstrauss over 3 years ago

With larger files, I do see more memory usage, so I have something to look into and instrument. More tomorrow.

Actions #13

Updated by flynn over 3 years ago

I can reproduce the problem with your up.fcgi backend and a 900MB file upload.

I always use the current Debian packages, in the following versions:
- 1.4.55-1 on the servers the attached graph was taken from
- 1.4.56-rc for my local tests, with the patches from #3030 and #3031

The deeper reason for this memory monitoring is to get a memory "baseline" before enabling HTTP/2.

Actions #14

Updated by gstrauss over 3 years ago

I need to verify this further, as this might be an incomplete patch, but the issue appears to be in mod_fastcgi aggressively framing ready input to send to the backend, and doing so in memory. I don't think the memory is leaking, but clearly too much is being used by the process. Even if the memory does get reused, we want to avoid expanding the address space so much.

--- a/src/mod_fastcgi.c
+++ b/src/mod_fastcgi.c
@@ -221,8 +221,9 @@ static handler_t fcgi_stdin_append(handler_ctx *hctx) {
        FCGI_Header header;
        chunkqueue * const req_cq = &hctx->r->reqbody_queue;
        off_t offset, weWant;
-       const off_t req_cqlen = chunkqueue_length(req_cq);
+       off_t req_cqlen = chunkqueue_length(req_cq);
        int request_id = hctx->request_id;
+       if (req_cqlen > MAX_WRITE_LIMIT) req_cqlen = MAX_WRITE_LIMIT;

        /* something to send ? */
        for (offset = 0; offset != req_cqlen; offset += weWant) {

Actions #15

Updated by gstrauss over 3 years ago

At the very least, I need to reschedule the request once the queue to the backend clears, in the case where we have additional data buffered from the client, ready to be framed and sent to backend FastCGI.

Actions #16

Updated by flynn over 3 years ago

I applied your patch and tested three iterations of a 350 MB upload: still only 5 MB private dirty memory (before this patch ~80-90 MB).

But this fixes only the fastcgi case, I have a similar issue with mod_proxy:

gitlab_runner -> upload artifacts with chunked encoding -> lighttpd reverse proxy -> gitlab

At the moment I have not verified this in an isolated test instance; I see it only on production servers.

Actions #17

Updated by gstrauss over 3 years ago

  • Subject changed from Memory Leak with PUT and full buffered streams to Memory Growth with PUT and full buffered streams
Actions #18

Updated by gstrauss over 3 years ago

  • Status changed from New to Patch Pending

I pushed a small patch to the tip of my dev branch personal/gstrauss/master. This should address the potential for large memory use with mod_fastcgi and server.stream-request-body = 0.

For mod_proxy, a similar two-line patch could be applied to proxy_stdin_append() in mod_proxy as was applied to fcgi_stdin_append() in mod_fastcgi, but I am not yet convinced this additional patch is needed for mod_proxy. What you are seeing with mod_proxy might be a different issue, or it might already be addressed with the changes I made to gw_backend.c on my dev branch as part of the patch for fastcgi memory use. I need to look further.

flynn: are you using server.stream-request-body = 0 with reverse proxy to gitlab? If so, I need to try to repro large memory use with mod_proxy. If not, then the patch on my dev branch likely addresses this, too, as the framing (whether FastCGI or Transfer-Encoding: chunked) is now done when the write buffer to the backend drains below a threshold, rather than on all available input all at once.

Actions #19

Updated by gstrauss over 3 years ago

  • Description updated (diff)
Actions #20

Updated by gstrauss over 3 years ago

@flynn: I have seen clean behavior from mod_proxy as I tried to reverse-proxy a 256 MB PUT.

If I understand correctly, your production systems are running Debian package lighttpd-1.4.55-1 and that is where you are seeing the increased memory usage from the uploads? Are you able to reproduce this issue with the lighttpd-1.4.56 release candidate?

BTW, lighttpd 1.4.56 does not require server.stream-request-body = 0 for mod_proxy, as mod_proxy in lighttpd 1.4.56 supports making HTTP/1.1 requests to the reverse-proxy backend, including both sending and receiving Transfer-Encoding: chunked.

Actions #21

Updated by gstrauss over 3 years ago

  • Status changed from Patch Pending to Fixed
Actions #22

Updated by gstrauss over 3 years ago

(issue automatically closed with patch push)

If this issue is still unresolved, please reopen. If a new or different issue, please open a new issue.

Actions #23

Updated by flynn over 3 years ago

After 1-2 days of testing with the new version, memory looks good:
- baseline memory between 4-15 MB
- some intermediate peaks up to 50 MB, which go back down to the baseline

So normal memory consumption now.

Actions #24

Updated by flynn over 3 years ago

Most of the servers now show very low memory values over long periods, but one server still has high values (lower than before the patches, but still too high): PUT requests to mod_proxy-connected backends.

gstrauss:
mod_fastcgi and mod_proxy share the same code for handling the chunkqueue length; I think the patch in f2b33e752009065f85611a71c18ac5df94247ce7 to mod_fastcgi should be applied to mod_proxy too?

Actions #26

Updated by gstrauss over 3 years ago

mod_proxy sets func ptr hctx->gw.stdin_append = proxy_stdin_append and the code in gw_backend.c in f2b33e75 should handle this, similar to the code in mod_fastcgi.c (in the same commit) handling FastCGI framing.

So that I have a place to start: what are your settings for server.stream-request-body and server.stream-response-body ? Do you know if the server with high memory usage is receiving large request bodies (likely) and/or sending large responses? Are the requests from clients mostly HTTP/1.1, mostly HTTP/2, or a mix? (I am trying to get an idea of which direction to look for places that might need back-pressure.)

I'll take a closer look after I get some sleep.

Actions #27

Updated by flynn over 3 years ago

server.stream-response-body = 1
server.stream-request-body = 0

The memory growth correlates with HTTP/1.1 PUT requests to a mod_proxy-connected backend. The PUT requests are not that big (1-10 MB), but there are many in parallel (the backend has 12 workers at the moment).
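
A sketch of how this load pattern might be reproduced in a test instance (URL and file path are placeholders): several concurrent PUT loops with medium-sized bodies:

# ~10 MB body, 12 concurrent PUT loops against the proxied backend
base64 /dev/urandom | head -c 10M > /tmp/body
for i in $(seq 12); do
    ( while true; do
          curl -s -o /dev/null -X PUT -T /tmp/body -H 'Expect:' http://127.0.0.1:8080/upload
      done ) &
done
wait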

Actions #28

Updated by flynn over 3 years ago

Contrary to what I claimed before, I can now reproduce the problem with a pure GET request:
  • big response, e.g. 50MB
  • it happens with HTTP/1.1 and HTTP/2.0
  • backend fastcgi
  • in most of my tests server.stream-response-body = 1 and server.stream-request-body = 1, but I see the same behaviour also with values of 0 or 2.

The memory growth corresponds to the size of the download; a following download of the same size does not grow memory again. A bigger download than before grows memory only by the difference in size.
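
A sketch of the corresponding test (the URL is a placeholder for a resource whose ~50 MB response comes from the FastCGI backend):

# repeatedly download a large response and discard it, watching lighttpd memory
while true; do
    curl -s -o /dev/null --http1.1 http://127.0.0.1:8080/big-response
done
# repeat with --http2 over TLS to compare HTTP/2 behaviour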

Actions #29

Updated by gstrauss over 3 years ago

Quick update:
I was able to reproduce some bad behavior when doing large downloads in parallel.
I do see a large increase in memory usage with HTTP/1.1, but not with HTTP/2.
I did not see an issue with server.stream-response-body = 0.
I will continue looking into this tomorrow.

Actions #30

Updated by gstrauss over 3 years ago

Disabling the cache in chunk.c for oversized chunks appears to address the memory use in the few cases I tried.

--- a/src/chunk.c
+++ b/src/chunk.c
@@ -194,11 +194,14 @@ static void chunk_release(chunk *c) {
         chunks = c;
     }
     else if (sz > chunk_buf_sz) {
+        chunk_free(c);
+#if 0
         chunk_reset(c);
         chunk **co = &chunks_oversized;
         while (*co && sz < (*co)->mem->size) co = &(*co)->next;
         c->next = *co;
         *co = c;
+#endif
     }
     else {
         chunk_free(c);

Actions #31

Updated by flynn over 3 years ago

I'll test this and report back in a few days ...

Actions #32

Updated by gstrauss over 3 years ago

  • Status changed from Fixed to Patch Pending
  • Target version changed from 1.4.56 to 1.4.58

In http-header-glue.c:http_response_read() there is a call to buffer_string_prepare_append(). This call should instead be changed to reuse buffers from cached oversized chunks.

The chunks saved in the oversized chunk cache are otherwise unused, and get freed once every 64 seconds by a lighttpd maintenance process. However, quite a few can build up in that timeframe. Also, if other allocations occur in the meantime, the process might not be able to return the memory to the OS, resulting in a larger memory footprint. The memory is reused -- it is not a memory leak -- but does result in a potentially much larger footprint for the process.

The quick patch is simply to disable the oversized chunk cache, as is done in the patch above.

I am working on an improved patch to properly reuse the chunks, including setting a limit on the number of chunks in the oversized chunk cache.

Actions #33

Updated by gstrauss over 3 years ago

  • Status changed from Patch Pending to Fixed
Actions #34

Updated by gstrauss over 3 years ago

Additional bug fixed in https://git.lighttpd.net/lighttpd/lighttpd1.4/commit/e16b4503e2df94bdcac356d2738668611a586364
(on my dev branch; not yet on master)

The bug could result in corrupted FastCGI uploads. It was introduced in f2b33e75 when adding back-pressure (for this issue, #3033); I erroneously combined two conditions which need to be separate.
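
A simple way to check for this kind of corruption (a sketch; the URL and the backend storage path are placeholders) is to compare checksums of the file before upload and as stored by the backend:

# upload, then compare checksums of the original and the stored file
sha256sum /dev/shm/up
curl -s -X PUT -T /dev/shm/up -H 'Expect:' http://127.0.0.1:8080/upload
sha256sum /path/where/the/backend/stored/the/file   # placeholder path on the backend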
