Bug #3084 (open)

Memory fragmentation with HTTP/2 enabled

Added by ZivHung 19 days ago. Updated 2 days ago.

Status: New
Priority: Normal
Category: core
Target version: -
ASK QUESTIONS IN Forums: Yes

Description

Hello,

We use lighttpd 1.4.59 to receive files on our Linux server. In our test process we repeatedly upload a 4 KB file every 10 milliseconds; the memory of the lighttpd process grows by around 3000 KB after 20 minutes (and by about 7500 KB after one hour). However, when we run the same test against lighttpd 1.4.58, memory does not grow at all; after one hour of testing, the memory footprint is even smaller than it was an hour earlier.

In both experiments the lighttpd.conf and the Linux server are identical, so we believe there is a memory leak in 1.4.59.
Please check our lighttpd.conf settings in the attached file.


Files

lighttpd.conf (5.13 KB) ZivHung, 2021-06-01 02:14
lighttpd.conf (5.15 KB) ZivHung, 2021-06-02 05:44
lighttpd.conf (5.05 KB) ZivHung, 2021-06-02 14:13
lighttpd-memory.png (87.3 KB) "Lighttpd memory usage for one week" flynn, 2021-06-03 13:03
Actions #1

Updated by gstrauss 19 days ago

Thank you for providing your lighttpd.conf.

Would you please help narrow down what behavior is causing the memory increase?
What requests?
  • FastCGI requests?
  • WSTunnel requests?
  • CGI requests? I see CGI directives in your lighttpd.conf, but you have commented out "mod_cgi" in server.modules.

Does the issue go away if you change the streaming parameters? (one or both)

  server.stream-request-body = 1
  server.stream-response-body = 1

or with

  server.stream-request-body = 0
  server.stream-response-body = 0

Is there a reason you have disabled TLS session tickets (-SessionTicket)?
Is there a reason you are using server.network-backend = "writev" instead of the default "sendfile"?

BTW, if you want lighttpd to omit the Server response header, use server.tag = ""

I see that you have disabled ETag generation. If the responses should never be cached, you might also disable the stat cache and see if that makes a difference in memory use. server.stat-cache-engine = "disable"

Actions #2

Updated by gstrauss 19 days ago

Another difference between lighttpd 1.4.58 and lighttpd 1.4.59 is enabling HTTP/2 by default.

Please test with server.feature-flags = ( "server.h2proto" => "disable" ) to see if that makes a difference for you.

Actions #3

Updated by ZivHung 18 days ago

Hello,

Thanks for the tip. After we tested lighttpd 1.4.59 with "server.feature-flags = ( "server.h2proto" => "disable" )" added, the memory size no longer climbed (matching our expectation).

Regarding your questions: we use FastCGI to pass the CGI commands. Changing the server.stream-request-body and server.stream-response-body settings did not stop the memory growth, and neither did server.stat-cache-engine = "disable"; however, we will keep that setting, since we indeed do not need to cache the responses (please check our latest lighttpd.conf).

One last question about "Is there a reason you have disabled TLS session tickets (-SessionTicket)?": we saw documentation suggesting that this setting should be added for versions before lighttpd 1.4.56. If we use a later version, is the setting still necessary?

Actions #4

Updated by gstrauss 18 days ago

> Thanks for the tip. After we tested lighttpd 1.4.59 with "server.feature-flags = ( "server.h2proto" => "disable" )" added, the memory size no longer climbed (matching our expectation).

Depending on usage, lighttpd memory use might increase slightly from use at startup, but memory use should stabilize as the memory is reused. After two hours (with "server.h2proto" => "enable"), has memory use grown from one hour prior? Can you share some details about the client that you are using to upload the 4k file? I would like to try to reproduce the issue. Thanks.

> One last question about "Is there a reason you have disabled TLS session tickets (-SessionTicket)?": we saw documentation suggesting that this setting should be added for versions before lighttpd 1.4.56. If we use a later version, is the setting still necessary?

The setting is neither necessary nor recommended with lighttpd 1.4.56 and later, which have built-in logic to rotate the session ticket encryption key; earlier versions did not. Therefore, for earlier versions, it is recommended to use -SessionTicket or to restart lighttpd daily (which results in openssl regenerating the session ticket encryption key).

BTW, to slightly reduce lighttpd memory footprint on embedded systems, you might comment out modules in server.modules that you are not using, as you have done with "mod_cgi". In your config, I do not see you using "mod_access", "mod_authn_file", "mod_auth", "mod_proxy", or "mod_redirect". Each module adds a minimum of 20k to memory footprint, and modules such as "mod_authn_file" and "mod_proxy" add more.

Actions #5

Updated by ZivHung 18 days ago

Hello,
Thank you for the advice about the lighttpd configuration settings. There are indeed several entries in "server.modules" that are no longer needed for our service; please check our latest configuration file to see whether any inappropriate settings remain.

Unfortunately, we cannot directly share our test details, but we will try to simplify the conditions and provide the test flow if we find a simpler way to reproduce the issue. Thank you.

Actions #6

Updated by gstrauss 18 days ago

If your site is producing files on disk which change frequently, and do not have unique filenames (e.g. for generated responses and using X-Sendfile), then server.stat-cache-engine = "disable" is appropriate.

If you are serving static content, such as I see in your aliases to css, images, js, and videos, then you will benefit from server.stat-cache-engine = "simple" (the default if omitted).

In short, if your site requires that the stat cache be disabled for proper behavior, then you should disable the stat cache. Otherwise, the default is to enable the stat cache because it provides faster responses to static resources.

I had suggested disabling the stat cache as a troubleshooting step to help isolate which lighttpd subsystem might be affecting the memory usage you are seeing. The stat cache is enabled by default in lighttpd because it is so useful and generally safe.

Actions #7

Updated by flynn 17 days ago

I have a similar issue with memory, but I could not make it reproducible enough to write a ticket.

But I have memory monitoring for lighttpd, and I made the following observations:

  • the memory is freed; lighttpd returns very close to the memory used after start
  • but it may take days (!) to free the memory; see the attached graph (lighttpd was started at Friday 14:37, look only at the last 6 days)

Maybe freeing the memory is not triggered often enough.

Actions #8

Updated by gstrauss 17 days ago

This might be due, in whole or in part, to memory fragmentation. lighttpd allocates memory for its chunkqueue in chunks, and those chunks are pushed onto a queue to be reused. Every 64 seconds, lighttpd frees the queue of unused chunks. However, the order in which they are freed may differ from the order in which they were allocated, and some long-running connections may still be using later-allocated chunks. After (some) chunks are freed, it is still up to the memory allocator (libc malloc/calloc/realloc unless another is preloaded) to decide when to release the memory back to the system.
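
For illustration, here is a minimal sketch of that reuse pattern in C (simplified stand-ins for what chunk.c does, not lighttpd's actual code):

#include <stdlib.h>

struct chunk { struct chunk *next; char *mem; size_t size; };

static struct chunk *unused_chunks; /* queue of released, reusable chunks */

/* on release, push the chunk onto the reuse queue instead of freeing */
static void chunk_release_sketch(struct chunk *c) {
    c->next = unused_chunks;
    unused_chunks = c;
}

/* every 64 seconds, free the entire queue of unused chunks; chunks still
 * held by long-running connections are untouched, so freed addresses can
 * be interleaved with live ones, which is the fragmentation pattern
 * described above */
static void chunk_pool_clear_sketch(void) {
    struct chunk *c;
    while ((c = unused_chunks) != NULL) {
        unused_chunks = c->next;
        free(c->mem);
        free(c);
    }
}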

It is a good sign to me that the memory (at least in flynn's case) is eventually returned to the system, as that suggests the memory is not leaked. Also, when I tested some scenarios with valgrind some months back, I did not find any leaks.

Actions #9

Updated by gstrauss 17 days ago

  • Category set to core

BTW, the chunk size defaults to 8k and is configurable with server.chunkqueue-chunk-sz = 8192. Minimum size is 1024, and server.chunkqueue-chunk-sz is rounded up to the closest power of 2.
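
In other words, the configured value is normalized roughly like the following sketch (my illustration, not the actual lighttpd code):

/* round n up to the next power of 2, with a floor of 1024,
 * e.g. 3000 -> 4096 and 8192 -> 8192 (illustrative sketch) */
static unsigned int chunkqueue_chunk_sz_normalize(unsigned int n) {
    unsigned int sz = 1024; /* minimum chunk size */
    while (sz < n) sz <<= 1;
    return sz;
}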

Actions #10

Updated by flynn 16 days ago

I have started a test with libjemalloc as the allocator; some days later I'll test different values of server.chunkqueue-chunk-sz ...

Actions #11

Updated by gstrauss 16 days ago

Another thought occurred to me. In lighttpd 1.4.59, HTTP/2 is enabled by default.

Starting in lighttpd 1.4.56, lighttpd keeps a pool of struct request_st (and associated buffers), which are used by HTTP/2 connections, one for each stream. This pool can grow to server.max-connections entries. Each HTTP/2 connection can have up to 8 simultaneous streams (a hard-coded limit in lighttpd), and any additional request_st allocated beyond the pool limit is freed immediately upon release.

Depending on the access patterns that your server sees, and on whether the streams are handled quickly or have a long lifetime, you might want to reduce server.max-connections, but not so much that you unnecessarily delay lighttpd from answering bursts of connections. Of course, this tuning is necessary only on small, memory-restricted servers, which are likely resource-constrained in other ways, too. On larger servers, 30M of lighttpd memory use for a very busy server is unlikely to be an issue.

If it turns out that the request_st allocations -- which occur at runtime during bursty activity and then are saved to the request pool -- are causing excessive fragmentation and preventing release of other memory, then I can consider pre-allocating the request pool at startup, up to server.max-connections. Doing so would increase lighttpd's initial memory footprint for the base case (everyone), so I am leaning against this unless the data proves that it would make a drastic difference. That said, the initial graph provided by flynn suggests that this is not the case, since the memory is eventually released (which does not currently happen with the request pool until server shutdown or restart). Another option would be to periodically clear the request pool, as is currently done every 64 seconds with the chunk queue. Since ZivHung reported this increased memory use with HTTP/2 enabled in lighttpd, I might consider tweaking the lighttpd reqpool behavior if the data supports doing so.

BTW, if not specified, server.max-connections is 1/3 server.max-fds. If not specified, server.max-fds is currently 4096, making the default server.max-connections 1365.

I am pondering freeing up to 16 request_st from the pool every 64 seconds. After a busy burst of activity which filled the pool, freeing 16 request_st every 64 seconds would clear the pool in ~90 mins (with the default server.max-connections).
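
A rough sketch of what that periodic trimming might look like (all names here are illustrative stand-ins, not reqpool.c code; with the default server.max-connections of 1365, draining a full pool takes 1365/16 ~ 86 passes, i.e. ~91 minutes at one pass per 64 seconds):

#include <stdlib.h>

/* stand-in for lighttpd's request_st; the real struct carries buffers */
typedef struct request_st { struct request_st *next; } request_st;

static request_st *request_pool; /* pooled, currently unused requests */

/* hypothetical helper: free up to 16 pooled request_st per pass */
static void reqpool_clear_some(void) {
    for (int i = 0; i < 16 && request_pool != NULL; ++i) {
        request_st * const r = request_pool;
        request_pool = r->next;
        free(r); /* the real code would free associated buffers, too */
    }
}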

Actions #12

Updated by flynn 14 days ago

Using jemalloc as the allocator definitely improves the memory usage:

  • base memory usage (4-5MB) is reached more often
  • high memory usage (more than 10MB) is lower and shorter

I consider this memory usage (with jemalloc) to be normal.

So the interaction between lighttpd memory buffers and glibc malloc can be improved, maybe by reducing the memory arenas to 1. I'll test this later.

I had a similar issue with java and glibc malloc, where reducing the glibc memory arenas improved the situation, but jemalloc was the best solution to achieve the lowest memory usage in production.

Actions #13

Updated by gstrauss 14 days ago

flynn: thank you for the update

ZivHung: try export MALLOC_ARENA_MAX=2 in the same shell in which you start the lighttpd daemon (and do not disable "server.h2proto")

lighttpd is single-threaded and (in many common use cases) does not heavily use the memory allocator at runtime, so while MALLOC_ARENA_MAX=1 is an option, too, MALLOC_ARENA_MAX=2 is probably a good default as a tradeoff, just in case some custom modules are threaded. If MALLOC_ARENA_MAX is not set in the environment, then I might consider using mallopt to set M_ARENA_MAX to 1 during startup and config processing (which is single-threaded and does heavily use the memory allocator), and then setting M_ARENA_MAX to 2 after setup.

[Edit:]
I would prefer to find a reasonable value of MALLOC_ARENA_MAX instead of periodically running the glibc-specific malloc_trim
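
For illustration, that mallopt() idea might look like the following sketch (glibc-specific; where exactly the calls would sit in lighttpd's startup path is my assumption):

#include <malloc.h> /* glibc: mallopt(), M_ARENA_MAX */

int main(void) {
    /* config parsing is single-threaded and allocation-heavy:
     * keep everything in one arena during setup */
    mallopt(M_ARENA_MAX, 1);

    /* ... load and process the configuration ... */

    /* allow a second arena afterwards, in case custom modules
     * spawn threads at runtime */
    mallopt(M_ARENA_MAX, 2);

    /* ... enter the main event loop ... */
    return 0;
}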

Actions #14

Updated by gstrauss 14 days ago

  • Subject changed from Memory leak problem in Lighttpd 1.4.59 to Memory fragmentation with HTTP/2 enabled
Actions #15

Updated by flynn 8 days ago

In my tests over several days I cannot see/measure any significant change with MALLOC_ARENA_MAX=1.

Only with jemalloc I get significant changes in memory usage.

gstrauss: shall I test different values of server.chunkqueue-chunk-sz, e.g. 4096 or 16384?

Actions #16

Updated by gstrauss 8 days ago

Hmmmm. I was really hoping that MALLOC_ARENA_MAX=1 (or MALLOC_ARENA_MAX=2) would be a good change for the single-threaded lighttpd server, and I might still consider it if there is no indication that it does harm.

The server.chunkqueue-chunk-sz tunable was originally intended for use on severely memory-constrained systems, where a chunk size of 1k might more fully utilize the allocated memory, at the slight cost of increasing the number of memory chunks used.

My guess is that server.chunkqueue-chunk-sz will not make any real difference with regard to memory fragmentation. On the other hand, if the chunk size exceeds the allocator's threshold for mmap chunks, that could change the way chunks are allocated and released back to the system. At the other end of the spectrum, a very small chunk size of 1k might be allocated from a different pool in the arena. I have a hunch that jemalloc has a better algorithm for segmenting the allocation of blocks across the different orders of magnitude of size used by lighttpd.

While playing with chunkqueue-chunk-sz might yield a difference, I think the best next step would be to try to identify the source(s) of memory fragmentation. Is it the cache of chunkqueue chunks? Is it the stat_cache? Is it the request_st pool (used by lighttpd HTTP/2 support)? Is it something else, or a combination thereof? Since this issue seems to appear with lighttpd HTTP/2 support enabled, it is probably not the stat_cache. In chunk.c, there are chunk_release() and chunk_buffer_release(), which could be modified to disable the caching and to always take the branch that frees the object. Similarly, in reqpool.c, request_release() could be modified to always free the object.
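
As a concrete illustration of that diagnostic, the modified release path might look like this sketch (the chunk struct and chunk_free_sketch() are simplified stand-ins; the real chunk.c internals differ):

#include <stdlib.h>

/* simplified stand-ins for the chunk type and free path in chunk.c */
typedef struct chunk { struct chunk *next; void *mem; } chunk;

static void chunk_free_sketch(chunk *c) { free(c->mem); free(c); }

/* diagnostic version: always free instead of pushing the chunk onto
 * the reuse queue, to test whether the chunk cache is the source of
 * the fragmentation */
static void chunk_release(chunk * const c) {
    chunk_free_sketch(c);
}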

If you have a decent idea of the number of active connections at any one time (including occasional peaks), and a good idea of how many are brief and how many are long-running (and for how long), then you might lower server.max-connections to see if doing so affects the memory fragmentation over time.

I am also open to other ideas if you have any suggestions.

Actions #17

Updated by gstrauss 6 days ago

To collect a bit of data, here is a small patch to lighttpd-1.4.59 to write out malloc statistics.

--- a/src/server.c
+++ b/src/server.c
@@ -68,6 +68,8 @@ static const buffer default_server_tag = { CONST_STR_LEN(PACKAGE_DESC)+1, 0 };
 # include <sys/prctl.h>
 #endif

+#include <malloc.h>
+
 #include "sys-crypto.h" 
 #if defined(USE_OPENSSL_CRYPTO) \
  || defined(USE_MBEDTLS_CRYPTO) \
@@ -1826,6 +1828,10 @@ static void server_handle_sigalrm (server * const srv, time_t min_ts, time_t las
                                if (0 == (min_ts & 0x3f)) { /*(once every 64 secs)*/
                                        /* free excess chunkqueue buffers every 64 secs */
                                        chunkqueue_chunk_pool_clear();
+
+                                       if (0 == (mono_ts & 0xff))
+                                               malloc_stats();
+
                                        /* attempt to restart dead piped loggers every 64 secs */
                                        if (0 == srv->srvconf.max_worker)
                                                fdevent_restart_logger_pipes(min_ts);

The patch will write something like the following out to stderr every 256 seconds (4 min 16 seconds)

Arena 0:
system bytes     =     946176
in use bytes     =     419920
Total (incl. mmap):
system bytes     =    1282048
in use bytes     =     755792
max mmap regions =          2
max mmap bytes   =     335872

ZivHung or flynn, would you mind running with this patch for an hour (during which you see the memory use increase), and share the results? Since MALLOC_ARENA_MAX=1 did not have a noticeable effect for flynn, then it is likely that we will see a larger number of mmap regions.

If a small number of bytes from each mmap region remain in use for long periods of time, then the glibc allocator will not be able to return those regions to the OS until an mmap region is completely unused. As the memory usage is smoother with jemalloc, one possible mitigation might be for me to allocate multiple pools of chunks, preferring to reuse chunks from the original pool and releasing excess chunks from later overflow pools back to the OS. The libc memory allocator would still be leveraged, not replaced. Then again, we have not yet established that the chunks are the source of the issue, though their allocations may contribute.

As an aside, I have already committed code earlier this year to lighttpd git master which reduces memory fragmentation by doubling buffer sizes when reallocating, rather than the historical lighttpd method of adding the needed space and rounding up to the nearest 64-byte boundary. This change should reduce the number of memory reallocations as well as play more nicely with memory allocator buckets. (Note: lighttpd git master should be functional for testing in a staging environment, but I strongly recommend against pushing lighttpd git master straight into production without more thorough testing. Substantial code changes have been made since lighttpd 1.4.59 was released.)
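
The difference between the two growth strategies, roughly (an illustrative sketch, not the actual buffer code):

#include <stddef.h>

/* historical strategy: grow to exactly what is needed, rounded up to
 * the next 64-byte boundary; frequent small reallocations */
static size_t buf_newsize_round64(size_t needed) {
    return (needed + 63) & ~(size_t)63;
}

/* git master strategy: double the current size until the request fits;
 * far fewer reallocations, and sizes map more cleanly onto allocator
 * size classes */
static size_t buf_newsize_double(size_t cur, size_t needed) {
    size_t sz = cur ? cur : 64;
    while (sz < needed) sz <<= 1;
    return sz;
}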

Actions #18

Updated by flynn 2 days ago

I tested different values of chunkqueue-chunk-sz and cannot measure a significant effect on memory usage.

I'll try your memory patches later ...
