Project

General

Profile

Bug #2922

[regression][Bisected] lighttpd gets killed after uploading a big file

Added by rgenoud 8 months ago. Updated 8 months ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
core
Target version:
Start date:
2018-12-11
Due date:
% Done:

100%

Estimated time:
Missing in 1.5.x:

Description

Since commit 88ee73d0a216 ("[multiple] perf: simplify chunkqueue_get_memory()")
uploading a big file via POST crashes lighttpd with :
"buffer.c.100: assertion failed: NULL != b->ptr"

The strace is :

ioctl(7, FIONREAD, [405752])            = 0
mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb410f000
read(7, "\206>\373lo"..., 262143) = 262143
write(8, "\206>\373lo"..., 262143) = 262143
gettimeofday({1544529369, 141088}, NULL) = 0
epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}}, 2049, 1000) = 1
ioctl(7, FIONREAD, [143609])            = 0
read(7, "4\353\244"..., 262143) = 143609
write(8, "4\353\244"..., 143609) = 143609
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9
connect(9, {sa_family=AF_LOCAL, sun_path="/tmp/php.socket-0"}, 19) = 0
mmap2(NULL, 47304704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x2e99000)                          = 0x17c000
mmap2(NULL, 47435776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xb3f0f000
munmap(0xb3f0f000, 987136)              = 0
munmap(0xb4100000, 61440)               = 0
mprotect(0xb4000000, 135168, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 47304704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "buffer.c.100: assertion failed: NULL != b->ptr\n", 47) = 47
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(13921, 13921, SIGABRT)           = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=13921, si_uid=0} ---
+++ killed by SIGABRT +++

What I understand is that after this commit, a lot more memory is used for reassembling the chunks:
we can see here that the mmap2 for 47304704 bytes fails (because there's not enough memory on my board).
(47304704 bytes is more or less the size of the uploaded file)

lighttpd.confdir.tar.gz (6.41 KB) lighttpd.confdir.tar.gz lighttpd config directory rgenoud, 2018-12-12 08:44

Associated revisions

Revision a1b527e4 (diff)
Added by gstrauss 8 months ago

[multiple] reduce initial buffer sz if large POST (fixes #2922)

reduce initial buffer size if large POST to backend stored in temp files

regression in lighttpd 1.4.52

(thx rgenoud)

x-ref:
"[regression] lighttpd gets killed after uploading a big file"
https://redmine.lighttpd.net/issues/2922

Revision fe3dc179 (diff)
Added by gstrauss 8 months ago

[mod_fastcgi] fix NULL ptr deref from bugfix #2922 (fixes #2923)

(thx rgenoud)

x-ref:
"SIGSEGV on file upload"
https://redmine.lighttpd.net/issues/2923

History

#1

Updated by gstrauss 8 months ago

  • Description updated (diff)
#2

Updated by gstrauss 8 months ago

Thank you for taking the time to bisect and report this.

write(2, "buffer.c.100: assertion failed: NULL != b->ptr\n", 47) = 47

If would be helpful to have a stacktrace here, if possible.

If you're on a low-memory system, please consider the following in lighttpd.conf
server.chunkqueue-chunk-sz = 1024 (default 4096; 1024 is lowest accepted value)
and separately
server.stream-request-body = 2 (see doc Server_stream-request-bodyDetails)
server.stream-response-body = 2 (see doc Server_stream-response-bodyDetails)
If using TLS on a slow, embedded system, please also set
ssl.read-ahead = "disable" (which is the default in current versions of lighttpd)

Given your report that this happens with POST, server.stream-request-body=2 will probably be the quickest mitigation for you.

I'll have a chance to look into this further later this evening.

#3

Updated by gstrauss 8 months ago

I reviewed 88ee73d0 and nothing immediately jumped out at me as incorrect.

Would you please share your lighttpd.conf and provide some more details so that I can try to reproduce this? Are you using openssl? What is the size of the POST request? Can this be reproduced right when you start lighttpd and send a single request, or is it something that happens after many requests have been sent?

#4

Updated by gstrauss 8 months ago

It looks like you're using openssl, and that the buffers received from openssl in your strace are large.

With commits around 88ee73d0, lighttpd attempts to reuse large buffers, and frees unused buffers every 64 seconds.

lighttpd should be reuse the large buffer as it reads a large buffer from openssl and writes the buffer to a temporary file. I'll look into this further to confirm.

#5

Updated by gstrauss 8 months ago

Looks like the buffer I was talking about is reused. It's a large buffer: 256kb. On a low-memory system, you might want to adjust sysctl parameters to use 64kb kernel socket send/recv buffers.

The attempts by lighttpd to mmap 45+ MB of memory look large, expecially on a low-memory system.

I am still interested in a backtrace of the what is calling and triggering the assert failure "write(2, "buffer.c.100: assertion failed: NULL != b->ptr\n", 47) = 47"

#6

Updated by rgenoud 8 months ago

In order to set a breakpoint when the assertion is verified, I expanded the macro force_assert(x) in buffer_string_prepare_copy() and buffer_realloc()
and compile lighttpd with -O0 -g

# gdb lighttpd
GNU gdb (GDB) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
and "show warranty" for details.
This GDB was configured as "arm-buildroot-linux-gnueabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/lighttpd...done.
(gdb) list buffer.c:86
warning: Source file is more recent than executable.
81    force_assert(sz > len);
82
83    b->size = sz;
84    b->ptr = realloc(b->ptr, sz);
85
86do {
87if (!(NULL != b->ptr))
88log_failed_assert(__FILE__, __LINE__, "assertion failed: NULL != b->ptr");
89} while(0);
90} 
(gdb) break buffer.c:88
Breakpoint 1 at 0x2599c: file buffer.c, line 88.
(gdb) list
91
92__attribute_cold__
93static void buffer_alloc_replace(buffer *b, size_t size) {
94    /*(discard old data so realloc() does not copy)*/
95    if (NULL != b->ptr) {
96        free(b->ptr);
97        b->ptr = NULL;
98    }
99    buffer_realloc(b, size);
100}
(gdb) list
101
102char* buffer_string_prepare_copy(buffer *b, size_t size) {
103do {
104if (!(NULL != b))
105log_failed_assert(__FILE__, __LINE__, "assertion failed: NULL != b");
106} while(0);
107
108if (size >= b->size) buffer_alloc_replace(b, size);
109
110b->used = 0;
(gdb) break buffer.c:105
Breakpoint 2 at 0x25a34: file buffer.c, line 105.
(gdb) run -D -f /etc/lighttpd/lighttpd.conf 
Breakpoint 1, buffer_realloc (b=0xe3048, len=47284223) at buffer.c:88
88log_failed_assert(__FILE__, __LINE__, "assertion failed: NULL != b->ptr");
(gdb) bt
#0  buffer_realloc (b=0xe3048, len=47284223) at buffer.c:88
#1  0x00025a0c in buffer_alloc_replace (b=0xe3048, size=47284223)
    at buffer.c:99
#2  0x00025a64 in buffer_string_prepare_copy (b=0xe3048, size=47284223)
    at buffer.c:108
#3  0x0002e6b4 in chunk_init (sz=47284224) at chunk.c:76
#4  0x0002f308 in chunkqueue_buffer_open_resize (c=0xb00d8, sz=47280832)
    at chunk.c:347
#5  0x0002f390 in chunkqueue_prepend_buffer_open_sz (cq=0xb2360, sz=47280832)
    at chunk.c:358
#6  0xb6c24c2c in fcgi_create_env (srv=0x6a008, hctx=0xc8798)
    at mod_fastcgi.c:230
#7  0x00039a54 in gw_write_request (srv=0x6a008, hctx=0xc8798)
    at gw_backend.c:1813
#8  0x0003a004 in gw_send_request (srv=0x6a008, hctx=0xc8798)
    at gw_backend.c:1934
#9  0x0003a484 in gw_handle_subrequest (srv=0x6a008, con=0x85988, p_d=0x77e00)
    at gw_backend.c:2026
#10 0x0003de90 in plugins_call_handle_subrequest (srv=0x6a008, con=0x85988)
    at plugin.c:338
#11 0x000158b4 in http_response_prepare (srv=0x6a008, con=0x85988)
    at response.c:695
#12 0x00018760 in connection_state_machine (srv=0x6a008, con=0x85988)
    at connections.c:1157
#13 0x000138a0 in server_main (srv=0x6a008, argc=4, argv=0xbeefad04)
    at server.c:2037
#14 0x00013b04 in main (argc=4, argv=0xbeefad04) at server.c:2091

The strace corresponding to this is :

gettimeofday({1544613432, 15036}, NULL) = 0
epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}}, 2049, 1000) = 1
ioctl(7, FIONREAD, [389724])            = 0
brk(0x22fc000)                          = 0x22fc000
read(7, "\243\337\341\262\240^B\364\33#\v\t\20\242_oYh\343_H\350wv\262v/\256\220\260\317\177"..., 262143) = 262143
write(8, "\243\337\341\262\240^B\364\33#\v\t\20\242_oYh\343_H\350wv\262v/\256\220\260\317\177"..., 262143) = 262143
gettimeofday({1544613432, 31560}, NULL) = 0
epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}}, 2049, 1000) = 1
ioctl(7, FIONREAD, [127581])            = 0
read(7, "@a\340/\315\210\312\344\303\215S\265\304\32r\"\5ry\263\237\247\213\346\f\33xhhk.\267"..., 262143) = 127581
write(8, "@a\340/\315\210\312\344\303\215S\265\304\32r\"\5ry\263\237\247\213\346\f\33xhhk.\267"..., 127581) = 127581
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9
connect(9, {sa_family=AF_LOCAL, sun_path="/tmp/php.socket-1"}, 19) = 0
mmap2(NULL, 34619392, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x43ff000)                          = 0x22fc000
mmap2(NULL, 34750464, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xb6a00000
munmap(0xb6b00000, 1048576)             = 0
mprotect(0xb6a00000, 135168, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 34619392, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "buffer.c.86: assertion failed: N"..., 46buffer.c.86: assertion failed: NULL != b->ptr
) = 46
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(9228, 9228, SIGABRT)             = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=9228, si_uid=0} ---
+++ killed by SIGABRT +++

I attached the lighttpd.conf I use.

As you said, with server.stream-request-body=2, it's working all right.

Here is the ./configure result:

Plugins:

enabled: 
  mod_access
  mod_accesslog
  mod_alias
  mod_auth
  mod_authn_file
  mod_cgi
  mod_compress
  mod_deflate
  mod_dirlisting
  mod_evhost
  mod_expire
  mod_extforward
  mod_fastcgi
  mod_flv_streaming
  mod_indexfile
  mod_openssl
  mod_proxy
  mod_redirect
  mod_rewrite
  mod_rrdtool
  mod_scgi
  mod_secdownload
  mod_setenv
  mod_simple_vhost
  mod_sockproxy
  mod_ssi
  mod_staticfile
  mod_status
  mod_userdir
  mod_usertrack
  mod_vhostdb
  mod_webdav
  mod_wstunnel
disabled: 
  mod_authn_gssapi
  mod_authn_ldap
  mod_authn_mysql
  mod_authn_pam
  mod_cml
  mod_geoip
  mod_magnet
  mod_mysql_vhost
  mod_trigger_b4_dl
  mod_vhostdb_dbi
  mod_vhostdb_ldap
  mod_vhostdb_mysql
  mod_vhostdb_pgsql

Features:

enabled: 
  auth-crypt
  compress-deflate
  compress-gzip
  large-files
  network-ipv6
  network-openssl
  regex-conditionals
disabled: 
  compress-bzip2
  dbi
  geoip
  kerberos
  ldap
  lua
  mysql
  pam
  postgresql
  stat-cache-fam
  storage-gdbm
  storage-memcached
  webdav-locks
  webdav-properties

Are you using openssl?

So, yes, lighttpd is compiled with openssl, but I don't use it (the web pages are on http:// )

What is the size of the POST request?

It's 47279924 bytes.

Can this be reproduced right when you start lighttpd and send a single request, or is it something that happens after many requests have been sent?

I can reproduce it each time, with a single request.

The attempts by lighttpd to mmap 45+ MB of memory look large, expecially on a low-memory system.

Yes, definitely. (My system has 128MiB, and the uploaded file is in a tmpfs, so if it's mapped in memory and in the tmpfs, it's the end !)

Tell me if you need more infos.

#7

Updated by gstrauss 8 months ago

Thank you. The information was very useful and points to the source of your issue. There is an issue with pre-sizing the environment buffer for requests to backends, originating with f410431b5 and 9f6a46738

server.stream-request-body = 2 is the recommended setting for systems with a small amount of memory since lighttpd attempts to send everything it already has read to the backend.

On slower embedded systems, it is interesting that lighttpd was able to read all 45+ MB of the POST before parsing the header and connecting to the backend. The embedded system must be slower than wire speed and I am going to look for places in the code to perhaps limit what is optimistically read beyond the maximum header size. The idea is to read everything available in kernel socket buffers and pass that quickly to the backend. However, it looks like a limit needs to be implemented for slower systems. Perhaps if lighttpd checked the default kernel socket recv buffer size and then set the limit based on the larger of that or maximum header size, that would be a good balance.

I'll dig into this further later this evening.

#8

Updated by gstrauss 8 months ago

  • Status changed from New to Patch Pending
  • Target version changed from 1.4.x to 1.4.53
#9

Updated by gstrauss 8 months ago

  • Status changed from Patch Pending to Fixed
  • % Done changed from 0 to 100
#10

Updated by gstrauss 8 months ago

Correction to what I posted earlier: if not streaming to the backend, lighttpd buffers the full POST request to disk. The creation of FastCGI packets to the backend was erroneously allocating memory for the entire POST, even though the large POST had been buffered to temporary files on disk. This has been fixed in the patch above.

Thank you for your assistance, rgenoud.

Also available in: Atom