Bug #2922
closed[regression][Bisected] lighttpd gets killed after uploading a big file
Description
Since commit 88ee73d0a216 ("[multiple] perf: simplify chunkqueue_get_memory()")
uploading a big file via POST crashes lighttpd with :
"buffer.c.100: assertion failed: NULL != b->ptr"
The strace is :
ioctl(7, FIONREAD, [405752]) = 0 mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb410f000 read(7, "\206>\373lo"..., 262143) = 262143 write(8, "\206>\373lo"..., 262143) = 262143 gettimeofday({1544529369, 141088}, NULL) = 0 epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}}, 2049, 1000) = 1 ioctl(7, FIONREAD, [143609]) = 0 read(7, "4\353\244"..., 262143) = 143609 write(8, "4\353\244"..., 143609) = 143609 socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9 connect(9, {sa_family=AF_LOCAL, sun_path="/tmp/php.socket-0"}, 19) = 0 mmap2(NULL, 47304704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory) brk(0x2e99000) = 0x17c000 mmap2(NULL, 47435776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory) mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xb3f0f000 munmap(0xb3f0f000, 987136) = 0 munmap(0xb4100000, 61440) = 0 mprotect(0xb4000000, 135168, PROT_READ|PROT_WRITE) = 0 mmap2(NULL, 47304704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory) write(2, "buffer.c.100: assertion failed: NULL != b->ptr\n", 47) = 47 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0 tgkill(13921, 13921, SIGABRT) = 0 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=13921, si_uid=0} --- +++ killed by SIGABRT +++
What I understand is that after this commit, a lot more memory is used for reassembling the chunks:
we can see here that the mmap2 for 47304704 bytes fails (because there's not enough memory on my board).
(47304704 bytes is more or less the size of the uploaded file)
Files
Updated by gstrauss almost 6 years ago
Thank you for taking the time to bisect and report this.
write(2, "buffer.c.100: assertion failed: NULL != b->ptr\n", 47) = 47
If would be helpful to have a stacktrace here, if possible.
If you're on a low-memory system, please consider the following in lighttpd.conf
server.chunkqueue-chunk-sz = 1024
(default 4096; 1024 is lowest accepted value)
and separately
server.stream-request-body = 2
(see doc Server_stream-request-bodyDetails)
server.stream-response-body = 2
(see doc Server_stream-response-bodyDetails)
If using TLS on a slow, embedded system, please also set
ssl.read-ahead = "disable"
(which is the default in current versions of lighttpd)
Given your report that this happens with POST, server.stream-request-body=2
will probably be the quickest mitigation for you.
I'll have a chance to look into this further later this evening.
Updated by gstrauss almost 6 years ago
I reviewed 88ee73d0 and nothing immediately jumped out at me as incorrect.
Would you please share your lighttpd.conf and provide some more details so that I can try to reproduce this? Are you using openssl? What is the size of the POST request? Can this be reproduced right when you start lighttpd and send a single request, or is it something that happens after many requests have been sent?
Updated by gstrauss almost 6 years ago
It looks like you're using openssl, and that the buffers received from openssl in your strace are large.
With commits around 88ee73d0, lighttpd attempts to reuse large buffers, and frees unused buffers every 64 seconds.
lighttpd should be reuse the large buffer as it reads a large buffer from openssl and writes the buffer to a temporary file. I'll look into this further to confirm.
Updated by gstrauss almost 6 years ago
Looks like the buffer I was talking about is reused. It's a large buffer: 256kb. On a low-memory system, you might want to adjust sysctl parameters to use 64kb kernel socket send/recv buffers.
The attempts by lighttpd to mmap 45+ MB of memory look large, expecially on a low-memory system.
I am still interested in a backtrace of the what is calling and triggering the assert failure "write(2, "buffer.c.100: assertion failed: NULL != b->ptr\n", 47) = 47"
Updated by rgenoud almost 6 years ago
- File lighttpd.confdir.tar.gz lighttpd.confdir.tar.gz added
In order to set a breakpoint when the assertion is verified, I expanded the macro force_assert(x) in buffer_string_prepare_copy() and buffer_realloc()
and compile lighttpd with -O0 -g
# gdb lighttpd
GNU gdb (GDB) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-buildroot-linux-gnueabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/lighttpd...done.
(gdb) list buffer.c:86
warning: Source file is more recent than executable.
81 force_assert(sz > len);
82
83 b->size = sz;
84 b->ptr = realloc(b->ptr, sz);
85
86do {
87if (!(NULL != b->ptr))
88log_failed_assert(__FILE__, __LINE__, "assertion failed: NULL != b->ptr");
89} while(0);
90}
(gdb) break buffer.c:88
Breakpoint 1 at 0x2599c: file buffer.c, line 88.
(gdb) list
91
92__attribute_cold__
93static void buffer_alloc_replace(buffer *b, size_t size) {
94 /*(discard old data so realloc() does not copy)*/
95 if (NULL != b->ptr) {
96 free(b->ptr);
97 b->ptr = NULL;
98 }
99 buffer_realloc(b, size);
100}
(gdb) list
101
102char* buffer_string_prepare_copy(buffer *b, size_t size) {
103do {
104if (!(NULL != b))
105log_failed_assert(__FILE__, __LINE__, "assertion failed: NULL != b");
106} while(0);
107
108if (size >= b->size) buffer_alloc_replace(b, size);
109
110b->used = 0;
(gdb) break buffer.c:105
Breakpoint 2 at 0x25a34: file buffer.c, line 105.
(gdb) run -D -f /etc/lighttpd/lighttpd.conf
Breakpoint 1, buffer_realloc (b=0xe3048, len=47284223) at buffer.c:88
88log_failed_assert(__FILE__, __LINE__, "assertion failed: NULL != b->ptr");
(gdb) bt
#0 buffer_realloc (b=0xe3048, len=47284223) at buffer.c:88
#1 0x00025a0c in buffer_alloc_replace (b=0xe3048, size=47284223)
at buffer.c:99
#2 0x00025a64 in buffer_string_prepare_copy (b=0xe3048, size=47284223)
at buffer.c:108
#3 0x0002e6b4 in chunk_init (sz=47284224) at chunk.c:76
#4 0x0002f308 in chunkqueue_buffer_open_resize (c=0xb00d8, sz=47280832)
at chunk.c:347
#5 0x0002f390 in chunkqueue_prepend_buffer_open_sz (cq=0xb2360, sz=47280832)
at chunk.c:358
#6 0xb6c24c2c in fcgi_create_env (srv=0x6a008, hctx=0xc8798)
at mod_fastcgi.c:230
#7 0x00039a54 in gw_write_request (srv=0x6a008, hctx=0xc8798)
at gw_backend.c:1813
#8 0x0003a004 in gw_send_request (srv=0x6a008, hctx=0xc8798)
at gw_backend.c:1934
#9 0x0003a484 in gw_handle_subrequest (srv=0x6a008, con=0x85988, p_d=0x77e00)
at gw_backend.c:2026
#10 0x0003de90 in plugins_call_handle_subrequest (srv=0x6a008, con=0x85988)
at plugin.c:338
#11 0x000158b4 in http_response_prepare (srv=0x6a008, con=0x85988)
at response.c:695
#12 0x00018760 in connection_state_machine (srv=0x6a008, con=0x85988)
at connections.c:1157
#13 0x000138a0 in server_main (srv=0x6a008, argc=4, argv=0xbeefad04)
at server.c:2037
#14 0x00013b04 in main (argc=4, argv=0xbeefad04) at server.c:2091
The strace corresponding to this is :
gettimeofday({1544613432, 15036}, NULL) = 0 epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}}, 2049, 1000) = 1 ioctl(7, FIONREAD, [389724]) = 0 brk(0x22fc000) = 0x22fc000 read(7, "\243\337\341\262\240^B\364\33#\v\t\20\242_oYh\343_H\350wv\262v/\256\220\260\317\177"..., 262143) = 262143 write(8, "\243\337\341\262\240^B\364\33#\v\t\20\242_oYh\343_H\350wv\262v/\256\220\260\317\177"..., 262143) = 262143 gettimeofday({1544613432, 31560}, NULL) = 0 epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}}, 2049, 1000) = 1 ioctl(7, FIONREAD, [127581]) = 0 read(7, "@a\340/\315\210\312\344\303\215S\265\304\32r\"\5ry\263\237\247\213\346\f\33xhhk.\267"..., 262143) = 127581 write(8, "@a\340/\315\210\312\344\303\215S\265\304\32r\"\5ry\263\237\247\213\346\f\33xhhk.\267"..., 127581) = 127581 socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9 connect(9, {sa_family=AF_LOCAL, sun_path="/tmp/php.socket-1"}, 19) = 0 mmap2(NULL, 34619392, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory) brk(0x43ff000) = 0x22fc000 mmap2(NULL, 34750464, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory) mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xb6a00000 munmap(0xb6b00000, 1048576) = 0 mprotect(0xb6a00000, 135168, PROT_READ|PROT_WRITE) = 0 mmap2(NULL, 34619392, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory) write(2, "buffer.c.86: assertion failed: N"..., 46buffer.c.86: assertion failed: NULL != b->ptr ) = 46 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0 tgkill(9228, 9228, SIGABRT) = 0 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=9228, si_uid=0} --- +++ killed by SIGABRT +++
I attached the lighttpd.conf I use.
As you said, with server.stream-request-body=2
, it's working all right.
Here is the ./configure result:
Plugins: enabled: mod_access mod_accesslog mod_alias mod_auth mod_authn_file mod_cgi mod_compress mod_deflate mod_dirlisting mod_evhost mod_expire mod_extforward mod_fastcgi mod_flv_streaming mod_indexfile mod_openssl mod_proxy mod_redirect mod_rewrite mod_rrdtool mod_scgi mod_secdownload mod_setenv mod_simple_vhost mod_sockproxy mod_ssi mod_staticfile mod_status mod_userdir mod_usertrack mod_vhostdb mod_webdav mod_wstunnel disabled: mod_authn_gssapi mod_authn_ldap mod_authn_mysql mod_authn_pam mod_cml mod_geoip mod_magnet mod_mysql_vhost mod_trigger_b4_dl mod_vhostdb_dbi mod_vhostdb_ldap mod_vhostdb_mysql mod_vhostdb_pgsql Features: enabled: auth-crypt compress-deflate compress-gzip large-files network-ipv6 network-openssl regex-conditionals disabled: compress-bzip2 dbi geoip kerberos ldap lua mysql pam postgresql stat-cache-fam storage-gdbm storage-memcached webdav-locks webdav-properties
Are you using openssl?
So, yes, lighttpd is compiled with openssl, but I don't use it (the web pages are on http:// )
What is the size of the POST request?
It's 47279924 bytes.
Can this be reproduced right when you start lighttpd and send a single request, or is it something that happens after many requests have been sent?
I can reproduce it each time, with a single request.
The attempts by lighttpd to mmap 45+ MB of memory look large, expecially on a low-memory system.
Yes, definitely. (My system has 128MiB, and the uploaded file is in a tmpfs, so if it's mapped in memory and in the tmpfs, it's the end !)
Tell me if you need more infos.
Updated by gstrauss almost 6 years ago
Thank you. The information was very useful and points to the source of your issue. There is an issue with pre-sizing the environment buffer for requests to backends, originating with f410431b5 and 9f6a46738
server.stream-request-body = 2
is the recommended setting for systems with a small amount of memory since lighttpd attempts to send everything it already has read to the backend.
On slower embedded systems, it is interesting that lighttpd was able to read all 45+ MB of the POST before parsing the header and connecting to the backend. The embedded system must be slower than wire speed and I am going to look for places in the code to perhaps limit what is optimistically read beyond the maximum header size. The idea is to read everything available in kernel socket buffers and pass that quickly to the backend. However, it looks like a limit needs to be implemented for slower systems. Perhaps if lighttpd checked the default kernel socket recv buffer size and then set the limit based on the larger of that or maximum header size, that would be a good balance.
I'll dig into this further later this evening.
Updated by gstrauss almost 6 years ago
- Status changed from New to Patch Pending
- Target version changed from 1.4.x to 1.4.53
Updated by gstrauss almost 6 years ago
- Status changed from Patch Pending to Fixed
- % Done changed from 0 to 100
Applied in changeset a1b527e47374d8432e12c9ff732a944f27dd1365.
Updated by gstrauss almost 6 years ago
Correction to what I posted earlier: if not streaming to the backend, lighttpd buffers the full POST request to disk. The creation of FastCGI packets to the backend was erroneously allocating memory for the entire POST, even though the large POST had been buffered to temporary files on disk. This has been fixed in the patch above.
Thank you for your assistance, rgenoud.
Also available in: Atom