Bug #2700
closed
Segfault with version 1.4.38
Description
Lighttpd until version 1.4.37 was running with no problems on our servers.
With version 1.4.38 we get several segfaults a day, which we cannot fully trigger
(often it is triggered by the auto-preview function of trac).
I caught one while gdb was attached, and this is the backtrace:
Program received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x60) at malloc.c:2929
2929 malloc.c: No such file or directory.
(gdb) bt
#0 __GI___libc_free (mem=0x60) at malloc.c:2929
#1 0x00007f8facd76868 in buffer_reset ()
#2 0x00007f8facd79a5c in ?? ()
#3 0x00007f8facd79b0e in ?? ()
#4 0x00007f8facd79fb7 in chunkqueue_reset ()
#5 0x00007f8facd6cdfa in connection_state_machine ()
#6 0x00007f8facd67fb7 in main ()
Other segfaults I could decode to the following line (with the help of addr2line):
Library: lighttpd, offset 0xd440
connection_handle_read_state
/usr/src/lighttpd-1.4.38/src/connections.c:929
which is the last line of
for (c = cq->first; c; c = c->next) {
    size_t i;
    size_t len = buffer_string_length(c->mem) - c->offset;
    const char *b = c->mem->ptr + c->offset;
    for (i = 0; i < len; ++i) {
        char ch = b[i];
        if ('\r' == ch) {
I have to switch my servers back to version 1.4.37, but I'm open to testing patches to solve the problem ...
Updated by stbuehler about 9 years ago
We seem to suffer from some memory corruption, but I couldn't find the origin so far. It seems to be related to POST requests (or other requests with a request body).
I had it running in valgrind but didn't get anything so far (but also couldn't trigger the crash myself).
Updated by flynn about 9 years ago
OK.
Then we will try to trigger the crash while running under valgrind.
Is there anything special about running lighttpd with valgrind besides
./configure --with-valgrind ?
Updated by stbuehler about 9 years ago
--with-valgrind is not really necessary; it is more important to compile with debug symbols ("-g") and to spawn lighttpd under valgrind in "foreground" mode, i.e. something like:
valgrind lighttpd -D -f /etc/lighttpd/lighttpd.conf
(i.e. either spawn it manually in a screen/tmux terminal or use something that can handle "non-daemonized" services, like systemd).
Also I'd like to warn you that valgrind makes lighttpd really really slow :)
Updated by flynn about 9 years ago
We could not reproduce the crash with valgrind, but we got some important log messages regarding the crash in libc_free:
==26229== Invalid free() / delete / delete[] / realloc()
==26229==    at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==26229==    by 0x42310E: chunk_free (chunk.c:91)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229==  Address 0xeb39250 is 0 bytes inside a block of size 96 free'd
==26229==    at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==26229==    by 0x42310E: chunk_free (chunk.c:91)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229==
I think the crash would happen here, but valgrind catches it.
Before this, we see a lot of these messages:
==26229== Invalid write of size 8
==26229==    at 0x4230B8: chunk_reset (chunk.c:80)
==26229==    by 0x4230E2: chunk_free (chunk.c:86)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229==  Address 0xeb392a8 is 88 bytes inside a block of size 96 free'd
==26229==    at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==26229==    by 0x42310E: chunk_free (chunk.c:91)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229==
==26229== Invalid read of size 8
==26229==    at 0x4230E7: chunk_free (chunk.c:88)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229==  Address 0xeb39258 is 8 bytes inside a block of size 96 free'd
==26229==    at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==26229==    by 0x42310E: chunk_free (chunk.c:91)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229==
As far as we can see, the crash happens only if non-ASCII characters (e.g. umlauts) are used in headers or POST requests.
So maybe it is a length calculation problem with url-encoded buffers ...
Updated by stbuehler about 9 years ago
Thanks, that looks like very helpful data!
Updated by stbuehler about 9 years ago
- File 0001-chunk-fix-use-after-free-double-free-fixes-2700.patch added
This is probably a regression introduced in r2976 (released in 1.4.36).
The following patch should fix it; I'd be happy to get some feedback on this.
--- a/src/chunk.c
+++ b/src/chunk.c
@@ -172,6 +172,7 @@ static void chunkqueue_prepend_chunk(chunkqueue *cq, chunk *c) {
 }

 static void chunkqueue_append_chunk(chunkqueue *cq, chunk *c) {
+	c->next = NULL;
 	if (cq->last) {
 		cq->last->next = c;
 	}
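To illustrate why clearing the link on append matters, here is a minimal sketch. It assumes that chunks are recycled through a free list and can therefore reach the append function with a stale next pointer; the thread does not spell out the mechanism, so treat this as one plausible reading. The types and functions (node, queue, queue_get_unused, queue_append, queue_reset, queue_free, recycle_demo) are hypothetical stand-ins, not lighttpd's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT lighttpd's real structs. */
typedef struct node {
    struct node *next;
    int id;
} node;

typedef struct {
    node *first;   /* head of the active list */
    node *last;    /* tail of the active list */
    node *unused;  /* free list of recycled nodes */
} queue;

/* Take a node from the free list if possible, otherwise allocate one. */
node *queue_get_unused(queue *q) {
    node *n = q->unused;
    if (n) {
        q->unused = n->next;
        /* n->next is left untouched: it still points into the free list. */
    } else {
        n = calloc(1, sizeof(*n));
        if (!n) abort();
    }
    return n;
}

/* Append with the same idea as the patch: drop any stale link first. */
void queue_append(queue *q, node *n) {
    n->next = NULL;
    if (q->last) q->last->next = n;
    q->last = n;
    if (!q->first) q->first = n;
}

/* Move every active node onto the free list. */
void queue_reset(queue *q) {
    node *n = q->first;
    while (n) {
        node *next = n->next;
        n->next = q->unused;
        q->unused = n;
        n = next;
    }
    q->first = q->last = NULL;
}

/* Count the nodes reachable from the head of the active list. */
size_t queue_length(const queue *q) {
    size_t len = 0;
    for (const node *n = q->first; n; n = n->next) ++len;
    return len;
}

/* Release all nodes; each must sit in exactly one list for this to be safe. */
void queue_free(queue *q) {
    queue_reset(q);
    while (q->unused) {
        node *n = q->unused;
        q->unused = n->next;
        free(n);
    }
}

/* Recycle three nodes, reuse one, and report the active list length. */
size_t recycle_demo(void) {
    queue q = { NULL, NULL, NULL };
    for (int i = 1; i <= 3; ++i) {
        node *n = queue_get_unused(&q);
        n->id = i;
        queue_append(&q, n);
    }
    queue_reset(&q);                 /* free list is now 3 -> 2 -> 1 */
    node *n = queue_get_unused(&q);  /* node 3; its next still points at node 2 */
    queue_append(&q, n);
    assert(n->next == NULL);         /* the stale link was cleared on append */
    size_t len = queue_length(&q);
    queue_free(&q);
    return len;
}
```

In this sketch recycle_demo() returns 1. If the `n->next = NULL;` line were removed from queue_append, the reused node would still link to nodes 2 and 1, so those nodes would be reachable from both the active list and the free list, and cleanup could touch or free them more than once. That is the kind of use-after-free and double free valgrind reported above.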
Updated by stbuehler about 9 years ago
- Status changed from New to Fixed
- % Done changed from 0 to 100
Applied in changeset r3065.
Updated by flynn about 9 years ago
Seems to work: in a small test under valgrind, the messages above no longer appear.
I will switch my production server back to version 1.4.38 with this patch.