Bug #2700

Segfault with version 1.4.38

Added by flynn over 1 year ago. Updated over 1 year ago.

Status: Fixed
Priority: High
Assignee: -
Category: core
Target version:
Start date: 2015-12-14
Due date:
% Done: 100%
Estimated time:
Missing in 1.5.x:

Description

Lighttpd ran without problems on our servers up to version 1.4.37.

With version 1.4.38 we get several segfaults a day, which we cannot reliably reproduce
(they are often triggered by the auto-preview function of Trac).

I caught one while gdb was attached; this is the backtrace:

Program received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x60) at malloc.c:2929
2929 malloc.c: No such file or directory.
(gdb) bt
#0 __GI___libc_free (mem=0x60) at malloc.c:2929
#1 0x00007f8facd76868 in buffer_reset ()
#2 0x00007f8facd79a5c in ?? ()
#3 0x00007f8facd79b0e in ?? ()
#4 0x00007f8facd79fb7 in chunkqueue_reset ()
#5 0x00007f8facd6cdfa in connection_state_machine ()
#6 0x00007f8facd67fb7 in main ()

I could resolve other segfaults to the following line (with the help of addr2line):

Library: lighttpd, offset 0xd440
connection_handle_read_state
/usr/src/lighttpd-1.4.38/src/connections.c:929

which is the last line of

for (c = cq->first; c; c = c->next) {
    size_t i;
    size_t len = buffer_string_length(c->mem) - c->offset;
    const char *b = c->mem->ptr + c->offset;

    for (i = 0; i < len; ++i) {
        char ch = b[i];
        if ('\r' == ch) {

I have to switch my servers back to version 1.4.37, but I'm happy to test patches for this problem.

Associated revisions

Revision 3065 (diff)
Added by stbuehler over 1 year ago

[chunk] fix use after free / double free (fixes #2700)

From: Stefan Bühler <>

Revision 6ef3b709 (diff)
Added by stbuehler over 1 year ago

[chunk] fix use after free / double free (fixes #2700)

From: Stefan Bühler <>

git-svn-id: svn://svn.lighttpd.net/lighttpd/branches/lighttpd-1.4.x@3065 152afb58-edef-0310-8abb-c4023f1b3aa9

History

#1 Updated by stbuehler over 1 year ago

We seem to suffer from some memory corruption, but I couldn't find the origin so far. It seems to be related to POST requests (or other requests with a request body).

I had it running in valgrind but didn't get anything so far (but also couldn't trigger the crash myself).

#2 Updated by flynn over 1 year ago

OK.

Then we will try to trigger the crash while running under valgrind.

Is there anything special needed to run lighttpd with valgrind, other than

./configure --with-valgrind ?

#3 Updated by stbuehler over 1 year ago

--with-valgrind is not really necessary. More important is to compile with debug symbols ("-g") and to start lighttpd under valgrind in "foreground" mode, i.e. something like valgrind lighttpd -D -f /etc/lighttpd/lighttpd.conf (either spawn it manually in a screen/tmux terminal or use something that can handle "non-daemonized" services, like systemd).

Also I'd like to warn you that valgrind makes lighttpd really really slow :)

#4 Updated by flynn over 1 year ago

We could not reproduce the crash with valgrind, but we got some important log messages regarding the crash in libc_free:

==26229== Invalid free() / delete / delete[] / realloc()
==26229==    at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==26229==    by 0x42310E: chunk_free (chunk.c:91)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229==  Address 0xeb39250 is 0 bytes inside a block of size 96 free'd
==26229==    at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==26229==    by 0x42310E: chunk_free (chunk.c:91)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229== 

I think the crash would happen here, but valgrind catches it.
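To make the "free'd" part of the report easier to read: both stacks end in chunk_free via chunkqueue_free, which is what one would expect if the same chunk is reachable from two lists (or twice from one list) at teardown. The following is only a minimal sketch of that situation, not lighttpd code. The names sketch_chunk, sketch_release_list and sketch_shared_tail_releases are made up for illustration, and the release is counted instead of calling free() so the effect can be observed safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a chunk; `released` counts simulated free() calls. */
typedef struct sketch_chunk {
    struct sketch_chunk *next;
    int released;
} sketch_chunk;

/* Walk a list the way a queue destructor would, recording each release
 * instead of calling free(), so a double release is visible without
 * invoking undefined behaviour. */
static void sketch_release_list(sketch_chunk *head) {
    for (sketch_chunk *c = head; c; ) {
        sketch_chunk *next = c->next;  /* read the link before "freeing" c */
        c->released++;
        c = next;
    }
}

/* Two lists share the same tail chunk; returns how often that shared
 * chunk is released when both lists are torn down. */
static int sketch_shared_tail_releases(void) {
    sketch_chunk shared = { NULL, 0 };
    sketch_chunk head_a = { &shared, 0 };
    sketch_chunk head_b = { &shared, 0 };

    sketch_release_list(&head_a);
    sketch_release_list(&head_b);
    return shared.released;
}
```

With a real allocator, the second release of the shared chunk is the "Invalid free()", and any read or write of its fields during the second walk would show up as the "Invalid read" / "Invalid write" entries.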

Before this, we see a lot of these messages:

==26229== Invalid write of size 8
==26229==    at 0x4230B8: chunk_reset (chunk.c:80)
==26229==    by 0x4230E2: chunk_free (chunk.c:86)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229==  Address 0xeb392a8 is 88 bytes inside a block of size 96 free'd
==26229==    at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==26229==    by 0x42310E: chunk_free (chunk.c:91)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229== 
==26229== Invalid read of size 8
==26229==    at 0x4230E7: chunk_free (chunk.c:88)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229==  Address 0xeb39258 is 8 bytes inside a block of size 96 free'd
==26229==    at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==26229==    by 0x42310E: chunk_free (chunk.c:91)
==26229==    by 0x423230: chunkqueue_free (chunk.c:125)
==26229==    by 0xC0A54A6: handler_ctx_free (mod_fastcgi.c:511)
==26229==    by 0xC0A83CD: fcgi_connection_close (mod_fastcgi.c:1504)
==26229==    by 0xC0AD680: fcgi_handle_fdevent (mod_fastcgi.c:3104)
==26229==    by 0x40B894: main (server.c:1515)
==26229== 

As far as we can see, the crash only happens if non-ASCII characters (e.g. umlauts) are used in headers or POST requests.
So maybe it is a length calculation problem with URL-encoded buffers ...

#5 Updated by stbuehler over 1 year ago

Thanks, that looks like very helpful data!

#6 Updated by stbuehler over 1 year ago

This is probably a regression introduced in r2976 (released in 1.4.36).

The following patch should fix it; I'd be happy to get some feedback on this.

--- a/src/chunk.c
+++ b/src/chunk.c
@@ -172,6 +172,7 @@ static void chunkqueue_prepend_chunk(chunkqueue *cq, chunk *c) {
 }

 static void chunkqueue_append_chunk(chunkqueue *cq, chunk *c) {
+    c->next = NULL;
     if (cq->last) {
         cq->last->next = c;
     }
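
For readers following along, here is a small sketch of why clearing c->next on append matters. It is a simplified model under my own assumptions, not the lighttpd implementation: demo_chunk, demo_queue, demo_append, demo_reachable and demo_steal_first are hypothetical names, and the "steal only the first chunk" scenario is just one plausible way a chunk could arrive at the append with a stale next pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for a chunk and a chunk queue. */
typedef struct demo_chunk {
    struct demo_chunk *next;
    int id;
} demo_chunk;

typedef struct {
    demo_chunk *first;
    demo_chunk *last;
} demo_queue;

/* Append c to q. With clear_next != 0 this mirrors the patch by resetting
 * c->next before linking; with clear_next == 0 it models the old behaviour. */
static void demo_append(demo_queue *q, demo_chunk *c, int clear_next) {
    if (clear_next) c->next = NULL;
    if (q->last) q->last->next = c;
    q->last = c;
    if (!q->first) q->first = c;
}

/* Count chunks reachable from q->first (capped at limit as a safety net). */
static size_t demo_reachable(const demo_queue *q, size_t limit) {
    size_t n = 0;
    for (const demo_chunk *c = q->first; c && n < limit; c = c->next) n++;
    return n;
}

/* Move only the first chunk of src (a -> b) into an empty dst and report
 * how many chunks dst appears to contain afterwards. */
static size_t demo_steal_first(int clear_next) {
    demo_chunk b = { NULL, 2 };
    demo_chunk a = { &b, 1 };
    demo_queue src = { &a, &b };
    demo_queue dst = { NULL, NULL };

    demo_chunk *c = src.first;
    src.first = c->next;               /* unlink from src; c->next is left as-is */
    demo_append(&dst, c, clear_next);  /* dst should now own exactly one chunk */
    return demo_reachable(&dst, 8);
}
```

In this model demo_steal_first(0) reports two chunks in dst, because the moved chunk still points at b, which src also still owns; freeing both queues would then release b twice. With demo_steal_first(1) the stale link is cut and dst holds exactly the one chunk that was moved.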

#7 Updated by stbuehler over 1 year ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100

Applied in changeset r3065.

#8 Updated by flynn over 1 year ago

Seems to work: in a small test under valgrind, the messages above no longer appear.

I am switching my production server back to version 1.4.38 with this patch.
