Bug #1398

closed

Segfault on x86_64 and 2*connections > max-fds

Added by Anonymous about 17 years ago. Updated almost 17 years ago.

Status: Fixed
Priority: Normal
Category: core
Target version:

Description

I run lighttpd on Debian Etch x86_64, and it segfaults when the number of connections exceeds roughly half of max-fds, killing all the clients it is serving. I think the expected behavior would be for lighttpd to write a message to the error log and then temporarily disable the accept handler.
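A minimal sketch of the behavior suggested above, not lighttpd's actual code. The types `fd_slot` and `fd_table`, the functions `fd_table_lookup` and `accept_allowed`, and the `reserve` parameter are all hypothetical names chosen for illustration. It assumes each client connection can need more than one descriptor (for example a client socket plus a backend socket), which would be consistent with the crash appearing near half of max-fds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not lighttpd's actual structures. */
typedef struct {
    int fd;            /* descriptor this slot was registered for */
    void *handler;     /* opaque callback pointer */
} fd_slot;

typedef struct {
    fd_slot **slots;   /* indexed by fd, length max_fds */
    size_t max_fds;
} fd_table;

/* Bounds-checked lookup: returns NULL instead of dereferencing a slot
 * that is out of range, empty, or registered for a different fd. */
fd_slot *fd_table_lookup(const fd_table *t, int fd) {
    assert(t != NULL);
    if (fd < 0 || (size_t)fd >= t->max_fds) return NULL;
    fd_slot *s = t->slots[fd];
    if (s == NULL || s->fd != fd) return NULL;
    return s;
}

/* Accept gate (assumption: every connection may need fds_per_conn
 * descriptors, and `reserve` descriptors stay free for listeners, logs
 * and similar). Returns 1 if one more connection fits, 0 if accepting
 * should be paused. */
int accept_allowed(size_t cur_conns, size_t fds_per_conn,
                   size_t reserve, size_t max_fds) {
    if (fds_per_conn == 0 || reserve >= max_fds) return 0;
    return (cur_conns + 1) <= (max_fds - reserve) / fds_per_conn;
}
```

With max-fds at 4096, two descriptors per connection and a reserve of 16, this gate would stop accepting at 2040 connections; the caller would then log a message and re-enable accepting once connections close.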

Output from gdb and dmesg (note: max-fds is set to 4096):

Program received signal SIGSEGV, Segmentation fault.
fdevent_get_handler (ev=0x57b510, fd=4088) at fdevent.c:171
171 if (ev->fdarray[fd]->fd != fd) SEGFAULT;
(gdb) bt
#0 fdevent_get_handler (ev=0x57b510, fd=4088) at fdevent.c:171
#1 0x0000000000407d12 in main (argc=<value optimized out>, argv=<value optimized out>) at server.c:1405

Oct 2 15:51:48 xxxx kernel: lighttpd[11090]: segfault at 00000ff800000011 rip 0000000000415027 rsp 00007fffde91b1f0 error 4

-- doubleukay
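A note on the dmesg line in the description: the upper 32 bits of the faulting address 00000ff800000011 are 0xff8, which is 4088 in decimal, the same fd that gdb reports. One possible reading, which is an assumption and not confirmed anywhere in this report, is that the pointer being dereferenced had been overwritten by a pair of 32-bit integers. The helper below (hypothetical name `split_fault_address`) only does the arithmetic of splitting such an address into its two halves.

```c
#include <stdint.h>

/* The two 32-bit halves of a 64-bit address. */
typedef struct {
    uint32_t low;   /* bits 0..31  */
    uint32_t high;  /* bits 32..63 */
} addr_halves;

/* Split a 64-bit faulting address, as printed by the kernel,
 * into its low and high 32-bit words. */
addr_halves split_fault_address(uint64_t addr) {
    addr_halves h;
    h.low  = (uint32_t)(addr & 0xffffffffu);
    h.high = (uint32_t)(addr >> 32);
    return h;
}
```

For 0x00000ff800000011 this gives a high word of 4088 and a low word of 17.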

Actions #1

Updated by Anonymous about 17 years ago

Here's another spot where it segfaults (max-fds = 1024 in this test):

(gdb) bt
#0 0xf7ec4709 in free () from /lib/tls/i686/cmov/libc.so.6
#1 0x0805dfb7 in fdevent_unregister (ev=0x80a77f0, fd=1022) at fdevent.c:125
#2 0x08052092 in connection_close (srv=0x806e008, con=0x8171768) at connections.c:124
#3 0x080523ef in connection_state_machine (srv=0x806e008, con=0x8171768) at connections.c:1716
#4 0x0804e5b2 in main (argc=3, argv=0xfface034) at server.c:1279

Actions #2

Updated by Amr_not_Amr about 17 years ago

I'm facing the same problem on CentOS 5, x86_64.
When the number of connections exceeds half of max-fds, it segfaults. Here are some examples from the messages log:

Nov 4 13:23:24 dellway kernel: lighttpd[20469]: segfault at 0000000400000010 rip 0000000000414d54 rsp 00007fff36333930 error 4
Nov 4 13:34:07 dellway kernel: lighttpd[20773]: segfault at 0000000400000010 rip 0000000000414d54 rsp 00007fff36333930 error 4
Nov 4 14:35:09 dellway kernel: lighttpd[25474]: segfault at 000003e000000011 rip 0000000000414d54 rsp 00007fff71ab50a0 error 4
Nov 4 14:35:11 dellway kernel: lighttpd[25475]: segfault at 000003e000000011 rip 0000000000414d54 rsp 00007fff71ab50a0 error 4
Nov 4 14:35:18 dellway kernel: lighttpd[25476]: segfault at 000003e000000011 rip 0000000000414d54 rsp 00007fff71ab50a0 error 4
Nov 4 14:36:12 dellway kernel: lighttpd[25502]: segfault at 000000000000011a rip 0000000000414d54 rsp 00007fff71ab50a0 error 4
Nov 4 14:36:32 dellway kernel: lighttpd[25473]: segfault at 00000000000001fe rip 0000000000414d54 rsp 00007fff71ab50a0 error 4
Nov 4 15:49:48 dellway kernel: lighttpd[25510]: segfault at 000003e000000011 rip 0000000000414d54 rsp 00007fff71ab50a0 error 4
Nov 4 15:50:35 dellway kernel: lighttpd[25516]: segfault at 0000000000000023 rip 0000000000414d54 rsp 00007fff71ab50a0 error 4
Nov 4 16:19:39 dellway kernel: lighttpd[25519]: segfault at 00000000000003f2 rip 0000000000414d54 rsp 00007fff71ab50a0 error 4

Actions #3

Updated by slyphon almost 17 years ago

This also occurs on Solaris, compiled with the Sun native compiler.


slyphon@light01 ~ $ lighttpd -V
lighttpd-1.4.18 (ssl) - a light and fast webserver
Build-Date: Dec  4 2007 02:44:17

Event Handlers:

        + select (generic)
        + poll (Unix)
        - rt-signals (Linux 2.4+)
        - epoll (Linux 2.6)
        + /dev/poll (Solaris)
        - kqueue (FreeBSD)

Network handler:

        + sendfile

Features:

        + IPv6 support
        + zlib support
        + bzip2 support
        + crypt support
        + SSL Support
        + PCRE support
        - mySQL support
        - LDAP support
        - memcached support
        + FAM support
        - LUA support
        - xml support
        - SQLite support
        - GDBM support

We configured lighttpd to run three Rails FastCGI processes, and then tortured it with ab.


slyphon@light01 ~ $ ab -c 200 -n 1000 -v1 http://localhost/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests

Test aborted after 10 failures

apr_socket_connect(): Connection refused (146)
Total of 111 requests completed

Using a small DTrace script, I was able to get a stack trace at the moment the SIGSEGV is sent:


lighttpd`fdevent_get_handler+0x15
lighttpd`main+0xf80
lighttpd`_start+0x7d

The logs show many lines similar to:


2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 58 
2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 59 
2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 126 
2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 125 
2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 124 
2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 123 
2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 122 
2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 121 
2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 120 
2007-12-25 06:06:48: (mod_fastcgi.c.2816) wait for fd at connection: 119 

Actions #4

Updated by stbuehler almost 17 years ago

See #1562 for patch.

Actions #5

Updated by stbuehler almost 17 years ago

  • Status changed from New to Fixed
  • Resolution set to duplicate