Bug #1562

sigsegv @ fdevent_get_handler - when congestion occurs and the file descriptor array is full.

Added by fdeletang over 6 years ago. Updated about 6 years ago.

Status: Fixed
Priority: Urgent
Assignee: -
% Done: 0%
Category: core
Target version: 1.5.0
Start date:
Due date:
Missing in 1.5.x:

Description

I'm experiencing segfaults when congestion occurs, at 800-850Mbps.

The crashes occur here:
lighttpd[1334]: segfault at 0000000c eip 0805ef46 esp bfc63e80 error 4


#!asm
0x0805ef44 <fdevent_get_handler+20>:    je     0x805ef4f <fdevent_get_handler+31>
0x0805ef46 <fdevent_get_handler+22>:    cmp    0x8(%eax),%edx
0x0805ef49 <fdevent_get_handler+25>:    jne    0x805ef7c <fdevent_get_handler+76>

#!c
171         if (ev->fdarray[fd]->fd != fd) SEGFAULT();

I guess eax is ev->fdarray[fd]; it's not NULL, but it's not a valid pointer either, so trying to access ev->fdarray[fd]->fd makes lighty read from nonexistent segments or segments without read permission.
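For illustration, a bounds-checked lookup would refuse to dereference a slot outside the allocated array instead of crashing. This is a minimal sketch, not the actual lighttpd code; the struct layout and names (`fdnode`, `maxfds`, `fdarray_get_checked`) are assumptions for the example:

```c
#include <stddef.h>

/* Simplified stand-ins for lighttpd's structures; the field names
 * here are assumptions for illustration, not the real definitions. */
typedef struct { int fd; } fdnode;
typedef struct {
    fdnode **fdarray;
    size_t maxfds;      /* number of slots allocated in fdarray */
} fdevents;

/* Bounds-checked lookup: return NULL when fd lies outside the
 * allocated array, instead of dereferencing past its end. */
static fdnode *fdarray_get_checked(fdevents *ev, int fd) {
    if (fd < 0 || (size_t)fd >= ev->maxfds) return NULL;
    return ev->fdarray[fd];
}
```

With such a check, an out-of-range fd would surface as a handled error rather than a read from an unmapped segment.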

So what's wrong? Maybe fd is simply an index beyond the end of the allocated array. Let's look at how it's created.

fdevent.c:


#!c
 15 fdevents *fdevent_init(size_t maxfds, fdevent_handler_t type) {
 19         ev->fdarray = calloc(maxfds, sizeof(*ev->fdarray));

server.c:


#!c
1076         if (NULL == (srv->ev = fdevent_init(srv->max_fds + 1, srv->event_handler))) {

and earlier in the same file:


#!c
 679                 if (0 != getrlimit(RLIMIT_NOFILE, &rlim)) {
 680                         log_error_write(srv, __FILE__, __LINE__,
 681                                         "ss", "couldn't get 'max filedescriptors'",
 682                                         strerror(errno));
 683                         return -1;
 684                 }
 685 
 686                 if (use_rlimit && srv->srvconf.max_fds) {
 687                         /* set rlimits */
 688 
 689                         rlim.rlim_cur = srv->srvconf.max_fds;
 690                         rlim.rlim_max = srv->srvconf.max_fds;
 691 
 692                         if (0 != setrlimit(RLIMIT_NOFILE, &rlim)) {
 693                                 log_error_write(srv, __FILE__, __LINE__,
 694                                                 "ss", "couldn't set 'max filedescriptors'",
 695                                                 strerror(errno));
 696                                 return -1;
 697                         }
 698                 }

 700                 /* #372: solaris need some fds extra for devpoll */
 701                 if (rlim.rlim_cur > 10) rlim.rlim_cur -= 10;

 827                         srv->max_fds = rlim.rlim_cur;

So, here's what's being done:
- the process fetches the currently configured rlimits and saves them in rlim
- if the configuration has a setting for max_fds, it overrides the limit configured for the current task
- then max_fds gets decremented by 10 (solaris bugfix, yay)
- and the file descriptor array is allocated using max_fds as its size.

And here's what happens:
- the system can hand out file descriptors numbered higher than max_fds
- fd > max_fds
- sigsegv
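The arithmetic of the mismatch can be condensed into a few lines. This is a simplification of the server.c/fdevent.c paths quoted above, with an illustrative limit of 1024:

```c
#include <stddef.h>

/* How many fdarray slots get allocated for a given RLIMIT_NOFILE
 * soft limit, following the code paths quoted above (simplified). */
static size_t fdarray_slots(size_t rlim_cur) {
    if (rlim_cur > 10) rlim_cur -= 10;  /* server.c:701, the #372 "fix" */
    size_t max_fds = rlim_cur;          /* server.c:827 */
    return max_fds + 1;                 /* server.c:1076 -> fdevent_init() */
}

/* The largest descriptor value the kernel may actually hand out. */
static int highest_possible_fd(size_t rlim_cur) {
    return (int)rlim_cur - 1;
}
```

With rlim_cur = 1024, only 1015 slots are allocated, yet the kernel can return descriptors up to 1023; any fd in the range 1015..1023 indexes past the end of the array.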

Possible workarounds:
- comment out line 701 in server.c if you're not running Solaris
or
- replace maxfds with maxfds + 10 in line 19 of fdevent.c
or
- fix this race condition ;-)

Fix-372-and-1562.patch - Patch for 1.4 and 1.5 (1.23 KB) stbuehler, 2008-02-13 14:38

History

#1 Updated by stbuehler over 6 years ago

If I understood the #372 problem correctly, Solaris doesn't want to poll for rlim.cur fds, as one is used for the /dev/poll fd, and returns an error.

But it doesn't seem to matter if the dopoll.dp_nfds value is a little bit smaller - it is just the max number of events to be polled in one syscall.

So I think reducing dopoll.dp_nfds by one in fdevent_solaris_devpoll_poll should fix #372, and we can then remove the previous "fix" for it to fix this bug (#1562).
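A minimal sketch of that idea, under the assumption that the array is sized from the full rlimit and only the per-syscall event count is capped (the helper name is hypothetical; the real change would go in fdevent_solaris_devpoll_poll, where dp_nfds is a field of Solaris's struct dvpoll):

```c
#include <stddef.h>

/* Sketch of the proposed fix: keep the fdarray sized for the full
 * rlimit, and instead cap only the number of events requested per
 * DP_POLL ioctl, reserving one slot for the /dev/poll fd itself. */
static int devpoll_max_events(size_t maxfds) {
    return maxfds > 0 ? (int)maxfds - 1 : 0;   /* -> dopoll.dp_nfds */
}
```

This keeps the array large enough for every descriptor the kernel can hand out, while still satisfying Solaris's constraint on the poll size.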

#2 Updated by admin over 6 years ago

#3 Updated by stbuehler over 6 years ago

Yes; #372 was "fixed" (i.e. introduced this bug) in r853, which was between 1.4.7 and 1.4.8.

#4 Updated by admin over 6 years ago

Doesn't this allow for an easy DoS attack?

#5 Updated by admin over 6 years ago

It's remotely exploitable...

#6 Updated by stbuehler over 6 years ago

  • Status changed from New to Fixed
  • Resolution set to fixed

Fixed in r2082
