Bug #286
closed
lighttpd crashes under highload
Description
We traced this down to a performance issue in lighttpd. Sporadically, lighttpd crashes...
valgrind log is here: http://www.thecenter.at/lighttpd.1025.txt
-- sl
Updated by jan over 19 years ago
- Status changed from New to Assigned
please verify if the problem persists with 1.4.5
Updated by Anonymous over 19 years ago
It's still an issue, but it's not as bad as before. Compare for yourself: with 1.4.4 I had a dozen crashes a day; with 1.4.5 I have "only" a couple.
-- sl
Updated by Anonymous over 19 years ago
I have seen something similar during a DoS condition. No core dump (though enabled). lighttpd seemed to 'stop'. The php-cgi processes kept running until I sent a killall -TERM php-cgi. I did not need to send KILL, so however lighttpd stopped, it did not do so in an entirely orderly manner.
Trying
server.max-connections = 1024
server.max-fds = 3072
to see if max-connections protects against this problem. Well, hopefully the DoS won't recur ;)
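For anyone picking similar numbers, here is a rough sanity check of the two settings above. It is only a sketch: fd_headroom and process_fd_limit are made-up helper names, and the figure of three descriptors per connection (client socket, backend socket, one open file) is an assumption for illustration, not something taken from lighttpd itself.

```python
import resource


def fd_headroom(max_connections, max_fds, fds_per_connection=3):
    """Descriptors left over if every connection slot is in use.

    fds_per_connection is an assumed worst case, not a lighttpd constant.
    """
    return max_fds - max_connections * fds_per_connection


def process_fd_limit():
    """Soft RLIMIT_NOFILE of the current process (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    print("headroom:", fd_headroom(1024, 3072))
    print("soft fd limit of this shell:", process_fd_limit())
```

With the values above the headroom comes out at zero under that assumption, so listening sockets and log files would need a slightly larger max-fds. The process limit is printed only for comparison with what the shell that starts the server allows.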
Hope this extra information is useful.
Have a great weekend!
-- richardgreen1965
Updated by Anonymous over 19 years ago
I can confirm this too. I'm evaluating 1.4.7 and it unexpectedly crashes after 10 minutes or so of high load. My environment is Debian 3.1 (sarge) with the stock 2.6.8 (-686-smp) kernel package.
I set it up exclusively to have mod_proxy distribute load to several (11) backend servers. No "regular" file requests were served by the server. At an output rate of more than 150 Mbps and 1800 rps the process quietly exits all of a sudden. When I started lighttpd with the -D flag to see if anything was printed to stderr, I didn't see anything there either when it crashed again. However, I noticed that it did leave with an "aborted" exit code.
I switched off both the rrdtool- and accesslog-modules and could exclude them from suspicion.
I will try a more recent kernel revision later on, but my gut feeling is that the problem is indeed in lighttpd.
-- conny
Updated by jan over 19 years ago
Can you generate an strace for me? The wiki explains how to report a bug.
Updated by Anonymous over 19 years ago
I'll try to make one. The problem is that under high load strace itself becomes the bottleneck, which limits the request rate and apparently the chance of the crash occurring...
-- conny
Updated by Anonymous over 19 years ago
Here are my first results:
11:37:14.450805 accept(5, {sa_family=AF_INET, sin_port=htons(2315), sin_addr=inet_addr("[xxxxxxxxxxxxx]")}, [16]) = 42
11:37:14.450900 fcntl64(42, F_SETFD, FD_CLOEXEC) = 0
11:37:14.450941 fcntl64(42, F_SETFL, O_RDWR|O_NONBLOCK) = 0
11:37:14.450980 ioctl(42, FIONREAD, [7935]) = 0
11:37:14.451026 read(42, "POST /[xxxxxxxxxxx]\r\n[xxxxxxxxxxxxx]"..., 7935) = 7935
11:37:14.452304 ioctl(42, FIONREAD, [0]) = 0
11:37:14.452361 read(42, 0x886ec38, 4159) = -1 EAGAIN (Resource temporarily unavailable)
11:37:14.452440 write(2, "lighttpd: connections.c:962: connection_handle_read_state: Assertion `c->mem->used\' failed.\n", 92) = 92
11:37:14.452580 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
11:37:14.452664 gettid() = 2539
11:37:14.452703 tgkill(2539, 2539, SIGABRT) = 0
11:37:14.452740 --- SIGABRT (Aborted) @ 0 (0) ---
A connection is accepted from a client and a POST request is read. Then we ask to read an additional 0 bytes from ...?
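To make the pattern in the trace easier to reason about, here is a small standalone sketch (not lighttpd code) of a non-blocking read loop. It shows that a second read on a socket with nothing queued comes back as EAGAIN with zero bytes, so the caller has to cope with an empty buffer instead of assuming something was stored. The names drain and demo and the request string are made up for illustration.

```python
import socket


def drain(sock, bufsize=4096):
    """Read until nothing more is available; return (data, reason_stopped)."""
    chunks = []
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: nothing queued right now. This is not an
            # error, and the buffer may legitimately be empty at this point.
            return b"".join(chunks), "would_block"
        if not data:
            # Zero-length read: the peer closed the connection.
            return b"".join(chunks), "eof"
        chunks.append(data)


def demo():
    client, server = socket.socketpair()
    server.setblocking(False)
    client.sendall(b"POST /upload HTTP/1.1\r\n\r\n")
    first = drain(server)    # request bytes, then EAGAIN
    second = drain(server)   # nothing pending: EAGAIN with an empty buffer
    client.close()
    third = drain(server)    # peer gone: EOF with an empty buffer
    server.close()
    return first, second, third


if __name__ == "__main__":
    for result in demo():
        print(result)
```

The second call is the interesting one: it mirrors the FIONREAD = 0 followed by read = EAGAIN sequence, and any code that asserts "the buffer is non-empty" after such a read would abort.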
-- conny
Updated by Anonymous over 19 years ago
- Status changed from Fixed to Need Feedback
- Resolution deleted (fixed)
Wonderful! That patch fixed the problem..._in most cases_! I can still make it crash however (though it seems even less common now).
lighttpd: connections.c:962: connection_handle_read_state: Assertion `c->mem->used' failed.
I have not had time to make a new strace run yet. It looks like a variant of the same problem, no? Could certain chunk sequences still slip through the cleanup?
-- conny
Updated by Anonymous over 19 years ago
I reproduced the crash with strace attached again. It's exactly the same order of calls as last time (see above).
-- conny
Updated by Anonymous over 19 years ago
...but that was with 1.4.7+patch. I have not seen this after I upgraded to the 1.4.8 release. (On the other hand I also switched to slightly faster hardware.)
Let's close it and reopen it if someone can reproduce this with 1.4.8.
-- conny
Updated by Anonymous over 19 years ago
- Status changed from Need Feedback to Fixed
- Resolution set to fixed
I can now confirm that this issue never appeared again after the 1.4.8 release.
-- conny