Bug #530
closed
Lighttpd dies under load when using linux-sysepoll event handler
Description
I've had a problem just lately with lighttpd dying randomly on a customer's systems (a pair of systems used for serving media files via HTTP in a load-balanced configuration). After adding a cron job to restart lighttpd automatically if it wasn't running, I discovered that it kept dying for some reason. I tried running lighttpd through strace, and it died like this:
read(188, 0x8a44540, 4159) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(6, EPOLL_CTL_MOD, 188, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=188, u64=188}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 246, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=246, u64=246}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 913, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=913, u64=913}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 206, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=206, u64=206}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 188, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=188, u64=188}}) = 0
time(NULL) = 1140188048
epoll_wait(6, {{EPOLLOUT, {u32=84, u64=84}}, {EPOLLIN, {u32=1018, u64=1018}}, {EPOLLIN, {u32=4, u64=4}}, {EPOLLOUT, {u32=65, u64=65}}, {EPOLLOUT, {u32=915, u64=915}}, {EPOLLOUT, {u32=708, u64=708}}, {EPOLLOUT, {u32=377, u64=377}}}, 1015, 1000) = 7
sendfile64(84, 151, [900595], 3519075) = 37960
write(2, "fdevent.c.170: aborted\n", 23) = 23
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(10857, 10857, SIGABRT) = 0
--- SIGABRT (Aborted) @ 0 (0) ---
The above occurred when using the linux-sysepoll event handler with the linux-sendfile network backend. I then tried the writev network backend, and lighttpd then died as follows:
epoll_ctl(6, EPOLL_CTL_MOD, 863, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=863, u64=863}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 192, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=192, u64=192}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 530, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=530, u64=530}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 278, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=278, u64=278}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 438, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=438, u64=438}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 594, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=594, u64=594}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 616, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=616, u64=616}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 889, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=889, u64=889}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 773, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=773, u64=773}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 329, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=329, u64=329}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 934, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=934, u64=934}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 757, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=757, u64=757}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 923, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=923, u64=923}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 944, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=944, u64=944}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 945, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=945, u64=945}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 973, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=973, u64=973}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 744, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=744, u64=744}}) = 0
time(NULL) = 1140188147
epoll_wait(6, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=642, u64=642}}, {EPOLLOUT, {u32=1022, u64=1022}}, {EPOLLOUT, {u32=1020, u64=1020}}, {EPOLLOUT, {u32=372, u64=372}}, ...}, 1015, 1000) = 224
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
However, I seem to be encountering similar problems using the linux-rtsig event handler, so I've switched to using poll instead.
Lighttpd is running on CentOS 4.2/i386. It is running the 2.6.9-22.0.2.ELsmp kernel on single Xeon systems (SMP kernels to take advantage of hyperthreading). Any thoughts or suggestions are appreciated.
-- Derrik Pates <dpates
Updated by conny about 19 years ago
Not saying that this is not a bug - but did you try with HT switched off with a non-SMP kernel? (I think I can remember seeing more stable performance from Lighttpd in uniprocessor configuration compared to that of HT+SMP...)
Updated by cam about 19 years ago
It looks like you might be running out of file descriptors. Try setting the following in the conf file:
server.max-fds = 2048
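One way to check the file-descriptor theory is to compare the highest descriptor number in the strace output with the open-file limit. The traces above show descriptors as high as 1022, which is close to the 1024 soft limit that many Linux systems use by default. The following is only a rough sketch, not part of lighttpd: the helper names `highest_fd_in_trace` and `nofile_limits` are made up for illustration, and the limit it reports is that of the Python process itself, which matches lighttpd's only if both are started from the same environment.

```python
import re
import resource

# Matches the descriptor argument of strace-style epoll_ctl lines, e.g.
# "epoll_ctl(6, EPOLL_CTL_MOD, 188, {...}) = 0".
_EPOLL_CTL_FD = re.compile(r"epoll_ctl\(\d+, EPOLL_CTL_[A-Z]+, (\d+),")


def highest_fd_in_trace(trace_text):
    """Return the largest descriptor seen in epoll_ctl calls, or None.

    Hypothetical helper for illustration; it only looks at epoll_ctl lines.
    """
    fds = [int(fd) for fd in _EPOLL_CTL_FD.findall(trace_text)]
    return max(fds) if fds else None


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Two lines copied from the first trace in this report.
    sample = (
        "epoll_ctl(6, EPOLL_CTL_MOD, 246, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=246, u64=246}}) = 0\n"
        "epoll_ctl(6, EPOLL_CTL_MOD, 913, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=913, u64=913}}) = 0\n"
    )
    soft, hard = nofile_limits()
    print("highest fd in sample:", highest_fd_in_trace(sample))
    print("soft limit:", soft, "hard limit:", hard)
```

If the highest descriptor sits just below the soft limit, raising the limit (and server.max-fds along with it) is worth trying before digging further into the event handler.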
Updated by conny about 19 years ago
Perhaps you should have a look at ticket #545 which mentions CentOS-specific tweaks.
Updated by carenas about 18 years ago
Not sure about linux-sysepoll, as the errors shown seem to have different root causes, but linux-rtsig can segfault because of a null pointer dereference, as shown by ticket 941.
Updated by stbuehler about 17 years ago
- Status changed from New to Fixed
- Resolution set to duplicate
See #1562.