Bug #2024
closedInfinite Syscall loop on broken SCGI connection while performing upload.
Description
Hi,
Tonight i found the following problem:
If my scgi Programm dies, or closes scgi connection, while it not has read any byte of the request body, my lighttpd-1.4.20 (ssl) / gento amd64 stable, loops infinitly on the following sycalls:
getsockname(8, {sa_family=AF_INET, sin_port=htons(81), sin_addr=inet_addr("85.88.11.66")}, [16]) = 0 writev(9, [{"1205:CONTENT_LENGTH\000171990\0SCGI\0001"..., 1211}], 1) = 1211 open("/var/tmp/lighttpd-upload-yZagGS", O_RDONLY) = 10 fcntl(10, F_SETFD, FD_CLOEXEC) = 0 sendfile(9, 10, [0], 171990) = 110592 epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0 epoll_ctl(6, EPOLL_CTL_DEL, 8, {0, {u32=0, u64=0}}) = 0 epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0 epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1 sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0 epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1 sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0 epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1 sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0 epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1 sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0 epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1 sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0 epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1 sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0 epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1 sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe) and so on untill killed or stopped
Steps to reproduces:
1. Configure trivial SCGI Service
2. Upload a file to it while closing connection afer reading the header.
3. Keep your eyes on htop / top / ...
The lighttpd will serve requests as normally while keeping the cpu busy (90% system).
So IHMO the pending (and failed) "write the file to scgi socket" operation is not sucessfully removed from chunk queune, and fails and fails and fails ;).
It happens both with the sendfile and writev handler.
Tried to lookup the problem in the code, but was unsucessfull. Im was unfamiliar with the codebase before, so under heavy investigation (tomorrow) I hopefully will provide a patch here.
But maybe someone with more experience can give me some pointers ;)
I set the category to 'core', ihmo the problem lies in chunk failure handling, not scgi.
Because im going to sleep now, i will not try other lighttpd versions / plattforms for this time. Tomorrow...
Regards
Markus
Files
Updated by stbuehler over 15 years ago
- File fix-mod-scgi-2024.patch fix-mod-scgi-2024.patch added
- Category changed from core to mod_scgi
- Target version set to 1.4.24
would be nice if you could try the attached patch and report back if it works. thx!
Updated by mbj over 15 years ago
Appliet your patch, recompiled, problem gone.
Thx for your really fast response!
Not tested this on the production maschine, test was on a funtoo/x86 unstable with lighttpd-1.4.x -svn.
The problem was reproducable without the patch in this environment.
Relevant strace (after patch):
connect(5, {sa_family=AF_FILE, path="/home/mbj/tmp/scgi.sock"...}, 25) = 0 getsockname(4, {sa_family=AF_INET, sin_port=htons(2000), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 writev(5, [{"915:CONTENT_LENGTH\000171853\0SCGI\0001\0"..., 920}], 1) = 920 open("/var/tmp/lighttpd-upload-57l88a", O_RDONLY|O_LARGEFILE) = 6 fcntl64(6, F_SETFD, FD_CLOEXEC) = 0 sendfile64(5, 6, [0], 171853) = 106496 time(NULL) = 1247185051 poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=5, events=POLLOUT}], 3, 1000) = 1 ([{fd=5, revents=POLLOUT|POLLERR|POLLHUP}]) sendfile64(5, 6, [106496], 65357) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 write(2, "2009-07-10 02:17:31: (mod_scgi.c."..., 1542009-07-10 02:17:31: (mod_scgi.c.2325) [REPORT ME] connection was dropped after accept(). reconnect() denied: write-offset: 107416 reconnect attempts: 0 ) = 154 write(2, "2009-07-10 02:17:31: (mod_scgi.c."..., 892009-07-10 02:17:31: (mod_scgi.c.2393) fcgi-server disabled: 0 /home/mbj/tmp/scgi.sock ) = 89 close(5) = 0 time(NULL) = 1247185051 poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout) time(NULL) = 1247185052 poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout) time(NULL) = 1247185053 poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout) time(NULL) = 1247185054 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 write(2, "2009-07-10 02:17:33: (mod_scgi.c."..., 912009-07-10 02:17:33: (mod_scgi.c.2087) fcgi-server re-enabled: 0 /home/mbj/tmp/scgi.sock ... normal 1sec event loop...
1. try stderr printout:
2009-07-10 02:17:02: (log.c.172) server started 2009-07-10 02:17:06: (mod_scgi.c.2639) error: unexpected close of scgi connection for / (no scgi process on host: , port: 0 ?)3 2009-07-10 02:17:06: (mod_scgi.c.2478) emergency exit: scgi: connection-fd: 4 fcgi-fd: 5
2. try stderr printout:
2009-07-10 02:19:41: (log.c.172) server started 2009-07-10 02:19:54: (mod_scgi.c.2325) [REPORT ME] connection was dropped after accept(). reconnect() denied: write-offset: 107416 reconnect attempts: 0 2009-07-10 02:19:54: (mod_scgi.c.2393) fcgi-server disabled: 0 /home/mbj/tmp/scgi.sock
sever.configuration:
server.modules = ( "mod_scgi" ) server.port = 2000 server.document-root = "." scgi.server = ( "/" => (( "socket" => "/home/mbj/tmp/scgi.sock", "check-local" => "disable", "disable-time" => 1 )))
just wondering about the stderr printouts, 1. time was differend from the next ones? Indeterminism???
Regarts
Markus
Updated by stbuehler over 15 years ago
There are different places where errors are handled, so yes, indeterminism.
Updated by stbuehler over 15 years ago
- Status changed from New to Fixed
- % Done changed from 0 to 100
Applied in changeset r2585.
Updated by jbj over 15 years ago
stbuehler wrote:
Applied in changeset r2585.
hi stbuehler, sorry if its wrong place to ask. But the same problem (broken pipe) will also be in mod_proxy while uploading right?
Also available in: Atom