Bug #2024
Infinite Syscall loop on broken SCGI connection while performing upload.
| Status: | Fixed | Start date: | 2009-07-09 | |
|---|---|---|---|---|
| Priority: | Normal | Due date: | ||
| Assignee: | - | % Done: | 100% | |
| Category: | mod_scgi | |||
| Target version: | 1.4.24 | |||
| Missing in 1.5.x: |
Description
Hi,
Tonight i found the following problem:
If my scgi Programm dies, or closes scgi connection, while it not has read any byte of the request body, my lighttpd-1.4.20 (ssl) / gento amd64 stable, loops infinitly on the following sycalls:
getsockname(8, {sa_family=AF_INET, sin_port=htons(81), sin_addr=inet_addr("85.88.11.66")}, [16]) = 0
writev(9, [{"1205:CONTENT_LENGTH\000171990\0SCGI\0001"..., 1211}], 1) = 1211
open("/var/tmp/lighttpd-upload-yZagGS", O_RDONLY) = 10
fcntl(10, F_SETFD, FD_CLOEXEC) = 0
sendfile(9, 10, [0], 171990) = 110592
epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_ctl(6, EPOLL_CTL_DEL, 8, {0, {u32=0, u64=0}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398) = -1 EPIPE (Broken pipe)
and so on untill killed or stopped
Steps to reproduces:
1. Configure trivial SCGI Service
2. Upload a file to it while closing connection afer reading the header.
3. Keep your eyes on htop / top / ...
The lighttpd will serve requests as normally while keeping the cpu busy (90% system).
So IHMO the pending (and failed) "write the file to scgi socket" operation is not sucessfully removed from chunk queune, and fails and fails and fails ;).
It happens both with the sendfile and writev handler.
Tried to lookup the problem in the code, but was unsucessfull. Im was unfamiliar with the codebase before, so under heavy investigation (tomorrow) I hopefully will provide a patch here.
But maybe someone with more experience can give me some pointers ;)
I set the category to 'core', ihmo the problem lies in chunk failure handling, not scgi.
Because im going to sleep now, i will not try other lighttpd versions / plattforms for this time. Tomorrow...
Regards
Markus
Associated revisions
Fix hanging connection in mod_scgi (fixes #2024)
Fix handling network-write return values (#2024)
History
#1 Updated by stbuehler almost 4 years ago
- File fix-mod-scgi-2024.patch
added - Category changed from core to mod_scgi
- Target version set to 1.4.24
would be nice if you could try the attached patch and report back if it works. thx!
#2 Updated by mbj almost 4 years ago
Appliet your patch, recompiled, problem gone.
Thx for your really fast response!
Not tested this on the production maschine, test was on a funtoo/x86 unstable with lighttpd-1.4.x -svn.
The problem was reproducable without the patch in this environment.
Relevant strace (after patch):
connect(5, {sa_family=AF_FILE, path="/home/mbj/tmp/scgi.sock"...}, 25) = 0
getsockname(4, {sa_family=AF_INET, sin_port=htons(2000), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
writev(5, [{"915:CONTENT_LENGTH\000171853\0SCGI\0001\0"..., 920}], 1) = 920
open("/var/tmp/lighttpd-upload-57l88a", O_RDONLY|O_LARGEFILE) = 6
fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
sendfile64(5, 6, [0], 171853) = 106496
time(NULL) = 1247185051
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=5, events=POLLOUT}], 3, 1000) = 1 ([{fd=5, revents=POLLOUT|POLLERR|POLLHUP}])
sendfile64(5, 6, [106496], 65357) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
write(2, "2009-07-10 02:17:31: (mod_scgi.c."..., 1542009-07-10 02:17:31: (mod_scgi.c.2325) [REPORT ME] connection was dropped after accept(). reconnect() denied: write-offset: 107416 reconnect attempts: 0
) = 154
write(2, "2009-07-10 02:17:31: (mod_scgi.c."..., 892009-07-10 02:17:31: (mod_scgi.c.2393) fcgi-server disabled: 0 /home/mbj/tmp/scgi.sock
) = 89
close(5) = 0
time(NULL) = 1247185051
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout)
time(NULL) = 1247185052
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout)
time(NULL) = 1247185053
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout)
time(NULL) = 1247185054
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
write(2, "2009-07-10 02:17:33: (mod_scgi.c."..., 912009-07-10 02:17:33: (mod_scgi.c.2087) fcgi-server re-enabled: 0 /home/mbj/tmp/scgi.sock
... normal 1sec event loop...
1. try stderr printout:
2009-07-10 02:17:02: (log.c.172) server started 2009-07-10 02:17:06: (mod_scgi.c.2639) error: unexpected close of scgi connection for / (no scgi process on host: , port: 0 ?)3 2009-07-10 02:17:06: (mod_scgi.c.2478) emergency exit: scgi: connection-fd: 4 fcgi-fd: 5
2. try stderr printout:
2009-07-10 02:19:41: (log.c.172) server started 2009-07-10 02:19:54: (mod_scgi.c.2325) [REPORT ME] connection was dropped after accept(). reconnect() denied: write-offset: 107416 reconnect attempts: 0 2009-07-10 02:19:54: (mod_scgi.c.2393) fcgi-server disabled: 0 /home/mbj/tmp/scgi.sock
sever.configuration:
server.modules = ( "mod_scgi" ) server.port = 2000 server.document-root = "." scgi.server = ( "/" => (( "socket" => "/home/mbj/tmp/scgi.sock", "check-local" => "disable", "disable-time" => 1 )))
just wondering about the stderr printouts, 1. time was differend from the next ones? Indeterminism???
Regarts
Markus
#3 Updated by stbuehler almost 4 years ago
There are different places where errors are handled, so yes, indeterminism.
#4 Updated by stbuehler almost 4 years ago
- Status changed from New to Fixed
- % Done changed from 0 to 100
Applied in changeset r2585.
#5 Updated by jbj almost 4 years ago
stbuehler wrote:
Applied in changeset r2585.
hi stbuehler, sorry if its wrong place to ask. But the same problem (broken pipe) will also be in mod_proxy while uploading right?
#6 Updated by stbuehler almost 4 years ago
Yes, right. see r2596.
Also available in: Atom