Project

General

Profile

Actions

Bug #2024

closed

Infinite Syscall loop on broken SCGI connection while performing upload.

Added by mbj over 15 years ago. Updated over 15 years ago.

Status:
Fixed
Priority:
Normal
Category:
mod_scgi
Target version:
ASK QUESTIONS IN Forums:

Description

Hi,

Tonight i found the following problem:

If my scgi Programm dies, or closes scgi connection, while it not has read any byte of the request body, my lighttpd-1.4.20 (ssl) / gento amd64 stable, loops infinitly on the following sycalls:

getsockname(8, {sa_family=AF_INET, sin_port=htons(81), sin_addr=inet_addr("85.88.11.66")}, [16]) = 0
writev(9, [{"1205:CONTENT_LENGTH\000171990\0SCGI\0001"..., 1211}], 1) = 1211
open("/var/tmp/lighttpd-upload-yZagGS", O_RDONLY) = 10
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
sendfile(9, 10, [0], 171990)            = 110592
epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_ctl(6, EPOLL_CTL_DEL, 8, {0, {u32=0, u64=0}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)

and so on untill killed or stopped

Steps to reproduces:

1. Configure trivial SCGI Service
2. Upload a file to it while closing connection afer reading the header.
3. Keep your eyes on htop / top / ...

The lighttpd will serve requests as normally while keeping the cpu busy (90% system).
So IHMO the pending (and failed) "write the file to scgi socket" operation is not sucessfully removed from chunk queune, and fails and fails and fails ;).

It happens both with the sendfile and writev handler.

Tried to lookup the problem in the code, but was unsucessfull. Im was unfamiliar with the codebase before, so under heavy investigation (tomorrow) I hopefully will provide a patch here.

But maybe someone with more experience can give me some pointers ;)

I set the category to 'core', ihmo the problem lies in chunk failure handling, not scgi.

Because im going to sleep now, i will not try other lighttpd versions / plattforms for this time. Tomorrow...

Regards

Markus


Files

fix-mod-scgi-2024.patch (479 Bytes) fix-mod-scgi-2024.patch stbuehler, 2009-07-09 23:49
Actions #1

Updated by stbuehler over 15 years ago

would be nice if you could try the attached patch and report back if it works. thx!

Actions #2

Updated by mbj over 15 years ago

Appliet your patch, recompiled, problem gone.
Thx for your really fast response!

Not tested this on the production maschine, test was on a funtoo/x86 unstable with lighttpd-1.4.x -svn.
The problem was reproducable without the patch in this environment.

Relevant strace (after patch):

connect(5, {sa_family=AF_FILE, path="/home/mbj/tmp/scgi.sock"...}, 25) = 0
getsockname(4, {sa_family=AF_INET, sin_port=htons(2000), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
writev(5, [{"915:CONTENT_LENGTH\000171853\0SCGI\0001\0"..., 920}], 1) = 920
open("/var/tmp/lighttpd-upload-57l88a", O_RDONLY|O_LARGEFILE) = 6
fcntl64(6, F_SETFD, FD_CLOEXEC)         = 0
sendfile64(5, 6, [0], 171853)           = 106496
time(NULL)                              = 1247185051
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=5, events=POLLOUT}], 3, 1000) = 1 ([{fd=5, revents=POLLOUT|POLLERR|POLLHUP}])
sendfile64(5, 6, [106496], 65357)       = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
write(2, "2009-07-10 02:17:31: (mod_scgi.c."..., 1542009-07-10 02:17:31: (mod_scgi.c.2325) [REPORT ME] connection was dropped after accept(). reconnect() denied: write-offset: 107416 reconnect attempts: 0 
) = 154
write(2, "2009-07-10 02:17:31: (mod_scgi.c."..., 892009-07-10 02:17:31: (mod_scgi.c.2393) fcgi-server disabled:  0 /home/mbj/tmp/scgi.sock 
) = 89
close(5)                                = 0
time(NULL)                              = 1247185051
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout)
time(NULL)                              = 1247185052
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout)
time(NULL)                              = 1247185053
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout)
time(NULL)                              = 1247185054
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
write(2, "2009-07-10 02:17:33: (mod_scgi.c."..., 912009-07-10 02:17:33: (mod_scgi.c.2087) fcgi-server re-enabled:  0 /home/mbj/tmp/scgi.sock 
... normal 1sec event loop...

1. try stderr printout:

2009-07-10 02:17:02: (log.c.172) server started 
2009-07-10 02:17:06: (mod_scgi.c.2639) error: unexpected close of scgi connection for / (no scgi process on host: , port: 0 ?)3 
2009-07-10 02:17:06: (mod_scgi.c.2478) emergency exit: scgi: connection-fd: 4 fcgi-fd: 5 

2. try stderr printout:

2009-07-10 02:19:41: (log.c.172) server started 
2009-07-10 02:19:54: (mod_scgi.c.2325) [REPORT ME] connection was dropped after accept(). reconnect() denied: write-offset: 107416 reconnect attempts: 0 
2009-07-10 02:19:54: (mod_scgi.c.2393) fcgi-server disabled:  0 /home/mbj/tmp/scgi.sock 

sever.configuration:

server.modules = ( "mod_scgi" )
server.port          = 2000
server.document-root = "." 
scgi.server = ( "/" => ((
  "socket" => "/home/mbj/tmp/scgi.sock",
  "check-local" => "disable",
  "disable-time" => 1
)))

just wondering about the stderr printouts, 1. time was differend from the next ones? Indeterminism???

Regarts

Markus

Actions #3

Updated by stbuehler over 15 years ago

There are different places where errors are handled, so yes, indeterminism.

Actions #4

Updated by stbuehler over 15 years ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100

Applied in changeset r2585.

Actions #5

Updated by jbj over 15 years ago

stbuehler wrote:

Applied in changeset r2585.

hi stbuehler, sorry if its wrong place to ask. But the same problem (broken pipe) will also be in mod_proxy while uploading right?

Actions #6

Updated by stbuehler over 15 years ago

Yes, right. see r2596.

Actions

Also available in: Atom