Bug #2024: Infinite Syscall loop on broken SCGI connection while performing upload. - Lighttpd - lighty labs

Actions

Copy link

Bug #2024

closed

Infinite Syscall loop on broken SCGI connection while performing upload.

Added by mbj over 15 years ago. Updated over 15 years ago.

Status:

Fixed

Priority:

Normal

Category:

mod_scgi

Target version:

1.4.24

ASK QUESTIONS IN Forums:

Description

Hi,

Tonight i found the following problem:

If my scgi Programm dies, or closes scgi connection, while it not has read any byte of the request body, my lighttpd-1.4.20 (ssl) / gento amd64 stable, loops infinitly on the following sycalls:

getsockname(8, {sa_family=AF_INET, sin_port=htons(81), sin_addr=inet_addr("85.88.11.66")}, [16]) = 0
writev(9, [{"1205:CONTENT_LENGTH\000171990\0SCGI\0001"..., 1211}], 1) = 1211
open("/var/tmp/lighttpd-upload-yZagGS", O_RDONLY) = 10
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
sendfile(9, 10, [0], 171990)            = 110592
epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_ctl(6, EPOLL_CTL_DEL, 8, {0, {u32=0, u64=0}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
epoll_ctl(6, EPOLL_CTL_MOD, 9, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}) = 0
epoll_wait(6, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9, u64=9}}}, 1025, 1000) = 1
sendfile(9, 10, [110592], 61398)        = -1 EPIPE (Broken pipe)

and so on untill killed or stopped

Steps to reproduces:

1. Configure trivial SCGI Service
2. Upload a file to it while closing connection afer reading the header.
3. Keep your eyes on htop / top / ...

The lighttpd will serve requests as normally while keeping the cpu busy (90% system).
So IHMO the pending (and failed) "write the file to scgi socket" operation is not sucessfully removed from chunk queune, and fails and fails and fails ;).

It happens both with the sendfile and writev handler.

Tried to lookup the problem in the code, but was unsucessfull. Im was unfamiliar with the codebase before, so under heavy investigation (tomorrow) I hopefully will provide a patch here.

But maybe someone with more experience can give me some pointers ;)

I set the category to 'core', ihmo the problem lies in chunk failure handling, not scgi.

Because im going to sleep now, i will not try other lighttpd versions / plattforms for this time. Tomorrow...

Regards

Markus

Files

fix-mod-scgi-2024.patch (479 Bytes) fix-mod-scgi-2024.patch

stbuehler, 2009-07-09 23:49

Actions

Copy link

Updated by stbuehler over 15 years ago

File fix-mod-scgi-2024.patch fix-mod-scgi-2024.patch added
Category changed from core to mod_scgi
Target version set to 1.4.24

would be nice if you could try the attached patch and report back if it works. thx!

Actions

Copy link

Updated by mbj over 15 years ago

Appliet your patch, recompiled, problem gone.
Thx for your really fast response!

Not tested this on the production maschine, test was on a funtoo/x86 unstable with lighttpd-1.4.x -svn.
The problem was reproducable without the patch in this environment.

Relevant strace (after patch):

connect(5, {sa_family=AF_FILE, path="/home/mbj/tmp/scgi.sock"...}, 25) = 0
getsockname(4, {sa_family=AF_INET, sin_port=htons(2000), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
writev(5, [{"915:CONTENT_LENGTH\000171853\0SCGI\0001\0"..., 920}], 1) = 920
open("/var/tmp/lighttpd-upload-57l88a", O_RDONLY|O_LARGEFILE) = 6
fcntl64(6, F_SETFD, FD_CLOEXEC)         = 0
sendfile64(5, 6, [0], 171853)           = 106496
time(NULL)                              = 1247185051
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=5, events=POLLOUT}], 3, 1000) = 1 ([{fd=5, revents=POLLOUT|POLLERR|POLLHUP}])
sendfile64(5, 6, [106496], 65357)       = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
write(2, "2009-07-10 02:17:31: (mod_scgi.c."..., 1542009-07-10 02:17:31: (mod_scgi.c.2325) [REPORT ME] connection was dropped after accept(). reconnect() denied: write-offset: 107416 reconnect attempts: 0 
) = 154
write(2, "2009-07-10 02:17:31: (mod_scgi.c."..., 892009-07-10 02:17:31: (mod_scgi.c.2393) fcgi-server disabled:  0 /home/mbj/tmp/scgi.sock 
) = 89
close(5)                                = 0
time(NULL)                              = 1247185051
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout)
time(NULL)                              = 1247185052
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout)
time(NULL)                              = 1247185053
poll([{fd=3, events=POLLIN}, {fd=-1}, {fd=-1}], 3, 1000) = 0 (Timeout)
time(NULL)                              = 1247185054
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
write(2, "2009-07-10 02:17:33: (mod_scgi.c."..., 912009-07-10 02:17:33: (mod_scgi.c.2087) fcgi-server re-enabled:  0 /home/mbj/tmp/scgi.sock 
... normal 1sec event loop...

1. try stderr printout:

2009-07-10 02:17:02: (log.c.172) server started 
2009-07-10 02:17:06: (mod_scgi.c.2639) error: unexpected close of scgi connection for / (no scgi process on host: , port: 0 ?)3 
2009-07-10 02:17:06: (mod_scgi.c.2478) emergency exit: scgi: connection-fd: 4 fcgi-fd: 5

2. try stderr printout:

2009-07-10 02:19:41: (log.c.172) server started 
2009-07-10 02:19:54: (mod_scgi.c.2325) [REPORT ME] connection was dropped after accept(). reconnect() denied: write-offset: 107416 reconnect attempts: 0 
2009-07-10 02:19:54: (mod_scgi.c.2393) fcgi-server disabled:  0 /home/mbj/tmp/scgi.sock

sever.configuration:

server.modules = ( "mod_scgi" )
server.port          = 2000
server.document-root = "." 
scgi.server = ( "/" => ((
  "socket" => "/home/mbj/tmp/scgi.sock",
  "check-local" => "disable",
  "disable-time" => 1
)))

just wondering about the stderr printouts, 1. time was differend from the next ones? Indeterminism???

Regarts

Markus

Actions

Copy link