Project

General

Profile

Actions

Bug #2827

closed

POST to mod_cgi sometimes hangs

Added by davidm about 7 years ago. Updated about 7 years ago.

Status:
Fixed
Priority:
Normal
Category:
core
Target version:
ASK QUESTIONS IN Forums:

Description

We have been seeing cases where lighttpd 1.4.45 would occasionally hang after POSTing all bytes to a CGI binary. Specifically, what happens is that cgi_write_request() would correctly detect that cq->bytes_out == con->request.content_length and as a result it would call cgi_connection_close_fdtocgi(). This in turn calls cgi_connection_close_fdtocgi() to schedule closing of the file descriptor. So far so good.

The problem is that fedevent_sched_run() then never gets called, so the file descriptor never actually gets closed and the CGI binary never sees an EOF. I think the reason this is happening is that the final call to cgi_write_request() was triggered through connection_state_machine() (after the fdevent_poll() loop). In this case, if there are no further HTTP requests, fdevent_poll() will never return a pending event and thus will never run fdevent_sched_run().

We have:

server.stream-request-body = 2
server.stream-response-body = 2

in case that matters.

Actions #1

Updated by gstrauss about 7 years ago

  • Status changed from New to Patch Pending
  • Target version changed from 1.4.x to 1.4.46

Technically speaking, CGI should check (and validate) CONTENT_LENGTH before reading input, and should read CONTENT_LENGTH of input, instead of reading until EOF.

Still, you have quite thoroughly described an edge case in lighttpd which should be fixed. The easiest fix is to move the call to fdevent_sched_run() to be outside that conditional block. lighttpd will still wait 1 second if no new events arrive, but will then proceed as expected.

While I could add some logic so that lighttpd polls for events immediately (instant timeout) if there are pending fds to be closed, closes the fds pending close, and then polls for a longer timeout, I think the simple fix is sufficient, and if you do not want to wait 1 second for your CGI to continue, then you should follow the CGI spec and use CONTENT_LENGTH.

https://tools.ietf.org/html/rfc3875

4.2.  Request Message-Body
   [...]
   A request-body is supplied with the request if the CONTENT_LENGTH is
   not NULL.  The server MUST make at least that many bytes available
   for the script to read.  The server MAY signal an end-of-file
   condition after CONTENT_LENGTH bytes have been read or it MAY supply
   extension data.  Therefore, the script MUST NOT attempt to read more
   than CONTENT_LENGTH bytes, even if more data is available.  However,
   it is not obliged to read any of the data.

Actions #2

Updated by gstrauss about 7 years ago

--- a/src/server.c
+++ b/src/server.c
@@ -2085,13 +2085,14 @@ static int server_main (server * const srv, int argc, char **argv) {
                                        (*handler)(srv, context, revents);
                                }
                        } while (--n > 0);
-                       fdevent_sched_run(srv, srv->ev);
                } else if (n < 0 && errno != EINTR) {
                        log_error_write(srv, __FILE__, __LINE__, "ss",
                                        "fdevent_poll failed:",
                                        strerror(errno));
                }

+               if (n >= 0) fdevent_sched_run(srv, srv->ev);
+
                for (ndx = 0; ndx < srv->joblist->used; ndx++) {
                        connection *con = srv->joblist->ptr[ndx];
                        connection_state_machine(srv, con);
Actions #3

Updated by davidm about 7 years ago

Wouldn't it be better to do fdevent_sched_run() unconditionally? Otherwise, if epoll_wait() gets interrupted by a signal, we wouldn't call fdevent_sched_run() even though there might be pending work from the connection_state_machine(), right?

Actions #4

Updated by gstrauss about 7 years ago

Wouldn't it be better to do fdevent_sched_run() unconditionally?

Not if there are things intended to be sent to the kernel which were interrupted before being sent to the kernel. (This might not be entirely relevant in this call to poll, but is relevant in other event loops I have written elsewhere.)

Otherwise, if epoll_wait() gets interrupted by a signal, we wouldn't call fdevent_sched_run() even though there might be pending work from the connection_state_machine(), right?

This code is in a loop. If interrupted, it will be tried again on the next loop. On any busy system, there should be actual fd events ready before interrupts, and on mostly idle systems, there is unlikely to be a large number of syscall interrupts (without fd events) which might interrupt the loop many times before it completes at least once (with either an event or a 1 second timeout without any event).

I have already pointed out that the root cause of your problem is the insecure CGI script which does not follow the CGI standard and might read an arbitrarily large amount of data.

Actions #5

Updated by gstrauss about 7 years ago

  • Status changed from Patch Pending to Fixed
  • % Done changed from 0 to 100
Actions #6

Updated by stbuehler about 7 years ago

Why not just set the timeout to 1 (ms) when there are close requests pending (e.g. in fdevent_poll)? Also why is fdevent_sched_run called in fdevent_libev_poll ?

Actions #7

Updated by gstrauss about 7 years ago

Also why is fdevent_sched_run called in fdevent_libev_poll ?

See eb37615a

Actions

Also available in: Atom