Project

General

Profile

Bug #2542

Race leads to incorrect output in some circumstances

Added by anomie over 3 years ago. Updated about 1 year ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
mod_cgi
Target version:
Start date:
2013-12-26
Due date:
% Done:

100%

Missing in 1.5.x:
No

Description

It appears that the intended behavior of the 1.4.x branch when the CGI executable's output does not contain valid CGI or NPH headers is to return a 200 response if there was any output to stdout, or a "500 internal server error" when there was no output to stdout at all. Due to a race condition, a 200 response with an empty body will sometimes be returned when nothing was printed to stdout.

When the child process exits, there is a short delay between when its stdout filehandle is closed and when it becomes waitable with waitpid(). If mod_cgi happens to receive the FDEVENT_HUP event due to the client's stdout closing and call waitpid from cgi_connection_close before the child process becomes waitable, instead of interpreting "no output" as an error lighttpd will returns it as a 200 response with an empty body.

The attached C program can be used to test this behavior. As written, it will return an empty 200 response in all cases. If the "sleep(1);" line is removed, it will usually return an empty 200 response but rarely (about 6% of the time in my local testing) will return a 500 response. If the "close(STDOUT_FILENO);" line is removed, it will usually return a 500 response but rarely (about 2% of the time in my local testing) will return an empty 200 response.

Tested in 1.4.33 and svn lighttpd-1.4.x branch r2925. The issue is not present in trunk r2907, as that appears to always returns a 502 error response if the CGI executable's output does not contain CGI or NPH headers.

foo.c View - Simple test program (94 Bytes) anomie, 2013-12-26 16:16


Related issues

Related to Bug #756: transfer-encoding chunked broken (OSX) Fixed
Related to Bug #757: internal server error gives 200 OK sometimes Duplicate

Associated revisions

Revision 3089 (diff)
Added by stbuehler about 1 year ago

[mod_cgi] send 500 if CGI ends and there is no response (fixes #2542)

(Thx, anomie, who identified and explained problem in above ticket)

From: Glenn Strauss <>

Revision 94647804 (diff)
Added by gstrauss about 1 year ago

[mod_cgi] send 500 if CGI ends and there is no response (fixes #2542)

(Thx, anomie, who identified and explained problem in above ticket)

From: Glenn Strauss <>

git-svn-id: svn://svn.lighttpd.net/lighttpd/branches/lighttpd-1.4.x@3089 152afb58-edef-0310-8abb-c4023f1b3aa9

History

#1 Updated by stbuehler about 3 years ago

  • Target version set to 1.4.x

#2 Updated by stbuehler about 1 year ago

  • Related to Bug #756: transfer-encoding chunked broken (OSX) added

#3 Updated by stbuehler about 1 year ago

  • Related to Bug #757: internal server error gives 200 OK sometimes added

#4 Updated by gstrauss about 1 year ago

(For the record, I think mod_cgi should return 502 Bad Gateway if CGI response is missing a valid CGI header.)

That said, I looked through mod_cgi and seem to have found a solution to the reported issue.

Why does mod_cgi_handle_subrequest() call waitpid()? (It seems to be historical.) Why not wait for a read event on pipe from CGI, as a read() of 0-bytes indicates EOF, whether the CGI closed its stdout, or the CGI exited.

Modifying mod_cgi_handle_subrequest() to always return HANDLER_WAIT_FOR_EVENT has foo.c running and returning the expected 500 Internal Server Error, and doing so 100% of the time.

The quick patch I tested when running foo (from the attached foo.c) with weighttp against lighttpd:

diff --git a/src/mod_cgi.c b/src/mod_cgi.c
index bfbfa13..44dcd69 100644
--- a/src/mod_cgi.c
+++ b/src/mod_cgi.c
@@ -1382,6 +1382,8 @@ SUBREQUEST_FUNC(mod_cgi_handle_subrequest) {
        }

 #ifndef __WIN32
+       return HANDLER_WAIT_FOR_EVENT;
+#if 0
        switch(waitpid(hctx->pid, &status, WNOHANG)) {
        case 0:
                /* we only have for events here if we don't have the header yet,
@@ -1449,6 +1451,7 @@ SUBREQUEST_FUNC(mod_cgi_handle_subrequest) {
                con->plugin_ctx[p->id] = NULL;
                return HANDLER_FINISHED;
        }
+#endif
 #else
        return HANDLER_ERROR;
 #endif

Please review this and look back at old commit 2cc4f967 and commit 73e0bb27 to see if the above patch would handle those cases. It looks to me as if it would (and more from 73e0bb27 could be removed from mod_cgi_handle_subrequest())

#5 Updated by gstrauss about 1 year ago

Submitted https://github.com/lighttpd/lighttpd1.4/pull/30 with better fix.

Thx, anomie, for both identifying and explaining the problem:

When the child process exits, there is a short delay between when its stdout filehandle is closed and when it becomes waitable with waitpid(). If mod_cgi happens to receive the FDEVENT_HUP event due to the client's stdout closing and call waitpid from cgi_connection_close before the child process becomes waitable, instead of interpreting "no output" as an error lighttpd will returns it as a 200 response with an empty body.

#6 Updated by gstrauss about 1 year ago

Submitted https://github.com/lighttpd/lighttpd1.4/pull/31 with a larger set of changes to consolidate CGI cleanup code for more consistent behavior. This includes the above patch.

#7 Updated by stbuehler about 1 year ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100

Applied in changeset r3089.

#8 Updated by stbuehler about 1 year ago

  • Target version changed from 1.4.x to 1.4.40

#9 Updated by gstrauss about 1 year ago

Not a likely scenario, but should fix the below patch to signal child with SIGTERM if waitpid() error and errno is not ECHILD. (i.e. only set pid = 0 if errno == ECHILD)

            if (errno != ECHILD) {
                log_error_write(srv, __FILE__, __LINE__, "ss", "waitpid failed: ", strerror(errno));
            }
            /* anyway: don't wait for it anymore */
            pid = 0;
            break;

Please take another look at https://github.com/lighttpd/lighttpd1.4/pull/30 compared to what was committed in r3089.

#10 Updated by stbuehler about 1 year ago

It is completely unclear what such error would mean; I'd rather not touch anything in this case. Could call abort()...

It is very unlikely that future waitpid() calls would succeed for the pid, which is why I removed it from the cgi_pid_add handling.

Also available in: Atom