Bug #2542: Race leads to incorrect output in some circumstances - Lighttpd - lighty labs

Actions

Copy link

Bug #2542

closed

Race leads to incorrect output in some circumstances

Added by anomie over 11 years ago. Updated about 9 years ago.

Status:

Fixed

Priority:

Normal

Category:

mod_cgi

Target version:

1.4.40

ASK QUESTIONS IN Forums:

Description

It appears that the intended behavior of the 1.4.x branch when the CGI executable's output does not contain valid CGI or NPH headers is to return a 200 response if there was any output to stdout, or a "500 internal server error" when there was no output to stdout at all. Due to a race condition, a 200 response with an empty body will sometimes be returned when nothing was printed to stdout.

When the child process exits, there is a short delay between when its stdout filehandle is closed and when it becomes waitable with waitpid(). If mod_cgi happens to receive the FDEVENT_HUP event due to the client's stdout closing and call waitpid from cgi_connection_close before the child process becomes waitable, instead of interpreting "no output" as an error lighttpd will returns it as a 200 response with an empty body.

The attached C program can be used to test this behavior. As written, it will return an empty 200 response in all cases. If the "sleep(1);" line is removed, it will usually return an empty 200 response but rarely (about 6% of the time in my local testing) will return a 500 response. If the "close(STDOUT_FILENO);" line is removed, it will usually return a 500 response but rarely (about 2% of the time in my local testing) will return an empty 200 response.

Tested in 1.4.33 and svn lighttpd-1.4.x branch r2925. The issue is not present in trunk r2907, as that appears to always returns a 502 error response if the CGI executable's output does not contain CGI or NPH headers.

Files

foo.c (94 Bytes) foo.c

Simple test program

anomie, 2013-12-26 16:16

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by stbuehler over 11 years ago

Target version set to 1.4.x

Actions

Copy link

Updated by stbuehler about 9 years ago

Related to Bug #756: transfer-encoding chunked broken (OSX) added

Actions

Copy link

Updated by stbuehler about 9 years ago

Related to Bug #757: internal server error gives 200 OK sometimes added

Actions

Copy link

Updated by gstrauss about 9 years ago

(For the record, I think mod_cgi should return 502 Bad Gateway if CGI response is missing a valid CGI header.)

That said, I looked through mod_cgi and seem to have found a solution to the reported issue.

Why does mod_cgi_handle_subrequest() call waitpid()? (It seems to be historical.) Why not wait for a read event on pipe from CGI, as a read() of 0-bytes indicates EOF, whether the CGI closed its stdout, or the CGI exited.

Modifying mod_cgi_handle_subrequest() to always return HANDLER_WAIT_FOR_EVENT has foo.c running and returning the expected 500 Internal Server Error, and doing so 100% of the time.

The quick patch I tested when running foo (from the attached foo.c) with weighttp against lighttpd:

diff --git a/src/mod_cgi.c b/src/mod_cgi.c
index bfbfa13..44dcd69 100644
--- a/src/mod_cgi.c
+++ b/src/mod_cgi.c
@@ -1382,6 +1382,8 @@ SUBREQUEST_FUNC(mod_cgi_handle_subrequest) {
        }

 #ifndef __WIN32
+       return HANDLER_WAIT_FOR_EVENT;
+#if 0
        switch(waitpid(hctx->pid, &status, WNOHANG)) {
        case 0:
                /* we only have for events here if we don't have the header yet,
@@ -1449,6 +1451,7 @@ SUBREQUEST_FUNC(mod_cgi_handle_subrequest) {
                con->plugin_ctx[p->id] = NULL;
                return HANDLER_FINISHED;
        }
+#endif
 #else
        return HANDLER_ERROR;
 #endif

Please review this and look back at old commit 2cc4f967 and commit 73e0bb27 to see if the above patch would handle those cases. It looks to me as if it would (and more from 73e0bb27 could be removed from mod_cgi_handle_subrequest())

Actions

Copy link

Updated by gstrauss about 9 years ago

Submitted https://github.com/lighttpd/lighttpd1.4/pull/30 with better fix.

Thx, anomie, for both identifying and explaining the problem:

When the child process exits, there is a short delay between when its stdout filehandle is closed and when it becomes waitable with waitpid(). If mod_cgi happens to receive the FDEVENT_HUP event due to the client's stdout closing and call waitpid from cgi_connection_close before the child process becomes waitable, instead of interpreting "no output" as an error lighttpd will returns it as a 200 response with an empty body.

Actions

Copy link

Updated by gstrauss about 9 years ago

Submitted https://github.com/lighttpd/lighttpd1.4/pull/31 with a larger set of changes to consolidate CGI cleanup code for more consistent behavior. This includes the above patch.

Actions

Copy link

Updated by stbuehler about 9 years ago

Status changed from New to Fixed
% Done changed from 0 to 100

Applied in changeset r3089.

Actions

Copy link

Updated by stbuehler about 9 years ago

Target version changed from 1.4.x to 1.4.40

Actions

Copy link

Updated by gstrauss about 9 years ago

Not a likely scenario, but should fix the below patch to signal child with SIGTERM if waitpid() error and errno is not ECHILD. (i.e. only set pid = 0 if errno == ECHILD)

            if (errno != ECHILD) {
                log_error_write(srv, __FILE__, __LINE__, "ss", "waitpid failed: ", strerror(errno));
            }
            /* anyway: don't wait for it anymore */
            pid = 0;
            break;

Please take another look at https://github.com/lighttpd/lighttpd1.4/pull/30 compared to what was committed in r3089.

Actions

Copy link

#10

Updated by stbuehler about 9 years ago

It is completely unclear what such error would mean; I'd rather not touch anything in this case. Could call abort()...

It is very unlikely that future waitpid() calls would succeed for the pid, which is why I removed it from the cgi_pid_add handling.

Actions

Copy link

Also available in: Atom

Project

General

Profile

Lighttpd

Custom queries

Bug #2542

Race leads to incorrect output in some circumstances

Updated by stbuehler over 11 years ago

Updated by stbuehler about 9 years ago

Updated by stbuehler about 9 years ago

Updated by gstrauss about 9 years ago

Updated by gstrauss about 9 years ago

Updated by gstrauss about 9 years ago

Updated by stbuehler about 9 years ago

Updated by stbuehler about 9 years ago

Updated by gstrauss about 9 years ago

Updated by stbuehler about 9 years ago

	Related to Bug #756: transfer-encoding chunked broken (OSX)	Fixed					Actions
	Related to Bug #757: internal server error gives 200 OK sometimes	Duplicate					Actions