Bug #2542
closedRace leads to incorrect output in some circumstances
Description
It appears that the intended behavior of the 1.4.x branch when the CGI executable's output does not contain valid CGI or NPH headers is to return a 200 response if there was any output to stdout, or a "500 internal server error" when there was no output to stdout at all. Due to a race condition, a 200 response with an empty body will sometimes be returned when nothing was printed to stdout.
When the child process exits, there is a short delay between when its stdout filehandle is closed and when it becomes waitable with waitpid(). If mod_cgi happens to receive the FDEVENT_HUP event due to the client's stdout closing and call waitpid from cgi_connection_close before the child process becomes waitable, instead of interpreting "no output" as an error lighttpd will returns it as a 200 response with an empty body.
The attached C program can be used to test this behavior. As written, it will return an empty 200 response in all cases. If the "sleep(1);" line is removed, it will usually return an empty 200 response but rarely (about 6% of the time in my local testing) will return a 500 response. If the "close(STDOUT_FILENO);" line is removed, it will usually return a 500 response but rarely (about 2% of the time in my local testing) will return an empty 200 response.
Tested in 1.4.33 and svn lighttpd-1.4.x branch r2925. The issue is not present in trunk r2907, as that appears to always returns a 502 error response if the CGI executable's output does not contain CGI or NPH headers.
Files
Updated by stbuehler almost 9 years ago
- Related to Bug #756: transfer-encoding chunked broken (OSX) added
Updated by stbuehler almost 9 years ago
- Related to Bug #757: internal server error gives 200 OK sometimes added
Updated by gstrauss over 8 years ago
(For the record, I think mod_cgi should return 502 Bad Gateway if CGI response is missing a valid CGI header.)
That said, I looked through mod_cgi and seem to have found a solution to the reported issue.
Why does mod_cgi_handle_subrequest() call waitpid()? (It seems to be historical.) Why not wait for a read event on pipe from CGI, as a read() of 0-bytes indicates EOF, whether the CGI closed its stdout, or the CGI exited.
Modifying mod_cgi_handle_subrequest() to always return HANDLER_WAIT_FOR_EVENT has foo.c running and returning the expected 500 Internal Server Error, and doing so 100% of the time.
The quick patch I tested when running foo (from the attached foo.c) with weighttp against lighttpd:
diff --git a/src/mod_cgi.c b/src/mod_cgi.c index bfbfa13..44dcd69 100644 --- a/src/mod_cgi.c +++ b/src/mod_cgi.c @@ -1382,6 +1382,8 @@ SUBREQUEST_FUNC(mod_cgi_handle_subrequest) { } #ifndef __WIN32 + return HANDLER_WAIT_FOR_EVENT; +#if 0 switch(waitpid(hctx->pid, &status, WNOHANG)) { case 0: /* we only have for events here if we don't have the header yet, @@ -1449,6 +1451,7 @@ SUBREQUEST_FUNC(mod_cgi_handle_subrequest) { con->plugin_ctx[p->id] = NULL; return HANDLER_FINISHED; } +#endif #else return HANDLER_ERROR; #endif
Please review this and look back at old commit 2cc4f967 and commit 73e0bb27 to see if the above patch would handle those cases. It looks to me as if it would (and more from 73e0bb27 could be removed from mod_cgi_handle_subrequest())
Updated by gstrauss over 8 years ago
Submitted https://github.com/lighttpd/lighttpd1.4/pull/30 with better fix.
Thx, anomie, for both identifying and explaining the problem:
When the child process exits, there is a short delay between when its stdout filehandle is closed and when it becomes waitable with waitpid(). If mod_cgi happens to receive the FDEVENT_HUP event due to the client's stdout closing and call waitpid from cgi_connection_close before the child process becomes waitable, instead of interpreting "no output" as an error lighttpd will returns it as a 200 response with an empty body.
Updated by gstrauss over 8 years ago
Submitted https://github.com/lighttpd/lighttpd1.4/pull/31 with a larger set of changes to consolidate CGI cleanup code for more consistent behavior. This includes the above patch.
Updated by stbuehler over 8 years ago
- Status changed from New to Fixed
- % Done changed from 0 to 100
Applied in changeset r3089.
Updated by stbuehler over 8 years ago
- Target version changed from 1.4.x to 1.4.40
Updated by gstrauss over 8 years ago
Not a likely scenario, but should fix the below patch to signal child with SIGTERM if waitpid() error and errno is not ECHILD. (i.e. only set pid = 0 if errno == ECHILD)
if (errno != ECHILD) { log_error_write(srv, __FILE__, __LINE__, "ss", "waitpid failed: ", strerror(errno)); } /* anyway: don't wait for it anymore */ pid = 0; break;
Please take another look at https://github.com/lighttpd/lighttpd1.4/pull/30 compared to what was committed in r3089.
Updated by stbuehler over 8 years ago
It is completely unclear what such error would mean; I'd rather not touch anything in this case. Could call abort()
...
It is very unlikely that future waitpid()
calls would succeed for the pid, which is why I removed it from the cgi_pid_add
handling.
Also available in: Atom