FastCGI premature socket close with response streaming and 204 status

Added by jefftharris about 2 months ago

OS: Linux 4.4
Version: lighttpd/1.4.59 (ssl)
Client: Python http.client module and libcurl

I am sometimes receiving a broken pipe (EPIPE) socket error in my FastCGI server when a 204 response with no body is sent while server.stream-response-body is set to 1 or 2. Using strace on my server, it appears that the write of the FastCGI END_REQUEST record is what returns the broken pipe error. To make the issue easier to reproduce, I placed a 1s delay before the write of the END_REQUEST.
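For readers less familiar with the wire format: per the FastCGI 1.0 specification, a backend finishes a response by sending an FCGI_STDOUT record carrying the CGI headers (here just a 204 status), an empty FCGI_STDOUT record that ends the stream, and finally an FCGI_END_REQUEST record. The following is a minimal sketch of encoding those records in C. The function names are hypothetical, padding is omitted (a paddingLength of 0 is valid), and this is not the reporter's backend code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants from the FastCGI 1.0 specification. */
enum { FCGI_VERSION_1 = 1, FCGI_END_REQUEST = 3, FCGI_STDOUT = 6 };
enum { FCGI_REQUEST_COMPLETE = 0 };
enum { FCGI_HEADER_LEN = 8 };

/* Encode one record (8-byte header + content, no padding) into out.
 * out must have room for FCGI_HEADER_LEN + content_len bytes.
 * Returns the number of bytes written. */
static size_t fcgi_encode_record(uint8_t *out, uint8_t type,
                                 uint16_t request_id,
                                 const uint8_t *content, uint16_t content_len)
{
    assert(out != NULL && (content != NULL || content_len == 0));
    out[0] = FCGI_VERSION_1;
    out[1] = type;
    out[2] = (uint8_t)(request_id >> 8);     /* requestIdB1 (big-endian) */
    out[3] = (uint8_t)(request_id & 0xff);   /* requestIdB0 */
    out[4] = (uint8_t)(content_len >> 8);    /* contentLengthB1 */
    out[5] = (uint8_t)(content_len & 0xff);  /* contentLengthB0 */
    out[6] = 0;                              /* paddingLength */
    out[7] = 0;                              /* reserved */
    if (content_len > 0)
        memcpy(out + FCGI_HEADER_LEN, content, content_len);
    return FCGI_HEADER_LEN + (size_t)content_len;
}

/* Encode FCGI_END_REQUEST: the body is a 4-byte big-endian appStatus,
 * a 1-byte protocolStatus, and 3 reserved bytes. */
static size_t fcgi_encode_end_request(uint8_t *out, uint16_t request_id,
                                      uint32_t app_status)
{
    const uint8_t body[8] = {
        (uint8_t)(app_status >> 24), (uint8_t)(app_status >> 16),
        (uint8_t)(app_status >> 8),  (uint8_t)app_status,
        FCGI_REQUEST_COMPLETE, 0, 0, 0
    };
    return fcgi_encode_record(out, FCGI_END_REQUEST, request_id,
                              body, (uint16_t)sizeof body);
}
```

A 204 reply for request 1 would be one FCGI_STDOUT record with the header bytes, one FCGI_STDOUT record with no content, and then fcgi_encode_end_request(). The EPIPE described in this report occurs on that final write, when lighttpd has already closed the socket.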

In lighttpd, the attached strace fragment in lighttpd-brokenpipe-segment.txt shows the issue. The last read(9, ...) shows the 204 response from the FastCGI server. The socket is closed before the END_REQUEST is read (for comparison, see the second read(9, ...) of the authorizer response). When the response succeeds, the timing works out so that the END_REQUEST data is already in the buffer on that last read, but the artificial 1s delay prevents this.

While searching the code for a fix, I came across http_response_write_prepare, which checks for a 204 status and sets r->resp_body_finished=1. If I comment out that line, the END_REQUEST is always read, but the response to the client is not correct, as it appears to become a chunked response of zero bytes. This is not a good fix, but I'm not sure how to ensure the back-end has completely finished the FastCGI protocol exchange when the response already looks complete. I've attached lighttpd-ok-204response-segment.txt, which has the strace output with this change and shows the wait for the END_REQUEST.


Replies (11)

RE: FastCGI premature socket close with response streaming and 204 status - Added by gstrauss about 2 months ago

Are you using FastCGI "mode" => "authorizer"? If so, this might be related to #3106, fixed in lighttpd 1.4.60.

RE: FastCGI premature socket close with response streaming and 204 status - Added by jefftharris about 2 months ago

The URI uses FastCGI as both the authorizer and the HTTP responder. In the strace output, the first two reads of fd 9 are the authorizer response (successful).

I should have mentioned that I have applied the fix mentioned in that issue (as I had reported it). Looking back at the issue, the comment at https://redmine.lighttpd.net/boards/2/topics/9969?r=9988#message-9988 has an strace with the EPIPE error, which would be what the FastCGI server is receiving now as well.

So, maybe there is a deeper cause for both issues.

RE: FastCGI premature socket close with response streaming and 204 status - Added by gstrauss about 2 months ago

> I should have mentioned that I have applied the fix mentioned in that issue (as I had reported it).

Sorry for the crossed streams. I saw "Version: lighttpd/1.4.59 (ssl)" in your original post and wanted to give a quick pointer to the recent fix in lighttpd 1.4.60.

> So, maybe there is a deeper cause for both issues.

I think they are separate concerns, but both related to streaming. Please see if the following "test" patch works for you. See discussion below; patch might not get committed.

--- a/src/mod_fastcgi.c
+++ b/src/mod_fastcgi.c
@@ -452,6 +452,10 @@ static handler_t fcgi_recv_parse(request_st * const r, struct http_response_opts
                                        r->conf.stream_response_body &=
                                          ~(FDEVENT_STREAM_RESPONSE|FDEVENT_STREAM_RESPONSE_BUFMIN);
                                }
+                               else if (r->http_status == 204) {  /* ??? 204, 205, 304, HEAD requests, ... ??? */
+                                       r->conf.stream_response_body &=
+                                         ~(FDEVENT_STREAM_RESPONSE|FDEVENT_STREAM_RESPONSE_BUFMIN);
+                               }
                        } else if (hctx->send_content_body) {
                                if (0 != mod_fastcgi_transfer_cqlen(r, hctx->rb, packet.len - packet.padding)) {
                                        /* error writing to tempfile;

Is lighttpd so much faster than your backend that lighttpd does a read() of the FastCGI response, processes the headers, produces a response to the client, and cleans up the request, including close() of the socket to the backend, all before the backend does a send() or write() of FCGI_END_REQUEST? The close() by lighttpd would need to happen before the backend does a write() of the FCGI_END_REQUEST packet; otherwise, that packet from the backend would be successfully submitted into the kernel socket buffers, where it would be discarded when lighttpd does close() on the socket. (Presumably, the kernel socket buffers for the FastCGI connection are not full from a 204 response on a FastCGI connection with a single request, i.e. not multiplexing.)
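The ordering argument above can be checked with a small experiment. This sketch assumes a Unix-domain stream socket pair standing in for the FastCGI connection, uses 16 zero bytes as a stand-in for the FCGI_END_REQUEST record, and the function name is hypothetical. It shows that a write() issued before the peer's close() succeeds even though the data is then discarded, while a write() issued after the close() fails with EPIPE. Over TCP the details differ: the first write after the peer closes may still succeed, with the error surfacing on a later call.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical experiment: fds[0] plays lighttpd, fds[1] plays the backend.
 * If server_closes_first is nonzero, the "server" calls close() before the
 * "backend" writes its final record. Returns 0 if the write succeeded,
 * otherwise the errno value from write() (or from socketpair()). */
static int simulate_end_request_write(int server_closes_first)
{
    static const char record[16] = {0};  /* stand-in for FCGI_END_REQUEST */
    int fds[2];
    int result = 0;

    /* Ignore SIGPIPE so a write to a closed peer reports EPIPE via errno
     * instead of terminating the process. */
    signal(SIGPIPE, SIG_IGN);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return errno;

    if (server_closes_first)
        close(fds[0]);

    ssize_t n = write(fds[1], record, sizeof record);
    if (n < 0)
        result = errno;
    else
        assert((size_t)n == sizeof record);  /* fits in an empty buffer */

    if (!server_closes_first)
        close(fds[0]);  /* the unread record is silently discarded */
    close(fds[1]);
    return result;
}
```

In both cases the backend's record never reaches the server; the only difference is whether the backend finds out.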

What is your backend FastCGI app doing between sending the 204 response in FCGI_STDOUT packet, and then sending FCGI_END_REQUEST packet?

If this performance difference is real, then do you really want lighttpd to wait for the FCGI_END_REQUEST from the FastCGI backend (as the above patch does) before sending response headers to the client when response streaming is enabled?

Question: Is it unreasonable to ask the backend to send FCGI_END_REQUEST immediately following the response, and to handle EPIPE more gracefully after sending a response status that has no response body (204, 205, 304, or a response to a HEAD request)?

In the (distant) future, I have plans to add backend connection pools, in which case the FastCGI connection could be handled past the end of the client request, but at the moment, client request and backend connection are tied together. When the client request is finished (or aborted by the client for any reason), then the backend connection is cleaned up.

RE: FastCGI premature socket close with response streaming and 204 status - Added by jefftharris about 2 months ago

gstrauss wrote in RE: FastCGI premature socket close with response streamin...:

> I think they are separate concerns, but both related to streaming. Please see if the following "test" patch works for you. See discussion below; patch might not get committed.
> [...]

The patch seems to work. I don't see the PIPE error, and a strace shows a read of the END_REQUEST.

> Is lighttpd so much faster than your backend that lighttpd does a read() of the FastCGI response, processes the headers, produces a response to the client, and cleans up the request, including close() of the socket to the backend, all before the backend does a send() or write() of FCGI_END_REQUEST? The close() by lighttpd would need to happen before the backend does a write() of the FCGI_END_REQUEST packet; otherwise, that packet from the backend would be successfully submitted into the kernel socket buffers, where it would be discarded when lighttpd does close() on the socket. (Presumably, the kernel socket buffers for the FastCGI connection are not full from a 204 response on a FastCGI connection with a single request, i.e. not multiplexing.)

I'm not sure if the situation arose due to speed or simply OS scheduling. Our server is multi-threaded, and another thread is active during the send of the response.

> What is your backend FastCGI app doing between sending the 204 response in FCGI_STDOUT packet, and then sending FCGI_END_REQUEST packet?

Normally, two different functions are called: one writes the response headers and body, and the other closes the connection. The END_REQUEST is sent from the close function.

> If this performance difference is real, then do you really want lighttpd to wait for the FCGI_END_REQUEST from the FastCGI backend (as the above patch does) before sending response headers to the client when response streaming is enabled?

Isn't waiting for the END_REQUEST part of the FastCGI protocol? Is it difficult, then, for the response handling code to wait for the backend to finish with the request before shutting down the connection? In streaming mode, shouldn't the backend be more in control?

> Question: Is it unreasonable to ask the backend to send FCGI_END_REQUEST immediately following the response, and to handle EPIPE more gracefully after sending a response status that has no response body (204, 205, 304, or a response to a HEAD request)?

Relying on the timing of OS calls on a stream socket seems fragile and dependent on specific circumstances. E.g., with large enough headers, there's no guarantee that lighttpd would receive all of the data in a single read().
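The point about stream sockets can be made concrete: read() on a stream socket returns whatever bytes happen to be available, so a reader that waits for FCGI_END_REQUEST has to parse each record's length fields and loop until the whole record has arrived. The following is a minimal blocking sketch under those assumptions. read_full() and drain_until_end_request() are hypothetical names, the sketch simply discards record content, and lighttpd itself uses non-blocking, event-driven I/O rather than a loop like this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

enum { FCGI_HEADER_LEN = 8, FCGI_END_REQUEST = 3 };

/* Read exactly len bytes unless EOF or an error comes first.
 * Returns len on success, a smaller count on EOF, or -1 on error. */
static ssize_t read_full(int fd, uint8_t *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF */
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Consume whole records from a blocking fd until the FCGI_END_REQUEST
 * for request_id arrives. Returns 0 when it is seen, -1 on EOF or error. */
static int drain_until_end_request(int fd, uint16_t request_id)
{
    uint8_t hdr[FCGI_HEADER_LEN];
    uint8_t scratch[512];

    assert(fd >= 0);
    for (;;) {
        if (read_full(fd, hdr, sizeof hdr) != (ssize_t)sizeof hdr)
            return -1;          /* EOF or error before a full header */
        uint8_t type = hdr[1];
        uint16_t id = (uint16_t)((hdr[2] << 8) | hdr[3]);
        /* contentLength (big-endian) plus paddingLength */
        size_t remaining = (((size_t)hdr[4] << 8) | hdr[5]) + hdr[6];
        while (remaining > 0) { /* skip content and padding */
            size_t chunk = remaining < sizeof scratch ? remaining : sizeof scratch;
            if (read_full(fd, scratch, chunk) != (ssize_t)chunk)
                return -1;
            remaining -= chunk;
        }
        if (type == FCGI_END_REQUEST && id == request_id)
            return 0;
    }
}
```

Because every length comes from the record header, the result does not depend on how the kernel happens to split the byte stream across read() calls.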

RE: FastCGI premature socket close with response streaming and 204 status - Added by gstrauss about 2 months ago

> Isn't waiting for the END_REQUEST part of the FastCGI protocol? Is it difficult, then, for the response handling code to wait for the backend to finish with the request before shutting down the connection? In streaming mode, shouldn't the backend be more in control?

Yes, it is part of the protocol, though the connection can be aborted for any number of reasons.

The FastCGI spec contains:

> When a Web server is not multiplexing requests over a transport connection, the Web server can abort a request by closing the request's transport connection.

Now, you may argue that lighttpd should not "abort" the FastCGI request to the backend once lighttpd has a "completable" response to send to the client. I agree that lighttpd's behavior here could be more polite to the backend. At the same time, I think that the backend should be able to handle EPIPE more gracefully. Can you help me to understand the effect on your application of getting EPIPE when sending FCGI_END_REQUEST? How is this a problem for you? Does this add noise to your application error log? What sort of action would you take after getting EPIPE trying to send FCGI_END_REQUEST?

Yes, lighttpd is taking a shortcut, short-circuiting the response when response streaming is enabled and there is no response body due to the HTTP status code (204, 205, 304) or HEAD request. Please help me to understand why this is an important problem and its deleterious effects, besides perhaps extra logging in your already slow backend application.
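The condition under which this shortcut applies can be written as a small predicate. This is a hypothetical helper, not lighttpd's code (the patch above checks only 204, and its comment asks whether 205, 304 and HEAD should be included); it simply encodes the cases listed here, where HTTP forbids a response body so the headers alone complete the response.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when the response cannot carry a body,
 * either because of the request method or the status code. */
static bool response_has_no_body(int status, bool is_head_request)
{
    assert(status >= 100 && status <= 599);
    if (is_head_request)
        return true;            /* HEAD: headers only, whatever the status */
    return status == 204        /* No Content */
        || status == 205        /* Reset Content */
        || status == 304;       /* Not Modified */
}
```

A 200 response that merely happens to have an empty body returns false here, since nothing about its status forbids a body.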

(After your application does a sendmsg(), lighttpd does at least two system calls (read() and close()) before your application does a single system call (sendmsg()) to send FCGI_END_REQUEST.)

I previously wrote:

> In the (distant) future, I have plans to add backend connection pools, in which case the FastCGI connection could be handled past the end of the client request, but at the moment, client request and backend connection are tied together. When the client request is finished (or aborted by the client for any reason), then the backend connection is cleaned up.

Changing this behavior with the patch above adds latency before lighttpd responds to the client, though this latency is currently elided only when lighttpd is configured with response streaming enabled.

RE: FastCGI premature socket close with response streaming and 204 status - Added by jefftharris about 2 months ago

I applied the patch and enabled the fix. Testing seems to be going well even with the artificial delay before the END_REQUEST. I'll also update our code to be nicer in the case of a PIPE error with the END_REQUEST send.

Thanks for your help.

RE: FastCGI premature socket close with response streaming and 204 status - Added by gstrauss about 2 months ago

BTW, I am still willing to be convinced that the mitigation should be enabled by default.

(I would prefer if that code did not have to know to take action-at-a-distance to thwart an optimization elsewhere.)

RE: FastCGI premature socket close with response streaming and 204 status - Added by jefftharris about 2 months ago

Some reasons could include the following. From the perspective of the FastCGI server, it's odd that there is a difference between, say, a 200 with a zero-length body and a 204, also with a zero-length body. Also, a PIPE error would generally indicate some sort of failure in lighttpd (e.g. a crash) that closed the connection early.

We are fortunate to have source code access, so we can rebuild with the change enabled. Users of a pre-built distribution would not be able to enable the change.

Is the optimization something that is needed in the streaming response scenarios?

As for keeping the existing behavior, I am only able to comment on our usage. Other users could be broken if they are expecting the current behavior and perhaps not sending an END_REQUEST with a 204.

RE: FastCGI premature socket close with response streaming and 204 status - Added by gstrauss about 2 months ago

Thanks for your reply.

> Is the optimization something that is needed in the streaming response scenarios?

This is an optimization. As such, it is not "needed". I believe this optimization has been present since at least lighttpd 1.4.40 (Jul 2016), in which server.stream-response-body was introduced.

I do hear that you are surprised by the lighttpd FastCGI behavior for responses which do not have a response body according to the HTTP protocol specification. Beyond your surprise, I have not heard any specifics about how the behavior negatively affects your application. While it theoretically could affect the application, and that might be an issue if you could not modify your application, this lighttpd optimization is only present when server.stream-response-body is configured with a non-zero value, which is not the default.

> In the (distant) future, I have plans to add backend connection pools, in which case the FastCGI connection could be handled past the end of the client request, but at the moment, client request and backend connection are tied together. When the client request is finished (or aborted by the client for any reason), then the backend connection is cleaned up.

RE: FastCGI premature socket close with response streaming and 204 status - Added by jefftharris about 1 month ago

gstrauss wrote in RE: FastCGI premature socket close with response streamin...:

> I do hear that you are surprised by the lighttpd FastCGI behavior for responses which do not have a response body according to the HTTP protocol specification. Beyond your surprise, I have not heard any specifics about how the behavior negatively affects your application. While it theoretically could affect the application, and that might be an issue if you could not modify your application, this lighttpd optimization is only present when server.stream-response-body is configured with a non-zero value, which is not the default.

If the EPIPE error from writing the END_REQUEST message is ignored, there is no adverse behavior. The protocolStatus field in the END_REQUEST is not used, so the feedback to lighttpd isn't needed (though perhaps it would be with connection pools). So either that change or the commented-out code in the above changelist resolves the issue for my situation.
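A backend-side mitigation along these lines is to treat EPIPE on the FCGI_END_REQUEST write as "the web server has already finished with this request" rather than as a failure. This sketch assumes Linux, where send() accepts the MSG_NOSIGNAL flag (some BSD-derived systems use the SO_NOSIGPIPE socket option instead), and an already-encoded record. send_end_request() and finish_request() are hypothetical names, and treating ECONNRESET the same way is a precaution, not something observed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send all bytes of an already-encoded FCGI_END_REQUEST record.
 * Returns 0 on success, 1 if the web server had already closed the
 * connection (EPIPE/ECONNRESET, treated as benign), -1 on other errors. */
static int send_end_request(int fd, const uint8_t *rec, size_t len)
{
    size_t sent = 0;

    assert(rec != NULL);
    while (sent < len) {
        /* MSG_NOSIGNAL: report EPIPE via errno instead of raising SIGPIPE. */
        ssize_t n = send(fd, rec + sent, len - sent, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EPIPE || errno == ECONNRESET)
                return 1;
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}

/* Hypothetical "close" step of the backend: send END_REQUEST, then close.
 * A web server that already hung up is not reported as an error. */
static int finish_request(int fd, const uint8_t *rec, size_t len)
{
    int rc = send_end_request(fd, rec, len);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

Keeping the benign case distinct (return value 1) still lets the backend log it at debug level if desired, without treating it as an application error.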

Thanks for your help.
