Project

General

Profile

[Solved] FastCGI authorizer hang with server.stream-response-body

Added by jefftharris over 2 years ago

OS: Linux 4.4
Version: lighttpd/1.4.59 (ssl)
Client: Python http.client module

We are using FastCGI to both authorize and handle requests. If I enable stream-response-body=2, lighttpd will occasionally hang seemingly receiving the authorizer response. Tracing in gw_recv_response shows the code is getting to the "return HANDLER_COMEBACK" in the HANDLER_FINISHED case with gw_mode == GW_AUTHORIZER. The more debugging I enable, e.g. fastcgi.debug=1, the more reliable the response is received.

In the attached patch, I have attempted a fix to temporarily not stream the response body during authorization. If the line is uncommented to set r->conf.stream_response_body=0, then the authorizer succeeds consistently.


Replies (23)

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

We are using FastCGI to both authorize and handle requests. If I enable stream-response-body=2, lighttpd will occasionally hang seemingly receiving the authorizer response.

What do you mean by "hang"? Does the request "hang" and not proceed? Does the server "hang" and not continue processing any requests? Does lighttpd CPU use change (is there an indication of spinning on the CPU?)

How long does it take for your backend to respond? (Is the backend always quick in authorizer mode, or can it take seconds?)

Even when streaming, lighttpd does not send any response to the client until the response headers are received in full from the backend (or the request is aborted due to timeout or other event). If the backend is an authorizer, then if the response is an authorization failure, lighttpd streams that response to the client. If the backend is an authorizer, then if the response is a 200 OK, then lighttpd gw_backend should return HANDLER_COMEBACK.

RE: FastCGI authorizer hang with server.stream-response-body - Added by jefftharris over 2 years ago

gstrauss wrote in RE: FastCGI authorizer hang with server.stream-response-body:

What do you mean by "hang"? Does the request "hang" and not proceed? Does the server "hang" and not continue processing any requests? Does lighttpd CPU use change (is there an indication of spinning on the CPU?)

The server appears to stop processing the request. The non-authorizer fastcgi handler is never called. The client times out after about 30s. There is no indication of spinning in the CPU.

I did reproduce the issue with both debug.log-state-handling and fastcgi.debug enabled. I've attached the success and failure logs. After the second call to gw_recv_response, the next state logs are different. In the success case, the state is handle-req, and in the failure case it is write.

How long does it take for your backend to respond? (Is the backend always quick in authorizer mode, or can it take seconds?)

The server responds quickly in both success and hanging cases.

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

Thanks for the additional info. I'll try some experiments later. It would be interesting to know if there is a difference between when it succeeds and fails if the caller of gw_recv_response is the fdevent handler or the subrequest handler.

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

In the success case, the state is handle-req, and in the failure case it is write.

The success trace looks to be occurring in the gw_backend fdevent handler, and which is where handling the authorizer response from the backend is typically expected to occur.

Looking in connection.c, the return value from http_response_handler must not be HANDLER_COMEBACK if the connection state changes to 'write'. Since the failure case shows the transition before "state at exit ...", this is occurring within the subrequest handler, and HANDLER_COMEBACK is not being returned. If HANDLER_COMEBACK is not being returned, then what else might be happening? Looking into this further...

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

Thank you for the lighttpd-fail.txt log. It suggests that the lighttpd request state changed from 'handle-req' to 'write' while still waiting for the authorizer to finish. That should not happen in the case where the authorizer returns 200 OK, but is expected to happen if the authorizer returns a different status, e.g 401 Unauthorized.

... What should not happen is for lighttpd to cease handling the request and timeout the request after 30 seconds (when one of your lighttpd.conf settings triggers: server.max-read-idle = 30 or server.max-write-idle = 30), which is what I am looking into ...
(BTW, if you set debug.log-request-handling = "enable", then you will see in the lighttpd error log "connection closed - ??? timeout:" when that occurs)

In the lighttpd-fail.txt log, if the request is in the 'write' state, and gw_recv_response returns HANDLER_COMEBACK, as you seem to indicate from the additional trace you added, then the lighttpd error log should have issued trace along the lines of "unexpected subrequest handler ret-value:..." Do you see that in the lighttpd error log?

20210923:213904:+0000: AIHTP999 lighttpd[733]: (connections.c.1414) state at enter 9 handle-req
20210923:213904:+0000: AIHTP999 lighttpd[733]: (connections.c.1060) state for fd:9 id:0 handle-req
20210923:213904:+0000: AIHTP999 lighttpd[733]: (connections.c.1060) state for fd:9 id:0 write
20210923:213904:+0000: AIHTP999 lighttpd[733]: (connections.c.1421) state at exit: 9 write
20210923:213904:+0000: AIHTP999 lighttpd[733]: (gw_backend.c.303) released proc: pid: 0 socket: unix:/var/umi-fastcgi-auth.sock load: 0
20210923:213904:+0000: AIHTP999 lighttpd[733]: (gw_backend.c.2222) gw_recv_response end
20210923:213904:+0000: AIHTP999 lighttpd[733]: (connections.c.1414) state at enter 9 write
20210923:213904:+0000: AIHTP999 lighttpd[733]: (connections.c.1060) state for fd:9 id:0 write
20210923:213904:+0000: AIHTP999 lighttpd[733]: (connections.c.1421) state at exit: 9 write

Some additional questions:

What does a response from your authorizer look like? How many lines are in the response and what does it look like? In the failure case, do you know if the authorizer sent a 200 OK or if it sent some other response status (e.g. 401 Unauthorized)?
Your added trace 20210923:213904:+0000: AIHTP999 lighttpd[733]: (gw_backend.c.2222) gw_recv_response end suggests that a 200 OK was sent.

Does the authorizer backend send the response in one packet or in multiple packets? (Is it efficient and send in one send() or write(), or is it inefficient, possibly making multiple syscalls to write a string as short as "Status: 200\r\n\r\n" ?)

You mentioned that the issue occasionally occurs with server.stream-response-body = 2. Just to confirm, does the issue occur with server.stream-response-body = 1 ?

Also, I don't believe it should matter, but are your requests using HTTP/1.1 or HTTP/2?

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

BTW, additional flags may be stored in r->conf.stream_response_body for the backend socket state and the debug patch you are testing may have discarded those flags, leading to the eventual request timeout.

For debugging, you might experiment with
r->conf.stream_response_body &= ~(FDEVENT_STREAM_REQUEST | FDEVENT_STREAM_REQUEST_BUFMIN); // to remove response streaming
and
r->conf.stream_response_body |= FDEVENT_STREAM_REQUEST | FDEVENT_STREAM_REQUEST_BUFMIN; // to re-add response streaming

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

Please try this patch (replacing your debug patch)

--- a/src/http-header-glue.c
+++ b/src/http-header-glue.c
@@ -1369,7 +1369,7 @@ handler_t http_response_parse_headers(request_st * const r, http_response_opts *

     if (opts->authorizer
         && (r->http_status == 0 || r->http_status == 200)) {
-        return HANDLER_GO_ON;
+        return HANDLER_FINISHED;
     }

     if (NULL == r->handler_module) {

RE: FastCGI authorizer hang with server.stream-response-body - Added by jefftharris over 2 years ago

gstrauss wrote in RE: FastCGI authorizer hang with server.stream-response-body:

... What should not happen is for lighttpd to cease handling the request and timeout the request after 30 seconds (when one of your lighttpd.conf settings triggers: server.max-read-idle = 30 or server.max-write-idle = 30), which is what I am looking into ...
(BTW, if you set debug.log-request-handling = "enable", then you will see in the lighttpd error log "connection closed - ??? timeout:" when that occurs)

I do not see that connection closed log with log-request-handling enabled.

In the lighttpd-fail.txt log, if the request is in the 'write' state, and gw_recv_response returns HANDLER_COMEBACK, as you seem to indicate from the additional trace you added, then the lighttpd error log should have issued trace along the lines of "unexpected subrequest handler ret-value:..." Do you see that in the lighttpd error log?

I do not see that unexpected handler log.

What does a response from your authorizer look like? How many lines are in the response and what does it look like? In the failure case, do you know if the authorizer sent a 200 OK or if it sent some other response status (e.g. 401 Unauthorized)?
Your added trace 20210923:213904:+0000: AIHTP999 lighttpd[733]: (gw_backend.c.2222) gw_recv_response end suggests that a 200 OK was sent.

Adding log-response-handler shows:

20210924:130318:+0000: AIHTP999 lighttpd[601]: (response.c.158) Response-Header:\nHTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\nDate: Fri, 24 Sep 2021 13:03:18 GMT\r\n\r\n
20210924:130318:+0000: AIHTP999 lighttpd[601]: (gw_backend.c.2222) gw_recv_response end

Does the authorizer backend send the response in one packet or in multiple packets? (Is it efficient and send in one send() or write(), or is it inefficient, possibly making multiple syscalls to write a string as short as "Status: 200\r\n\r\n" ?)

The authorizer is using FastCGI. A strace of our server shows the response sent back to lighttpd. We return a variable that has the authenticated user name.

[pid   472] sendmsg(31, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\6\0\1\0\34\0\0", iov_len=8}, {iov_base="Variable-REMOTE_USER: ai\r\n", iov_len=26}, {iov_base="\r\n", iov_len=2}], msg_iovlen=3, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL

You mentioned that the issue occasionally occurs with server.stream-response-body = 2. Just to confirm, does the issue occur with server.stream-response-body = 1 ?

The issue occurs with 1. The same connection state logs are produced.

Also, I don't believe it should matter, but are your requests using HTTP/1.1 or HTTP/2?

HTTP/1.1. And, we have ssl.read-ahead set to disabled.

RE: FastCGI authorizer hang with server.stream-response-body - Added by jefftharris over 2 years ago

gstrauss wrote in RE: FastCGI authorizer hang with server.stream-response-body:

Please try this patch (replacing your debug patch)
[...]

The hang does not occur with that patch. However, our server is getting broken pipe errors consistently.

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

However, our server is getting broken pipe errors consistently.

Any more details? Do you mean your backend FastCGI authorizer?

It seems that my hunch was correct that your backend FastCGI authorizer is sending more than "Status: 200\r\n\r\n" to authenticated responses. For "authorizer" mode, you should not send back any response body.

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

https://fast-cgi.github.io/spec#63-authorizer

When the application gives a 200 response, the server ignores response headers whose names aren’t prefixed with Variable- prefix, and ignores any response content.

The one-line patch I provided "ignores any response content" by not bothering to reading that content from the backend. The previous behavior would read and discard the content, but was not expecting a response body when the "authorizer" succeeded, and so if the response body was not fully received at the same time as the response headers, you might get the behavior that you originally reported.

Are you sending back so much content in the "authorizer" response that lighttpd is sometimes fast enough to close the connection to the backend before your backend finishes sending the response?

RE: FastCGI authorizer hang with server.stream-response-body - Added by jefftharris over 2 years ago

gstrauss wrote in RE: FastCGI authorizer hang with server.stream-response-body:

However, our server is getting broken pipe errors consistently.

Any more details? Do you mean your backend FastCGI authorizer?

Without the patch, here's a trace of the response from our server for the authorize:

[pid   472] sendmsg(31, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\6\0\1\0\34\0\0", iov_len=8}, {iov_base="Variable-REMOTE_USER: ai\r\n", iov_len=26}, {iov_base="\r\n", iov_len=2}], msg_iovlen=3, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 36
[pid   472] sendmsg(31, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\3\0\1\0\10\0\0", iov_len=8}, {iov_base="\0\0\0\0\0\0\0\0", iov_len=8}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 16

With the patch:

[pid   472] sendmsg(31, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\6\0\1\0\34\0\0", iov_len=8}, {iov_base="Variable-REMOTE_USER: ai\r\n", iov_len=26}, {iov_base="\r\n", iov_len=2}], msg_iovlen=3, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 36
[pid   472] sendmsg(31, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\3\0\1\0\10\0\0", iov_len=8}, {iov_base="\0\0\0\0\0\0\0\0", iov_len=8}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = -1 EPIPE (Broken pipe)

It seems the patch is closing the socket immediately after the FastCGI FCGI_STDOUT packet is received. The second FCGI_END_REQUEST response packet is then getting the error.

It seems that my hunch was correct that your backend FastCGI authorizer is sending more than "Status: 200\r\n\r\n" to authenticated responses. For "authorizer" mode, you should not send back any response body.

It is valid for a FastCGI authorizer to send back variables which are then passed to the second FastCGI handler for the actual response. There is no other response data being sent back in these success cases.

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

I see that the one-line patch is not as friendly as it could be. I'll look into a different solution over the next few days, as mod_fastcgi.c might be modified to handle the return of http_response_parse_headers() differently.

RE: FastCGI authorizer hang with server.stream-response-body - Added by jefftharris over 2 years ago

gstrauss wrote in RE: FastCGI authorizer hang with server.stream-response-body:

I see that the one-line patch is not as friendly as it could be. I'll look into a different solution over the next few days, as mod_fastcgi.c might be modified to handle the return of http_response_parse_headers() differently.

For now, I'll continue using the changes to disable the stream feature during the authorizer. I'll take care to check for the appropriate FDEVENT_STREAM_RESPONSE_ flags and only manipulate the ones for the configured 1 and 2.

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

FYI: another solution is for your backend to make a single call to sendmsg() with a combined iovec for msg_iov in the msghdr. Of course, that is not as direct as it sounds if you are using a higher level interface which ends up calling sendmsg().

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

The request must be kept in the handle-req state, and must not proceed to the write state, to be able to restart the request with HANDLER_COMEBACK.

Your patch to disable response streaming during the "authorizer" is a bit more descriptive of why, but the following extends an existing flag to keep the request in the handle-req state while waiting for the "authorizer" response to complete. This replaces previous proposed patches. (A similar change is needed in mod_ajp13.c)

--- a/src/mod_fastcgi.c
+++ b/src/mod_fastcgi.c
@@ -410,12 +410,15 @@ mod_fastcgi_transfer_cqlen (request_st * const r, chunkqueue * const src, const
 static handler_t fcgi_recv_parse(request_st * const r, struct http_response_opts_t *opts, buffer *b, size_t n) {
        handler_ctx *hctx = (handler_ctx *)opts->pdata;
        int fin = 0;
+       if (opts->authorizer == 2) r->resp_body_started = 1;

        if (0 == n) {
                if (-1 == hctx->request_id) return HANDLER_FINISHED; /*(flag request ended)*/
                if (!(fdevent_fdnode_interest(hctx->fdn) & FDEVENT_IN)
-                   && !(r->conf.stream_response_body & FDEVENT_STREAM_RESPONSE_POLLRDHUP))
+                   && !(r->conf.stream_response_body & FDEVENT_STREAM_RESPONSE_POLLRDHUP)) {
+                       if (opts->authorizer == 2) r->resp_body_started = 0;
                        return HANDLER_GO_ON;
+               }
                log_error(r->conf.errh, __FILE__, __LINE__,
                  "unexpected end-of-file (perhaps the fastcgi process died):" 
                  "pid: %d socket: %s",
@@ -467,6 +470,7 @@ static handler_t fcgi_recv_parse(request_st * const r, struct http_response_opts
                                         (r->http_status == 0 || r->http_status == 200)) {
                                        /* authorizer approved request; ignore the content here */
                                        hctx->send_content_body = 0;
+                                       opts->authorizer = 2;     /*(flag 200 OK here)*/
                                }
                        } else if (hctx->send_content_body) {
                                if (0 != mod_fastcgi_transfer_cqlen(r, hctx->rb, packet.len - packet.padding)) {
@@ -500,7 +504,9 @@ static handler_t fcgi_recv_parse(request_st * const r, struct http_response_opts
                }
        }

-       return 0 == fin ? HANDLER_GO_ON : HANDLER_FINISHED;
+       if (fin) return HANDLER_FINISHED;
+       if (opts->authorizer == 2) r->resp_body_started = 0;
+       return HANDLER_GO_ON;
 }

 static handler_t fcgi_check_extension(request_st * const r, void *p_d, int uri_path_handler) {

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

FYI: I edited the patch above (FYI in case you pulled it from email)

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

Alternative patch

--- a/src/gw_backend.c
+++ b/src/gw_backend.c
@@ -2289,6 +2289,9 @@ static handler_t gw_recv_response(gw_handler_ctx * const hctx, request_st * cons
                 physpath = r->physical.path.ptr;
             }

+            /*(restore streaming flags removed during authorizer processing)*/
+            r->conf.stream_response_body |= (hctx->opts.authorizer >> 1);
+
             gw_backend_close(hctx, r);
             handler_ctx_clear(hctx);

--- a/src/mod_fastcgi.c
+++ b/src/mod_fastcgi.c
@@ -445,6 +445,12 @@ static handler_t fcgi_recv_parse(request_st * const r, struct http_response_opts
                                         (r->http_status == 0 || r->http_status == 200)) {
                                        /* authorizer approved request; ignore the content here */
                                        hctx->send_content_body = 0;
+                                       opts->authorizer |= /*(save response streaming flags)*/
+                                         (r->conf.stream_response_body
+                                          & (FDEVENT_STREAM_RESPONSE
+                                            |FDEVENT_STREAM_RESPONSE_BUFMIN)) << 1;
+                                       r->conf.stream_response_body &=
+                                         ~(FDEVENT_STREAM_RESPONSE|FDEVENT_STREAM_RESPONSE_BUFMIN);
                                }
                        } else if (hctx->send_content_body) {
                                if (0 != mod_fastcgi_transfer_cqlen(r, hctx->rb, packet.len - packet.padding)) {

RE: FastCGI authorizer hang with server.stream-response-body - Added by jefftharris over 2 years ago

gstrauss wrote in RE: FastCGI authorizer hang with server.stream-response-body:

Your patch to disable response streaming during the "authorizer" is a bit more descriptive of why, but the following extends an existing flag to keep the request in the handle-req state while waiting for the "authorizer" response to complete. This replaces previous proposed patches. (A similar change is needed in mod_ajp13.c)
[...]

This patch seems to work. I tried my request 100 times successfully with no logging to give the best change of exposing the issue.

RE: FastCGI authorizer hang with server.stream-response-body - Added by gstrauss over 2 years ago

Thanks for the update. The most recent patch above (https://redmine.lighttpd.net/boards/2/topics/9969?r=10002#message-10002) is what I have committed. Is that the patch you tested?

RE: FastCGI authorizer hang with server.stream-response-body - Added by jefftharris over 2 years ago

gstrauss wrote in RE: FastCGI authorizer hang with server.stream-response-body:

Alternative patch
[...]

This patch seems to work as well with the same testing.

RE: FastCGI authorizer hang with server.stream-response-body - Added by jefftharris over 2 years ago

gstrauss wrote in RE: FastCGI authorizer hang with server.stream-response-body:

Thanks for the update. The most recent patch above (https://redmine.lighttpd.net/boards/2/topics/9969?r=10002#message-10002) is what I have committed. Is that the patch you tested?

They both seem to work independently.

    (1-23/23)