Project

General

Profile

Actions

Feature #3078

closed

POST request DATA part for non-existing URI closes HTTP/2 connection prematurely

Added by oldium about 3 years ago. Updated almost 3 years ago.

Status:
Fixed
Priority:
Normal
Category:
core
Target version:
ASK QUESTIONS IN Forums:
No

Description

lighttpd 1.4.59

When POST request on HTTP/2 connection to non-existent URI is made, the 404 is correctly returned, but the connection is closed with GOAWAY PROTOCOL_ERROR. This causes all other already sent requests to be reported as failed in Microsoft Edge browser (version 91.0.852.0).

This is 100% reproducible, I am using lighttpd server to serve LuCi management web on my OpenWrt router. I tried to debug the issue directly on OpenWrt (gdbserver on router + gdb on my PC) and the cause for the GOAWAY PROTOCOL_ERROR (H2_E_PROTOCOL_ERROR) is that the DATA part of the POST request cannot find the request_st. Going further to find the root cause I found that the HEADER is handled (guessed), the 404 response is sent to the wire (seen by WireShark), the request_st is freed (seen in gdb) and the DATA part of the POST request is handled afterwards without having a valid request_st (seen in gdb).

It looks like the DATA part of the POST request is not discarded before the request structure request_st is freed. It would be good to fully discard the DATA part before even sending the response - to be able to keep the number of active requests below the limit.


Files

clipboard-202104211451-gdbu6.png (105 KB) clipboard-202104211451-gdbu6.png DevTools panel oldium, 2021-04-21 12:51
test.zip (87.8 KB) test.zip Test case oldium, 2021-04-22 09:11
clipboard-202104221920-kvgoz.png (62.4 KB) clipboard-202104221920-kvgoz.png WireShark view of the issue oldium, 2021-04-22 17:20
Actions #1

Updated by gstrauss about 3 years ago

Thanks for digging into this.

It looks like the DATA part of the POST request is not discarded before the request structure request_st is freed. It would be good to fully discard the DATA part before even sending the response - to be able to keep the number of active requests below the limit.

Unfortunately, your conclusion is incorrect. lighttpd intentionally sends the response as soon as it knows the response. This is done for numerous reasons, including keeping the number of active requests low.

lighttpd sends a 404 for the non-existent URI and closes the HTTP/2 request stream ID for that stream, subsequently releasing the request_st that was used for that HTTP/2 request stream. If a client sent Expect: 100-continue with the request, then a quick response tells the client not to send the DATA. On the other hand, your suggestion to wait to receive all the data is often the worst possible behavior and user experience.

If the client has already sent (on the wire) additional HTTP/2 frames for that request stream ID -- i.e. before it receives the 404 and end stream flag -- then lighttpd will not find an associated active stream ID when lighttpd receives those DATA frames. It is lighttpd's prerogative whether or not to receive and discard DATA frames for invalid streams, or whether to close the connection. Perhaps lighttpd could handle this a little bit better without expending too much in the way of additional resources to do so, sending STREAM_CLOSED instead of PROTOCOL_ERROR. Keeping track of streams in the "closed" state is not something lighttpd currently expends resources to do.

Actions #2

Updated by gstrauss about 3 years ago

  • Status changed from New to Patch Pending
  • Target version changed from 1.4.x to 1.4.60

You can try this patch (as yet untested)

--- a/src/h2.c
+++ b/src/h2.c
@@ -822,13 +822,24 @@ h2_recv_data (connection * const con, const uint8_t * const s, const uint32_t le
     if (NULL == r) {
         /* XXX: TODO: might need to keep a list of recently retired streams
          * for a few seconds so that if we send RST_STREAM, then we ignore
          * further DATA and do not send connection error, though recv windows
          * still must be updated. */
-        if (h2c->h2_cid < id || (!h2c->sent_goaway && 0 != alen))
-            h2_send_goaway_e(con, H2_E_PROTOCOL_ERROR);
+        /* simplistic heuristic to discard additional DATA from recently-closed
+         * streams, where recently-closed here is the last 8 sequential client
+         * stream ids (h2c->r[8], though in reality not necessarily sequential)
+         * (e.g. clients might fire off POST request followed by DATA,
+         *  and a response might be sent before processing DATA frames)
+         * (id <= h2c->h2_cid) already checked above, else H2_E_PROTOCOL_ERROR*/
         chunkqueue_mark_written(cq, 9+len);
+        if (id + 16 > h2c->h2_cid && (id & 1)) {
+            h2_send_window_update(con, 0, len); /*(h2r->h2_rwin)*/
+            h2_send_rst_stream_id(id, con, H2_E_STREAM_CLOSED);
+            return 1;
+        }
+        else if (!h2c->sent_goaway && 0 != alen)
+            h2_send_goaway_e(con, H2_E_PROTOCOL_ERROR);
         return 0;
     }

     if (r->h2state == H2_STATE_CLOSED
         || r->h2state == H2_STATE_HALF_CLOSED_REMOTE) {

Actions #3

Updated by oldium about 3 years ago

Hi, thanks, this works. Nevertheless, I see this as a quick-fix (workaround), because requiring heuristics to actually discard the rest of request should not be necessary and is potentially buggy (i.e. answering the question of “how many stream ids need to be handled as unfinished POST DATA requests to be safe” should not be necessary at all).

Actions #4

Updated by gstrauss about 3 years ago

Sorry to disappoint, but you're bound to be disappointed, especially given the poor suggestion you made in your original post.

Please see RFC 7540 HTTP/2

Section 6.1

If a DATA frame is received whose stream is not in "open" or "half-closed (local)" state, the recipient MUST respond with a stream error (Section 5.4.2) of type STREAM_CLOSED.

Section 5.1

An endpoint can receive any type of frame in this state [Ed: half-closed (local)]. Providing flow-control credit using WINDOW_UPDATE frames is necessary to continue receiving flow-controlled frames. In this state, a receiver can ignore WINDOW_UPDATE frames, which might arrive for a short period after a frame bearing the END_STREAM flag is sent.

...

In the absence of more specific guidance elsewhere in this document, implementations SHOULD treat the receipt of a frame that is not expressly permitted in the description of a state as a connection error (Section 5.4.1) of type PROTOCOL_ERROR.

Section 5.4.1

An endpoint can end a connection at any time. In particular, an endpoint MAY choose to treat a stream error as a connection error.

.

You wrote:

Nevertheless, I see this as a quick-fix (workaround), because requiring heuristics to actually discard the rest of request should not be necessary and is potentially buggy (i.e. answering the question of “how many stream ids need to be handled as unfinished POST DATA requests to be safe” should not be necessary at all).

Please support your argument with sections from the RFC or take your 2-cents to a wishing well. The incorrect or inaccurate assumptions that you have made -- every single time you have expressed an opinion or conclusion here -- or your opinions that you presume apply to everybody (they do not) -- suggest that you have not taken the time to understand the HTTP/2 specification, specifically the line from Section 5.4.1 I have quoted above. No specification requires an HTTP server to read the entirety of a POST request before sending a response, as that can lead to unwanted resource usage, bandwidth consumption, denial-of-service, and more. One example: on low-powered embedded systems, wire speed can be faster than TLS encryption/decryption.

Additionally, any robust application should have recovery mechanisms to handle network interruptions/outages/packet loss, or other potential failure scenarios, many which may occur outside the purview of lighttpd. Your statement "(i.e. answering the question of “how many stream ids need to be handled as unfinished POST DATA requests to be safe” should not be necessary at all)" suggests that you did not consider the more general best practices of writing robust applications in the face of many other possible failure scenarios.

Actions #5

Updated by oldium about 3 years ago

Thanks for pointers to RFC, but you are only partially correct :-)

I was wrong when I said that it would be good to wait for full request before sending the response, I had in mind maximum number of active streams - not to give client chance to start new request (new stream) when the response is fully delivered. But from the protocol point of view the stream is active until both ends send END_STREAM, so sending a response before the request is fully delivered is completely fine (and allowed, see below).

RFC 7540 (HTTP/2) in section 8.1 states:

An HTTP request/response exchange fully consumes a single stream.  A
request starts with the HEADERS frame that puts the stream into an
"open" state. The request ends with a frame bearing END_STREAM,
which causes the stream to become "half-closed (local)" for the
client and "half-closed (remote)" for the server. A response starts
with a HEADERS frame and ends with a frame bearing END_STREAM, which
places the stream in the "closed" state.
An HTTP response is complete after the server sends -- or the client
receives -- a frame with the END_STREAM flag set (including any
CONTINUATION frames needed to complete a header block). A server can
send a complete response prior to the client sending an entire
request if the response does not depend on any portion of the request
that has not been sent and received. When this is true, a server MAY
request that the client abort transmission of a request without error
by sending a RST_STREAM with an error code of NO_ERROR after sending
a complete response (i.e., a frame with the END_STREAM flag).
Clients MUST NOT discard responses as a result of receiving such a
RST_STREAM, though clients can always discard responses at their
discretion for other reasons.

Sending the response from the lighttpd point of view means sending END_STREAM, effectively putting the connection to the “half-closed (local)” state. In this situation the server still has an active stream (which counts to the maximum limit of active streams) and needs to accept incoming data until END_STREAM from client arrives - i.e. until the stream is closed. After the stream is closed from the client side, it should be safe to free the stream-representing structure, but not before.

Section 5.1 talks about state transitions more in more detail: https://tools.ietf.org/html/rfc7540#section-5.1

Actions #6

Updated by oldium about 3 years ago

Similar statements about sending response before the request is fully received is also in HTTP RFC 7230 in section 6.5, so sorry for my misleading suggestion in this area. Still the stream-representing structure can be freed as soon as the stream enters the “closed” state, not before.

Actions #7

Updated by gstrauss about 3 years ago

and needs to accept incoming data until END_STREAM from client arrives - i.e. until the stream is closed.

citation needed. You continue to make assumptions based on how you want things to work.

The server does not "need" to do anything you have assumed. Please re-read what I wrote, specifically the RFC quotes, which already refute your specious statement in multiple different ways. I also quoted from RFC 7540 Section 5.1 above. Pointing me to it does not bolster anything you have written. If you think I missed something, be more precise.

Older RFCs for HTTP/1.1 suggest that a server might try to continue to read and discard the request body after the server sends a response in order to try to avoid having a naive client -- which blindly sends entire body before reading response -- potentially getting TCP RST and losing the response. The RFCs suggest that there also might be a limit on how much the server reads before giving up. The server is not required to waste resources.

lighttpd has chosen not to expend resources tracking half-closed connections just for the sake of tracking half-closed connections. With HTTP/2, there is an expectation that both the client and the server are capable of reading and writing frames in an interleaved fashion. If the client does not have this capability, the client should probably prefer the serial request-response nature of HTTP/1.1. Given the HTTP/2 expectation to support bidirectional communication, and the best practice of avoiding buffer bloat, lighttpd could expect that the client received the END_STREAM after the time it takes for one TCP round-trip. lighttpd does not bother to attempt to figure this specific timing out, but this will be relevant shortly where a round-trip must have occurred. Instead, my proposed patch (which might change further) checks for "recent" stream id, and lighttpd currently tracks only 8 active HTTP/2 streams per connection. If the POST HEADERS + DATA frames are the most recent stream, my proposed patch handles it. If the DATA frames were not immediately sent, and other requests were sent after it, and if those other requests (>= 7) completed, and new requests were sent, then multiple full round-trips between lighttpd and the client have occurred, and the client should have received the STREAM_END from lighttpd for that POST request before DATA was sent. This is expected to generally hold true, hence the heuristic. It might not be true if lighttpd was able to respond to the newer requests more quickly than it determined that it was going to send an error response for the POST request before reading all the DATA. It also might not be true if the client pipelined too many streams to lighttpd, exceeding the SETTINGS_MAX_CONCURRENT_STREAMS. The proposed patch is nearly free, adding a few conditions on the error path, and does not require any additional resources for lighttpd to track. I might increase the range of acceptable stream ids considered "recent", or I might eliminate that check, but I am leaning towards having some limit to the range. I may or may not keep the sending of STREAM_CLOSED.

Actions #8

Updated by gstrauss about 3 years ago

I posted before above before seeing your latest response (which precedes it)

Still the stream-representing structure can be freed as soon as the stream enters the “closed” state, not before.

Says who? lighttpd, a portmanteau of "light" and "httpd", has chosen to be "light" and frees the structure when it chooses to do so.

Actions #9

Updated by oldium about 3 years ago

gstrauss wrote in #note-4:

Sorry to disappoint, but you're bound to be disappointed, especially given the poor suggestion you made in your original post.

I am not disappointed, I just found a bug, I tried to analyse it with WireShark and command-line gdb directly on the router, so I am just trying to help your project - lighttpd. I spent several hours trying to find out what exactly is wrong to be able to fix it, but it seemed that proper solution would need to take more time or design decisions, so I just reported it.

Additionally, any robust application should have recovery mechanisms to handle network interruptions/outages/packet loss, or other potential failure scenarios, many which may occur outside the purview of lighttpd. Your statement "(i.e. answering the question of “how many stream ids need to be handled as unfinished POST DATA requests to be safe” should not be necessary at all)" suggests that you did not consider the more general best practices of writing robust applications in the face of many other possible failure scenarios.

I considered best practices, this is why I suggested that deleting the stream-representing structure before the request (stream) is fully handled (i.e. closed) is wrong. This is also a reason why I said that the patch is more a quick-fix or work-around, because the implementation does not reflect the reality - the implemented state (i.e. all structures freed) does not reflect the protocol state (i.e. half-closed and expecting data).

Actions #10

Updated by oldium about 3 years ago

gstrauss wrote in #note-8:

I posted before above before seeing your latest response (which precedes it)

Still the stream-representing structure can be freed as soon as the stream enters the “closed” state, not before.

Says who? lighttpd, a portmanteau of "light" and "httpd", has chosen to be "light" and frees the structure when it chooses to do so.

From the protocol point of view you are in the half-closed state. If you think it is safe to free all stream resources for half-closed streams and then handle expected-unexpected data afterwards, I am fine with this. From my experience I can just tell that such assumptions can break at any time (yes, I had to implement several heuristics when the information was not available, but you are knowingly deleting the information about the stream state).

Actions #11

Updated by gstrauss about 3 years ago

From my experience I can just tell that this assumption can break at any time.

So can the network TCP connection. Also, the word "assumption" does not mean the same thing as "heuristic"

I posted in #note-7 why the proposed patch is likely sufficiently lenient for common use cases.

I have not said that the patch is 100% perfect, but I think it improves the current lighttpd behavior. The current lighttpd behavior is permitted by the specification:
Section 5.4.1

An endpoint can end a connection at any time.

Actions #12

Updated by gstrauss about 3 years ago

I appreciate that you have spent time looking into this, which is why in my initial two posts, I put together a (low-resource) patch to improve the behavior. Even though the current behavior is not a clear violation of the spec, the behavior can be improved.

What I have been challenging since have been some of your opinions couched as statements.

Actions #13

Updated by oldium about 3 years ago

gstrauss wrote in #note-11:

From my experience I can just tell that this assumption can break at any time.

So can the network TCP connection. Also, the word "assumption" does not mean the same thing as "heuristic"

I posted in #note-7 why the proposed patch is likely sufficiently lenient for common use cases.

I have not said that the patch is 100% perfect, but I think it improves the current lighttpd behavior. The current lighttpd behavior is permitted by the specification:
Section 5.4.1

An endpoint can end a connection at any time.

gstrauss wrote in #note-12:

I appreciate that you have spent time looking into this, which is why in my initial two posts, I put together a (low-resource) patch to improve the behavior. Even though the current behavior is not a clear violation of the spec, the behavior can be improved.

What I have been challenging since have been some of your opinions couched as statements.

I understand that limited-resources environment means that there are some assumptions/simplifications. I am just not convinced about the fixed number 16, which implements the half-closed stream handling. For me the most safe solution would be to keep track of streams in half-closed (local) state (limited number of stream ids - just keep the id) or simply drop packet + send H2_E_STREAM_CLOSED for all already seen streams for which the state does not exist (received stream id <= highest seen id), but that would be against RFC 7540.

The current patch postpones the issue by 8 another requests. If after 8 requests the DATA packet from the old half-closed stream arrives (delayed for any reason), the lighttpd server would close the whole connection. The issue is still there, it is just harder to reproduce it.

Actions #14

Updated by oldium about 3 years ago

gstrauss wrote in #note-12:

I appreciate that you have spent time looking into this, which is why in my initial two posts, I put together a (low-resource) patch to improve the behavior. Even though the current behavior is not a clear violation of the spec, the behavior can be improved.

[Edited]

Finally I had enough time to read carefully the whole conversation. Just list of facts, no opinions:

  1. When the request from client comes, the stream gets in “open” state.
  2. When lighttpd sends a response, there is an END_STREAM flag. From the specification point of view, this puts the server into “half-closed (local)” and client into “half-closed (remote)” states. See section 5.1, open state, https://httpwg.org/specs/rfc7540.html#rfc.section.5.1: An endpoint sending an END_STREAM flag causes the stream state to become "half-closed (local)"; an endpoint receiving an END_STREAM flag causes the stream state to become "half-closed (remote)"."
  3. DATA frame can be received by server in this “half-closed (local)” state. Specification permits this in section 5.1: “An endpoint can receive any type of frame in this state.”. Section 6.1 limits reception of DATA frames to “open” and “half-closed (local)” states: “If a DATA frame is received whose stream is not in "open" or "half-closed (local)" state, the recipient MUST respond with a stream error (Section 5.4.2) of type STREAM_CLOSED.”
  4. DATA frame can be sent by client in the “half-closed (remote)” state. Specification explicitly permits this in section 6.1: “DATA frames are subject to flow control and can only be sent when a stream is in the "open" or "half-closed (remote)" state.”

Maybe you mis-read the cited section 6.1: “If a DATA frame is received whose stream is not in "open" or "half-closed (local)" state, the recipient MUST respond with a stream error (Section 5.4.2) of type STREAM_CLOSED.” But we are in “half-closed (local)” state, so this statement is not relevant and we do not have a stream error. Hereby further ideas from the #note-4 based on the stream error are not relevant.

Actions #15

Updated by gstrauss about 3 years ago

Your narrative appears to be stuck considering only the ideal protocol flow. While you are correct that the state on the server is half-closed (local) the instant that the server sends END_STREAM, you seem to have considered the subsequent state transition only in one case, where client sends END_STREAM. This continues to be where you are falling short. Your prior comment has numerous incorrect statements, no doubt stemming from similarly faulty logic, and it would be a waste of time for me to point out each deficiency.

You continue to fail to consider that the server can transition the stream to the "closed" state or may close the connection, due to any number of other reasons, in addition to the one scenario where the server receives END_STREAM from client. I have repeated this multiple times here and in comments above, trying to use slightly different wording each time.

Section 5.1

half-closed (local):
...
A stream transitions from this state to "closed" when a frame that contains an END_STREAM flag is received or when either peer sends a RST_STREAM frame.

Either peer can choose to send RST_STREAM at any time, for any reason. Let me emphasize "at any time, for any reason".
lighttpd could send RST_STREAM immediately after sending the 404 response with END_STREAM. The stream state (on the server) would then be "closed".
lighttpd could send RST_STREAM for any reason, at any time. The stream state (on the server) would then be "closed".
lighttpd could send GOAWAY for any reason, at any time, instead of RST_STREAM. Take the trivial example of lighttpd being sent a signal to shut down. lighttpd could choose to send GOAWAY on all open connections. Depending on the signal, lighttpd might exit immediately without sending anything, or lighttpd might send GOAWAY and then exit, or lighttpd might send GOAWAY and then try to gracefully close existing streams for a period of time, or ...

===> Explain why your specific concern is unique and independent from a client gracefully handling network disconnects and other types of failures or scenarios, including trivial server shutdown.

Section 5.

Both endpoints have a subjective view of the state of a stream that could be different when frames are in transit. Endpoints do not coordinate the creation of streams; they are created unilaterally by either endpoint. The negative consequences of a mismatch in states are limited to the "closed" state after sending RST_STREAM, where frames might be received for some time after closing.

The spec mentions in numerous places that there might be a state mismatch between peers while frames are in transit. Your ideal protocol flow is only one of the possible scenarios. Importantly, it is not required to be the only scenario. All your statements about what "must" happen have been incorrect, or applicable to a more narrow scope, not "must" in all situations in all scenarios.

I think I have exhausted how many different ways I can say the same thing: your narrow interpretation of the spec is too narrow.

===> Explain why your specific concern is unique and independent from a client gracefully handling network disconnects and other types of failures or scenarios, including trivial server shutdown.

You have yet to provide a reference to the specification which indicates lighttpd is doing something against the requirements of the specification, which is different from recommendations of the specification. Server implementations are permitted to make tradeoffs, such as for limiting resource usage or for mitigating resource exhaustion.

Actions #16

Updated by oldium about 3 years ago

Ok, I tried to point out from the specification, that the DATA in the actual state (which is “half-closed (local)” on the server and “open” or ”half-closed (remote)” on the client depending on the reception of END_STREAM from the response) can be sent by the client and is not a protocol error. Server has two options, it can either accept all data or send RST_STREAM and get all consequences of out-of-sync state issues, which are also described in the specification 6.4 “However, after sending the RST_STREAM, the sending endpoint MUST be prepared to receive and process additional frames sent on the stream that might have been sent by the peer prior to the arrival of the RST_STREAM.”.

You succeeded in showing lot of independent error cases around, but I cannot find that 1) the state is not what I describe and that 2) the DATA frames are not allowed in this state. I am describing one regular use case, it is not a protocol error.

The lighttpd server does not even use the “half-closed (local)” state. This bug report is a proof that it does not implement the protocol correctly, especially the part of the “half-closed (local)” state.

Anyway, to put the end of this discussion, I am fine with the hot-fix, it fixes my current issue.

Actions #17

Updated by gstrauss about 3 years ago

  • Tracker changed from Bug to Feature

You succeeded in showing lot of independent error cases around, but I cannot find that 1) the state is not what I describe and that 2) the DATA frames are not allowed in this state. I am describing one regular use case, it is not a protocol error.

DATA frames are allowed from the client when the client believes the state is "open", or even "half-closed (remote)".

Independently, GOAWAY is allowed from the server. The server is not required to accept the DATA frames, and can legitimately drop the connection with a TCP RST if the server deems that necessary. That would not be the most polite behavior, but is allowed. I refer you back to prior HTTP specifications which also do not require the server to receive unnecessary data after a response has been sent. (See your #note-6) Suggestions are made, but the requirement for the server to accept data after a response is not a requirement.

I previously highlighted:

===> Explain why your specific concern is unique and independent from a client gracefully handling network disconnects and other types of failures or scenarios, including trivial server shutdown.

That you completely ignored it -- as well as my prior attempts to get you to expand your lens -- demonstrates that you continue to fail to see the forest from the trees.

I'll close with this: The patch I proposed may or may not ultimately be included in lighttpd. There are reasons why the current lighttpd code was written the way it was written -- it would have been just as easy to simply discard all DATA frames received for closed streams.

Actions #18

Updated by oldium about 3 years ago

gstrauss wrote in #note-17:

You succeeded in showing lot of independent error cases around, but I cannot find that 1) the state is not what I describe and that 2) the DATA frames are not allowed in this state. I am describing one regular use case, it is not a protocol error.

DATA frames are allowed from the client when the client believes the state is "open", or even "half-closed (remote)".

Independently, GOAWAY is allowed from the server. The server is not required to accept the DATA frames, and can legitimately drop the connection with a TCP RST if the server deems that necessary. That would not be the most polite behavior, but is allowed. I refer you back to prior HTTP specifications which also do not require the server to receive unnecessary data after a response has been sent. (See your #note-6) Suggestions are made, but the requirement for the server to accept data after a response is not a requirement.

Yes, you can send GOAWAY from server, but unlike in HTTP/1.1 you will be closing all active requests, as can be seen in the browser.

I previously highlighted:

===> Explain why your specific concern is unique and independent from a client gracefully handling network disconnects and other types of failures or scenarios, including trivial server shutdown.

That you completely ignored it -- as well as my prior attempts to get you to expand your lens -- demonstrates that you continue to fail to see the forest from the trees.

I'll close with this: The patch I proposed may or may not ultimately be included in lighttpd. There are reasons why the current lighttpd code was written the way it was written -- it would have been just as easy (and fewer lines of code) to simply discard all DATA frames received for closed streams.

This worked in past and does not work with the switch to HTTP/2, so this looks like a regression to me. Browser is doing everything right, it sends the HEADERS frame and immediately after it the DATA frame. This is a completely valid request. The lighttpd server then handles the HEADERS frame, sends a response and interprets the DATA frame as an error. Browser cannot do anything about that, the DATA frame is already in the server's TCP/IP buffer. The server is wrongly requesting the connection to be closed by sending a GOAWAY frame with stream id 0 (informing that nothing has been processed) together with TLS Alert/Close Notify to close the connection. I also tried to explain that seeing DATA in this state is completely valid, but I failed, obviously.

Anyway, I think that everything necessary has been said. I will keep the web page https://openwrt.org/docs/guide-user/luci/luci.on.lighttpd updated with the status of this bug. Have a nice day :-)

Actions #19

Updated by gstrauss about 3 years ago

The server is wrongly requesting the connection to be closed

No, not wrongly. The server can do this. The spec does not prohibit it. I don't care whether you like it or not; I do not believe that it is a violation of the spec. The lighttpd behavior could be friendlier and I am looking at modifying the behavior, but it is not a violation of the HTTP/2 spec. Again, see Section 5.4.1 An endpoint can end a connection at any time.

The server is wrongly requesting the connection to be closed by sending a GOAWAY frame with stream id 0 (informing that nothing has been processed)

GOAWAY is sent on stream id 0. stream id 0 is reserved for connection-level frames. Part of the GOAWAY frame includes the last-stream-id (for the last stream id successfully processed (see spec for details)), so your statement "(informing that nothing has been processed)" is misinformed. (You seem to make such statements with your 2-cents in almost every one of your posts, which reduces your credibility considerably.)

If lighttpd is sending GOAWAY and the stream id of the POST request (or later) is not set to be the last-stream-id, then that would be a bug, but I don't think that is happening. Is that what you are seeing?

.

I do not immediately see a problem loading LuCI on an openwrt-derivative which runs lighttpd 1.4.59, but I also do not believe I am POSTing to an invalid URI. (I did not verify other than logging in successfully.)

Why is LuCI sending a POST to an invalid URI in your configuration?

It seems that the reason you are bumping into this is that the POST to an invalid URI results in lighttpd closing the HTTP/2 connection, including any not-yet-handled GET requests on different streams, and the browser does not retry those GET requests. (I am guessing, since I have not reproduced this, and I don't use Windows or Edge if I can avoid them). If those GET requests are for static resources, lighttpd likely sent responses to those GETs before the response to the POST request. If those GET requests followed the POST DATA frames, then last-stream-id in the GOAWAY indicated that the POST request was the last stream handled, and a well-behaved HTTP/2 client would create a new connection to re-issue those unhandled streams.

Browser is doing everything right,

It does not sound like that to me. It does not sound like the browser is handling the GOAWAY and last-stream-id well.

I also tried to explain that seeing DATA in this state is completely valid, but I failed, obviously.

You misunderstand. The client request of HEADERS and DATA frames is completely valid. The server's behavior is also valid, though not currently a friendly behavior if the entire request body is not consumed before sending a response. lighttpd behavior could be better here, as I have said multiple times. Separately, the client could also behave better, resending the idempotent GET requests that were implicitly cancelled by the GOAWAY, and resending requests for pipelined streams which were after the POST and not handled. The client behavior on a POST request could also be better, especially to what might be an invalid URI.

Why is LuCI sending a POST to an invalid URI in your configuration?

Actions #20

Updated by oldium about 3 years ago

Please excuse me, I will not argue anymore on what the server does, the browser is sending a completely valid request and lighttpd closes the connection. Regression, bug report created, story ends here for me. But I am happy to answer any questions related to LuCI.

gstrauss wrote in #note-19:

Why is LuCI sending a POST to an invalid URI in your configuration?

This happens for the first request. The LuCI tries to probe for the root URI to send requests for UBUS: https://github.com/openwrt/luci/blob/01a0ec3e6874bff1e17d9d2085d86d1ad03702f7/modules/luci-base/htdocs/luci-static/resources/luci.js#L2552-L2576. It first tries /ubus/ (taken from the configuration) and if that fails (which happens), it uses relative fallback URI admin/ubus. The URI is then saved to the local storage, so when you reload the page, the URI is taken from the local storage and you do not see the error again. The value in local storage is not stored forever, it happens from time to time that the storage is cleared by the browser, so you face the connection-reset issue again and again.

I see the issue for a a long time. Lighttpd is not standard in OpenWrt, but is easily configurable and beacuse I am using it, I updated OpenWrt wiki to be up-to-date to be able to use LuCI. It most probably started with version 1.4.56, which had HTTP/2 explicitly enabled in the configuration by OpenWrt team https://github.com/openwrt/packages/blame/master/net/lighttpd/files/lighttpd.conf#L16-L17.

This is visible from the Browser:

DevTools panel

Please note that the response headers for files ui.js and form.js were already received, so the server cancelled already partly responded requests. Browser did nothing wrong.

Actions #21

Updated by gstrauss about 3 years ago

Does similar behavior happen with Chrome? with Firefox? (Current versions of Edge are based in part on Chromium, so I expect similar behavior with Chrome)

The browser is probably erring on the side of caution and assuming that any GET request with a query string might not be idempotent, and therefore should not be retried.

As a workaround, LuCI client-side javascript could probe the ubus URI with a different method, or by using POST without a body.

Actions #22

Updated by oldium about 3 years ago

gstrauss wrote in #note-21:

Does similar behavior happen with Chrome? with Firefox? (Current versions of Edge are based in part on Chromium, so I expect similar behavior with Chrome)

The browser is probably erring on the side of caution and assuming that any GET request with a query string might not be idempotent, and therefore should not be retried.

[Edited]

  • Firefox 88.0: Aborted requests that are without reply (HEADERS reply not seen) are repeated. Requests with HEADERS reply are not shown as failed, but with status from header 200 OK (but with incomplete file content).
  • MS Edge 91.0.852.0: Aborted requests that are without reply (HEADERS reply not seen) are NOT repeated, but shown as failed. Requests with HEADERS reply are not repeated, but shown as failed.
  • MS Edge 89.0.774.77: Aborted requests that are without reply (HEADERS reply not seen) are NOT repeated, but shown as failed. Requests with HEADERS reply are not repeated, but shown as failed.
  • MS Edge 90.0.818.42: Aborted requests that are without reply (HEADERS reply not seen) are NOT repeated, but shown as failed. Requests with HEADERS reply are not repeated, but shown as failed.
  • Chrome 90.0.4430.85: Aborted requests that are without reply (HEADERS reply not seen) are repeated. Requests with HEADERS reply are not repeated, but shown as failed.

As a workaround, LuCI client-side javascript could probe the ubus URI with a different method, or by using POST without a body.

Yes, that might work. I think that the best proof that service is listening on some URI is to try a real request, so I see the current code as a good approach. Currently the simplest “workaround” is to disable HTTP/2 in the configuration.

Actions #23

Updated by gstrauss about 3 years ago

Your last two posts have been the most useful, demonstrating that client applications should be treated as if not having been written to robustly handle any failures, and that the browser is unable to recover various failure scenarios for the client application.

That said, I have not been able to reproduce the issue you are seeing when I attempted to write a simple HTML page to reproduce the issue (using GET requests with query strings and using javascript for the POST) I might need to simulate a delay in TCP packets between HEADERS and DATA.

Still, a lightly tested patch -- which takes an alternative approach to the original proposed patch above -- is at the tip of https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/personal/gstrauss/master

Actions #24

Updated by oldium about 3 years ago

gstrauss wrote in #note-23:

That said, I have not been able to reproduce the issue you are seeing when I attempted to write a simple HTML page to reproduce the issue (using GET requests with query strings and using javascript for the POST) I might need to simulate a delay in TCP packets between HEADERS and DATA.

I attached a simple test case.

Still, a lightly tested patch -- which takes an alternative approach to the original proposed patch above -- is at the tip of https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/personal/gstrauss/master

I will have a look at it later today.

Actions #25

Updated by gstrauss about 3 years ago

Thanks. Using your test.zip, I am still not able to reproduce this on a local server (on localhost), even accessed through the DHCP IP rather than 127.0.0.1. I am testing using Firefox on Fedora 33 Linux, connecting over TLS and verifying that HTTP/2 is used. I am using the Firefox feature to disable cache and simulate connecting over 3G. lighttpd 1.4.59 already consumes the DATA frames following HEADER frame if they are received in the same block read() from the socket. I may have to force sending POST HEADER and DATA frames in separate TCP packets, and with a delay between the packets, in order to trigger the issue with lighttpd 1.4.59. Or maybe I'll have to try on Windows. Do you by any chance have any third-party antivirus or corporate malware running on your machine which might be intercepting the traffic (acting as a transparent proxy)?

Actions #26

Updated by oldium about 3 years ago

This is how the HTTP/2 part looks like in WireShark. Each line sent from 10.84.1.20 (left column) is a separate TCP/IP packet, packets from 10.84.1.4 are grouped. I will try the patch from your repository. Anyway, it looks strange that packets from my PC are sent separately, maybe some TCP/IP config change.

WireShark view of the issue

Actions #27

Updated by oldium about 3 years ago

I tried the latest patch plus another patch "[multiple] prefer monotonic time for internal use" on top of latest release 1.4.59 and it works.

Actions #28

Updated by gstrauss about 3 years ago

Glad to hear it. I think that the patch in my dev branch is friendlier behavior to better handle poor client behavior, though there is a tradeoff in potential resource usage.

FYI: I still haven't reproduced the issue, even adding query strings to the fetches in your test.html, and increasing the size of the POST payload to > 256kb in an attempt to use multiple TCP packets and fill socket buffers.

Your client environment seems to be the bigger issue. Have you heard of anybody else having similar problems? You're the first to report this behavior since lighttpd 1.4.56 was released at the end of Nov 2020.

Actions #29

Updated by gstrauss about 3 years ago

I was able to reproduce the issue and to (lightly) verify the patch with using https://github.com/fstab/h2c

Actions #30

Updated by gstrauss about 3 years ago

  • Status changed from Patch Pending to Fixed
Actions #31

Updated by oldium about 3 years ago

gstrauss wrote in #note-28:

Glad to hear it. I think that the patch in my dev branch is friendlier behavior to better handle poor client behavior, though there is a tradeoff in potential resource usage.

Yes, looks better and is usable also for RST_STREAM.

Your client environment seems to be the bigger issue. Have you heard of anybody else having similar problems? You're the first to report this behavior since lighttpd 1.4.56 was released at the end of Nov 2020.

I will try it from different computer and from Linux box. I am not in touch with others using lighttpd, so I cannot say if there is anybody else influenced by the issue.

Actions #32

Updated by gstrauss about 3 years ago

@oldium
Please update https://openwrt.org/docs/guide-user/luci/luci.on.lighttpd to mention that some clients or intermediates might be inefficient, and that the initial load of LuCI might need to be refreshed in the browser before it will work.

Unless there is more evidence that this problem is widespread, I think that refreshing the page is sufficient rather than adding more words to read and more complexity and suggesting that HTTP/2 be disabled in lighttpd, all of which is undesirable.

Thanks.

Actions #33

Updated by oldium about 3 years ago

This is reproducible (real LuCI page loading test and also test.html) from all of my computers (different Windows 10 versions starting from 1909 to latest upcoming 21H1) - 3 DELL notebooks (different models from 10 years old to 3 years old) and 1 ThinkPad. Also on my mobile I am not able to login for the first try (i.e. the same issue). I tried it both from WiFi and from cable. So the problem is really tied to OpenWrt and lighttpd running on it. I do not know how many people are running lighttpd on OpenWrt to serve LuCI, this is really not a default environment.

Actions #34

Updated by oldium about 3 years ago

The F5 refresh of LuCI page is also necessary after login, afterwards it works fine. I will update the Wiki page to mention this.

Is it safe to use the latest commit (https://github.com/lighttpd/lighttpd1.4/commit/81d18a8e359685c169cfd30e6a1574b98aedbaeb) together with “[multiple] prefer monotonic time for internal use” (https://github.com/lighttpd/lighttpd1.4/commit/dbe3e2361b9fac89bfec6a0180975685446edc9b and https://github.com/lighttpd/lighttpd1.4/commit/c035eb77336bc71ad4951b4f853233bf49553ac7)? Maybe the best would be if I simply prepare Pull Request to add those three patches for lighttpd version 1.4.59 in OpenWrt.

Actions #35

Updated by gstrauss about 3 years ago

While the monotonic patches might be useful for embedded devices, they can be made independent from this patch -- you can safely replace log_monotonic_secs with log_epoch_secs in this patch instead of requiring those additional (and sizable) patches.

I am planning on releasing lighttpd 1.4.60 later in Q2, so I do not think it urgent to submit this patch to OpenWRT. When I do release lighttpd 1.4.60, I will submit the hash updates to OpenWRT.

Actions #36

Updated by oldium about 3 years ago

I checked OpenWrt packages, latest stable is lighttpd 1.4.54, so it might be that the issue has not been observed yet. And for the other use cases I think that accessing invalid URI by a POST request is not that usual, so this might explain why nobody else found this before. I will prepare and send a Pull Request to OpenWrt Packages github with log_epoch_secs as you suggested for version 1.4.59, so that it could potentially help somebody else (and help me by not maintaining a separate patch). There is also one more change I would like to do - remove http/2 feature flags from lighttpd.conf (server.h2proto and server.h2c), because those are enabled by default now.

I see that you are author of the OpenWrt package - should I do it (through a PR) or do you want to do it?

Actions #37

Updated by gstrauss about 3 years ago

There is also one more change I would like to do - remove http/2 feature flags from lighttpd.conf (server.h2proto and server.h2c), because those are enabled by default now.

My thought is that the less config changes there are, the better, as there are already enough people who do not keep their systems updated, and even among those who do update, there are many who do not merge configs from newer *-opkg configs.

It looks like OpenWrt 21.02.0-rc1 was tagged in git a few days ago. I am not sure if submitting a patch now will make it in, but I guess it is worth a try. There are a couple other patches to lighttpd 1.4.59 which should be included, too. https://salsa.debian.org/debian/lighttpd/-/tree/master/debian/patches

Please ping here if I don't submit patches before the end of the weekend.

Actions #38

Updated by oldium about 3 years ago

gstrauss wrote in #note-37:

There is also one more change I would like to do - remove http/2 feature flags from lighttpd.conf (server.h2proto and server.h2c), because those are enabled by default now.

My thought is that the less config changes there are, the better, as there are already enough people who do not keep their systems updated, and even among those who do update, there are many who do not merge configs from newer *-opkg configs.

This is why there is /etc/lighttpd/conf.d - to keep user configuration separate and safe from possible conflicts during updates. I tried to disable HTTP/2 in my /etc/lighttpd/conf.d/50-disable-http2.conf, but as long as HTTP/2 was enabled in /etc/lighttpd.conf, this was not possible (+= fails, -= does not exist and I have not found a description on how to override a single value of an array). So it would be good if this default value is removed from the config, because it looks like it cannot be overridden.

It looks like OpenWrt 21.02.0-rc1 was tagged in git a few days ago. I am not sure if submitting a patch now will make it in, but I guess it is worth a try. There are a couple other patches to lighttpd 1.4.59 which should be included, too. https://salsa.debian.org/debian/lighttpd/-/tree/master/debian/patches

Please ping here if I don't submit patches before the end of the weekend.

Perfect, thanks.

Actions #40

Updated by oldium about 3 years ago

gstrauss wrote in #note-39:

https://github.com/openwrt/packages/pull/15505

Great, thanks. I see it merged, so I will test it and update the Wiki page afterwards.

Actions #41

Updated by oldium about 3 years ago

Works, workaround is not necessary anymore. I updated the Wiki.

Actions #42

Updated by gstrauss about 3 years ago

https://openwrt.org/docs/guide-user/luci/luci.on.lighttpd

The problem was fixed in version 1.4.59-1, so updating is recommended.

The patches are for openwrt lighttpd 1.4.59-2

Actions #43

Updated by oldium about 3 years ago

gstrauss wrote in #note-42:

https://openwrt.org/docs/guide-user/luci/luci.on.lighttpd

The problem was fixed in version 1.4.59-1, so updating is recommended.

The patches are for openwrt lighttpd 1.4.59-2

Thanks, typo fixed.

Thanks for fixing the issue.

Actions #44

Updated by gstrauss almost 3 years ago

Some additional context. I set up a Windows client and ran some tests.

When connecting via TLS, I could see lighttpd processing one TLS record at a time. This is expected with the default ssl.read-ahead = "disable". The reason that is the default is that on slower embedded systems, TLS encryption and decryption on the embedded device can be slower than wire speed, leading to lighttpd having openssl just read and decrypt more and more data before proceeding to process all of it. On the other hand, if you are on a fast enough system (e.g. not an embedded system), using ssl.read-ahead = "enable" can speed lighttpd up a bit.

Now then, the client could send a single payload containing both HEADERS and DATA frames. If sent to the TLS library in the same buffer, and the buffer is small enough, then both frames could end up in the same TLS record. However, simpleton clients may send individual frames to the openssl library, which is likely to end up with each frame in a separate TLS record. If TCP nagle algorithm is disabled, then each TLS record may end up in a separate TCP packet.

So there are multiple scenarios for which lighttpd might receive DATA separately from HEADERS. In the default case, lighttpd collects the request body before passing it to the backend, or, if lighttpd is configured to stream the request body, the backend often receives the request body before producing a response. However, if there is an error and lighttpd produces a response before the DATA is received, then we have the scenario that you reported.

I maintain that a client which sends a non-idempotent request (such as POST) to a URL which the client javascript knows might not exist, and which does not delay sending DATA (e.g. with Expect: 100-continue), and which has no recovery mechanism, is lacking in robustness. The specific scenario reported for lua testing a URI ignores and does not recover from network failures, which might result in similar failures on the client. An idempotent request to test the existence of a URI would be a more robust approach.

Actions #45

Updated by oldium almost 3 years ago

gstrauss wrote in #note-44:

Some additional context. I set up a Windows client and ran some tests.

When connecting via TLS, I could see lighttpd processing one TLS record at a time. This is expected with the default ssl.read-ahead = "disable". The reason that is the default is that on slower embedded systems, TLS encryption and decryption on the embedded device can be slower than wire speed, leading to lighttpd having openssl just read and decrypt more and more data before proceeding to process all of it. On the other hand, if you are on a fast enough system (e.g. not an embedded system), using ssl.read-ahead = "enable" can speed lighttpd up a bit.

Now then, the client could send a single payload containing both HEADERS and DATA frames. If sent to the TLS library in the same buffer, and the buffer is small enough, then both frames could end up in the same TLS record. However, simpleton clients may send individual frames to the openssl library, which is likely to end up with each frame in a separate TLS record. If TCP nagle algorithm is disabled, then each TLS record may end up in a separate TCP packet.

So there are multiple scenarios for which lighttpd might receive DATA separately from HEADERS. In the default case, lighttpd collects the request body before passing it to the backend, or, if lighttpd is configured to stream the request body, the backend often receives the request body before producing a response. However, if there is an error and lighttpd produces a response before the DATA is received, then we have the scenario that you reported.

I maintain that a client which sends a non-idempotent request (such as POST) to a URL which the client javascript knows might not exist, and which does not delay sending DATA (e.g. with Expect: 100-continue), and which has no recovery mechanism, is lacking in robustness. The specific scenario reported for lua testing a URI ignores and does not recover from network failures, which might result in similar failures on the client. An idempotent request to test the existence of a URI would be a more robust approach.

Your points are completely valid generally, the client can be written more robustly to solve all kind of network issues. LuCI's use case is a client (almost) directly connected to the server (running on a router to which the client is connected to), so there are not many network issues to be solved. You want to configure router, to which you are connected. Network is not an issue (and never was in this bug report).

Actions

Also available in: Atom