Project

General

Profile

Actions

Bug #3148

closed

HTTP/2 Error

Added by stenvaag about 2 years ago. Updated about 2 years ago.

Status:
Invalid
Priority:
Normal
Category:
core
Target version:
-
ASK QUESTIONS IN Forums:
No

Description

I get HTTP/2 errors from Chrome that is not easy to reproduce. I´m sometimes able to reproduce the error when my application (Single Page Web app) creates a lot of PUTs. I´m using 1.4.64 with modifications from case #3102 and created a log file. Hopefully you can get some information from the log to track down the error.


Files

error.7z (225 KB) error.7z Error log stenvaag, 2022-03-14 19:30
error.7z (3.62 KB) error.7z Lighttpd error.log file stenvaag, 2022-03-15 14:35
chrome_lighttpd_errors.png (124 KB) chrome_lighttpd_errors.png Chrome errors stenvaag, 2022-03-15 14:39
lighttpd.strace.log.7z (1.92 MB) lighttpd.strace.log.7z Strace log stenvaag, 2022-03-19 20:07
chrome_errors_1.4.59_incl_one_line_patch.png (79.1 KB) chrome_errors_1.4.59_incl_one_line_patch.png Error messages in Chrome with lighttpd 1.4.59 incl one line patch stenvaag, 2022-03-20 11:07
error_log_1.4.64_RST_STREAM_error_code.7z (2.18 KB) error_log_1.4.64_RST_STREAM_error_code.7z stenvaag, 2022-03-20 11:48
Actions #1

Updated by gstrauss about 2 years ago

  • Status changed from New to Need Feedback

The log is 35101 lines. Would you care to give a hint as to where to start? I see occasional RST_STREAM before GOAWAY, but the log does not otherwise contain any further information that might help track down the error.

Actions #2

Updated by stenvaag about 2 years ago

I was afraid that cutting the log down I would remove essential information. As stated the error is not easy to reproduce and requires a lot of PUT's to trigger.

I can try to create a smaller logg with the problem.

Any more logging that I could add to the code?

Updated by stenvaag about 2 years ago

The attached error.log file is emptied just prior to the PUT's.

Image of the errors in Chrome is attached.

Actions #4

Updated by stenvaag about 2 years ago

When reloading my application I can see that the 3 first PUTs is successfull, then the next 3 failed. Then there is one successfull and then one more failure.

Actions #5

Updated by gstrauss about 2 years ago

The RST_STREAM you see may indicate that there was a problem that led to RST_STREAM. The RST_STREAM is not the source of the problem, but is a symptom.

Any more logging that I could add to the code?

Is there any information in the lighttpd error log? You can add debug.log-ssl-noise = "enable", though that might end up being noisy.

When reloading my application I can see that the 3 first PUTs is successfull, then the next 3 failed. Then there is one successfull and then one more failure.

You might strace -s 1024 -o /var/tmp/lighttpd.strace.log lighttpd -D -f /etc/lighttpd/lighttpd.conf ... and capture the sequence of requests that succeed and then fail. That might produce some large logs and might contain sensitive info, so please check before sharing.

Before doing that, you might test to see when the error started occurring, between lighttpd 1.4.60 (which you were using in #3102) and lighttpd 1.4.64.

Actions #6

Updated by gstrauss about 2 years ago

BTW, is your FastCGI backend is sending Transfer-Encoding: chunked? FastCGI frames the data, so Transfer-Encoding: chunked is not necessary from a lighttpd backend which is not an HTTP/1.1 proxy (accessed via lighttpd mod_proxy).

Actions #7

Updated by stenvaag about 2 years ago

No, I can not find any Transfer-Encoding setting in my FastCGI backend.

Actions #8

Updated by gstrauss about 2 years ago

Are you able to test different versions of lighttpd to find when the error started occurring between lighttpd 1.4.60 (which you were using in #3102) and lighttpd 1.4.64?

Actions #9

Updated by stenvaag about 2 years ago

I tested 1.4.60 and 1.4.64 now and both have the problem.

Actions #10

Updated by stenvaag about 2 years ago

I have now tested 1.4.61, 1.4.62 and 1.4.63 and the problem is in all versions. This was problably not needed as the problem was present in 1.4.60.

Actions #11

Updated by stenvaag about 2 years ago

Attached is the strace log.

First PUT is in line 39714.

Actions #12

Updated by stenvaag about 2 years ago

I´m not 100% sure, but it looks like the first 5 PUT requests was successfull, then one failing, and then 7 successfull.

Actions #13

Updated by gstrauss about 2 years ago

Does the issue occur with 1.4.59 plus the one-line patch in #3102?

Actions #14

Updated by gstrauss about 2 years ago

Looking again at the second error.7z you attached, I think that your client side app is having the same problem as in #3102 (and suffers the same criticism).

If you are going to log REFUSED_STREAM in h2.c, then please also log the error code. If the error code is REFUSED_STREAM, then lighttpd is behaving as designed when faced with too many initial streams which upload data and might deadlock on the connection if lighttpd does not refuse the excess.

For lighttpd 1.4.64:

--- a/src/h2.c
+++ b/src/h2.c
@@ -313,6 +313,7 @@ h2_send_rst_stream_id (uint32_t h2id, connection * const con, const request_h2er
     rst_stream.u[3] = htonl(e);
     chunkqueue_append_mem(con->write_queue,  /*(+3 to skip over align padding)*/
                           (const char *)rst_stream.c+3, sizeof(rst_stream)-3);
+log_error(NULL, __FILE__, __LINE__, "(%s) sending h2 RST_STREAM on stream id:%u, error code:%d", con->dst_addr_buf.ptr, h2id, e);
 }

@@ -443,6 +444,7 @@ h2_send_refused_stream (uint32_t h2id, connection * const con)

     /* too many active streams; refuse new stream */
     h2c->h2_cid = h2id;
+log_error(NULL, __FILE__, __LINE__, "%s (%s) sending h2 REFUSED_STREAM on stream id:%u, error code:%d", __func__, con->dst_addr_buf.ptr, h2id, H2_E_REFUSED_STREAM);
     h2_send_rst_stream_id(h2id, con, H2_E_REFUSED_STREAM);
     return 1;
 }

Actions #15

Updated by stenvaag about 2 years ago

Yes, but I´m unsure if the error messages shown in the browser is the same.

Actions #16

Updated by stenvaag about 2 years ago

I added that in a later test yesterday and it was h2_send_refused_stream that called h2_send_rst_stream_id with H2_E_REFUSED_STREAM.

Actions #17

Updated by stenvaag about 2 years ago

Attached is errorl.log with RST_STREAM error code.

Actions #18

Updated by gstrauss about 2 years ago

  • Status changed from Need Feedback to Invalid

lighttpd is working as intended and is sending REFUSED_STREAM when you attack the server.

Yes, but I´m unsure if the error messages shown in the browser is the same.

One problem is that your client code continues to send uncontrolled bursts of requests containing request bodies. Whether or not that is the only problem can be determined after you fix that problem. You can serialize the PUTs on the client side and that will probably fix your issue. Then, if it works just fine serialized, you can keep it that way. If the performance is reduced, then try increasing your PUT queue size to 4 on the client side. Submitting your ajax requests through a queue on the client side is something you'll have to code.

When Chrome initiates a new HTTP/2 connection to lighttpd, it will get a response from lighttpd indicating the stream limit. If you have sent an uncontrolled burst of requests containing request bodies, and Chrome opens a new connection, then Chrome might initially send too many HTTP/2 stream requests before lighttpd responds with the stream limit. If that happens, lighttpd refuses the streams in excess of the lighttpd limit, and Chrome gives you those errors.

Your application should not send uncontrolled bursts of requests containing request bodies, or your application should gracefully recover and retry after failures.

On a broader design perspective, if you are sending messages back and forth, then websockets is a different protocol to consider using instead of uncontrolled bursts of PUTs.

Actions #19

Updated by gstrauss about 2 years ago

  • Target version deleted (1.4.65)
Actions

Also available in: Atom