Bug #3148
closedHTTP/2 Error
Description
I get HTTP/2 errors from Chrome that is not easy to reproduce. I´m sometimes able to reproduce the error when my application (Single Page Web app) creates a lot of PUTs. I´m using 1.4.64 with modifications from case #3102 and created a log file. Hopefully you can get some information from the log to track down the error.
Files
Updated by gstrauss over 1 year ago
- Status changed from New to Need Feedback
The log is 35101 lines. Would you care to give a hint as to where to start? I see occasional RST_STREAM before GOAWAY, but the log does not otherwise contain any further information that might help track down the error.
Updated by stenvaag over 1 year ago
I was afraid that cutting the log down I would remove essential information. As stated the error is not easy to reproduce and requires a lot of PUT's to trigger.
I can try to create a smaller logg with the problem.
Any more logging that I could add to the code?
Updated by stenvaag over 1 year ago
- File error.7z error.7z added
- File chrome_lighttpd_errors.png chrome_lighttpd_errors.png added
The attached error.log file is emptied just prior to the PUT's.
Image of the errors in Chrome is attached.
Updated by stenvaag over 1 year ago
When reloading my application I can see that the 3 first PUTs is successfull, then the next 3 failed. Then there is one successfull and then one more failure.
Updated by gstrauss over 1 year ago
The RST_STREAM you see may indicate that there was a problem that led to RST_STREAM. The RST_STREAM is not the source of the problem, but is a symptom.
Any more logging that I could add to the code?
Is there any information in the lighttpd error log? You can add debug.log-ssl-noise = "enable"
, though that might end up being noisy.
When reloading my application I can see that the 3 first PUTs is successfull, then the next 3 failed. Then there is one successfull and then one more failure.
You might strace -s 1024 -o /var/tmp/lighttpd.strace.log lighttpd -D -f /etc/lighttpd/lighttpd.conf ...
and capture the sequence of requests that succeed and then fail. That might produce some large logs and might contain sensitive info, so please check before sharing.
Before doing that, you might test to see when the error started occurring, between lighttpd 1.4.60 (which you were using in #3102) and lighttpd 1.4.64.
Updated by gstrauss over 1 year ago
BTW, is your FastCGI backend is sending Transfer-Encoding: chunked
? FastCGI frames the data, so Transfer-Encoding: chunked
is not necessary from a lighttpd backend which is not an HTTP/1.1 proxy (accessed via lighttpd mod_proxy).
Updated by stenvaag over 1 year ago
No, I can not find any Transfer-Encoding setting in my FastCGI backend.
Updated by gstrauss over 1 year ago
Are you able to test different versions of lighttpd to find when the error started occurring between lighttpd 1.4.60 (which you were using in #3102) and lighttpd 1.4.64?
Updated by stenvaag over 1 year ago
I tested 1.4.60 and 1.4.64 now and both have the problem.
Updated by stenvaag over 1 year ago
I have now tested 1.4.61, 1.4.62 and 1.4.63 and the problem is in all versions. This was problably not needed as the problem was present in 1.4.60.
Updated by stenvaag over 1 year ago
- File lighttpd.strace.log.7z lighttpd.strace.log.7z added
Attached is the strace log.
First PUT is in line 39714.
Updated by stenvaag over 1 year ago
I´m not 100% sure, but it looks like the first 5 PUT requests was successfull, then one failing, and then 7 successfull.
Updated by gstrauss over 1 year ago
Does the issue occur with 1.4.59 plus the one-line patch in #3102?
Updated by gstrauss over 1 year ago
Looking again at the second error.7z you attached, I think that your client side app is having the same problem as in #3102 (and suffers the same criticism).
If you are going to log REFUSED_STREAM in h2.c, then please also log the error code. If the error code is REFUSED_STREAM, then lighttpd is behaving as designed when faced with too many initial streams which upload data and might deadlock on the connection if lighttpd does not refuse the excess.
For lighttpd 1.4.64:
--- a/src/h2.c +++ b/src/h2.c @@ -313,6 +313,7 @@ h2_send_rst_stream_id (uint32_t h2id, connection * const con, const request_h2er rst_stream.u[3] = htonl(e); chunkqueue_append_mem(con->write_queue, /*(+3 to skip over align padding)*/ (const char *)rst_stream.c+3, sizeof(rst_stream)-3); +log_error(NULL, __FILE__, __LINE__, "(%s) sending h2 RST_STREAM on stream id:%u, error code:%d", con->dst_addr_buf.ptr, h2id, e); } @@ -443,6 +444,7 @@ h2_send_refused_stream (uint32_t h2id, connection * const con) /* too many active streams; refuse new stream */ h2c->h2_cid = h2id; +log_error(NULL, __FILE__, __LINE__, "%s (%s) sending h2 REFUSED_STREAM on stream id:%u, error code:%d", __func__, con->dst_addr_buf.ptr, h2id, H2_E_REFUSED_STREAM); h2_send_rst_stream_id(h2id, con, H2_E_REFUSED_STREAM); return 1; }
Updated by stenvaag over 1 year ago
- File chrome_errors_1.4.59_incl_one_line_patch.png chrome_errors_1.4.59_incl_one_line_patch.png added
Yes, but I´m unsure if the error messages shown in the browser is the same.
Updated by stenvaag over 1 year ago
I added that in a later test yesterday and it was h2_send_refused_stream that called h2_send_rst_stream_id with H2_E_REFUSED_STREAM.
Updated by stenvaag over 1 year ago
Attached is errorl.log with RST_STREAM error code.
Updated by gstrauss over 1 year ago
- Status changed from Need Feedback to Invalid
lighttpd is working as intended and is sending REFUSED_STREAM when you attack the server.
Yes, but I´m unsure if the error messages shown in the browser is the same.
One problem is that your client code continues to send uncontrolled bursts of requests containing request bodies. Whether or not that is the only problem can be determined after you fix that problem. You can serialize the PUTs on the client side and that will probably fix your issue. Then, if it works just fine serialized, you can keep it that way. If the performance is reduced, then try increasing your PUT queue size to 4 on the client side. Submitting your ajax requests through a queue on the client side is something you'll have to code.
When Chrome initiates a new HTTP/2 connection to lightptd, it will get a response from lighttpd indicating the stream limit. If you have sent an uncontrolled burst of requests containing request bodies, and Chrome opens a new connection, then Chrome might initially send too many HTTP/2 stream requests before lighttpd responds with the stream limit. If that happens, lighttpd refuses the streams in excess of the lighttpd limit, and Chrome gives you those errors.
Your application should not send uncontrolled bursts of requests containing request bodies, or your application should gracefully recover and retry after failures.
On a broader design perspective, if you are sending messages back and forth, then websockets is a different protocol to consider using instead of uncontrolled bursts of PUTs.
Also available in: Atom