Bug #3093

closed

Chrome 92, HTTP/2, fcgi, multiple puts no response

Added by stenvaag about 1 month ago. Updated 2 days ago.

Status:
Fixed
Priority:
Normal
Category:
core
Target version:
ASK QUESTIONS IN Forums:
No

Description

I use lighttpd (self-compiled) with a self-developed fcgi backend, and I have run this in a production environment for 10+ years. After upgrading to 1.4.59 I have an issue when my web system sends multiple PUT requests (more than 2; I'm unsure at what number the problem starts). When using Chrome 92 my fcgi backend is not called. With Firefox 90 the problem does not occur.


Files

lighttpd_chrome_bug.zip (6.22 KB) lighttpd_chrome_bug.zip stenvaag, 2021-08-10 12:03
lighttpd.conf (4.03 KB) lighttpd.conf Config file with some parts removed stenvaag, 2021-08-11 14:16

Related issues

Related to Bug #3100: Random TLS errors on established connections (Fixed)
Actions #1

Updated by stenvaag about 1 month ago

When disabling HTTP/2 there is no problem with Chrome.

Actions #2

Updated by gstrauss about 1 month ago

When using Chrome 92 my fcgi backend is not called.

The logs suggest otherwise. The logs you shared show
2021-08-10 13:20:28: gw_backend.c.2571) handling it in mod_gw
but do not show responses for the chrome scenario with multiple requests.

This sounds like an issue with your FastCGI backend. However it is hard to say since you have not read How to get help and you have not provided a lighttpd.conf or any hints how to reproduce besides "multiple PUT requests".

The mod_fastcgi doc has a section near the top of the page about debugging.
Try fastcgi.debug = 3

Look into what your fastcgi backend is doing differently between requests from Chrome and requests from Firefox.

Actions #3

Updated by stenvaag about 1 month ago

Sorry about this. Setting fastcgi.debug = 3 suggests that fcgi is called. The bug has to be in my system.

Actions #4

Updated by gstrauss about 1 month ago

  • Status changed from New to Invalid
  • Priority changed from Normal to Low
  • Target version deleted (1.4.x)

Once you track the issue down, if the issue might be something others run into, please post a summary of what the problem was in how the fastcgi backend was handling or queuing requests.

Actions #5

Updated by stenvaag about 1 month ago

I have tried to locate the error today. My latest finding is that when the number of PUT requests exceeds 8, gw_write_request (and my fcgi server) is never called. With 8 or fewer PUTs, lighttpd receives all the header frames and then all the data frames, and everything works as expected. With 9 PUTs, lighttpd receives all the header frames but no data frames. I have not managed to check whether Chrome is actually sending the data frames.

Reading "How to get help" I have the following

Linux Ubuntu 18.04.
lighttpd/1.4.59 (ssl) - a light and fast webserver

Actions #6

Updated by gstrauss about 1 month ago

lighttpd HTTP/2 support limits SETTINGS_MAX_CONCURRENT_STREAMS to 8 and communicates this in the initial SETTINGS sent by the server. Your client should recover if additional streams are rejected with REFUSED_STREAM.

Are you getting REFUSED_STREAM when your client sends too many requests before any in the 8 already-in-flight requests finish?
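To illustrate the limit and the recovery path described above, here is a minimal model in C. It is only a sketch: h2_model, model_open_stream, model_finish_all and model_run_client are hypothetical names made up for this example and are not lighttpd code. It assumes the client retries every refused request on a fresh stream id once the server has finished the streams it accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_STREAMS = 8 };          /* concurrency limit described in this thread */
enum { H2_REFUSED_STREAM = 0x7 };  /* RFC 7540 error code REFUSED_STREAM */

/* Hypothetical server-side bookkeeping; not lighttpd's real struct. */
typedef struct {
    uint32_t active[MAX_STREAMS];  /* ids of streams being processed */
    size_t used;                   /* number of occupied slots */
} h2_model;

/* Returns 0 if the stream is accepted, H2_REFUSED_STREAM if all slots are busy. */
static int model_open_stream(h2_model *m, uint32_t id) {
    assert(m->used <= MAX_STREAMS);
    if (m->used == MAX_STREAMS) return H2_REFUSED_STREAM;
    m->active[m->used++] = id;
    return 0;
}

/* Server completes every accepted stream, freeing all slots. */
static void model_finish_all(h2_model *m, unsigned *done) {
    *done += (unsigned)m->used;
    m->used = 0;
}

/* Client sends n requests in a burst and retries refused ones in the next
 * burst. Returns the number of completed requests; *refused counts how
 * many times a stream was refused along the way. */
static unsigned model_run_client(unsigned n, unsigned *refused) {
    h2_model m = { {0}, 0 };
    uint32_t next_id = 1;          /* client-initiated stream ids are odd */
    unsigned pending = n, done = 0;
    *refused = 0;
    while (pending > 0) {
        unsigned retry = 0;
        for (unsigned i = 0; i < pending; ++i) {
            if (model_open_stream(&m, next_id) == H2_REFUSED_STREAM) {
                ++*refused;
                ++retry;
            }
            next_id += 2;          /* stream ids are never reused */
        }
        model_finish_all(&m, &done);
        pending = retry;
    }
    return done;
}
```

In this model a burst of 9 requests completes all 9, with the ninth refused once and accepted on retry. The behavior reported in this issue differs in that the refused request never completed.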

Actions #7

Updated by stenvaag about 1 month ago

This has to be a bug in Chrome. In h2_recv_headers, when the 9th header frame is received, h2c->sent_settings is set, so lighttpd does not send REFUSED_STREAM but silently ignores the error.

When testing I have occasionally seen correct behavior from Chrome: it sends 8 header frames + 8 data frames, and then 1 header frame and 1 data frame. Then everything works as expected.

I will try to report a Chrome bug.

Actions #8

Updated by gstrauss about 1 month ago

If you do, please post a link to the issue, so that I can track it, too.

RFC 7540 (in which Google participated) recommends that SETTINGS_MAX_CONCURRENT_STREAMS be no smaller than 100 (and applies no limit until a SETTINGS frame says otherwise), but permits smaller values -- including 0 for short periods of time. lighttpd limits SETTINGS_MAX_CONCURRENT_STREAMS to 8 to put a constraint on memory and other resource use, and I believe that lighttpd is doing so in a way that respects RFC 7540 Hypertext Transfer Protocol Version 2 (HTTP/2). lighttpd attempts to process all the data initially received before lighttpd sends the initial SETTINGS frame from the server, but after that presumes that the client should have received the initial SETTINGS frame from the server.

You can see from the comment in h2_recv_headers that I had tested this with h2load -n 100 -m 100. However, I tested on localhost. With multiple systems, there is a window with a chance of latency and multiple packets delayed/dropped/resent where the client has sent multiple packets, but lighttpd has only received some of them before lighttpd sends the initial SETTINGS frame from the server, and then more time before the client might receive and process the SETTINGS frame from the server.

I am open to ideas how lighttpd can do this better as long as it does not open lighttpd to additional resource-based attacks.

Actions #9

Updated by gstrauss about 1 month ago

  • Category changed from core to TLS
  • Status changed from Invalid to Patch Pending
  • Priority changed from Low to Normal
  • Target version set to 1.4.60

My hunch (guess) is that Chrome is sending multiple TLS frames with the initial connection, possibly including TLS early data. If your server has sufficient CPU and is fast enough to handle TLS at wire speed, e.g. not an embedded device, please try ssl.read-ahead = "enable" if you are using mod_openssl. (Search for "ssl.read-ahead" in lighttpd TLS docs for more info.)

Whether or not that addresses your issue, if you would, please try this patch, which is also valid for embedded systems. The patch attempts to handle the case where some impolite browser (Chrome) might attack a server with more than 8 streams in the initial h2 connection setup, in more than one TCP (or other) transport packet or possibly exceeding the size of a single enclosing TLS frame, before the client receives server SETTINGS, and where some of those packets are delayed or dropped and do not arrive at the same time in kernel socket buffers of the server, or are part of multiple TLS frames and ssl.read-ahead = "disable", which is the default.

--- a/src/h2.c
+++ b/src/h2.c
@@ -1241,7 +1241,8 @@ h2_recv_headers (connection * const con, uint8_t * const s, uint32_t flen)

     if (id > h2c->h2_cid) {
         if (h2c->rused == sizeof(h2c->r)/sizeof(*h2c->r)) {
-            if (0 == h2c->sent_settings) { /*(see h2_recv_settings() comments)*/
+            if (0 == h2c->sent_settings    /*(see h2_recv_settings() comments)*/
+                || (id < 200 && log_epoch_secs - con->connection_start < 2)) {
                 /* too many active streams; refuse new stream */
                 h2c->h2_cid = id;
                 h2_send_rst_stream_id(id, con, H2_E_REFUSED_STREAM);
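The condition added by the hunk above can be read in isolation as a small predicate. The following is a sketch only: conn_model and refuse_politely are hypothetical names, and the time values are passed in rather than read from log_epoch_secs. It models just the decision to answer with REFUSED_STREAM; what the existing else-branch does is not shown in the posted hunk and is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the fields used in the patch. */
typedef struct {
    int sent_settings;        /* nonzero once the server SETTINGS went out */
    time_t connection_start;  /* when the connection was accepted */
} conn_model;

/* Returns 1 if an over-limit HEADERS frame would be answered with
 * REFUSED_STREAM under the patched condition, 0 if the pre-existing
 * else-branch would apply instead. */
static int refuse_politely(const conn_model *c, uint32_t id, time_t now) {
    assert(c != NULL);
    if (0 == c->sent_settings) return 1;   /* original condition */
    /* added grace window: low stream ids on a connection under 2 s old */
    return id < 200 && now - c->connection_start < 2;
}
```

Since client-initiated stream ids are odd, id < 200 covers the first 100 streams of a connection. The 2-second window is the allowance for a client that has not yet received and processed the server SETTINGS.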

Actions #10

Updated by gstrauss about 1 month ago

  • Category changed from TLS to core

Actions #11

Updated by stenvaag about 1 month ago

I tried some changes in lighttpd yesterday and was able to make Chrome work without errors. One of the changes was to always send REFUSED_STREAM, similar to your suggestion. Some more changes were needed. I'm working on a description of the changes and will post this later today.

Actions #12

Updated by gstrauss about 1 month ago

I'm working on a description of the changes and will post this later today.

Thanks. I'll look for it.

One of the changes was to always send REFUSED_STREAM

At the moment, I don't recall exactly why, but there was a reason why I chose not to always send REFUSED_STREAM, or else I would not have written that extra code. Maybe I was handling it as a protocol violation after SETTINGS_MAX_CONCURRENT_STREAMS should have been processed by the client.

.

Unrelated to this issue, but since you helpfully provided your lighttpd.conf:

Given your cipher list, restricting to TLS1.2 or later, and lighttpd 1.4.59, the following lines are no longer necessary:
ssl.ca-file = "/etc/lighttpd/ssl/.../chain.pem"
ssl.dh-file = "/etc/lighttpd/ssl/dh2048.pem"

If you support mobile devices, you may want to add CHACHA20 to your cipher list, and then replace the line:
ssl.honor-cipher-order = "enable"
with
ssl.openssl.ssl-conf-cmd += ("Options" => "-ServerPreference")

For clearer syntax about supported TLS versions, you might use the syntax
ssl.openssl.ssl-conf-cmd += ("MinProtocol" => "TLSv1.2") to replace "Protocol" => "-TLSv1.1, -TLSv1, -SSLv3".
MinProtocol was added in openssl 1.1.0 and you're using 1.1.x in Ubuntu 18.04

Further doc at lighttpd TLS docs
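Put together, the two ssl-conf-cmd suggestions above might look like the fragment below. This is a sketch only. It assumes mod_openssl with OpenSSL 1.1.0 or later, and it leaves out the cipher list because that depends on your own configuration. Note that plain assignment replaces any existing ssl.openssl.ssl-conf-cmd value, so merge it with whatever entries you already have.

```
# replaces: ssl.honor-cipher-order = "enable"
# replaces: "Protocol" => "-TLSv1.1, -TLSv1, -SSLv3"
ssl.openssl.ssl-conf-cmd = (
    "MinProtocol" => "TLSv1.2",
    "Options"     => "-ServerPreference"
)
```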

Actions #13

Updated by stenvaag about 1 month ago

I tried your proposed change in h2_recv_headers now and it is working. My description of the changes tried yesterday is no longer needed.

To make this work you need one more change. h2_recv_data must not send PROTOCOL_ERROR when receiving data frames on a prior REFUSED_STREAM stream but "ignore" it. I just commented out the lines in h2_recv_data:

if (h2c->h2_cid < id || (!h2c->sent_goaway && 0 != alen))
h2_send_goaway_e(con, H2_E_PROTOCOL_ERROR);

to make it work. You probably have a better solution than commenting this out, which I can test.
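One way to picture the "ignore it" behavior without dropping the check entirely is to remember which streams were recently refused and discard DATA frames only for those. The sketch below is hypothetical and is not the change that went into lighttpd: h2_state, note_refused and classify_data are invented names, and the ring size of 16 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REFUSED_SLOTS 16u   /* arbitrary size chosen for this sketch */

typedef enum { DATA_DELIVER, DATA_DISCARD, DATA_ERROR } data_action;

/* Hypothetical per-connection state; not lighttpd's h2con. */
typedef struct {
    uint32_t highest_id;              /* highest stream id seen so far */
    uint32_t active[8];               /* streams currently being processed */
    size_t   nactive;
    uint32_t refused[REFUSED_SLOTS];  /* ring of recently refused ids */
    size_t   nrefused;                /* total refusals recorded */
} h2_state;

/* Record that stream id was answered with REFUSED_STREAM. */
static void note_refused(h2_state *s, uint32_t id) {
    s->refused[s->nrefused++ % REFUSED_SLOTS] = id;
    if (id > s->highest_id) s->highest_id = id;
}

/* Decide what to do with a DATA frame for stream id. */
static data_action classify_data(const h2_state *s, uint32_t id) {
    assert(s != NULL);
    if (id > s->highest_id) return DATA_ERROR;       /* stream never opened */
    for (size_t i = 0; i < s->nactive; ++i)
        if (s->active[i] == id) return DATA_DELIVER;
    size_t n = s->nrefused < REFUSED_SLOTS ? s->nrefused : REFUSED_SLOTS;
    for (size_t i = 0; i < n; ++i)
        if (s->refused[i] == id) return DATA_DISCARD; /* client had not seen RST yet */
    return DATA_ERROR;                                /* unknown closed stream */
}
```

A real implementation would also have to keep counting discarded DATA against the connection flow-control window, as RFC 7540 requires for frames received after RST_STREAM was sent.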

Actions #14

Updated by gstrauss about 1 month ago

To make this work you need one more change. h2_recv_data must not send PROTOCOL_ERROR when receiving data frames on a prior REFUSED_STREAM stream but "ignore" it.

stenvaag see #3078 and commit 81d18a8e which is on lighttpd git master and will be part of the next lighttpd release.

(Note: 81d18a8e uses log_monotonic_secs, but you can substitute log_epoch_secs if you want to test the patch against the lighttpd 1.4.59 release)

Actions #15

Updated by gstrauss about 1 month ago

I applied a new patch to the (current) HEAD of my development branch. It is slightly different from the two-line patch I posted further up.
https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/personal/gstrauss/master
Some of this code is not yet on lighttpd git master, but the intention is that it all will be in the next month or two.
Also, please be aware that I often rewrite git history of my dev branch during development.
Usual caveats apply: this is a development branch: ok for testing, but at your own risk anywhere else.

Actions #16

Updated by stenvaag about 1 month ago

I tested the first proposed change in #3078, and it worked.

Actions #17

Updated by gstrauss about 1 month ago

Thanks for testing. FYI: the patch you tested from #3078 is not the patch which ended up being committed, but the idea and result should be the same.

Actions #18

Updated by gstrauss 7 days ago

  • Status changed from Patch Pending to Fixed

Actions #19

Updated by gstrauss 3 days ago

  • Related to Bug #3100: Random TLS errors on established connections added

Actions #20

Updated by gstrauss 3 days ago

stenvaag
It appears that my patch to handle the burst of PUT requests on initial connection has caused a regression when Firefox sends a burst of (non-PUT) requests on initial connection. I posted an updated patch in #3100 which hopefully addresses both scenarios.

Actions #21

Updated by stenvaag 2 days ago

I downloaded and tested https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master 40 minutes ago. "My" error is still fixed, but I get a random HTTP2 protocol error from Chrome.

Actions #22

Updated by gstrauss 2 days ago

I downloaded and tested https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master 40 minutes ago. "My" error is still fixed, but I get a random HTTP2 protocol error from Chrome.

Would you please provide some additional details? What kind of requests are now resulting in random HTTP/2 protocol errors from Chrome? With the current patches on lighttpd git master, the behavior "should" (famous last words) be the same as in lighttpd 1.4.59, except when the connecting client sends more than 8 requests with request bodies (e.g. PUT) before the request bodies for the first 8 have been sent.

Actions #23

Updated by stenvaag 2 days ago

It is not easy to get the error. I have to run my application multiple times (reload the page) to possibly get the error. I got the error now when navigating to a page (inside my application) that in this case has 12 GETs to different URLs (all of them resulting in a call to my fcgi backend). It is the first one that fails. The second has the SSL setup delay. All but the first one succeed. The error from Chrome is "net::ERR_HTTP2_PROTOCOL_ERROR".

Actions #24

Updated by gstrauss 2 days ago

Would you test with lighttpd 1.4.59 to check if you saw this behavior with lighttpd 1.4.59, too? I have tested with h2load, which IIRC sends 100 GET requests upon the initial connection.

Actions #25

Updated by stenvaag 2 days ago

When this occurs, my fcgi backend reports that the first byte received is not 1 (FCGI_VERSION_1), but a random number. This could be an error in my fcgi backend, but after a decade++, this error only appeared in 1.4.59 when the http/2 protocol was activated.

Actions #26

Updated by gstrauss 2 days ago

lighttpd mod_fastcgi always assigns header->version = FCGI_VERSION_1; when creating FCGI headers.
https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/mod_fastcgi.c#L181

If you're seeing something different in the FCGI application, then something might be getting corrupted somewhere (either in lighttpd or in your app, not sure which).
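For checking the backend side, a FastCGI record header is 8 bytes: version, type, request id (2 bytes, big-endian), content length (2 bytes, big-endian), padding length, and one reserved byte. The sketch below parses one header and reports the full record size. fcgi_header and fcgi_parse_header are hypothetical names for this example, not part of any library. A first byte that is not 1 can mean the reader has lost track of record boundaries, for example by not skipping the padding bytes of the previous record, but that is only one possibility to rule out and not a diagnosis of this case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FCGI_VERSION_1 = 1, FCGI_HEADER_LEN = 8 };

/* Decoded form of the 8-byte FastCGI record header. */
typedef struct {
    uint8_t  version;
    uint8_t  type;
    uint16_t request_id;
    uint16_t content_length;
    uint8_t  padding_length;
} fcgi_header;

/* Parses one record header from buf. Returns the total record size
 * (header + content + padding) so the caller knows where the next
 * record starts, or 0 if buf is too short or the version byte is wrong. */
static size_t fcgi_parse_header(const uint8_t *buf, size_t len, fcgi_header *h) {
    assert(buf != NULL && h != NULL);
    if (len < FCGI_HEADER_LEN) return 0;
    if (buf[0] != FCGI_VERSION_1) return 0;
    h->version        = buf[0];
    h->type           = buf[1];
    h->request_id     = (uint16_t)((buf[2] << 8) | buf[3]);
    h->content_length = (uint16_t)((buf[4] << 8) | buf[5]);
    h->padding_length = buf[6];
    /* buf[7] is reserved */
    return (size_t)FCGI_HEADER_LEN + h->content_length + h->padding_length;
}
```

Advancing by the returned size, rather than by the content length alone, keeps the reader aligned on the next header when a record carries padding.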

This could be an error in my fcgi backend, but after a decade++, this error only appeared in 1.4.59 when the http/2 protocol was activated.

lighttpd HTTP/2 code currently is used only between lighttpd and the client, not between lighttpd and backends. Use of HTTP/2 can enable faster requests, so maybe this is exposing a race condition that has been present before. ...My next step is to run a load test on a FastCGI backend and see how things go...

Actions #27

Updated by gstrauss 2 days ago

FYI: I see 0 failures with h2load running 20000+ requests per second against a FastCGI backend written in C (lighttpd tests/fcgi-responder). I tested using h2load against both http and https targets, with lighttpd tests/fcgi-responder handling the requests.

Actions #28

Updated by gstrauss 2 days ago

The error from Chrome is "net::ERR_HTTP2_PROTOCOL_ERROR".

BTW, it still concerns me that you're seeing that. If the backend FastCGI fails, lighttpd should send an HTTP response for the error handling the request, not a connection error.

Actions #29

Updated by gstrauss 2 days ago

this error only appeared in 1.4.59 when the http/2 protocol was activated.

Perhaps we should treat this as a separate issue from this one ("Chrome 92, HTTP/2, fcgi, multiple puts no response"). Would you mind opening a new bug, and continuing to try to help me to reproduce these observations of yours? (It could be one or more issues.)

Actions #30

Updated by stenvaag 2 days ago

Yes, I will open a new bug. (I was out of office, but will now look into this.)

I tried to get the error with 1.4.59 with HTTP/2 enabled, but with no "luck".
