Project

General

Profile

[Solved] A 400 error introduced in v1.4.70 from Chrome(?)

Added by HenrikHolst 4 months ago

Hi,

this is at the moment just an initial report because I have yet not been able to find the root cause nor a way to reproduce the problem.

Starting with 1.4.70 (1.4.69 works just fine) and continuing in 1.4.71 we have encountered a problem where some end users gets 400 faults but where those are not logged in the access log. If they change from using the DNS name to make the request directly to the ip address the request works (aka 200 reply), that combined with the no entry in the access log makes me somewhat lean towards this being a SSL issue.

It also looks to be related to Chromium. The initial report came from people using the Chromium Embedded Framework and if they redid the same request in e.g cURL then everything was fine but once they used CEF they get 400 (and I have gotten a dump of the reply where I can see that the server portion of the reply is lighttpd so it isn't some internal 400 by the framework or by some strange proxy server, which was my initial suspicion).

And now we have gotten some reports from end users using Edge (which is Chrome based AFAIK) and Chrome so it looks to have something strange to do with Chrome here.

I have been unable to reproduce it myself yet so but will keep on trying until I can find it, just wanted to let you know about this issue on an early stage.


Replies (17)

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

And with unable to reproduce I mean that I have tested this in Chromium on Linux and Chrome and Edge in Windows 10 and get no errors. Meanwhile some end users do, so this is very very strange.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by gstrauss 4 months ago

Please ensure that you are running lighttpd 1.4.71 (and that lighttpd has been restarted after the executable was updated).

There was a bug introduced in lighttpd 1.4.70 which is fixed in lighttpd 1.4.71. See also #3207 and #3210

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

It was when I switched to 1.4.71 that we noticed the issue and it's a plain 400 so not a 403 and not a segfault. Anyway my idea of it being SSL can be tossed in the wind because I copied mod_openssl.c from 1.4.69 to 1.4.71 and recompiled and the problem persists.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by gstrauss 4 months ago

Is there any error in the lighttpd error log along with the 400?

If request headers are rejected with 400 Bad Request, then lighttpd does not trust the request header, and nothing is logged in the access log. If the error is (sometimes) reproducible under CEF, then please try with debug.log-request-header = "enable" in lighttpd.conf and see if that results in trace in the error log.

In lighttpd 1.4.70, I recalling making some RFC-strict optimizations for HTTP/2, so the request headers printed to the error log (when an error occurs and debug.log-request-header = "enable") will be useful in identifying what is happening. When the error occurs, is it always with HTTP/2? Has the error occurred with HTTP/1.1 requests?

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by gstrauss 4 months ago

FYI: commits related to h2 changes in lighttpd 1.4.70: 4b9581ad and 9eb99181

If you can reproduce the issue, please test by reverting 4b9581ad, and then, separately, 9eb99181. If reverting either works, that would suggest that ls-hpack which provides HPACK underneath lighttpd h2 support might not be tagging received request headers in the way the lighttpd expects, and so lighttpd would have to be changed to accommodate.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

gstrauss wrote in RE: A 400 error introduced in v1.4.70 from Chrome(?):

FYI: commits related to h2 changes in lighttpd 1.4.70: 4b9581ad and 9eb99181

If you can reproduce the issue, please test by reverting 4b9581ad, and then, separately, 9eb99181. If reverting either works, that would suggest that ls-hpack which provides HPACK underneath lighttpd h2 support might not be tagging received request headers in the way the lighttpd expects, and so lighttpd would have to be changed to accommodate.

Man thanks, reverted 4b9581ad and the problem disappeared so under some extreme specific circumstance ls-hpack does not properly tag the headers.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by gstrauss 4 months ago

Man thanks, reverted 4b9581ad and the problem disappeared so under some extreme specific circumstance ls-hpack does not properly tag the headers.

Well, the problem is more likely lighttpd making an incorrect assumption about ls-hpack in those circumstances.

Were you able to capture the request headers for a request from CEF which failed with 400 Bad Request?

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

gstrauss wrote in RE: A 400 error introduced in v1.4.70 from Chrome(?):

Man thanks, reverted 4b9581ad and the problem disappeared so under some extreme specific circumstance ls-hpack does not properly tag the headers.

Well, the problem is more likely lighttpd making an incorrect assumption about ls-hpack in those circumstances.

Were you able to capture the request headers for a request from CEF which failed with 400 Bad Request?

unfortuately not, with the commit reverted the reply is a 200 and with the commit the reply is a 400 but for some reason that 400 is not logged to the accesslog (or if it is logged it is not logged as the 400 it is being sent as) which is ofc another problem in itself.

And for the end user that experienced this issue they have tons of requests to the same server that did work so I have not been able to pinpoint which of the ones that now gave 200 that was the problematic ones before.

Initially when they first discovered the problem they did send me a screenshot (ofc because who on earth can mail text...) of the request+reply from CEF and if I censor out username+password from the url and convert from image to text their reply looks like this:

:Authority: bon2.millistream.com
:Method: GET
:Path: /mws.fcgi?cmd=quote&fields=name,diff1dprc&filetype=json&limit=5&list=35208&order=desc&orderby=diff1dprc&pwd=xxx&usr=xxx
:Scheme: https
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
User-Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/71.0.3578.98 Safari/537.36

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

and since they claimed that it worked if they changed from using DNS name to ip I guess that it is :Authority: that is the problematic header here.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by gstrauss 4 months ago

Thanks for the info. I would expect that to work. Do you know if this was the first request on an HTTP/2 connection, or a later request on the connection?

Were you able to capture the request headers for a request from CEF which failed with 400 Bad Request?

unfortuately not, with the commit reverted the reply is a 200 and with the commit the reply is a 400 but for some reason that 400 is not logged to the accesslog (or if it is logged it is not logged as the 400 it is being sent as) which is ofc another problem in itself.

As I mentioned above, lighttpd does not log to the access log when the request headers are rejected with 400 Bad Request. If debug.log-request-header = "enable", then all request headers will be logged to the lighttpd error log.

It would be useful to me to know if CEF was sending an actual bad request, or if lighttpd did not handle when ls-pack did not mark the pseudo-headers in the way that lighttpd expected them to be marked.

I'll try to look into this more this weekend.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

Unsure if it was the first request or if it was later in a reused connection. Regarding CEF it is my understanding that they send a proper request (and when they tested with cURL it worked so I take it that the request was valid).

Ok, wasn't aware that lighttpd didn't log Bad Requests (I must have skimmed past that in your reply earlier). Good to know for the future though.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

Since I now know exactly what causes the problem I will do a bit of deep dive into h2 next week and try to add logging to catch the requests where we can find :Authority: but hpack doesn't to see exactly what is up with those requests, I'll start to do this next week.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by gstrauss 4 months ago

I took a peek in ls-hpack/lshpack.c:lshpack_dec_decode(). Inside the HPACK encouding, I think that it is poosible that the client can tell the server not to index the header. In that case, ls-hpack does not check to see if the header field matches anything in ls-hpack:lshpack.c:static_table. lighttpd (seemingly incorrectly) expected the header field to be recognized, whether or not indexed by hpack, as the field name might be in ls-hpack:lshpack.c:static_table, even if the value is not indexed.

I'll probably end up reverting commit 4b9581ad since it is only a minor optimization, and seemingly incorrect.

If the above assessment is accurate, then logging request headers in lighttpd won't show the HPACK encoding, as the underlying HPACK decoding is handled by ls-hpack.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

Could be a bug in ls-hpack, looking at RFC 7541 both :authority and :path is labelled as static so AFAIK they should be indexed regardless of what the client says.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

but indexing the header is done by the sender if I read things correctly so looks like CEF (or perhaps CEF in some configuration) can be stupid enough to send the header name instead of the index in the request and when that happens ls-pack does not update the internal index (and to do so it would have to perform string compares so logically that it doesn't). So yes it looks like the optimization in 4b9481ad is not reliable.

RE: A 400 error introduced in v1.4.70 from Chrome(?) - Added by HenrikHolst 4 months ago

according to the rfc this is apparently done for security reasons, aka to avoid compressing specific headers like :authority to avoid "dynamic table probing".

"
Implementations can also choose to protect sensitive header fields by not compressing them and instead encoding their value as literals.
...
An encoder might also choose not to index values for header fields that are considered to be highly valuable or sensitive to recovery, such as the Cookie or Authorization header fields.
"

Looks like this answers the whole thing.

RE: [Solved] A 400 error introduced in v1.4.70 from Chrome(?) - Added by gstrauss 3 months ago

Patch is now on lighttpd git master branch. Thank you for the report and for the research.

Yes, the client can avoid compressing specific headers. That is tangentially related to the bug in lighttpd, which made an incorrect assumption using the ls-hpack interface that ls-hpack would map know pseudo-headers to enums. The ls-hpack internals perform the enum-mapping only when the headers have been compressed, so lighttpd has to continue to check and perform the enum mapping if it has not been done by ls-hpack, since lighttpd expects known headers to have been mapped to lighttpd enums, regardless of the underlying details of the HPACK compression.

    (1-17/17)