Project

General

Profile

Actions

Bug #3100

closed

Random TLS errors on established connections

Added by flynn 10 days ago. Updated 6 days ago.

Status:
Fixed
Priority:
Normal
Category:
TLS
Target version:
ASK QUESTIONS IN Forums:
No

Description

With the current git-version I get sometimes TLS errors in the browser (firefox) on already existing tabs. At the moment I cannot trigger/reproduce this, but it happens several times a day:
  • open a new tab with a new URL
  • the tab reloads itself through javascript code
  • after many successful reloads the browser gives an error, that it cannot establish a secure connection
  • explicit reload the tab solves the problem

All requests are HTTP/2.

This did not happen with version 1.4.59.

What can I do to isolate the problem?


Related issues

Related to Bug #3093: Chrome 92, HTTP/2, fcgi, mutiple puts no responseFixedActions
Actions #1

Updated by gstrauss 10 days ago

What can I do to isolate the problem?

I did a quick review of mod_openssl changes since since lighttpd 1.4.59. Changes were made for compatibility with OpenSSL 3.0.0 (GA release 7 Sep 2021) and to detect time jumping backwards a large amount (which might break Session Tickets in mod_openssl). Other changes largely affect mod_openssl at setup time, not at run time. Related issues: #3075, #3080, #3081. Some changes in HTTP/2 support in lighttpd were made to address #3078 and #3093.

As I think aloud about possible runtime failure scenarios:

Does the problem tend to occur after you update OCSP stapling responses? Within 256 seconds of the expiration time marked in the OCSP stapling file (not file modification time), lighttpd detects and reload the OCSP stapling response. (I don't think this is related to the issue you are seeing unless your OCSP stapling responses expire multiple times a day, which is not typical practice.)

It is more likely that there is an issue somewhere with TLS session ticket rotation. You can isolate and test this by temporarily disabling Session Tickets (ssl.openssl.ssl-conf-cmd += ("Options" => "ServerPreference,-SessionTicket")) and seeing if this issue goes away. Please note that disabling Session Tickets will result in a slight performance (some added latency) hit when establishing new TLS connections, most visible on high-latency, low-power devices. That should not be as noticeable on a high-performance LAN (low-latency) with powerful computers.

Another option is to test with an alternate TLS library, such as lighttpd mod_gnutls. However, unless that is already on your system, that might be a bit of an effort, so I am not suggesting you do this.

Actions #2

Updated by flynn 10 days ago

OCSP-Stapling updates are running in the morning at 6:25, we can exclude this.

I test disabling Session Tickets first.

Actions #3

Updated by flynn 9 days ago

Disabling TLS session tickets is useless.

I disabled HTTP/2 in the browser and now the problem is gone, so it seems a HTTP/2 problem, which is reported by the bowser as TLS error.

Using h2spec with your patch (https://github.com/summerwind/h2spec/issues/120) I get:
- no errors with lighttpd version 1.4.59
- 5 errors with current git version

Failures: 

Hypertext Transfer Protocol Version 2 (HTTP/2)
  5. Streams and Multiplexing
    5.1. Stream States
      using source address [2a02:8440:4504:3e0a:7406:c855:a832:8ab1]:56284
      × 8: closed: Sends a DATA frame after sending RST_STREAM frame
        -> The endpoint MUST treat this as a stream error of type STREAM_CLOSED.
           Expected: GOAWAY Frame (Error Code: STREAM_CLOSED)
                     RST_STREAM Frame (Error Code: STREAM_CLOSED)
                     Connection closed
             Actual: WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)

  6. Frame Definitions
    6.9. WINDOW_UPDATE
      using source address [2a02:8440:4504:3e0a:7406:c855:a832:8ab1]:56396
      × 2: Sends a WINDOW_UPDATE frame with a flow control window increment of 0 on a stream
        -> The endpoint MUST treat this as a connection error of type PROTOCOL_ERROR.
           Expected: GOAWAY Frame (Error Code: PROTOCOL_ERROR)
                     RST_STREAM Frame (Error Code: PROTOCOL_ERROR)
                     Connection closed
             Actual: DATA Frame (length:0, flags:0x01, stream_id:1)

      6.9.1. The Flow-Control Window
        using source address [2a02:8440:4504:3e0a:7406:c855:a832:8ab1]:56406
        × 3: Sends multiple WINDOW_UPDATE frames increasing the flow control window to above 2^31-1 on a stream
          -> The endpoint MUST sends a RST_STREAM frame with a FLOW_CONTROL_ERROR code.
             Expected: RST_STREAM Frame (Error Code: FLOW_CONTROL_ERROR)
               Actual: RST_STREAM Frame (length:4, flags:0x00, stream_id:1)

  8. HTTP Message Exchanges
    8.1. HTTP Request/Response Exchange
      8.1.2. HTTP Header Fields
        8.1.2.6. Malformed Requests and Responses
          using source address [2a02:8440:4504:3e0a:7406:c855:a832:8ab1]:56478
          × 1: Sends a HEADERS frame with the "content-length" header field which does not equal the DATA frame payload length
            -> The endpoint MUST treat this as a stream error of type PROTOCOL_ERROR.
               Expected: GOAWAY Frame (Error Code: PROTOCOL_ERROR)
                         RST_STREAM Frame (Error Code: PROTOCOL_ERROR)
                         Connection closed
                 Actual: WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
          using source address [2a02:8440:4504:3e0a:7406:c855:a832:8ab1]:56480
          × 2: Sends a HEADERS frame with the "content-length" header field which does not equal the sum of the multiple DATA frames payload length
            -> The endpoint MUST treat this as a stream error of type PROTOCOL_ERROR.
               Expected: GOAWAY Frame (Error Code: PROTOCOL_ERROR)
                         RST_STREAM Frame (Error Code: PROTOCOL_ERROR)
                         Connection closed
                 Actual: WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)

Can it be, that one of these failure matches my problem?

Actions #4

Updated by gstrauss 9 days ago

Please run the h2spec tests on a target which produces a response body of at least 16k+1 bytes (16385).

Actions #5

Updated by gstrauss 7 days ago

If you copy src/h2.c from lighttpd-1.4.59 and then s/HTTP_HEADER_H2_UNKNOWN/HTTP_HEADER_OTHER/g in the 3 locations in src/request.c, please see if you can reproduce the issue. That would narrow things down to src/h2.c, whose changes since lighttpd 1.4.59 are predominantly to try to better handle aggressive connection behavior from Chrome (#3093), and better handle dumb clients that send POST requests to non-existent URLs, and then fail to recover from a disconnect (#3078).

  • the tab reloads itself through javascript code
  • after many successful reloads the browser gives an error, that it cannot establish a secure connection

To give me some hints when trying to reproduce this: What type of request is the reload? GET? POST? What size payload (approximate)? What is the response? What size payload (approximate)? How frequently does the javascript code reload? (More or less than the lighttpd.conf idle timeout for reads and for keep-alives?) Is it possible that the javascript code is not retrying (similar to #3078), and if so, let's see what lighttpd might try to do better to workaround that.

Actions #6

Updated by flynn 7 days ago

h2spec with bigger response body runs without errors on current git version, I was just worried because version 1.4.59 worked with smaller response body.

Some more information about the reload:
  • correction: reload not by javscript, but HTML meta tag <meta http-equiv="refresh" content="90">
  • GET request, return code 200, reloaded are multiple (usually between 1-20) data-files with ~175kB each

I'll try your suggested change this evening.

Actions #7

Updated by gstrauss 7 days ago

flynn
Please see commit c82ca970 "refuse excess h2 streams at con start" for #3093

If the refreshed page is making more than 8 simultaneous requests after having sent lighttpd a SETTINGS acknowledgment about lighttpd h2 SETTINGS frame, then this might be the change from lighttpd 1.4.59 that is causing this issue.

To surgically disable the change in #3093 made for Chrome:

--- a/src/h2.c
+++ b/src/h2.c
@@ -413,7 +413,7 @@ h2_send_refused_stream (uint32_t h2id, connection * const con)
 {
     h2con * const h2c = con->h2;
     if (0 == h2c->sent_settings  /*(see h2_recv_settings() comments)*/
-        || (h2id < 200 && log_monotonic_secs - con->connection_start < 2)) {
+        /*|| (h2id < 200 && log_monotonic_secs - con->connection_start < 2)*/) {
         /* too many active streams; refuse new stream */
         h2c->h2_cid = h2id;
         h2_send_rst_stream_id(h2id, con, H2_E_REFUSED_STREAM);

Actions #8

Updated by gstrauss 7 days ago

This attempts to strike the right balance to still handle #3093 (all PUT requests) while preserving the better behavior (from lighttpd 1.4.59) for everything else.

--- a/src/h2.c
+++ b/src/h2.c
@@ -412,8 +412,33 @@ static int
 h2_send_refused_stream (uint32_t h2id, connection * const con)
 {
     h2con * const h2c = con->h2;
-    if (0 == h2c->sent_settings  /*(see h2_recv_settings() comments)*/
-        || (h2id < 200 && log_monotonic_secs - con->connection_start < 2)) {
+
+    /* h2c->sent_settings is non-zero if sent SETTINGS frame to
+     * client and have not yet received SETTINGS ACK from client */
+    int refuse = (0 == h2c->sent_settings);/*(see h2_recv_settings() comments)*/
+
+    if (!refuse
+        && (h2id < 200 && log_monotonic_secs - con->connection_start < 2)) {
+        /* client connected and immediately sent flurry of request streams.
+         * Check if all active streams have pending request body.  If all have
+         * pending request body, then must refuse new stream since unable to
+         * defer processing frames (below), as deferring will block receipt of
+         * DATA frames for active streams, and active streams then may not be
+         * able to complete properly. */
+        refuse = 1;
+        for (uint32_t i = 0, rused = h2c->rused; i < rused; ++i) {
+            const request_st * const r = h2c->r[i];
+            if (r->reqbody_length == r->reqbody_queue.bytes_in) {
+                /* At least one request may proceed, though others waiting for
+                 * request body may block until new request streams become
+                 * active if new request streams precede DATA frames */
+                refuse = 0;
+                break;
+            }
+        }
+    }
+
+    if (refuse) {
         /* too many active streams; refuse new stream */
         h2c->h2_cid = h2id;
         h2_send_rst_stream_id(h2id, con, H2_E_REFUSED_STREAM);

Actions #9

Updated by gstrauss 7 days ago

  • Related to Bug #3093: Chrome 92, HTTP/2, fcgi, mutiple puts no response added
Actions #10

Updated by gstrauss 6 days ago

  • Status changed from New to Patch Pending

Slightly different patch committed and pushed.

Actions #11

Updated by gstrauss 6 days ago

  • Status changed from Patch Pending to Fixed
Actions

Also available in: Atom