[Solved] Problems with Netskope client connection using HTTP/2

Added by flynn about 2 months ago

Some users must access our web services through the Netskope client. With HTTP/2 it works sometimes, but mostly it fails and an unspecified HTTP/2 error is reported.
The client debug log does not give any more useful information.

Disabling HTTP/2 (server.feature-flags += ("server.h2proto" => "disable")) on the server side solves the problem.
Unfortunately it is not possible to restrict HTTP/2 support in lighttpd by remote IP using

$HTTP["remoteip"] == "163.116.178.0/23" { # Netskope IPs
  server.feature-flags       += ("server.h2proto" => "disable")
  server.feature-flags       += ("server.h2c"     => "disable")
}

(Should I write a feature request for this?)

On the server side I see only one line with PRI and very few bytes, e.g.

163.116.179.56 x.y.z - [14/Oct/2022:11:36:10 +0200] "PRI * HTTP/2.0" 100 61 "-" "-" 
163.116.179.56 x.y.z - [14/Oct/2022:11:36:10 +0200] "PRI * HTTP/2.0" 100 78 "-" "-" 
163.116.179.56 x.y.z - [14/Oct/2022:11:36:12 +0200] "PRI * HTTP/2.0" 100 78 "-" "-" 
163.116.179.56 x.y.z - [14/Oct/2022:11:36:13 +0200] "PRI * HTTP/2.0" 100 61 "-" "-"

But I see no further information about why the connection was terminated.

Any hints how to debug this?


Replies (13)

RE: Problems with Netskope client connection using HTTP/2 - Added by gstrauss about 2 months ago

It is strange that it works "sometimes".

Any hints how to debug this?

It would be useful to me to get those bytes in hex (after being decoded from TLS).

As the data is brief, I am guessing it is all being sent in the initial request from Netskope.

If this is easily reproducible, one quick method is to attach a debugger to lighttpd, set a breakpoint at h2_recv_client_connection_preface(), send a request from Netskope, and then dump the contents of con->read_queue->first->mem.

Alternatively, if you would prefer a debug patch to log the initial read in hex, I can provide one.

A short-term workaround might be to set up HAProxy in front of lighttpd and see if things work with HAProxy handling TLS termination and HTTP/2 from Netskope.

Disabling HTTP/2 (server.feature-flags += ("server.h2proto" => "disable")) on the server side solves the problem.
Unfortunately it is not possible to restrict HTTP/2 support in lighttpd by remote IP using
(Should I write a feature request for this?)

The choice to use HTTP/2 occurs deep in the TLS code, during the TLS handshake, and before the lighttpd configuration has been evaluated against the request. Only the global config is in scope at that point. I cannot defer selecting HTTP/2 because the ALPN h2 is negotiated as part of the TLS handshake. While I could run the configuration against the remote IP before processing the TLS handshake, doing so would double the work, since the configuration has to be run again once the request is received. In short, while this could be done, I have chosen not to do the extra work because of the performance implications for large lighttpd configs. (lighttpd configuration evaluation is quick, but not free.)

If this became a critically needed feature, then I might consider adding a new TLS configuration directive to disable ALPN h2 for IPs -- which would be processed in each TLS module -- by reusing code from mod_extforward, instead of repeating the server-wide evaluation of lighttpd conditional configuration.
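
To illustrate why the protocol choice is tied to the handshake, here is a minimal sketch of ALPN selection over the client's offered list, which arrives as a sequence of length-prefixed names. The function name select_alpn() and the allow_h2 flag are hypothetical and are not lighttpd code; allow_h2 stands in for whatever policy (for example, a per-IP check) would be available at handshake time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not lighttpd code: choose an ALPN protocol from the
 * client's offer, a sequence of (1-byte length, name) entries.
 * Returns a pointer to the chosen name inside `in` and stores its length in
 * *outlen, or returns NULL if the list is malformed or nothing matches. */
static const unsigned char *
select_alpn(const unsigned char *in, size_t inlen, int allow_h2, size_t *outlen)
{
    assert(in != NULL && outlen != NULL);
    const unsigned char *http11 = NULL;
    for (size_t i = 0; i < inlen; ) {
        size_t n = in[i++];
        if (n == 0 || n > inlen - i) return NULL;   /* malformed list */
        if (allow_h2 && n == 2 && memcmp(in + i, "h2", 2) == 0) {
            *outlen = 2;
            return in + i;                          /* prefer h2 when permitted */
        }
        if (n == 8 && memcmp(in + i, "http/1.1", 8) == 0)
            http11 = in + i;                        /* remember the fallback */
        i += n;
    }
    if (http11 != NULL) {
        *outlen = 8;
        return http11;
    }
    return NULL;
}
```

With allow_h2 set to 0 the same offer falls back to http/1.1, which is the effect the global "server.h2proto" => "disable" setting has for all clients.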

RE: Problems with Netskope client connection using HTTP/2 - Added by flynn about 2 months ago

I'll send you the PRI-bytes later, when the traffic is lower ...

Two more observations regarding "sometimes":
  • the Netskope client seems to be a filtering proxy; some instances (= remote IPs) work, others do not. I identified at least two class-C networks, so we are talking about up to 500 instances, which may differ a little
  • the size of the first response also has an influence: small packets are OK, e.g. a response with a return value of 401. A big first response, e.g. > 40kB, fails more often
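
For reference, the 163.116.178.0/23 range from the first post covers 163.116.178.0 through 163.116.179.255, i.e. 512 addresses or two /24 ("class-C") networks, which matches the estimate of up to 500 instances. A minimal sketch of that prefix check follows; the helper names ipv4() and ipv4_in_cidr() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four octets into a host-order 32-bit IPv4 address. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Return 1 if addr lies within net/prefix, else 0. */
static int ipv4_in_cidr(uint32_t addr, uint32_t net, unsigned prefix)
{
    assert(prefix <= 32);
    /* A shift by 32 is undefined in C, so /0 is handled separately. */
    uint32_t mask = prefix ? ~(uint32_t)0 << (32 - prefix) : 0;
    return (addr & mask) == (net & mask);
}
```

The address 163.116.179.56 seen in the access log falls inside this range, while 163.116.180.1 would not.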

RE: Problems with Netskope client connection using HTTP/2 - Added by flynn about 2 months ago

Which hex variant do you want to have:
  • single chars: 0x10 0x11 0x12 ....
  • multiple chars: 0x101112 ...
  • other format with example?

RE: Problems with Netskope client connection using HTTP/2 - Added by gstrauss about 2 months ago

Which hex variant do you want to have

Does not matter to me. I'll be walking along decoding the HTTP/2 protocol in my head (with the RFC as a reference).

If the client sends the request and lighttpd is sending a protocol error in response, then the client connection preface plus initial bytes might be enough to identify the issue. If there are some round trips between client and server, then I might need to instrument the lighttpd code so that we see the sequence and quantity of bytes that went to TLS for read/write.

RE: Problems with Netskope client connection using HTTP/2 - Added by flynn about 2 months ago

I used this additional code in h2_recv_client_connection_preface():

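    if (cq->first) {
      /* buffer_clen() returns an unsigned length, so print it with %u */
      fprintf(stderr, "Length con->read_queue->first->mem: %u\n", buffer_clen(cq->first->mem));
      for (uint32_t n = 0; n < buffer_clen(cq->first->mem); ++n)
        /* cast to unsigned char so bytes >= 0x80 are not sign-extended */
        fprintf(stderr, "%02X ", (unsigned char)cq->first->mem->ptr[n]);
      fprintf(stderr, "\n");
    }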

and got:
Length con->read_queue->first->mem: 24
50 52 49 20 2A 20 48 54 54 50 2F 32 2E 30 0D 0A 0D 0A 53 4D 0D 0A 0D 0A 
Length con->read_queue->first->mem: 24
50 52 49 20 2A 20 48 54 54 50 2F 32 2E 30 0D 0A 0D 0A 53 4D 0D 0A 0D 0A

The access log shows different lengths:
163.116.179.56 x.y.z - [15/Oct/2022:02:31:01 +0200] "PRI * HTTP/2.0" 100 78 "-" "-" 
163.116.179.56 x.y.z - [15/Oct/2022:02:31:08 +0200] "PRI * HTTP/2.0" 100 61 "-" "-" 

Note: I will not be able to run further tests until Tuesday.

RE: Problems with Netskope client connection using HTTP/2 - Added by gstrauss about 2 months ago

24 bytes is the exact length of the HTTP/2 client connection preface PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n, so whatever the issue is, it occurs some time after.
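
The 24-byte claim can be checked against the hex dump with a small sketch like the one below. The helper starts_with_h2_preface() is hypothetical and only illustrates the comparison; it is not the lighttpd function h2_recv_client_connection_preface().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The fixed HTTP/2 client connection preface (RFC 9113). */
static const char h2_preface[] = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n";
#define H2_PREFACE_LEN (sizeof(h2_preface) - 1)   /* 24, excluding the NUL */

/* Return 1 if buf begins with the complete preface, 0 otherwise.
 * Hypothetical helper for illustration; not taken from lighttpd. */
static int starts_with_h2_preface(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= H2_PREFACE_LEN
        && memcmp(buf, h2_preface, H2_PREFACE_LEN) == 0;
}
```

Feeding it the 24 bytes from the dump (50 52 49 20 ... 0D 0A 0D 0A) matches the preface exactly, with nothing left over.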

I'll try to put together an instrumentation patch for Tue.

Thinking aloud: maybe the Netskope client does not handle the SETTINGS frame that is sent by the server, which includes two newer HTTP/2 settings, SETTINGS_ENABLE_CONNECT_PROTOCOL and SETTINGS_NO_RFC7540_PRIORITIES. You might try testing the Netskope client with lighttpd 1.4.64, as the newer settings were added in lighttpd 1.4.65.

RE: Problems with Netskope client connection using HTTP/2 - Added by flynn about 2 months ago

Switching to lighttpd 1.4.64 seems to solve the issue; several connection attempts were successful.

RE: Problems with Netskope client connection using HTTP/2 - Added by gstrauss about 2 months ago

Client and server should be able to handle (and ignore) unknown settings. For testing, you can try disabling those extra settings:

--- a/src/h2.c
+++ b/src/h2.c
@@ -1974,7 +1974,7 @@ h2_init_con (request_st * const restrict h2r, connection * const restrict con, c

     static const uint8_t h2settings[] = { /*(big-endian numbers)*/
       /* SETTINGS */
-      0x00, 0x00, 0x1e        /* frame length */ /* 5 * (6 bytes per setting) */
+      0x00, 0x00, 0x12        /* frame length */ /* 3 * (6 bytes per setting) */
      ,H2_FTYPE_SETTINGS       /* frame type */
      ,0x00                    /* frame flags */
      ,0x00, 0x00, 0x00, 0x00  /* stream identifier */
@@ -1999,10 +1999,12 @@ h2_init_con (request_st * const restrict h2r, connection * const restrict con, c
      #endif
      ,0x00, H2_SETTINGS_MAX_HEADER_LIST_SIZE
      ,0x00, 0x00, 0xFF, 0xFF  /* 65535 */
+  #if 0
      ,0x00, H2_SETTINGS_ENABLE_CONNECT_PROTOCOL
      ,0x00, 0x00, 0x00, 0x01  /* 1 */
      ,0x00, H2_SETTINGS_NO_RFC7540_PRIORITIES
      ,0x00, 0x00, 0x00, 0x01  /* 1 */
+  #endif

       /* WINDOW_UPDATE */
      ,0x00, 0x00, 0x04        /* frame length */
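
As a rough illustration of what "handle (and ignore) unknown settings" means on the receiving side, the sketch below walks a SETTINGS payload of 6-byte entries (16-bit identifier, 32-bit value, both big-endian) and skips identifiers it does not recognize. The struct and function names are hypothetical; only the identifier values (0x3, 0x4, and 0x9 for SETTINGS_NO_RFC7540_PRIORITIES) come from the RFCs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver state: two known settings plus a counter. */
struct peer_settings {
    uint32_t max_concurrent_streams;   /* identifier 0x3 */
    uint32_t initial_window_size;      /* identifier 0x4 */
    unsigned unknown_seen;             /* entries that were skipped */
};

/* Parse a SETTINGS payload. Returns 0 on success, or -1 if the length
 * is not a multiple of 6 (a real endpoint would treat that as
 * FRAME_SIZE_ERROR). */
static int parse_settings(const uint8_t *p, size_t len, struct peer_settings *s)
{
    assert(p != NULL && s != NULL);
    if (len % 6 != 0) return -1;
    for (size_t i = 0; i < len; i += 6) {
        uint16_t id = (uint16_t)((p[i] << 8) | p[i + 1]);
        uint32_t v  = ((uint32_t)p[i + 2] << 24) | ((uint32_t)p[i + 3] << 16)
                    | ((uint32_t)p[i + 4] << 8)  |  (uint32_t)p[i + 5];
        switch (id) {
          case 0x3: s->max_concurrent_streams = v; break;
          case 0x4: s->initial_window_size    = v; break;
          default:  ++s->unknown_seen;             break; /* ignore, no error */
        }
    }
    return 0;
}
```

A peer written this way keeps working when it receives identifier 0x9, since the unknown entry is counted and skipped rather than rejected.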

RE: Problems with Netskope client connection using HTTP/2 - Added by flynn about 2 months ago

It works with the patch above (disabling both extra settings).

I also tested only disabling one extension and can track the problem down to H2_SETTINGS_NO_RFC7540_PRIORITIES:

      0x00, 0x00, 0x18        /* frame length */ /* 4 * (6 bytes per setting) */
 ...
#if 0
     ,0x00, H2_SETTINGS_NO_RFC7540_PRIORITIES
     ,0x00, 0x00, 0x00, 0x01  /* 1 */
#endif

With this minimal change it still connects and transfers data.
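
The frame length byte in the patches above has to track the number of settings by hand (0x1e for five, 0x18 for four, 0x12 for three). A minimal sketch of a serializer that derives the length from the setting count is shown below; build_settings_frame() and struct h2_setting are hypothetical names, and lighttpd itself uses a static byte array instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct h2_setting { uint16_t id; uint32_t value; };

/* Serialize a SETTINGS frame (9-byte header + 6 bytes per setting) into
 * out. Returns the number of bytes written, or 0 if it does not fit. */
static size_t build_settings_frame(uint8_t *out, size_t outsz,
                                   const struct h2_setting *s, size_t n)
{
    assert(out != NULL && (s != NULL || n == 0));
    if (n > 0xFFFFFF / 6) return 0;          /* length field is 24 bits */
    size_t payload = 6 * n;
    if (outsz < 9 + payload) return 0;
    out[0] = (uint8_t)(payload >> 16);       /* frame length, big-endian */
    out[1] = (uint8_t)(payload >> 8);
    out[2] = (uint8_t)payload;
    out[3] = 0x04;                           /* frame type: SETTINGS */
    out[4] = 0x00;                           /* frame flags */
    memset(out + 5, 0, 4);                   /* stream identifier 0 */
    uint8_t *p = out + 9;
    for (size_t i = 0; i < n; ++i, p += 6) {
        p[0] = (uint8_t)(s[i].id >> 8);
        p[1] = (uint8_t)s[i].id;
        p[2] = (uint8_t)(s[i].value >> 24);
        p[3] = (uint8_t)(s[i].value >> 16);
        p[4] = (uint8_t)(s[i].value >> 8);
        p[5] = (uint8_t)s[i].value;
    }
    return 9 + payload;
}
```

Passing four settings yields a length byte of 0x18 and three settings yields 0x12, matching the hand-edited values in the two patches.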

RE: Problems with Netskope client connection using HTTP/2 - Added by gstrauss about 2 months ago

Would you open an issue with Netskope and ask them to look into it? At this moment, this does not appear to be an issue with lighttpd.
https://www.rfc-editor.org/rfc/rfc9218 Extensible Prioritization Scheme for HTTP

RE: Problems with Netskope client connection using HTTP/2 - Added by gstrauss about 2 months ago

BTW, you can safely use lighttpd 1.4.67 with your minimal patch. Sending SETTINGS_NO_RFC7540_PRIORITIES is merely a hint to the client that lighttpd will ignore HTTP/2 PRIORITY frames from the client, and so the client need not waste resources keeping state to support PRIORITY. lighttpd (and most web servers) never implemented support for PRIORITY because of complexity and DoS risks, among other reasons, so the lighttpd behavior is unchanged whether or not SETTINGS_NO_RFC7540_PRIORITIES is sent. The dearth of uptake for PRIORITY is what led to the development of PRIORITY_UPDATE as an alternative. lighttpd supports HTTP/2 PRIORITY_UPDATE.

RE: Problems with Netskope client connection using HTTP/2 - Added by flynn about 2 months ago

gstrauss wrote in RE: Problems with Netskope client connection using HTTP/2:

Would you open an issue with Netskope and ask them to look into it? At this moment, this does not appear to be an issue with lighttpd.
https://www.rfc-editor.org/rfc/rfc9218 Extensible Prioritization Scheme for HTTP

I cannot really open an issue, because I'm not the official customer. The support in the company just says: use the Netskope client (it is enabled by default) and disable it if there is a problem, that's all. We have had other issues with Netskope, e.g. the missing Let's Encrypt CA, where we explained the exact problem and nothing happened. I will not waste any more time on this product.

gstrauss wrote in RE: Problems with Netskope client connection using HTTP/2:

BTW, you can safely use lighttpd 1.4.67 with your minimal patch. Sending SETTINGS_NO_RFC7540_PRIORITIES is merely a hint to the client that lighttpd will ignore HTTP/2 PRIORITY frames from the client, and so the client need not waste resources keeping state to support PRIORITY. lighttpd (and most web servers) never implemented support for PRIORITY because of complexity and DoS risks, among other reasons, so the lighttpd behavior is unchanged whether or not SETTINGS_NO_RFC7540_PRIORITIES is sent. The dearth of uptake for PRIORITY is what led to the development of PRIORITY_UPDATE as an alternative. lighttpd supports HTTP/2 PRIORITY_UPDATE.

That would have been my next question and I can live well with this solution.
Thank you for your help tracking down this weird client issue.

RE: [Solved] Problems with Netskope client connection using HTTP/2 - Added by gstrauss about 2 months ago

FYI: In the chat box on Netskope's website, I reported the apparent issue with Netskope mishandling the HTTP/2 setting. Jessica in marketing there said she would try to forward the bug to the internal dev team. We'll see...
