Project

General

Profile

Actions

Bug #3241

closed

[mod_wstunnel] websockets wss wstunnel.ping-interval

Added by jushou 2 months ago. Updated 2 months ago.

Status:
Fixed
Priority:
Normal
Category:
mod_wstunnel
Target version:
ASK QUESTIONS IN Forums:
No

Description

The configuration for mod_wstunnel is as follows:
cat /etc/lighttpd/conf.d/ws.conf

server.modules += ( "mod_wstunnel" )
$HTTP["url"] =~ "^/webchat" {
    wstunnel.server = (
        "" => ((
            "host" => "0.0.0.0",
            "port" => "5000" 
        ))
    )
    wstunnel.frame-type = "text" 
    wstunnel.ping-interval = "30" 
    wstunnel.debug = "1" 
}

When connecting with ws://192.168.1.1:80/webchat using websocket, everything works fine (ping-pong is normal).
However, when switching to wss connection with wss://192.168.1.1:443/webchat, the wstunnel.ping-interval parameter does not take effect (no ping is sent).

I added debug logging to the mod_wstunnel.c file in the mod_wstunnel_handle_trigger function as shown in the following code snippet:

gw_handle_trigger(srv, p_d);
for (connection *con = srv->conns; con; con = con->next) {
    request_st * const r = &con->request;
    handler_ctx *hctx = r->plugin_ctx[p->id];
    log_error(srv->errh, __FILE__, __LINE__, "svr=%p p_d=%p con=%p con->fd=%d hctx=%p r=%p p->id=%d", srv, p_d, con, con->fd, hctx,r, p->id);
    if (NULL == hctx || r->handler_module != p->self)

The log output at this point is as follows:
2024-02-28 09:33:49: (mod_wstunnel.c.620) svr=0x151c4a0 p_d=0x15c59b0 con=0x15f94f0, hctx=(nil) r=0x15f94f0 p->id=7
The hctx is NULL, which causes the subsequent code not to be executed. Why is it NULL in this case?


Files

error.log (20.5 KB) error.log jushou, 2024-02-28 10:09
Actions #1

Updated by gstrauss 2 months ago

  • Category set to mod_wstunnel

mod_wstunnel is agnostic to clear-text (ws://) or TLS (wss://)

However, at first glance, I think that mod_wstunnel_handle_trigger() in mod_wstunnel might not handle HTTP/2 extended CONNECT.
For testing purposes only, please temporarily disable HTTP/2 in your config and see if wss:// works.
server.feature-flags += ("server.h2proto" => "disable")

Actions #2

Updated by jushou 2 months ago · Edited

gstrauss wrote in #note-1:

mod_wstunnel is agnostic to clear-text (ws://) or TLS (wss://)

However, at first glance, I think that mod_wstunnel_handle_trigger() in mod_wstunnel might not handle HTTP/2 extended CONNECT.
For testing purposes only, please temporarily disable HTTP/2 in your config and see if wss:// works.
server.feature-flags += ("server.h2proto" => "disable")

Indeed, it worked after adding server.feature-flags += ("server.h2proto" => "disable")! Would you consider this a bug in mod_wstunnel under the HTTP/2 protocol? Is there a way to ensure it works correctly under HTTP/2?

Actions #3

Updated by gstrauss 2 months ago · Edited

Indeed, it worked after adding server.feature-flags += ("server.h2proto" => "disable")! Would you consider this a bug in mod_wstunnel under the HTTP/2 protocol? Is there a way to ensure it works correctly under HTTP/2?

Yes, it is a bug. The impact is primarily limited to the functionality of wstunnel.ping-interval for websocket negotiated via HTTP/2 extended CONNECT.

I have a preliminary patch if you would like to test:
[Edit: do not use this patch; see later posts]

diff --git a/src/mod_wstunnel.c b/src/mod_wstunnel.c
index 2789650f..7ac06634 100644
--- a/src/mod_wstunnel.c
+++ b/src/mod_wstunnel.c
@@ -609,30 +609,22 @@ static handler_t mod_wstunnel_check_extension(request_st * const r, void *p_d) {
       : rc;
 }

-TRIGGER_FUNC(mod_wstunnel_handle_trigger) {
-    const plugin_data * const p = p_d;
-    const unix_time64_t cur_ts = log_monotonic_secs + 1;
-
-    gw_handle_trigger(srv, p_d);
-
-    for (connection *con = srv->conns; con; con = con->next) {
-        request_st * const r = &con->request;
-        handler_ctx *hctx = r->plugin_ctx[p->id];
-        if (NULL == hctx || r->handler_module != p->self)
-            continue;
-
+__attribute_cold__
+static void mod_wstunnel_handle_trigger_hctx (handler_ctx * const hctx, void *p_d, const unix_time64_t cur_ts, request_st * const r, connection * const con) {
         if (hctx->gw.state != GW_STATE_WRITE && hctx->gw.state != GW_STATE_READ)
-            continue;
+            return;

-        if (cur_ts - con->read_idle_ts > r->conf.max_read_idle) {
+        /*(note: use per req hctx->gw.write_ts instead of con->read_idle_ts
+         * since we expect traffic from client on websocket) */
+        if (cur_ts - hctx->gw.write_ts > r->conf.max_read_idle) {
             DEBUG_LOG_INFO("timeout client (fd=%d)", con->fd);
             mod_wstunnel_frame_send(hctx,MOD_WEBSOCKET_FRAME_TYPE_CLOSE,NULL,0);
             gw_handle_request_reset(r, p_d);
             joblist_append(con);
-            /* avoid server.c closing connection with error due to max_read_idle
+            /* defer server.c closing connection with error due to max_read_idle
              * (might instead run joblist after plugins_call_handle_trigger())*/
-            con->read_idle_ts = cur_ts;
-            continue;
+            ++con->read_idle_ts;
+            return;
         }

         if (0 != hctx->hybivers
@@ -641,7 +633,31 @@ TRIGGER_FUNC(mod_wstunnel_handle_trigger) {
             hctx->ping_ts = cur_ts;
             mod_wstunnel_frame_send(hctx, MOD_WEBSOCKET_FRAME_TYPE_PING, CONST_STR_LEN("ping"));
             joblist_append(con);
-            continue;
+            return;
+        }
+}
+
+TRIGGER_FUNC(mod_wstunnel_handle_trigger) {
+    const plugin_data * const p = p_d;
+    const unix_time64_t cur_ts = log_monotonic_secs + 1;
+
+    gw_handle_trigger(srv, p_d);
+
+    for (connection *con = srv->conns; con; con = con->next) {
+        hxcon * const h2c = con->hx;
+        if (NULL != h2c) {
+            for (uint32_t i = 0, rused = h2c->rused; i < rused; ++i) {
+                request_st * const r = h2c->r[i];
+                handler_ctx * const hctx = r->plugin_ctx[p->id];
+                if (NULL != hctx && r->handler_module == p->self)
+                    mod_wstunnel_handle_trigger_hctx(hctx, p_d, cur_ts, r, con);
+            }
+        }
+        else {
+            request_st * const r = &con->request;
+            handler_ctx * const hctx = r->plugin_ctx[p->id];
+            if (NULL != hctx && r->handler_module == p->self)
+                mod_wstunnel_handle_trigger_hctx(hctx, p_d, cur_ts, r, con);
         }
     }

Actions #4

Updated by gstrauss 2 months ago

  • Status changed from New to Patch Pending
  • Target version changed from 1.4.xx to 1.4.75
Actions #5

Updated by jushou 2 months ago

gstrauss wrote in #note-3:

Indeed, it worked after adding server.feature-flags += ("server.h2proto" => "disable")! Would you consider this a bug in mod_wstunnel under the HTTP/2 protocol? Is there a way to ensure it works correctly under HTTP/2?

Yes, it is a bug. The impact is primarily limited to the functionality of wstunnel.ping-interval for websocket negotiated via HTTP/2 extended CONNECT.

I have a preliminary patch if you would like to test:
[Edit: patch updated]
[...]

The bug still persists. I tried debugging and came up with the following patch.

diff --git a/src/mod_wstunnel.c b/src/mod_wstunnel.c
index 7ac06634..6801b148 100644
--- a/src/mod_wstunnel.c
+++ b/src/mod_wstunnel.c
@@ -627,10 +627,12 @@ static void mod_wstunnel_handle_trigger_hctx (handler_ctx * const hctx, void *p_
             return;
         }

+        DEBUG_LOG_ERR("hctx->hybivers=%d hctx->conf.ping_interval=%d hctx->ping_ts=%ld cur_ts=%ld",hctx->hybivers, hctx->conf.ping_interval, hctx->ping_ts, cur_ts);
         if (0 != hctx->hybivers
             && hctx->conf.ping_interval > 0
             && (int32_t)hctx->conf.ping_interval + hctx->ping_ts < cur_ts) {
             hctx->ping_ts = cur_ts;
+            DEBUG_LOG_ERR("%s", "send ping");
             mod_wstunnel_frame_send(hctx, MOD_WEBSOCKET_FRAME_TYPE_PING, CONST_STR_LEN("ping"));
             joblist_append(con);
             return;

The logs are as follows, and the wss connection still gets disconnected after 60 seconds.

2024-02-28 16:45:14: (server.c.1939) server started (lighttpd/1.4.74)
2024-02-28 16:45:23: (gw_backend.c.945) gw - found a host 0.0.0.0 5000
2024-02-28 16:45:23: (gw_backend.c.272) got proc: pid: 0 socket: tcp:0.0.0.0:5000 load: 1
2024-02-28 16:45:23: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25867
2024-02-28 16:45:24: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25868
2024-02-28 16:45:25: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25869
2024-02-28 16:45:26: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25870
2024-02-28 16:45:27: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25871
2024-02-28 16:45:28: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25872
2024-02-28 16:45:29: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25873
2024-02-28 16:45:30: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25874
2024-02-28 16:45:31: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25875
2024-02-28 16:45:32: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25876
2024-02-28 16:45:33: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25877
2024-02-28 16:45:34: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25878
2024-02-28 16:45:35: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25879
2024-02-28 16:45:36: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25880
2024-02-28 16:45:37: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25881
2024-02-28 16:45:38: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25882
2024-02-28 16:45:39: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25883
2024-02-28 16:45:40: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25884
2024-02-28 16:45:41: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25885
2024-02-28 16:45:42: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25886
2024-02-28 16:45:43: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25887
2024-02-28 16:45:44: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25888
2024-02-28 16:45:45: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25889
2024-02-28 16:45:46: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25890
2024-02-28 16:45:47: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25891
2024-02-28 16:45:48: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25892
2024-02-28 16:45:49: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25893
2024-02-28 16:45:50: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25894
2024-02-28 16:45:51: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25895
2024-02-28 16:45:52: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25896
2024-02-28 16:45:53: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25866 cur_ts=25897
2024-02-28 16:45:53: (mod_wstunnel.c.635) send ping
2024-02-28 16:45:54: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25898
2024-02-28 16:45:55: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25899
2024-02-28 16:45:56: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25900
2024-02-28 16:45:57: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25901
2024-02-28 16:45:58: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25902
2024-02-28 16:45:59: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25903
2024-02-28 16:46:00: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25904
2024-02-28 16:46:01: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25905
2024-02-28 16:46:02: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25906
2024-02-28 16:46:03: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25907
2024-02-28 16:46:04: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25908
2024-02-28 16:46:05: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25909
2024-02-28 16:46:06: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25910
2024-02-28 16:46:07: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25911
2024-02-28 16:46:08: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25912
2024-02-28 16:46:09: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25913
2024-02-28 16:46:10: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25914
2024-02-28 16:46:11: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25915
2024-02-28 16:46:12: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25916
2024-02-28 16:46:13: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25917
2024-02-28 16:46:14: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25918
2024-02-28 16:46:15: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25919
2024-02-28 16:46:16: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25920
2024-02-28 16:46:17: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25921
2024-02-28 16:46:18: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25922
2024-02-28 16:46:19: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25923
2024-02-28 16:46:20: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25924
2024-02-28 16:46:21: (mod_wstunnel.c.630) hctx->hybivers=13 hctx->conf.ping_interval=30 hctx->ping_ts=25897 cur_ts=25925
2024-02-28 16:46:22: (gw_backend.c.352) released proc: pid: 0 socket: tcp:0.0.0.0:5000 load: 0

Actions #6

Updated by stbuehler 2 months ago

  • Subject changed from [mod_wstunnel ]:使用websockets (wss)wstunnel.ping-interval 参数并没有生效 to [mod_wstunnel ] websockets wss wstunnel.ping-interval
Actions #7

Updated by gstrauss 2 months ago · Edited

the wss connection still gets disconnected after 60 seconds.

I think that is different.

server.max-read-idle default value is 60 seconds. If the client does not respond, the read idle timeout will be reached.

lighttpd mod_wstunnel sending a ping will cause the connection write timestamp to be set to the current time, and will avoid reaching the server.max-write-idle timeout.

Is the client responding to the websocket ping?


In any case, I see from your debug trace that the ping is being sent. I have committed a different version of the patch to my development branch.
https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/personal/gstrauss/master

Actions #8

Updated by gstrauss 2 months ago

If you are running into lighttpd idle timeouts, a solution instead of wstunnel.ping-interval may be to modify server.max-read-idle and server.max-write-idle inside $HTTP["url"] =~ "^/webchat" { ... }

Actions #9

Updated by gstrauss 2 months ago

  • Subject changed from [mod_wstunnel ] websockets wss wstunnel.ping-interval to [mod_wstunnel] websockets wss wstunnel.ping-interval
  • Description updated (diff)
Actions #10

Updated by jushou 2 months ago

Is the client responding to the websocket ping? yes
Lines 97 and 98 in the error.log indicate the receipt of a pong message.

Actions #11

Updated by gstrauss 2 months ago

Ok. I'll have to try to reproduce this tomorrow. I am out of time for today, sorry.

Actions #12

Updated by gstrauss 2 months ago

Please test with the patch at the tip of
https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/personal/gstrauss/master
(instead of the patch posted above)

Actions #13

Updated by gstrauss 2 months ago

The patch on my dev branch should work better for you. The patch above using hctx->gw.write_ts did not because hctx->gw.write_ts is not updated for PONG as no data was sent to the backend.

Actions #15

Updated by gstrauss 2 months ago

Great. Thank you for testing.

This patch will be in the next release of lighttpd, which will probably be released some time in Q2/2024.

In the meantime, as I posted further above:

If you are running into lighttpd idle timeouts, a solution instead of wstunnel.ping-interval may be to modify server.max-read-idle and server.max-write-idle inside $HTTP["url"] =~ "^/webchat" { ... }

Actions #16

Updated by gstrauss 2 months ago

  • Status changed from Patch Pending to Fixed
Actions

Also available in: Atom