Project

General

Profile

[Solved] mod_proxy not re-enabling proxy with 1.4.48

Added by jens-maus about 6 years ago

Hi lighttpd users and developers,

I am seeking for some help regarding some problems I am currently facing in porting my application from currently perfectly using lighttpd 1.4.45 to lighttpd 1.4.48 where I am facing some issues with mod_proxy features that were working perfectly fine previously.

The mod_proxy configuration I am currently using is:

$HTTP["url"] !~ "^/(config/)|(upnp/)|(webui/)|(ise/)|(api/)|(tools/)|(pda)|(pages/jpages)|(addons).*" {
  proxy.server = ("/" => (( "host" => "127.0.0.1", "port" => 8183 )))
}

As you can see I am redirecting certain URLs to a different port (8183) where an application is usually serving HTTP requests itself.

During bootup of the embedded system using this lighttpd and port 8183 application lighttpd is started up directly after the network had been setup and serves special error pages noticing the user during bootup that the application usually listening on port 8183 is not yet started up. And as soon as this special application was started lighttpd usually starts to forward any http or https requests to port 8183 according to the above proxy.server definition. As said, this is working perfectly with lighttpd 1.4.45 and the logfile shows the following during startup:

2018-03-03 10:45:45: (log.c.217) server started 
2018-03-03 10:46:13: (mod_proxy.c.1041) establishing connection failed: Connection refused 
2018-03-03 10:46:13: (mod_proxy.c.1153) proxy-server disabled: 127.0.0.1 8183 13 
2018-03-03 10:46:13: (mod_proxy.c.523) no proxy-handler found for: / 
2018-03-03 10:46:14: (mod_proxy.c.1041) establishing connection failed: Connection refused 
2018-03-03 10:46:14: (mod_proxy.c.1153) proxy-server disabled: 127.0.0.1 8183 14 
2018-03-03 10:46:14: (mod_proxy.c.523) no proxy-handler found for: /tclrega.exe 
2018-03-03 10:46:17: (mod_proxy.c.523) no proxy-handler found for: /tclrega.exe 
2018-03-03 10:46:18: (mod_proxy.c.1417) proxy - re-enabled: 127.0.0.1 8183 
2018-03-03 10:46:19: (mod_proxy.c.1417) proxy - re-enabled: 127.0.0.1 8183 
2018-03-03 10:46:20: (mod_proxy.c.1041) establishing connection failed: Connection refused 
2018-03-03 10:46:20: (mod_proxy.c.1153) proxy-server disabled: 127.0.0.1 8183 16 
2018-03-03 10:46:20: (mod_proxy.c.523) no proxy-handler found for: /tclrega.exe 
2018-03-03 10:46:23: (mod_proxy.c.523) no proxy-handler found for: /tclrega.exe 
2018-03-03 10:46:25: (mod_proxy.c.1417) proxy - re-enabled: 127.0.0.1 8183

As can be seen, lighttpd starts successfully but recognises that nobody is listening on port 8183 (thus the "Connection refused" message) and thus disables the proxy processes for the time being. And then as soon as something on port 8183 starts to listen it successfully re-enables the proxy processes.

With lighttpd 1.4.48, however, this doesn't seem to work anymore and lighttpd continues to serve the error page notifying usually that the port 8183 application is still not started up correctly. The logfile shows the following:

2018-03-03 20:09:35: (server.c.1412) server started (lighttpd/1.4.48) 
2018-03-03 20:10:24: (gw_backend.c.933) gw - found a host 127.0.0.1 8183 
2018-03-03 20:10:24: (gw_backend.c.972) connect delayed; will continue later: tcp:127.0.0.1:8183 
2018-03-03 20:10:25: (gw_backend.c.243) establishing connection failed: Connection refused socket: tcp:127.0.0.1:8183 
2018-03-03 20:10:25: (gw_backend.c.308) released proc: pid: 0 socket: tcp:127.0.0.1:8183 load: 0 
2018-03-03 20:10:25: (gw_backend.c.958) all handlers for /? on / are down. 
2018-03-03 20:10:32: (gw_backend.c.933) gw - found a host 127.0.0.1 8183 
2018-03-03 20:10:32: (gw_backend.c.972) connect delayed; will continue later: tcp:127.0.0.1:8183 
2018-03-03 20:10:32: (gw_backend.c.243) establishing connection failed: Connection refused socket: tcp:127.0.0.1:8183 
2018-03-03 20:10:32: (gw_backend.c.308) released proc: pid: 0 socket: tcp:127.0.0.1:8183 load: 0 
2018-03-03 20:10:32: (gw_backend.c.958) all handlers for /tclrega.exe? on / are down. 
2018-03-03 20:10:36: (gw_backend.c.933) gw - found a host 127.0.0.1 8183 
2018-03-03 20:10:36: (gw_backend.c.972) connect delayed; will continue later: tcp:127.0.0.1:8183 
2018-03-03 20:10:36: (gw_backend.c.243) establishing connection failed: Connection refused socket: tcp:127.0.0.1:8183 
2018-03-03 20:10:36: (gw_backend.c.308) released proc: pid: 0 socket: tcp:127.0.0.1:8183 load: 0 
2018-03-03 20:10:36: (gw_backend.c.958) all handlers for /? on / are down. 
2018-03-03 20:10:36: (gw_backend.c.933) gw - found a host 127.0.0.1 8183 
2018-03-03 20:10:36: (gw_backend.c.972) connect delayed; will continue later: tcp:127.0.0.1:8183 
2018-03-03 20:10:36: (gw_backend.c.243) establishing connection failed: Connection refused socket: tcp:127.0.0.1:8183 
2018-03-03 20:10:36: (gw_backend.c.308) released proc: pid: 0 socket: tcp:127.0.0.1:8183 load: 0 
2018-03-03 20:10:36: (gw_backend.c.958) all handlers for /tclrega.exe? on / are down. 

As can be seen, also lighttpd 1.4.48 recognises that nothing is listening on port 8183 (connection refused) and then disables the proxy (which actually is a spinoff of gw_backend). However, the proxy is never re-enabled regardless of the fact that the port 8183 application is successfully waiting for requests. Only forcing a reload of the lighttpd configuration (sending a -HUP signal to lighttpd) will recover this issue and lighttpd then successfully starts forwarding requests to the port 8183 application.

Thus, I am really wondering what might have changed between lighttpd 1.4.45 and 1.4.48 as this is currently a blocker for my application updating lighttpd to a newer version. Is there a config setting which might be required or does this potentially point at a new regression bug in lighttpd due to changes regarding mod_proxy -> gw_backend changes?

Any help is highly appreciated.

Jens


Replies (7)

RE: mod_proxy not re-enabling proxy with 1.4.48 - Added by gstrauss about 6 years ago

From the logs you show for 1.4.45, the logs never get to the point of showing a successful connection. Please verify that you are seeing the desired behavior with 1.4.45 as the logs you shared cut off before a successful backend connection is made.

From the logs you show for 1.4.48, it looks like lighttpd is retrying the socket connection. If you strace lighttpd, is that not the case?
Please verify that your backend is indeed running on 127.0.0.1:8183, e.g. look at output of netstat -lnt My first guess is that with your new setup, something is not actually starting your backend on 127.0.0.1:8183, or it is for some reason not reachable. While on the embedded system, try telnet 127.0.0.1 8183 and make sure you can connect. (Then Ctrl-] and quit to quit the telnet)

As an aside: since you're not using the captures, this regex is slightly more efficient:

$HTTP["url"] !~ "^/config/|upnp/|webui/|ise/|api/|tools/|pda|pages/jpages|addons" { ... }

RE: mod_proxy not re-enabling proxy with 1.4.48 - Added by jens-maus about 6 years ago

Thanks for your reply. Please note, however, that the above logs are the content of lighttpd-error.log and not lighttpd-access.log. Therefore you can't of course see the point where the actual connection to the application on 8183 succeeds (with 1.4.45 in place) due to the backend successfully started.

Also note, that I verified that the backend starts up correctly and listens on port 8183 when lighttpd is running (checked with netstat and telnet). So, yes, the new setup is perfectly starting the backend on :8183, but lighttpd 1.4.48 seems to have already shut down all proxy processes without re-enabling them like 1.4.45 is actually doing. In fact, I verified that several times and also by simply downgrading the lighttpd version 1.4.48 to 1.4.45 in the exact same setup and with 1.4.45 it perfectly picks up the backend start at :8183 and 1.4.48 unfortunately doesn't re-enable the proxy processes and thus doesn't recognise that something is listening at port :8183 after a while and thus it never forwards requests to :8183 even thought the backend was started and is in listen mode. And as said, as soon as I send a -HUP signal to the lighttpd process to reinitialise lighttpd starts to correctly forward http requests to the listening backend.

RE: mod_proxy not re-enabling proxy with 1.4.48 - Added by gstrauss about 6 years ago

When I made a simple test to try to reproduce this in 1.4.48, things worked fine for me.

lighttpd 1.4.48 reports "connect delayed; will continue later: tcp:127.0.0.1:8183", which is an indication that lighttpd 1.4.48 has re-enabled the backend and is making an attempt to connect to the backend. Are you not seeing a connect() call in strace of the lighttpd 1.4.48 server? Does your backend only allow one connection at a time?

Have you faithfully supplied the lighttpd config you are using? (I know that you have not done so because your error log shows debug output, which is not enabled by default.) From what you have provided, lighttpd is not starting the backend. As I have said, I have tested a simple scenario where an external process starts the backend, and, as I expected, lighttpd detects when that backend is available, so please try to provide some more information about your configuration and how to reproduce this problem.

Based on the timestamps in the error logs provided, please check that you are running a single instance of lighttpd, that it has been started using the config file you expect, and that there is a single process running, not multiple workers (while you troubleshoot).

BTW, sending lighttpd a SIGHUP does not cause it to re-read configuration files. It currently only cycles log files, so maybe things are working and you do not realize it since you are confused about the log files. Please see documentation about signals in lighttpd InstallFromSource

RE: mod_proxy not re-enabling proxy with 1.4.48 - Added by jens-maus about 6 years ago

You are right that I was actually not verbose enough. Sorry for that. So let me please elaborate a bit more on the issue to make it more obvious to you that lighttpd 1.4.48 actually really behaves different than 1.4.45 in respect to reenabling proxy connections.

After your comments I now tried to reproduce the problem on my Linux x86 development system by once compiling lighttpd 1.4.45 and 1.4.48 from scratch and applying a stripped down configuration that actually demonstrates the problem. So here is the test configuration I am using:

var.log_root    = "/tmp" 
var.state_dir   = "/tmp" 
var.home_dir    = "/tmp" 
var.server_root = "/tmp" 
server.modules = (
  "mod_proxy" 
)
server.port = 8080
$SERVER["socket"] == "[::]:8080" {
  server.document-root = server_root
}
server.document-root = server_root
server.pid-file = state_dir + "/lighttpd.pid" 
server.max-worker = 1
server.errorlog = log_root + "/lighttpd-error.log" 
$HTTP["url"] !~ "^/(config/)|(upnp/)|(webui/)|(ise/)|(api/)|(tools/)|(pda)|(pages/jpages)|(addons).*" {
  proxy.server = ("/" => (( "host" => "127.0.0.1", "port" => 8183 )))
  proxy.debug = 65535
}

With this standalone lighttpd.conf configuration I am able to perfectly reproduce the problem I was trying to explain above.

So please try the following procedure to reproduce this issue on your side:

  1. clone the lighttpd git repository
  2. checkout the "lighttpd-1.4.45" tag
  3. compile lighttpd (after autogen.sh, configure, make, etc.)
  4. place the lighttpd.conf with the above content in the top-level of the checked out lighttpd git repository
  5. start lighttpd 1.4.45 with the following command:
    src/lighttpd -f lighttpd.conf -m src/.libs -D
    
  6. try to connect to port 8080 using a web browser
  7. notice that the web browser will return a 503 - Service Not Available error because nothing is listening on port 127.0.0.1:8183 yet.
  8. use the following netcat command on the same system like you started lighttpd to start listen on port 8183 for forwarded http requests:
    netcat -l -p 8183
    
  9. retry to connect to port 8080 using your web browser and notice that netcat is correctly showing the connection request in its console output.
  10. have a look at the /tmp/lighttpd-error.log and notice the line (mod_proxy.c.1417) proxy - re-enabled: 127.0.0.1 8183 suggesting that lighttpd 1.4.45 is actually re-enabling the proxy as soon as netcat is successfully started to listen at port 8183 and the web browser is refreshed to connect to port 8080.

Now retry the above sequence with lighttpd 1.4.48 and you should be able to notice that step 9 and 10 are actually showing a completely different behaviour. In fact, no matter if netcat is listening at port 8183 or not lighttpd is never forwarding requests to the defined proxy port 8183 and the web browser is thus always showing a 503 - Service Not Available error message. Only if lighttpd is started AFTER netcat it correctly picks up that at port 8183 there is actually something listening and thus correctly forwards http requests to it from the beginning. In fact, even in this scenario you can stop netcat and refresh your web browser to reconnect to port 8080 and then start netcat again and then you will notice that lighttpd 1.4.48 will also not pickup that netcat was restarted and thus never re-enables the proxy functionality.

Hopefully this makes it more clear now and independent of my specific application for which I previously tried to explain the misbehaviour.

Any idea how to fix this issue in lighttpd or is there some new configuration option that might be able to bring back the old behaviour?

RE: mod_proxy not re-enabling proxy with 1.4.48 - Added by gstrauss about 6 years ago

The simpler config, the better:

server.port = 8080
server.document-root = "/tmp"  # (not a good idea except on private machines for testing purposes)
server.modules = ( "mod_proxy" )
proxy.server = ("/" => (( "host" => "127.0.0.1", "port" => 8183 )))
proxy.debug = 65535

Instead of running netcat for your backend, try run a web server. Things work fine for me if I start up a second instance of lighttpd listening on 127.0.0.1:8183 after I start up the proxying lighttpd with the minimal config above.

Try simplifying your config more and (temporarily) removing the URL tests on what you are proxying. After you get the proxying working, you probably should look at that regex more closely and test that it does what you think it should be doing.

RE: mod_proxy not re-enabling proxy with 1.4.48 - Added by jens-maus about 6 years ago

Thanks once more for continuing investigating the issue with me and thanks for the much simpler config. In fact, with your simple config it even works with netcat listening on port 8183 and lighttpd 1.4.48. However, I was playing a around a bit comparing our two simpler configurations and I think I found the config option that causes the different behaviour with 1.4.45 compare to 1.4.48. Have a look at that config:

server.port = 8080
server.document-root = "/tmp"  # (not a good idea except on private machines for testing purposes)
server.modules = ( "mod_proxy" )
proxy.server = ("/" => (( "host" => "127.0.0.1", "port" => 8183 )))
proxy.debug = 65535
server.max-worker = 1

Please note the server.max-worker = 1 at the very end of the config file. As soon as you add this line my above reported behaviour occurs. If I remove it or set it to 0 the re-enabling of the forward proxy starts working fine again. However, in my productive environment I was always using server.max-worker = 4 to actually start 4 workers waiting for connections. I can of course remove that it get it working again, but I would like to understand why setting server.max-worker to something different like 0 in 1.4.48 results in the noticed behaviour? Is this actually something that is intended or does this point at a potential bug in lighttpd?

RE: mod_proxy not re-enabling proxy with 1.4.48 - Added by gstrauss about 6 years ago

That is unfortantely a bug, introduced in lighttpd 1.4.46 with commit 9030cfaec. That commit moved backend restarts to the master lighttpd process when server.max-worker was non-zero. (Otherwise, there was the potential for bad behavior if multiple workers attempted to restart a backend, especially on a unix domain socket.)

Here's a patch, which will be committed to lighttpd git master shortly. Thanks for your persistence in report and reproducing it.

--- a/src/gw_backend.c
+++ b/src/gw_backend.c
@@ -2488,17 +2488,31 @@ static void gw_handle_trigger_exts(server *srv, gw_exts *exts, int debug) {
     }
 }

+static void gw_handle_trigger_exts_wkr(server *srv, gw_exts *exts) {
+    for (size_t j = 0; j < exts->used; ++j) {
+        gw_extension * const ex = exts->exts[j];
+        for (size_t n = 0; n < ex->used; ++n) {
+            gw_host * const host = ex->hosts[n];
+            for (gw_proc *proc = host->first; proc; proc = proc->next) {
+                if (proc->state == PROC_STATE_OVERLOADED)
+                    gw_proc_check_enable(srv, host, proc);
+            }
+        }
+    }
+}
+
 handler_t gw_handle_trigger(server *srv, void *p_d) {
     gw_plugin_data *p = p_d;
-    if (0 != srv->srvconf.max_worker && p->srv_pid != srv->pid)
-        return HANDLER_GO_ON;
+    int wkr = (0 != srv->srvconf.max_worker && p->srv_pid != srv->pid);

     for (size_t i = 0; i < srv->config_context->used; i++) {
         gw_plugin_config *conf = p->config_storage[i];
         gw_exts *exts = conf->exts;
         int debug = conf->debug ? conf->debug : p->config_storage[0]->debug;
         if (NULL == exts) continue;
-        gw_handle_trigger_exts(srv, exts, debug);
+        wkr
+          ? gw_handle_trigger_exts_wkr(srv, exts)
+          : gw_handle_trigger_exts(srv, exts, debug);
     }

     return HANDLER_GO_ON;

As an aside, multiple workers on an embedded system is very likely unnecessary. lighttpd is very fast and so server.max-worker = 0 is recommended unless on very large systems in some high-traffic situations, e.g. where the time accept()ing new connections in the kernel is a bottleneck.

    (1-7/7)