Bug #1519
Closed
1 of N backends defective -> 500 error for some clients
Description
I am running lighttpd on a central server that uses a few PHP FastCGI backend machines.
Everything works fine until one backend goes down: lighttpd then seems to mistakenly reply with an internal server error (500) to some clients, instead of redirecting every client to the remaining working backends (which are well loaded, but not overloaded). Lighttpd does periodically try to reconnect to the failed backend, and the error seems to occur when it actually tries to connect.
If there is no configuration switch to enable clean failover, I consider this a bug.
-- daniel
Updated by stbuehler about 17 years ago
Please post any error.log messages that could indicate why the backend failed.
If the backend failed after a successful connect (and possibly after some data had already been sent), the request is not redirected to another backend; it fails instead.
Updated by Anonymous about 17 years ago
I took a deeper look at the problem; it's a little different from what I described in my first entry.
The backend becomes defective because of a failing NFS mount, and from then on all requests to that backend are counted as "died" in the statistics counters. The socket connection to the FastCGI backend still exists in that case, but lighttpd also recognizes that the backend is dead. This is the situation in which the error occurs.
fastcgi.backend.b6.0.connected: 226767
fastcgi.backend.b6.0.died: 494
So the backend is not down but defective, and the clients assigned to it receive an internal server error. The error log periodically shows:
2008-01-13 16:26:58: (mod_fastcgi.c.2703) fcgi-server re-enabled: tcp:192.168.0.52:1025
2008-01-13 16:27:23: (mod_fastcgi.c.2757) establishing connection failed: Connection timed out socket: tcp:192.168.0.52:1025
2008-01-13 16:27:23: (mod_fastcgi.c.2757) establishing connection failed: Connection timed out socket: tcp:192.168.0.52:1025
2008-01-13 16:27:24: (mod_fastcgi.c.2757) establishing connection failed: Connection timed out socket: tcp:192.168.0.52:1025
2008-01-13 16:27:24: (mod_fastcgi.c.2757) establishing connection failed: Connection timed out socket: tcp:192.168.0.52:1025
2008-01-13 16:27:24: (mod_fastcgi.c.2757) establishing connection failed: Connection timed out socket: tcp:192.168.0.52:1025
As we know, NFS tends to fail from time to time, and in this case lighttpd does not seem able to deal with that.
FastCGI configuration is as follows:
fastcgi.server = ( ".php" =>
  ( "local" =>
    ( "socket" => "/tmp/php-fastcgi.sock",
      "disable-time" => 2
    ),
    "b1" =>
    ( "host" => "192.168.0.50",
      "port" => 1025,
      "disable-time" => 10
    ),
    "b2" =>
    ( "host" => "192.168.0.51",
      "port" => 1025,
      "disable-time" => 10
    ),
    ...
  )
)
-- daniel
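The "disable-time" option in the configuration above sets how long a backend is skipped after a failure before it is re-enabled, which is what the "fcgi-server re-enabled" line in the log reports. A minimal Python model of that behaviour, assuming a single "disabled until" timestamp per backend (the `Backend` class and its methods are hypothetical, not mod_fastcgi internals):

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical model of one fastcgi.server entry (not lighttpd's real struct)."""
    name: str
    disable_time: float          # seconds, mirrors the "disable-time" option
    disabled_until: float = 0.0  # backend is skipped while now < disabled_until

    def is_active(self, now: float) -> bool:
        # Once disable-time has elapsed the backend counts as "re-enabled"
        # and will receive (and possibly fail) new requests again.
        return now >= self.disabled_until

    def mark_connect_failed(self, now: float) -> None:
        # A failed connect takes the backend out of rotation for disable-time seconds.
        self.disabled_until = now + self.disable_time
```

For example, with a disable-time of 10, a backend that fails at t=100 is skipped until t=110 and is then tried again. In this model a larger disable-time only spaces out the re-enable attempts; each attempt against the hung backend still waits for the connect timeout, which fits the later remark that increasing disable-time was not a satisfactory workaround.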
Updated by Anonymous about 17 years ago
As far as I understand the problem now, one solution would be to check the established connection before assigning any more requests to the re-enabled backend, e.g. by first sending it a pseudo (duplicate) request that is actually handled by a neighbouring backend and checking whether it times out. I tried increasing the disable-time setting, but that is not a satisfactory workaround.
Currently, lighttpd seems to mark the backend as functional as soon as the connection can be established, without checking whether it really works (i.e. does not produce a timeout).
I assume the solution is not that easy, but I hope you find one. As I said, medium-sized NFS setups like mine are unfortunately rather fragile.
-- daniel
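At its simplest, the pre-check proposed above could be a TCP connect with a timeout before a re-enabled backend gets real traffic. A minimal sketch, assuming Python's standard `socket` module; `probe_backend` and its default timeout are hypothetical and not part of lighttpd. A successful connect only shows that something accepts connections on the port, not that requests will actually complete.

```python
import socket


def probe_backend(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical health check: True if a TCP connection to host:port
    can be established within `timeout` seconds, False otherwise."""
    try:
        # create_connection raises OSError (socket.timeout is a subclass)
        # on refusal, unreachable host, or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Against the backend in the log above, `probe_backend("192.168.0.52", 1025)` would presumably return False once the timeout expires, since the connection could not be established.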
Updated by stbuehler about 17 years ago
- The connection is not established, as the log says, so there is no way to send dummy packets (and I don't think the protocol supports such a thing).
- "establishing connection failed" should not trigger a 500 error, it should just restart the request on another backend.
- mod_fastcgi returns a 500 in these cases (search for 500 in mod_fastcgi.c):
  - No active backend found (line ~3100). No log message.
  - A request failed on 5 backends, for example after 5 "establishing connection failed" errors like yours (line ~3290). No log message.
  - The backend dies after data was sent to it. Log: "response not received, request sent:"
  - There is no active backend when the request starts. Log: "all handlers for ... are down."
- If your backend stops working, it should exit; if it does not exit, the backend has a bug. Your backend manager (whatever it is) should then restart the backend.
Apart from the missing log messages, I see no bug in mod_fastcgi here.
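The 500 cases listed above can be condensed into a simplified model of the per-request dispatch loop. This is a hypothetical Python sketch, not mod_fastcgi's code: `dispatch`, `Outcome`, `MAX_ATTEMPTS` and the in-order choice of backends are assumptions made for illustration. It follows the rules stated here: a failed connect disables that backend and moves the request on, a backend that dies after data was sent produces a 500, and so do running out of active backends and failing on 5 of them.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Optional, Tuple

MAX_ATTEMPTS = 5  # assumption: mirrors "a request failed on 5 backends"


class Outcome(Enum):
    OK = auto()
    CONNECT_FAILED = auto()   # nothing sent yet: safe to try another backend
    DIED_AFTER_SEND = auto()  # request data already sent: not redirected


@dataclass
class Backend:
    name: str
    active: bool = True


def dispatch(backends: List[Backend],
             send: Callable[[Backend], Outcome]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, name of the backend that answered or failed)."""
    attempts = 0
    for backend in backends:          # simplification: plain in-order selection
        if not backend.active:
            continue
        if attempts == MAX_ATTEMPTS:
            return 500, None          # failed on MAX_ATTEMPTS backends
        attempts += 1
        outcome = send(backend)
        if outcome is Outcome.OK:
            return 200, backend.name
        if outcome is Outcome.DIED_AFTER_SEND:
            return 500, backend.name  # "response not received, request sent"
        backend.active = False        # connect failed: disable, try the next one
    return 500, None                  # no active backend (left)
```

In this model a single dead backend costs at most one extra connect attempt per request; a 500 only reaches the client when data was already sent or when several backends fail in a row.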
Updated by stbuehler about 17 years ago
- Status changed from New to Fixed
- Resolution set to invalid