Bug #2121: fastcgi failover not working - Lighttpd - lighty labs

Bug #2121

closed

fastcgi failover not working

Added by mm over 14 years ago. Updated about 14 years ago.

Status:

Missing Feedback

Priority:

Normal

Category:

Target version:

Description

The failover of FastCGI is not working in 1.4.25.

Here is extract of my configuration file:

fastcgi.server = ( ".php" =>
  (
    (
      "host" => "192.168.0.10",
      "port" => 9000
    ),
    (
      "host" => "192.168.0.11",
      "port" => 9000
    )
  )
)

FastCGI is started via spawn-fcgi. The behaviour is as follows:

1.) both servers have php-fastcgi running: everything works well, requests are processed correctly
2.) one server has FastCGI not running: the first request is processed by the second server, but all subsequent requests to the first backend, made while the failed backend is disabled, return 503 errors. After it is re-enabled, the first request is processed, and then all subsequent requests fail again
3.) both servers have FastCGI disabled: 500 or 503 is returned, which is the correct behaviour

I have reproduced this problem on OpenSolaris and on FreeBSD.

Log file is attached.

I have done some debugging and found that the cause of the "connection was dropped after accept() (perhaps the fastcgi process died)" message is ENOTCONN.
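For readers unfamiliar with where an errno like ENOTCONN can surface, here is a minimal Python sketch of the usual way a non-blocking connect is verified: wait for writability, read SO_ERROR, then confirm the peer with getpeername() (which reports ENOTCONN if the socket is not actually connected). This is not lighttpd's code; the function name `nonblocking_connect` and the timeout value are assumptions made for illustration.

```python
import errno
import select
import socket


def nonblocking_connect(host, port, timeout=2.0):
    """Return 0 if the connection is established, else an errno value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex((host, port))
        if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            return rc  # immediate failure, e.g. ECONNREFUSED
        # A non-blocking connect finishes asynchronously; the socket
        # becomes writable once the attempt has succeeded or failed.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return errno.ETIMEDOUT
        # SO_ERROR holds the outcome of the asynchronous connect.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            return err
        # getpeername() raises ENOTCONN if no peer is attached.
        try:
            sock.getpeername()
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        sock.close()
```

Running this against each backend (for example 192.168.0.10 and 192.168.0.11 on port 9000) from the lighttpd host may help separate a connect-time failure from one that only appears on a later read or write.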

Files

lighttpd.log (2.6 KB) lighttpd.log

Log output

mm, 2009-12-16 12:03

Related issues 2 (0 open — 2 closed)

Updated by stbuehler over 14 years ago

Ok, the balance problems first: I just tried it locally, killing and restarting two PHP backends, and lighty always found the working one (after disable-time triggered, which is 1 second by default). So I think that part works as it should, and I guess your problems have something to do with the second part.
Yes, I know the errno is ENOTCONN. And I think I'm going to blame the operating system; I don't think it is our fault (connect was successful, so we are connected, and ENOTCONN doesn't make sense).
If you can reproduce it without high load (I only got reports of this problem from sites with many PHP requests), it would be nice if you could provide a ktrace of the syscalls.
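The expected balancing behaviour described above (a failed backend is skipped until its disable-time has elapsed, then retried) can be modelled roughly as follows. This is a simplified sketch, not lighttpd's implementation: the `BackendPool` class, the `dispatch` helper, and the round-robin selection are all assumptions made for illustration.

```python
class BackendPool:
    """Toy model of backend failover with a per-host disable time."""

    def __init__(self, hosts, disable_time=1.0):
        self.hosts = list(hosts)
        self.disable_time = disable_time
        # Time before which each host must not be used.
        self.disabled_until = {h: 0.0 for h in self.hosts}
        self._next = 0

    def pick(self, now):
        """Return the next enabled host in round-robin order, or None."""
        for offset in range(len(self.hosts)):
            idx = (self._next + offset) % len(self.hosts)
            host = self.hosts[idx]
            if now >= self.disabled_until[host]:
                self._next = (idx + 1) % len(self.hosts)
                return host
        return None

    def mark_failed(self, host, now):
        """Disable a host for disable_time seconds starting at now."""
        self.disabled_until[host] = now + self.disable_time


def dispatch(pool, is_up, now):
    """Try each enabled backend at most once; return (status, host)."""
    for _ in range(len(pool.hosts)):
        host = pool.pick(now)
        if host is None:
            break
        if is_up(host):
            return 200, host
        pool.mark_failed(host, now)
    return 503, None
```

In this model, with one backend down, every request is still answered by the surviving backend, and a 503 appears only when both are down. That matches what the local test above observed and differs from case 2.) in the report.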

Updated by stbuehler over 14 years ago

Status changed from New to Missing Feedback
Target version deleted (~~1.4.26~~)

Updated by mm about 14 years ago

It seems to be related to the TCP_NODELAY option as well.
OpenSolaris does set this in a patch to lighttpd and in FreeBSD it is a kernel tunable:
net.inet.tcp.delayed_ack
net.inet.tcp.delacktime

Turning it off improves the situation, but some requests (far fewer than before) are still not passed through. By contrast, running lighttpd on the target FastCGI servers and sending requests via mod_proxy works without any requests being dropped.
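For reference, TCP_NODELAY is a per-socket option that disables Nagle's algorithm, whereas the FreeBSD sysctls listed above control delayed ACKs system-wide. A minimal Python sketch of setting and reading back the per-socket option follows; the helper name `make_nodelay_socket` is hypothetical, and this is not taken from the OpenSolaris patch mentioned above.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value enables TCP_NODELAY on this socket only.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # Reads back the option; a non-zero value means it is enabled.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    s.close()
```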

Another problem I noticed: lighttpd checks for the existence of local files even when a request should be forwarded to a remote FastCGI server.
