Bug #2121
closed
fastcgi failover not working
Description
The failover of FastCGI is not working in 1.4.25.
Here is an extract of my configuration file:
fastcgi.server = ( ".php" =>
  (
    (
      "host" => "192.168.0.10",
      "port" => 9000
    ),
    (
      "host" => "192.168.0.11",
      "port" => 9000
    )
  )
)
FastCGI is started via spawn-fcgi; the behaviour is as follows:
1.) Both servers have php-fastcgi running: everything works well and requests are processed correctly.
2.) One server does not have FastCGI running: the first request is processed by the second server, but all subsequent requests to the first backend return 503 errors while the failed backend is disabled. After it is re-enabled, the first request is processed, and then all subsequent requests fail again.
3.) Both servers have FastCGI disabled: 500 or 503 is returned, which is the correct behaviour.
I have reproduced this problem on OpenSolaris and on FreeBSD.
Log file is attached.
I have done some debugging and found that the cause of "connection was dropped after accept() (perhaps the fastcgi process died)" is ENOTCONN.
Updated by stbuehler almost 15 years ago
- OK, the balancing problems first: I just tried it locally, killing and restarting two PHP backends, and lighty always found the working one (after disable-time triggered, which is 1 second by default). So I think that part works as it should, and I guess your problems have something to do with the second part.
- Yes, I know the errno is ENOTCONN, and I think I am going to blame the operating system: I don't think it is our fault (connect was successful, so we are connected, and ENOTCONN doesn't make sense).
If you can reproduce it without high load (I only got reports of this problem from sites with many php requests), it would be nice if you could provide a ktrace of the syscalls.
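For readers following along: the disable time mentioned above can be set per backend with the "disable-time" key in fastcgi.server. A minimal sketch, reusing the reporter's two hosts; the value 5 is purely illustrative and is not a recommendation from this thread:

```
fastcgi.server = ( ".php" =>
  (
    (
      "host" => "192.168.0.10",
      "port" => 9000,
      # seconds a failed backend stays disabled before it is retried
      # (5 is an assumed example value, not taken from this report)
      "disable-time" => 5
    ),
    (
      "host" => "192.168.0.11",
      "port" => 9000,
      "disable-time" => 5
    )
  )
)
```

A longer disable time means fewer retries against a dead backend, at the cost of a slower return to service once it comes back.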
Updated by stbuehler almost 15 years ago
- Status changed from New to Missing Feedback
- Target version deleted (1.4.26)
Updated by mm almost 15 years ago
It seems to be related to the TCP_NODELAY option as well.
OpenSolaris sets this in a patch to lighttpd, and in FreeBSD it is a kernel tunable:
net.inet.tcp.delayed_ack
net.inet.tcp.delacktime
Turning it off improves the situation, but some requests (far fewer than before) are still not passed through. By contrast, running lighttpd on the target FastCGI servers and sending requests via mod_proxy works without any dropped requests.
Another problem I noticed: lighttpd checks for the existence of local files even if a request should be forwarded to a remote FastCGI server.
Updated by stbuehler almost 15 years ago
There is an option to disable the local-file check.
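The option in question is presumably "check-local" in the fastcgi.server host block. A sketch, assuming the reporter's first backend and that the PHP files exist only on the remote host:

```
fastcgi.server = ( ".php" =>
  (
    (
      "host" => "192.168.0.10",
      "port" => 9000,
      # do not look for the requested file under the local document root;
      # forward the request to the backend regardless
      "check-local" => "disable"
    )
  )
)
```

With the check disabled, a missing script is reported by the backend rather than by lighttpd itself.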