[Solved] failover balance option for fastcgi backends
Added by rohrschacht about 1 month ago
Hello,
I am using lighttpd to host a service using fastCGI. Unfortunately, the fastCGI script I am hosting has to run as a single process because we risk race conditions with the database that cannot be resolved otherwise. Since we experience bursts of many requests sometimes, I have a large listen-backlog configured so that every request will be handled eventually. My current configuration can be found below:
fastcgi.server = ( "/path/to.fcgi" =>
  (( "socket" => "/path/to.socket1" + var.PID,
     "bin-path" => "/path/to/cgi/script.fcgi",
     "min-procs" => 1,
     "max-procs" => 1,
     "check-local" => "disable",
     "idle-timeout" => 20,
     "connect-timeout" => 300,
     "write-timeout" => 300,
     "read-timeout" => 300,
     "listen-backlog" => 21800
  )),
)
I am experiencing the problem that the fastCGI script crashes on some request. Again, something we are unable to fix, unfortunately. When this happens, lighttpd waits for a second to restart the script. In this time, many requests in the listen-backlog are given a 500 internal server error.
I would like to configure one or more failover instance(s) of the script that will be available immediately when the one handling requests crashes. It seems to me that the abandoned lighttpd 1.5 with mod_proxy_core had the option I need here: a static load balancer mode that does no load-balancing, only failover. I am unable to find something similar in the 1.4 documentation. I am caucious to increase the max-procs or configure additional backends for the script path because of the race condition problem. Is there a way to achieve this?
Replies (8)
RE: failover balance option for fastcgi backends - Added by gstrauss about 1 month ago
Wow. That description is a whole lot of Somebody else's problem
> I am hosting has to run as a single process because we risk race conditions with the database that cannot be resolved otherwise.
Somebody can and should fix this. You should look into finding the right person to fix it.
> I have a large listen-backlog configured so that every request will be handled eventually.
Such a large value risks livelock and starvation.
> I am experiencing the problem that the fastCGI script crashes on some request. Again, something we are unable to fix, unfortunately.
Somebody can and should fix this. You should look into finding the right person to fix it. (Do you see a pattern yet?)
> I would like to configure one or more failover instance(s) of the script that will be available immediately when the one handling requests crashes. It seems to me that the abandoned lighttpd 1.5 with mod_proxy_core had the option I need here: a static load balancer mode that does no load-balancing, only failover. I am unable to find something similar in the 1.4 documentation.
You are correct that lighttpd 1.4 does not have this feature. This is the first time I have heard a request for such a feature in over 9 years, and, frankly, your description of your problem suggests many, many, many, many other solutions.
> I am caucious (sic) to increase the max-procs or configure additional backends for the script path because of the race condition problem.
Again, somebody can and should fix this.
> Is there a way to achieve this?
Sounds like your company is lacking a plethora of skills needed to fix this problem. My first suggestion would be to hire somebody competent who could cut through whatever red tape is preventing the actual problems from being fixed.
Possible workarounds:
- interpose a service between lighttpd and your single-process, racy, prone-to-crashing FastCGI backend. The intermediate process could handle the crashing and restarting of your backend and your desired failover mode, which you seem to be oversimplifying as a non-programmer since you have specific failure cases in your head, which are not necessarily applicable generically.
- set up the FastCGI process to run independently from lighttpd, i.e. do not use "bin-path", so that lighttpd does not start the FastCGI process. If lighttpd does not start the FastCGI process, then lighttpd will wait "disable-time" (default 1 sec, but configurable to 0) before trying to connect to the backend for a new connection, but if the service is not available, each of those connections will quickly return an error. You could use mod_magnet to turn that into 503 Service Unavailable and produce a web page with meta Retry-After. (If you do not understand any of this, then you should hire somebody who does.)
- lower "listen-backlog" to a very low number, and as above, use mod_magnet to turn errors into 503 Service Unavailable and produce a web page with meta Retry-After.
First step: you should hire someone to press you on "why not?" for each of your "we're unable to fix this", and to find ways to fix those things.
RE: failover balance option for fastcgi backends - Added by gstrauss about 1 month ago
I should note that the ancient, dead, unreleased lighttpd 1.5 branch does not have the failover feature you think it has. If there are multiple processes available (e.g. "max-procs" > 1) and lighttpd 1.4 fails to connect(), then lighttpd 1.4 will also retry to connect to another backend proc. Your backend issues are preventing you from having multiple backend processes.
Modern lighttpd 1.4 is more advanced than the ancient, dead, unreleased lighttpd 1.5 branch.
[Edit] Correction: you pointed out that 1.5 has a "static" load balancing option which is no load balancing, just failover. lighttpd 1.4 currently does not provide this option.
RE: failover balance option for fastcgi backends - Added by rohrschacht about 1 month ago
Thank you very much for your suggestions.
I am surprised by the harsh tone I am hearing, I am not sure what I did to warrant it. I spent quite some time reading the documentation, found a setting for mod_proxy_core that seemed to have been a solution to my problem, and wanted to ask whether this can be achieved easily with lighttpd 1.4. The static load balancing mode was really all I was after, maybe I should have been more clear. I was trying to give more context without revealing too much company internals. I did this because I wanted to describe the bigger picture, and why I am attempting to find such a failover mode. I tried to provide enough context to avoid asking for an XY problem. This is because I read your forum rules before posting!
The reason we are unable to fix the backend is because the company bought this software a long time ago, and it is no longer supported. We do not have access to the source code and are not allowed to reverse-engineer it. This is a legal issue, not one that is within my or my colleagues' competence. The company is in the process of buying a replacement, but in the meantime we are tasked with keeping the software running smoothly. A lighttpd setting would have been a much easier solution than to introduce a new software or to implement a failover proxy on our own. I now understand that this has to be done.
Thanks again for the suggestions. You may consider this solved.
RE: failover balance option for fastcgi backends - Added by gstrauss about 1 month ago
Yes, you did some research. That I appreciate.
However, the tone you presented is "My company is largely helpless. Please do some work for me for free." On top of that, and the reason I responded with the tone I did, is that if your company is so helpless, why would I have any confidence that your company would be able to deploy a new version of lighttpd? Your post shared no indication of what your company is capable of doing as a solution -- you only presented all the things your company is unable to do. The explanation you included in your second post should have been included in your first post.
It is trivial for me to add a new balancing type to lighttpd 1.4 similar to the "static" option in the (dead and unreleased) lighttpd 1.5. What are you capable of doing with it?
RE: failover balance option for fastcgi backends - Added by rohrschacht about 1 month ago
It seems I have miscommunicated quite a bit then. Sorry, English is not my first language. I did not expect you to do any work. I was merely interested in the configuration options of the current 1.4. I thought that I may have missed something in the documentation, given that the 1.5 documentation stated such a balancing mode. I may have over-shared on our specific problem, leading you to read it as a request to solve the entire thing for me. This was not my intent.
I know that the situation we are in is weird. If hardly anybody else has use for this feature, it is questionable to have it in lighttpd. Then again, it seemed to have been a good idea during 1.5 development, so the call is really to be made by the people that have the work of implementing and supporting it. My current plan would be to write our own fastCGI "middleware" that listens for requests from lighttpd and handles the startup and failover of the other backend and just forwards requests and responses between the two.
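A minimal sketch of the selection and serialization core such a middleware might use, in Python. It is not a FastCGI implementation: the class name `StaticFailover`, the `send` callback, and the socket paths are hypothetical, and the actual forwarding of FastCGI records is left to the caller. It assumes that replaying a request on the standby after the primary fails is acceptable, which may not hold if the crashed instance already wrote to the database.

```python
import threading
from typing import Callable, List, Optional, Sequence, Set


class StaticFailover:
    """Prefer the earliest healthy backend in a fixed list; no load balancing.

    A single lock serializes requests so that only one backend instance
    handles a request at any time (the standby stays idle until needed).
    """

    def __init__(self, backends: Sequence[str]) -> None:
        self._backends: List[str] = list(backends)
        self._down: Set[str] = set()
        self._lock = threading.Lock()

    def current(self) -> Optional[str]:
        """Return the first backend not marked down, or None if all are down."""
        for backend in self._backends:
            if backend not in self._down:
                return backend
        return None

    def mark_down(self, backend: str) -> None:
        self._down.add(backend)

    def mark_up(self, backend: str) -> None:
        self._down.discard(backend)

    def handle(self, request: bytes,
               send: Callable[[str, bytes], bytes]) -> bytes:
        """Forward one request, failing over to the next backend on OSError.

        `send(backend, request)` is supplied by the caller and is assumed to
        raise OSError (e.g. ConnectionRefusedError) when the backend is gone.
        """
        with self._lock:  # one request in flight at a time
            while True:
                backend = self.current()
                if backend is None:
                    raise RuntimeError("no backend available")
                try:
                    return send(backend, request)
                except OSError:
                    # Treat any socket-level failure as a crash of that backend.
                    self.mark_down(backend)
```

A supervisor that restarts the crashed instance would call `mark_up()` once the instance accepts connections again, at which point the first backend in the list is preferred once more.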
RE: failover balance option for fastcgi backends - Added by gstrauss about 1 month ago
> I was merely interested in the configuration options of the current 1.4. I thought that I may have missed something in the documentation
Nope. You did not miss something that is not there. The documentation is maintained.
If your question in your original post "Is there a way to achieve this?" really was "Is the documentation current?" then the answer is yes, the documentation is current. If it is not documented, the feature very likely does not currently exist.
> I may have over-shared on our specific problem
I assure you that you did not.
You also have not yet answered the question I asked in my previous post:
> It is trivial for me to add a new balancing type to lighttpd 1.4 similar to the "static" option in the (dead and unreleased) lighttpd 1.5. What are you capable of doing with it?
RE: failover balance option for fastcgi backends - Added by gstrauss about 1 month ago
I do not see this being added to lighttpd 1.4 since the behavior is more optimistic than deterministic. If connect() fails, whether due to the listen backlog filling up or a connect() timeout, it might theoretically lead to a loop of retrying (and waiting for another timeout). (lighttpd retries once for timeout, 5 times for other connect() failure.) That might be desirable to some users, but not other users. If the original host crashed and lighttpd detected that, then the next host would be used. In any case, a custom monitoring process in between lighttpd and the misbehaving backend is the best place to implement whatever retry policy is appropriate for the misbehaving backend, which at least in your case may include serialization of connections to your backend.
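To illustrate what a bounded retry policy in such an intermediate process could look like, here is a small Python sketch. The function name `connect_with_retry` and its parameters are hypothetical; the default limits only mirror the numbers mentioned above (one retry after a timeout, five after other connect failures) and are not taken from lighttpd's code.

```python
import socket
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       timeout_retries: int = 1,
                       error_retries: int = 5) -> T:
    """Call `connect()` until it succeeds or a retry budget is exhausted.

    Timeouts and other socket errors are counted separately so that a
    slow backend cannot trigger an unbounded loop of timeout waits.
    """
    timeouts = 0
    errors = 0
    while True:
        try:
            return connect()
        except (TimeoutError, socket.timeout):
            # Must be caught before OSError: both are OSError subclasses.
            timeouts += 1
            if timeouts > timeout_retries:
                raise
        except OSError:
            errors += 1
            if errors > error_retries:
                raise
```

The `connect` callable would typically open the Unix socket of the backend; keeping it as a parameter lets the policy be tested without a real backend.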
GW_BALANCE_STATIC does not exist in the lighttpd 1.4 code. A few additional lines would be needed to expose it to lighttpd.conf.
--- a/src/gw_backend.c
+++ b/src/gw_backend.c
@@ -886,12 +886,16 @@ static gw_host * gw_host_get(request_st * const r, gw_extension *extension, int
         break;
       }
       case GW_BALANCE_RR:
-      { /* round robin */
+      case GW_BALANCE_STATIC:
+      { /* GW_BALANCE_RR: round robin */
+        /* GW_BALANCE_STATIC: no balancing; simple serial failover */
         const gw_host *host = extension->hosts[0];
 
         /* Use last_used_ndx from first host in list */
         int k = extension->last_used_ndx;
-        ndx = k + 1; /* use next host after the last one */
+        /* use next host after the last one if GW_BALANCE_RR */
+        /* use same host as the last one if GW_BALANCE_STATIC */
+        ndx = k + (balance == GW_BALANCE_RR);
         if (ndx < 0) ndx = 0;
 
         /* Search first active host after last_used_ndx */
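The index arithmetic in the diff can be modeled in a few lines of Python to see how the two modes differ. This is a simplified model, not lighttpd code: `Balance`, `pick_host`, and the `active` list are hypothetical names, and the wrap-around search is an assumption about what the surrounding "search first active host" code does.

```python
from enum import Enum
from typing import List, Optional


class Balance(Enum):
    RR = "round-robin"
    STATIC = "static"


def pick_host(active: List[bool], last_used: int,
              balance: Balance) -> Optional[int]:
    """Return the index of the host to use next, or None if none is active.

    Mirrors `ndx = k + (balance == GW_BALANCE_RR)`: round robin starts the
    search one past the last used host, static starts at the same host.
    """
    if not active:
        return None
    start = last_used + (1 if balance is Balance.RR else 0)
    if start < 0:  # e.g. last_used == -1 before any host was used
        start = 0
    n = len(active)
    # Search from `start`, wrapping around to cover every host once.
    for offset in range(n):
        ndx = (start + offset) % n
        if active[ndx]:
            return ndx
    return None
```

In this model, static mode keeps using the last host until it becomes inactive and then moves to the next active one. It also stays on the failover host after the first host recovers, because the search starts from the last used index; the sketch does not establish whether a patched lighttpd would behave identically.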
RE: [Solved] failover balance option for fastcgi backends - Added by rohrschacht about 1 month ago
Thanks again for looking into this. I think we will go with the plan of implementing a small failover proxy ourselves. If we can make it generally applicable, we might publish it as open-source and notify this thread with the URL, if you'd like, along with a lighttpd configuration that showcases how to use it.