Revision 2 (nitrox, 2011-01-22 13:16) → Revision 3/4 (nitrox, 2012-08-11 10:42)
h1. Balancing

The goal is to provide a [[mod_balance|generic interface to balance between multiple backends]]; each balancer acts like a backend itself.

A balancer is an @action@ that selects a backend from a list (and executes it via @action_enter@); if the backend fails for specific reasons (overload/timeout), the balancer gets called again (@ActionBackendFail callback@).

Now, what can go wrong with backends that a balancer is interested in?

* backend process died
* backend process cannot be spawned
* backend overloaded
* connect() timeout
* connect() reset (no backend listening, i.e. process down)

There are more problems, such as the connection dropping after data was sent, but it should not be the job of the balancer to "fix" such problems.

The above problems can be classified into two categories:

* timeout/overloaded
* backend down

Spawn problems, such as restarting after the process died, should be handled elsewhere.

As balancers can be stacked like other actions, at most one balancer (in a path in the "actions-tree") should have a backlog-queue.

Some ideas for handling the backlog-queue:

* a backend has the states: "alive", "overloaded", "down", "down-retry"
* the queue has the states: "alive", "overloaded", "down"
* if not all backends are "down" or "down-retry", stay in/go to "alive"
* if "alive" and all backends are "down" or "down-retry", go to "overloaded" and start a small timeout
* if "overloaded" and the overload-timeout is reached, go to "down"
* while "down" and all backends are "down", return "503 Service Unavailable" for all requests
* "overloaded" backends are tried again (switched to "alive") after a small timeout (e.g. 3 seconds) or when another request from that backend gets completed
* "down" backends are tried again (switched to "down-retry") after a small timeout (e.g. 1 second)
* the queue has a limit; once it is reached, return "503 Service Unavailable"
* requests have a timeout for finding a backend; return "504 Gateway Timeout" once it expires
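
The state ideas for the backlog-queue can be sketched as a small state machine. This is only an illustration in Python, not the actual implementation: the class names (@Backend@, @BalancerQueue@), the @tick@ method and the 5-second queue overload timeout are assumptions made for the example; the 3-second and 1-second backend timeouts are the example values from the list. The queue limit and the per-request "504 Gateway Timeout" are left out.

```python
from enum import Enum


class BackendState(Enum):
    ALIVE = "alive"
    OVERLOADED = "overloaded"
    DOWN = "down"
    DOWN_RETRY = "down-retry"


class QueueState(Enum):
    ALIVE = "alive"
    OVERLOADED = "overloaded"
    DOWN = "down"


BACKEND_OVERLOAD_RETRY = 3.0  # seconds, example value from the text
BACKEND_DOWN_RETRY = 1.0      # seconds, example value from the text
QUEUE_OVERLOAD_TIMEOUT = 5.0  # seconds, assumed; the text only says "small"

UNAVAILABLE = {BackendState.DOWN, BackendState.DOWN_RETRY}


class Backend:
    """Hypothetical backend record: a state plus the time it was entered."""

    def __init__(self, name):
        self.name = name
        self.state = BackendState.ALIVE
        self.since = 0.0

    def set_state(self, state, now):
        self.state = state
        self.since = now

    def tick(self, now):
        # "overloaded" -> "alive" and "down" -> "down-retry" after a timeout.
        age = now - self.since
        if self.state is BackendState.OVERLOADED and age >= BACKEND_OVERLOAD_RETRY:
            self.set_state(BackendState.ALIVE, now)
        elif self.state is BackendState.DOWN and age >= BACKEND_DOWN_RETRY:
            self.set_state(BackendState.DOWN_RETRY, now)


class BalancerQueue:
    """Hypothetical backlog-queue tracking the three queue states."""

    def __init__(self, backends):
        self.backends = backends
        self.state = QueueState.ALIVE
        self.overloaded_since = None

    def tick(self, now):
        for backend in self.backends:
            backend.tick(now)
        all_unavailable = all(b.state in UNAVAILABLE for b in self.backends)
        if not all_unavailable:
            self.state = QueueState.ALIVE
        elif self.state is QueueState.ALIVE:
            # Every backend is down or down-retry: start the overload timeout.
            self.state = QueueState.OVERLOADED
            self.overloaded_since = now
        elif (self.state is QueueState.OVERLOADED
              and now - self.overloaded_since >= QUEUE_OVERLOAD_TIMEOUT):
            self.state = QueueState.DOWN
        return self.state

    def rejects_all(self):
        # True while every request should get "503 Service Unavailable".
        return (self.state is QueueState.DOWN
                and all(b.state is BackendState.DOWN for b in self.backends))
```

In this sketch the caller passes the current time to @tick@ (for example from a timer or on each request) and marks backends as failed or recovered with @set_state@. A backend in "down-retry" keeps @rejects_all()@ false, so one request can still probe it even while the queue itself is "down".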