nitrox, 2011-01-22 13:16


Balancing

The goal is to provide a generic interface to balance between multiple backends; each balancer acts like a backend itself.

A balancer is an action that selects a backend from a list and executes it via action_enter; if the backend fails for specific reasons (overload/timeout), the balancer is called again through the ActionBackendFail callback.
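
The select-and-retry behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the project's API: `RoundRobinBalancer`, `BackendFailed` and the round-robin selection are assumptions made for the example, and the `except` branch merely stands in for the ActionBackendFail callback.

```python
class BackendFailed(Exception):
    """Hypothetical signal for failures the balancer handles (overload/timeout)."""


class RoundRobinBalancer:
    """Hypothetical balancer; it is callable, so it can be used as a backend itself."""

    def __init__(self, backends):
        self.backends = list(backends)  # callables taking a request
        self.next_index = 0

    def select(self):
        # Round-robin is an assumption; any selection policy would fit here.
        backend = self.backends[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.backends)
        return backend

    def __call__(self, request):
        # Try each backend at most once per request.
        for _ in range(len(self.backends)):
            backend = self.select()
            try:
                return backend(request)  # stands in for action_enter
            except BackendFailed:
                continue  # stands in for the ActionBackendFail callback
        # Every backend failed: report failure to whoever called this balancer.
        raise BackendFailed("all backends failed")
```

Because the balancer raises the same exception it catches, a balancer placed inside another balancer's backend list is retried like any other backend.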

Which backend problems is a balancer interested in?
  • backend process died
  • backend process cannot be spawned
  • backend overloaded
  • connect() timeout
  • connect() refused or reset (no backend listening, i.e. process down)

There are more problems, such as a connection dropping after data was sent, but it should not be the job of the balancer to "fix" those.

The above problems can be classified into two categories:
  • timeout/overloaded
  • backend down
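
One possible way to express this classification in code is a simple lookup table. The names `Failure`, `Category` and `classify` are hypothetical, and the assignment of "process died" and "cannot be spawned" to "backend down" is an assumption; the text only states explicitly that a failed connect() with no backend listening means the process is down.

```python
from enum import Enum


class Failure(Enum):
    PROCESS_DIED = "backend process died"
    SPAWN_FAILED = "backend process cannot be spawned"
    OVERLOADED = "backend overloaded"
    CONNECT_TIMEOUT = "connect() timeout"
    CONNECT_RESET = "connect() reset"


class Category(Enum):
    TIMEOUT_OVERLOADED = "timeout/overloaded"
    BACKEND_DOWN = "backend down"


# Assumed mapping from each failure to one of the two categories.
CATEGORY_OF = {
    Failure.PROCESS_DIED: Category.BACKEND_DOWN,
    Failure.SPAWN_FAILED: Category.BACKEND_DOWN,
    Failure.OVERLOADED: Category.TIMEOUT_OVERLOADED,
    Failure.CONNECT_TIMEOUT: Category.TIMEOUT_OVERLOADED,
    Failure.CONNECT_RESET: Category.BACKEND_DOWN,
}


def classify(failure):
    """Return the category the balancer would act on for a given failure."""
    return CATEGORY_OF[failure]
```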

Spawn problems, such as restarting a process after it died, should be handled elsewhere.

As balancers can be stacked like other actions, at most one balancer along any path in the "actions-tree" should have a backlog-queue.

Some ideas for handling the backlog-queue:
  • a backend has the states: "alive", "overloaded", "down", "down-retry"
  • queue has the states: "alive", "overloaded", "down"
  • if not all backends are "down" or "down-retry" stay/goto "alive"
  • if "alive" and all backends are "down" or "down-retry", goto "overloaded" and start a small timeout
  • if "overloaded" and overload-timeout is reached, goto "down"
  • while "down" and all backends "down", return "503 Service Unavailable" for all requests
  • "overloaded" backends are tried again (switched to "alive") after a small timeout (e.g. 3 seconds) or when another request handled by that backend completes
  • "down" backends are tried again (switched to "down-retry") after a small timeout (e.g. 1 second)
  • the queue has a limit; once it is reached, return "503 Service Unavailable"
  • each request has a timeout for finding a backend; return "504 Gateway Timeout" once it expires
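
The queue state transitions listed above could look roughly like this. It is a minimal sketch, not an implementation from the project: `BacklogQueue`, `update` and the default `overload_timeout` of 5 seconds are hypothetical, and time is passed in explicitly to keep the example deterministic. Backend retry timers, the queue limit and per-request timeouts are left out.

```python
DOWN_STATES = {"down", "down-retry"}


class BacklogQueue:
    """Hypothetical queue tracking only its own state: alive, overloaded, down."""

    def __init__(self, overload_timeout=5.0):
        self.state = "alive"
        self.overload_timeout = overload_timeout  # assumed value, in seconds
        self.overloaded_since = None

    def update(self, backend_states, now):
        """Apply the transition rules for the given backend states at time `now`."""
        all_down = all(s in DOWN_STATES for s in backend_states)
        if not all_down:
            # At least one backend is usable: stay in / go to "alive".
            self.state = "alive"
            self.overloaded_since = None
        elif self.state == "alive":
            # All backends just became unavailable: start the overload timeout.
            self.state = "overloaded"
            self.overloaded_since = now
        elif (self.state == "overloaded"
              and now - self.overloaded_since >= self.overload_timeout):
            # Overload timeout reached without any backend recovering.
            self.state = "down"
        return self.state
```

For example, with a 5 second overload timeout, a queue whose backends all go down at t=1 stays "overloaded" until t=6 and then switches to "down"; it returns to "alive" as soon as one backend is no longer "down" or "down-retry".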

Updated by nitrox almost 14 years ago · 4 revisions