Lighty 1.4.20 leaks memory and has stability problems with backend

Added by jiwei over 15 years ago.

I'm running lighttpd-1.4.20 and PHP 5.2.4 (cgi-fcgi) with PHP scripts that connect to other webservers. When those other webservers become non-responsive, Lighty times out after 360 seconds and receives only partial content. Lighty grows by about 20MB a day.

Because I run those applications on a VPS with limited RAM, I cannot attach Valgrind to the process. The traffic pattern is unpredictable, but I strongly suspect that the partial content combined with the time-out is the cause.

So I wrote a simple test program (lighty-leaky.php, attached) which sleeps 400 seconds before completing the transaction. This simulates the situation cited above. Traffic is simulated with

   ab -n 100 -c 80 -k http://f8/lighty-leaky.php?loop=10800
   ab -n 100 -c 50 http://f8/lighty-leaky.php?loop=10800

The 'loop' parameter controls the size of the partial content. Lighty grew by close to 7MB after a few tries:
% top -bn 1 | grep lighttpd

11493 apache    20   0  5784  988  624 S    0  0.2   0:00.00 lighttpd
11493 apache    20   0 12508 7932  832 S    0  1.5   0:02.28 lighttpd
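
The growth can be read off the RES column of the two samples above. A minimal sketch, assuming the default `top` batch layout (PID USER PR NI VIRT RES SHR ...) with RES as the 6th field in plain KiB; the helper name `res_growth_kib` is hypothetical and not part of any tool:

```shell
# Hypothetical helper: print the RES growth (in KiB) between the first and
# last `top -b` sample lines read from stdin.
# Assumes RES is the 6th field and is a plain KiB number (no m/g suffix).
res_growth_kib() {
    awk 'NF >= 6 { if (first == "") first = $6; last = $6 }
         END { if (first != "") print last - first }'
}

# The two samples quoted above (before and after the test runs).
printf '%s\n' \
  '11493 apache    20   0  5784  988  624 S    0  0.2   0:00.00 lighttpd' \
  '11493 apache    20   0 12508 7932  832 S    0  1.5   0:02.28 lighttpd' \
  | res_growth_kib
```

For these two samples it prints 6944, i.e. roughly 6.8MB of resident growth, which matches the "close to 7MB" figure.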

The following entries appeared in the error log:
2008-12-19 00:41:44: (server.c.1247) NOTE: a request for /lighty-leaky.php?loop=10800 timed out after writing 2758099 bytes.  We waited 20 seconds. If this a problem increase server.max-write-idle

2008-12-19 00:42:33: (mod_fastcgi.c.2926) backend is overloaded; we'll disable it for 2 seconds and send the request to another backend instead: reconnects: 0 load: 130
2008-12-19 00:42:33: (mod_fastcgi.c.3568) all handlers for /lighty-leaky.php on .php are down.
2008-12-19 00:42:36: (mod_fastcgi.c.2681) fcgi-server re-enabled:  0 /var/run/lighttpd/php-fastcgi.socket

The first log entry recurred multiple times, presumably once per request. I set server.max-write-idle = 20 in the configuration to shorten the wait.

After that, Lighty never managed to reconnect to the backend. Requests returned a 500 error.

I'm using "mod_rewrite", "mod_access", "mod_auth", "mod_status", "mod_setenv", "mod_fastcgi", "mod_simple_vhost", "mod_compress", "mod_expire", "mod_rrdtool", "mod_accesslog". My FastCGI configuration:

fastcgi.server = ( ".php" =>
                   ( "localhost" =>
                     (
                       "socket" => "/var/run/lighttpd/php-fastcgi.socket",
                       "bin-path" => "/usr/bin/php-cgi",
                       "bin-environment" => (
                         "PHP_FCGI_CHILDREN" => "32",
                         "PHP_FCGI_MAX_REQUESTS" => "4000"
                       ),
                       "bin-copy-environment" => ( "PATH", "SHELL", "USER" ),
                       "min-procs" => 1,
                       "max-procs" => 1,
                       "max-load-per-proc" => 8,
                       "idle-timeout" => 50,
                       "broken-scriptfilename" => "enable"
                     )
                   )
                 )

Updated by jiwei over 15 years ago

Lighttpd did manage to reconnect to the backend php-cgi after a few hours (could be shorter), but not in 2 seconds as claimed in the log.

Updated by stbuehler almost 15 years ago

Memory leak? Prove it with valgrind (there are other reasons for growing memory usage, including buffer reuse and fragmentation).

Updated by stbuehler almost 15 years ago

I think the "all backends down" bug is fixed; see r2657 and #1825.


