cgi Children forking children?

Added by Morgon over 14 years ago

Odd name for a thread, but I'm having a horrible time with a high-perf install.
After about 10 minutes or so (which is about 30,000+ requests), the output of pstree shows that some of the initial php-cgi children decided to spawn children of their own, eventually turning the system CPU-bound. Sometimes it doesn't even take 10 minutes to choke itself.

The relevant parts of the lighttpd.conf are attached. I'm sure this is going to look odd, and feel free to make additional suggestions, but since I have to be able to handle an unspecified number (~500+) of concurrent requests that generate dynamic images, these settings have generally been very good to me.
As far as I can tell, the actual PHP scripts handle fine (debugging a request shows very little PHP execution time, even when it takes 60+ seconds to get it served by lighty).

Box is a dual dual-core with 4GB RAM.

I'm also seeing a lot of "backend is overloaded" in the error logs -- which part of the config actually controls that? I will make sure to set whatever I need to prevent that from happening; it should never occur :)

I was running on 1.4.20 when this happened, but just upgraded to 1.4.23 in hopes of seeing some improvement (there has been some, but this spawning issue is killing me).
Any insight very much appreciated!


server.stat-cache-engine = "fam"

server.max-keep-alive-requests = 0
server.max-write-idle = 35
server.max-read-idle = 10
server.max-fds = 32000
server.max-connections = 750
server.max-worker = 8

....
$SERVER["socket"] == "ip:80" { ## general config stuff removed

    fastcgi.server = ( "/" =>
        (
            "local" => (
                "socket" => "/tmp/php.socket" + var.PID,
                "bin-path" => "/usr/bin/php-cgi",
                "broken-scriptfilename" => "enable",
                "docroot" => "/my/path/to/handler.php",
                "check-local" => "disable",
                "max-procs" => 4,
                "bin-environment" => ( "PHP_FCGI_CHILDREN" => "75", "PHP_FCGI_MAX_REQUESTS" => "1024" ),
                "bin-copy-environment" => ( "PATH", "SHELL", "USER" )
            )
        )
    )
}

$SERVER["socket"] == "ip:443" { ## similar config. handler is for '.php' in this case
##max-procs is 3, CHILDREN is 50, MAX_REQUESTS is 2048
}

lighty-pstree-before.txt (16 KB): pstree output soon after launch
lighty-pstree-after.txt (55.3 KB): pstree output at about 10 minutes

Replies (12)

RE: cgi Children forking children? - Added by nitrox over 14 years ago

We don't care as long as you use the server.max-worker setting.

RE: cgi Children forking children? - Added by Olaf-van-der-Spek over 14 years ago

8 * 4 * 75 = 2400 PHP processes
4096 MB / 2400 = 1.7 MB per process
I think your number of PHP engines is way too high. Since you've got only 4 CPU cores, 32 or 64 PHP engines sounds reasonable, not 2400...
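
The arithmetic above can be sketched as a small calculation. This is only an illustration in Python: the function name `php_process_budget` and its parameters are hypothetical, and it assumes the worst case in which every configured worker, backend process and PHP child exists at the same time and the full 4 GB is available to PHP.

```python
def php_process_budget(max_worker, max_procs, fcgi_children, ram_mb):
    """Return (php_children, ram_per_child_mb) for the worst case.

    Hypothetical helper: multiplies server.max-worker, max-procs and
    PHP_FCGI_CHILDREN exactly as in the post above, then divides RAM evenly.
    It ignores the PHP parent processes, lighttpd itself and the kernel.
    """
    children = max_worker * max_procs * fcgi_children
    return children, ram_mb / children


# Values from the original post: 8 workers, 4 procs, 75 children, 4 GB RAM.
count, per_child = php_process_budget(8, 4, 75, 4096)
print(count, round(per_child, 1))
```

Plugging in smaller values for the three settings gives a quick feel for how fast the per-process memory share grows as the product shrinks.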

RE: cgi Children forking children? - Added by Morgon over 14 years ago

Hi Olaf,

Unfortunately, as I mentioned before, the nature of the application requires a large number of processes on-hand to deal with spikes.
Since this initial post, I brought it down to 4 * 5 * 35 (700), and I'm still running into issues. Too low, and I get idle CPU but non-responsive lighty due to so many unhandled requests.

Even removing the max-worker continues to bring issues, and unfortunately I can't seem to tell the difference between running too many processes, and hitting 100% CPU just trying to handle all of the incoming connections. Is there a good way to find this out? For example, looking at the Server Status page .. out of 588 simultaneous connections, 546 are in the 'handle request' state. Is that lighty 'handling' the connection, or PHP doing work? I haven't found any documentation that specifically clarifies this.

nitrox:
Just to make sure I'm reading this right, are you simply saying that you refuse to help because max-worker is turned on? Making sure I'm not misreading.
If so, I am still having this problem when removing max-worker, so let's run with that. ;)

RE: cgi Children forking children? - Added by nitrox over 14 years ago

I hope you made sure in what cases max-worker might help. Otherwise you've read it right. Search for it on the wiki, read first sentence (the bold one).

State "h" means lighty has sent the request to the backend and is waiting for it to respond. So I'd have a look at PHP or e.g. MySQL or whatever is behind lighty. And by the way, you can't trust the numbers from your status page with max-worker.

RE: cgi Children forking children? - Added by Morgon over 14 years ago

nitrox wrote:

I hope you made sure in what cases max-worker might help.

Yep - "This is usually only needed on servers which are fairly loaded and the network handler calls delay often (e.g. new requests are not handled instantaneously).", which is precisely what I have. I've actually been running with max-worker for years with great success and performance, that's why I'm troubled by this new problem.

But you're right, it could definitely be the backend, as well. While I do want to go back and install an older version of PHP to see if it's 5.1 giving me issues, I wanted to bring this up here in case I could get some decent clarification on why the settings do what they do.

For instance, running with max-procs 1 / FCGI_CHILDREN 128 / MAX_CONNECTIONS 2048, lighty spawns only 5 php-cgi children. Simply increasing max-procs to 2 mysteriously balloons this number to 135.

However, to me right now, the most important question is why the processes are being forked in the way they are (see original post attachments) - this is not happening with any of the other machines my application runs on, and they do have the same backend code (so I don't think it's my PHP app). Any insight as to how to even interpret this?

RE: cgi Children forking children? - Added by icy over 14 years ago

The process tree indeed looks weird. Haven't seen something like that before.
And yea, 2400 children is a huge number, too huge. You mentioned you generate images on the fly. That can take up enormous amounts of CPU time, and if you really get 500 requests at a time then, well, it would mean each core has to generate more than 100 images => good luck :)
Maybe you can implement some kind of caching logic, that could reduce the amount of processing power needed a lot I guess.
Also: max-worker doesn't really help with overloaded backends. In some cases it can help with a huge number of requests for static files.
So don't use that.

Hope this helps you a bit.

RE: cgi Children forking children? - Added by Olaf-van-der-Spek over 14 years ago

Morgon wrote:

Unfortunately, as I mentioned before, the nature of the application requires a large number of processes on-hand to deal with spikes.

I don't think that implies requiring a huge number of PHP engines.

Since this initial post, I brought it down to 4 * 5 * 35 (700), and I'm still running into issues. Too low, and I get idle CPU but non-responsive lighty due to so many unhandled requests.

What issues?

Even removing the max-worker continues to bring issues, and unfortunately I can't seem to tell the difference between running too many processes, and hitting 100% CPU just trying to handle all of the incoming

Watch the I/O due to paging/swapping. You should have (almost) none.

connections. Is there a good way to find this out? For example, looking at the Server Status page .. out of 588 simultaneous connections, 546 are in the 'handle request' state. Is that lighty 'handling' the connection, or PHP doing work? I haven't found any documentation that specifically clarifies this.

What's Lighttpd CPU usage?

BTW, isn't this a PHP issue?

RE: cgi Children forking children? - Added by Morgon over 14 years ago

I don't think it's PHP. This may even be a kernel issue - at this moment, it's the only major thing that differs. [Update: Reverting back to PHP 5.2.3 and lighty 1.4.18 still reveals the odd process tree on 2.6.18-92, so that's looking even more likely]

I have a bank of machines that have the same software. They've recently been updated to '2.6.18-128.2.1.el5 #1 SMP' (2.6.18-128). However, the machine where this weird process tree came from is stuck at 2.6.18-92 (I've updated it to 2.6.18-128 but for some reason grub won't 'take' and I don't have physical access to the machine to figure it out).

On the other hand, I launched an instance of my app on an older machine. This one runs lighty 1.4.18 and PHP 5.2.3, along with much of the same type of configuration as I mentioned in my initial post. It was never my speed demon (2GB of RAM, dual Xeon 2.4GHz CPUs), but it's handling about 30 requests/sec with only a minor slowdown in what I'm used to.

icy - Thanks for your feedback. I do have heavy caching in my app. While 'performance' isn't a new issue (as I've been doing this for about 4 years), I have been running on my current setup for about a year, pushing anywhere between 12 - 15 million images per day, with really no issues until now.
I wasn't specifically trying to blame lighty, but I do consider it a factor and also wanted to use this time to try to get a better understanding of the aforementioned configuration settings.

Olaf - I've gone through many changes over the past few years, and I settled on having a decent amount of PHP engines standing by. You quoted 2400 in your first post, but see, it never actually spawns 2400 processes. That's one thing that is confusing me - how to establish the number of backend processes. On my 'newer' machines that are having problems, I have 60 and 80 processes, respectively.

On the 'older' machine that is holding up pretty well, I have 75. That older machine also runs max-worker=4, max-procs => 6, and PHP_FCGI_CHILDREN => 24 with very little slowdown, and it certainly doesn't hit max-connections like the newer ones do.

There's no swapping, I still have 2.8G (mostly buffered) RAM left, so I don't foresee that happening.
Actual CPU usage of /lighty/ is minimal, however the entire system is currently pegged at 100%. The older machine I brought up has a decent amount of idle, even with the config it has.

--

On a more general note, this is really concerning me:
2009-07-30 11:20:29: (server.c.1383) [note] sockets disabled, connection limit reached
2009-07-30 11:20:32: (server.c.1383) [note] sockets disabled, connection limit reached
2009-07-30 11:20:33: (server.c.1337) [note] sockets enabled again
2009-07-30 11:20:35: (server.c.1383) [note] sockets disabled, connection limit reached

How is it that it disables sockets twice before it enables them again? Could this be contributing to resource issues in any fashion at all, especially since I'm getting so many of these messages in my error log?

RE: cgi Children forking children? - Added by nitrox over 14 years ago

Morgon wrote:

On a more general note, this is really concerning me:
2009-07-30 11:20:29: (server.c.1383) [note] sockets disabled, connection limit reached
2009-07-30 11:20:32: (server.c.1383) [note] sockets disabled, connection limit reached
2009-07-30 11:20:33: (server.c.1337) [note] sockets enabled again
2009-07-30 11:20:35: (server.c.1383) [note] sockets disabled, connection limit reached

How is it that it disables sockets twice before it enables them again? Could this be contributing to resource issues in any fashion at all, especially since I'm getting so many of these messages in my error log?

again, max-worker?
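
One way to see how often this pattern occurs is to scan the error log for the two notes quoted above. The following is a rough Python sketch, not a lighttpd tool: the function name `summarize` and the regular expression are assumptions based only on the four log lines shown in this thread. Since those lines carry no process ID, the script cannot tell which worker wrote them; it only counts the notes and reports timestamps where the same state is logged twice in a row.

```python
import re

# Pattern derived from the log excerpt in this thread (an assumption, not a spec).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): \(server\.c\.\d+\) \[note\] "
    r"sockets (?P<state>disabled|enabled)"
)


def summarize(lines):
    """Count disabled/enabled notes and collect timestamps of repeated states."""
    counts = {"disabled": 0, "enabled": 0}
    repeats = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip unrelated log lines
        state = match.group("state")
        counts[state] += 1
        if state == previous:
            repeats.append(match.group("ts"))
        previous = state
    return counts, repeats


sample = [
    "2009-07-30 11:20:29: (server.c.1383) [note] sockets disabled, connection limit reached",
    "2009-07-30 11:20:32: (server.c.1383) [note] sockets disabled, connection limit reached",
    "2009-07-30 11:20:33: (server.c.1337) [note] sockets enabled again",
    "2009-07-30 11:20:35: (server.c.1383) [note] sockets disabled, connection limit reached",
]
counts, repeats = summarize(sample)
print(counts)
print(repeats)
```

Running it once with max-worker set and once without, over a comparable time window, would show whether the back-to-back "disabled" notes go away with a single worker.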

RE: cgi Children forking children? - Added by Morgon over 14 years ago

I am still having a lot of trouble with this very odd process behavior.

I've upgraded my kernel to 2.6.18-128 (CentOS 5.3), and I still see the processes forking multiple threads (at least that's what I've read the processes in {braces} are, am I correct?). No other machine in my datacenter has this, even though most of them share the same kernel.
I've even downgraded to PHP 5.2.3 and lighty 1.4.18 (the same as some older machines that run very well) to try to mitigate any outside forces.

Is there any way I can figure out why it's doing this? strace?

Thanks for any insight you can provide.

        ├─lighttpd(5797)─┬─cronolog(5873)
        │                ├─cronolog(5874)
        │                ├─cronolog(5875)
        │                ├─cronolog(5876)
        │                ├─php-cgi(5798)─┬─php-cgi(5800)─┬─{php-cgi}(6079)
        │                │               │               ├─{php-cgi}(6080)
        │                │               │               └─{php-cgi}(6081)
        │                │               ├─php-cgi(5801)─┬─{php-cgi}(5930)
        │                │               │               ├─{php-cgi}(5931)
        │                │               │               └─{php-cgi}(5932)
        │                │               ├─php-cgi(5802)─┬─{php-cgi}(6115)
        │                │               │               ├─{php-cgi}(6116)
        │                │               │               └─{php-cgi}(6117)
        │                │               ├─php-cgi(5803)─┬─{php-cgi}(6025)
        │                │               │               ├─{php-cgi}(6026)
        │                │               │               └─{php-cgi}(6027)
        │                │               └─php-cgi(5804)─┬─{php-cgi}(6060)
        │                │                               ├─{php-cgi}(6061)
        │                │                               └─{php-cgi}(6062)

RE: cgi Children forking children? - Added by stbuehler over 14 years ago

php creates threads -> not a lighttpd problem. you could check your php extensions.

RE: cgi Children forking children? - Added by icy over 14 years ago

From the pstree manpage:

Child threads of a process are found under the parent process and are
shown with the process name in curly braces

So those are threads inside the php children.
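
To apply that rule to a large pstree dump, the entries can be split into processes and threads mechanically. This is a minimal Python sketch under stated assumptions: it expects `pstree -p` style text where each entry is `name(id)` and threads appear as `{name}(id)`; the function name `classify` and the sample text are made up for illustration.

```python
import re

# A thread entry looks like {name}(id); a process entry looks like name(id).
ENTRY_RE = re.compile(
    r"\{(?P<thread>[A-Za-z_][\w.-]*)\}\((?P<tid>\d+)\)"
    r"|(?P<proc>[A-Za-z_][\w.-]*)\((?P<pid>\d+)\)"
)


def classify(pstree_text):
    """Return (processes, threads) as lists of (name, id) tuples."""
    processes, threads = [], []
    for match in ENTRY_RE.finditer(pstree_text):
        if match.group("thread"):
            threads.append((match.group("thread"), int(match.group("tid"))))
        else:
            processes.append((match.group("proc"), int(match.group("pid"))))
    return processes, threads


# Shortened, ASCII-only sample modelled on the tree posted above.
sample = "\n".join([
    "lighttpd(5797)-+-cronolog(5873)",
    "               |-php-cgi(5798)-+-php-cgi(5800)-+-{php-cgi}(6079)",
    "                               |               |-{php-cgi}(6080)",
    "                               |-php-cgi(5801)---{php-cgi}(5930)",
])
processes, threads = classify(sample)
print(len(processes), "processes,", len(threads), "threads")
```

On Linux, the thread count of a single process can also be cross-checked via the `Threads:` field in `/proc/<pid>/status`.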
