Bug #934 (closed)
lighttpd 1.4.13 crashes under PHP load
Description
I am running lighttpd 1.4.13, using FastCGI PHP.
Periodically I see two different failure behaviors.
The first is a server crash. I have attached several valgrind dumps for this. (Note also that there is a problem with lighttpd's use of setgroups when the group has many users in it).
The second may be related - the PHP FastCGI connection becomes "clogged", i.e. connections do not close out and the FastCGI interface rapidly runs out of connections to the PHP server.
This same lighttpd server handles a high static file serving load without any trouble - I encounter the crashing and FastCGI failure behaviors when running PHP scripts on it.
PHP is 5.2.0 FastCGI.
Also attached is the lighttpd configuration file.
-- jb
Updated by Anonymous over 18 years ago
Regarding my earlier note that "there is a problem with lighttpd's use of setgroups when the group has many users in it": this part turned out to be a problem with nss_ldap, which I corrected by installing the latest nss_ldap from source.
I still have the PHP issues.
Updated by Anonymous over 18 years ago
The PHP problem seems to be related to some PHP scripts' use of the PHP mail() function.
mail() forks and execs a /bin/sh, which is used to run /usr/bin/sendmail. Something about this appears to be confusing or hanging PHP or the FastCGI interface.
When the problem behavior occurs, I see "sh" processes in the ps list. If I manually kill these sh processes the FastCGI load starts to come down.
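Spotting the leftover "sh" processes by hand gets tedious, so here is a rough sketch of how the ps list could be filtered for them. It assumes the PHP FastCGI workers show up under the command name "php-cgi" and that the listing comes from `ps -eo pid,ppid,stat,comm`; both are assumptions for illustration, not something taken from this setup.

```python
def find_shell_children(ps_output, parent_name="php-cgi"):
    """Return (pid, stat) for each "sh" process whose parent is named parent_name.

    ps_output is expected to be the text of `ps -eo pid,ppid,stat,comm`,
    including its header row. The default parent_name is an assumption.
    """
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) == 4:
            pid, ppid, stat, comm = parts
            rows.append((int(pid), int(ppid), stat, comm.strip()))
    # PIDs of every process that looks like a PHP FastCGI worker
    parents = {pid for pid, _, _, comm in rows if comm == parent_name}
    return [(pid, stat) for pid, ppid, stat, comm in rows
            if comm == "sh" and ppid in parents]
```

Feeding it the output of the ps command above gives the PIDs that were being killed manually, together with their process state.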
Updated by Anonymous about 18 years ago
OK, it turns out this was an extremely obscure issue between different versions of the NFS server and NFS client. PHP opened a session file on our NFS server, which is quite a bit older than the NFS client. Apparently some incompatibility between these versions was causing the strange flock behavior.
I moved the PHP sessions to a same-kernel-version NFS server and the problem is cured.
You can close this ticket.
Updated by darix about 18 years ago
ok ... i still don't see how lighty can crash on barfed php scripts o.O
Updated by Anonymous about 18 years ago
The behavior was odd. It was positively triggered when PHP opened a session file and did fcntl(LOCK_EX) on it. The Linux kernel would hang when cloning that file descriptor as part of the fork/exec to launch a child (e.g. when sending mail). So there would be a /bin/bash process in sleep state, with no memory allocated to it yet. An strace on that process would wake it up and resume normal execution (it got a SIGSTOP and SIGCONT pair, which got it past whatever system call was deadlocked).
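For anyone wanting to check a session directory for the same pattern, the sequence can be approximated roughly like this: take an exclusive lock on a file in that directory, then launch a shell child while the lock is held. This is only a sketch. It assumes a flock-style lock, the function name and the "sess_" prefix are made up, and on an affected system a hang inside the fork itself could block the call before the timeout applies.

```python
import fcntl
import os
import subprocess
import tempfile

def child_runs_while_locked(directory, timeout=5.0):
    """Hold an exclusive lock on a temp file in `directory`, then spawn /bin/sh.

    Returns True if the child exits cleanly within `timeout` seconds,
    False if it was started but did not finish in time.
    """
    fd, path = tempfile.mkstemp(prefix="sess_", dir=directory)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # exclusive lock, as on a session file
        try:
            # pass_fds keeps the locked descriptor open in the child,
            # which is what a plain fork/exec would do as well.
            subprocess.run(["/bin/sh", "-c", "exit 0"],
                           pass_fds=(fd,), timeout=timeout, check=True)
            return True
        except subprocess.TimeoutExpired:
            return False
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
        os.unlink(path)
```

Pointing `directory` at the NFS-mounted session path versus a local path would show whether only the NFS case misbehaves.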
Now, when this occurred, PHP would also stop properly handling new incoming requests. The whole FastCGI engine got gummed up, new FCGI requests would start backing up rapidly (fastcgi.active-requests and fastcgi.load would grow rapidly). If I didn't clean out the stuck /bin/bash child fast enough, eventually lighty/php would get into a state where I would be forced to kill lighty+php and restart it. If it got to this state, it would not start serving FCGI/PHP again even if I did kill the /bin/bash child. I don't know if the FCGI state was confused on the lighty, or the PHP side. It's also possible that it is something else in the PHP/FCGI code deadlocking on the same NFS problem, but in a different way. I really don't know.
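The backlog described above could be watched automatically along these lines. This sketch assumes the counters are available as plain "name: value" text lines (for example from a statistics page); the helper names are hypothetical, and only the two counter names mentioned above are taken from this report.

```python
def parse_counters(text):
    """Parse "name: value" lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # ignore lines that are not plain counters
            counters[key.strip()] = int(value)
    return counters

def fastcgi_backlog_growing(before, after, key="fastcgi.active-requests"):
    """Return True if the counter `key` rose between two snapshots."""
    return after.get(key, 0) > before.get(key, 0)
```

Taking two snapshots a few seconds apart and comparing them gives an early warning before the state is reached where only a restart helps.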
I had assumed earlier that the problem was lighty's FCGI processing becoming confused as a result of something PHP or its child was doing. Now I know the problems were really in PHP and the Linux kernel.
Updated by darix about 18 years ago
- Status changed from New to Fixed
- Resolution set to invalid
1. the group issue should be solved in 1.4.14 (to be released)
2. your valgrind crashed and not lighty.
3. this is not the place to discuss php nfs locking issues.
that said.... closing. ;)