Bug #2144
Closed: cgic script dies on/after execution, leaving zombie processes
Description
Hello,
I'm running lighttpd/1.4.25 with mod_cgi and mod_auth. My cgiC scripts fetch data from an SQLite3 database in the background.
server.modules += ( "mod_auth" )
auth.backend = "htdigest"
auth.backend.htdigest.userfile = "/etc/.passwd"
auth.debug = 2
auth.require = ( "/cshop/" =>
                 (
                   "method"  => "digest",
                   "realm"   => "Authorized users only",
                   "require" => "valid-user"
                 )
               )

server.modules += ( "mod_cgi" )

$HTTP["remoteip"] =~ "127.0.0.1" {
    alias.url += ( "/cgi-bin/" => "/usr/lib/cgi-bin/" )
    $HTTP["url"] =~ "^/cgi-bin/" {
        cgi.assign = ( ".cgi" => "" )
    }
}

$HTTP["url"] =~ "^/cgi-bin/" {
    cgi.assign = ( ".cgi" => "" )
}

cgi.assign = ( ".pl"  => "/usr/bin/perl",
               ".php" => "/usr/bin/php-cgi",
               ".py"  => "/usr/bin/python",
               ".sh"  => "/bin/sh",
               ".cgi" => "",
             )
Run directly from the shell, the cgiC scripts execute without any issue, so I'm fairly sure the scripts themselves are fine.
root@subZero:/etc/lighttpd# lighttpd -v
lighttpd/1.4.25 - a light and fast webserver
Build-Date: Jan 4 2010 09:46:10
root@subZero:/etc/lighttpd#
When lighttpd executes the same cgiC script, I see the script die and leave a zombie process behind (not every time; this still needs to be confirmed), like this one:
tculjaga@subZero:~$ ps aux | grep lines_view | grep -v grep
root     24209  0.0  0.0      0     0 pts/0    Z+   23:44   0:00 [lines_view.cgi] <defunct>
tculjaga@subZero:~$
Every time the script dies and leaves a zombie process, I also see a memory leak in the lighttpd process.
After going through the cgiC script code, the lighttpd config and the lighttpd debug logs, I was unable to pinpoint the root cause.
Please find the strace log (lighttpd.trace) attached, and please help; I'm really going crazy over this.
P.S.: I've tested on Ubuntu and on a Blackfin appliance as well, and the issue seems to be the same. Here is the distro on which I reproduced the issue:
tculjaga@subZero:~$ uname -a
Linux subZero 2.6.28-15-generic #52-Ubuntu SMP Wed Sep 9 10:49:34 UTC 2009 i686 GNU/Linux
tculjaga@subZero:~$ cat /etc/issue
Ubuntu 9.04 \n \l
Updated by chenta almost 15 years ago
- File server.c.diff added
I encountered the same situation when running my FastCGI program.
This problem is caused by daemonizing lighttpd AFTER the plugins are initialized. As a result, the CGI/FastCGI processes forked by mod_cgi/mod_fastcgi can still become zombie processes.
Besides the order in which daemonize() is called in server.c, we also forget to ignore the SIGCHLD signal after the first fork, which is why the "fork twice" trick does not work.
The patch is against lighttpd-1.4.25 and makes the following changes:
1. Invoke daemonize() before invoking plugins_call_init(srv).
2. Ignore SIGCHLD after the first fork().
3. Do not handle the SIGCHLD signal after lighttpd is daemonized.
Updated by stbuehler almost 15 years ago
CGI processes are certainly not forked while the plugins are initialized. And processes forked before daemonize() should not become zombies: init becomes their new parent and is expected to reap them.
And fork() doesn't change the signal handlers anyway.
BTW: your patch is unreadable because it has no context; try diff -u.
Updated by stbuehler almost 15 years ago
- Status changed from New to Need Feedback
- Priority changed from High to Normal
I have now had a look at the strace as well: every clone() I could find has a matching successful waitpid(), so I don't think any zombie was left.
$ grep 24209 lighttpd.trace
23:44:39.620050 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7ea9708) = 24209
23:44:39.620825 waitpid(24209, 0xbff6ff78, WNOHANG) = 0
23:44:39.634836 waitpid(24209, 0xbff6ff78, WNOHANG) = 0
23:44:39.637434 waitpid(24209, 0xbff703c8, WNOHANG) = 0
23:44:39.637510 kill(24209, SIGTERM) = 0
23:44:40.621557 waitpid(24209, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 24209
Updated by chenta almost 15 years ago
- File server.c.diff added
- File echo-x.c added
I was wrong about the daemonize order. However, I still think we should ignore the SIGCHLD signal after the first fork; it should be part of the fork-twice trick. After I made that change, no more zombie processes were left running.
I wrote a simple program to reproduce this problem. It does the following:
1. accept an FCGI request
2. wait for 5 seconds
3. send HTTP status code 500
4. terminate the FCGI program
On the client side, just keep sending requests to that FCGI program.
Attached are the new patch and the sample program that creates zombie FCGI processes.
Updated by stbuehler almost 15 years ago
- Status changed from Need Feedback to Invalid
I still see no reason to change anything. Ignoring SIGCHLD has nothing to do with the FastCGI zombies; you are seeing zombie FastCGI processes because your FastCGI process "crashed" and lighty has not needed the backend since. As soon as lighttpd tries to reach the backend, it sees that the backend is down, calls waitpid() (which "kills" the zombie, i.e. reaps it) and forks a new one if needed.
I don't think these zombie processes are a problem, so everything is fine.
If you don't like them, use runit/daemontools + spawn-fcgi instead.