Bug #2144

closed

cgic script dies on/after execution leaving zombie processes

Added by tculjaga over 14 years ago. Updated over 14 years ago.

Status:
Invalid
Priority:
Normal
Category:
mod_cgi
Target version:
-

Description

Hello,

I'm running lighttpd/1.4.25 with mod_cgi and mod_auth. The cgic scripts fetch data from a sqlite3 database in the background.

server.modules  += ( "mod_auth" )

auth.backend = "htdigest" 
auth.backend.htdigest.userfile = "/etc/.passwd" 
auth.debug = 2

auth.require = ( "/cshop/" =>
(
"method" => "digest",
"realm" => "Authorized users only",
"require" => "valid-user" 
)
)

server.modules  += ( "mod_cgi" )

$HTTP["remoteip"] =~ "127.0.0.1" {
        alias.url += ( "/cgi-bin/" => "/usr/lib/cgi-bin/" )
        $HTTP["url"] =~ "^/cgi-bin/" {
                cgi.assign = ( ".cgi" => "" )
        }
}

$HTTP["url"] =~ "^/cgi-bin/" {
        cgi.assign = ( ".cgi" => "" )
}

cgi.assign = (
       ".pl"  => "/usr/bin/perl",
       ".php" => "/usr/bin/php-cgi",
       ".py"  => "/usr/bin/python",
       ".sh"  => "/bin/sh",
       ".cgi"  => "",
)

Run directly from the shell, the cgic scripts execute without any issue, so I can confirm the scripts themselves should be fine.

root@subZero:/etc/lighttpd# lighttpd -v
lighttpd/1.4.25 - a light and fast webserver
Build-Date: Jan  4 2010 09:46:10
root@subZero:/etc/lighttpd# 

When the same cgic script is executed by lighttpd, I see the script die and leave a zombie process (not every time; this still needs to be confirmed), like this one:

tculjaga@subZero:~$ ps aux | grep lines_view | grep -v grep
root     24209  0.0  0.0      0     0 pts/0    Z+   23:44   0:00 [lines_view.cgi] <defunct>
tculjaga@subZero:~$ 

Of course, every time the script dies and leaves a zombie process, the lighttpd process leaks memory.

Going through the cgic script code, the lighttpd config, and the lighttpd debug logs, I was unable to pinpoint the root cause.

Please find the strace log attached (lighttpd.trace), and please help, because this is driving me crazy.

P.S.: I've tested on Ubuntu and on a Blackfin appliance as well, and the issue seems to be the same. Here is the distro on which I reproduced the issue:

tculjaga@subZero:~$ uname -a
Linux subZero 2.6.28-15-generic #52-Ubuntu SMP Wed Sep 9 10:49:34 UTC 2009 i686 GNU/Linux
tculjaga@subZero:~$ cat /etc/issue
Ubuntu 9.04 \n \l

Files

lighttpd.trace.tar.gz (64.6 KB), tculjaga, 2010-01-05 23:05
server.c.diff (477 Bytes), chenta, 2010-02-05 14:01
server.c.diff (692 Bytes), chenta, 2010-02-08 04:10
echo-x.c (615 Bytes), chenta, 2010-02-08 04:10
Actions #1

Updated by chenta over 14 years ago

I encounter the same situation when running my FastCGI program.
This problem arises because we daemonize lighttpd AFTER we initialize plugins. Therefore, the cgi/fcgi processes forked by mod_cgi/mod_fastcgi can still become zombie processes.

Besides the order in which daemonize() is invoked in server.c, we also forget to ignore the SIGCHLD signal after the first fork, which is why the "fork twice" trick did not work.

The patch is for lighttpd-1.4.25. I made the following changes:
1. Invoke daemonize() before invoking plugins_call_init(srv).
2. Ignore SIGCHLD after the first fork().
3. Do not handle the SIGCHLD signal after daemonizing lighttpd.

Actions #2

Updated by stbuehler over 14 years ago

cgi processes are certainly not forked while initializing plugins. And processes forked before daemonize() should not become zombie processes; the init process will become their new parent, and init should take care of them.

And fork() doesn't change the signal handlers anyway.

Btw: your patch is unreadable, there is no context. try diff -u.
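
The point about fork() and signal handlers can be checked with a small experiment. The following is only a minimal sketch in Python, not lighttpd code: the function name `child_inherits_ignored_signal` and the choice of SIGUSR1 are assumptions made for illustration. The parent ignores a signal, forks, and the child sends that signal to itself; if fork() had reset the disposition to the default, the child would be killed instead of exiting normally.

```python
import os
import signal


def child_inherits_ignored_signal(signum=signal.SIGUSR1):
    """Return True if a forked child still ignores a signal the parent ignored.

    Hypothetical helper for illustration; must be called from the main thread.
    """
    previous = signal.signal(signum, signal.SIG_IGN)  # parent ignores signum
    try:
        pid = os.fork()
        if pid == 0:
            # Child: with the default disposition this signal would
            # terminate the process right here.
            os.kill(os.getpid(), signum)
            os._exit(0)
        _, status = os.waitpid(pid, 0)  # reap the child so no zombie is left
    finally:
        if previous is not None:
            signal.signal(signum, previous)  # restore the parent's disposition
    # A normal exit with status 0 means the signal was ignored in the child.
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

On a POSIX system the function is expected to return True, since the child starts with a copy of the parent's signal dispositions.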

Actions #3

Updated by stbuehler over 14 years ago

  • Status changed from New to Need Feedback
  • Priority changed from High to Normal

I had a look at the strace too now: every clone I could find had a matching successful waitpid, so I don't think there was any zombie left.

$ grep 24209 lighttpd.trace
23:44:39.620050 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7ea9708) = 24209
23:44:39.620825 waitpid(24209, 0xbff6ff78, WNOHANG) = 0
23:44:39.634836 waitpid(24209, 0xbff6ff78, WNOHANG) = 0
23:44:39.637434 waitpid(24209, 0xbff703c8, WNOHANG) = 0
23:44:39.637510 kill(24209, SIGTERM)    = 0
23:44:40.621557 waitpid(24209, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 24209
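
The sequence in this trace (clone, non-blocking waitpid polls returning 0, kill with SIGTERM, then a waitpid that returns the pid) can be reproduced with a short sketch. This is an illustration in Python under stated assumptions, not what mod_cgi does internally: `spawn_poll_terminate_reap` is a hypothetical name, the child is a plain sleep standing in for a CGI script, and the final wait blocks for simplicity where the trace keeps polling with WNOHANG.

```python
import os
import signal
import time


def spawn_poll_terminate_reap():
    """Fork a child, poll it, terminate it, and collect its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: make sure SIGTERM has its default action, then idle.
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        time.sleep(60)  # stand-in for a long-running CGI script
        os._exit(0)

    # Non-blocking poll: a returned pid of 0 means no state change yet.
    polled_pid, _ = os.waitpid(pid, os.WNOHANG)

    os.kill(pid, signal.SIGTERM)  # ask the child to stop

    # Blocking wait: returns the child's pid once it has terminated.
    # This call is what removes the child's process-table entry.
    reaped_pid, status = os.waitpid(pid, 0)
    return polled_pid, reaped_pid == pid, os.WIFSIGNALED(status)
```

In this sketch the child dies from the signal itself, so the status reports WIFSIGNALED; in the trace above the status is WIFEXITED with exit code 0, which means that process exited normally instead. Either way, the successful waitpid is the step that prevents a lingering zombie.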

Actions #4

Updated by chenta over 14 years ago

I was wrong about the daemonize order. However, I still think we should ignore the SIGCHLD signal after the first fork; it should be part of the fork-twice trick. After I did that, no zombie processes were left running.

I wrote a simple program to reproduce this problem. It does the following:
1. Accept an FCGI request.
2. Wait for 5 seconds.
3. Send HTTP status code 500.
4. Terminate the FCGI program.

On the client side, you just need to keep sending requests to that FCGI program.

The attachments are the new patch and the sample program that creates zombie FCGI processes.

Actions #5

Updated by stbuehler over 14 years ago

  • Status changed from Need Feedback to Invalid

I still see no reason to change anything. Ignoring SIGCHLD has nothing to do with the FastCGI zombies; the reason you are seeing zombie FastCGI processes is that your FastCGI process "crashed" and that lighty didn't need the backend. As soon as lighttpd tries to reach the backend, it will see it is down, call waitpid() (which "kills" the zombie) and fork a new one (if needed).

I don't think these zombie processes are a problem, so everything is fine.

If you don't like them, use runit/daemontools + spawn-fcgi instead.
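
The behaviour described in this comment, an exited child staying a zombie only until its parent calls waitpid(), can be observed with a small sketch. It is written in Python for illustration and assumes a Linux system with /proc mounted; `proc_state` and `zombie_until_reaped` are hypothetical helper names, not part of lighttpd.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field follows the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_until_reaped():
    """Return the child's state before and after the parent reaps it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits immediately

    # Give the kernel up to about 2 seconds to mark the child as a zombie.
    for _ in range(200):
        if proc_state(pid) == "Z":
            break
        time.sleep(0.01)
    before = proc_state(pid)

    os.waitpid(pid, 0)  # reaping removes the zombie entry
    after = proc_state(pid)
    return before, after
```

With the default SIGCHLD disposition in the parent, the first value is expected to be "Z" (shown as <defunct> by ps) and the second None, because the entry disappears once waitpid() has collected the exit status.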

Actions #6

Updated by stbuehler over 14 years ago

  • Target version deleted (1.4.x)