Bug #1993 (closed)
spawn-fcgi crashes system totally on FreeBSD
Description
spawn-fcgi (spawn-fcgi v1.6.2 (ipv6)) manages to crash a whole FreeBSD system. Reproduced today on 2 different systems (7.1-RELEASE and 7.2-RELEASE).
Restarting the systems was only possible with a cold reboot.
Both systems run spawn-fcgi under control of daemontools, using this run script:
    #!/bin/sh
    SOCKET=/usr/home/ml/webpy/workdir/.fb.sock
    exec 2>&1
    export PHP_FCGI_CHILDREN=1
    exec softlimit -m 50000000 /usr/local/bin/spawn-fcgi -n \
        -u ml -U www -s $SOCKET -M 0660 \
        -- /usr/home/ml/webpy/code.py
The corresponding part from lighttpd.conf:
    ## The following is used for a separate fastcgi-process
    fastcgi.server = ("/code.py" =>
        ("fastcgi.backend.code.py" =>
            ("socket" => "/usr/home/ml/webpy/workdir/.fb.sock",
             "check-local" => "disable")
        ))
    fastcgi.debug = 1
    url.rewrite-once = (
        "^/py/(.*)$" => "/code.py/$1",
    )
code.py is a simple script based on web.py that offers, among other things, the possibility to upload files.
Whenever I upload a file larger than ~25kB, the receiving process (code.py) on the other side of the socket does not finish; instead it looks like it consumes really ALL resources.
I'm really unsure what further information I could provide, but as long as this problem exists I cannot use lighty any longer. The problem also occurs when I do not split lighty and spawn-fcgi as described above.
I also tested the code on a Linux box with an older spawn-fcgi-1.4.19; there the problem did not occur (although under certain circumstances, which I cannot reproduce at the moment, the process also consumed a high amount of resources).
Cheers, Martin
Updated by icy almost 16 years ago
spawn-fcgi doesn't actually do any request handling, it just sets up the fcgi processes.
So if your python app consumes all resources then the bug is in it and not spawn-fcgi
Updated by bettercom almost 16 years ago
spawn-fcgi sets up the socket and controls it, doesn't it?
Please do not argue in such a superficial way - I could reproduce the problem about 10 times today on 2 stable systems. Running the app directly (web.py offers its own WSGI-compliant web server) doesn't cause any problems.
Updated by icy almost 16 years ago
spawn-fcgi creates the socket and hands control over to the python app.
At the time of a request, spawn-fcgi literally doesn't exist anymore as a process.
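The handoff described here can be sketched in a few lines of Python. This is only an illustration of the general bind, fork and exec pattern, not spawn-fcgi's actual source; the function name `spawn` and the backlog of 16 are made up for the example. The one fixed point is the FastCGI convention that the application finds its listening socket on file descriptor 0.

```python
import os
import socket
import sys
import tempfile

FCGI_LISTENSOCK_FILENO = 0  # FastCGI convention: the app expects its listener on fd 0


def spawn(sock_path, argv):
    """Bind a Unix socket, fork, and replace the child with argv.

    Hypothetical helper for illustration. Returns the child's pid.
    After this call the spawner plays no part in request handling.
    """
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(sock_path)
    listener.listen(16)  # backlog chosen arbitrarily for the sketch

    pid = os.fork()
    if pid == 0:
        try:
            # Child: put the listening socket on fd 0, then become the app.
            os.dup2(listener.fileno(), FCGI_LISTENSOCK_FILENO)
            os.set_inheritable(FCGI_LISTENSOCK_FILENO, True)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if dup2 or exec failed
    listener.close()  # the parent has no further use for the socket
    return pid
```

Once `execvp` succeeds, the only process holding the socket is the application itself, so every byte lighttpd sends over it is read by the app and by nothing else.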
Updated by bettercom almost 16 years ago
icy: After reproducing the problem at least one more time without spawn-fcgi I must agree.
So please accept my apologies for this false bug report.
Cheers
Martin
Updated by bettercom almost 16 years ago
Additional info (even if spawn-fcgi is not affected directly):
The problem does NOT occur when instead of using file-sockets
the packets are transmitted via network. Then everything works
as expected. Is this perhaps a bug in lighty?
Updated by darix almost 16 years ago
well. if you could tell which app is bringing your server down we could tell. maybe start "top" on a terminal and let it sort by memory usage. then start an upload and watch what happens.
Updated by bettercom almost 16 years ago
darix wrote:
> well. if you could tell which app is bringing your server down we could tell.
It's a very simple app based on http://webpy.org/ where I tried to test
http://webpy.org/cookbook/fileupload locally.
But IMO the quality of the app does not affect the problem in such a way.
Keeping in mind the concept and which kind of socket each side communicates over

         AF_INET              AF_UNIX
Client >---------< Lighttpd >---------< FCGI-App

it is not acceptable that lighty may crash a whole system, even if lighty
gets nothing (or shit) from the client and/or the FCGI-App. But as I said:
this only happens if lighty talks via an AF_UNIX-socket to the FCGI-App
started by spawn-fcgi (at least on a FreeBSD-system).
BTW I tried the app with another, very, very simple fcgi-capable http-server
(http://gist.github.com/119795) and had no problems so I'm really sure that
this is a lighty-issue. See also
http://groups.google.com/group/webpy/browse_thread/thread/18b09416f1d6df41/81bee637963e5a9c
> maybe start "top" on a terminal and let it sort by memory usage. then start an upload and watch what happens.
No chance: the moment I started the upload, the whole system froze
completely - no screen updates, no reaction to keyboard input... Only
a hard reset helped.
Unfortunately webpy recommends using file-sockets with lighttpd, but I
proposed changing that to INET-sockets as long as this issue is not resolved.
Cheers, Martin
Updated by icy almost 16 years ago
If the box really completely freezes, then the bug is in FreeBSD and just triggered by lighty.
Under normal circumstances, a userland app shouldn't be able to freeze a system.
Would be interesting to find out what really happens.
Updated by bettercom almost 16 years ago
icy wrote:
> If the box really completely freezes, then the bug is in FreeBSD and just triggered by lighty.
ACK - no question, FreeBSD should handle broken apps in a smarter way. But AFAIR there had been
some issues with broken or dereferenced pointers (null-pointers) to file-sockets in the past and
I assume at the moment that this is the problem, but I have not looked into the lighty code
yet.
> Under normal circumstances, a userland app shouldn't be able to freeze a system.
> Would be interesting to find out what really happens.
Indeed. As I mentioned earlier I had tested this also on a linux-box where the system did not
crash but top showed a very fast rising consumption of CPU and memory but there I had the
chance to terminate all relevant processes (IIRC at a VMEM-usage of more than 500MB...).
Updated by icy almost 16 years ago
bettercom wrote:
> But AFAIR there had been some issues with broken or dereferenced pointers (null-pointers) to file-sockets in the past
There is no such thing as a pointer to a file-socket. Files are referenced through file descriptors, which are just integers, so something like a NULL pointer can't exist.
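A small Python sketch makes the point concrete. The helper names `fd_is_open` and `demo` are invented for this example. A Unix socket's descriptor is an ordinary non-negative integer, and once it is closed the kernel simply rejects that number with EBADF; there is no dangling pointer involved.

```python
import errno
import os
import socket
import tempfile


def fd_is_open(fd):
    """Return True if the kernel currently knows this descriptor number."""
    try:
        os.fstat(fd)
    except OSError as exc:
        if exc.errno == errno.EBADF:
            return False
        raise
    return True


def demo():
    """Bind a Unix socket and report its fd plus its state before and after close."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    fd = sock.fileno()        # just an int, e.g. 3
    before = fd_is_open(fd)
    sock.close()
    after = fd_is_open(fd)    # the same int now refers to nothing
    os.unlink(path)
    return fd, before, after
```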
> Indeed. As I mentioned earlier I had tested this also on a linux-box where the system did not
> crash but top showed a very fast rising consumption of CPU and memory but there I had the
> chance to terminate all relevant processes (IIRC at a VMEM-usage of more than 500MB...).
What process is consuming the resources? Lighty or the python app?
The relevant figure for RAM usage is the resident set size (RSS).
Can you provide complete and detailed information how to reproduce the mentioned behaviour under linux?
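One way to answer that on Linux without relying on an interactive top is to read the `VmRSS` line from `/proc/<pid>/status` for both processes while the upload runs. The sketch below assumes a Linux `/proc` filesystem; the function names are made up for the example, and the PIDs of lighttpd and code.py have to be looked up separately.

```python
import os


def parse_vmrss_kib(status_text):
    """Extract the VmRSS value (in kB as reported by the kernel) from
    the text of /proc/<pid>/status. Returns None if the line is absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def resident_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kib(fh.read())
```

Calling `resident_kib()` in a loop for each PID and logging the values to a file would show which of the two processes grows, even if the machine becomes unresponsive shortly afterwards.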