Bug #1922
closedstderr redirection breaks mod_rrdtool on FreeBSD7
Description
I have a few servers running 1.5, but yesterday i've decided to put 'stable' version (svn rev 2405) to the new server, just to test -- what's the fun using it?
The point is, mod_rrdtool didn't want to work, log file was filling with 'broken pipe' errors and, the strange thing, no rrdtool was running according to ps.
I've found out that that is because of using srv->errorlog_fd with FD_CLOEXEC flag set for stderr of the child process in the mod_rrdtool.c, so i've just commented out stderr redirection code from mod_rrd_create_pipe, but i'm not sure this is the right thing to do, maybe it's more correct just not to set FD_CLOEXEC to log fd?
Updated by stbuehler almost 16 years ago
- Status changed from New to Need Feedback
- Priority changed from Normal to High
I guess we could remove the lines 137-141, as STDERR gets redirected to /dev/null later anyway (line 151).
But i don't see where the broken pipe errors could come from, could you give some more details please?
Updated by dci almost 16 years ago
'Broken pipe' comes, i believe, from the attempt to write data to the rrdtool, which is already dead because of the stderr redirection.
With stderr redirection removed, it works fine without any errors.
Updated by stbuehler almost 16 years ago
I cannot reproduce your problem - rrdtool runs fine for me.
Maybe rrdtool wants to print an error and lighttpd couldn't redirect stderr to /dev/null as it couldn't open the file, so stderr is closed?
I now understand where the broken pipe errors come from (there is nothing wrong with them); but i don't see why rrdtool crashes.
Updated by dci almost 16 years ago
Sorry, my english doesn't allow me to write complicated sentences, so i'll write this way:
1. You right, lighty can't open /dev/null because of chroot, and the whole thing works fine without any modifications both outside the chroot or with regular file located at '/dev/null' in the chroot.
2. 'rrdtool -' produces no stderr output both inside and outside the chroot.
3. With 'dup2(errlog,stderr)' code removed, it works fine inside the chroot with stderr closed two times, by mod_rrdtool.c:135 (maybe this should be removed too?) and openDevNull function.
So i think the whole bug is not because of stderr closed, but because of some system inconsistency after dup2'ing a closed fd and then calling close() on both fds again.
Successful dup2 in openDevNull fixes that inconsistency, but i think you shouldn't dup a closed fd in the first place.
Updated by dci almost 16 years ago
Also it not look like rrdtool crashes as there are no coredumps and/or records in system messages log. It looks like t just gracefully quits.
I'll write a small program to test i/o functionality and will provide you with results.
Updated by dci almost 16 years ago
root@hydrion /home/dci cat ./test.c #include <stdio.h> #include <stdlib.h> int main ( int argc, char * argv [ ] ) { FILE * f = fopen ( "/tmp/t", "w" ); char t [ 8 ]; fprintf ( f, "stdin:%d\n", fread ( t, 1, 8, stdin ) ); fflush ( f ); fprintf ( f, "stdout:%d\n", fwrite ( t, 1, 8, stdout ) ); fflush ( f ); fprintf ( f, "stderr:%d\n", fwrite ( t, 1, 8, stderr ) ); fflush ( f ); fclose ( f ); return 0; } root@hydrion /home/dci gcc -o /home/www/bin/test test.c root@hydrion /home/dci /home/www/bin/test 12345678 1234567812345678root@hydrion /home/dci root@hydrion /home/dci cat /tmp/t stdin:8 stdout:8 stderr:8 root@hydrion /home/dci cat /home/www/etc/lighttpd.conf |grep rrdtool\.binary #rrdtool.binary = "/bin/rrdtool" rrdtool.binary = "/bin/test" root@hydrion /home/dci /home/www/wwwctl.sh start spawn-fcgi: child spawned successfully: PID: 81053 [*] php started [*] lighty started root@hydrion /home/dci cat /home/www/tmp/t stdin:0 stdout:8 stderr:0 root@hydrion /home/dci
It looks like dup2() on the closed fd somehow breaks writer pipe.
Updated by dci almost 16 years ago
And i can't understand how it can be fixed with openDevNull, but somehow it is.
Updated by nitrox almost 16 years ago
rrdtool can be a strange tool :-)
- do u use ntpdate?
- does it work for some time and then break?
- sudo to your lighty user and copy the current file to bla.rrd and try to update it by hand?
Updated by dci almost 16 years ago
Yes, but ntpdate runs once a month.
It works ok when either dup2(errlog,stderr) in not called or stderr is restored later with openDevNull function. Otherwise fork()'ed process gets closed stdin and exits instantly.
I don't know how dup2()'ing a closed fd on stderr can cause a closed stdin and why it can be restored by opening a valid stderr later, but calling dup2 for a closed fd is wrong anyway, like reading a null pointer.
Updated by spaam almost 16 years ago
what version of rrdtool do you use?
on my fbsd 7.1 with mod_rrdtool it works.
Updated by dci almost 16 years ago
Try running chrooted (with /dev/null unavailable).
rrdtool version doesn't matter, even my test program fails to read stdin.
Updated by stbuehler almost 16 years ago
Okay. I guess i now know what the problem is: you really need /dev/null. There is no way around that. We often need /dev/null, and it is important that fd 0 - 2 (stdin, stdout, stderr) are always valid, as the os will reuse them immediatly if they are not.
This means if you just close stdin, the next open()/socket()/pipe() ends on stdin, resulting in a "stdin" that really shouldn't be used as stdin.
I just tried it on my system: i replaced open("/dev/null") with -1, and the errorlog fd became 1. So stdout got closed and rrd-tool cannot send anything to lighty.
Btw: The dup2 is not the problem, neither are the fds closed at that time (FD_CLOEXEC is close on exec, not close on fork).
And the behaviour for closing/dup2ing invalid file descriptors is defined ("do nothing") - so that is not the same as dereferencing a NULL pointer (which results in a SEGV).
I think we still will remove that unneeded piece of code in 1.4.22, so it may work without /dev/null. But that is not a clean solution, you really need /dev/null.
Updated by stbuehler almost 16 years ago
- Status changed from Need Feedback to Fixed
- % Done changed from 0 to 100
Applied in changeset r2407.
Updated by dci almost 16 years ago
Sorry, stbuehler, you are certainly right. I'll replace /dev/null with named pipe then.
Updated by stbuehler almost 16 years ago
I don't think that is a good idea... unless you have a process reading that pipe.
Also available in: Atom