Bug #604
closed
EINTR not checked, rrdtool-read: failed Interrupted system call (stopped updating rrd)
Description
Machine environment: almost no traffic/requests, full CPU usage, disk I/O busy.
I don't have an strace from the moment it stopped.
2006-03-27 17:23:26: (src/log.c.75) server started
2006-03-27 18:22:00: (src/mod_rrdtool.c.398) rrdtool-read: failed Interrupted system call
2006-03-27 18:22:00: (src/server.c.1085) one of the triggers failed
2006-03-27 21:26:39: (src/log.c.135) server stopped (manually)
2006-03-27 21:26:42: (src/log.c.75) server started
2006-03-28 19:19:00: (src/mod_rrdtool.c.398) rrdtool-read: failed Interrupted system call
2006-03-28 19:19:00: (src/server.c.1085) one of the triggers failed
# strace -p `pidof rrdtool`
Process 11484 attached - interrupt to quit
read(0, <unfinished ...>
Process 11484 detached (CTRL+C)
# strace -p `pidof lighttpd`
Process 11480 attached - interrupt to quit
time(NULL) = 1143634917
epoll_wait(8, {}, 10231, 1000) = 0
time(NULL) = 1143634918
epoll_wait(8, {}, 10231, 1000) = 0
time(NULL) = 1143634919
epoll_wait(8, {}, 10231, 1000) = 0
time(NULL) = 1143634920
epoll_wait(8, <unfinished ...>
Process 11480 detached (CTRL+C)
My guess is that read() in mod_rrdtool.c takes more than 1 second, because rrdtool needs some time to write its update to disk while disk I/O is already heavily loaded, and another lighttpd trigger/alarm then interrupts the read() in progress.
Updated by moo over 18 years ago
I'm sure the code fails to check the value (r) returned by read()/write() properly and, on EINTR, wrongly concludes that "rrdtool is quitting with error". Simply check for errno == EINTR when r is -1 and do the right thing: at the very least, don't set p->rrdtool_running = 0. I have no clue how to make the best patch.
Updated by Anonymous over 16 years ago
Well, the server doesn't die now but still stops updating the rrd until restart. Would be nice to try again, wouldn't it?
-- grin
Updated by bestis almost 16 years ago
I'm getting this with 1.4.20. The server is quite heavily loaded at times.
2008-12-09 09:46:00: (mod_rrdtool.c.401) rrdtool-read: failed Interrupted system call
2008-12-09 09:46:00: (server.c.1187) one of the triggers failed
It isn't nice that the rrdtool graphs just stop working after a single failure.
I would also like to see some kind of retry for this, or more time.
Updated by stbuehler almost 16 years ago
- Patch available set to No
You are free to provide patches :)
If you don't like that, you can just poll the mod-status pages (works better for remote anyway).
Updated by bestis almost 16 years ago
stbuehler wrote:
You are free to provide patches :)
Well, here's a first try.
It is based on the observation that in mod_rrdtool_create_rrd, failures of those calls don't change the rrdtool_running value, and if they fail there,
mod_rrdtool tries again in one minute (I noticed this when I didn't have rrdtool
installed).
So if those calls fail, don't disable mod_rrdtool. I left the disabling in place for the case where the response indicates failure, but if
write/read fails, let's try again.
Updated by woods almost 16 years ago
I see the same problem on NetBSD-4.
EINTR should always result in a retry of the failed system call (or other operation), not a "fatal" error.
Other errors probably should remain fatal, so the patch suggested by "bestis" isn't ideal.
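The distinction drawn here (retry on EINTR, keep every other error fatal) can be sketched as a pair of small wrappers. This is an illustration only, not the change that was applied in r2400; read_retry and write_all_retry are hypothetical helper names.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * read() that restarts when interrupted by a signal.
 * Returns what read() returns; -1 means a real error (errno != EINTR).
 */
ssize_t read_retry(int fd, void *buf, size_t len) {
    ssize_t r;
    do {
        r = read(fd, buf, len);
    } while (r == -1 && errno == EINTR);
    return r;
}

/*
 * Writes the whole buffer, restarting on EINTR and continuing after
 * short writes. Returns 0 on success, -1 on any other error.
 */
int write_all_retry(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w == -1) {
            if (errno == EINTR) continue; /* interrupted: just try again */
            return -1;                    /* anything else stays fatal */
        }
        p += w;
        len -= (size_t)w;
    }
    return 0;
}
```

With wrappers like these, a caller would only treat the helper process as dead when the wrapper itself reports -1, so an interrupted call alone would not be a reason to clear a flag such as rrdtool_running.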
Updated by stbuehler almost 16 years ago
- Category changed from core to mod_rrdtool
Updated by stbuehler almost 16 years ago
- Status changed from New to Fixed
- % Done changed from 0 to 100
Applied in changeset r2400.
Updated by moo almost 16 years ago
stbuehler wrote:
Applied in changeset r2400.
Can you please merge it to trunk?