Bug #604
EINTR not checked; rrdtool-read: failed Interrupted system call (stopped updating rrd)
| Status: | Fixed | Start date: | |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | jan | % Done: | 100% |
| Category: | mod_rrdtool | | |
| Target version: | - | | |
| Missing in 1.5.x: | | | |
Description
Machine environment: almost no traffic/requests, full CPU usage, disk I/O busy.
I don't have an strace from the moment it stopped.
2006-03-27 17:23:26: (src/log.c.75) server started
2006-03-27 18:22:00: (src/mod_rrdtool.c.398) rrdtool-read: failed Interrupted system call
2006-03-27 18:22:00: (src/server.c.1085) one of the triggers failed
2006-03-27 21:26:39: (src/log.c.135) server stopped (manually)
2006-03-27 21:26:42: (src/log.c.75) server started
2006-03-28 19:19:00: (src/mod_rrdtool.c.398) rrdtool-read: failed Interrupted system call
2006-03-28 19:19:00: (src/server.c.1085) one of the triggers failed
# strace -p `pidof rrdtool`
Process 11484 attached - interrupt to quit
read(0, <unfinished ...>
Process 11484 detached (CTRL+C)
# strace -p `pidof lighttpd`
Process 11480 attached - interrupt to quit
time(NULL) = 1143634917
epoll_wait(8, {}, 10231, 1000) = 0
time(NULL) = 1143634918
epoll_wait(8, {}, 10231, 1000) = 0
time(NULL) = 1143634919
epoll_wait(8, {}, 10231, 1000) = 0
time(NULL) = 1143634920
epoll_wait(8, <unfinished ...>
Process 11480 detached (CTRL+C)
I guess the read() in mod_rrdtool.c takes more than 1 second because rrdtool needs some time to write the updated data to disk, as disk I/O is already heavily loaded, and another lighttpd trigger/alarm interrupts the read() in progress.
History
#1 Updated by moo about 7 years ago
I'm sure that the code fails to check the value (r) returned by read()/write() properly and wrongly concludes that "rrdtool is quitting with error." For EINTR, simply check errno == EINTR when r is -1 and do the right thing: at least don't set p->rrdtool_running = 0. I have no clue how to make the best patch.
#2 Updated by Anonymous about 5 years ago
Well, the server doesn't die now but still stops updating the rrd until restart. Would be nice to try again, wouldn't it?
-- grin
#3 Updated by bestis over 4 years ago
I'm getting this with 1.4.20. The server is quite loaded sometimes.
2008-12-09 09:46:00: (mod_rrdtool.c.401) rrdtool-read: failed Interrupted system call
2008-12-09 09:46:00: (server.c.1187) one of the triggers failed
It isn't nice that the rrdtool graphs just stop working after one failure.
I would also like to see some kind of retry on this, or more time.
#4 Updated by stbuehler over 4 years ago
- Patch available set to No
You are free to provide patches :)
If you don't like that you can just poll mod-status pages (works better for remote anyway)
#5 Updated by bestis over 4 years ago
- File mod_rrdtool-persistent.patch added
stbuehler wrote:
You are free to provide patches :)
Well, here's the first try.
It is based on the observation that failures in mod_rrdtool_create_rrd don't change the rrdtool_running value, and when they fail there, mod_rrdtool tries again in one minute (I noticed this when I didn't have rrdtool installed).
So when the write/read fails, the patch doesn't disable mod_rrdtool and lets it try again. Disabling is still done when the response indicates a failure.
#6 Updated by woods over 4 years ago
I see the same problem on NetBSD-4.
EINTR should always result in a retry of the failed system call (or other operation), not a "fatal" error.
Other errors probably should remain fatal, so the patch suggested by "bestis" isn't ideal.
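A minimal sketch of the retry behaviour described in this comment, assuming a plain blocking file descriptor. The helper name `read_retry_eintr` is hypothetical; this is not the code from the patch above or from the changeset that fixed the bug.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: repeat read() while it fails only because a signal
 * interrupted it. Any other failure is returned to the caller unchanged. */
static ssize_t read_retry_eintr(int fd, void *buf, size_t count) {
    ssize_t r;

    do {
        r = read(fd, buf, count);
    } while (r == -1 && errno == EINTR);

    return r;
}
```

A caller would still treat a return value of -1 as fatal, since at that point errno is something other than EINTR. The same loop shape works for write().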
#7 Updated by stbuehler over 4 years ago
- Category changed from core to mod_rrdtool
#8 Updated by stbuehler over 4 years ago
- Status changed from New to Fixed
- % Done changed from 0 to 100
Applied in changeset r2400.
#9 Updated by moo over 4 years ago