Bug #604

EINTR not checked: rrdtool-read: failed Interrupted system call (stopped updating rrd)

Added by moo over 8 years ago. Updated over 5 years ago.

Status: Fixed
Start date:
Priority: Normal
Due date:
Assignee: jan
% Done: 100%
Category: mod_rrdtool
Target version: -
Missing in 1.5.x:

Description

Machine environment: almost no traffic/requests, full CPU usage, disk I/O busy.

I don't have an strace from the moment it stopped.


2006-03-27 17:23:26: (src/log.c.75) server started 
2006-03-27 18:22:00: (src/mod_rrdtool.c.398) rrdtool-read: failed Interrupted system call 
2006-03-27 18:22:00: (src/server.c.1085) one of the triggers failed 
2006-03-27 21:26:39: (src/log.c.135) server stopped
(manually) 
2006-03-27 21:26:42: (src/log.c.75) server started 
2006-03-28 19:19:00: (src/mod_rrdtool.c.398) rrdtool-read: failed Interrupted system call 
2006-03-28 19:19:00: (src/server.c.1085) one of the triggers failed

# strace -p `pidof rrdtool`
Process 11484 attached - interrupt to quit
read(0,  <unfinished ...>
Process 11484 detached (CTRL+C)

# strace -p `pidof lighttpd`
Process 11480 attached - interrupt to quit
time(NULL)                              = 1143634917
epoll_wait(8, {}, 10231, 1000)          = 0
time(NULL)                              = 1143634918
epoll_wait(8, {}, 10231, 1000)          = 0
time(NULL)                              = 1143634919
epoll_wait(8, {}, 10231, 1000)          = 0
time(NULL)                              = 1143634920
epoll_wait(8,  <unfinished ...>
Process 11480 detached (CTRL+C)

My guess is that read() in mod_rrdtool.c takes more than 1 second because rrdtool needs some time to write the update to disk while disk I/O is already heavily loaded, and another lighttpd trigger/alarm then interrupts the read() in progress.
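The guess above can be reproduced in isolation. The following is a minimal sketch, not lighttpd code: it assumes the interruption comes from a timer signal (SIGALRM here) whose handler is installed without SA_RESTART. The names on_alarm and errno_of_interrupted_read are hypothetical and chosen for illustration only.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt the blocking system call. */
static void on_alarm(int sig) { (void)sig; }

/* Hypothetical demo helper: block in read() on an empty pipe, let a
 * 50 ms timer fire, and return the errno that read() failed with
 * (0 if read() did not fail, -1 if the setup itself failed). */
static int errno_of_interrupted_read(void) {
    int fds[2];
    char buf[16];
    struct sigaction sa;
    struct itimerval tv;
    ssize_t r;
    int saved;

    if (pipe(fds) != 0) return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) != 0) return -1;

    memset(&tv, 0, sizeof(tv));
    tv.it_value.tv_usec = 50000;     /* one-shot timer, 50 ms */
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0) return -1;

    /* The write end stays open and no data arrives, so this blocks
     * until the signal handler runs. */
    r = read(fds[0], buf, sizeof(buf));
    saved = (r == -1) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return saved;
}
```

Under these assumptions the blocked read() fails with errno set to EINTR, which strerror() renders as "Interrupted system call", the same text as in the log above.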

mod_rrdtool-persistent.patch - mod_rrdtool to be more persistent when updating datafile (759 Bytes) bestis, 2008-12-22 09:41


Related issues

Duplicated by Bug #1883: mod_rrdtool regularly fail in lighttpd 1.4.20 Fixed 2009-01-30

Associated revisions

Revision 2400
Added by stbuehler over 5 years ago

Handle EINTR in mod_rrdtool (fixes #604)

Revision 2416
Added by stbuehler over 5 years ago

Port some mod_rrdtool fixes from 1.4.x (#604, #419 and more)

History

#1 Updated by moo over 8 years ago

I'm sure the code fails to check the value (r) returned by read()/write() properly and wrongly concludes that "rrdtool is quitting with error." When the call fails because of EINTR (r == -1 and errno == EINTR), it should do the right thing: at the very least, not set p->rrdtool_running = 0. I have no clue how to make the best patch.
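For illustration, the check described in this comment could look roughly like the sketch below. It is not the code that was later committed; read_retry_eintr is a hypothetical helper name. Note that EINTR is reported through errno after read() returns -1, not through the return value itself.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call read() again for as long as it fails only
 * because a signal interrupted it. Any other result (data, EOF, or a
 * real error) is passed back to the caller unchanged. */
static ssize_t read_retry_eintr(int fd, void *buf, size_t count) {
    ssize_t r;
    do {
        r = read(fd, buf, count);
    } while (r == -1 && errno == EINTR);
    return r;
}
```

A caller would then treat a negative return as a genuine failure and only in that case consider rrdtool to be gone.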

#2 Updated by Anonymous over 6 years ago

Well, the server doesn't die now but still stops updating the rrd until restart. Would be nice to try again, wouldn't it?

-- grin

#3 Updated by bestis over 5 years ago

I'm getting this with 1.4.20. The server is quite loaded sometimes.

2008-12-09 09:46:00: (mod_rrdtool.c.401) rrdtool-read: failed Interrupted system call
2008-12-09 09:46:00: (server.c.1187) one of the triggers failed

It isn't nice that the rrdtool graphs just stop working after one failure.
I would also like to see some kind of retry here, or more time allowed for the call.

#4 Updated by stbuehler over 5 years ago

  • Patch available set to No

You are free to provide patches :)

If you don't like that, you can just poll the mod_status pages (works better for remote monitoring anyway).

#5 Updated by bestis over 5 years ago

stbuehler wrote:

You are free to provide patches :)

Well, here's a first try.

It is based on the observation that in mod_rrdtool_create_rrd the failing calls don't change the rrdtool_running value, and if they fail there, mod_rrdtool tries again in one minute (I noticed this when I didn't have rrdtool
installed).

So if those calls fail, don't disable mod_rrdtool. I left the disabling in place for a failed response, but if
write/read fails, let's try again.

#6 Updated by woods over 5 years ago

I see the same problem on NetBSD-4.

EINTR should always result in a retry of the failed system call (or other operation), not a "fatal" error.

Other errors probably should remain fatal, so the patch suggested by "bestis" isn't ideal.
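The distinction drawn here (retry on EINTR, keep every other error fatal) might be sketched for the write side as follows. This is an illustrative assumption, not the change applied in the changeset; write_all is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer. An interrupted write()
 * is simply retried; any other error is returned as -1 with errno set,
 * so the caller can still treat it as fatal. */
static int write_all(int fd, const char *buf, size_t len) {
    size_t off = 0;
    while (off < len) {
        ssize_t r = write(fd, buf + off, len - off);
        if (r == -1) {
            if (errno == EINTR) continue;  /* interrupted: try again */
            return -1;                     /* real error: stays fatal */
        }
        off += (size_t)r;                  /* also handles short writes */
    }
    return 0;
}
```

With a helper like this, only a -1 return would justify marking rrdtool as no longer running.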

#7 Updated by stbuehler over 5 years ago

  • Category changed from core to mod_rrdtool

#8 Updated by stbuehler over 5 years ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100

Applied in changeset r2400.

#9 Updated by moo over 5 years ago

stbuehler wrote:

Applied in changeset r2400.

Can you please merge it to trunk?
