Bug #604

closed

EINTR not checked: rrdtool-read: failed Interrupted system call (stopped updating rrd)

Added by moo about 18 years ago. Updated about 15 years ago.

Status: Fixed
Priority: Normal
Category: mod_rrdtool
Target version: -

Description

Machine environment: almost no traffic/requests, full CPU usage, disk I/O busy.

I don't have an strace from the moment it stopped.


2006-03-27 17:23:26: (src/log.c.75) server started 
2006-03-27 18:22:00: (src/mod_rrdtool.c.398) rrdtool-read: failed Interrupted system call 
2006-03-27 18:22:00: (src/server.c.1085) one of the triggers failed 
2006-03-27 21:26:39: (src/log.c.135) server stopped (manually)
2006-03-27 21:26:42: (src/log.c.75) server started 
2006-03-28 19:19:00: (src/mod_rrdtool.c.398) rrdtool-read: failed Interrupted system call 
2006-03-28 19:19:00: (src/server.c.1085) one of the triggers failed

# strace -p `pidof rrdtool`
Process 11484 attached - interrupt to quit
read(0,  <unfinished ...>
Process 11484 detached (CTRL+C)

# strace -p `pidof lighttpd`
Process 11480 attached - interrupt to quit
time(NULL)                              = 1143634917
epoll_wait(8, {}, 10231, 1000)          = 0
time(NULL)                              = 1143634918
epoll_wait(8, {}, 10231, 1000)          = 0
time(NULL)                              = 1143634919
epoll_wait(8, {}, 10231, 1000)          = 0
time(NULL)                              = 1143634920
epoll_wait(8,  <unfinished ...>
Process 11480 detached (CTRL+C)

I guess the read() in mod_rrdtool.c takes more than one second because rrdtool needs time to write the update to disk, and disk I/O is already heavily loaded. Another lighttpd trigger/alarm then interrupts the read() that is still in progress.
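The suspected mechanism can be reproduced in isolation. The sketch below is a minimal standalone demo, not lighttpd code: it assumes the interrupting "alarm" is delivered as a caught signal (SIGALRM here, fired by a 100 ms one-shot timer standing in for a periodic server tick). Because the handler is installed without SA_RESTART, the blocking read() on an empty pipe fails with EINTR instead of resuming. The function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* The handler does nothing; installing it only makes SIGALRM "caught",
 * so a blocking system call is interrupted instead of the process dying. */
static void on_alarm(int sig) { (void)sig; }

/* Blocks in read() on an empty pipe until a timer signal arrives.
 * Returns the errno seen when read() fails (EINTR expected),
 * or 0 if read() unexpectedly succeeded. */
int read_interrupted_by_timer(void) {
    int fds[2];
    if (pipe(fds) != 0) return errno;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* no SA_RESTART: read() is not resumed */
    sigaction(SIGALRM, &sa, NULL);

    /* One-shot 100 ms timer, standing in for a periodic server tick. */
    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 100000;
    setitimer(ITIMER_REAL, &tv, NULL);

    char buf[64];
    ssize_t r = read(fds[0], buf, sizeof buf); /* nothing is ever written */
    int err = (r < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

Calling `read_interrupted_by_timer()` should return `EINTR` after roughly 100 ms, which is the same error mod_rrdtool logs as "Interrupted system call".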


Files

mod_rrdtool-persistent.patch (759 Bytes): mod_rrdtool to be more persistent when updating datafile (bestis, 2008-12-22 09:41)

Related issues: 1 (0 open, 1 closed)

Has duplicate Bug #1883: mod_rrdtool regularly fail in lighttpd 1.4.20 (Fixed, 2009-01-30)

#1

Updated by moo almost 18 years ago

I'm sure the code fails to check the value (r) returned by read()/write(): on EINTR it wrongly concludes that "rrdtool is quitting with an error". It should check for r == -1 with errno == EINTR and do the right thing, or at least not set p->rrdtool_running = 0. I have no clue how to make the best patch.
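A minimal sketch of the check described above, assuming mod_rrdtool talks to rrdtool over a pipe with plain read()/write(): wrappers that retry when errno is EINTR and report every other failure as a real error. The names `write_all_retry` and `read_retry` are hypothetical; this is not the code that was eventually committed.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all of buf, retrying when a signal interrupts write() and
 * continuing after short writes. Returns 0 on success, -1 on a real
 * error (errno is left as set by the failing write()). */
int write_all_retry(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t r = write(fd, buf, len);
        if (r < 0) {
            if (errno == EINTR) continue; /* interrupted: not an rrdtool failure */
            return -1;                    /* EPIPE, EBADF, ...: a real error */
        }
        buf += r;
        len -= (size_t)r;
    }
    return 0;
}

/* read() that retries on EINTR. Returns the number of bytes read,
 * 0 on EOF, or -1 on a real error. */
ssize_t read_retry(int fd, char *buf, size_t len) {
    for (;;) {
        ssize_t r = read(fd, buf, len);
        if (r >= 0 || errno != EINTR) return r;
    }
}
```

With these, only a non-EINTR failure would reach the code path that sets `p->rrdtool_running = 0`.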

#2

Updated by Anonymous about 16 years ago

Well, the server doesn't die now but still stops updating the rrd until restart. Would be nice to try again, wouldn't it?

-- grin

#3

Updated by bestis over 15 years ago

I'm getting this with 1.4.20. The server is sometimes quite heavily loaded.

2008-12-09 09:46:00: (mod_rrdtool.c.401) rrdtool-read: failed Interrupted system call
2008-12-09 09:46:00: (server.c.1187) one of the triggers failed

It isn't nice that the rrdtool graphs just stop working after a single failure.
I would also like to see some kind of retry here, or more time allowed for the call.

#4

Updated by stbuehler over 15 years ago

  • Patch available set to No

You are free to provide patches :)

If you don't like that, you can just poll the mod_status pages (that works better for remote monitoring anyway).

#5

Updated by bestis over 15 years ago

stbuehler wrote:

You are free to provide patches :)

Well, here's a first try.

This is based on the fact that failures in mod_rrdtool_create_rrd don't change the rrdtool_running
value; if those calls fail, mod_rrdtool simply tries again a minute later (I noticed this when I
didn't have rrdtool installed).

So the patch no longer disables mod_rrdtool when write/read fails; it tries again instead.
Disabling is kept only for the case where the response from rrdtool indicates a failure.
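The policy of this patch can be sketched as a small decision function. Everything here is hypothetical and only models the behaviour described above (the enum, struct, and function names are not mod_rrdtool's): a failed write or read keeps the plugin enabled so the next trigger retries, while a failed response still disables it.

```c
#include <stdbool.h>

/* Hypothetical model of one rrdtool update round trip. */
typedef enum { STAGE_WRITE, STAGE_READ, STAGE_RESPONSE } update_stage;

typedef struct {
    bool rrdtool_running; /* cleared only when the plugin gives up */
} rrd_state;

/* Called when an update fails at the given stage.
 * Returns true if the next trigger should try the update again. */
bool on_update_failure(rrd_state *st, update_stage stage) {
    switch (stage) {
    case STAGE_WRITE:
    case STAGE_READ:
        /* Pipe I/O failed (for any errno, as in this patch): stay enabled. */
        return true;
    case STAGE_RESPONSE:
        /* rrdtool reported an error: disable, as before. */
        st->rrdtool_running = false;
        return false;
    }
    return false;
}
```

Note that the write/read branch retries on every error, not only on EINTR.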

#6

Updated by woods about 15 years ago

I see the same problem on NetBSD-4.

EINTR should always result in a retry of the failed system call (or other operation), not a "fatal" error.

Other errors probably should remain fatal, so the patch suggested by "bestis" isn't ideal.

#7

Updated by stbuehler about 15 years ago

  • Category changed from core to mod_rrdtool

#8

Updated by stbuehler about 15 years ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100

Applied in changeset r2400.

#9

Updated by moo about 15 years ago

stbuehler wrote:

Applied in changeset r2400.

Can you please merge it into trunk?
